
tarkowski

@tarkowski@101010.pl

digital sociologist / strategist / activist. Open Future (EU), Communia (EU), Creative Commons (World), Centrum Cyfrowe (PL)


tarkowski, to random

Latest beta of WhatsApp shows that Meta is preparing the messaging app to offer cross-platform messaging.

In other words, #interoperability

Over the last year, the discussion of #DMA rules for interoperability was largely marked by doubt about whether this would really happen. And the conversation focused on the potential challenges of making messaging that is both interoperable and secure.

It's good to see that - if all goes well - interoperable messaging will become a reality.

And even more importantly, that European regulation works and does shape markets.

I'm looking forward to January 2024 and cross-platform chats.

https://www.theverge.com/2023/9/10/23866912/whatsapp-cross-platform-messaging-eu-dma-meta

tarkowski, to random

A new French national strategy for AI development bets on #opensourceAI and “national champions” like the recently founded Mistral.AI - good coverage in Politico.

https://www.politico.eu/article/open-source-artificial-intelligence-france-bets-big/

The piece also highlights the ongoing #AIAct policy debate on regulating foundation models, including ones that are open source (something that we have been working on at @openfuture). Depending on how the regulation ultimately looks, it will either create a supportive policy environment - so that open source becomes Europe’s preferred approach to AI development - or it will stifle open-source development.

tarkowski, to ai

This piece by Chris Stokel-Walker in the Guardian gives a good overview of the environmental footprint of #AI - and argues that it’s just as bad as that of #crypto / #web3.

“Technology analysts Gartner believe that by 2025, unless a radical rethink takes place in how we develop AI systems to better account for their environmental impact, the energy consumption of AI tools will be greater than that of the entire human workforce”.

So it’s interesting that this gets so much less attention than the environmental impact of Web3.

Side note: Chris highlights the work done by Sasha Luccioni at Hugging Face, calling the company the “de facto conscience of the AI industry”, which is correct.

https://www.theguardian.com/technology/2023/aug/01/techscape-environment-cost-ai-artificial-intelligence

tarkowski, to opensource

I've just discovered an opinion by @ed on the Open Source Initiative blog, highlighting how Llama 2 is not #opensource. (I wish I had spotted it earlier, when writing my piece on the issue.)

The voice of OSI obviously counts, as the organization stewards the open source definition.

The piece states that OSI asked Meta to fix its description of the Llama 2 release and avoid #openwashing. A month later, Meta does not seem to have done much: it still presents itself as a champion of open AI, thinning down the standard in the process.

https://blog.opensource.org/metas-llama-2-license-is-not-open-source/

tarkowski, to random

In a month (7-8 December) I will be speaking at a conference on data governance and AI, organized in Washington, DC by the Digital Trade and Data Governance Hub. I am excited about this for two reasons:

first of all, we need to connect the policy debates on data governance and AI governance. The space of AI development offers new opportunities to develop, at scale, commons-based approaches that have been much theorized and advocated for, but not yet implemented.

and secondly, I am a deep believer in dialogue between the US and the EU. The US is leading in terms of AI development itself, while the EU will most probably be the first jurisdiction to innovate in terms of AI regulation.

Please consider joining, either in-person or remotely (it's a hybrid event).

https://www.linkedin.com/events/datagovernanceintheageofgenerat7127306901125521408/comments/

tarkowski, to random

In a new opinion for @communia , @paulk , @senficon and Teresa Nobre write about a recent CJEU case that could establish new legal rules for sampling - based on a 30-year-long dispute that concerns two seconds of a Kraftwerk song.

Sampling cases are the most fun! It never ceases to amaze me that guardrails for creativity are defined by lawyers obsessing over the characteristics of minuscule bits of music.

https://communia-association.org/2023/09/20/do-90s-rappers-dream-of-electric-pastiche/

tarkowski, to ai

I've finally read the super-long piece on Wikipedia and #AI, by Jon Gertner in the NYT.

It's a cool text that offers a good overview of issues related both to #generativeAI and #LLMs, and of how #Wikimedia has been developing.

And it shows a blind spot in how Wikimedians have been thinking about interacting with AI (something I tackled in my two pieces on the topic on the Open Future blog).

A lot of attention is given to the issue of synthetic generation of Wikimedia content - rightly so, as it is a possible vector for low-quality content or disinformation.

And then the piece shows well the work done this year to better connect Wikimedia with chatbots. In practice, this means connecting with ChatGPT, and WMF has done impressive, agile work by building a ChatGPT plugin - which demonstrates how Wikimedia content can improve the quality of chatbot answers.

What worries me is that in this way Wikimedia is focusing on improving the ChatGPT stack. I understand that it's identified as a major source of future traffic for Wikimedia.

The piece shows well how Wikimedia has been in symbiosis with Google Search and other search engines. Feels like WMF now wants to build a similar setup with commercial chatbots.

And this is the blind spot, and the third thing Wikimedians should pay more attention to: building their own AI stack, the #WikiAI.

There's increasing talk of a "public option AI". We also need to be talking about a "civic option AI" - and Wikimedia, as the flagship of the knowledge commons, comes to mind as a prime candidate to work on this.

It might sound daunting, but there's an ecosystem of open-source AI development that's building tools that Wikimedia could use.

The piece mentions work on open-source LLMs that could be deployed to help Wikimedia editors. I think that Wikimedia should also offer this as a user-facing service.

(@nickmvincent , @selenamarie I’m curious what’s your take on this?)

https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html

tarkowski, to ai

Next week, @opensource is running a series of webinars on open source #AI / #ML. Together with @zwarso we will be kicking off the series with a talk on the importance of data governance, and treating datasets as commons. #aicommons
https://opensource.org/events/deep-dive-ai-webinar-series-2023/

tarkowski, to mastodon

Upgrade Democracy has published a short and very good report on #Mastodon - more precisely, on what needs to be done to make a decentralised space truly democratic.

The short answer: move beyond a “democracy of the admins”.

The authors (who include @matthiaskettemann and others who don’t seem to be here) provide a nuanced look and a set of recommendations, ranging from introducing democratic structures for governing infrastructures, through improved and co-created content moderation mechanisms, to the technical empowerment of users.

“Now’s the time to up the fediverse’s game with regard to participation, empowerment and inclusion – we need to seize the moment and turn positive insights into action” - reads the conclusion.

The problem: it’s unclear who the audience for this is. Or rather: are the admins and developers listening? The shift that this report hopes for will not come from outside the #Fediverse.

https://upgradedemocracy.de/en/impulse/decentralization-as-democratization-mastodon-instead-of-platform-power/

tarkowski, to random

The Chan Zuckerberg Initiative announced that, in order to support non-profit medical research, it is building "computing infrastructure" - that is, purchasing over 1,000 state-of-the-art GPUs.

This is super interesting: in an AI-powered world, compute is not a commodity but a currency.

So if a private foundation can do it, why can't governments do the same? Providing public-interest compute infrastructure seems to be one of the simpler moves that can be made, while the more complex governance issues are solved in parallel.

#aicommons #publicai

https://archive.ph/DL0PO

tarkowski, to opensource

Meta's Llama 2 model is touted as open, but it's more a "mirage of #opensource" - licensing conditions break key tenets of open release.

I've analysed the release model in detail, as it offers useful lessons for open source #AI / #LLM governance.

https://openfuture.eu/blog/the-mirage-of-open-source-ai-analyzing-metas-llama-2-release-strategy/

tarkowski, to ai

In the UK, NESTA is launching a Civic AI Observatory, with the goal “to talk calmly and collaboratively about the potential civic applications of powerful technologies like AI” - in contrast to a “breathless and polarised AI discourse”.

I really like the focus on calmness; it’s something much needed in tech debates, and not often seen. (@jamestplunkett , who’s leading this, has done some great writing in recent months about technology in a broader social context.)

One question remains: will this be UK-focused only, or broader? There are good reasons to keep such a focus - largely to keep complexity at bay. But the AI debate is also fragmented between regions. There is a strong network of actors and a public debate in the UK, but it often feels just a bit insular. I hope this observatory will bridge that gap.

https://medium.com/@jamestplunkett/announcing-the-civic-ai-observatory-2c43b21cbf0e

tarkowski, to random

In this (relatively) recent piece, Venkat Rao compares generative AI systems to the Webb telescope and argues that AI systems are not machines that produce something, but rather machines that discover things. And the thing they discover is the information / intelligence inherent in data.

The argument - as often with Venkat's writing - gets quite complicated. But the core point is worth noting even for much less philosophical discussions about generative AI: ultimately it's the data, and not the model, that is crucial.

In recent months, I've been spending a lot of time thinking about dataset governance and developing a commons-based framework for it. So Venkat's piece offers a useful theoretical underpinning - a story explaining why this is important.

There's been a lot of progress in 2023 on AI models, with dev teams playing the game of "who can count more billions of parameters?". It was also a year in which there were few positive developments in terms of dataset development and governance.

Hopefully, in 2024 this trend will reverse.

Venkat's piece:
https://studio.ribbonfarm.com/p/a-camera-not-an-engine

tarkowski, to ai

A new piece by @halcyene and Michael Birtwistle of the Ada Lovelace Institute argues for a more inclusive UK #AI Safety Summit.

https://www.adalovelaceinstitute.org/blog/ai-safety-summit/?cmid=36b02cc7-2de8-4b1a-bde2-3cde2b1b718d

The reason, they argue, is that "AI safety" is a very broad category. And since many risks are socio-technical, the governance debate needs to include society - especially those affected by the risks. "Nothing about us without us".

It's interesting to observe how UK-based civic actors are attempting to pry open a policy platform that is currently designed as a conversation between business and the state (with a sprinkling of just a few selected civic / academic actors). I hope it's successful and sets a precedent.

And I like the way Ada Lovelace frames the risks and highlights that there are structural harms - the risk of market concentration in particular.

This risk is often ignored, and it's the one that can be addressed by policies that support open, commons-based governance of AI.

Also, it's a risk that - since it's structural - affects the policy debate itself: there is a risk of regulatory capture by the largest players, in whose corporate hands power is concentrated. One more reason to make the AI policy debate more inclusive.

#aicommons

tarkowski, to random

The Wikimedia Foundation (@wikimedia) has just released a ChatGPT plugin, as part of its "Future Audiences" work.

This is important news: it's good to see Wikimedia explore new channels for making wiki resources available, and chatbots are clearly one of the key channels.

And it's not surprising that WMF started with a ChatGPT plugin, as it is the most popular chatbot, and one that offers such plugins.

As a next step, it would be great to see a parallel commitment to supporting open alternatives, like OpenAssistant.

The “Future Audiences” program is crucial, but it frames AI/ML work narrowly, in terms of accessibility. The Wikimedia Movement needs a broader mission - one that aligns with ongoing work to build open(ish) AI services.

https://diff.wikimedia.org/2023/07/13/exploring-paths-for-the-future-of-free-knowledge-new-wikipedia-chatgpt-plugin-leveraging-rich-media-social-apps-and-other-experiments/

  • there’s still no WMF account here, which is a pity!
tarkowski, to random

“We're trying to reverse 40 years' worth of Reaganomics here. It won't happen overnight.” - a great piece by @pluralistic about the amazing Lina Khan and the market competition work she is pushing ahead in the US.

(If you’re in Europe like me, then yes, the antitrust vibe in the EU is different and better. But this is still a globally interconnected world; what happens in the US matters.)

https://pluralistic.net/2023/07/14/making-good-trouble/#the-peoples-champion

tarkowski, to random

Mark Surman from @mozilla , in a recent op-ed for Fast Company, argues that the recent OpenAI debacle confirms the need for nonprofit control over key technologies. Surman points to Linux, Apache, and Mozilla as prior examples.

I really like his argument, and hope that it travels far. I'll add that there's also room for public institutions to play a stronger role in these technological ecosystems.

The OpenAI case was slightly different from the examples that Mark mentions. In those cases, nonprofits established alternatives that successfully carved out niches or even took over markets. With OpenAI, there was a slim hope that a nonprofit would be the leading organization developing an emergent technology.

I guess that this hope died several weeks ago.

https://www.fastcompany.com/90992180/this-is-the-right-lesson-to-take-from-the-openai-debacle

tarkowski, to llm

I'm reading the white paper for the open(ish) Llama 2 #LLM and it strikes me how English-centric the development work has been.

Meta basically decided that building a model for just one of the world's languages is good enough.

(And admittedly, having read a fair share of the #AI bias literature, I don't see linguistic bias raised very often, if at all - happy to learn that I'm wrong.)

Coming back to Llama 2: it's trained on 90% English-language data, and the other 10% is mainly code. All other major languages together constitute around 1-2% of the data.
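
(Side note: as a rough illustration - my own sketch, nothing from the white paper - here's how one might check such a language mix on a corpus sample, using the off-the-shelf langdetect package; any per-document language identifier would do.)

from collections import Counter
from langdetect import detect  # pip install langdetect

def language_mix(documents):
    """Return the percentage share of each detected language in a list of texts."""
    counts = Counter()
    for doc in documents:
        try:
            counts[detect(doc)] += 1
        except Exception:  # empty or undecidable text
            counts["unknown"] += 1
    total = sum(counts.values())
    return {lang: round(100 * n / total, 1) for lang, n in counts.most_common()}

# language_mix(["The cat sat on the mat.", "Der Hund schläft tief.", "Le chat dort sur le lit."])
# -> roughly {'en': 33.3, 'de': 33.3, 'fr': 33.3}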

And here's the kicker: when the model was "red teamed" - tested for vulnerabilities - the testers would use prompts in different languages, because these are typical "attack vectors".

So here we are. A major new model is shared with the world. Usable in one language, and with no roadmap to expand linguistic scope. With other languages seen mainly as ways to mess with the model.

https://ai.meta.com/llama/

tarkowski, to ai

Use of synthetic data to train #AI models degrades their quality and leads to model collapse, according to new research. Why is this important? Because it means that AI development will need human-generated content. (via Jack Clark’s Import AI newsletter) https://arxiv.org/abs/2305.17493v2

(By the way, what's the opposite of synthetic data and content? Human-generated sounds technical, maybe genuine is a good term?)

(And the paper frames it as a yes/no choice, while in fact there will be shades of genuineness, and shades of syntheticity.)
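
(A toy numeric sketch of the collapse dynamic - my own illustration, not the paper's actual setup: fit a Gaussian to data, sample from the fit, refit on the samples, and repeat. Each generation trains only on the previous generation's synthetic output, and the fitted distribution drifts and loses its tails.)

import numpy as np

rng = np.random.default_rng(0)
genuine = rng.normal(0.0, 1.0, size=50)  # generation 0: "genuine" data
mu, sigma = genuine.mean(), genuine.std()

for gen in range(1, 11):
    synthetic = rng.normal(mu, sigma, size=50)  # train only on synthetic data
    mu, sigma = synthetic.mean(), synthetic.std()
    print(f"gen {gen:2d}: mu={mu:+.3f} sigma={sigma:.3f}")

# Over enough generations sigma tends to wander downwards:
# information about the original distribution's tails gets lost.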

The researchers note that access to genuine content will be a source of competitive advantage. And they suggest that AI devs instead coordinate and share info on data provenance - which sounds like managing this data as an #AIcommons.

On our blog, @paulk wrote recently about the need to introduce measures that force AI companies to give back to the commons, which they are now exploiting. Paul discussed this in the context of the #copyright debate. https://openfuture.eu/blog/ai-the-commons-and-the-limits-of-copyright/

This research shows that the issue is more fundamental: we need to sustain human creativity in order for synthetic creativity to remain sustainable - and fight the urge of corporations to replace the former with the latter, for the sake of profit.

And it also suggests that stewardship of the cultural and knowledge commons will soon need to include ways of identifying genuine vs synthetic content.

tarkowski, to llm

The Falcon 180B follows Llama 2 as an #LLM that is "open-source", but actually more open-sourceish.

This is becoming a trend: releases of major models framed as "open-source", but in fact released in ways that don't meet the open-source definition.

Falcon models are described by their creators as "open source or open access". I wish they clarified what they mean!

On the positive side, the release includes Falcon RefinedWeb, a publicly available dataset based on Common Crawl (thus with all sorts of copyright issues potentially lurking inside - and, tellingly, not mentioned in the quite extensive "considerations for using the data").

Then there are two major limitations. The first is an acceptable use policy - this one with four very general categories of non-acceptable uses.

The second is a restriction on hosting - offering the model as a shared instance or a managed service (this also applies to fine-tuned derivatives).

Which is interesting, because it feels like the equivalent of a Non-Commercial clause for #AI models.

So while the licensing conditions are different, Falcon follows the path charted by Llama 2: a non-competitive, non-interoperable flavor of "open source".

https://falconllm.tii.ae

#ai #opensourceai #llm #opensource #aicommons

tarkowski, to random

Recently, @creativecommons launched a survey that explores values that underpin the idea of Open Culture.

I took the survey and it's super interesting - in general, any exploration that clarifies shared visions is really important for our movement.

And Creative Commons will use the insights it gathers to build a strong, shared position on Open Culture, in order to push for policies that support the sharing of culture.

So please consider taking the survey, if you are in any way connected with the idea of open culture.

https://docs.google.com/forms/d/e/1FAIpQLSfRL0Y3AZYxGAZ_6-FUAs8YYgxF7-b2Yo4_ZU7uVBuBblzl8g/viewform

tarkowski, to ai

Open Future's newest white paper, authored by @zwarso and myself, addresses the governance of data sets used for #AI training.

Over the past two years, it has become evident that shared datasets are necessary to create a level playing field and support AI solutions in the public interest. Without these shared datasets, companies with vast proprietary data reserves will always have the winning hand.

However, data sharing in the era of AI poses new challenges. Thus, we need to build on established methods like #opendata, refining them and integrating innovative ideas for data governance.

Our white paper proposes that data sets should be governed as commons, shared and responsibly managed collectively. We outline six principles for commons-based governance, complemented by real-life examples of these principles in action.

https://openfuture.eu/publication/commons-based-data-set-governance-for-ai/

#aicommons #datacommons #commons #opensource

tarkowski, to ai

. @mweinberg recently wrote a short post titled "Licenses are Not Proxies for Openness in AI Models".

The title basically says it all, and is spot on.

https://michaelweinberg.org/blog/2024/03/26/ntia-open-ai/

Mike writes that as long as we don't have consensus on what "open" means in the #AI space, "any definition of open should require a more complex analysis than simply looking at a license".

Good point! Funnily enough, the European AI Act's drafters did exactly what Mike suggests should be avoided: they defined open-source AI as a pile of AI stuff under a "free and open-source license".

(I wrote about it on our blog: https://openfuture.eu/blog/ai-act-fails-to-set-meaningful-dataset-transparency-standards-for-open-source-ai/)

Mike also distinguishes hardware from software. Hardware is more complex, and it's therefore in the hardware space that licensing will not serve as a simple proxy for openness.

I would argue that this point also holds for software, and for other types of content. Mike is right that licenses grew to be powerful proxies for openness. But there have always been other factors - less visible, and not so easily standardized: collaborative practices, standards for platforms that support open sharing, and so on.

There seems to be a growing sense that we need to look beyond license proxies and identify other factors as core to open frameworks. The #AI space is where such thinking is most visible, but I'm expecting spillover beyond the AI debates.

tarkowski, to generativeAI

. @henryfarrell@mastodon.social wrote a great essay outlining a political economy of #generativeAI.

My thinking aligns with his in a lot of ways, and I especially like:

✦ how he takes the "Shoggoth" metaphor, often used to incite moral panic about AGI, and shows that corporations are the real Shoggoths that we should be worried about
✦ how he deploys the "map and territory" metaphor to describe the political stakes of genAI - the struggle is for control of technologies that are increasingly used to substitute maps for real territories
✦ how he notes a reconfiguration of the political positions of activists and of organizations like the Open Future Foundation - and signals the need for a new advocacy agenda, based on a good understanding of emergent ways of creating synthetic knowledge and culture, and focused on supporting and protecting human knowledge.

https://www.programmablemutter.com/p/the-political-economy-of-ai

tarkowski, to ai

Interesting data from the new edition of the Foundation Model Transparency Index, collected six months after the initial index was released.

Overall, there's a big improvement, with the average score jumping from 37 to 58 points (out of 100). That's a lot!

One interesting fact: the researchers contacted developers and solicited data - interactions count.

More importantly, there is little improvement - and little overall transparency - in the category the researchers describe as "upstream": the data, labour, and compute that go into training. And "data access" gets the lowest score of all the parameters.

More at Tech Policy Press: https://www.techpolicy.press/the-foundation-model-transparency-index-what-changed-in-6-months/
