@tarkowski@101010.pl
@tarkowski@101010.pl avatar

tarkowski

@tarkowski@101010.pl

digital sociologist / strategist / activist. Open Future (EU), Communia (EU), Creative Commons (World), Centrum Cyfrowe (PL)


tarkowski, to ai
@tarkowski@101010.pl avatar

Good new piece from Tech Policy Press arguing that #AI regulation should build on frameworks for social media regulation, instead of reinventing the wheel.

The piece focuses on the US context - so it's worth translating these ideas into the European context, where connections are increasingly being made between the #AIAct and the #DSA. Ideas about platform responsibility and regulation, especially around systemic risks, can be carried over into the #AI debate (and this is actually happening).

This, by the way, is an interesting case of policymakers being quite agile. Regulation is often criticised for being too slow - here there's a chance for quite an adaptive approach.

https://techpolicy.press/to-move-forward-with-ai-look-to-the-fight-for-social-media-reform/

luis_in_brief, to random
@luis_in_brief@social.coop avatar

@ldodds @tarkowski do you have a good sense of why European Data Spaces never get mentioned in open data circles? Am I just in the wrong watering holes or what?

https://digital-strategy.ec.europa.eu/en/policies/data-spaces

tarkowski,
@tarkowski@101010.pl avatar

@ldodds @luis_in_brief

Leigh, regarding your point that

"- as a primarily (AIUI) B2B data sharing solution its not relevant to the goals of many in the open data community (open science, transparency, open govt, etc)"

this actually creates an opportunity to push for approaches to commercial / business data. @timdavies reminded me a while ago that open data was initially meant to also cover business data, but over time it narrowed its focus to public data.

tarkowski, to threads
@tarkowski@101010.pl avatar

On the Mastodon blog, @Gargron published "What to know about Threads" - part FAQ about the possible impact of Meta's Threads on the Fediverse, and part manifesto.

https://blog.joinmastodon.org/2023/07/what-to-know-about-threads/

First, a caveat. A Threads that is interoperable with the Fediverse is, for now, but a figment of collective imagination. That makes this a good time to consider the scenario - but who knows whether it will ultimately happen?

I'm happy that Rochko / the Mastodon org voice strong support for an interoperable Fediverse, including interop with large commercial platforms.

I want to react to two things:

"Well, even if Threads abandoned ActivityPub down the line, where we would end up is exactly where we are now."

Well, not really. Many Fediverse devs/admins have a very technical take, thinking just in terms of a network of servers: a server federates, a server defederates, life goes on.

What changes is the user experience. A world where Threads federates and then abandons the network is one of broken experiences and broken social graphs.

Federated services - even in a space that sees defederation as an option, just as open source sees forks as a possibility - ultimately require a commitment to continuity. And this commitment is something that cannot be governed by a protocol. It requires a social contract, a pact.

How do you make a contract of this sort with Meta? That is the 100 EUR question.

Secondly,

"If you are not happy with their decision, you can move your account to a different Mastodon server while keeping all of your followers. Since Mastodon is open-source, you can even host your own server and be entirely in charge."

This requires a separate post to unpack, but seriously? Again, this vision of the user experience is flawed. Moving between servers is relatively easy, but feels difficult. (And why are there two Gargron accounts? How does this make sense as UX?) And the offer to "set up your own server, it's open source" is a type of tech elitism that really needs to end.

tarkowski, to llm
@tarkowski@101010.pl avatar

I'm reading the white paper for the open(ish) Llama 2 and it strikes me how English-centric the development work has been.

Meta basically decided that building a model for just one of the world's languages is good enough.

(And admittedly, having read a fair share of the bias literature, I don't see linguistic bias raised very often, if at all - happy to learn that I'm wrong.)

Coming back to Llama 2: it's trained on 90% English-language data, and the other 10% is mainly code. All other major languages together constitute around 1-2% of the data.

And here's the kicker: when the model was "red teamed" - tested for vulnerabilities - the testers used prompts in different languages, because these are typical "attack vectors".

So here we are. A major new model is shared with the world: usable in one language, with no roadmap to expand its linguistic scope, and with other languages seen mainly as ways to mess with the model.

https://ai.meta.com/llama/

tarkowski, to mastodon
@tarkowski@101010.pl avatar

Upgrade Democracy has published a short and very good report on #Mastodon. More precisely, on what needs to be done to make a decentralised space truly democratic.

The short answer is: move beyond a “democracy of the admins”.

The authors (who include @matthiaskettemann and others who don’t seem to be here) provide a nuanced look and a set of recommendations that range from introducing democratic structures for governing infrastructures, through improved and co-created content moderation mechanisms, to technical empowerment of users.

“Now’s the time to up the fediverse’s game with regard to participation, empowerment and inclusion – we need to seize the moment and turn positive insights into action” - reads the conclusion.

The problem: it’s unclear who the audience of this is. Or rather, are the admins and developers listening? The shift that this report hopes for will not come from outside the #Fediverse.

https://upgradedemocracy.de/en/impulse/decentralization-as-democratization-mastodon-instead-of-platform-power/

tarkowski, to random
@tarkowski@101010.pl avatar

The Wikimedia Foundation (@wikimedia) has just released a ChatGPT plugin, as part of its "Future Audiences" work.

This is important news: it's good to see Wikimedia explore new channels for making wiki resources available, and chatbots are clearly one such key channel.

And it's not surprising that WMF started with a ChatGPT plugin, as it is the most popular chatbot, and one that offers such plugins.

As a next step, it would be great to see a parallel commitment to supporting open alternatives, like OpenAssistant.

The “Future Audiences” program is crucial, but it frames AI/ML work narrowly, in terms of accessibility. The Wikimedia movement needs a broader mission - one that aligns with ongoing work to build open(ish) AI services.

https://diff.wikimedia.org/2023/07/13/exploring-paths-for-the-future-of-free-knowledge-new-wikipedia-chatgpt-plugin-leveraging-rich-media-social-apps-and-other-experiments/

  • there’s still no WMF account here, which is a pity!
tarkowski, to ai
@tarkowski@101010.pl avatar

Open source has been a key issue in the #AIAct policy debates, and the Act includes provisions that regulate the development and sharing of open source AI.
We've been following the proposed rules as they meandered through several different approaches. Now elements of the final version have been made public.

The big question was whether transparency and other obligations need to be mandated for open source AI, or they can be self-regulated – under the assumption that open source developers ensure these elements based on principles of open development.

The agreed-upon wording of the AI Act assumes the latter, and exempts open source from the regulation of general-purpose AI models, including transparency obligations.

We think that this is a problem, especially since the lack of agreed standards defining open source AI creates a risk of open-washing.

You can read more about this on our blog: Paul Keller wrote a detailed analysis of the provisions:
https://openfuture.eu/blog/a-frankenstein-like-approach-open-source-in-the-ai-act/

tarkowski, to ai
@tarkowski@101010.pl avatar

We like to throw around adoption data as proof of a given technology's / trend's significance.

I read recently that, according to US polling, 20% of people in the US have tried out ChatGPT, while only 9% have tried crypto and 2% own NFTs.

This data is rubbish, to be honest. Trying out ChatGPT requires setting up an account - you can't really compare that with the complex steps needed to obtain crypto / NFTs.

But ... I've fallen into this trap too, using ChatGPT data to argue for the importance of anticipatory governance work for #AI .

So I enjoyed being served the right perspective by @itforchange, who - in the latest issue of their brilliant DataSyn newsletter - remind everyone that Threads had an even steeper adoption curve.

Obviously, because it tapped into an app-enabled network, and a simple interface did all the onboarding work.

(By the way, I highly recommend the DataSyn newsletter: https://botpopuli.net)

tarkowski, to ChatGPT
@tarkowski@101010.pl avatar

The term „ChatGPT” has been among the top 10 most viewed #Wikipedia entries over the last 3 weeks. This is interesting in many ways. Especially since “AI” is not on the list. It’s #chatGPT that people want to know about, the mysterious acronym. #AI is either not on their minds - despite all the media and industry hype - or it is like magic, not to be investigated. ChatGPT is a pretty bad name, as far as names go, and it’s getting normalised quickly (but not yet commodified).

https://en.wikipedia.org/wiki/Wikipedia:Wikipedia_Signpost/2023-05-22/Traffic_report

luis_in_brief, to random
@luis_in_brief@social.coop avatar

Very useful find from @tarkowski on the theorized relationship of open data and data spaces in the EU.
https://101010.pl/@tarkowski/112398533782766536

tarkowski,
@tarkowski@101010.pl avatar

@Di4na @luis_in_brief you mean data spaces in general, or a specific one? I agree that "data spaces" still feel like they lack substance right now, and are mainly policy-driven ideas being elaborated by large consortia, without clear results.
But I think this also creates an opportunity to propose a meaningful approach to data sharing that fits the "data spaces" frameworks.

tarkowski, to random
@tarkowski@101010.pl avatar

A new French national strategy for AI development bets on #opensourceAI and “national champions” like recently founded Mistral.AI - good coverage in Politico.

https://www.politico.eu/article/open-source-artificial-intelligence-france-bets-big/

The piece also highlights the ongoing #AIAct policy debate on regulating foundation models, including ones that are open source (something that we have been working on at @openfuture). Depending on how the regulation ultimately looks, it will either create a supportive policy environment - so that open source becomes Europe’s preferred approach to AI development - or it will stifle open-source development.

tarkowski, to random
@tarkowski@101010.pl avatar

A16Z announced that they are supporting the "Open Source AI" community by giving several grants to individual developers.

Does anyone here have an idea what their definition of open source (in relation to AI) is?

I'm curious because, while OSI runs a process to collaboratively understand and define "open source AI", others are setting their own standards. Meta, with their own take on "open sourcing" Llama 2, obviously.

So it would be good to know how a VC firm understands this.

At least one of the funded projects is fine-tuning Llama 2, which I wouldn't call open source. But it will probably end up being treated as such, by sheer force of Meta's position.

https://a16z.com/supporting-the-open-source-ai-community/

@ed @luis_in_brief @savi

tarkowski, to random
@tarkowski@101010.pl avatar

„We're trying to reverse 40 years' worth of Reaganomics here. It won't happen overnight.” - great piece by @pluralistic about the amazing Lina Khan and the market competition work she is pushing ahead in the US.

(If you’re in Europe like me, then yes, the competition policy vibe in the EU is different and better. But this is still a globally interconnected world, and what happens in the US matters.)

https://pluralistic.net/2023/07/14/making-good-trouble/#the-peoples-champion

tarkowski, to random
@tarkowski@101010.pl avatar

@ainali hi Jan this is Alek from Open Future. I would be keen to learn your views on wiki enterprise and „wiki ai”. Maybe you would have time for a short call at the end of the week?

tarkowski, to random
@tarkowski@101010.pl avatar

In this (relatively) recent piece, Venkat Rao compares generative AI systems to the Webb telescope, and argues that AI systems are not machines that produce something, but rather instruments that discover things. And the thing they discover is information / intelligence that is inherent in data.

The argument - as is often the case with Venkat's writing - gets quite complicated. But the core point is worth noting even for much less philosophical discussions about generative AI: that ultimately it's the data, and not the model, that is crucial.

In the last months, I've been spending much time thinking about dataset governance and developing a commons-based framework for such governance. So Venkat's piece offers a useful theoretical underpinning, a story explaining why this is important.

There's been a lot of progress in 2023 on AI models, with dev teams playing the game of "who can count more billions of parameters?". It was also a year with few positive developments in terms of dataset development and governance.

Hopefully, in 2024 this trend will reverse.

Venkat's piece:
https://studio.ribbonfarm.com/p/a-camera-not-an-engine

tarkowski, to threads
@tarkowski@101010.pl avatar

I keep thinking about #Threads / #ActivityPub / #Mastodon interoperability - the deliciously virtual federation that Meta made happen purely through media announcements.

There's a Verge interview with Adam Mosseri, the head of Instagram, where he says:

"I do think that more and more people are going to be interested in and appreciate more open systems. And I think that’s the direction of travel for the industry."

and adds:

"Creators are a really good example. Creators are becoming more and more savvy. They’re using more and more platforms. It’s becoming rarer that a creator is completely attached to one platform because they’re always worried about the risk of being overly beholden to one company that they obviously can’t control".

So we have Meta betting - adventurously - on something that they see as the future, but also as a major challenge for platforms. And also addressing the needs of creators, who no longer want to be bound to platforms (like Instagram or Facebook).

This is either very exciting or quite ridiculous. If Threads does indeed federate, Meta will be a company with both closed and open platforms in its portfolio.

Final quote from Mosseri:

"We know that we need to evolve, or else we run the risk of becoming irrelevant."

There's a saying in Poland: "honey for your heart", the nicest words that you could hear from someone. Let's see if this becomes reality, or remains in the virtual realm.

https://www.theverge.com/2023/7/5/23784870/instagram-threads-adam-mosseri-interview-twitter-competitor

ps. And then there's the interesting bit about DMs - where Mosseri declares that these will not be supported, in order not to further fragment the DM space. But I wonder... what if it is a play meant to "protect" Threads from #DMA #interoperability requirements?

tarkowski, to random
@tarkowski@101010.pl avatar

The quote is from a foreword by Tim Lenton to a recent conversation between James Lovelock and Hans Ulrich Obrist, published by Isolarii.

https://www.isolarii.com/forewords/gaia-polycrisis-tim-lenton

tarkowski, to cyberpunk
@tarkowski@101010.pl avatar

LifeLong is an NGO that trains Syrian refugees to annotate data for companies like Google and Amazon, to fuel their AI training needs.

This reads like a sentence out of a bleak #cyberpunk novel.

More examples in a great Rest of World piece by Phil Jones

https://restofworld.org/2021/refugees-machine-learning-big-tech/

tarkowski, to opensource
@tarkowski@101010.pl avatar

This recent piece from @kaythaney strikes all the right notes in framing today’s challenges for #openaccess / #openscience. And it’s in line with the #ParadoxofOpen framing that I’ve been working on.

Kaitlin argues that Open Access has become big business, and that revenues flow to companies more quickly than benefits flow to the communities producing knowledge. That’s the gist of the Paradox of Open: the commons are being exploited.

I like the point that, thanks to OA - and through the exploitative models of publishers - “OA is free to read but not free or affordable to publish”.

And Kaitlin argues that publishers should ideally dedicate a portion of profits back to the communities (the commons). +1 to that. That’s the reason I really like the Wikimedia Enterprise project, which shows that payments back to the commons can be part of an open framework.

The problem with Wikimedia Enterprise is that it is voluntary - so the big question is: how can such redistribution be mandated?

A starting point would be for #openX communities (not just #OA / #openscience but also #opensource, for example) to have a shared position on the need for such redistribution.

https://blogs.lse.ac.uk/impactofsocialsciences/2023/07/20/open-access-at-any-cost-cannot-support-scholarly-publishing-communities/#OpenScience

tarkowski, to ai
@tarkowski@101010.pl avatar

I've finally read the super-long piece on Wikipedia and #AI, by Jon Gertner in the NYT.

It's a cool text that offers a good overview of issues related both to #generativeAI and #LLMs, and of how #Wikimedia has been developing.

And it shows a blind spot in how Wikimedians have been thinking about interacting with AI (something I tackled in my two pieces on the topic on the Open Future blog).

A lot of attention is given to the issue of synthetic generation of Wikimedia content - rightly so, as it is a possible vector for low quality content or disinformation.

And then the piece shows well the work done this year to better connect Wikimedia with chatbots. In practice, this means connecting with ChatGPT, and WMF has done impressive, agile work by building a ChatGPT plugin, which demonstrates how Wikimedia content can improve the quality of chatbot answers.

What worries me is that in this way Wikimedia is focusing on improving the ChatGPT stack. I understand that it's identified as a major source of future traffic for Wikimedia.

The piece shows well how Wikimedia has been in symbiosis with Google Search and other search engines. Feels like WMF now wants to build a similar setup with commercial chatbots.

And this is the blind spot, and the third thing Wikimedians should pay more attention to: building their own AI stack, the #WikiAI.

There's increasing talk of a "public option AI". We need to also be talking about a "civic option AI" - and Wikimedia, as the flagship of the knowledge commons, comes to mind as a prime candidate to work on this.

It might sound daunting, but there's an ecosystem of open-source AI development that's building tools that Wikimedia could use.

The piece mentions work on open-source LLMs that could be deployed to help Wikimedia editors. I think that Wikimedia should also offer this as a user-facing service.

(@nickmvincent , @selenamarie I’m curious what’s your take on this?)

https://www.nytimes.com/2023/07/18/magazine/wikipedia-ai-chatgpt.html

tarkowski, to random
@tarkowski@101010.pl avatar

I've been reading up on recent Open Access developments, including proposals for new approaches.

The discussions are fascinating, and it's increasingly clear to me that Open Access ecosystems are facing a case of what we call (at @openfuture) the Paradox of Open: OA faces the challenge of value extraction, in this case by the largest academic publishers.

It's just as fascinating to see solutions on the table: proposals for alternative publishing models that are more sovereign, civic, community-led. Basically: digital public infrastructures.

This recent news feature from Nature is a good starting point:

https://www.nature.com/articles/d41586-023-03342-6

tarkowski, to ai
@tarkowski@101010.pl avatar

Next week, @opensource is running a series of webinars on open source AI. Together with @zwarso, we will be kicking off the series with a talk on the importance of data governance, and treating datasets as commons.
https://opensource.org/events/deep-dive-ai-webinar-series-2023/

tarkowski, to random
@tarkowski@101010.pl avatar

The Chan Zuckerberg Initiative announced that, in order to support non-profit medical research, they are building "computing infrastructure" - that is, purchasing over 1,000 state-of-the-art GPUs.

This is super interesting: in an AI-powered world, compute is not a commodity but a currency.

So if a private foundation can do it, why can't governments do the same? Providing public interest compute infrastructure seems to be one of the simpler moves that can be made, while the complex governance issues are solved in parallel.

https://archive.ph/DL0PO

tarkowski, to ai
@tarkowski@101010.pl avatar

Interesting data from the new edition of the Foundation Model Transparency Index - collected six months after the initial index was released.

Overall, there's big improvement, with the average score jumping from 37 to 58 points (out of 100). That's a lot!

One interesting fact: the researchers contacted developers and solicited data - interactions count.

More importantly, there is little improvement, and little overall transparency, in the category that the researchers describe as "upstream": the data, labour and compute that go into training. And "data access" gets the lowest score of all the parameters.

More at Tech Policy Press: https://www.techpolicy.press/the-foundation-model-transparency-index-what-changed-in-6-months/

tarkowski, to books
@tarkowski@101010.pl avatar

Dan Cohen and Dave Hansen recently wrote a really good piece on books, libraries and AI training (it refers to the paper on a Books Data Commons that I co-authored).

They start with a well-known argument about levelling the playing field: without public access to training resources, AI monopolies will benefit from information asymmetries. Google already has access to 40 million scanned books.

They add to this a key point about libraries' public interest stance - and suggest that libraries could actively govern / gatekeep access to books.

This reminds me of the recent paper by Melanie Dulong de Rosnay and Yaniv Benhamou, which for me is groundbreaking - it proposes combining license-based approaches to sharing with trusted institutions that offer more fine-grained access governance.

So it's good to see that this line of thinking is getting traction.

https://www.authorsalliance.org/2024/05/13/books-are-big-ais-achilles-heel/
