tarkowski
@tarkowski@101010.pl

digital sociologist / strategist / activist. Open Future (EU), Communia (EU), Creative Commons (World), Centrum Cyfrowe (PL)


luis_in_brief, to random

Very useful find from @tarkowski on the theorized relationship of open data and data spaces in the EU.
https://101010.pl/@tarkowski/112398533782766536

tarkowski,

@Di4na @luis_in_brief you mean about data spaces in general, or a specific one? I agree that "data spaces" still seem to lack substance right now, and to be mainly policy-based ideas being elaborated by large consortia, without clear results.
But I think this also creates an opportunity to propose a meaningful approach to data sharing that fits the "data spaces" frameworks.

luis_in_brief, to random

@ldodds @tarkowski do you have a good sense of why European Data Spaces never get mentioned in open data circles? Am I just in the wrong watering holes or what?

https://digital-strategy.ec.europa.eu/en/policies/data-spaces

#opendata

tarkowski,

@luis_in_brief @ldodds my first thought: because the idea is confusing and not yet clearly implemented? And because, for Open Data people, it means more business as usual, along the lines of Open Data strategies defined years ago?

My second thought is that this is an issue; there's a need to connect the two.

And I just found this, seems relevant: https://data.europa.eu/en/publications/datastories/when-open-data-meets-data-spaces

tarkowski,

@ldodds @luis_in_brief

Leigh, regarding your point that

"- as a primarily (AIUI) B2B data sharing solution its not relevant to the goals of many in the open data community (open science, transparency, open govt, etc)"

this actually creates an opportunity to push for #opendata approaches to commercial / business data. @timdavies reminded me a while ago that open data was initially meant to also cover business data, but over time focused on just public data.

tarkowski, to generativeAI

. @henryfarrell@mastodon.social wrote a great essay outlining a political economy of #generativeAI.

My thinking aligns with his in a lot of ways, and I especially like:

✦ how he takes the "Shoggoth" metaphor, often used to incite moral panic about AGI, and shows that corporations are the real Shoggoths that we should be worried about
✦ how he deploys the "map and territory" metaphor to describe the political stakes of genAI - the struggle is for control of the technologies that are increasingly used in attempts to substitute maps for real territories
✦ how he notes a reconfiguration of political positions of activists and organizations like Open Future Foundation - and signals the need for a new advocacy agenda based on a good understanding of emergent ways of creating synthetic knowledge and culture, and focused on supporting and protecting human knowledge.

https://www.programmablemutter.com/p/the-political-economy-of-ai

tarkowski, to ai

I enjoyed reading @mweinberg 's comments to the NTIA on AI and openness. Mike's argument is simple: in a space as complex and emergent as AI, we cannot consider free / open licenses a good proxy for openness.

Particularly valuable is Mike's analysis of how openness played out in the field of open hardware, a good analogous setting for conversations about AI.

I wrote a short note with my thoughts on this: https://openfuture.eu/note/mike-weinberg-on-open-licensing/

tarkowski, to ai

. @mweinberg recently wrote a short post titled "Licenses are Not Proxies for Openness in AI Models".

The title basically says it all, and is spot on.

https://michaelweinberg.org/blog/2024/03/26/ntia-open-ai/

Mike writes that as long as we don't have consensus on what "open" means in the #AI space, "any definition of open should require a more complex analysis than simply looking at a license".

Good point! Funnily enough, the European AI Act drafters did exactly what Mike suggests should be avoided: they defined open source AI as a pile of AI stuff under a "free and open-source license".

(I wrote about it on our blog: https://openfuture.eu/blog/ai-act-fails-to-set-meaningful-dataset-transparency-standards-for-open-source-ai/)

Mike also distinguishes between hardware and software. Hardware is more complex, and it is therefore in the hardware space that licensing will not serve as a simple proxy for openness.

I would argue that this point can also be made for software, and for other types of content. Mike is right that licenses grew to be powerful proxies of openness. But there have always been other factors - less visible, and not so easily standardized: collaborative practices, standards for platforms that support open sharing, and so on.

There seems to be a growing sense that we need to look beyond the license proxies, and identify other factors as core to open frameworks. The #AI space is one where such thinking is most visible, but I'm expecting spillover beyond the AI debates.

tarkowski, to ai

Open Future's newest white paper, authored by @zwarso and myself, addresses the governance of data sets used for AI training.

Over the past two years, it has become evident that shared datasets are necessary to create a level playing field and support AI solutions in the public interest. Without these shared datasets, companies with vast proprietary data reserves will always have the winning hand.

However, data sharing in the era of AI poses new challenges. Thus, we need to build upon established methods - refining them and integrating innovative ideas for data governance.

Our white paper proposes that data sets should be governed as commons, shared and responsibly managed collectively. We outline six principles for commons-based governance, complemented by real-life examples of these principles in action.

https://openfuture.eu/publication/commons-based-data-set-governance-for-ai/

tarkowski, to random

Recently, @creativecommons launched a survey that explores values that underpin the idea of Open Culture.

I took the survey and it's super interesting - in general, any exploration that clarifies shared visions is really important for our movement.

And Creative Commons will use the insights it gathers to build a strong, shared position on Open Culture, in order to push for policies that support the sharing of culture.

So please consider sharing, if you are in any way connected with the idea of open culture.

https://docs.google.com/forms/d/e/1FAIpQLSfRL0Y3AZYxGAZ_6-FUAs8YYgxF7-b2Yo4_ZU7uVBuBblzl8g/viewform

tarkowski, to random

Andy Baio wrote a thoughtful op-ed (or maybe an obituary?) for Ello, the alternative social network that for me seemed like a thing, for a moment, around 2015.

Apparently, it existed much longer, and Andy charts how it failed to stay true to its vision of an independent, sustainable network for creators. Andy argues this was because of VC funding.

But I read it as an overall sad story about how hard it is to build such a network. Obviously, this is written on a network that's just what Ello failed to be.

https://waxy.org/2024/01/the-quiet-death-of-ellos-big-dreams/

tarkowski, to random

In this (relatively) recent piece, Venkat Rao compares generative AI systems to the Webb telescope, and argues that AIs are not machines that produce something, but rather ones that discover things. And the thing they discover is information / intelligence that is inherent in the data.

The argument - as is often the case with Venkat's writing - gets quite complicated. But the core point is worth noting even for much less philosophical discussions about generative AI: that ultimately it's the data, and not the model, that is crucial.

In the last months, I've been spending a lot of time thinking about dataset governance and developing a commons-based framework for it. So Venkat's piece offers a useful theoretical underpinning - a story explaining why this is important.

There was a lot of progress on AI models in 2023, with dev teams playing the game of "who can count more billions of parameters?". It was also a year with few positive developments in terms of dataset development and governance.

Hopefully, in 2024 this trend will reverse.

Venkat's piece:
https://studio.ribbonfarm.com/p/a-camera-not-an-engine

tarkowski, to random

Mark Surman from @mozilla , in a recent op-ed for Fast Company, argues that the recent OpenAI debacle confirms the need for nonprofit control over key technologies. Surman points to Linux, Apache, and Mozilla as prior examples.

I really like his argument, and hope that it travels far. I will add that there's also room for public institutions to play a stronger role in these technological ecosystems.

The OpenAI case was slightly different from the examples that Mark mentions. In those cases, nonprofits established alternatives that successfully carved out niches or even took over markets. With OpenAI, there was a slim hope that a nonprofit would be the leading organization developing an emergent technology.

I guess that this hope died several weeks ago.

https://www.fastcompany.com/90992180/this-is-the-right-lesson-to-take-from-the-openai-debacle

DAIR, to random

404 Media reports that "Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material" 🧵

However, in 2021, a preprint by @abebab, Vinay Uday Prabhu & Emmanuel Kahembwe found a number of issues in the dataset, including "troublesome and explicit images and text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content."

The preprint can be found here: https://arxiv.org/abs/2110.01963

https://www.404media.co/laion-datasets-removed-stanford-csam-child-abuse/

tarkowski,

@ed @DAIR @abebab I would be surprised to learn that there is a patching culture for datasets like LAION.
The story shared by 404 shows that dataset maintenance standards are badly needed. I think a cultural change is also needed: from a culture of data dumps to one of data care.

openfuture, to ai

"At a time when many rush to deploy #AI at any cost, the level-headed and principle-based approach adopted by the #Wikimedia movement is refreshing", says @tarkowski in our latest opinion 👉 https://openfuture.eu/blog/ai-and-the-commons-the-wikimedia-movement/ CC @wikimediafoundation

tarkowski,

@ainali Jan I agree, and made this point in an earlier op-ed on our blog.

I understand that the small WMF AI team is being opportunistic and looking for quick, frugal experiments. I also feel they need to explore (potentially) dominant channels - the way they accepted that Wikipedia is intermediated through commercial search engines.

Still, I would also like to see a stronger commitment to open(ish) solutions. That's why I like the translation tool a lot more.

@openfuture @wikimediafoundation

CyberneticForests, to random

I'm an AI skeptic but I am deep in its weeds anyway. I can't tell the future, but I can tell you what I learned this year - and where I expect it to go in 2024. https://open.substack.com/pub/cyberneticforests/p/what-i-learned-about-ai-in-2023

tarkowski,

@CyberneticForests Eryk, I appreciated your personal foresight a lot, especially the way it shifts the focus of the narrative from AI development to developments around AI. The layoffs of safety teams are a good choice as a starting data point for talking about AI in 2023.

One thing I noticed is the difficulty of talking about the climate footprint. I agree that additional transparency and efforts to measure it are crucial. But the numbers themselves are abstract - I don't know how to feel about the "five cellphones"; it sounds like a lot, but also like something we do all the time. So it's an interesting question what we need to do for people to pause before they press that chatbot reply button.

tarkowski, to ai

Open source has been a key issue in the AI Act policy debates, and the Act includes provisions that regulate the development and sharing of AI models.
We've been following the proposed rules as they meandered through several different approaches. Now elements of the final version have been made public.

The big question was whether transparency and other obligations need to be mandated for open source AI, or whether they can be self-regulated – under the assumption that open source developers ensure these elements based on principles of open development.

The agreed-upon wording of the AI Act assumes the latter, and makes open source exempt from the regulation of general-purpose AI models, including transparency obligations.

We think that this is a problem, especially since the lack of agreed standards defining open source AI means that there is a risk of open-washing.

You can read more about this on our blog, where Paul Keller wrote a detailed analysis of the provisions:
https://openfuture.eu/blog/a-frankenstein-like-approach-open-source-in-the-ai-act/

tarkowski,

@miklo this depends on how the AI Act will be enforced - let's hope that the exception will be protected from such circumvention. And a clearer, more precise definition is the starting point.

tarkowski, to random

I've been reading up on recent Open Access developments, including proposals for new publishing approaches.

The discussions are fascinating, and it's increasingly clear to me that Open Access ecosystems are facing a case of what we call (at @openfuture ) the Paradox of Open: OA faces the challenge of value extraction, in this case by the largest academic publishers.

It's just as fascinating to see solutions on the table: proposals for alternative publishing models that are more sovereign, civic, community-led. Basically: digital public infrastructures.

This recent news feature from Nature is a good starting point:

https://www.nature.com/articles/d41586-023-03342-6

tarkowski,

There's one more really interesting angle, in terms of global open movement dynamics and open policies: it turns out that the most advanced work on these new approaches to community-led publishing is happening in Latin America.

@MelissaHagemann wrote a great piece on this topic for the recently finished Open Access Week, and this quote stands out for me:

"The OA movement is at a critical juncture as the troubled APC model, developed in the Global North, is being exported to Latin America and elsewhere around the world".

https://www.openaccessweek.org/blog/2023/latin-america-exemplifies-what-can-be-accomplished-when-community-is-prioritized-over-commercialization

ed, to ai

Good recognition of @osi: Matsuo Lab at the University of Tokyo corrected their press release announcing the release of a new #AI model. It initially claimed the model was #OpenSource, but was later corrected, recognizing OSI's role. Auto-translation:

"Following the definition of OSI (Open Source Initiative), some wording in the release issued on August 18, 2023 has been changed, as Weblab-10B does not fall under the definition of 'open source' because it cannot be used for commercial purposes."

https://weblab.t.u-tokyo.ac.jp/wp-content/uploads/2023/08/set%e8%a8%82%e6%ad%a3%e7%89%8820230822%e3%83%95%e3%82%9a%e3%83%ac%e3%82%b9%e3%83%aa%e3%83%aa%e3%83%bc%e3%82%b9.pdf

tarkowski,

@ed @osi this is great - if only the industry could also regulate itself better.

tarkowski, to llm

Rest of the World offers, as is often the case, a healthy antidote to some of the mainstream spins on tech development - in this case, on chatbots and #LLM.

This short interview is with Asmelash Teka Hadgu, a developer of an LLM for Ethiopian languages.

"If you ask ChatGPT in Tigrinya or Amharic the simplest and most frequently asked questions, it gives you gibberish, a mix of Tigrinya and Amharic, or even made-up words".

Sounds obvious, but we forget this as we discuss #ChatGPT, a chatbot optimised for English and a few other major languages. Here in Poland we lack a local, Polish LLM, but everyone loves talking about ChatGPT.

Here's the quote that I find most striking:

"Most of the data that powers them is basically internet data, and there is not enough data online for these languages."

Once again, the discussion about #AI needs to be one about data. And in this case there's a major digital / linguistic divide between the haves (the major languages of the countries where most LLM development is located) and the have-nots. The rest: Ethiopia, Poland, you name it.

Kudos, by the way, to organizations like Eleuther.ai that try to bridge this divide.

https://restofworld.org/2023/3-minutes-with-asmelash-teka-hadgu/

tarkowski, to ai

I participated yesterday in an expert workshop on Public-Private Partnerships in Global Data Governance, organized by the United Nations University Centre for Policy Research (UNU-CPR) and the International Chamber of Commerce (ICC).

I was also invited to prepare a policy brief presenting how the Public Data Commons model, which we have been advocating for, could be applied at the global level to deal with emergencies and the broader poly-crisis.

It is exciting to see UNU explore data sharing policies within the context of the policy debate on the UN Global Digital Compact.

Also worth noting is the recent report of the High-Level Advisory Board on Effective Multilateralism, "A Breakthrough for People and Planet". One of the transformative shifts, "the just digital transition", includes a recommendation for a global data impact hub.

In my brief, I show how this impact hub could be designed as a Public Data Commons. I also highly recommend other briefs presented at the event, by Alex Novikau, Isabel Rocha de Siqueira, Michael Stampfer and Stefaan Verhulst.

You can find the report and all the briefs on the UNU webpage: https://unu.edu/cpr/project/breakthrough-people-and-planet

tarkowski, to random

In a month (7-8 December) I will be speaking at a conference on data governance and AI, organized in Washington, DC by the Digital Trade and Data Governance Hub. I am excited about this for two reasons:

first of all, we need to connect the policy debates on data governance and AI governance. The space of AI development offers new opportunities to develop, at scale, commons-based approaches that have been much theorized and advocated for, but not yet implemented.

and secondly, I am a deep believer in dialogue between the US and the EU. The US is leading in terms of AI development itself, while the EU will most probably be the first to innovate in terms of AI regulation.

Please consider joining, either in-person or remotely (it's a hybrid event).

https://www.linkedin.com/events/datagovernanceintheageofgenerat7127306901125521408/comments/

tarkowski, to random

we've released in recent weeks a series of publications on #digitalpublicspace - the final one is a primer that covers all the basics. this is our 2nd publication of this type, following one on #datacommons. we hope it will help with designing relevant policies.
https://openfuture.eu/publication/digital-public-space-primer/

tarkowski, to random

Our October newsletter is out, with updates on our work. I'm especially proud of several publications that expand our policy ideas on Digital Public Space - check them out here: https://mailchi.mp/openfuture/digital_public_space_explained

tarkowski, to random

Adobe and the "Coalition for Content Provenance and Authenticity" release a symbol (and a standard) for marking AI generated content.

The symbol feels like a riff on the #creativecommons logo, and indeed this new symbol should be seen as similar in nature - not in legal terms, but in the sense of being a voluntary, visual standard for signalling the character of content.

Still, the symbol is really confusing - why would you mark #synthetic content with a speech cloud that says "CR"?

https://www.theverge.com/2023/10/10/23911381/adobe-ai-generated-content-symbol-watermark
