luis_in_brief, to random
@luis_in_brief@social.coop avatar

@ldodds @tarkowski do you have a good sense of why European Data Spaces never get mentioned in open data circles? Am I just in the wrong watering holes or what?

https://digital-strategy.ec.europa.eu/en/policies/data-spaces

tarkowski,
@tarkowski@101010.pl avatar

@luis_in_brief @ldodds my first thought: because the idea is confusing and not yet clearly implemented? And for Open Data people it means more business as usual, along the lines of Open Data strategies defined years ago?

My second thought is that it's an issue, there's a need to connect the two.

And I just found this, seems relevant: https://data.europa.eu/en/publications/datastories/when-open-data-meets-data-spaces

tarkowski,
@tarkowski@101010.pl avatar

@ldodds @luis_in_brief

Leigh, regarding your point that

"- as a primarily (AIUI) B2B data sharing solution its not relevant to the goals of many in the open data community (open science, transparency, open govt, etc)"

This actually creates an opportunity to push for approaches to commercial / business data. @timdavies reminded me a while ago that open data was initially meant to also cover business data, but over time focused on just public data.

luis_in_brief, to random
@luis_in_brief@social.coop avatar

Very useful find from @tarkowski on the theorized relationship between open data and data spaces in the EU.
https://101010.pl/@tarkowski/112398533782766536

tarkowski,
@tarkowski@101010.pl avatar

@Di4na @luis_in_brief you mean about data spaces in general, or a specific one? I agree that "data spaces" still feel like they lack substance right now, and are mainly policy-based ideas being elaborated by large consortia, without clear results.
But I think this also creates an opportunity to propose a meaningful approach to data sharing that fits the "data spaces" frameworks.

DAIR, to random
@DAIR@dair-community.social avatar

404 Media reports that "Largest Dataset Powering AI Images Removed After Discovery of Child Sexual Abuse Material" 🧵

However, in 2021, a preprint by @abebab, Vinay Uday Prabhu & Emmanuel Kahembwe found a number of issues in the dataset, including "troublesome and explicit images and text pairs of rape, pornography, malign stereotypes, racist and ethnic slurs, and other extremely problematic content."

The preprint can be found here: https://arxiv.org/abs/2110.01963

https://www.404media.co/laion-datasets-removed-stanford-csam-child-abuse/

tarkowski,
@tarkowski@101010.pl avatar

@ed @DAIR @abebab I would be surprised to learn that there is a patching culture for datasets like LAION.
The story shared by 404 shows that dataset maintenance standards are badly needed. I think it’s also a cultural change that’s needed: from a culture of data dumps to one of data care.

openfuture, to ai
@openfuture@eupolicy.social avatar

“At a time when many rush to deploy at any cost, the level-headed and principle-based approach adopted by the Wikimedia movement is refreshing”, says @tarkowski in our latest opinion 👉 https://openfuture.eu/blog/ai-and-the-commons-the-wikimedia-movement/ CC @wikimediafoundation

tarkowski,
@tarkowski@101010.pl avatar

@ainali Jan, I agree, and made this point in an earlier op-ed on our blog.

I understand that the small WMF AI team is being opportunistic and looking for quick, frugal experiments. I also feel they need to explore (potential) dominant channels - the way they accepted that Wikipedia is intermediated through commercial search engines.

Still, I would also like to see a stronger commitment to open(ish) solutions. That's why I like the translation tool a lot more.

@openfuture @wikimediafoundation

CyberneticForests, to random
@CyberneticForests@assemblag.es avatar

I'm an AI skeptic but I am deep in its weeds anyway. I can't tell the future, but I can tell you what I learned this year - and where I expect it to go in 2024. https://open.substack.com/pub/cyberneticforests/p/what-i-learned-about-ai-in-2023

tarkowski,
@tarkowski@101010.pl avatar

@CyberneticForests Eryk, I really appreciated your personal foresight, especially the way it shifts the focus of the narrative from AI development to developments around AI. The layoffs of safety teams are a good choice as a starting data point for talking about AI AD 2023.

One thing that I noticed is the difficulty of talking about the climate footprint. I agree that additional transparency and efforts to measure it are crucial. But the numbers themselves are abstract: I don’t know how to feel about the “five cellphones”, which sounds like a bit but also something we do all the time. So it’s an interesting question what we need to do for people to pause before they press that chatbot reply button.

tarkowski, to ai
@tarkowski@101010.pl avatar

Open source has been a key issue in AI Act policy debates, and the Act includes provisions that regulate the development and sharing of AI.
We've been following the proposed rules as they meandered through several different approaches. Now elements of the final version have been made public.

The big question was whether transparency and other obligations need to be mandated for open source AI, or whether they can be self-regulated – under the assumption that open source developers ensure these elements based on principles of open development.

The agreed-upon wording of the AI Act assumes the latter, and makes open source exempt from regulation of general purpose AI models, including transparency obligations.

We think that this is a problem, especially since the lack of agreed standards that define open source AI means that there is a risk of open washing.

You can read more about this on our blog: Paul Keller wrote a detailed analysis of the provisions:
https://openfuture.eu/blog/a-frankenstein-like-approach-open-source-in-the-ai-act/

tarkowski,
@tarkowski@101010.pl avatar

@miklo this depends on how the AI Act will be enforced; let’s hope that the exception will be protected from such circumvention. And a clearer, more precise definition is the starting point.

tarkowski, to random
@tarkowski@101010.pl avatar

I've been reading up on recent developments, including proposals for new approaches like .

The discussions are fascinating, and it's increasingly clear to me that Open Access ecosystems are facing a case of what we call (at @openfuture ) the : OA faces the challenge of value extraction, in this case by the largest academic publishers.

It's just as fascinating to see solutions on the table: proposals for alternative publishing models that are more sovereign, civic, community-led. Basically: digital public infrastructures.

This recent news feature from Nature is a good starting point:

https://www.nature.com/articles/d41586-023-03342-6

tarkowski,
@tarkowski@101010.pl avatar

There's one more really interesting angle, in terms of global open movement dynamics and open policies: it turns out that the most advanced work on these new approaches to community-led publishing is happening in Latin America.

@MelissaHagemann wrote a great piece on this topic for the recently finished Open Access Week, and this quote stands out for me:

"The OA movement is at a critical juncture as the troubled APC model, developed in the Global North, is being exported to Latin America and elsewhere around the world".

https://www.openaccessweek.org/blog/2023/latin-america-exemplifies-what-can-be-accomplished-when-community-is-prioritized-over-commercialization

ed, to ai
@ed@opensource.org avatar

Good recognition of @osi: Matsuo Lab at the University of Tokyo corrected their press release announcing the release of a new model. They initially claimed it was open source, but later corrected this, recognizing OSI's role. Auto-translation:

Following the definition of OSI (Open Source Initiative), some wording in the release issued on August 18, 2023 has been changed as Weblab-10B does not fall under the definition of “open source” as it cannot be used for commercial purposes  

https://weblab.t.u-tokyo.ac.jp/wp-content/uploads/2023/08/set%e8%a8%82%e6%ad%a3%e7%89%8820230822%e3%83%95%e3%82%9a%e3%83%ac%e3%82%b9%e3%83%aa%e3%83%aa%e3%83%bc%e3%82%b9.pdf

tarkowski,
@tarkowski@101010.pl avatar

@ed @osi this is great, if only the industry could also self-regulate better.

OpenAccessElder, to random

I've been working with some peers on a (finally) soon-to-be-released call for book chapters on connections between and differences among open movements, and today I was ecstatic to find that @openfuture has been working in a similar vein! Kudos to you all! I will make sure that the call goes out to your team once it's up! I'm so glad to learn that there are other people exploring this topic in earnest!

tarkowski,
@tarkowski@101010.pl avatar

Hi @OpenAccessElder ! Thanks for the kind words. We're likewise excited to learn that someone is looking at how the various open fields / movements align (or not). Please let us know once you launch your CfP, and also if you have any thoughts about our mapping.
@openfuture

openfuture, to ai
@openfuture@eupolicy.social avatar

Can licensing mitigate the negative implications of web scraping in #AI training?

We're co-organizing a #CSCW virtual workshop to answer this question.

CfP is open until this Thursday:
https://www.licenses.ai/data-licensing-workshop

#CSCW2023

tarkowski,
@tarkowski@101010.pl avatar

@luis_in_brief you should definitely do that! Also, it would be fun if someone submitted a talk on licensing of AI outputs, as envisioned by OpenAI, for instance.

@openfuture

tarkowski, to random
@tarkowski@101010.pl avatar

A new French national strategy for AI development bets on #opensourceAI and “national champions” like recently founded Mistral.AI - good coverage in Politico.

https://www.politico.eu/article/open-source-artificial-intelligence-france-bets-big/

The piece also highlights the ongoing #AIAct policy debate on regulating foundation models, including ones that are open-source (something that we have been working on at @openfuture ). Depending on how the regulation ultimately looks, it will either create a supportive policy environment - so that open-source becomes Europe’s preferred approach to AI development - or it will stifle open-source development.

tarkowski,
@tarkowski@101010.pl avatar

Background reading: this op-ed from June, arguing for a “national sovereign #AI program" that includes 1) open training data, 2) explainable algorithms and 3) permissive licensing of models.

It’s interesting to see France, together with French companies that aim to become “national AI champions”, bet on #opensourceAI - at a time when the US AI Big Tech companies are to various degrees opposed / reluctant to share LLMs (with the exception of Meta, although its take on open sourcing models is controversial).

https://www.lepoint.fr/debats/l-open-source-chance-unique-de-creer-une-ia-de-confiance-europeenne-14-06-2023-2524434_2.php#11

tarkowski, to llm
@tarkowski@101010.pl avatar

I'm reading the white paper for the open(ish) Llama 2 and it strikes me how English-centric the development work has been.

Meta basically decided that building a model for just one of the world's languages is good enough.

(And admittedly, having read a fair share of the bias literature, I don't see linguistic bias raised very often, if at all - happy to learn that I'm wrong).

Coming back to Llama 2: it's trained on 90% English-language data, and the other 10% is mainly code. All other major languages together constitute around 1-2% of the data.

And here's a kicker: when the model was "red teamed" - tested for vulnerabilities, the testers would use prompts in different languages. Because these are typical "attack vectors".

So here we are. A major new model is shared with the world. Usable in one language, and with no roadmap to expand linguistic scope. With other languages seen mainly as ways to mess with the model.

https://ai.meta.com/llama/

tarkowski,
@tarkowski@101010.pl avatar

Thanks for this info, @ggdupont - it’s good to know. I remember that BLOOM is to some extent multilingual and I found that great. I get the point about advantages in working in a single language, we all know that English is ubiquitous, etc. Still, in the name of diversity and global reach it would be good to see a major LLM project at least commit to a roadmap that includes other languages.
And I imagine that for many languages the issue of native speakers could be easily solved.
(The Llama white paper declares that it's hard to obtain content for other languages - that sounds bogus to me.)

@asbjorn

kissane, to random
@kissane@mas.to avatar

I’ve been doing online product work forever and my brain is just weird, so to me user feedback is just a tool. You extract what’s useful, offer thanks, move on. 🖖

I’m trying to empathize more with the (many, many) people who see it as a one-sided mean attack on something they like and therefore an attack on them.

A deeply human response, but it has to be exhausting. I’d like to get better at framing critique in ways that de-escalate those feelings without being patronizing or weird.

tarkowski,
@tarkowski@101010.pl avatar

@slothrop @kissane
I also think that you hit the right tone with your piece, Erin.

And thinking about the responses, I am worried not just by the angry ones (and this is something that can be addressed only to some extent with proper tone of voice of the critique - at some point it becomes a matter of the responsibility of those responding to a good dialog).

What worries me more is that there seem to be few spaces where this kind of research could serve as input to design / governance discussions.

tarkowski, to ai
@tarkowski@101010.pl avatar

We like to throw around adoption data as proof of a given technology's / trend's significance.

I read recently that, according to US polling, 20% of people in the US tried out ChatGPT while only 9% tried crypto and 2% owned an NFT.

This data is rubbish to be honest. Trying out ChatGPT requires setting up an account - you can't really compare that with the complex steps needed to obtain crypto / NFTs.

But ... I've fallen into this trap too, using ChatGPT data to argue for the importance of anticipatory governance work for #AI .

So I enjoyed being served the right perspective by @itforchange, who - in the latest issue of their brilliant DataSyn newsletter - remind everyone that Threads had an even steeper adoption curve.

Obviously, because it tapped into an app-enabled network, and a simple interface did all the onboarding work.

(By the way, I highly recommend the DataSyn newsletter: https://botpopuli.net)

tarkowski,
@tarkowski@101010.pl avatar

@JorgeStolfi @itforchange Usefulness for people is a great metric, it's a pity that no one is using it.
