CEDO, to ai
@CEDO@mastodon.nl

"Alles online is voer voor onze " En met de diensten van dat bedrijf geven we onze kinderen les. Ouders die bezwaar maken kunnen de boom in. AutoriteitPersoonsgegevens doet niets.
https://gizmodo.com/google-says-itll-scrape-everything-you-post-online-for-1850601486

gimulnautti,
@gimulnautti@mastodon.green

@CEDO One more reason to quickly draft legislation that would mandate public scrutiny of #AI training models before they are released.

This scrutiny would have to extend to the training data, in order to identify potential #copyright infringement and #derivative works.

Research is great, but public deployment should be a different matter.

jonny, to random
@jonny@neuromatch.social

Glad to formally release my latest work - Surveillance Graphs: Vulgarity and Cloud Orthodoxy in Linked Data Infrastructures.

web: https://jon-e.net/surveillance-graphs
hcommons: https://doi.org/10.17613/syv8-cp10

A bit of an overview and then I'll get into some of the more specific arguments in a thread:

This piece is in three parts:

First I trace the mutation of the liberatory ambitions of the #SemanticWeb into #KnowledgeGraphs, an underappreciated component in the architecture of #SurveillanceCapitalism. This mutation plays out against the backdrop of the broader platform capture of the web, rendering us as consumer-users of information services rather than empowered people communicating over informational protocols.

I then show how this platform logic influences two contemporary public information infrastructure projects: the NIH's Biomedical Data Translator and the NSF's Open Knowledge Network. I argue that projects like these, while well-intentioned, demonstrate the fundamental limitations of platformatized public infrastructure and create new capacities for harm through their enmeshment in, and inevitable capture by, information conglomerates. The dream of a seamless "knowledge graph of everything" is unlikely to deliver on the utopian promises made by techno-solutionists, but such projects do create new opportunities for algorithmic oppression -- automated conversion therapy, predictive policing, abuse of bureaucracy in "smart cities," etc. Given the framing of corporate knowledge graphs, these projects are poised to create facilitating technologies (technologies the information conglomerates themselves write about needing) for a new kind of interoperable corporate data infrastructure, where a gradient of public to private information is traded between "open" and quasi-proprietary knowledge graphs to power derivative platforms and services.

When approaching "AI" from the perspective of the semantic web and knowledge graphs, it becomes apparent that the new generation of #LLMs are intended to serve as interfaces to knowledge graphs. These "augmented language models" are joint systems that combine a language model as a means of interacting with some underlying knowledge graph, integrated in multiple places in the computing ecosystem: eg. mobile apps, assistants, search, and enterprise platforms. I concretize and extend prior criticism about the capacity for LLMs to concentrate power by capturing access to information in increasingly isolated platforms and expand surveillance by creating the demand for extended personalized data graphs across multiple systems from home surveillance to your workplace, medical, and governmental data.

I pose Vulgar Linked Data as an alternative to the infrastructural pattern I call the Cloud Orthodoxy: rather than platforms operated by an informational priesthood, reorienting our public infrastructure efforts to support vernacular expression across heterogeneous #p2p mediums. This piece extends a prior work of mine, Decentralized Infrastructure for (Neuro)science, which has a more complete draft of what that might look like.

(I don't think you can pre-write threads on masto, so I'll post some thoughts as I write them under this) /1

#SurveillanceGraphs

jonny,
@jonny@neuromatch.social

The essential feature of knowledge graphs that makes them coproductive with surveillance capitalism is how they allow for a much more fluid means of data integration. Most contemporary corporations are data corporations, and their operation increasingly requires integrating far-flung and heterogeneous datasets, often stitched together from decades of acquisitions. While they are of course not universal, and there is again a large amount of variation in their deployment and use, knowledge graphs power many of the largest information conglomerates. The graph structure of KGs as well as the semantic constraints that can be imposed by controlled ontologies and schemas make them particularly well-suited to the sprawling data conglomerate that typifies contemporary surveillance capitalism.
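As a loose, invented sketch of what "semantic constraints imposed by controlled ontologies and schemas" can mean during data integration, the snippet below checks incoming triples from a newly acquired dataset against a tiny hand-written schema; none of the names come from the paper or from any real product.

```python
# Hypothetical sketch: a controlled schema constraining triples as new data
# is integrated. Schema, types, and data are invented for illustration.

# Schema: each predicate declares the expected type of its subject and object.
SCHEMA = {
    "employed_by": ("Person", "Company"),
    "subsidiary_of": ("Company", "Company"),
    "located_in": ("Company", "City"),
}

# Type assertions for entities, e.g. accumulated from previously ingested data.
TYPES = {
    "alice": "Person",
    "acme_corp": "Company",
    "globex": "Company",
    "springfield": "City",
}

def conforms(triple):
    """Check a (subject, predicate, object) triple against the schema."""
    s, p, o = triple
    if p not in SCHEMA:
        return False
    expected_s, expected_o = SCHEMA[p]
    return TYPES.get(s) == expected_s and TYPES.get(o) == expected_o

# Triples arriving from a newly acquired dataset.
incoming = [
    ("alice", "employed_by", "acme_corp"),     # conforms
    ("acme_corp", "subsidiary_of", "globex"),  # conforms
    ("alice", "located_in", "acme_corp"),      # rejected: subject must be a Company
]

accepted = [t for t in incoming if conforms(t)]
print(accepted)
```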

I give a case study in RELX, parent of Elsevier and LexisNexis, among others, which is relatively explicit about how it operates as a gigantic graph of data with various overlay platforms.

/3

#SurveillanceGraphs

In contrast, merging graphs is more straightforward - the data is just triplets, so in an idealized case it is possible to just concatenate them and remove duplicates (for a short example, see [35, 36]). The graph can be operated on locally, with more global coordination provided by ontologies and schemas, which themselves have a graph structure [37]. Discrepancies between graphlike schemas can be resolved by, you guessed it, making more graph to describe the links and transformations between them. Long-range operations between data are part of the basic structure of a graph - just traverse nodes and edges until you get to where you need to go - and the semantic structure of the graph provides additional constraints to that traversal. Again, a technical description is out of scope here, and graphs are not magic, but they are well-suited to merging, modifying, and analyzing large quantities of heterogeneous data. So if you are a data broker, and you just made a hostile acquisition of another data broker who has additional surveillance information to fill the profiles of the people in your existing dataset, you can just stitch those new properties on like a fifth arm on your nightmarish data Frankenstein.
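To make the idealized concatenate-and-deduplicate merge concrete, here is a toy Python sketch (no RDF library, all data invented): two brokers' triple sets are unioned, and a simple traversal then walks across what used to be separate datasets.

```python
# Toy illustration of the idealized merge described above: triples from two
# sources are concatenated and deduplicated, and traversal then crosses
# freely between what used to be separate datasets. All data is invented.

broker_a = {
    ("person:123", "has_address", "addr:42_elm_st"),
    ("person:123", "drives", "vehicle:abc123"),
}
broker_b = {
    ("person:123", "has_address", "addr:42_elm_st"),  # duplicate, vanishes on union
    ("vehicle:abc123", "insured_by", "insurer:acme"),
    ("person:123", "purchased", "item:glucose_monitor"),
}

# "Merging" is just set union: concatenate and remove duplicates.
merged = broker_a | broker_b

def neighbors(graph, node):
    """Outgoing edges from a node, as (predicate, object) pairs."""
    return [(p, o) for (s, p, o) in graph if s == node]

def reachable(graph, start):
    """Walk the merged graph, collecting everything linked to `start`."""
    seen, stack = set(), [start]
    while stack:
        node = stack.pop()
        for _, obj in neighbors(graph, node):
            if obj not in seen:
                seen.add(obj)
                stack.append(obj)
    return seen

# Long-range traversal now spans both original datasets.
print(reachable(merged, "person:123"))
```

Deduplication here is just set union; real merges still have to reconcile identifiers and schemas, which is where the ontology-level machinery described above comes in.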
What does this look like in practice? While in a bygone era Elsevier was merely a rentier holding publicly funded research hostage for profit, its parent company RELX is paradigmatic of the transformation of a more traditional information rentier into a sprawling, multimodal surveillance conglomerate (see [38]). RELX proudly describes itself as a gigantic haunted graph of data:

"Technology at RELX involves creating actionable insights from big data – large volumes of data in different formats being ingested at high speeds. We take this high-quality data from thousands of sources in varying formats – both structured and unstructured. We then extract the data points from the content, link the data points and enrich them to make it analysable. Finally, we apply advanced statistics and algorithms, such as machine learning and natural language processing, to provide professional customers with the actionable insights they need to do their jobs. We are continually building new products and data and technology platforms, re-using approaches and technologies across the company to create platforms that are reliable, scalable and secure. Even though we serve different segments with different content sets, the nature of the problems solved and the way we apply technology has commonalities across the company." [39]

Alt text for figure: https://jon-e.net/surveillance-graphs/#in-its-2022-annual-report-relx-describes-its-business-model-as-i
Text from: https://jon-e.net/surveillance-graphs/#derivative-platforms-beget-derivative-platforms-as-each-expands

Derivative platforms beget derivative platforms, as each expands the surface of dependence and provides new opportunities for data to capture. Its integration into clinical systems by way of reference material is growing to include electronic health record (EHR) systems, and they are “developing clinical decision support applications […] leveraging [their] proprietary health graph” [39]. Similarly, their integration into Apple’s watchOS to track medications indicates their interest in directly tracking personal medical data. That’s all within biomedical sciences, but RELX’s risk division also provides “comprehensive data, analytics, and decision tools for […] life insurance carriers” [39], so while we will never have the kind of external visibility into its infrastructure to say for certain, it’s not difficult to imagine combining its diverse biomedical knowledge graph with personal medical information in order to sell risk-assessment services to health and life insurance companies. LexisNexis has personal data enough to serve as an “integral part” of the United States Immigration and Customs Enforcement’s (ICE) arrest and deportation program [42, 43], including dragnet location data [44], driving behavior data from internet-connected cars [45], and payment and credit data as just a small sample from its large catalogue [46] [...]
