jonny (@jonny@neuromatch.social)

These knowledge-graph-powered platform giants represent the capture of information infrastructures broadly, but what would public infrastructure look like? The notion of openness is complicated when it comes to the business models of information conglomerates. In the adjacent domains of open source, peer production, and open standards, "openness" is used both to challenge and to reinforce systems of informational dominance.

In particular, Google's acquisition of the peer-production platform Freebase was the precipitating event that ushered in the era of knowledge graphs in the first place, and its tight relationship with Freebase's successor, Wikidata, is instructive about the role of openness: public information is crowdsourced to farm the commons and then repackaged in derivative platforms.

Information conglomerates have, in multiple places, expressed a desire for "neutral" exchange schemas and technologies that let them rent, trade, and otherwise link their proprietary schemas, forming a gradient from "factual" public information, through contextual information like how a particular company operates, to personal information often obtained through surveillance. It looks like the NIH and the NSF are set to serve that role for several domains...

/4

#SurveillanceGraphs

text from https://jon-e.net/surveillance-graphs/#%E2%80%9Cpeer-production%E2%80%9D-models-a-more-generic-term-for-public-collabor :

“Peer production” models, a more generic term for public collaboration that includes FOSS, have similar discontents. The related term “crowdsource” [footnote 13] quite literally describes a patronizing means of harvesting free labor via some typically gamified platform. Wikipedia is perhaps the most well-known example of peer production [footnote 14], and it too struggles with its position as a resource to be harvested by information conglomerates. In 2015, the increasing prevalence of Google’s information boxes caused a substantial decline in Wikipedia page views [68, 69] as its information was harvested into Google’s knowledge graph, and a “will she, won’t she” search engine arguably intended to avoid dependence on Google was at the heart of its 2014-2016 leadership crisis [70, 71]. While shuttering Freebase, Google donated a substantial amount of money to kick-start its successor, Wikidata [72], presumably as a means of crowdsourcing the curation of its knowledge graph [73, 74, 75].

[footnote 13]: For critical work on crowdsourcing in the context of “open science,” see [229]; in the semantic web, see [230].
[footnote 14]: I have written previously about the peculiar structure of Wikipedia among wikis, section 3.4.1 - “The Wiki Way” [1].
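To see how frictionless that harvesting is, here is a minimal sketch of pulling volunteer-curated facts out of Wikidata's public SPARQL endpoint. The endpoint URL and the identifiers P31 ("instance of") and Q16917 ("hospital") are real Wikidata identifiers; the query and script are otherwise illustrative, not any conglomerate's actual pipeline.

```python
# Minimal sketch: bulk-reading volunteer-curated facts from Wikidata's
# public SPARQL endpoint. P31 = "instance of", Q16917 = "hospital" are
# real Wikidata identifiers; everything else is illustrative.
import requests

ENDPOINT = "https://query.wikidata.org/sparql"

# Hospitals and their locations: exactly the kind of "general knowledge"
# a proprietary graph would rather not curate itself.
QUERY = """
SELECT ?hospital ?hospitalLabel ?placeLabel WHERE {
  ?hospital wdt:P31 wd:Q16917 .            # instance of: hospital
  OPTIONAL { ?hospital wdt:P131 ?place . } # located in
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
LIMIT 10
"""

resp = requests.get(
    ENDPOINT,
    params={"query": QUERY, "format": "json"},
    headers={"User-Agent": "kg-harvest-example/0.1 (demo)"},
    timeout=60,
)
resp.raise_for_status()

for row in resp.json()["results"]["bindings"]:
    label = row["hospitalLabel"]["value"]
    place = row.get("placeLabel", {}).get("value", "?")
    print(f"{label} -- {place}")
```

Nothing stops this script from being run at scale: the commons is open by design, and the harvest accrues to whoever has the infrastructure to absorb it.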
Clearly, on its own, mere “openness” is no guarantee of virtue, and socio-technological systems must always be evaluated in their broader context: what is open? why? who benefits? Open source, open standards, and peer production models do not inherently challenge the rent-seeking behavior of information conglomerates, but can instead facilitate it. In particular, the maintainers of corporate knowledge graphs want to reduce labor duplication by making use of some public knowledge graph that they can then “add value” to with shades of proprietary and personal data (emphasis mine):

“In a case like IBM clients, who build their own custom knowledge graphs, the clients are not expected to tell the graph about basic knowledge. For example, a cancer researcher is not going to teach the knowledge graph that skin is a form of tissue, or that St. Jude is a hospital in Memphis, Tennessee. This is known as “general knowledge,” captured in a general knowledge graph. The next level of information is knowledge that is well known to anybody in the domain—for example, carcinoma is a form of cancer, or NHL more often stands for non-Hodgkin lymphoma than National Hockey League (in some contexts it may still mean that—say, in the patient record of an NHL player). The client should need to input only the private and confidential knowledge or any knowledge that the system does not yet know.” [26]
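The tiering that quote describes is easy to render in miniature. What follows is a toy sketch in plain Python, using the quote's own examples as facts; the predicate names and helper function are invented for illustration and are not IBM's actual system. The point is structural: the "custom" graph is mostly shared substrate, with the client contributing only a few private triples on top.

```python
# Toy sketch of the tiered knowledge graph described in the IBM quote.
# Facts, predicates, and helpers are illustrative only.

# Tier 1: general knowledge -- nobody should have to re-teach this.
GENERAL = {
    ("skin", "is_a", "tissue"),
    ("St. Jude", "is_a", "hospital"),
    ("St. Jude", "located_in", "Memphis, Tennessee"),
}

# Tier 2: domain knowledge -- obvious to anyone in the field.
DOMAIN = {
    ("carcinoma", "is_a", "cancer"),
    ("NHL", "usually_means", "non-Hodgkin lymphoma"),
}

# Tier 3: private knowledge -- the only thing the client inputs.
private = {
    ("patient-0042", "diagnosed_with", "carcinoma"),
    ("patient-0042", "treated_at", "St. Jude"),
}

# The "custom" graph is just the union of the tiers: a handful of
# proprietary triples riding on a large shared substrate.
graph = GENERAL | DOMAIN | private

def facts_about(entity):
    """All triples mentioning an entity, drawn from any tier."""
    return [t for t in graph if entity in (t[0], t[2])]

for triple in facts_about("St. Jude"):
    print(triple)
# -> ('St. Jude', 'is_a', 'hospital')
#    ('St. Jude', 'located_in', 'Memphis, Tennessee')
#    ('patient-0042', 'treated_at', 'St. Jude')
```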
Having such standards be under the stewardship of ostensibly neutral and open third parties provides cover for powerful actors exerting their influence and helps overcome the initial energy barrier to realizing network effects from their broad use [83, 84]. Peter Mika, the director of Semantic Search at Yahoo Labs, describes this need for third-party intervention in domain-specific standards:

“A natural next step for Knowledge Graphs is to extend beyond the boundaries of organisations, connecting data assets of companies along business value chains. This process is still at an early stage, and there is a need for trade associations or industry-specific standards organisations to step in, especially when it comes to developing shared entity identifier schemes.” [85]

As with search, we should be particularly wary of information infrastructures that are technically open [footnote 17] but embed design logics that preserve the hegemony of the organizations that have the resources to make use of them. The existing organization of industrial knowledge graphs as chimeric “data + compute” models gives a hint at what we might look for in public knowledge graphs: the data is open, but to make use of it we have to rely on some proprietary algorithm or cloud infrastructure.

[footnote 17]: Go ahead, try and make your own web crawler to compete with Google - all the information is just out there in public on the open web!
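A "shared entity identifier scheme" is, concretely, just a neutral join key. The sketch below uses Wikidata QIDs as the stand-in neutral scheme (Q95 for Google and Q312 for Apple Inc. are real QIDs; both firms' records here are invented) to show how otherwise incompatible proprietary datasets become linkable, and tradeable, the moment both parties key on the same third-party identifiers.

```python
# Sketch of why "neutral" shared identifier schemes matter to firms that
# want to rent and trade proprietary data. Wikidata QIDs stand in for
# the neutral scheme (Q95 = Google, Q312 = Apple Inc. are real QIDs);
# the records themselves are invented.

# Firm A's proprietary CRM, keyed by its own internal account IDs...
firm_a = {
    "acct-17": {"wikidata": "Q95",  "ad_spend_usd": 1_250_000},
    "acct-18": {"wikidata": "Q312", "ad_spend_usd": 430_000},
}

# ...and firm B's, with incompatible internal keys.
firm_b = {
    "cust-0009": {"wikidata": "Q312", "churn_risk": 0.12},
    "cust-0010": {"wikidata": "Q95",  "churn_risk": 0.03},
}

def join_on_qid(a, b):
    """Merge two proprietary datasets on the neutral identifier.

    No schema negotiation between the firms is needed; the third-party
    ID alone makes the combined record possible.
    """
    by_qid = {rec["wikidata"]: dict(rec) for rec in a.values()}
    for rec in b.values():
        by_qid.setdefault(rec["wikidata"], {}).update(rec)
    return by_qid

for qid, merged in join_on_qid(firm_a, firm_b).items():
    print(qid, merged)
# -> Q95  {'wikidata': 'Q95', 'ad_spend_usd': 1250000, 'churn_risk': 0.03}
#    Q312 {'wikidata': 'Q312', 'ad_spend_usd': 430000, 'churn_risk': 0.12}
```

The "neutrality" of the key does none of the linking work itself; it simply lowers the cost of combination for whoever already holds the proprietary data on either side of the join.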
