jonny, (edited )
@jonny@neuromatch.social avatar

so we have been batting around the idea of some kinda paper bot for awhile re: the question "how do we track discussions around scholarly work" and I am starting to think this paper-feeds project is the way to do it.

So say it is an AP instance and it has one primary bot user: you follow it and it follows you back. When you make a post with something that resolves to a DOI, that post is linked to that work. Any hashtags used in the post are added to that paper's keywords (assuming some basic moderation and word ban lists). Then keyword feeds are also represented as AP actors that can be followed and make a post per paper. I wonder if we can spoof the "in reply to" field to present all those posts as being replies to that paper.
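A minimal sketch of those two steps, assuming a Crossref-style DOI regex and a hand-rolled AS2 Note; the `paperfeeds.example` URL scheme and both helper names are hypothetical, not anything the project actually implements:

```python
import re

# Crossref's recommended pattern for modern DOIs (catches the vast majority;
# trailing punctuation in free text may need extra trimming)
DOI_PATTERN = re.compile(r"\b(10\.\d{4,9}/[-._;()/:A-Za-z0-9]+)\b")

def extract_dois(text: str) -> list[str]:
    """Pull anything DOI-shaped out of a post's text or links."""
    return DOI_PATTERN.findall(text)

def as_reply_note(post_html: str, doi: str, actor: str) -> dict:
    """Build an AS2 Note whose inReplyTo points at the paper's own post,
    so clients thread the discussion under the paper."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "type": "Note",
        "attributedTo": actor,
        "content": post_html,
        # the "spoofed" threading: pretend this post replies to the paper
        "inReplyTo": f"https://paperfeeds.example/papers/{doi}",
    }

post = "really enjoyed https://doi.org/10.1371/journal.pbio.3001414 #openscience"
dois = extract_dois(post)
note = as_reply_note("<p>really enjoyed this</p>", dois[0],
                     "https://paperfeeds.example/actor/bot")
```

Whether mainstream AP clients honor an `inReplyTo` pointing at an object they never saw delivered is an open question; that is the part that would need testing against real servers.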

So say the bot also has some simple microsyntax for linking your account to an ORCID - either directly in a profile field, or by @'ing the bot and checking a rel=me, or hell, even OAuth. Then you could also note when the authors of given works talk about other works and use that as another proximity measure. Then you could make an author RSS feed/AP actor that is just the works someone publishes, and optionally the works they mention - so e.g. I could make an aggregate feed for the papers my friends are reading.
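Two small, testable pieces of that linking flow, as a sketch: validating an ORCID iD's check digit (ISO 7064 mod 11-2, per ORCID's own documentation) so the bot can sanity-check a profile field before trusting it, and scraping `rel="me"` links out of a profile page. The function names are made up:

```python
import re
from html.parser import HTMLParser

def orcid_checksum_ok(orcid: str) -> bool:
    """Validate the final check digit of an ORCID iD (ISO 7064 mod 11-2)."""
    digits = orcid.replace("-", "")
    if not re.fullmatch(r"\d{15}[\dX]", digits):
        return False
    total = 0
    for ch in digits[:-1]:
        total = (total + int(ch)) * 2
    check = (12 - total % 11) % 11
    expected = "X" if check == 10 else str(check)
    return digits[-1] == expected

class RelMeParser(HTMLParser):
    """Collect hrefs of <a>/<link> elements carrying rel="me"."""
    def __init__(self):
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag in ("a", "link") and "me" in (a.get("rel") or "").split():
            if a.get("href"):
                self.links.append(a["href"])

def rel_me_links(html: str) -> list[str]:
    parser = RelMeParser()
    parser.feed(html)
    return parser.links
```

A real rel=me verification would of course fetch both pages and check the links point at each other, the same way Mastodon does profile-link verification.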

Then you could have instances of this feed generator follow one another and broadcast aggregated similarity information at a paper level, not linked to personal information, along with opt-in info like the fedi account <-> ORCID link. Since you're on AP already, you basically get that for free.
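The paper-level aggregation could be as simple as co-mention counts computed per instance and shared with no account identifiers attached; the data shapes here are assumptions, just to show the shape of the idea:

```python
from collections import Counter
from itertools import combinations

def comention_counts(dois_by_account: dict[str, set[str]]) -> Counter:
    """Count how often pairs of DOIs are discussed by the same account.
    Only the (doi_a, doi_b) -> count pairs would leave the instance;
    account names never do."""
    pairs = Counter()
    for dois in dois_by_account.values():
        for a, b in combinations(sorted(dois), 2):
            pairs[(a, b)] += 1
    return pairs

local = {
    "@alice": {"10.1/aaa", "10.1/bbb"},
    "@bob": {"10.1/aaa", "10.1/bbb", "10.1/ccc"},
}
shared = comention_counts(local)
```

Note that even aggregate counts can deanonymize on small instances (a count of 1 is one person), so a minimum-count threshold before broadcasting would probably be wise.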

Thinking about what would be useful for social discovery of scholarly works, there are a lot of really interesting ideas once you start actually, y'know, doing it. Starting from a place of not having a product to sell or a platform to run means you avoid some of the scale and liability problems.

Edit: prior post here: https://neuromatch.social/@jonny/111688727690129033
And repo here: https://github.com/sneakers-the-rat/paper-feeds/
And I'll start tagging these, but that last post has too many interactions to edit now

hochstenbach,

@jonny I am working on a Mellon-funded project on specs that do things like that. See https://www.eventnotifications.net/ The bot we created is called 'Koreografeye' and is explained in code4lib using a use-case related to yours: https://journal.code4lib.org/articles/17823

hochstenbach,

@jonny The bot is spec'd here https://mellonscholarlycommunication.github.io/spec-orchestrator/ - the whole architecture is fully decentralized. The idea is that every researcher has their own data pod with an LDN inbox speaking AS2.
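For flavour, an LDN notification in that world is just an AS2 activity POSTed to the pod's inbox as JSON-LD. The following is a hand-written approximation of that shape, not copied from the Event Notifications spec, and all the URLs are invented:

```python
import json
import uuid

def announce_notification(actor_id: str, object_id: str, target_id: str) -> dict:
    """Sketch of an AS2 Announce, roughly the shape an LDN inbox expects."""
    return {
        "@context": "https://www.w3.org/ns/activitystreams",
        "id": f"urn:uuid:{uuid.uuid4()}",
        "type": "Announce",
        "actor": {"id": actor_id, "type": "Service"},
        "object": {"id": object_id, "type": "Document"},
        "target": {"id": target_id, "type": "Person"},
    }

msg = announce_notification(
    "https://bot.example/profile/card#me",
    "https://journal.example/article/123",
    "https://pod.example/profile/card#me",
)
# POSTed to the receiver's LDN inbox with Content-Type: application/ld+json
payload = json.dumps(msg)
```

The actual spec defines which activity types and required fields apply to which scholarly-communication use cases, so treat this as shape only.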

jonny,
@jonny@neuromatch.social avatar

@hochstenbach very interesting, lemme take a read

jonny,
@jonny@neuromatch.social avatar

@hochstenbach ah yes, an ever-elusive SOLID project - always good to find one. I am gonna try to boot up a node of this demo project now. The question I always have with SOLID-like things is whether everyone is supposed to have their own domain and run their own pod, and it seems like in this code4lib paper you are answering that with a 'yes.' The interop with other systems via action vocabularies is very much like something i have submitted grants on before and intend to build, so v interesting to see your take on it. lemme see if i can get this thing to run

hochstenbach,

@jonny Well, in the paper it is 'yes' because in PhD research one tries to go to the limit ;) "what is the most decentral architecture one can think of?" In reality, we see in the project a huge opportunity for universities to host these pods for researchers. The domain name is there so that if you want to switch institutions, you can move the data to a new pod but keep the URLs working. In the paper I used SOLID because it provides out-of-the-box LDN support.

jonny,
@jonny@neuromatch.social avatar

@hochstenbach yes yes - i have ultimately come to the conclusion that the LDP needs to be p2p in order for it to do the things it wants to do. Fluid ontologies indexed by DNS have basically never worked, so most of the RDF world just treats them like non-dereferencing IRIs, which is sad - it's just intrinsically fragile, and the only vocabularies you can rely on still being there are the ones that w3c hosts, because they're the only ones that really care about URLs staying the same forever.

I really like the design of what you're working on here - just operating on files is great, the rules syntax took a bit to read but makes sense and seems amenable to interface design, and i especially like the plugin approach to 'just pull and push from anywhere'. The problem i have with thinking about the longevity or deployability of things like this is not really intrinsic to your project at all, but about the (imo naive) assumptions that LD makes about DNS: it is genuinely expensive and complicated to put something on the 'net for your average bear (timbl said as much). All the (necessary) placeholder example.com's in the demos are a reflection of that: since of course the rule isn't actually at example.com, presumably it isn't actually dereferencing there, and so it becomes just an IRI slug that is simultaneously necessarily bound to a URL but can't use it.

my longest lasting question in studying LD is "where is Solid?" I have tried and failed dozens of times to just run something from the project, and have never heard of someone actually using it day-to-day. millions of people run bittorrent clients though, so it's not just an intrinsic "people don't want to run software" problem. The barrier to 'how do i actually put my stuff online' has to be a lot lower than 'rent a domain, manage a bunch of paths, and run an always-on server forever'.

The federated approach like the fedi, and e.g. institutions hosting pods, is promising for many things, but it is sort of a nonstarter for anything with arbitrary clearweb user-generated content, for liability and security reasons. So I think that would be super dope for things like notifications for scholarly work, but I think institutions will balk at an eventing framework that requires arbitrary code to run on an institutionally managed server, and especially one that can result in arbitrary content being available on their domain.

I think we should take advantage of existing infrastructure though - e.g. i like how you're using npm to host and version vocabularies, and that federated infrastructure could (and imo should) serve some backstop role of preserving availability and providing bootstrap entrypoints for a p2p swarm. I think that has to look like using different protocols than HTTP though, and following along that line you pretty rapidly get to needing social infrastructure at the base in order to have comprehensible namespacing (rather than a bunch of long hashes with some naming system patched over the top, which, as IPNS demonstrates, doesn't really work that well). I think the way you're going towards integration with email and masto and whatnot from a local client is a nice set of steps towards personal web tooling, and i'm gonna keep this bookmarked for when i get closer to working on something related :)

happyborg,
@happyborg@fosstodon.org avatar

@jonny
FYI I worked, with the help of the #Solid community including Timbl, to demonstrate that the Solid protocol, or at least a useful subset of it, could be implemented and used on a #p2p data and comms network.

I believe centralised DNS and server-based hosting (self-hosting included) are not sufficient to meet the goals of Solid, which include decentralization and self ownership of data.

I was able to run existing Solid apps on the p2p #SafeNetwork.

@hochstenbach

happyborg,
@happyborg@fosstodon.org avatar

@jonny @hochstenbach

I have a video of a presentation from 2018, slides etc. of that demo, if either of you are interested.

happyborg,
@happyborg@fosstodon.org avatar

@hochstenbach
All the info and links are collected in a post on the #SafeNetwork forum, including the presentation video, slides etc: https://safenetforum.org/t/devcon-talk-supercharging-the-safe-network-with-project-solid/23081?u=happybeing

The demos no longer work on Safe Network as the APIs have changed but the key elements demonstrated were:

  • hosting #Solid apps on a decentralised #p2p network with just one library swapped
  • using p2p storage via the LDP API in standard Solid apps
  • no dependence on ephemeral centralised DNS, web servers, or server-side code

@jonny #LinkedData
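For reference, "storage via the LDP API" boils down to plain HTTP against a container: per the W3C LDP spec, you create a resource by POSTing to the container with a `Link` header typing the new resource. A minimal sketch (the container URL and slug are made-up illustrations):

```python
# Headers for an LDP "create resource in container" POST, per the W3C LDP spec.
def ldp_create_headers(slug: str) -> dict:
    return {
        "Content-Type": "text/turtle",
        # the server MAY use Slug as a naming hint for the new resource
        "Slug": slug,
        # type the request as creating an LDP Resource
        "Link": '<http://www.w3.org/ns/ldp#Resource>; rel="type"',
    }

headers = ldp_create_headers("reading-notes")
# POST a Turtle body to e.g. https://pod.example/notes/ with these headers;
# a conforming server replies 201 Created with a Location header for the
# newly minted resource.
```

The demo's point was that the same client-side API calls can be backed by p2p storage instead of an HTTP server, so long as the LDP semantics are preserved.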
jonny,
@jonny@neuromatch.social avatar

@happyborg @hochstenbach HELL yes. this is what i have been looking for for literal years.

jonny,
@jonny@neuromatch.social avatar

@happyborg @hochstenbach is it ok if i link to this in docs?

happyborg,
@happyborg@fosstodon.org avatar

@jonny sure and I'd be interested to see where it fits in your docs etc.

@hochstenbach

jonny,
@jonny@neuromatch.social avatar

@happyborg it would be on this currently dramatically blank page as a section with references to any "citable" work and a footnote link to this post to thank you for sharing it with me. plz don't mind the domain:
https://piracy.solutions/docs/comparison/ld/solid.html

which has yet to be filled in, since this is basically a long prelude set of notes in preparation for some longer work, but also because i have yet to find a satisfying entrypoint, or even a way of getting an overview of where Solid is at the moment (likely mostly my lack of time and the fact that there is a lot of ground to cover)

happyborg,
@happyborg@fosstodon.org avatar

@jonny I have no problem with the domain. Good luck with the mammoth (!) task and let's keep in touch.

I got deep into Solid but it's been a while, so I'm not bang up to date. There was a good forum and chat with a v helpful community. Same for Safe Network.

Safe Network is finally coming to fruition but has been trimmed down to get there, so it's not quite ready for Solid yet, but it's still a big step towards data ownership, security and privacy for everyone: without gatekeepers, servers, DNS etc.

jonny,
@jonny@neuromatch.social avatar

@happyborg @hochstenbach sorry for triple-posting - it is late so i am sort of subdued and trying to log off, but i cannot tell you how excited i am to check out your work; i have been wondering 'where is solid' and 'why isn't solid p2p' since the moment i read about it.

happyborg,
@happyborg@fosstodon.org avatar

@jonny no worries, you've no idea how good it is to hear someone else getting excited about this, especially after so long.

@hochstenbach

hochstenbach,

@jonny The Event Notifications protocol doesn't demand that one implements it using Solid pods. It could probably be done with a Mastodon instance too, with some tweaks.

jonny,
@jonny@neuromatch.social avatar

@hochstenbach we definitely are interested in hacking our instance to be able to be a more general LDN/AP source. if you want to try and make masto into an Event Notifications emitter, hmu, we would be very happy to try that out on our instance <3

julian,
@julian@fietkau.social avatar

@jonny Potentially pertinent re ORCID: it's possible to add various profile links to one's ORCID record, but they're a bit difficult to access programmatically.

It seems that when they built their site, they bet on one of those 2010s client side JavaScript frameworks that don't let you serve web pages with information already on them. So to access an ORCID record's data, you can't just fetch the page, you also need to run a virtual browser and execute their JavaScript: https://github.com/ORCID/ORCID-Source/issues/6668

jonny,
@jonny@neuromatch.social avatar

@julian oh lovely. Well i knew it would just be a matter of time before I had to bust out a webdriver. Thankfully I have had plenty of experience with these things from other fun crawling projects

jonny,
@jonny@neuromatch.social avatar

@gbilder @mpe sorry to pester - as before, tell me to buzz off if this is annoying - do y'all have any examples of using the events API to do URL -> DOI resolution? I've hunted around the docs a bit but I can only find usages of the URL at the level of a domain rather than a specific page. I have asked around and read y'all's docs on URL =/= DOI, and the sense I get is that we will need a hybrid heuristic approach: get the page and try to parse a DOI from in-page metadata with a handful of methods, and also try to use the events API if possible.
