In which I think out loud for a bit about #JsonLD, its place, and what compromises need to be made:
First, the vision of JSON-LD is essentially this (and yes I will use entirely #ArmoredCore6 references, this is me talking to myself after all :p ):
A request comes in to Handler Walter (HW) for a mission for C4-621 (Raven) to assault a dam and blow up some generators along with G4 and G5, troops under G1 Michigan, part of a squad called the "Red Guns".
I have at most about a few minutes of processing time before people will expect to see these messages and messages come in a variety of formats oh and Commander Michigan can't seem to use the same one twice and…
So now I have not just a blob of data I may or may not be able to extract meaning from, I have a lot of data from a lot of different sources.
The #JsonLD algorithms have, uh, let's just call it essentially an unbounded stack memory requirement as well as some pretty ugly computational complexity.
There's also a lot of boilerplate that needs to be parsed each and every time when dealing with an online streaming system. The cost of this adds up relatively quickly.
Oh, and this isn't even getting into the security quagmire.
These are Serious Challenges and should not be dismissed.
Using a library that "does #JsonLD" means later discovering that some of these modern specs only sort of comply with the JSON-LD spec, or already use subsets, so you end up with a heavyweight library that's slow, memory-inefficient, and difficult to use.
If such a library even exists on your platform.
3. Adding a set of directives that are sort of JSON-LDish on top of a JSON library.
The trick with (3) is that as you decide to go in that direction you also lose a lot of the functionality of #JsonLD, which a lot of these specs then depend on for building any sort of interoperability. So now you have a headache dealing with that world as well.
If you do these things (and probably a few more steps that I'm forgetting, it's after 1 AM here) you end up with a few things that make parsing this much easier.
Basically, you disallow those parts of JSON-LD that make parsing complicated.
Now:
You know that if a context is included, nothing it says will change the other contexts you have. This makes it trivial to cache and reuse.
We've removed most of the #JsonLD-specific conceits, so most JSON tools now work out of the box.
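To make the caching point concrete, here's a minimal sketch, assuming the restricted profile above: contexts are keyed by URL and can't override each other, so a parsed context is a pure function of its URL. All names here are mine, not from any real library.

```python
# Sketch: once contexts can't rewrite each other's meaning, parsing a
# context once and reusing it forever is safe and trivial.

PARSED_CONTEXT_CACHE: dict = {}

def parse_context(url: str, fetch) -> dict:
    """Fetch and parse a context once, then reuse the parsed form.

    `fetch` is a callable url -> dict. "Parsing" here is just
    building the term -> IRI mapping, since the restricted profile
    forbids the interactions that would invalidate it.
    """
    if url not in PARSED_CONTEXT_CACHE:
        raw = fetch(url)
        PARSED_CONTEXT_CACHE[url] = {
            term: iri for term, iri in raw.items() if isinstance(iri, str)
        }
    return PARSED_CONTEXT_CACHE[url]

def expand_term(term: str, contexts: list) -> str:
    """Resolve a term against a list of already-parsed contexts."""
    for ctx in contexts:
        if term in ctx:
            return ctx[term]
    return term  # unknown terms pass through unchanged
```

The win is that the fetch-and-parse cost is paid once per context URL, not once per message.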
I find this idea, one I run across here and there among certain groups of functional programmers, that "a good type system means you don't need unit tests" (and/or you can just use fuzzing/whatever) to be very weird.
On the one hand, yes, a good type system obviates the need for a great many of the basic kinds of unit tests, but it doesn't remotely prevent logic errors.
Yes, your cute dependent type here is great, but it doesn't replace the need to know whether you calculated the mod correctly.
On the other hand this doesn't mean that a good type system is useless (cough talking to the javascript programmers here, or at least the ones that write specs cough).
Like you can remove a huge amount of the error handling—and thus cyclomatic complexity and thus unit testing—in #JsonLD by just… introducing an IRI type. Even just as an intermediate type that you get rid of in later stages.
With a robust type system you can get very advanced. But it doesn't eliminate the need for unit testing.
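Here's what I mean by the IRI-type trick, as a sketch (the names and the validation rule are mine, illustrative only): validate once at the boundary, and everything downstream takes the wrapper type and drops its string-shaped error branches.

```python
from dataclasses import dataclass
from urllib.parse import urlparse

@dataclass(frozen=True)
class Iri:
    """An IRI validated once, at construction.

    Downstream code that takes an Iri never re-checks the
    'malformed string' case, which deletes a whole class of
    error-handling branches (and the unit tests for them).
    """
    value: str

    def __post_init__(self):
        # Minimal check for this sketch: an absolute IRI has a scheme.
        if not urlparse(self.value).scheme:
            raise ValueError(f"not an absolute IRI: {self.value!r}")

def actor_inbox(actor_id: Iri) -> Iri:
    # No string validation needed here: the type already guarantees it.
    return Iri(actor_id.value.rstrip("/") + "/inbox")
```

Even if the type only exists in an intermediate stage and gets erased later, the branches it removes stay removed.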
Took a step back on my #ActivityPub toy project and started mostly over with a clean template now that I know a heck of a lot more about #OCaml and have a clearer vision. So now things like my actor tools are not embedded inside of my JSON-LD experiment.
I also rewrote the actor component in the process now that I know more about how to manage around the type system there. Seems to work so far and is a lot cleaner, relatively pleased with how it is progressing.
Migrating the http components from dream to cohttp-eio. This may get moved back in the future, but while dream is nice, its development seems to have largely stalled, and there hasn't been movement on eio integration in quite some time.
My strategy for #JsonLD is a "meet in the middle" approach. I'm following exactly none of the official JSON-LD algorithms and it won't play nicely with the full spec, but it should work pretty well and efficiently with AS2/AP.
I've been hearing people talk about #jsonld a lot, so I finally Googled it. I must be missing something because I don't see the point. Best I can tell it's just data with links to schema.org to tell you what type of data it is. It's just a less powerful Swagger doc? #webDev #activityPub
Doing some writing on #JsonLD and realizing that JSON-LD is really doing two things.
1. It provides a mechanism for converting back and forth from RDF.
2. It provides a way of describing a syntactic presentation of the data that is distinct from the semantic interpretation. Technically it provides several different syntactic presentations that lead to the same semantic interpretation.
But for most protocols (1) isn't important and (2) is usually standardized on a single presentation.
When a protocol supports multiple presentations it is usually because there is a fundamentally different requirement that they are trying to hit.
For example, supporting JSON and protobuf.
Technically #JsonLD supports this same form of varied presentation, but it also supports a functionally infinite number of possible encodings that come out to the exact same thing.
If your context includes something like "toot": "http://joinmastodon.org/ns#", it could just as easily substitute "foo" for "toot".
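To illustrate, here are two documents whose surface keys differ but whose meaning is identical, run through a toy expansion (this is deliberately not a spec-compliant JSON-LD processor; the toy context maps each term straight to a full IRI):

```python
def expand(doc: dict) -> dict:
    """Toy expansion: rewrite keys through the document's @context.

    Real JSON-LD expansion is far more involved; this only shows why
    "toot" vs "foo" is an arbitrary surface choice.
    """
    ctx = doc.get("@context", {})
    return {ctx.get(k, k): v for k, v in doc.items() if k != "@context"}

doc_a = {
    "@context": {"toot": "http://joinmastodon.org/ns#featured"},
    "toot": "https://example.com/collections/featured",
}
doc_b = {
    "@context": {"foo": "http://joinmastodon.org/ns#featured"},
    "foo": "https://example.com/collections/featured",
}
```

A receiver has to do this rewriting before it can compare anything, which is exactly the tax I'm complaining about.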
This is part of why concepts like #JsonLD Framing ( https://www.w3.org/TR/json-ld11-framing/ ) exist, but these sorts of tools come with substantive performance penalties and increases in complexity.
But we can absolutely 100% address this in something like a #FEP.
I don't know of many protocols that would tell you:
"Oh, it's perfectly acceptable for you to send a completely different protobuf, so long as you send along with it instructions on how to encode it into a different protobuf that we can use."
But that's essentially what using #JsonLD is built to do.
But I don't think #ActivityPub—or any other protocol that uses JSON-LD—needs to play by those rules.
We do so because it is the default, but we could make life easier.
Breaking it down, going for rough estimates (I haven't formally proved any of these):
Most of the algorithms in #JsonLD fall between Θ(n lg n) and O(n^3), sometimes with the controls for such put in the hands of the person writing the JSON-LD object and not the person writing the server.
How you get there is, for instance, in the context processing algorithm:
for each item in the local context (5) -> for each kv pair (5.13) -> which can hit strings that invoke context processing all over again…
The problem with this is that the control over what to do is in the hands of the entity calling the API: the only way for the server to avoid these scenarios is to shed features, and even then there are some you just can't shed.
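As a sketch of why that placement of control is dangerous, here's the shape of the problem: each level of nesting in a scoped context triggers another round of context processing, and the document author, not the server, picks the depth. (Illustrative structure only, not a spec-accurate processor.)

```python
def build_nested_context(depth: int) -> dict:
    """Build a context where each term carries its own scoped context."""
    ctx: dict = {"leaf": "http://example.com/ns#leaf"}
    for i in range(depth):
        ctx = {f"term{i}": {"@id": f"http://example.com/ns#t{i}",
                            "@context": ctx}}
    return ctx

def count_processing_steps(ctx: dict) -> int:
    """Count the recursive context-processing calls a naive processor makes."""
    steps = 1
    for value in ctx.values():
        if isinstance(value, dict) and "@context" in value:
            steps += count_processing_steps(value["@context"])
    return steps
```

The spec does cap some of this (recursion and remote-context limits exist), but the server is still reacting to a workload the sender designed.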
I've talked about the challenges with this, but it also gives a potential solution:
What if the Server maintains a list of valid context objects and anyone who communicates with the server is responsible for formatting accordingly?
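A hedged sketch of that idea (names are mine): the server keeps an allowlist of context URLs it has pre-parsed, compares incoming @context values against it, and rejects anything else instead of running general context processing.

```python
# The server accepts only contexts it already knows about.
ALLOWED_CONTEXTS = {
    "https://www.w3.org/ns/activitystreams",
    "https://w3id.org/security/v1",
}

def context_urls(doc: dict) -> list:
    """Normalize @context to a list of URL strings; inline contexts are rejected."""
    ctx = doc.get("@context", [])
    if isinstance(ctx, str):
        ctx = [ctx]
    if not all(isinstance(c, str) for c in ctx):
        raise ValueError("inline contexts not accepted; use a known context URL")
    return ctx

def accept(doc: dict) -> bool:
    """True iff every context the document names is on the allowlist."""
    return all(url in ALLOWED_CONTEXTS for url in context_urls(doc))
```

This pushes the formatting burden onto senders, which is exactly the tension: you've traded interop flexibility for a server that can't be made to do unbounded work.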
Side note: I don't actually think this is strictly worthwhile. To me this is an academic exercise that may yield fruit, but it's also a low-stakes game for me.
But there are people who are true believers. If that's you, then a lot of people would love to see serious work on improving the performance and safety of these algorithms, and it could very well help get people on board with the vision.
I don't think that's likely, mind, but without improvements here I don't think it is even possible.
Well that makes life easier! If that's all it takes then all of these URI implementations that worry about whether it has an "authority" or a "protocol" are just overly complicated!
Sarcasm aside, this is one of the things I just do not get about this set of algorithms.
They could have said:
"if value is an IRI, a blank node identifier, or a compact IRI" for (6), and (6.1) could have said "if it is an IRI or blank node identifier, then return it as is"
But instead they have baked into the official algorithm a series of string-parsing steps that ascertain type in a halfway manner?
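For contrast, here's roughly what that string sniffing boils down to: a sketch of the shape of the heuristic, not the spec's exact steps, deciding "IRI-ness" by scanning for a colon and checking prefixes rather than carrying a type.

```python
def classify(value: str) -> str:
    """Roughly the kind of string sniffing the algorithms bake in.

    Sketch only: the real spec text has more cases and subtleties,
    and never fully validates that a "maybe IRI" is actually one.
    """
    if value.startswith("@"):
        return "keyword"
    if value.startswith("_:"):
        return "blank node identifier"
    prefix, sep, suffix = value.partition(":")
    if sep and not suffix.startswith("//"):
        return "compact IRI (maybe)"   # e.g. "toot:featured"
    if sep:
        return "absolute IRI (maybe)"  # e.g. "https://example.com/x"
    return "term"
```

Every one of those "(maybe)"s is a branch that a typed IRI at the boundary would have eliminated.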
It just really continues to feel like #w3c is trying to push #JsonLD into everything and the kitchen sink, regardless of whether it fits, doing a "beat to fit, paint to match" when it doesn't. The JSON-LD working groups and such are busy trying to figure out how to do a #YamlLD, while the rest of us are trying to figure out how to do this practically in production in a way that doesn't ignore all of JSON-LD, or just doing "JSON + a weird context obj."
Some thoughts on solutions to the problems of #JsonLD in #ActivityPub, understanding that these work in tension with one another: fixing one is likely to increase the challenge in some other area, requiring some balancing or other strategies.
We can build a "processing network" the first time we process a context for a given message class. Similar conceptually to a sorting network: https://en.wikipedia.org/wiki/Sorting_network
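Sketching what I mean by a "processing network" (all names illustrative): the first message of a given class compiles its context down to a flat list of rewrite operations, and later messages with the same context replay the compiled plan instead of redoing context processing.

```python
PLAN_CACHE: dict = {}

def compile_plan(context: dict) -> list:
    """Compile a simple term -> IRI context into a flat list of rename ops.

    Like a sorting network, the expensive structural analysis happens
    once; afterwards the plan is just applied, step by step.
    """
    return [(term, iri) for term, iri in sorted(context.items())]

def process(doc: dict, context_key: str, context: dict) -> dict:
    """Apply the compiled plan; compile only on first sight of this context."""
    if context_key not in PLAN_CACHE:
        PLAN_CACHE[context_key] = compile_plan(context)
    renames = dict(PLAN_CACHE[context_key])
    return {renames.get(k, k): v for k, v in doc.items()}
```

The interesting engineering question is what counts as a "message class" key; here I just hand one in.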
One of the problems of #JsonLD for #ActivityPub style systems that I keep chewing on is that JSON-LD has, at its core, something akin to dependency resolution. Without the safety factors around that.
Ex: If a file F or a context A loads a remote context B, you are supposed to go out, fetch it, cache the result, and use that context.
But there are several problems with this:
Remote contexts are not immutable. Loading a context today does not mean that it will be the same tomorrow.
Contexts are basically processing directives for the #JsonLD document. There are limits around them to prevent some pathological use cases, but in general these directives dictate how to process the file.
This is as if Jackson attached to each JSON request a serialized copy of the class you're supposed to load the JSON into.
The number of potential security considerations around this is huge.
Especially when you consider the lack of "ownership" in URIs.
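One mitigation sketch, assuming you're willing to pin (function names are mine): cache remote contexts by URL plus a content hash recorded at first fetch, and fail loudly if the bytes ever change, instead of silently reinterpreting every message that references the context.

```python
import hashlib
import json

PINNED: dict = {}  # url -> (sha256 hex digest, parsed context)

def load_pinned_context(url: str, fetch_bytes) -> dict:
    """Fetch a remote context, pinning its content hash on first sight.

    `fetch_bytes` is a callable url -> bytes. If the remote document
    later changes, we raise instead of quietly changing how every
    message that names this context gets processed.
    """
    raw = fetch_bytes(url)
    digest = hashlib.sha256(raw).hexdigest()
    if url in PINNED:
        pinned_digest, parsed = PINNED[url]
        if digest != pinned_digest:
            raise RuntimeError(f"context at {url} changed since first fetch")
        return parsed
    parsed = json.loads(raw)
    PINNED[url] = (digest, parsed)
    return parsed
```

This trades availability (a legitimately updated context now breaks) for not letting a context owner retroactively rewrite the meaning of cached messages; whether that trade is right probably belongs in a #FEP.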