hrefna, to random
@hrefna@hachyderm.io avatar

In which I think out loud for a bit about #JsonLD, its place, and what compromises need to be made:

First, the vision of JSON-LD is essentially this (and yes, I will use entirely Armored Core references, this is me talking to myself after all :p):

A request comes in to Handler Walter (HW) for a mission for C4-621 (Raven) to assault a dam and blow up some generators, along with G4 and G5, troops under G1 Michigan, part of a squad called the "Red Guns."

G1 gives you the "lucky" callsign G13

1/

hrefna,

Note that these both contain mostly the same information but in very different formats.

This is a big part of the problem that #JsonLD is supposed to help with.

So #JsonLD provides a bunch of algorithms that let you figure out what all of these things map to and then compare them against each other.

Basically each party provides a context that shows a mapping of the various keys to IRIs.

So now they become:

ac://AllMind/mercenary/designation/g13…
ac://AllMind/…/callsign/Raven
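To make that concrete, here's a minimal sketch (the IRIs and key names are made up in the spirit of the thread, and this is nowhere near a spec-compliant processor): each party's context maps its own short keys onto shared IRIs, and after expansion two differently-shaped documents compare equal.

```python
def expand(doc: dict, context: dict) -> dict:
    """Replace each key the context knows about with its full IRI."""
    return {context.get(k, k): v for k, v in doc.items()}

# Two parties describe the same mission data with different key names...
ctx_allmind = {"designation": "ac://AllMind/mercenary/designation/",
               "callsign": "ac://AllMind/mercenary/callsign/"}
ctx_redguns = {"unit_id": "ac://AllMind/mercenary/designation/",
               "handle": "ac://AllMind/mercenary/callsign/"}

doc_a = {"designation": "G13", "callsign": "Raven"}
doc_b = {"unit_id": "G13", "handle": "Raven"}

# ...but expanded through their respective contexts, the documents agree.
assert expand(doc_a, ctx_allmind) == expand(doc_b, ctx_redguns)
```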

3/

hrefna,

I have at most a few minutes of processing time before people will expect to see these messages, and messages come in a variety of formats, oh, and Commander Michigan can't seem to use the same one twice, and…

So now I have not just a blob of data I may or may not be able to extract meaning from, but a lot of data from a lot of different sources.

So we get to the algorithms.

6/

hrefna,

Back to the algorithms.

The #JsonLD algorithms have, uh, let's just call it essentially an unbounded stack memory requirement as well as some pretty ugly computational complexity.

There's also a lot of boilerplate that needs to be parsed each and every time when dealing with an online streaming system. The cost of this adds up relatively quickly.

Oh, and this isn't even getting into the security quagmire.

These are Serious Challenges and should not be dismissed.

8/

hrefna,

Poor developers often choose one or more of:

  1. Deciding "frak it" and just parsing it as plain #Json.
  2. Using a library that "does #JsonLD" and later discovering that some of these modern specs only sort of comply with the JSON-LD spec, or already use subsets, so you end up with a heavyweight library that's slow, memory-inefficient, and difficult to use. If such a library even exists on your platform.
  3. Adding a set of directives that are sort of JSON-LD-ish on top of a JSON library.

9/

hrefna,

The trick with (3) is that as you decide to go in that direction you also lose a lot of the functionality of #JsonLD, which a lot of these specs then depend on for building any sort of interoperability. So now you have a headache dealing with that world as well.

10/

hrefna,

If you do these things (and probably a few more steps that I'm forgetting, it's after 1 AM here) you end up with a few things that make parsing this much easier.

Basically, you disallow those parts of JSON-LD that make parsing complicated

Now:

  1. You know that if a context is included, nothing it says will change the other contexts you have. This makes contexts trivial to cache and reuse.

  2. We've removed most of the #JsonLD-specific conceits, so most JSON tools now work out of the box.
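A rough sketch of what point (1) buys you (hypothetical names, not any real library's API): if an included context can never change what other contexts mean, each context can be processed exactly once and the frozen result reused forever after.

```python
# Context cache sketch: contexts that can't override each other can be
# processed once and then served from a plain dict lookup.
_context_cache: dict[str, dict] = {}

def process_context(url: str, fetch) -> dict:
    """Process a context at most once; later inclusions are a dict lookup."""
    if url not in _context_cache:
        _context_cache[url] = dict(fetch(url))  # freeze our own copy
    return _context_cache[url]

calls = []
def fake_fetch(url):
    calls.append(url)
    return {"callsign": "ac://AllMind/mercenary/callsign/"}

a = process_context("ac://contexts/mercenary", fake_fetch)
b = process_context("ac://contexts/mercenary", fake_fetch)
assert a == b and len(calls) == 1  # fetched and processed exactly once
```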

14/

hrefna, to random

I find this idea, one I run across here and there among certain groups of functional programmers, that "a good type system means you don't need unit tests" (and/or you can just use fuzzing/whatever) to be very weird.

On the one hand yes a good type system obviates the need for a great many of the basic types of unit tests, but it doesn't remotely prevent logic errors

Yes, your cute dependent type here is great, but it doesn't replace the need to know whether you calculated the mod correctly

hrefna,

On the other hand this doesn't mean that a good type system is useless (cough talking to the javascript programmers here, or at least the ones that write specs cough).

Like you can remove a huge amount of the error handling—and thus cyclomatic complexity and thus unit testing—in #JsonLD by just… introducing an IRI type. Even just as an intermediate type that you get rid of in later stages.
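The IRI-type idea has roughly this shape in any language (a Python sketch here, not the project's actual OCaml; `compact` and the prefix table are made up for illustration): validate once at the boundary, and every later stage can accept the type and drop its error-handling branches.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit

@dataclass(frozen=True)
class IRI:
    """Checked once at construction; downstream code can trust it."""
    value: str
    def __post_init__(self):
        if not urlsplit(self.value).scheme:
            raise ValueError(f"not an IRI: {self.value!r}")

def compact(iri: IRI, prefixes: dict[str, str]) -> str:
    # No validation branches here: the type already guarantees an IRI.
    for term, base in prefixes.items():
        if iri.value.startswith(base):
            return f"{term}:{iri.value[len(base):]}"
    return iri.value

as2 = {"as": "https://www.w3.org/ns/activitystreams#"}
assert compact(IRI("https://www.w3.org/ns/activitystreams#Note"), as2) == "as:Note"
```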

With a robust system you can get very advanced. But it doesn't eliminate the need for unit testing

hrefna, to fediverse

Took a step back on my #ActivityPub toy project and started mostly over with a clean template now that I know a heck of a lot more about #OCaml and have a clearer vision. So now things like my actor tools are not embedded inside of my JSON-LD experiment.

I also rewrote the actor component in the process now that I know more about how to manage around the type system there. Seems to work so far and is a lot cleaner, relatively pleased with how it is progressing.

hrefna, (edited )

Some notes so far:

  • Migrating the http components from dream to cohttp-eio. This may get moved back in the future, but while dream is nice, its development largely seems to have stalled and there doesn't seem to have been movement on eio integration in quite some time

  • My strategy for #JsonLD is going to do a "meet in the middle" approach. I'm following exactly none of the official JSON-LD algorithms and it won't play nicely with the full spec, but it should work pretty well and efficiently with AS2/AP

schizanon, to webdev

I've been hearing people talk about #jsonld a lot, so I finally Googled it. I must be missing something because I don't see the point. Best I can tell it's just data with links to schema.org to tell you what type of data it is. It's just a less powerful Swagger doc? #webDev #activityPub

hrefna, to random

Doing some writing on and realizing that JSON-LD is really doing two things.

  1. It is providing a mechanism for converting back and forth from RDF.
  2. It provides a way of describing a syntactic presentation of the data that is distinct from the semantic interpretation. Technically it provides several different syntactic presentations that lead to the same semantic interpretation.

But for most protocols (1) isn't important and (2) is usually standardized on a single presentation.

hrefna,

When a protocol supports multiple presentations it is usually because there is a fundamentally different requirement that they are trying to hit.

For example, supporting JSON and protobuf.

Technically #JsonLD supports this same form of varied presentation, but it also supports a functionally infinite number of possible encodings that come out to the exact same thing

If your context includes something like "toot : http://joinmastodon.org/ns#" it could just as easily substitute "foo" for "toot"
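Concretely (using a simplified term expansion, not a full processor): two contexts that differ only in the chosen term produce identical expanded documents, which is exactly why the term itself carries no information.

```python
def expand_keys(doc: dict, context: dict) -> dict:
    """Expand compact IRIs ("toot:Emoji") and known terms through a context."""
    out = {}
    for k, v in doc.items():
        if ":" in k:                       # compact IRI like "toot:Emoji"
            prefix, _, suffix = k.partition(":")
            k = context.get(prefix, prefix) + suffix
        else:
            k = context.get(k, k)
        out[k] = v
    return out

ctx_toot = {"toot": "http://joinmastodon.org/ns#"}
ctx_foo = {"foo": "http://joinmastodon.org/ns#"}

# "toot" and "foo" documents come out byte-for-byte the same.
assert expand_keys({"toot:Emoji": "blob"}, ctx_toot) == \
       expand_keys({"foo:Emoji": "blob"}, ctx_foo) == \
       {"http://joinmastodon.org/ns#Emoji": "blob"}
```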

hrefna,

This is part of why concepts like Framing ( https://www.w3.org/TR/json-ld11-framing/ ) exist, but these sorts of tools come with substantive performance penalties and increases in complexity.

But we can absolutely 100% address this in something like a .

I don't even think it would be especially hard.

Something I'm chewing on.

hrefna,

Putting this another way:

I don't know of many protocols that would tell you:

"Oh, it's perfectly acceptable for you to send a completely different protobuf, so long as you send along with it instructions on how to encode it into a different protobuf that we can use."

But that's essentially what using #JsonLD is built to do.

But I don't think #ActivityPub—or any other protocol that uses JSON-LD—needs to play by those rules.

We do so because it is the default, but we could make life easier.

hrefna, to random

Breaking it down, going for rough estimates (I haven't formally proved any of these):

Most of the algorithms in #JsonLD are around Θ(n lg n), and a few are O(n^3), sometimes with the controls for such put in the hands of the person writing the JSON-LD object and not in the hands of the person writing the server

How you get there is, for instance, in the context processing algorithm:

for each item in the local context (5) -> for each k/v pair (5.13) -> which parses strings and invokes context processing again…

hrefna,

The problem with this is that the control on what to do is in the hands of the entity calling the API: the only way forward for the server to not encounter these scenarios is to shed features, and even then there are some you just can't shed.

I've talked about the challenges with this, but it also gives a potential solution:

What if the server maintains a list of valid context objects, and anyone who communicates with the server is responsible for formatting accordingly?

#JsonLD

hrefna,

Side note: I don't actually think this is strictly worthwhile. To me this is an academic exercise that may yield fruit, but it's also a low-stakes game for me.

But there are people who are true believers. If that's you then a lot of people would love to see serious work in improving the performance and safety of these, and it could very well help get people on board with the vision.

I don't think that's likely, mind, but without improvements here I don't think it is even possible.

#JsonLD

hrefna, to random

This just in, IRIs are now defined by the regex:

[^:]+://.*

Well that makes life easier! If that's all it takes then all of these URI implementations that worry about whether it has an "authority" or a "protocol" are just overly complicated!
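For contrast, here's how far that regex is from a parser that actually cares about structure (the sample strings are made up):

```python
import re
from urllib.parse import urlsplit

# The shrug: "anything with no colon, then ://, then whatever" is an IRI.
shrug = re.compile(r"[^:]+://.*")

for s in ["http://example.org/", "totally madeup://", "ht tp://x"]:
    assert shrug.fullmatch(s)  # the regex waves all three through

# A structural parser only recognizes an authority in the first one:
assert urlsplit("http://example.org/").netloc == "example.org"
assert urlsplit("ht tp://x").netloc == ""  # invalid scheme, no authority
```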

#JsonLD

hrefna,

Sarcasm aside, this is one of the things I just do not get about this set of algorithms.

They could have said:

"if value is an IRI, a blank node identifier, or a compact IRI" for (6), and for (6.1) could have said "if it is an IRI or blank node identifier then return it as is."

But instead they have baked into the official algorithm a series of steps of string parsing to ascertain type in a halfway manner?

#JsonLD


hrefna, to random

I know I keep ranting about this, but the #JsonLD algorithms are just… so weird.

"these four required parameters, which can be null, and these optional parameters, which all have defaults and that aren't actually parameters…"

WHY ARE THEY WRITTEN THIS WAY

hrefna, to random

It just really continues to feel like #w3c is trying to push #JsonLD into everything and the kitchen sink… regardless of whether it fits, doing a "beat to fit, paint to match" when it doesn't. The JSON-LD working groups and such are busy trying to figure out how to do a #YamlLD, and the rest of us are trying to figure out how to do this practically in production in a way that doesn't ignore all of JSON-LD, or just doing "JSON + a weird context obj."

The situation feels untenable on a few levels.

hrefna, to fediverse

Some thoughts on solutions to the problems of #JsonLD in #ActivityPub, understanding that these work in tension with one another: fixing one is likely to increase the challenge in some other area, requiring some balancing or other strategies.

Following on from: https://hachyderm.io/@hrefna/110945724907576079

  1. We can build a "processing network" the first time we process a context for a given message class. Similar conceptually to a sorting network: https://en.wikipedia.org/wiki/Sorting_network
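One possible reading of the analogy (entirely hypothetical design, not from any spec): compile a context once into a fixed sequence of key rewrites, the way a sorting network is a fixed sequence of compare-exchanges, then replay that plan on every message of the same class without touching context processing again.

```python
from functools import lru_cache

@lru_cache(maxsize=None)
def compile_plan(context_items: tuple) -> tuple:
    """Compile once per message class: a fixed list of key-rewrite ops.
    (A real compiler would fold in expansion rules; this is the skeleton.)"""
    return tuple((term, iri) for term, iri in context_items)

def run_plan(plan: tuple, doc: dict) -> dict:
    """Replay the precompiled plan; no context processing on the hot path."""
    for term, iri in plan:
        if term in doc:
            doc[iri] = doc.pop(term)
    return doc

ctx = (("callsign", "ac://AllMind/mercenary/callsign/"),)
plan = compile_plan(ctx)  # built the first time this message class is seen
assert run_plan(plan, {"callsign": "Raven"}) == \
       {"ac://AllMind/mercenary/callsign/": "Raven"}
```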

(cont)

1/

hrefna, to fediverse

One of the problems of #JsonLD for #ActivityPub style systems that I keep chewing on is that JSON-LD has, at its core, something akin to dependency resolution. Without the safety factors around that.

Ex: If a file F or a context A loads a remote context B you are supposed to go out, fetch that, cache the result, and use that context.

But there are several problems with this:

  1. Remote contexts are not immutable. Loading a context today does not mean that it will be the same tomorrow.

1/

hrefna,
  1. Contexts are basically processing directives for the #JsonLD request. There are limits around them to prevent some pathological use cases, but in general these processing directives indicate how to process the file.

This is like if Jackson serialized a copy of the object you are supposed to load a JSON file into and attached it to each JSON request.

The number of potential security considerations around this is huge.

Especially when you consider the lack of "ownership" in URIs.

3/

Jeremiah, to random
@Jeremiah@alpaca.gold avatar

There are not enough good “monkey see, monkey do” examples for JSON-LD, even in the spec.

I still don’t ever feel like I’m doing something correctly with 100% certainty and I’ve been designing web APIs in other formats for 15 years.

#JSONLD #APIs
