jonny, (edited )
@jonny@neuromatch.social avatar

Hello fedi. i am trying to solve, once and for all, the "fetch all replies" problem that makes the fedi feel a lot more desolate and with a lot more reply guys in it than it should be. this is take two: before, i had it triggered by a button, but now i think it should happen on the server side whenever you expand a post. can anyone help me figure out how to make this more efficient by only fetching posts that the server doesn't already have? i am not sure what the best strategy would be, and if anyone with experience doing efficient rails and SQL stuff could give me some pointers, that would be gr8. the patch is actually extremely simple, it just needs a few nice things to make it not DDoS everyone.

https://github.com/NeuromatchAcademy/mastodon/pull/44

Issue that describes approach: https://github.com/NeuromatchAcademy/mastodon/issues/43
Wiki page: https://wiki.neuromatch.social/Fetch_All_Replies

bkil,

@jonny Someone mentioned this post on Matrix, so I checked it on Friendica, and it automatically fetches all replies and shows them as a comment thread. Welcome to technology from 2010 I guess, or maybe I have overlooked part of the original problem statement?

jonny,
@jonny@neuromatch.social avatar

@bkil
If friendica does it, then great. Masto doesnt.

4censord,
@4censord@unfug.social avatar

@jonny have you considered running this as sidekiq jobs?
So either adding a new queue, or using the pull queue?

This would have the disadvantage that, when queue latency is high, the fetching will be delayed too.
But it would also have the advantage of moving the load of fetching potentially many replies to a separate system, out of the main puma process.
This eases scaling and performance concerns.

On first thread expansion, it'd queue a job to fetch the first n replies to the thread.
When that job completes and there are still more replies, it would queue a new job (maybe even delayed by a few seconds) to fetch the next n replies, and repeat.
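
A rough sketch of that shape; FetchAllRepliesWorker and FetchRepliesService are placeholder names (not existing Mastodon classes), and the paging contract is assumed:

```ruby
# Illustrative worker: each run fetches one page of the remote replies
# collection and re-enqueues itself for the next page.
class FetchAllRepliesWorker
  include Sidekiq::Worker

  sidekiq_options queue: 'pull', retry: 3

  REPLIES_PER_PAGE = 50

  def perform(status_id, page_uri = nil)
    status = Status.find(status_id)

    # Assumed contract: the (hypothetical) service fetches up to
    # REPLIES_PER_PAGE replies starting at page_uri and returns the URI of
    # the next page, or nil when the collection is exhausted.
    next_page = FetchRepliesService.new.call(status, page_uri, limit: REPLIES_PER_PAGE)

    # Spread the remaining pages out over time instead of hammering the
    # origin server all at once.
    self.class.perform_in(5.seconds, status_id, next_page) if next_page
  end
end
```

Putting it on the existing 'pull' queue keeps it behind user-facing work; a dedicated queue would isolate it further at the cost of extra configuration.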

jonny,
@jonny@neuromatch.social avatar

@4censord
Thats what id like to do, but wasnt sure the best way to make it happen!

mike,
@mike@rebel-lion.uk avatar

@jonny Would love to see this fixed! I’m on my own instance and always have to navigate to the original page to see replies outside my follows.

Sorry, I don't know enough about Mastodon internals to be able to help!

jonny,
@jonny@neuromatch.social avatar

@mike doin it for the small instances ;)

efi,
@efi@chitter.xyz avatar

@jonny don't posts have some kind of id when you fetch them from a server?

jonny,
@jonny@neuromatch.social avatar

@efi yep! just need a little help with making an efficient query to check those against the local representations

efi,
@efi@chitter.xyz avatar

@jonny appending the server name to the id, hashing it, and using that as the index would be the most efficient, I think?
tho sql query planners are very good, so maybe indexing on server name first, with a secondary index on post id, would work even better - not sure, it's been a decade since I did sql myself
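
(for reference, the composite-index idea would look something like this as a Rails migration; the table and column names here are hypothetical, not Mastodon's actual schema - see the next reply)

```ruby
# Hypothetical schema: a unique composite index on (origin server, remote id)
# makes "does the server already have this post?" a single indexed lookup.
class AddOriginIndexToPosts < ActiveRecord::Migration[7.0]
  def change
    add_index :posts, [:origin_server, :remote_post_id], unique: true
  end
end
```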

jonny,
@jonny@neuromatch.social avatar

@efi masto already makes an internal snowflake ID for posts and stores the originating post URI as well. i will investigate tmrw what indexes exist between URI and ID, but presumably that data is already all there. i'm mostly concerned with the implementation in rails and the caching system for debouncing/deduplicating requests.
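
a rough sketch of the dedup check (collection_items and FetchReplyWorker are placeholder names, and the shape of the remote collection items is assumed):

```ruby
# collection_items stands in for the items of the remote replies collection
# (either inlined objects or bare URI strings).
remote_uris = collection_items.map { |item| item.is_a?(Hash) ? item['id'] : item }

# statuses.uri has a unique index in Mastodon, so this is one indexed query
# rather than a round trip per reply.
known_uris   = Status.where(uri: remote_uris).pluck(:uri)
missing_uris = remote_uris - known_uris

# Only fetch what the server doesn't already have.
missing_uris.each { |uri| FetchReplyWorker.perform_async(uri) }
```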

smallcircles,
@smallcircles@social.coop avatar

@jonny

Hey super interesting, Jonny!

> and with a lot more reply guys in it than it should be

As it happens, this morning I was coincidentally side-tracked on a self-assigned quest to put together some thoughts on the "Reply Guy" anti-pattern on the #SocialCoding movement's forum.

So far that has turned into this wiki post (and related discussion thread): https://discuss.coding.social/t/wiki-for-sx-anti-pattern-reply-sigh-aka-reply-guy/530

smallcircles,
@smallcircles@social.coop avatar

@jonny

PS. I cross-ref'ed this great thread to the #SX matrix chatroom.

https://matrix.to/#/#socialcoding-foundations:matrix.org

can,
@can@haz.pink avatar

@jonny I unfortunately don't have any knowledge about optimizing this, but I want to thank you for working on this issue. I think this is a very crucial feature that has been ignored for too long and will contribute greatly to the overall usability of Mastodon. The current state clearly feels like a bug: every time I open a post I end up viewing it on the original instance, which is terrible UX. So, thanks!

jonny,
@jonny@neuromatch.social avatar

@can hopefully we get it to work!!! we already had it working in v1, but it was masto-to-masto only, this one should be more general and should blend more seamlessly into normal use on both web and apps.

jonny,
@jonny@neuromatch.social avatar

Pitch

When expanding a post, the instance should fetch all replies from the host server.

This issue is to move the more general conversation out of #8, because i think that's the wrong approach.

Previous context:

Motivation

Two reasons:

  • It's an important discovery mechanism - people should be able to see the conversation around a post (within normal privacy settings, ie. we should not be trying to get followers-only posts, etc.)
  • The "a thousand of the same replies" problem is notorious on fedi and part of what makes it somewhat exhausting, and can quickly feel like brigading if a post becomes even moderately popular.

Approach

Concerns

Privacy has been discussed elsewhere - we will only be getting posts that wouldn't be filtered out by normal post visibility settings. ie. the user would be able to get them on their own by just running a bunch of manual searches.

  • Perf & API Consistency: Having a potentially long-running service call in the context endpoint is undesirable. We should run the service asynchronously. This will mean that later calls will yield different results (ie. as the posts are imported by the async worker). That's really only a problem for programmatic API usage, and just requires a note in the endpoint documentation. In normal web UI usage, it should simply look like posts loading into the interface as they are received. The context endpoint would behave as expected on the first call, and just have extra replies in future calls. We could add an additional option, defaulting to false, to make the reply-fetching service synchronous.
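
As a rough sketch of the async trigger (not the actual patch): the controller and serializer names below mirror Mastodon's context endpoint but are simplified, and FetchAllRepliesWorker is the hypothetical worker described elsewhere in this thread.

```ruby
class Api::V1::StatusesController < Api::BaseController
  def context
    @status = Status.find(params[:id])

    # Fire-and-forget: refresh the reply tree of remote statuses in the
    # background. The first response is unchanged; later calls include the
    # newly imported replies.
    FetchAllRepliesWorker.perform_async(@status.id) unless @status.local?

    @context = Context.new(
      ancestors: @status.ancestors(40, current_account),
      descendants: @status.descendants(40, current_account)
    )
    render json: @context, serializer: REST::ContextSerializer
  end
end
```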
jonny, (edited )
@jonny@neuromatch.social avatar

I think that with a combination of debouncing how frequently the reading server requests from the OP server, and only asking tertiary (replier) servers for the posts that are new in the context response, this isn't any more of a DoS problem than normal masto operation. Recall that masto is already pretty dang inefficient (eg. if you expand a post, masto will already fetch the profile, which includes fetching pinned posts, preview cards for all links in the bio and pinned posts, etc.), and expanding the context of a post would be a directly triggered behavior that i think matches normal expectations: when i look for the replies to that post, i should see the replies to that post. This would tie into any existing privacy controls - a 'followers only' reply wouldn't be reported in the response from the OP -> reading server, the reading server would have to abide by AUTHORIZED_FETCH, blocks would still hold, etc.
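
a minimal sketch of the debounce part, using Rails.cache (Redis-backed in masto); the key format and the 30-minute window are just illustrative, and FetchAllRepliesWorker is a placeholder name:

```ruby
DEBOUNCE_WINDOW = 30.minutes

def debounced_fetch_all_replies(status)
  cache_key = "fetch_all_replies:#{status.id}"

  # Skip if this thread was already refreshed recently.
  return if Rails.cache.exist?(cache_key)

  Rails.cache.write(cache_key, true, expires_in: DEBOUNCE_WINDOW)
  FetchAllRepliesWorker.perform_async(status.id)
end
```

(the exist?/write pair has a small race; with the Redis cache store, writing with unless_exist: true should make the claim atomic)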

The costs of not having fetch-all-replies are pretty bad - first is that fedi can feel vacant on smaller servers. it takes quite a lot of people with quite a lot of follows to start having anything resembling a conversation among ppl more than 1-deep in a social graph. One of the primary criticisms of the fedi (mastodon specifically) is the high number of reply guys, and if you have ever had a post with even a moderate amount of popularity you know how exhausting it is to get exactly the same reply over and over again, and to keep pointing well-intentioned people to information/replies/etc. that already exist elsewhere in the replies.

I'll stop there, but I think that the benefits of having fetch all replies pretty strongly outweigh the costs, and so that's why i want to do it efficiently. This is an especially important behavior if we want to get to a point of making the fedi p2p, where we can make sparse state updates more of the norm <3

jonny,
@jonny@neuromatch.social avatar

this is a patch on top of glitch, and so if we find something that works here the goal would be to pull it upstream, with neuromatchstodon as sort of the live testing instance. so ur work would be respected, credited, and made more general

jonny,
@jonny@neuromatch.social avatar

This is, imo, one of the biggest problems with running a small or single-user fedi instance. This patch would make small fedi instances about a billion times more usable - aka it's directly responsive to the problem of 'fediverse is cool, but actually most accounts are on the largest 3 servers', bc smaller servers see like 0.01% of the fedi.

this is actually, imo, a more efficient behavior compared to the current alternative, which is to make some dummy account (or pollute your home feed) with lots and lots of follows just to be able to see the context around a post. ie. currently you have to pull in many many more posts than you want vs. just requesting the context of the posts you want to see.

also a polite cc to @hrefna, who i have seen write about amplification on activitypub and masto a bunch of times, in case xe has any thoughts here
