ajsadauskas, (edited )
@ajsadauskas@aus.social avatar

In an age of LLMs, is it time to reconsider human-edited web directories?

Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.

These directories generally had a list of categories on their front page. News/Sport/Entertainment/Arts/Technology/Fashion/etc.

Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.

Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.

Lycos, Excite, and of course Yahoo all offered web directories of this sort.

(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)

By the late '90s, the standard narrative goes, the web got too big to index websites manually.

Google promised the world its algorithms would weed out the spam automatically.

And for a time, it worked.

But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.

And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.

My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?

Do we really want to search every single website on the web?

Or just those that aren't filled with LLM-generated SEO spam?

Or just those that don't feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your "free trial" subscription?

At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?

And is it time to begin considering what a modern version of those early web directories might look like?

@degoogle

Emperor,
@Emperor@feddit.uk avatar

I used them and contributed to links as well - it was quite a rush to see a contribution accepted because it felt like you were adding to the great summary of the Internet. At least until the size of the Internet made it impossible to create a user-submitted, centrally-approved index of the Net. And so that all went away.

What seemed like a better approach was social bookmarking, like del.icio.us, where everyone added, tagged and shared bookmarks. The tagging basically crowd-sourced the categorisation and meant you could browse, search and follow links by tags or by the users. It created a folksonomy (thanks for the reminder Wikipedia) and, crucially, provided context to Web content (I think we’re still talking about the Semantic Web to some degree but perhaps AI is doing this better). Then after a long series of takeovers, it all went away. The spirit lives on in Pinterest and Flipboard to some degree but as this was all about links it was getting at the raw bones of the Internet.

I’ve been using Postmarks, a single-user social bookmarking tool, but it isn’t really the same as del.icio.us because part of what made it work was the easy discoverability and sharing of other people’s links. So what we need is, as I named my implementation of Postmarks, Relicious - pretty much del.icio.us but done Fediverse style, so you sign up to instances with other people (possibly run on shared interests or region, so you could have a body modification instance or a German one, for example) and get bookmarking. If it works and people find it useful, a FOSS Fediverse implementation would be very difficult to make go away.

ShadowCat,

Pinboard and TinyGem come to mind.

Emperor,
@Emperor@feddit.uk avatar

Oh indeed there are services out there that do something similar to Delicious, but I put a lot into that site only for it all to disappear due to the whims of some corporate overlord and I am not doing that again. What I am looking for is an easy Fediverse solution so my data is never lost again. Postmarks is definitely getting there but as a single-user service it isn’t quite what I am looking for.

Wren,

@Emperor @ajsadauskas I've been thinking about this myself lately, but I had wondered how a curated directory might scale. I hadn't considered federated social bookmarking, and honestly that sounds like a brilliant solution. I'd love to see something like that happen, maybe even contribute.

Emperor,
@Emperor@feddit.uk avatar

As the links show, Relicious/Fedilicious has been on my mind a while and I have been mourning the loss of Delicious for a long time. However, the above got me jotting down some notes.

It should be doable. I haven’t had a root through Postmarks’ code but it might be they have done the bulk of the work already and it just needs a multiuser interface bolting on top of it.

polgeonow,
@polgeonow@mstdn.social avatar

@ajsadauskas I still rely on Google's extensive automatically-generated index for certain specialized research purposes (like finding all the news about a specific place from small national newspapers and web forums in other countries), but I totally agree that in general Google Search has lost a lot of its usefulness, largely due to rampant SEO abuse (ads at the top are nothing compared to whole pages of low-quality search results).

meet_eli,

@ajsadauskas @degoogle
A new online family game is coming next month! Only the first 1000 will get to play it for free for 1 month!

Check out https://www.meeteli.com

hyc,
@hyc@mastodon.social avatar

@ajsadauskas @degoogle ah the good ol' days. I was a curator on Yahoo's directory for a few years, before it ended.

GnomeComedy,

Sounds like you may enjoy en.m.wikipedia.org/wiki/Gemini_(protocol) if you haven’t installed a browser and tried it.

merthyr1831,

This is how it’s gonna go. We’ll get human-curated search results, before someone “innovates” by mildly automating the process, until someone “innovates” again by using AI to automate it further. Time is a circle.

timrichards, (edited )
@timrichards@aus.social avatar

@ajsadauskas @degoogle I actually contributed to one! I was a writer at LookSmart for four years; we manually created categories and added websites to them, with short descriptive reviews. Though an algorithm listed more sites below our selections, we could force the top result, e.g. we'd make sure the most relevant website was the first result of a search on that topic. Old-skool now, but had better results in some ways.

SDWolf,
@SDWolf@furries.club avatar

@ajsadauskas @degoogle So, classic mid-90s Yahoo. Or LookSmart, which was initially curated by Reader's Digest.

Atemu,
@Atemu@lemmy.ml avatar

I’d argue that link aggregators like Lemmy (from which I’m posting o/) are the new world version of that. Link aggregators are human-edited web directories; humans post links and other humans vote whether those links are relevant to the “category” (community) they’re in. The main difference is that it’s an open communal effort with implicit trust rather than closed groups of permitted editors.

SomeKindaName,

The problem is bots

Atemu,
@Atemu@lemmy.ml avatar

All instances of malicious bots I saw around here were downvoted into oblivion.

khleedril,
@khleedril@cyberplace.social avatar

@ajsadauskas @degoogle What we need to do is re-visit the GnuPG philosophy of building rings of trust. If one emerges with enough people proven to provide quality aggregators/summarizers then we can start to depend on that, or those.

cdamian,
@cdamian@rls.social avatar

@khleedril
Maybe something like liquid democracy, where you can give your votes on certain topics to trusted others.
https://en.m.wikipedia.org/wiki/Liquid_democracy

@ajsadauskas @degoogle

bluGill,
bluGill avatar

@ajsadauskas sounds like you want https://curlie.org/ - which seems to be up to date and interesting.

SnepperStepper,
@SnepperStepper@mastodon.social avatar

@ajsadauskas @degoogle i love this idea, i'm going to start my own web directory.

Emperor,
@Emperor@feddit.uk avatar

Do it!

Then federate it.

simon_brooke,
@simon_brooke@mastodon.scot avatar

@ajsadauskas @degoogle I used to be one of those human editors. I was the editor of Scotland.org from about 1994 to about 1997, back in the days when it was exactly one of those hierarchical web directories – with the intention of indexing every website based in Scotland.

simon_brooke,
@simon_brooke@mastodon.scot avatar

@ajsadauskas @degoogle having said that, the patents on Google's PageRank algorithm have now all expired, and a distributed, co-op operated search engine would now be possible. Yes, there would be trust issues, and you'd need to build fairly sophisticated filters to identify and exclude crap sites, but it might nevertheless be interesting and useful.

OldWoodFrame,

The tale of the internet has been curation, and I would describe it a little differently.

First we had handmade lists of websites (Yahoo directory, or we had a list of websites literally written in pen in a notebook saying “yahoo.com” and “disney.com”).

Then it was bot-assisted search engines like Google.

Then there was so much content we didn’t even know where to start with Google, so we had web rings, then forums, then social media to recommend where to go. Substack-style email newsletters from your chosen tastemakers are a half-step further curated from there.

If that is all getting spammed out of existence, I think the next step is an AI filter: you tell the AI what you like and it sifts through the garbage for you.

The reasons we moved past each step are still there, we can’t go back, but we can fight fire with fire.

Pamasich,
Pamasich avatar

@ajsadauskas I think GitHub's awesome lists are kind of like this. They're human-maintained catalogues of worthwhile websites on a specific topic.

gl33p,
@gl33p@mastodon.social avatar

@ajsadauskas Back when, UW Madison hosted an outfit called The Internet Scout Project that was in the curation business for web resources. The decaying state of search (alternatively the growth of web resources intended to serve interests other than their visitors') has me thinking it would be good to work with public libraries to convene and host this sort of thing.

Librarianship is the right sort of ethos for it, and libraries are infrastructure for human-mediated discoverability.

@degoogle

elxeno,

And is it time to begin considering what a modern version of those early web directories might look like?

Something like fmhy.net?

airwhale,
@airwhale@mastodon.social avatar

@ajsadauskas

With the steep decline of search and social media algorithms, my twist on this has been returning to NetNewsWire and RSS.

Self-curation has become a need, as has relying on Mastodon users to share good content.

riggbeck,
@riggbeck@mastodon.social avatar

@ajsadauskas @degoogle

It would be sad to go back to walled gardens like AOL, particularly since they were corporate-owned. But a sort of Kite Mark, certifying a site is free of LLMs, would be useful. Then users could choose for themselves.

tryst,

@ajsadauskas @degoogle Webrings! Bring back Webrings!

critter_in_flux, (edited )
@critter_in_flux@fluffs.au avatar

deleted_by_author

    Emperor,
    @Emperor@feddit.uk avatar

    Indeed. As I mentioned below, something like a webring (a FedRing) might be the solution to something I was pondering.

    It is increasingly clear to me that a lot of directions Web 1.0 was evolving in were diverted or just killed off by Big Tech’s landgrab which built walled gardens. I see the Fediverse as a return to the idea of blogs (micro and macro), forums, etc but in a more natural progression to interoperability. This still isn’t perfect and there may be other early web ideas, like webrings, that improve discoverability.

    bradenslen,

    @ajsadauskas @degoogle Since I run a small directory this is a fascinating conversation to me.

    There is a place for small human edited directories along with search engines like Wiby and Searchmysite which have human review before websites are entered. Also of note: Marginalia search.

    I don't see a need for huge directories like the old Yahoo, LookSmart and ODP directories. But directories that serve a niche ignored by Google are useful.

    BernardSheppard,
    @BernardSheppard@mastodon.au avatar

    @bradenslen @ajsadauskas @degoogle looksmart! There's a blast from the past.

    As a very early internet user (suburbia.org.au - look it up, and who ran it) and a database guy, what I learnt very early is that any search engine needed users who knew how to write highly selective queries to get highly specific results.

    Google - despite everything - can still be used as a useful tool - if you are a skilled user.

    I am still surprised that you are not taught how to perform critical internet searching in primary school. It is as important as the three Rs.

    Emperor,
    @Emperor@feddit.uk avatar

    But directories that serve a niche ignored by Google are useful.

    This is a good point - as search is increasingly enshittified too (from top down, with corporate interests, and bottom up, from SEO manipulation and dodgy sites) it makes sense for topics or communities often drowned out by the noise.

    I also see you are using webrings - another blast from the past that has its uses.

    seindal,
    @seindal@mastodon.social avatar

    @ajsadauskas @degoogle DMOZ was once an important part of the internet, but it too suffered from abuse and manipulation for traffic.

    For many DMOZ was the entry point to the web. Whatever you were looking for, you started there.

    Google changed that, first for the better, then for the worse.

    happyborg,
    @happyborg@fosstodon.org avatar

    @ajsadauskas
    I agree we need better and remember the early days well. Before indexes we passed URLs, in fact just IP addresses of servers we'd visit to see what was there, and that was often a directory of documents, papers etc. It filled us with awe, but let's not dial back that far!

    Another improvement will be #LocalLLMs both for privacy and personalised settings. Much of the garbage now is in service of keeping us searching rather than finding what we want.
    @degoogle
