#SearchEngines - kbin.social

ajsadauskas, 3 months ago (edited 3 months ago) to tech

In an age of LLMs, is it time to reconsider human-edited web directories?

Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.

These directories generally had a list of categories on their front page. News/Sport/Entertainment/Arts/Technology/Fashion/etc.

Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.

Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.
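As a rough illustration of the structure described above, here is a minimal sketch of a category tree whose search only looks at pages from listed sites. The `Category` class, the `search` helper, the example URLs, and the in-memory `pages` dictionary (a stand-in for crawled page text) are all assumptions made for this example, not how any of the historical directories were actually built.

```python
from dataclasses import dataclass, field
from urllib.parse import urlsplit


@dataclass
class Category:
    """One node in a hypothetical human-edited directory tree."""
    name: str
    sites: list[str] = field(default_factory=list)
    children: dict[str, "Category"] = field(default_factory=dict)

    def add(self, path: list[str], url: str) -> None:
        """File a site under a category path, creating subcategories as needed."""
        node = self
        for part in path:
            node = node.children.setdefault(part, Category(part))
        node.sites.append(url)

    def browse(self, path: list[str]) -> "Category":
        """Click through categories and subcategories to reach a listing."""
        node = self
        for part in path:
            node = node.children[part]
        return node

    def all_sites(self) -> list[str]:
        """Collect every listed site in this category and below."""
        found = list(self.sites)
        for child in self.children.values():
            found.extend(child.all_sites())
        return found


def search(directory: Category, pages: dict[str, str], query: str) -> list[str]:
    """Return URLs of pages that contain the query AND belong to a listed site."""
    allowed_hosts = {urlsplit(url).hostname for url in directory.all_sites()}
    needle = query.lower()
    return sorted(
        url
        for url, text in pages.items()
        if urlsplit(url).hostname in allowed_hosts and needle in text.lower()
    )


# Hypothetical directory and "crawled" pages, for illustration only.
root = Category("root")
root.add(["Technology", "Search"], "https://example.org/")
root.add(["News"], "https://news.example.com/")

pages = {
    "https://example.org/about": "An independent search engine",
    "https://spam.example.net/x": "search engine deals",
    "https://news.example.com/today": "Sport results",
}
```

Here `search(root, pages, "search engine")` only considers the two listed hosts, so the page on the unlisted `spam.example.net` never shows up, however well it matches the query.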

Lycos, Excite, and of course Yahoo all offered web directories of this sort.

(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)

By the late '90s, the standard narrative goes, the web got too big to index websites manually.

Google promised the world its algorithms would weed out the spam automatically.

And for a time, it worked.

But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.

And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.

My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?

Do we really want to search every single website on the web?

Or just those that aren't filled with LLM-generated SEO spam?

Or just those that don't feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your "free trial" subscription?

At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?

And is it time to begin considering what a modern version of those early web directories might look like?

@degoogle #tech #google #web #internet #LLM #LLMs #enshittification #technology #search #SearchEngines #SEO #SEM

The First Search Engines, Built By Librarians (hackaday.com)

Before the Internet became the advertisement generator we know and love today, interspersed with interesting information here and there, it was originally a network of computers largely among various universities.

reillypascal, 1 month ago to privacy

This guy made a tool that beeps every time a website sends data about you to Google. The beeps blur into a continuous buzz: https://berthub.eu/articles/posts/tracker-beeper/

I use Privacy Badger (https://privacybadger.org/, blocks trackers), uBlock Origin (https://ublockorigin.com/, adblocker), Firefox, and the SearXNG search engine (https://searxng.ca/ or find instances at https://searx.space/), but it's annoying I have to.

#Privacy #Tracking #Google #AdBlock #uBlockOrigin #Security #SearchEngines #SearXNG

A video of navigating various popular news sites (Daily Mail, nu.nl, telegraaf.nl) in Chrome. A terminal is visible in the background, running a program that beeps every time the site reports back to Google.

amanjeev, 5 months ago to web

What search engines are you using these days? Asking for:

- non-tech searches
- tech/programming-related searches

What's working for you?

#SearchEngines #Search #WebSearch #Web #search_engines #AskFedi

jppelt, 8 months ago (edited 8 months ago) to privacy

@Mojeek @tilvids

Fun little video: https://tilvids.com/w/rj93zSiWSCLgJxYuYVKbfY

"How Search Works: Words in the Index"

#TryMojeekSearch #mojeek #privacy #searchengines

strypey, 10 months ago to random

An overview of search engines (mainly English language ones for now):

https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/

#search #WebSearch #SearchEngines

#HatTip to @indieterminacy for the link in a matrix room.

TheLastofHisName, 3 months ago to web

I encourage folks to try using Stract search engine as an alternative to Bing, Google, etc.

It's open source, and run by one person out of their basement.

https://stract.com

#search #web #websearch #SearchEngines

vform, 1 year ago to fediverse

So, now there are at least two cross-instance opt-in searches for the fediverse (or Mastodon). Both with different approaches and scopes, but searches nonetheless.

https://www.tootfinder.ch/ (by @buercher)

https://fediverse.info/explore/people (https://mastodon.social/users/atomicpoet/statuses/110267439186849360 explains it fine)

Any more I missed? :)

#SearchEngines #Mastodon

bsm, 5 months ago to searxng German

There are so many search engines other than Google. They are much more privacy-friendly, free, and not bad at all:

#Qwant: https://www.qwant.com/

#StartPage: https://www.startpage.com

#MetaGer: https://metager.de

#SearXNG: https://searxng.ch

#SwissCows: https://swisscows.com/

#eTools: https://www.etools.ch/

#kagi #noGoogle #search #searchengines #Suchmaschine #internet #recherche #privacy

jd7h, 6 months ago to SmallWeb

Today I signed up for alternative search engine Kagi because I fell for their Small Web initiative: https://blog.kagi.com/small-web

#smallweb #kagi #search #searchengines

researchbuzz, 4 months ago to Russia

#Russia #SearchEngines #Yandex

"A Russia-based company has become the legal owner of tech giant Yandex as it prepares to separate from its Dutch parent company, the state-run Interfax news agency reported Tuesday."

https://www.themoscowtimes.com/2024/01/23/tech-giant-yandex-gets-new-russian-owner-ahead-of-restructuring-a83817

phlogiston, 16 days ago to privacy

This is quite interesting/insightful. A 'map of the land' (universe?) of search engines, crawlers, meta-search engines, how they're related to each other, including info on their jurisdiction, ownership, features, etc.

https://www.searchenginemap.com/

Done by @Mojeek

#SearchEngines #privacy

Narayoni, 7 months ago to technology

#technology #ai #generativeAI #seo #searchengineoptimization #searchengines #google #microsoft #bingai #ChatGPT #googlebard
Will the increasing incorporation of generative AI negatively impact the SEO industry? Generative AI in its present form, with challenges that include hallucinations, hasn't yet transformed online search, but research efforts are ongoing and the effect on the SEO industry is interesting to consider.
https://theconversation.com/why-google-bing-and-other-search-engines-embrace-of-generative-ai-threatens-68-billion-seo-industry-210243?utm_medium=email&utm_campaign=Global%20Economy%20%20Business%20-%202023%2010%2025&utm_content=Global%20Economy%20%20Business%20-%202023%2010%2025+CID_864af18f08e3cede0b5b20bf11a387c2&utm_source=campaign_monitor_global&utm_term=Why%20Google%20Bing%20and%20other%20search%20engines%20embrace%20of%20generative%20AI%20threatens%2068%20billion%20SEO%20industry

JeremyMallin, 6 months ago to random

Does anyone have suggestions for alternative #SearchEngines that actually have good, meaningful results?? That's going to mean no deeply flawed AI, no SEO gaming the system, just useful results.

This isn't about privacy or a lack of privacy. I'm just really getting sick of not finding what I'm actually looking for.

#NoAI

researchbuzz, 17 days ago to ai

Using ChatGPT to Double-Distill Mojeek Results into a Date-Based Topic Overview

My concern about AI-assisted search results has been, from the beginning, the lack of human context. A simple query is rarely going to be sufficient in itself; after all, the user is searching because of some existing information lack. Outside of the most basic queries (When is a movie playing? Where is that restaurant? How many ounces in a pound?)...

https://www.calishat.com/2024/05/27/using-chatgpt-to-double-distill-mojeek-results-into-a-date-based-topic-overview/

#SearchEngines #WebSearch #AI #Mojeek

TheLastOfHisName, 20 days ago to web

Just a reminder that the independent search engine #Stract is still growing, but has a clear mission: pure search.

https://stract.com

#web #search #searchengines #Mastodon #Akkoma #Sharkey #PixelFed #tech

agr, 21 days ago to searxng

While Bing is down, time to (re)visit @Seirdy's excellent post:

A look at search engines with their own indexes

https://seirdy.one/posts/2021/03/10/search-engines-with-own-indexes/

Including Mojeek (own index) and SearXNG, a metasearch engine (list of public instances here: https://searx.space/)

#Bing #SearchEngines #Mojeek #SearxNG

jotbe, 7 months ago to SEO

The people who ruined the internet

https://www.theverge.com/features/23931789/seo-search-engine-optimization-experts-google-results

#rr #adsters #seo #chatgpt #searchengines

avail, 2 months ago to random

What is everyone using for a search engine these days? I was bouncing between Duck Duck Go and Kagi - but Kagi is now in cahoots with Brave which I'm not a fan of so that's out. Duck Duck Go is ok, but curious what else is out there.

#SearchEngines / #recommendations / #WebSearch

RonaldTooTall, 4 months ago to privacy

Major Advance in Cryptography Could Make Fully Private Internet Searches a Reality

https://www.wired.com/story/cryptographers-fully-private-internet-searches-cybersecurity-databases-privacy/
#Cryptography #Privacy #Security #CyberSecurity #SearchEngines #Technology

jackyan, 1 month ago to random

All those who doubted me when I said that Bingʼs index was in the 1 to 2 milliards …
This is where Inktomi was over two decades ago, and itʼs a fraction of the size of Mojeekʼs.

Source and methodology: https://www.worldwidewebsize.com

#Mojeek #Bing @Mojeek #SearchEngines

researchbuzz, 7 months ago to reddit

#Reddit #Google #SearchEngines #WebCrawling

'The Washington Post reported Friday that Reddit might cut off Google and force users to log in to Reddit itself to read anything, if it can’t reach deals with generative AI companies to pay for its data. Initially, Reddit seemed to deny the report.... But after the Post corrected that story, only one major detail had changed — the Post no longer suggests Reddit users would need to log in.'

https://www.theverge.com/2023/10/20/23925504/reddit-deny-force-log-in-see-posts-ai-companies-deals

researchbuzz, 7 months ago to twitter

#Twitter #Bing #SearchEngines #WebIndexing

'X, well, Twitter.com, is now blocking Bing Search, specifically Bingbot, from crawling and accessing content posted on Twitter.com, on the X platform. Twitter specifically added to its robots.txt file a directive to disallow Bingbot from crawling the content on its platform.'

https://www.seroundtable.com/twitter-x-blocks-bing-search-36237.html
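For anyone curious how a robots.txt directive like that works, here is a small sketch using Python's standard-library `urllib.robotparser`. The `ROBOTS_TXT` contents, the `can_crawl` helper, and the example URL are illustrative assumptions; this is not a copy of Twitter's actual robots.txt, which isn't reproduced in the quote.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt (an assumption, NOT the real twitter.com file):
# block one named crawler entirely, allow everyone else.
ROBOTS_TXT = """\
User-agent: bingbot
Disallow: /

User-agent: *
Allow: /
"""


def can_crawl(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check whether robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical URL, used only for the check below.
url = "https://example.com/some/post"
bing_allowed = can_crawl(ROBOTS_TXT, "bingbot", url)
other_allowed = can_crawl(ROBOTS_TXT, "SomeOtherBot", url)
```

With this file, `bing_allowed` is `False` and `other_allowed` is `True`. Note that robots.txt is only a request: it works because well-behaved crawlers choose to honor it.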

bsm, 5 months ago to iOS German

Do you want to enable your favorite search engine (#Suchmaschine) in #Safari on #iOS?
There's a really brilliant "extension app" for that called #xSearch:

https://apps.apple.com/ch/app/xsearch-for-safari/id1579902068

#search #searchengines #Apple #AppStore #ilike #extension

remixtures, 3 months ago to journalism Portuguese

#Media #News #Journalism #SEO #AdTech #Search #SearchEngines: "[F]ew network effects have damaged the news more than Search Engine Optimization, where the allure of traffic from search engines like Google has led publishers to create content not with the goal of serving their audience, but attracting the spurious traffic that one might get from those searching "when does the Super Bowl start."

The result is a media industry in crisis. Desperate executives and disconnected editors twist their reporters' coverage to please Google's algorithms as a means of improving traffic to please advertisers' algorithms, creating content that looks and sounds the same as other outlets, which in turn leads to layoffs as profits fail to increase, which in turn normalizes and weakens the content created by the outlet. This is largely a result of those in power not actually consuming or producing any of the product that makes the outlet money, only understanding the business as a series of symbols that at some point create revenue, ostensibly from the written word and video.

When you make decisions for a website or company that produces words that it sells for money based not on the writing, but on how to twist that writing to make it "more profitable," the conclusion is always inevitable — the creation of identical-looking slop that people only read by accident, and the slow asphyxiation of journalism and culture.

It almost always leads to overstaffing and mismanagement, too. Any form of creative media requires an understanding that building an audience takes time and money, and that one cannot just spend a bunch of money to make that happen. But these craven idiots are as rotten as the rest of the economy (...) The media is being run by people that do not see value in people or the things that they create, but the metrics that come as a result."

https://www.wheresyoured.at/the-anti-economy/
