This is quite interesting/insightful. A 'map of the land' (universe?) of search engines, crawlers, meta-search engines, how they're related to each other, including info on their jurisdiction, ownership, features, etc.
Using ChatGPT to Double-Distill Mojeek Results into a Date-Based Topic Overview
My concern about AI-assisted search results has been, from the beginning, the lack of human context. A simple query is rarely going to be sufficient in itself; after all, the user is searching because of some existing information lack. Outside of the most basic queries (When is a movie playing? Where is that restaurant? How many ounces in a pound?)...
All those who doubted me when I said that Bingʼs index was in the 1 to 2 milliards …
This is where Inktomi was over two decades ago, and itʼs a fraction of the size of Mojeekʼs.
What is everyone using for a search engine these days? I was bouncing between DuckDuckGo and Kagi - but Kagi is now in cahoots with Brave, which I'm not a fan of, so that's out. DuckDuckGo is OK, but I'm curious what else is out there.
In an age of LLMs, is it time to reconsider human-edited web directories?
Back in the early-to-mid '90s, one of the main ways of finding anything on the web was to browse through a web directory.
These directories generally had a list of categories on their front page. News/Sport/Entertainment/Arts/Technology/Fashion/etc.
Each of those categories had subcategories, and sub-subcategories that you clicked through until you got to a list of websites. These lists were maintained by actual humans.
Typically, these directories also had a limited web search that would crawl through the pages of websites listed in the directory.
Lycos, Excite, and of course Yahoo all offered web directories of this sort.
(EDIT: I initially also mentioned AltaVista. It did offer a web directory by the late '90s, but this was something it tacked on much later.)
By the late '90s, the standard narrative goes, the web got too big to index websites manually.
Google promised the world its algorithms would weed out the spam automatically.
And for a time, it worked.
But then SEO and SEM became a multi-billion-dollar industry. The spambots proliferated. Google itself began promoting its own content and advertisers above search results.
And now with LLMs, the industrial-scale spamming of the web is likely to grow exponentially.
My question is, if a lot of the web is turning to crap, do we even want to search the entire web anymore?
Do we really want to search every single website on the web?
Or just those that aren't filled with LLM-generated SEO spam?
Or just those that don't feature 200 tracking scripts, and passive-aggressive privacy warnings, and paywalls, and popovers, and newsletters, and increasingly obnoxious banner ads, and dark patterns to prevent you cancelling your "free trial" subscription?
At some point, does it become more desirable to go back to search engines that only crawl pages on human-curated lists of trustworthy, quality websites?
And is it time to begin considering what a modern version of those early web directories might look like?
#Media #News #Journalism #SEO #AdTech #Search #SearchEngines: "[F]ew network effects have damaged the news more than Search Engine Optimization, where the allure of traffic from search engines like Google has led publishers to create content not with the goal of serving their audience, but attracting the spurious traffic that one might get from those searching "when does the Super Bowl start."
The result is a media industry in crisis. Desperate executives and disconnected editors twist their reporters' coverage to please Google's algorithms as a means of improving traffic to please advertisers' algorithms, creating content that looks and sounds the same as other outlets, which in turn leads to layoffs as profits fail to increase, which in turn normalizes and weakens the content created by the outlet. This is largely a result of those in power not actually consuming or producing any of the product that makes the outlet money, only understanding the business as a series of symbols that at some point create revenue, ostensibly from the written word and video.
When you make decisions for a website or company that produces words that it sells for money based not on the writing, but on how to twist that writing to make it "more profitable," the conclusion is always inevitable — the creation of identical-looking slop that people only read by accident, and the slow asphyxiation of journalism and culture.
It almost always leads to overstaffing and mismanagement, too. Any form of creative media requires an understanding that building an audience takes time and money, and that one cannot just spend a bunch of money to make that happen. But these craven idiots are as rotten as the rest of the economy (...) The media is being run by people that do not see value in people or the things that they create, but the metrics that come as a result."
#AI #GenerativeAI #Web #Search #SearchEngines #Chatbots: "The Browser Company’s new app lets you ask semantic questions to a chatbot, which then summarizes live internet results in a simulation of a conversation. Which is great, in theory, as long as you don’t have any concerns about whether what it’s saying is accurate, don’t care where that information is coming from or who wrote it, and don’t think through the long-term feasibility of a product like this even a little bit. Or, as Dash put it, “It’s the parasite that kills the host.”
The base logic of something like Arc’s AI search doesn’t even really make sense. As Engadget recently asked in their excellent teardown of Arc’s AI search pivot, “Who makes money when AI reads the internet for us?” But let’s take a step even further here. Why even bother making new websites if no one’s going to see them? At least with the Web3 hype cycle, there were vague platitudes about ownership and financial freedom for content creators. To even entertain the idea of building AI-powered search engines means, in some sense, that you are comfortable with eventually being the reason those creators no longer exist. It is an undeniably apocalyptic project, but not just for the web as we know it, but also your own product."
#Media #News #Journalism #SEO #Google #Search #SearchEngines: "In our experience, each rollout of the Products Review Update has shaken things up, generally benefitting sites and writers who actually dedicated time, effort, and money to test products before they would recommend them to the world.
That said, most searches for specific product models don’t just magically start with users searching for specific devices off the top of their heads. There is an immediate step before this: the hours of research reading through lists of product recommendations.
If you have been reading HouseFresh for a while, your first encounter with us was likely a list like this one or this one recommending the best devices for a specific issue you were trying to solve. That is how most of our readers find us.
Unfortunately, we’re getting less and less traffic from those pages, and it’s endangering the future of our site."
"A Russia-based company has become the legal owner of tech giant Yandex as it prepares to separate from its Dutch parent company, the state-run Interfax news agency reported Tuesday."
"[Consumers] tend to view sponsored listings with suspicion and often prefer to click on what are called 'organic' listings that appear high in their product search results but are not sponsored, said [Professor] Mingyu 'Max' Joo... In fact, a sponsored listing can be detrimental when it replaces a seller’s organic listing that would have appeared in the top few positions in the search results."
Does anyone have suggestions for alternative #SearchEngines that actually have good, meaningful results?? That's going to mean no deeply flawed AI, no SEO gaming the system, just useful results.
This isn't about privacy or a lack of privacy. I'm just really getting sick of not finding what I'm actually looking for.
I've been seriously fed up with Google's search recently: results full of ads and/or SEO spam, to the point that it's hard to find info.
The results in Kagi search are overall good (though the map is really lacking), but it's just very expensive. I'd definitely need their $10 per month plan, and I'm not sure I could justify this, given there is free (though arguably worse) competition. 🤔
I guess I could cycle through the trial by creating a new account every week (the joys of having my own domain name) but that seems a little ridiculous, too 😆