mima, 1 hour ago to fediverse
Hmm I probably have the most ridiculous #robotstxt for a #Misskey instance right now lol. I just want to let #Mojeek and #Marginalia crawl #Makai and make sure to keep out #Google and the AI scrapers... :satrithink:
If there are other user-agents of independent #searchengines I should allow in https://makai.chaotic.ninja/robots.txt, please let me know! I'm actually looking for the #useragent strings of #SauceNAO, #TinEye, and #IQDB so I can let them fetch our media for their reverse image search.
User-Agent: MojeekBot
User-Agent: FeedFetcher-Mojeek
User-Agent: search.marginalia.nu
Allow: /
Allow: /notes
Disallow: /admin
Disallow: /settings
Disallow: /my/

User-Agent: *
User-Agent: Googlebot
User-Agent: Google-Extended
User-Agent: GoogleOther
User-Agent: AdsBot-Google
User-Agent: AdsBot-Google-Mobile
User-Agent: Mediapartners-Google
User-Agent: CCBot
User-Agent: ChatGPT-User
User-Agent: GPTBot
User-Agent: Omgilibot
User-Agent: omgili
User-Agent: FacebookBot
User-agent: Twitterbot
User-Agent: cohere-ai
User-Agent: anthropic-ai
User-Agent: Bytespider
User-Agent: Amazonbot
User-Agent: Applebot
User-Agent: PerplexityBot
User-Agent: YouBot
User-Agent: AwarioRssBot
User-Agent: AwarioSmartBot
User-Agent: ClaudeBot
User-Agent: Claude-Web
User-Agent: DataForSeoBot
User-Agent: FriendlyCrawler
User-Agent: ImagesiftBot
User-Agent: magpie-crawler
User-Agent: Meltwater
User-Agent: peer39_crawler
User-Agent: PiplBot
User-Agent: Seekr
Disallow: /

# todo: sitemap
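For anyone who wants to sanity-check a file like the one above, here is a rough Python sketch of how an RFC 9309-style crawler would read it: consecutive User-Agent lines share one rule group, a crawler with no group of its own falls back to `*`, and the longest matching path wins (Allow wins ties). Everything in it is an assumption for illustration: `parse_robots` and `is_allowed` are made-up helper names, `ROBOTS_TXT` is a shortened copy of the rules rather than the full file, agent names are compared as exact product tokens, and `*`/`$` path wildcards are not handled. Real crawlers may behave differently.

```python
# Shortened copy of the rules, for illustration only.
ROBOTS_TXT = """\
User-Agent: MojeekBot
User-Agent: search.marginalia.nu
Allow: /
Allow: /notes
Disallow: /admin
Disallow: /settings
Disallow: /my/

User-Agent: *
User-Agent: Googlebot
User-Agent: GPTBot
Disallow: /
"""


def parse_robots(text):
    """Map each lowercased user-agent token to its list of (allow, path) rules."""
    groups = {}
    agents, rules, seen_rule = [], [], False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if ":" not in line:
            continue  # blank or malformed lines are ignored
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()  # field names are case-insensitive
        if field == "user-agent":
            if seen_rule:
                # A user-agent line after rules starts a new group.
                agents, rules, seen_rule = [], [], False
            agents.append(value.lower())
            # Every agent in the group shares the same rule list.
            # Simplification: if an agent appears in two groups, only the first is kept.
            groups.setdefault(value.lower(), rules)
        elif field in ("allow", "disallow") and agents:
            seen_rule = True
            if value:  # an empty Disallow restricts nothing
                rules.append((field == "allow", value))
    return groups


def is_allowed(groups, agent, path):
    """Longest matching rule wins; Allow wins ties; no match means allowed."""
    rules = groups.get(agent.lower(), groups.get("*", []))
    matches = [(len(prefix), allow) for allow, prefix in rules if path.startswith(prefix)]
    if not matches:
        return True
    return max(matches)[1]


groups = parse_robots(ROBOTS_TXT)
print(is_allowed(groups, "MojeekBot", "/notes/abc"))   # True
print(is_allowed(groups, "MojeekBot", "/admin"))       # False
print(is_allowed(groups, "Googlebot", "/notes/abc"))   # False
print(is_allowed(groups, "SomeOtherBot", "/"))         # False (falls back to *)
```

Under these rules `Allow: /notes` is redundant next to `Allow: /`, and the `Disallow` lines still apply to the allowed crawlers because they are longer matches than `/`. Also note that robots.txt is only a request: crawlers that do not honor it are not stopped by any of this.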
#sysadmin #fediadmin