I just issued a data deletion request to #StackOverflow to erase all of the associations between my name and the questions, answers and comments I have on the platform.
One of the key ways #RAG supplements #LLMs is through proven associations. Answers from higher-ranked Stack Overflow members will carry more weight in the output of any #LLM that draws on them.
Asking for my name to be disassociated from the textual data removes a semantic relationship that helps determine which tokens of text an #LLM uses.
If you sell out your user base without consultation, expect a backlash.
@KathyReid Good stuff! Out of curiosity… when you mention that higher-ranked users' posts carry more weight… is there anywhere I can read more about this feature engineering? Are we talking about RAG/search operators manually annotating CSS selectors to pull user-ranking info per site? Related: after crawling user rank info, wouldn't a RAG/search provider keep that info cached, i.e. do account deletions actually trickle down to the search engines' collection of valuable features?
@KathyReid That high-rank users contribute a high volume of text, and so bias training, makes total sense, though I think anonymizing authors might not change this.
Unless the RAG provider uses specially designed extractors for user rank info in their corpus, I'm doubtful ML could pick up on a numerical rank like SO karma and figure out that it should weight by this number. That's too much System 2 thinking for ML, IMO!
Still good to give big firms as little free data as possible, of course! ☺
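To make the "specially designed extractors" point concrete, here is a minimal sketch of what rank-aware reranking could look like if a provider had explicitly pulled author reputation into its index. Everything in it is an assumption for illustration: the `Passage` fields, the `rerank` function, the `reputation_weight` value and the log scaling are hypothetical, not a description of how any real RAG system or Stack Overflow integration works.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Passage:
    text: str
    similarity: float                  # hypothetical query-passage similarity, 0..1
    author_reputation: Optional[int]   # None when the author link has been removed


def rerank(passages: List[Passage], reputation_weight: float = 0.1) -> List[Passage]:
    """Order passages by similarity plus an explicit, hand-built reputation bonus."""

    def score(p: Passage) -> float:
        if p.author_reputation is None:
            # No author association: only the text similarity counts.
            return p.similarity
        # Log scaling keeps very large reputations from swamping similarity.
        return p.similarity + reputation_weight * math.log10(1 + p.author_reputation)

    return sorted(passages, key=score, reverse=True)


passages = [
    Passage("answer A", 0.80, 50_000),  # high-reputation author
    Passage("answer B", 0.85, None),    # author disassociated
    Passage("answer C", 0.82, 10),      # low-reputation author
]

ranked = [p.text for p in rerank(passages)]
unweighted = [p.text for p in rerank(passages, reputation_weight=0.0)]
print(ranked)
print(unweighted)
```

In this toy setup the reputation bonus only exists because someone wrote it into the scoring function, and a passage with no author attached simply falls back to text similarity.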