sean

@sean@idf.social

KathyReid, 1 month ago to stackoverflow

I just issued a data deletion request to #StackOverflow to erase all of the associations between my name and the questions, answers and comments I have on the platform.

One of the key ways in which #RAG works to supplement #LLMs is based on proven associations. Higher-ranked Stack Overflow members' answers will carry more weight in any #LLM that is produced.

Asking for my name to be disassociated from the textual data removes a semantic relationship that is helpful for determining which tokens of text to use in an #LLM.

If you sell out your user base without consultation, expect a backlash.

sean, 1 month ago

@KathyReid Good stuff! Out of curiosity: when you mention that higher-ranked users' posts carry more weight, is there anywhere I can read more about this feature engineering? Are we talking about RAG/search operators manually annotating CSS selectors to pull user-ranking info per site? Related: after crawling user rank info, wouldn't a RAG/search provider keep that info cached? In other words, do account deletions actually trickle down to search engines' collection of valuable features?

sean, 1 month ago

@KathyReid The high volume of text from high-rank users makes total sense from a training-bias perspective, though I think anonymizing authors might not change this.

Unless the RAG provider uses specially designed extractors for user rank info in their corpus, I'm doubtful ML could pick up on a numerical rank like SO reputation and figure out that it should weight answers by this number. That's too much System 2 thinking for ML, IMO!
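
To make the "specially designed extractor" point concrete, here is a minimal sketch of what explicit rank weighting could look like in a retrieval step. Everything in it is an assumption for illustration: the Passage class, the author_reputation field, the alpha knob and the log-damped formula are hypothetical, not any provider's actual pipeline. The point is only that the rank has to arrive as a separate extracted feature before it can influence the ordering.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass
class Passage:
    text: str
    similarity: float       # query-passage similarity from the retriever, in [0, 1]
    author_reputation: int  # hypothetical field; would need a per-site extractor


def rank_weighted_score(p: Passage, alpha: float = 0.1) -> float:
    """Blend similarity with a log-damped reputation bonus (alpha is an assumed knob)."""
    return p.similarity * (1.0 + alpha * math.log10(1 + p.author_reputation))


def rerank(passages: list[Passage]) -> list[Passage]:
    """Order passages by the blended score, best first."""
    return sorted(passages, key=rank_weighted_score, reverse=True)


passages = [
    Passage("answer from a new account", similarity=0.80, author_reputation=10),
    Passage("answer from a veteran", similarity=0.75, author_reputation=100_000),
]
top = rerank(passages)[0]

# With the author association removed, the reputation feature is gone (set to 0)
# and the ordering falls back to plain similarity.
anonymized = [Passage(p.text, p.similarity, 0) for p in passages]
top_anonymized = rerank(anonymized)[0]
```

In this toy setup the veteran's answer wins only while the reputation field is populated; once it is zeroed out, the slightly more similar answer from the new account comes first.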

Still good to give big firms as little free data as possible, of course! ☺

alex, 2 months ago to random

Is there a way to adjust GitHub history in a particular repository to see differences by character rather than by line?

sean, 2 months ago

@alex I don't think so, but if you git clone the repository, then you can use git diff --word-diff locally to get something similar.
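
For differences by character rather than by word, the git diff documentation notes that --word-diff-regex=. treats each character as a word. If you'd rather compare two strings outside git, a rough sketch with Python's standard difflib module looks like this. The char_diff helper is a hypothetical name, and the [-...-] / {+...+} markers simply imitate git's plain word-diff style.

```python
import difflib


def char_diff(old: str, new: str) -> str:
    """Mark per-character changes as [-removed-] and {+added+}."""
    out = []
    matcher = difflib.SequenceMatcher(a=old, b=new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            out.append(old[i1:i2])
        if tag in ("delete", "replace"):
            out.append("[-" + old[i1:i2] + "-]")
        if tag in ("insert", "replace"):
            out.append("{+" + new[j1:j2] + "+}")
    return "".join(out)


print(char_diff("hello world", "hello wurld"))  # prints: hello w[-o-]{+u+}rld
```

This only compares two strings; it does not read repository history, so for real commits the local git diff route is still the practical one.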
