@FaceDeer@fedia.io

FaceDeer

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and then some time on kbin.social.

FaceDeer,

Have to save it up in jars ahead of time.

FaceDeer,

Keep making it more expensive to suck on their toes and perhaps eventually they'll stop.

FaceDeer,

Well that seems unlikely, what are the odds that an airplane is going to be chased by a ship?

FaceDeer,

Also pretty sure training LLMs after someone opts out is illegal?

Why? There have been a couple of lawsuits launched in various jurisdictions claiming LLM training is copyright violation but IMO they're pretty weak and none of them have reached a conclusion. The "opting" status of the writer doesn't seem relevant if copyright doesn't apply in the first place.

FaceDeer,

Nor is it up to you. But fact remains, it's not illegal until there are actually laws against it. The court cases that might determine whether current laws are against it are still ongoing.

FaceDeer,

Maybe it's "simple as that" if you're just expressing an opinion, but what's the legal basis for it?

FaceDeer,

The GDPR says that information that has been anonymized, for example through statistical analysis, is fine. LLM training is essentially a form of statistical analysis. There's hardly anything in law that is "simple."

FaceDeer,

You could say it's to "circumvent" the law or you could say it's to comply with the law. As long as the PII is gone what's the problem?

FaceDeer,

It is impossible for them to contain more than just random fragments; the models are too small for the training data to be compressed enough to fit. Even the fragments that have been found are not exact, since the AI is "lossy" and hallucinates.

The examples that have been found are examples of overfitting, a flaw in training where the same data gets fed into the training process hundreds or thousands of times over. This is something that modern AI training goes to great lengths to avoid.
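
One common way to keep the same text from being fed in over and over is to deduplicate the corpus before training. Here's a rough sketch of the idea, not anyone's actual pipeline: the `normalize` and `deduplicate` helpers and the toy `corpus` are made up for illustration, and this only catches exact matches after light normalization (real pipelines also tend to do fuzzier near-duplicate detection).

```python
import hashlib


def normalize(text: str) -> str:
    # Lowercase and collapse whitespace so trivially different copies hash the same.
    return " ".join(text.lower().split())


def deduplicate(documents):
    """Keep the first occurrence of each document, comparing normalized text."""
    seen = set()
    unique = []
    for doc in documents:
        digest = hashlib.sha256(normalize(doc).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(doc)
    return unique


# Hypothetical toy corpus: the first two entries differ only in case and spacing.
corpus = ["The quick brown fox.", "the  quick brown fox.", "Something else."]
deduped = deduplicate(corpus)
```

With the toy corpus above, the second entry is dropped as a duplicate of the first, so each distinct document is seen once per pass instead of many times.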

FaceDeer,

You don't think LLMs are being trained off of this content too? Nobody needs to bother "announcing a deal" for it, it's being freely broadcast.

FaceDeer,

Surely the use of user-deleted content as training data carries the same liabilities as reinstating it on the live site?

Why would that be? It's not the same.

And what liabilities would there be for reinstating it on the live site, for that matter? Have there been any lawsuits?

FaceDeer,

I just did a bit of poking around on the subject of the "right to be forgotten" and it's legally complex. Data without personally identifying information, and data that's been anonymized through statistical analysis (which LLM training is a form of) aren't covered.

FaceDeer,

You think they don't have the originals archived?

FaceDeer,

"Model collapse" can be easily avoided by keeping old human data with new synthetic data in the training set. The old archives of Reddit content from before there was AI are still around.
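
To make the "keep old human data in the mix" idea concrete, here's a minimal sketch of building a training set with a fixed share of human-written items. Everything in it is an assumption for illustration: the `mix_training_set` function, the `human_fraction` parameter, and the 60/40 split are not taken from any real training setup.

```python
import random


def mix_training_set(human, synthetic, human_fraction=0.5, size=None, seed=0):
    """Sample a training set where about human_fraction of the items are human-written."""
    if not 0.0 <= human_fraction <= 1.0:
        raise ValueError("human_fraction must be between 0 and 1")
    rng = random.Random(seed)
    if size is None:
        size = len(human) + len(synthetic)
    n_human = round(size * human_fraction)
    n_synthetic = size - n_human
    # Sample with replacement so either pool may be smaller than its quota.
    mixed = rng.choices(human, k=n_human) + rng.choices(synthetic, k=n_synthetic)
    rng.shuffle(mixed)
    return mixed


# Hypothetical pools: archived human text and newer model-generated text.
human_pool = ["h1", "h2", "h3"]
synthetic_pool = ["s1", "s2"]
mixed = mix_training_set(human_pool, synthetic_pool, human_fraction=0.6, size=10)
```

The point of the sketch is just that the human share is pinned explicitly rather than left to shrink as more synthetic text piles up.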

FaceDeer,

There are torrents of complete Reddit comment archives available for any random person who wants them. I'm sure Reddit itself has a comprehensive edit history of everything.

FaceDeer,

By "old archives" I mean everything from 2022 and earlier.

FaceDeer,

Existing AIs such as ChatGPT were trained in part on that data, so obviously they've got ways to make it work. They filtered out some stuff, for example; the "glitch tokens" such as SolidGoldMagikarp were evidence of that.

FaceDeer,

The echo-chamberiness of Lemmy is different from Reddit, but still a thing unfortunately. It'll really depend on the community you're in, but since the population of the Fediverse (and especially the Threadiverse) is very small compared to Reddit you tend to have the same people cropping up a lot. I haven't been banned from anywhere (that I know of - I don't actually know if I would get notified) but I find myself hammered with downvotes more frequently here than on Reddit when I say something unpopular.

I'd say, mess around a bit and see.

FaceDeer,

Honestly, I've been using Bing a lot in recent months thanks to its integrated AI. Google is now just for when I know I want a specific web page; when it's a general answer I want, nothing beats Bing Chat. So this is a good move by Google.

FaceDeer,

Buddy, I just want to type a search term and get results.

Telemetry can help them do better at providing that. Devs aren't magical beings, they don't know what's working and what's not unless someone tells them.

FaceDeer,

No, this analogy would make more sense if it was a matter of recording a large number of interactions between customers and tellers to ensure that the window isn't interfering with their interactions. Is the window the right size? Can the customer and teller hear each other through it? Is that little hole at the bottom large enough to let through the things they need to physically exchange? If you deploy the windows and then never gather any telemetry you have no idea whether it's working well or if it could be improved.

FaceDeer,

The analogy isn't perfect, no analogy ever is.

In this case the content of the search is all that really matters for the quality of the search. What else would you suggest be recorded, the words-per-minute typing speed, the font size? If they want to improve the search system they need to know how it's working, and that involves recording the searches.

It's anonymized and you can opt out. Go ahead and opt out. There'll still be enough telemetry for them to do their work.

FaceDeer,

Yeah, why aren't regular folks building reusable heavy-lift rocket systems?

FaceDeer,

And even if SLS is an example of non-private rocketry, it's hardly something that should be touted as a positive example. Especially not when launch pace is your criterion.

FaceDeer,

Some are, but they still don't build rockets. I think there's some other factor that's important.
