FaceDeer

@FaceDeer@fedia.io

Basically a deer with a human face. Despite probably being some sort of magical nature spirit, his interests are primarily in technology and politics and science fiction.

Spent many years on Reddit and then some time on kbin.social.

Hippie woowoo Facebook is also very crazy. (lemmy.world)

FaceDeer, 18 hours ago

Have to save it up in jars ahead of time.

A Columbia University protester says the NYPD made her remove her hijab—despite new policy (www.motherjones.com)

FaceDeer, 19 hours ago

Keep making it more expensive to suck on their toes and perhaps eventually they'll stop.

Attention! Butt missiles. That is all. (i.ibb.co)

FaceDeer, 12 hours ago

Well that seems unlikely, what are the odds that an airplane is going to be chased by a ship?

Slack is now using all content, including DMs, to train LLMs (mastodon.sdf.org)

cross-posted from: lemmy.ml/post/15741608...

FaceDeer, 1 day ago

Also pretty sure training LLMs after someone opts out is illegal?

Why? There have been a couple of lawsuits launched in various jurisdictions claiming LLM training is copyright violation but IMO they're pretty weak and none of them have reached a conclusion. The "opting" status of the writer doesn't seem relevant if copyright doesn't apply in the first place.

FaceDeer, 1 day ago

Nor is it up to you. But the fact remains, it's not illegal until there are actually laws against it. The court cases that might determine whether current laws are against it are still ongoing.

FaceDeer, 1 day ago

Maybe it's "simple as that" if you're just expressing an opinion, but what's the legal basis for it?

FaceDeer, 1 day ago

The GDPR says that information that has been anonymized, for example through statistical analysis, is fine. LLM training is essentially a form of statistical analysis. There's hardly anything in law that is "simple."

FaceDeer, 23 hours ago

You could say it's to "circumvent" the law or you could say it's to comply with the law. As long as the PII is gone what's the problem?

FaceDeer, 20 hours ago

It is impossible for them to contain more than just random fragments; the models are too small for the training data to be compressed enough to fit. Even the fragments that have been found are not exact, because the AI is "lossy" and hallucinates.

The examples that have been found are examples of overfitting, a flaw in training where the same data gets fed into the training process hundreds or thousands of times over. This is something that modern AI training goes to great lengths to avoid.

Reddit’s deal with OpenAI will plug its posts into “ChatGPT and new products” (www.theverge.com)

FaceDeer, 1 day ago

You don't think LLMs are being trained off of this content too? Nobody needs to bother "announcing a deal" for it, it's being freely broadcast.

FaceDeer, 1 day ago

Surely the use of user-deleted content as training data carries the same liabilities as reinstating it on the live site?

Why would that be? It's not the same.

And what liabilities would there be for reinstating it on the live site, for that matter? Have there been any lawsuits?

FaceDeer, 1 day ago

I just did a bit of poking around on the subject of the "right to be forgotten" and it's legally complex. Data without personally identifying information, and data that's been anonymized through statistical analysis (which LLM training is a form of) aren't covered.

FaceDeer, 1 day ago

You think they don't have the originals archived?

FaceDeer, 1 day ago

"Model collapse" can be easily avoided by keeping old human data with new synthetic data in the training set. The old archives of Reddit content from before there was AI are still around.

FaceDeer, 1 day ago

There are torrents of complete Reddit comment archives available for any random person who wants them. I'm sure Reddit itself has a comprehensive edit history of everything.

FaceDeer, 1 day ago

By "old archives" I mean everything from 2022 and earlier.

FaceDeer, 1 day ago

Existing AIs such as ChatGPT were trained in part on that data so obviously they've got ways to make it work. They filtered out some stuff, for example - the "glitch tokens" such as solidgoldmagikarp were evidence of that.

Is Lemmy a good alternative?

New here. Migrated from Reddit. Still trying to figure out Lemmy - what’s everyone’s experiences like coming from Reddit and does Lemmy serve as a good alternative? Pros and cons/differences?...

FaceDeer, 3 days ago

The echo-chamberiness of Lemmy is different from Reddit, but still a thing unfortunately. It'll really depend on the community you're in, but since the population of the Fediverse (and especially the Threadiverse) is very small compared to Reddit you tend to have the same people cropping up a lot. I haven't been banned from anywhere (that I know of - I don't actually know if I would get notified) but I find myself hammered with downvotes more frequently here than on Reddit when I say something unpopular.

I'd say, mess around a bit and see.

Google is redesigning its search engine — and it’s AI all the way down (www.theverge.com)

FaceDeer, 3 days ago

Honestly, I've been using Bing a lot in recent months thanks to its integrated AI. Google is now just for when I know I want a specific web page; when I want a general answer, nothing beats Bing Chat. So this is a good move by Google.

Firefox to collect your (anonymized) search data (blog.mozilla.org)

FaceDeer, 4 days ago

Buddy, I just want to type a search term and get results.

Telemetry can help them do better at providing that. Devs aren't magical beings, they don't know what's working and what's not unless someone tells them.

FaceDeer, 3 days ago

No, this analogy would make more sense if it were a matter of recording a large number of interactions between customers and tellers to ensure that the window isn't interfering with their interactions. Is the window the right size? Can the customer and teller hear each other through it? Is that little hole at the bottom large enough to let through the things they need to physically exchange? If you deploy the windows and then never gather any telemetry, you have no idea whether they're working well or whether they could be improved.

FaceDeer, 3 days ago

The analogy isn't perfect, no analogy ever is.

In this case the content of the search is all that really matters for the quality of the search. What else would you suggest be recorded, the words-per-minute typing speed, the font size? If they want to improve the search system they need to know how it's working, and that involves recording the searches.

It's anonymized and you can opt out. Go ahead and opt out. There'll still be enough telemetry for them to do their work.

Air Force is “growing concerned” about the pace of Vulcan rocket launches (arstechnica.com)

FaceDeer, 4 days ago

Yeah, why aren't regular folks building reusable heavy-lift rocket systems?

FaceDeer, 4 days ago

And even if SLS is an example of non-private rocketry, it's hardly something that should be touted as a positive example. Especially not when launch pace is your criterion.

FaceDeer, 4 days ago

Some are, but they still don't build rockets. I think there's some other factor that's important.
