markigra,
@markigra@sciences.social

Yesterday when typing in a Google doc, autocomplete suggested the name of an interview subject. I do not store any qual data in Google Drive and all my transcripts use aliases anyway, so it freaked me out. I’m guessing gdocs uses gmail data (at least addresses) as autocomplete fodder and I know I have emailed this person from my university email (gmail) account. I’m guessing AI is going to lead to a lot of inadvertent data leakage as autocomplete++ generates more content. Be careful out there.

alkoclick,

@markigra One more for the "there's an xkcd for that" counter: https://xkcd.com/2169/

Linux_Is_Best,

@markigra

Your Google Account links across ALL Google services and devices.

This has been true for some time now, but it happens so seamlessly that many do not even notice.

If you're using Google, Google is using you and your data. It is in their terms of service and acceptable use policy.

Your choices are to either

  1. Accept it
  2. Stop using Google

coggins,
@coggins@mastodon.social

@markigra I asked Bard to summarize a document I had in Google Drive. It found and referenced a document on someone else's drive by the same name, summarizing it instead.

simon,
@simon@simonwillison.net

@coggins @markigra Could you tell if that document was marked as public (anyone with the link can view) or private? It's bad either way, but even worse if it was private!

coggins,
@coggins@mastodon.social

@simon @markigra Good question. I do think it was public because when I later went back into drive, it indicated it had opened it before and allowed me access.

But, apart from the access rights issue, this led me to believe Bard was using people's Drive files to train its LLM! Even if you've marked a file as open to anyone with the link, I'm sure you weren't expecting that.

Also interesting is that it found the doc not by the file name but by the title of the doc inside the file.

simon,
@simon@simonwillison.net

@coggins @markigra I don't think this necessarily means Drive files are being used to train the LLM itself. This sounds much more like Retrieval Augmented Generation (RAG), where the model "reads" the file once, on demand, to answer a question, but doesn't retain any information from that file beyond the execution of that prompt.

Problem is companies are really bad at documenting this! It's super important to understand if something is RAG or training, but they rarely explain that anywhere

Eh__tweet,

@markigra
I guess "being careful" needs to be brought to a totally new level.
Not sure how many people would know how.

circfruit,
@circfruit@fosstodon.org

@markigra they scrape even the paid Workspace accounts. I have a few passport copies uploaded there from a few years back, and the admin console suggested I remove them. How do they know the files are passport scans? They’ve been OCR-ing and scraping without permission for years.

ashteranic,
@ashteranic@hachyderm.io

@markigra ugh. This isn't surprising to hear, unfortunately. I remember getting burned when google plus leaked my g+ profile picture to people I was emailing, because they were using gmail, despite us not having g+ associations at all.

They just love to commingle data :(

drahardja,
@drahardja@sfba.social

@markigra I stopped using Google products to store my data when I started journaling my medical conditions in GDocs, and suddenly started getting ads for medicines to treat them.

RobertJackson58585858,
@RobertJackson58585858@masto.ai

@markigra

Earlier today I did a Mastodon post about #FamilyHistory including a reference to model railways (!) then looked at my YouTube feed & found a suggested film about a model railway.

This is spooky!

kcivey,
@kcivey@mastodon.social

@markigra Reminiscent of Facebook suggesting a therapist's patients as friends for each other.
https://www.apa.org/monitor/2017/02/facebook

lathamgreen,

@markigra yet Google Docs is weirdly particular about not autocorrecting some words for some kind of ethical reason I don't quite get