@sarahjamielewis@mastodon.social

sarahjamielewis


Cryptography and Privacy Researcher. Executive Director @ Open Privacy Research Society (https://hachyderm.io/@openprivacy).

Founder @ Blodeuwedd Labs (https://mastodon.social/@blodeuweddlabs)

Building free and open source, privacy-enhancing, surveillance-resisting tech like Cwtch (https://fosstodon.org/@cwtch)


sarahjamielewis, to random

After writing this note on Recall (https://mastodon.social/@sarahjamielewis/112482021770758791) a few weeks back, I've received many messages under the assumption that I don't understand how DRM / OS interaction works.

As if the integration of a broken, backwards technology into the core of our computing systems happened by accident.

"No, you see the OS doesn't get to see those bits of the screen, so it totally makes sense why the system scrapes your financial documents and passwords but not Netflix" - utterly unhinged worldview

sarahjamielewis,

For the record, I totally understand that everyone from chip manufacturers to browser vendors made the decision to sell out their own customers and users to support and implement DRM everywhere - I recall those days pretty well.

sarahjamielewis,

The whole thing is a damn policy choice that's been playing out over 20+ years.

sarahjamielewis,

The boundaries could have been cut dozens of different ways, but they are where they are because of the compromises built into our systems.

And every paper cut compromise has led us to a place where modern Windows prevents you from taking a screenshot of Mickey Mouse while it happily subverts every other kind of process and workflow isolation.

That was and is a choice.

sarahjamielewis,

At the end of the day, I'm the kind of person that compiles (and occasionally writes) my own kernels - this affects me to the extent that people and organizations I engage with use these awful machines - and I expect they will in droves.

I've long given up on the idea that any systems besides my own can be trusted to keep secrets - but I will keep trying to both build better ones, and encourage others to do the same.

sarahjamielewis, to random

For all the discussion of "prompt engineering" and "finetuning", I think the most interesting biasing structure for modern AI that has flown somewhat under the radar of mainstream discussion is the ability to directly constrain the output space through e.g. grammars for LLMs and control nets for image generation.

It's weird to see people deploy the raw output of large scale generative statistical models when there are pretty powerful tools just sitting there that allow more fine-grained application.
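
To make the grammar idea concrete, here is a minimal sketch of constrained decoding. Everything in it is an assumption for illustration: the toy vocabulary, the state-machine grammar, and `fake_logits` (a stand-in for a real model) are hypothetical. Real grammar-constrained samplers work on much larger vocabularies and richer grammars, but the masking step is the same in spirit.

```python
import math

# Hypothetical toy vocabulary; a real model has tens of thousands of tokens.
VOCAB = ["{", "}", '"ok"', ":", "true", "false", "maybe", "<eos>"]

# A tiny grammar written as a state machine: state -> {allowed token: next state}.
# It accepts exactly {"ok":true} or {"ok":false}.
GRAMMAR = {
    "start": {"{": "key"},
    "key": {'"ok"': "colon"},
    "colon": {":": "value"},
    "value": {"true": "close", "false": "close"},
    "close": {"}": "end"},
    "end": {"<eos>": "done"},
}


def fake_logits(prefix):
    """Stand-in for a language model that always prefers the off-grammar token 'maybe'."""
    return [10.0 if tok == "maybe" else float(len(tok)) for tok in VOCAB]


def constrained_decode(logits_fn, grammar, start="start", final="done"):
    """Greedy decoding where tokens the grammar forbids are masked out at every step."""
    state, output = start, []
    while state != final:
        allowed = grammar[state]
        logits = logits_fn(output)
        # Forbidden tokens get -inf, so they can never win the argmax.
        masked = [
            score if tok in allowed else -math.inf
            for tok, score in zip(VOCAB, logits)
        ]
        best = VOCAB[max(range(len(VOCAB)), key=masked.__getitem__)]
        output.append(best)
        state = allowed[best]
    return "".join(tok for tok in output if tok != "<eos>")


print(constrained_decode(fake_logits, GRAMMAR))
```

Left unconstrained, the stand-in model would pick "maybe" at every step. With the mask applied, the only strings it can produce are the two the grammar accepts.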

sarahjamielewis,

It's frustrating to see 99.9% of the AI discussion being driven by chat interfaces and third party APIs.

That is really not where these tools are most interesting/useful - you really want them in tight, local, feedback loops, different aspects broken out into discrete workflows, constrained output spaces, and with the interface driven and mediated by the application at hand.

And I don't think it does any side any favours to fixate so strongly on the magic textfield that hallucinates wildly.

sarahjamielewis, to random

The thing about chat control / upload filters / client side scanning, whatever it's being called now, is that they are responses to an old generation of technology: an internet governed by centralized corporations.

Anonymous, peer to peer, file sharing exists. No centralized place to subvert - except the software running locally. Imperfect now, but intrinsically extant.

What proponents of these laws really want is to roll back the clock; something that is, fundamentally, not possible.

sarahjamielewis,

I don't know...am I supposed to seriously consider a call to mass subvert encryption? backdoor all open source code? pretend math doesn't exist? insist we never teach the children about mixnets? let this knowledge only be discussed in hushed tones?

I refuse to believe these never-ending discussions and statements and draft bills are anything but unserious people doing unserious things.

sarahjamielewis,

Though on the subject of client side scanning, the best approach I've ever seen was the Apple one; an impressive result of years of research.

It was still fundamentally broken under any sane risk model for the settings these tools are being proposed for.

https://pseudorandom.resistant.tech/neuralhash-collisions.html

sarahjamielewis, to random

New Paper: On the application of Bloom Filter Hierarchies representing Sub-word Token Bigram Occurrence to Probabilistic Full Text Search

This is a note regarding a prototype I've been working on for a few months in the domain of Decentralized Search (and Indexing)

It covers a data structure with interesting properties that I've been playing with, and documents some experiments regarding naive full text search performance.

Comments/questions/critique welcome.

PDF: https://sarahjamielewis.com/decentralization/search/ftsbloom.pdf
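
For readers unfamiliar with the building blocks, here is a rough sketch of the general idea only. It is not the construction from the paper. As stated assumptions: overlapping character trigrams stand in for sub-word tokens, the "hierarchy" is a single root filter formed as the union of per-document filters, and all names and sizes are hypothetical.

```python
import hashlib


class BloomFilter:
    def __init__(self, size_bits=4096, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, item):
        # Derive k bit positions from salted SHA-256 digests of the item.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        merged = BloomFilter(self.size, self.k)
        merged.bits = self.bits | other.bits
        return merged


def token_bigrams(text, width=3):
    """Hypothetical tokenizer: character trigrams per word, paired with their successor.

    The last token of each word is paired with "" so short words still contribute.
    """
    pairs = set()
    for word in text.lower().split():
        tokens = [word[i:i + width] for i in range(max(1, len(word) - width + 1))]
        pairs.update(zip(tokens, tokens[1:] + [""]))
    return pairs


def build_index(docs):
    """One leaf filter per document, plus a root filter that is the union of all leaves."""
    leaves = {}
    root = BloomFilter()
    for name, text in docs.items():
        leaf = BloomFilter()
        for pair in token_bigrams(text):
            leaf.add(repr(pair))
        leaves[name] = leaf
        root = root.union(leaf)
    return root, leaves


def search(index, query):
    """Return documents that probably contain every bigram of the query."""
    root, leaves = index
    wanted = [repr(pair) for pair in token_bigrams(query)]
    if not all(item in root for item in wanted):
        return []  # a miss at the root rules out every document beneath it
    return sorted(name for name, leaf in leaves.items()
                  if all(item in leaf for item in wanted))


index = build_index({
    "a": "bloom filters for search",
    "b": "mixnets and onion routing",
})
print(search(index, "onion"))
```

A Bloom filter never reports a false negative, so a document that contains the query terms is always returned. False positives are possible, which is why the search is probabilistic.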

sarahjamielewis, to random

The more I think about search engines and compiling and weighting corpora, the more inclined I am to implement hard signal-filters i.e. assume all documents are spam to start with and only accept a document into the corpus if it can be shown to be unspam-like.

sarahjamielewis,

The concept of a spam filter is one from a more innocent age where even if spam was a majority of documents, it could still be identified and dropped.

I'm not sure it's possible to really identify spam anymore. Even previously well-trusted news publishers are playing games with thinly veiled advertorials / scientific journals are full of generative spam etc.

That problem is just going to get worse.

sarahjamielewis,

There are certain signals that can be identified as minimizing the likelihood that something is spam:

  • having minimal formatting / plain text representation
  • minimal references outside of the core semantic domain of the document (e.g. no links to ad servers / no affiliate links)
  • maximal referencing of other documents that are unspam-like

Nothing completely flawless, but I'm reminded of xkcd 810: https://xkcd.com/810/
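
The default-deny idea and the three signals above could be sketched roughly as follows. This is an illustration under loud assumptions: counting HTML tags is a crude proxy for "minimal formatting", link hostnames are a crude proxy for the semantic domain, and the threshold and all names are hypothetical.

```python
import re
from dataclasses import dataclass, field

LINK_RE = re.compile(r"https?://([^/\s]+)")  # captures the hostname of each link
MARKUP_RE = re.compile(r"<[^>]+>")           # crude proxy for heavy formatting


@dataclass
class Corpus:
    trusted_hosts: set = field(default_factory=set)
    accepted: dict = field(default_factory=dict)
    threshold: int = 2  # hypothetical: how many signals must fire for admission

    def signals(self, text, home_host):
        hosts = LINK_RE.findall(text)
        outside = [h for h in hosts
                   if h != home_host and h not in self.trusted_hosts]
        return {
            # minimal formatting / plain text representation
            "plain_text": len(MARKUP_RE.findall(text)) <= 2,
            # no links to hosts outside the document's own or already trusted ones
            "stays_in_domain": not outside,
            # references documents from hosts already considered unspam-like
            "cites_accepted": any(h in self.trusted_hosts for h in hosts),
        }

    def consider(self, name, text, home_host):
        """Default deny: a document is admitted only if enough signals fire."""
        score = sum(self.signals(text, home_host).values())
        if score >= self.threshold:
            self.accepted[name] = text
            self.trusted_hosts.add(home_host)
            return True
        return False


corpus = Corpus(trusted_hosts={"xkcd.com"})
print(corpus.consider("note", "A plain note citing https://xkcd.com/810/ for context.", "example.org"))
print(corpus.consider("ad", '<div><a href="https://ads.example.net/x">buy</a></div><p>now</p>', "shop.example"))
```

The design choice worth noting is the direction of the test: nothing is checked for spamminess, and a document only enters the corpus by demonstrating unspam-like properties.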

sarahjamielewis, to random

After not finding the graph software I really wanted I decided to take the jump and start writing my own.

Pretty happy with this initial MVP: it can load graphs from a directory made up of linked md files, add new nodes, move them around, and add new edges.

Decided that getting what I really wanted would mean writing the UI stack from scratch, so most of my initial effort has gone into getting some basic widgets together.

Next step is to get a feel for how I want to specify edge types, and editing.

A video of the graph editing/maintenance software. Initially 2 nodes are visible, connected by a single edge. Using a form at the bottom of the app, 2 new nodes are added. The gif then proceeds to demonstrate moving these nodes around, and creating new edges between the nodes.

sarahjamielewis, to random

While understanding that not everyone has the kind of freedom that permits control over the systems they use...if you do have such freedom I encourage you to take advantage of it.

The most powerful thing about free and open source software is the ability to take it apart, understand how a piece of it works, and adapt it for your own purposes.

Don't like how something works? Rip it out. Share the modified version with the world.

Your systems don't need to be subject to the whims of others.

sarahjamielewis,

I am a terrible person to ask about getting into Linux or what distros are the easiest to use - I don't think I've properly used Windows for nearly a decade.

I compile my own kernels - sometimes for fun; my window manager has custom key bindings; I'm spending my Friday evening implementing better line drawing algorithms for a custom UI framework I'm writing for some project.

But if you do make the jump, and stumble upon a gnarly scenario and have questions - I'm happy to try and answer them.

simon, to random

I'm on a flight and the in-flight WiFi blocks all forms of video

Any ideas how it might be doing that, given HTTPS? My best guess is that it could be filtering out known CDN host names that serve video

sarahjamielewis,

@simon though technical sophistication varies, the bigger airlines typically do some detection based on a combination of IPs/hostnames/SNI (for trivial blocking of YouTube/Netflix etc.) and fall back to TCP session shaping (e.g. terminating/lowering bandwidth for flows after a certain amount of data is exchanged) for everything else.
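
A toy model of that two-stage policy, for illustration only. The hostname list, the byte budget, and the `Flow` and `verdict` names are all hypothetical, and real middleboxes parse the SNI out of the TLS ClientHello instead of receiving it as a string.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical policy values; real deployments differ.
BLOCKED_SUFFIXES = ("youtube.com", "googlevideo.com", "netflix.com", "nflxvideo.net")
BYTE_BUDGET = 20 * 1024 * 1024  # bytes allowed per flow before shaping kicks in


@dataclass
class Flow:
    sni: Optional[str]  # hostname seen in the TLS ClientHello, if any
    bytes_seen: int = 0


def verdict(flow, packet_len):
    """Stage 1: block on known video hostnames. Stage 2: throttle any long-running flow."""
    flow.bytes_seen += packet_len
    host = (flow.sni or "").lower().rstrip(".")
    if any(host == s or host.endswith("." + s) for s in BLOCKED_SUFFIXES):
        return "block"
    if flow.bytes_seen > BYTE_BUDGET:
        return "throttle"
    return "allow"


video = Flow("www.youtube.com")
print(verdict(video, 1500))

other = Flow("example.org")
print(verdict(other, 1000))
print(verdict(other, BYTE_BUDGET))
```

The second stage needs no knowledge of what the flow carries, so it catches video from hosts that are not on any list: HTTPS hides the payload but not the volume.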

sarahjamielewis, to random

Perhaps I have simply outgrown some kind of naive idealism, and perhaps some of it is the tendency to view the past through a more generous filter.

But wow is it hard to -find stuff- now. Even stuff I know exists. Hell, even stuff I know I wrote and put out there.

Lost in an ocean of empty words.

sarahjamielewis, to random

Getting to the root of it, I think the thing I miss the most about the old internet was the unstated assumption that the people on the other end of the wire were...people who shared similar interests and just wanted to connect.

I think of all the friends I made, the experiences I had that branched from IRC channels, forums, and even Twitter in the later days.

Now the main question I find myself asking of anything that comes across my screen is "what is this trying to sell me?"

sarahjamielewis, to random

I spent large portions of my early career rearranging binary sequences on a chalkboard, and writing assembler for obscure architectures.

There are parts of my brain hard wired to recognize and align protocol stacks from a visual representation of a signal dump.

It's cute that you think you have to explain how computers work to me.

sarahjamielewis, to random

Software request: I'm looking for a tool I can use to manipulate nodes in a graph. Specifically I would like to be able to:

  • Add new nodes to the graph (not a tree)
  • Create multiple distinct edge relationships between nodes (bonus if the tool lets me formalize these edge types)
  • Have nodes contain notes, perhaps be typed
  • Export the graph to a reasonable (text) file format for external processing
  • Explicitly not an image editor or diagram tool.
  • Run on Linux / be open source (flexible)

sarahjamielewis,

Additional requirements:

  • be able to handle a moderate number of nodes (at least a few thousand)
  • filter nodes by content and/or type
  • calculate subgraphs by edge relationships
  • have a file format that is practical to import into.

sarahjamielewis,

A few more notes after answering some questions:

I explicitly want a tool to help me visually modify nodes and edges in a reasonably sized graph.

The modification bit is really key, as is the ability to maintain multiple distinct edges between two nodes.

I want to steer away from diagramming tools because in my experience they don't scale. And I'm not really interested in visualization tools as I already have a workflow for that.
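
To pin down the data model these requirements imply (and only the data model, not the visual editing), here is a minimal sketch. The class name, method names, and the JSON export format are all hypothetical choices for illustration.

```python
import json
from collections import defaultdict


class NoteGraph:
    """A multigraph: typed nodes carrying notes, and typed edges that may run in parallel."""

    def __init__(self):
        self.nodes = {}  # node id -> {"type": ..., "note": ...}
        self.edges = []  # (source, edge_type, target); duplicates between a pair are allowed

    def add_node(self, node_id, node_type="note", note=""):
        self.nodes[node_id] = {"type": node_type, "note": note}

    def add_edge(self, source, edge_type, target):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must exist before adding an edge")
        self.edges.append((source, edge_type, target))

    def filter_nodes(self, node_type=None, contains=""):
        """Filter nodes by type and/or by a substring of their note."""
        return sorted(
            node_id for node_id, data in self.nodes.items()
            if (node_type is None or data["type"] == node_type)
            and contains.lower() in data["note"].lower()
        )

    def subgraph(self, start, edge_type):
        """Nodes reachable from start when following only edges of one type."""
        outgoing = defaultdict(list)
        for source, kind, target in self.edges:
            if kind == edge_type:
                outgoing[source].append(target)
        seen, stack = {start}, [start]
        while stack:
            for nxt in outgoing[stack.pop()]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return sorted(seen)

    def export(self):
        """Plain-text (JSON) export for external processing."""
        return json.dumps({"nodes": self.nodes, "edges": self.edges},
                          indent=2, sort_keys=True)


g = NoteGraph()
g.add_node("a", "paper", "Bloom filter search")
g.add_node("b", "paper", "Mixnets")
g.add_node("c", "idea", "Decentralized search")
g.add_edge("a", "cites", "b")
g.add_edge("a", "contradicts", "b")  # a second, distinct edge between the same pair
g.add_edge("b", "cites", "c")
print(g.subgraph("a", "cites"))
print(g.subgraph("a", "contradicts"))
```

Storing edges as a flat list of triples, not as an adjacency map keyed by node pair, is what lets several distinct relationships coexist between the same two nodes.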

sarahjamielewis, to random

"Note that Recall does not perform content moderation. It will not hide information such as passwords or financial account numbers."

The computer, however, will stop you from recording DRM'd content.

I find it fascinating that when faced with drawing safety and security boundaries, the primary beneficiary is not the owner of the device, or the person using it, but random corporations who control the intellectual property rights.

The system doesn't work for you.

sarahjamielewis,

I find it equally fascinating that in order to get anywhere near an integrated computing experience in 2024 we apparently need constant recording and transformer models.

No structured file systems, no permission models, no shared stores, no capabilities - just firehose the display output and hope for the best.
