Yesterday I described hardware for a map/filter/reduce database reading from flash storage, so how'd we write to that flash storage?
The trick hardware designers use for writing to flash memory is that if you manufacture the insulating walls to exactly the right ultra-thinness (and apply a large enough positive charge to attract them) electrons can quantum-"tunnel" from the data wires to an isolated circuit. And stay there for decades!
The catch is that flash memory can be a bit too good at storing electrons, which in practice means the cells wear out over time. So operating systems or firmware need to minimize & spread out writes, increasing its longevity.
To minimize writes we add some "RAM" chips which store the data in transient memory cells made from capacitors (with a counter circuit & latch to periodically recharge them). Once we've filled a "block" with data (or we're losing power) we can write it to flash memory.
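Here's a rough Python sketch of that write-buffering idea. Everything here is made up for illustration (the `WriteBuffer` name, the `append_block` interface, the 4096-byte block size) — the point is just that rows accumulate in RAM & only full blocks hit flash:

```python
BLOCK_SIZE = 4096  # bytes per flash "block"; an assumed figure

class WriteBuffer:
    """Accumulate rows in RAM; write to flash only in whole blocks."""

    def __init__(self, flash):
        self.flash = flash          # anything with an append_block(bytes) method
        self.pending = bytearray()  # the DRAM-backed staging area

    def write(self, row: bytes):
        self.pending += row
        # Flush every complete block, keeping the remainder in RAM.
        while len(self.pending) >= BLOCK_SIZE:
            self.flash.append_block(bytes(self.pending[:BLOCK_SIZE]))
            del self.pending[:BLOCK_SIZE]

    def flush(self):
        """Force a partial-block write, e.g. upon a power-loss warning."""
        if self.pending:
            self.flash.append_block(bytes(self.pending))
            self.pending.clear()
```

In real hardware that `flush` path would be triggered by a capacitor-backed power-fail interrupt rather than a method call.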
If these "blocks" are spread across all the different wafers, that'll speed up the concurrent processing over these tables!
To spread the writes out I'd wait for the flash chips to fill up, then run queries to read out all still-live rows via RAM, so we can throw out the stale data & overwrite it with the compacted live data. A compacting GC!
Prior to that compaction I'd only overwrite rows to flag them as stale once the relevant transaction's committed. Thus speeding up transactions!
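A minimal sketch of those two halves, modelling a flash chip as a list of `(data, flag)` rows — names & representation are my assumptions, not anything resembling the real circuitry:

```python
LIVE, STALE = 0, 1

def flag_stale(chip, row_id):
    """Cheap per-transaction work: just flip one row's liveness flag."""
    data, _ = chip[row_id]
    chip[row_id] = (data, STALE)

def compact(chip):
    """Once the chip fills: read live rows out via RAM, erase, write back."""
    live = [row for row in chip if row[1] == LIVE]  # the "queries" via RAM
    chip.clear()                                    # the bulk erase
    chip.extend(live)                               # compacted live data only
```

The asymmetry is the point: transactions pay only for the tiny flag-flip, while the expensive erase-and-rewrite is deferred & batched.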
Regarding the Google leak I'll concur with @Seirdy : I don't trust any conclusions from these leaks as to how Google uses these metrics they gather.
Apparently it included docs, but we all know how quickly those get out of date!
As I engage with my local community it really strikes me...
My training in software development provides a solid grounding in Systems Theory. That is: Social systems aren't all that different from software systems, only there isn't source code to treat as a point of truth. We're stuck theorizing based on (if we're lucky) imperfect documentation.
And I also think that the "low coupling, high cohesion" systems we software devs strive for are also ideal in social systems.
Yesterday I established that compiling SQL queries & many of the more tangential features of a relational database can be framed as queries themselves, just on internal tables which we already know the schema of. So today: How do we read the desired data?
The common approach today is to store extra electrons between a transistor & its enable wire; I'll get into how we write those electrons tomorrow! Now that we've mastered the technique, it can store enough data for most non-archivists!
We've even refined our circuits finely enough to read back analogue data stored in this "flash" memory, though I strongly suspect there's limits to how cost-effectively this strategy can be refined further.
Stuff a silicon wafer full of these flash storage cells in 3D, adding a decoder; place multiple such wafers in the same microchip package; & place multiple microchips into the hardware until you've got enough storage to meet your needs!
For each of those flash wafers I'd include a CPU core with very little memory, separating data (read from flash) vs code (uploaded by the central CPU). Thus letting it evaluate any filters, column-extraction, & synopsizing it wants!
I'd want it to be able to handle strings, ints, & floats, including non-trivial maths like multiply, divide, & remainder. Then again, minimal & segregated code storage can allow for a simpler instruction-decoding pipeline without branch prediction!
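To make the map/filter/reduce split concrete, here's a Python sketch of what one wafer's core might run. The function name, the callback-based interface, & the sample rows are all my own illustration of the idea, not a real instruction set:

```python
def wafer_worker(rows, filters, extract, synopsize):
    """One wafer's tiny core: filter rows, extract columns, summarize.

    `rows` streams from the wafer's local flash; `filters`, `extract`,
    & `synopsize` stand in for the code uploaded by the central CPU.
    """
    selected = (extract(r) for r in rows if all(f(r) for f in filters))
    return synopsize(selected)

# e.g. the wafer's share of: SELECT sum(price * qty) WHERE qty is even
orders = [{"price": 3.5, "qty": 2}, {"price": 10.0, "qty": 3}]
partial = wafer_worker(
    orders,
    filters=[lambda r: r["qty"] % 2 == 0],   # uses that remainder ALU!
    extract=lambda r: r["price"] * r["qty"],
    synopsize=sum,
)
```

Each wafer would return only its small `partial` synopsis upstream, where the central processor combines them — far less traffic than shipping raw rows.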
Having recently finished discussing how I'd implement a web browser (and server) from the hardware on up, how'd I do the same for a relational SQL database?
To start I'd have a central processor parse the SQL sent to this machine, & compile it in reference to queries upon some internal tables. The architecture of this central processor (whether x86, ARM, RISC-V, 6502, or my hypothetical string-centric design) doesn't really matter; we're not asking much of it.
It's worth digging into those "internal tables" some more.
This would include tables listing all the tables & table-columns in the database, with or without including themselves. Aside from needing to assume its schema, these'll be queried like any other table. Maybe like PostgreSQL we'll store code for processing the different data types in yet more tables?
Additionally there'd be tables for "views" which we'd parse to subqueries. As well as user accounts & access rights.
Finally I'd include a table of precompiled "triggers" to run before or after the query itself.
So compiling a query to run involves running several other queries! How'd we evaluate these queries?
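A toy Python sketch of that bootstrap: the catalog tables' schema is the one thing we hard-code, & then the very same query evaluator serves both them & user tables. All names here (`TABLES`, `COLUMNS`, `compile_select`) are invented for illustration:

```python
# The "internal tables" whose schema we must assume up front.
TABLES  = [{"name": "users"}, {"name": "orders"}]
COLUMNS = [{"table": "users", "column": "id",   "type": "int"},
           {"table": "users", "column": "name", "type": "string"}]

def query(table, predicate):
    """One evaluator for catalog tables & user tables alike."""
    return [row for row in table if predicate(row)]

def compile_select(table_name, column_names):
    """Compiling a query means first running queries on the catalog."""
    if not query(TABLES, lambda t: t["name"] == table_name):
        raise ValueError(f"no such table: {table_name}")
    cols = query(COLUMNS, lambda c: c["table"] == table_name
                                    and c["column"] in column_names)
    if len(cols) != len(column_names):
        raise ValueError("unknown column")
    return cols  # a stand-in for a real query plan
```

Views, access rights, & triggers would each add further catalog queries to this `compile_select` step before the plan ever touches user data.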
Let's see: the data is stored in flash memory, which should be plenty for most people today! Which allows me to ask: what if we paired every flash wafer with a (RISC64fim?) CPU core?
We'd want non-trivial ALUs to process the common SQL types, but barely any RAM. Separating code from data.