A Simple showcase for the Sea-of-Nodes compiler IR https://github.com/SeaOfNodes/Simple
Chapter 9: Global Value Numbering. Iterative peepholes to fixpoint. Worklists.
Microsoft’s Maia chip for hardware acceleration uses a new set of numeric data types to speed up computation. It was known internally as msfloat, but is now becoming an open standard backed by a consortium of companies.
This format requires that a group of values share the same exponent and differ only in sign and mantissa, so you can move a lot more of these numbers in one go.
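To make the shared-exponent idea concrete, here is a minimal sketch of block quantization in Python. It is not the actual format's bit layout: the function names, the 7-bit mantissa width, and the choice of taking the exponent from the largest magnitude in the group are all assumptions made for illustration.

```python
import math

def encode_block(values, mantissa_bits=7):
    """Quantize floats to one shared exponent plus a signed integer mantissa each.

    Hypothetical helper: the mantissa width and rounding policy are assumptions,
    not taken from any published format.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # frexp returns (m, e) with max_abs == m * 2**e and 0.5 <= m < 1,
    # so e is large enough to cover every value in the group.
    _, shared_exp = math.frexp(max_abs)
    scale = 2.0 ** (mantissa_bits - shared_exp)
    limit = (1 << mantissa_bits) - 1
    # Round each value to the nearest representable step and clamp to the mantissa range.
    mantissas = [max(-limit, min(limit, round(v * scale))) for v in values]
    return shared_exp, mantissas

def decode_block(shared_exp, mantissas, mantissa_bits=7):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    scale = 2.0 ** (shared_exp - mantissa_bits)
    return [m * scale for m in mantissas]

if __name__ == "__main__":
    exp, mants = encode_block([1.0, -0.5, 0.25, 3.0])
    print(exp, mants)
    print(decode_block(exp, mants))
```

Under these assumptions each value costs one sign bit plus `mantissa_bits`, and the exponent is paid for once per group rather than once per number, which is where the bandwidth saving comes from. The trade-off is that small values sharing a group with a large one lose precision.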
These are much more effective than the ARM developer site -- can just use normal search, and they even include things bizarrely missing on the ARM developer site like vst1_*.
(I've probably been directed at these at least twice before, but maybe posting this will help me remember the right place to go for the reference...)
Note: not "View HTML" but "Download XML", a tarball which has a PDF and a directory corresponding to the version (e.g., "ISA_A64_xml_A_profile-2023-09"): unpack that, open "index.html" in your web browser, and you'll get a pretty convenient instruction manual (with a top bar for navigating between the BASE, SIMD&FP, and SVE instructions).
Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction https://arxiv.org/abs/2310.13212
IEEE International Symposium on Workload Characterization (IISWC) 2023
Andreas Abel (https://uops.info/), Shrey Sharma, Jan Reineke
Are there some resources out there to learn more about how database storage works? Like, what does the layout of the files on disk look like? What kind of structure is used to store the ‘current’ version of the DB? How do they achieve robust atomic updates? That sort of stuff. Would prefer to just learn about one particular implementation versus the academic theory.
I'm trying to find posts about this but my Google-Fu is letting me down: does anyone remember something about a microcode bug for one of the Zens where RDTSC had drastically reduced granularity?
@pervognsen Take A Way: Exploring the Security Implications of AMD’s Cache Way Predictors https://mlq.me/download/takeaway.pdf
Section 2.3, High-resolution Timing & Appendix A, RDTSC Resolution
Anyone know if there are any rough stats for the % of functions that are hot / medium / cold in most codebases?
I know that this is gonna be super application dependent - but I'm curious what the rough shape of these numbers would be for your average blob of code.
Hot being hit all the time, cold being hit never or almost never, medium being the rest.
@neilhenning "even among the hottest and most well-optimized functions in our server fleet, more than 50% of code is completely cold."
"Not only is more than 50% of code cold, but it is also interspersed between the relatively hot regions, and likely unnecessarily brought in by prefetchers." - from "AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers", https://research.google/pubs/pub48320/