mattpd

@mattpd@mastodon.social

https://github.com/MattPD
https://twitter.com/matt_dz


mattpd, 7 days ago to random

Circle C++ with Memory Safety
https://www.circle-lang.org/site/intro/
by Sean Baxter (https://x.com/seanbax/status/1796933674291646552)

mattpd, 3 months ago to random

A Simple showcase for the Sea-of-Nodes compiler IR
https://github.com/SeaOfNodes/Simple
Chapter 9: Global Value Numbering. Iterative peepholes to fixpoint. Worklists.

mattpd, 5 months ago to random

How Badly Do We Want Correct Compilers?
https://www.youtube.com/watch?v=tMYYrR-hazI
John Regehr (@regehr) - NDC TechTown 2023

mattpd, 6 months ago to llvm

How single-iteration InstCombine improves LLVM compile time
https://developers.redhat.com/articles/2023/12/07/how-single-iteration-instcombine-improves-llvm-compile-time
by Nikita Popov
#LLVM

Migueldeicaza, 6 months ago to random

Microsoft’s Maia chip for hardware acceleration uses a new set of numeric data types to speed up computation. It used to be called internally msfloat, but is now becoming an open standard with a consortium of companies behind it.

This format requires that the data use the same exponent, and only differ in the sign/mantissa. So you can move a lot more of these numbers in one go:

https://www.opencompute.org/documents/ocp-microscaling-formats-mx-v1-0-spec-final-pdf
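To make the shared-exponent idea concrete, here is a minimal sketch of generic block floating point: one exponent per block plus a signed integer mantissa per element. It is not the bit layout defined in the linked MX spec; the function names, the 8-bit mantissa width, and the rounding and clamping choices are assumptions made for illustration only.

```python
import math


def quantize_block(values, mantissa_bits=8):
    """Encode a block as (shared_exp, mantissas).

    Hypothetical helper: each value is approximated as
    mantissa * 2**shared_exp, with one exponent for the whole block.
    """
    max_abs = max(abs(v) for v in values)
    if max_abs == 0.0:
        return 0, [0] * len(values)
    # Pick the exponent so the largest magnitude fits the mantissa range.
    shared_exp = math.frexp(max_abs)[1] - (mantissa_bits - 1)
    limit = (1 << (mantissa_bits - 1)) - 1
    scale = math.ldexp(1.0, shared_exp)
    # Round to the nearest integer mantissa and clamp to the signed range.
    mantissas = [max(-limit, min(limit, round(v / scale))) for v in values]
    return shared_exp, mantissas


def dequantize_block(shared_exp, mantissas):
    """Reconstruct approximate floats from the shared exponent and mantissas."""
    return [math.ldexp(m, shared_exp) for m in mantissas]


block = [1.0, 0.5, -0.25, 0.75]
exp, mants = quantize_block(block)
restored = dequantize_block(exp, mants)
```

Only the small integers and a single exponent need to be stored or moved per block, which is where the bandwidth saving comes from. The cost is that small values sharing a block with a large one lose precision.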

mattpd, 6 months ago

@Migueldeicaza @rhempel @jripley BTW, would you happen to know whether the approximation properties of the Dot and DotGeneral MX operations have been standardized? Context: a nice post by @gconstantinides, https://constantinides.net/2023/10/30/industry-coheres-around-mx/ (also pointing out the historical block floating point background).

mattpd, 6 months ago to llvm

2023 LLVM Developers' Meeting Trip Report by Henrich Lauko
https://xlauko.github.io/2023/11/10/llvm-dev-met.html
#LLVM #MLIR

chandlerc, 6 months ago to random

FYI & note to future self for easier finding and referencing Arm and Neon intrinsics:

https://arm-software.github.io/acle/main/acle.html
https://arm-software.github.io/acle/neon_intrinsics/advsimd.html

These are much more effective than the ARM developer site -- can just use normal search, and they even include things bizarrely missing on the ARM developer site like vst1_*.

(I've probably been directed to these at least twice before, but maybe posting this will help me remember the right place to go for the reference...)

mattpd, 6 months ago

@chandlerc FWIW, if you also need asm instructions, I've found "Arm A64 Instruction Set Architecture" to be pretty navigable:
https://developer.arm.com/downloads/-/exploration-tools

Note: not "View HTML" but "Download XML", a tarball which has a PDF and a directory corresponding to the version (e.g., "ISA_A64_xml_A_profile-2023-09"): unpack that, open "index.html" in your web browser, and you'll get a pretty convenient instruction manual (with a top bar for navigating between the BASE, SIMD&FP, and SVE instructions).

mattpd, 7 months ago to random

Facile: Fast, Accurate, and Interpretable Basic-Block Throughput Prediction
https://arxiv.org/abs/2310.13212
IEEE International Symposium on Workload Characterization (IISWC) 2023
Andreas Abel (https://uops.info/), Shrey Sharma, Jan Reineke

mikenicolella, 7 months ago to random

Are there some resources out there to learn more about how database storage works? Like, what does the layout of the files on disk look like? What kind of structure is used to store the ‘current’ version of the DB? How do they achieve robust atomic updates? That sort of stuff. Would prefer to just learn about one particular implementation versus the academic theory.

mattpd, 7 months ago

@mikenicolella CMU DB lectures from https://15445.courses.cs.cmu.edu/fall2023/schedule.html (starting from "Lecture #03: Database Storage I") and https://15721.courses.cs.cmu.edu/spring2023/schedule.html (starting from "#03 — Storage Models & Data Layout") are pretty good, most have multiple examples from/comparing practical DBMS implementations.

pervognsen, 8 months ago to random

I'm trying to find posts about this but my Google-Fu is letting me down: does anyone remember something about a microcode bug for one of the Zens where RDTSC had drastically reduced granularity?

mattpd, 8 months ago

@pervognsen Take A Way: Exploring the Security Implications of AMD’s Cache Way Predictors
https://mlq.me/download/takeaway.pdf
Section 2.3, High-resolution Timing & Appendix A, RDTSC Resolution

AMD Prefetch Attacks through Power and Time
https://mlq.me/download/amdprefetch.pdf
Section 3.1, Leakage Analysis Primitives

(AMD Zen 2 or newer: 36 cycle update interval using rdtsc/rdtscp/MPERF; however, still 1 cycle update interval reading APERF using rdpru).
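One way to see such an update interval in recorded data is to take back-to-back counter readings and compute the greatest common divisor of the nonzero deltas. The sketch below assumes the readings were already collected (for example by a small native program calling rdtsc in a loop); the function name and the sample values are made up for illustration and are not measurements from the papers above.

```python
from functools import reduce
from math import gcd


def estimate_update_interval(samples):
    """Return the GCD of nonzero deltas between consecutive readings.

    Hypothetical helper: if the counter only advances in multiples of
    some step, that step divides every delta, so the GCD is an estimate
    of it. Returns None when the counter never changed.
    """
    deltas = [b - a for a, b in zip(samples, samples[1:]) if b != a]
    if not deltas:
        return None
    return reduce(gcd, deltas)


# Synthetic readings from a counter that advances in steps of 36.
readings = [0, 0, 36, 36, 72, 144, 144, 180]
step = estimate_update_interval(readings)
```

With few samples the GCD can overestimate the true step (all observed deltas may happen to share a larger factor), so in practice this needs many readings taken as close together as possible.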

neilhenning, 9 months ago to random

Anyone know if there are any rough stats for the % of functions that are hot / medium / cold in most codebases?

I know that this is gonna be super application dependent - but I'm curious what the rough shape of these numbers would be for your average blob of code.

Hot being hit all the time, cold being hit never or almost never, medium being the rest.

mattpd, 9 months ago

@neilhenning "even among the hottest and most well-optimized functions in our server fleet, more than 50% of code is completely cold."
"Not only is more than 50% of code cold, but it is also interspersed between the relatively hot regions, and likely unnecessarily brought in by prefetchers." - from "AsmDB: Understanding and Mitigating Front-End Stalls in Warehouse-Scale Computers", https://research.google/pubs/pub48320/

regehr, 9 months ago to random

I found the guy! someone hire him quick!!!

mattpd, 9 months ago

@zwarich @regehr I'd go with pointer provenance, there's a brief DR, https://www.open-std.org/jtc1/sc22/wg14/www/docs/dr_260.htm, should be an easy freebie!
