If you're interested in discussing #parallel processing in #RStats at #PositConf2023, please let me know or reply here. I'm hoping we can have an informal hangout during Monday, Tuesday, or Wednesday to discuss what's missing, what's new and what's on the roadmap for parallelization in R. It doesn't have to be on just #Futureverse
(I'll arrive late Sunday Sept 17 and leave early Thursday Sept 21)
I get ridiculed by young JavaScript and Python coders whenever I say that parallel processing is essential to the future of computing.
The seasoned among them point out to me that the idea of #supercomputers is almost as old as me, that their iPhone can run rings round a typical supercomputer I may have used in my grad school days, and that their Python programmes running on laptops can beat anything I may have written on a CRAY in Fortran or C. Those points seem valid, but they miss the mark.
Firstly, merely outrunning a 30-year-old system is not a legitimate measure of current performance.
Secondly, if modern hardware performance has reached a level where a naïve implementation of an algorithm in a slow scripting language can beat a hand-tuned parallel programme running on an old supercomputer, then today's programmers have the ethical responsibility to optimise their software implementations by exploiting those newer, greater hardware capabilities available to them.
Thirdly, if there is so much excess hardware capacity, the software should soak that up by striving for more accuracy, more precision, more features, whatever, but not by mining bitcoins.
Lastly, just about every consumer-grade machine today—server, desktop, laptop, tablet, phone, single-board computer—is a multicore, multiprocessor monster. Programmers should be exploiting those readily available parallel resources, now. Automatic performance upgrade of sequential code by Moore's law and Dennard scaling is dead and gone. And fully automatic parallelisation of sequential code by compilers is still a distant dream.
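In R, putting those idle cores to work takes only a few lines with the base `parallel` package; a minimal sketch (the `slow_square` task is a made-up stand-in for any CPU-bound function):

```r
library(parallel)

# A deliberately slow, CPU-bound task applied to many inputs
slow_square <- function(x) { Sys.sleep(0.1); x^2 }

# Sequential baseline
res_seq <- lapply(1:8, slow_square)

# Parallel: spread the same work over local workers (PSOCK clusters
# work on all platforms, including Windows)
cl <- makeCluster(4)
res_par <- parLapply(cl, 1:8, slow_square)
stopCluster(cl)

identical(res_seq, res_par)  # TRUE - same results, roughly 4x the throughput
```

The same pattern scales from a laptop to a compute cluster by swapping out how the workers are created, which is exactly the abstraction the #Futureverse builds on.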
We're meeting at a round table in Riverside (lunchroom downstairs; right & right after the escalators; you'll see us) to talk about #parallel processing in #RStats.
One topic is marshalling: figuring out how to send special, non-exportable objects (such as database connections or external pointers) to other R processes.
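As a minimal illustration of why some objects are non-exportable, consider a plain text connection (standing in for the general case of connections, external pointers, and similar process-local handles):

```r
# Connections are process-local: what actually gets serialized is little
# more than an index into the current R session's connection table, not
# the underlying resource itself.
con <- textConnection("hello")
clone <- unserialize(serialize(con, NULL))

# In this same session, 'clone' happens to alias the same connection index.
# Shipped to a parallel worker, that index would point at a different
# connection -- or none at all -- which is what makes the object
# non-exportable without explicit marshalling support.
close(con)
```

Marshalling is about teaching such objects how to describe themselves in an exportable form so they can be faithfully reconstructed on the receiving worker.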
The word "parallel" conveniently contains two parallel lines in the middle to remind us of the double-l spelling. But don't overdo it: paralleled, paralleling.
"Bayesian cross-validation by parallel Markov Chain Monte Carlo" by Alex Cooper, Aki Vehtari, Catherine Forbes, Lauren Kennedy, and Dan Simpson. http://arxiv.org/abs/2310.07002
- fast, general, parallel brute-force Bayesian cross-validation with GPUs
- constant-memory streaming estimates and convergence diagnostics
- assessing convergence (Rhat) and accuracy (MCSE) of results aggregated from parallel computations
"In this #paper, we introduce new #algorithms for #CSS selector matching, layout solving, and font rendering, which represent key components for a fast layout engine. Evaluation on popular sites shows speedups as high as 80x. We also formulate the layout problem with attribute grammars, enabling us to not only parallelize our algorithm but prove that it computes in O(log) time and without reflow."
is "not-so-good" code. Anything that changes the state of the random number generator (RNG) on package load prevents reproducible results. It's impossible to protect against this in some situations, e.g. when running things in #parallel, where the result depends on whether the package is already loaded on the parallel workers.
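A minimal illustration of the problem, using a plain `runif()` call to stand in for a hidden draw inside a package's `.onLoad()`:

```r
set.seed(42)
state <- .Random.seed  # snapshot the RNG state

# Any side-effect draw -- e.g. one buried in a package's .onLoad() --
# silently advances the RNG state:
runif(1)

identical(state, .Random.seed)  # FALSE: all downstream random numbers now differ
```

Because the draw happens only the first time the package is loaded, a script that was reproducible in a fresh session can give different numbers on a parallel worker where the package was already attached.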