allendowney
@allendowney@fosstodon.org

Professor emeritus at Olin College, Principal Data Scientist at PyMC Labs, author of Think Python, author of Probably Overthinking It, and stark raving Bayesian.


allendowney, to random

If you compute the standard deviation of the same sample with NumPy and Pandas, you get different answers.

Why? And which one is right?

It's another installment of Data Q&A: Answering the Real Questions with Python.
https://www.allendowney.com/blog/2024/06/08/which-standard-deviation/
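
The difference comes down to a default: NumPy's std divides by n (ddof=0), while Pandas divides by n - 1 (ddof=1). A minimal sketch, separate from the post's full discussion:

```python
import numpy as np
import pandas as pd

sample = [1, 2, 3, 4, 5]

# NumPy defaults to ddof=0: divide by n (population standard deviation).
print(np.std(sample))              # 1.4142...

# Pandas defaults to ddof=1: divide by n - 1 (sample standard deviation).
print(pd.Series(sample).std())     # 1.5811...

# Setting ddof explicitly makes the two agree.
print(np.std(sample, ddof=1))      # 1.5811...
```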

allendowney, to random

On a recent run with a Spanish friend, we wondered whether the population of Spain would be shrinking if there were no net immigration.

The answer is in this new blog post: https://www.allendowney.com/blog/2024/06/06/migration-and-population-growth/

allendowney, to random

Penrose is a really impressive tool for generating a wide variety of diagrams: https://penrose.cs.cmu.edu/

It would be even better if it were wrapped in an ipywidget. Anyone looking for a project?

allendowney, to random

Cookiecutter Data Science was already a great way to organize a data project, and now V2 is even better.

https://drivendata.co/blog/ccds-v2

There's a lot of experience and good advice embodied in a project template.

tao, to random

In math research papers (particularly the "good" ones) one often observes a negative correlation between the conceptual difficulty of a component of an argument, and its technical difficulty: the parts that are conceptually routine or straightforward may take many pages of technical computation, whereas the parts that are conceptually interesting (and novel) are actually relatively brief, once all the more routine auxiliary steps (e.g., treatment of various lower order error terms) are stripped away.

I theorize that this is an instance of Berkson's paradox. I found the enclosed graphic from https://brilliant.org/wiki/berksons-paradox to be a good illustration of this paradox. In this (oversimplified) example, a negative correlation is seen between SAT scores and GPA in students admitted to a typical university, even though a positive correlation exists in the broader population, because students with too low of a combined SAT and GPA will get rejected from the university, whilst students with too high a score would typically go to a more prestigious school.

Similarly, mathematicians tend to write their best papers where the combined conceptual and technical difficulty of the steps of the argument is close to the upper bound of what they can handle. So steps that are conceptually and technically easy don't occupy much space in the paper, whereas steps that are both conceptually and technically hard would not have been discovered by the mathematician in the first place. This creates the aforementioned negative correlation.

Often the key to reading a lengthy paper is to first filter out all the technically complicated steps and identify the (often much shorter) conceptual core.
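
Berkson's paradox is easy to reproduce in simulation. A minimal sketch of the SAT/GPA example, with all numbers invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# "SAT" and "GPA" in standardized units, positively correlated
# in the full population.
sat = rng.normal(size=n)
gpa = 0.5 * sat + rng.normal(scale=0.9, size=n)
print(np.corrcoef(sat, gpa)[0, 1])   # about +0.5

# Selection on the collider: admit only students whose combined score
# falls in a band -- lower scores are rejected, higher scores go to a
# more prestigious school.
total = sat + gpa
admitted = (total > 0.5) & (total < 2.0)
print(np.corrcoef(sat[admitted], gpa[admitted])[0, 1])   # negative
```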

allendowney,

@tao I gave a talk about Berkson's paradox recently, which you might like: https://youtu.be/8rUm46mk0Yo

allendowney, to random

You might have 99 problems, but heteroskedasticity is not one of them.

An update from Data Q&A:
https://www.allendowney.com/blog/2024/05/26/logarithms-and-heteroskedasticity
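
The usual connection between logs and heteroskedasticity, sketched with invented data rather than the post's example: multiplicative noise looks heteroskedastic on the original scale but roughly homoskedastic after a log transform.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=1000)

# Multiplicative noise: the spread of y grows with x.
y = 2 * x * rng.lognormal(sigma=0.3, size=1000)

lo, hi = x < 5.5, x >= 5.5

# On the original scale, the spread differs between the halves...
print(np.std(y[lo]), np.std(y[hi]))

# ...but on the log scale, log y - log x has nearly constant spread.
print(np.std(np.log(y[lo]) - np.log(x[lo])),
      np.std(np.log(y[hi]) - np.log(x[hi])))
```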

allendowney, to random

Think Python 3e is off to the printer! Electronic copies should "ship" next week, and print copies in ~3 weeks.

And Bookshop.org is running a promotion:

https://bookshop.org/a/98697/9781098155438

If you make a purchase this weekend, you could get your order refunded!

allendowney, to random

Is there something like an average that can exceed the maximum of the data?

There is, and it makes more sense than it sounds.

https://www.allendowney.com/blog/2024/05/24/combining-risks/
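
Without giving away the post: one quantity with this property is the probability that at least one of several independent risks occurs, 1 - prod(1 - p). It summarizes the p's something like an average does, but it is always at least as large as the maximum of them.

```python
import numpy as np

# Probabilities of several independent risks (invented numbers).
p = np.array([0.1, 0.2, 0.3])

# Probability that at least one occurs: complement of "none occur".
combined = 1 - np.prod(1 - p)

print(combined)   # 0.496 -- exceeds max(p) = 0.3
print(p.mean())   # 0.2   -- the ordinary mean, for comparison
```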

allendowney, to random

In 1889 Joseph Bertrand posed and solved one of the oldest paradoxes in probability. But his solution is not quite correct – it is right for the wrong reason.

As always, Bayes's Theorem clears up the confusion.

https://www.allendowney.com/blog/2024/05/20/bertrands-boxes/
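
For reference, the classic setup and the Bayesian update, as a Bayes table (the post's analysis of Bertrand's own argument is at the link):

```python
import pandas as pd

# Bertrand's boxes: GG holds two gold coins, GS one gold and one
# silver, SS two silver. Pick a box at random, draw one coin: gold.
table = pd.DataFrame(index=["GG", "GS", "SS"])
table["prior"] = 1 / 3
table["likelihood"] = [1, 1 / 2, 0]     # P(draw gold | box)
table["unnorm"] = table["prior"] * table["likelihood"]
table["posterior"] = table["unnorm"] / table["unnorm"].sum()
print(table)

# P(the other coin is also gold) = P(GG | gold drawn) = 2/3
print(table.loc["GG", "posterior"])
```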

allendowney, to random

From Probably Overthinking It -- the longevity of dogs is one Simpson's paradox nested inside another:

  1. Across all species, larger animals live longer.
  2. Across dog breeds, smaller breeds live longer.
  3. Within a breed, larger individuals live longer (a simulated sketch follows below).

https://www.nytimes.com/2024/02/01/science/dogs-longevity-health.html?unlocked_article_code=1.r00.lG6A.-kTTSwpukOBV&smid=url-share
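
A simulation with invented numbers shows how patterns (2) and (3) can coexist: breeds with larger average size have shorter average lifespans, while within each breed the relationship runs the other way.

```python
import numpy as np

rng = np.random.default_rng(3)

sizes, spans = [], []
for mean_size, mean_span in [(5, 14), (20, 12), (40, 10)]:
    size = rng.normal(mean_size, 2, size=500)
    # Within a breed, larger individuals live a bit longer.
    span = mean_span + 0.3 * (size - mean_size) + rng.normal(0, 1, 500)
    sizes.append(size)
    spans.append(span)

# Pooled across breeds: negative correlation.
print(np.corrcoef(np.concatenate(sizes), np.concatenate(spans))[0, 1])

# Within each breed: positive correlation.
for size, span in zip(sizes, spans):
    print(np.corrcoef(size, span)[0, 1])
```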

allendowney, to random

The 3rd Edition of Think Python is available now at https://allendowney.github.io/ThinkPython

The print edition is available for preorder, expected to ship in June.

What's new?

  • The entire book is in Jupyter notebooks that run on Colab, so you can read the book, run the code, and work on exercises -- without installing anything.

  • Each chapter includes suggestions for using virtual assistants like ChatGPT to develop, test, and debug programs, and explore additional topics.

  • The examples that use turtle graphics now work in Jupyter notebooks!

  • More testing with doctest and unittest (a minimal example follows below).

And a new, full-color parrot on the cover!
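
On the doctest point: a reminder of what it looks like (this example is not from the book).

```python
def add(a, b):
    """Add two numbers.

    >>> add(2, 3)
    5
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod(verbose=True)
```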

allendowney, to random

What better way to spend Friday afternoon than watching me talk about Chapter 7 of Probably Overthinking It?

"Causation, Collision, and Confusion"

https://www.youtube.com/watch?v=8rUm46mk0Yo

allendowney, to random

Another installment of Data Q&A: Is it OK to compute the mean of a variable on a Likert scale?

Yes and no.

https://www.allendowney.com/blog/2024/05/03/the-mean-of-a-likert-scale/

Next week I'll discuss the correct pronunciation of Likert.
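
One standard argument on the "no" side, separate from the post's full answer: the mean depends on the arbitrary numeric coding of the categories, while the median does not.

```python
import numpy as np

# Responses on a 5-point Likert scale, coded 1..5 (invented data).
responses = np.array([1, 2, 2, 3, 4, 4, 4, 5, 5, 5])
print(np.mean(responses))    # 3.5
print(np.median(responses))  # 4.0

# Recode the top category: the codes are ordinal, so this is "legal".
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
recoded = np.array([recode[r] for r in responses])
print(np.mean(recoded))      # 5.0 -- the mean moves with the coding
print(np.median(recoded))    # 4.0 -- the median stays put
```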

allendowney, to random

I was at Google today to give a talk about Chapter 7 of Probably Overthinking It: Causation, Collision, and Confusion.

I'll post the video when it's available, but in the meantime, the slides are here: https://docs.google.com/presentation/d/e/2PACX-1vT3Wb80roqlKxQTQQlug4cRTKIZ304S453OehgE7Xpomed2OdG1xQEDGUo6el5Wfkrhfzl8Dbb79rxe/pub

allendowney, to random

This week's installment of Data Q&A is about testing differences in the 85th percentile.

https://www.allendowney.com/blog/2024/04/28/testing-percentiles/

Different models yield different p-values, but that's OK -- they don't have to be precise.
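
One model you might use -- not necessarily one of the post's -- is a permutation test. A sketch:

```python
import numpy as np

rng = np.random.default_rng(6)

def perm_test_percentile(x, y, q=85, n_perm=2001):
    """Permutation test for a difference in the q-th percentile."""
    observed = np.percentile(x, q) - np.percentile(y, q)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = (np.percentile(pooled[:len(x)], q)
                - np.percentile(pooled[len(x):], q))
        count += abs(diff) >= abs(observed)
    return count / n_perm

x = rng.normal(0.0, 1, size=200)
y = rng.normal(0.3, 1, size=200)
print(perm_test_percentile(x, y))
```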

allendowney, to random

The latest installment in the Data Q&A series is about estimating percentiles, the limits of bootstrapping, and quantifying uncertainty due to missing data.

https://www.allendowney.com/blog/2024/04/26/small-percentiles-and-missing-data/
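
A generic sketch of the kind of bootstrap the post discusses (not the post's actual code): near the edge of the data, very few points inform a small percentile, so the intervals widen and the bootstrap starts to break down.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=1000)

def bootstrap_percentile_ci(data, q, n_boot=1001):
    """Bootstrap a 95% CI for the q-th percentile."""
    stats = [np.percentile(rng.choice(data, size=len(data)), q)
             for _ in range(n_boot)]
    return np.percentile(stats, [2.5, 97.5])

print(bootstrap_percentile_ci(data, 0.2))   # small percentile: wide CI
print(bootstrap_percentile_ci(data, 50))    # median: much tighter
```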

allendowney,

@avehtari Good question -- I'm not sure. Some of the reduction in ESS (effective sample size) is because we're estimating such a small percentile, I think. But yes, there's a ton of structure in the data that the bootstrap is ignoring. Hmm...

allendowney,

@avehtari I was thinking about this on my morning run and I have a new theory -- the reduced ESS is a consequence of using KDE. Any values more than a few bandwidths away from the estimate contribute nothing.

Still not sure how much better we would do with a model that takes into account the autocorrelation. Might have to do the experiment.

allendowney,

@avehtari Thanks for looking into this! There are a couple of things I'm finding confusing here. One is that the CI you got is substantially wider than the one I got. Why is that?

allendowney,

@avehtari The other is what you said about the tails -- I expected the Gaussian tail of the KDE kernel to match the tail of the data pretty well, and the attached figure suggests that it does.

allendowney,

@avehtari A Pareto tail would be much thicker, wouldn't it?

allendowney,

@avehtari Hmm. I think the number of things not making sense to me has exceeded the number of things that can be cleared up in this medium :(

allendowney,

@avehtari OK, but doesn't the figure in my previous message indicate that the Gaussian tail of the KDE fits the data well over the range of the data? If the values below that range are a little smaller or a lot smaller, that would not affect the 0.2 percentile.

allendowney, to random

Which plot indicates a stronger relationship?

Discussion here:
https://www.allendowney.com/blog/2024/04/21/what-does-strength-mean/
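
Part of what makes the question tricky is that "strength" can mean the slope or the correlation, and the two can disagree. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1000)

# Dataset A: shallow slope, little noise -- high correlation.
y_a = 0.2 * x + rng.normal(scale=0.05, size=1000)

# Dataset B: steep slope, lots of noise -- lower correlation.
y_b = 2.0 * x + rng.normal(scale=2.0, size=1000)

for label, y in [("A", y_a), ("B", y_b)]:
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    print(f"{label}: slope={slope:.2f}, r={r:.2f}")
```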

allendowney,

@Biff_Bruise Glad to hear it is useful. The response to the Data Q&A series has been very positive -- and it is really fun to work on!
