allendowney
@allendowney@fosstodon.org

Professor emeritus at Olin College, Principal Data Scientist at PyMC Labs, author of Think Python, author of Probably Overthinking It, and stark raving Bayesian.


allendowney, to random

If you compute the standard deviation of the same sample with NumPy and Pandas, you get different answers.

Why? And which one is right?

It's another installment of Data Q&A: Answering the Real Questions with Python.
https://www.allendowney.com/blog/2024/06/08/which-standard-deviation/
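
The difference comes down to a default: NumPy's std divides by n (ddof=0), while Pandas divides by n - 1 (ddof=1). A minimal sketch, separate from the post's full discussion:

```python
import numpy as np
import pandas as pd

sample = [1, 2, 3, 4, 5]

# NumPy defaults to ddof=0: divide by n (population standard deviation).
print(np.std(sample))              # 1.4142...

# Pandas defaults to ddof=1: divide by n - 1 (sample standard deviation).
print(pd.Series(sample).std())     # 1.5811...

# Setting ddof explicitly makes the two agree.
print(np.std(sample, ddof=1))      # 1.5811...
```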

allendowney, to random

On a recent run with a Spanish friend, we wondered whether the population of Spain would be shrinking if there were no net immigration.

The answer is in this new blog post: https://www.allendowney.com/blog/2024/06/06/migration-and-population-growth/

allendowney, to random

Penrose is a really impressive tool for generating a wide variety of diagrams: https://penrose.cs.cmu.edu/

It would be even better if it were wrapped in an ipywidget. Anyone looking for a project?

allendowney, to random

Cookiecutter Data Science was already a great way to organize a data project, and now V2 is even better.

https://drivendata.co/blog/ccds-v2

There's a lot of experience and good advice embodied in a project template.

tao, to random

In math research papers (particularly the "good" ones) one often observes a negative correlation between the conceptual difficulty of a component of an argument, and its technical difficulty: the parts that are conceptually routine or straightforward may take many pages of technical computation, whereas the parts that are conceptually interesting (and novel) are actually relatively brief, once all the more routine auxiliary steps (e.g., treatment of various lower order error terms) are stripped away.

I theorize that this is an instance of Berkson's paradox. I found the enclosed graphic from https://brilliant.org/wiki/berksons-paradox to be a good illustration of this paradox. In this (oversimplified) example, a negative correlation is seen between SAT scores and GPA in students admitted to a typical university, even though a positive correlation exists in the broader population, because students with too low of a combined SAT and GPA will get rejected from the university, whilst students with too high a score would typically go to a more prestigious school.

Similarly, mathematicians tend to write their best papers where the combined conceptual and technical difficulty of the steps of the argument is close to the upper bound of what they can handle. So steps that are conceptually and technically easy don't occupy much space in the paper, whereas steps that are both conceptually and technically hard would not have been discovered by the mathematician in the first place. This creates the aforementioned negative correlation.

Often the key to reading a lengthy paper is to first filter out all the technically complicated steps and identify the (often much shorter) conceptual core.
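
Berkson's paradox is easy to reproduce in simulation. A minimal sketch of the SAT/GPA example, with all numbers invented for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# "SAT" and "GPA" in standardized units, positively correlated
# in the full population.
sat = rng.normal(size=n)
gpa = 0.5 * sat + rng.normal(scale=0.9, size=n)
print(np.corrcoef(sat, gpa)[0, 1])   # about +0.5

# Selection on the collider: admit only students whose combined score
# falls in a band -- lower scores are rejected, higher scores go to a
# more prestigious school.
total = sat + gpa
admitted = (total > 0.5) & (total < 2.0)
print(np.corrcoef(sat[admitted], gpa[admitted])[0, 1])   # negative
```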

allendowney,

@tao I gave a talk about Berkson's paradox recently, which you might like: https://youtu.be/8rUm46mk0Yo

allendowney, to random

You might have 99 problems, but heteroskedasticity is not one of them.

An update from Data Q&A:
https://www.allendowney.com/blog/2024/05/26/logarithms-and-heteroskedasticity
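
The usual connection between logs and heteroskedasticity, sketched with invented data rather than the post's example: multiplicative noise looks heteroskedastic on the original scale but roughly homoskedastic after a log transform.

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(1, 10, size=1000)

# Multiplicative noise: the spread of y grows with x.
y = 2 * x * rng.lognormal(sigma=0.3, size=1000)

lo, hi = x < 5.5, x >= 5.5

# On the original scale, the spread differs between the halves...
print(np.std(y[lo]), np.std(y[hi]))

# ...but on the log scale, log y - log x has nearly constant spread.
print(np.std(np.log(y[lo]) - np.log(x[lo])),
      np.std(np.log(y[hi]) - np.log(x[hi])))
```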

allendowney, to random

Think Python 3e is off to the printer! Electronic copies should "ship" next week, and print copies in ~3 weeks.

And Bookshop.org is running a promotion:

https://bookshop.org/a/98697/9781098155438

If you make a purchase this weekend, you could get your order refunded!

allendowney, to random

Is there something like an average that can exceed the maximum of the data?

There is, and it makes more sense than it sounds.

https://www.allendowney.com/blog/2024/05/24/combining-risks/
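
Without giving away the post: one quantity with this property is the probability that at least one of several independent risks occurs, 1 - prod(1 - p). It summarizes the p's something like an average does, but it is always at least as large as the maximum of them.

```python
import numpy as np

# Probabilities of several independent risks (invented numbers).
p = np.array([0.1, 0.2, 0.3])

# Probability that at least one occurs: complement of "none occur".
combined = 1 - np.prod(1 - p)

print(combined)   # 0.496 -- exceeds max(p) = 0.3
print(p.mean())   # 0.2   -- the ordinary mean, for comparison
```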

allendowney, to random

In 1889 Joseph Bertrand posed and solved one of the oldest paradoxes in probability. But his solution is not quite correct – it is right for the wrong reason.

As always, Bayes's Theorem clears up the confusion.

https://www.allendowney.com/blog/2024/05/20/bertrands-boxes/
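
For reference, the classic setup and the Bayesian update, as a Bayes table (the post's analysis of Bertrand's own argument is at the link):

```python
import pandas as pd

# Bertrand's boxes: GG holds two gold coins, GS one gold and one
# silver, SS two silver. Pick a box at random, draw one coin: gold.
table = pd.DataFrame(index=["GG", "GS", "SS"])
table["prior"] = 1 / 3
table["likelihood"] = [1, 1 / 2, 0]     # P(draw gold | box)
table["unnorm"] = table["prior"] * table["likelihood"]
table["posterior"] = table["unnorm"] / table["unnorm"].sum()
print(table)

# P(the other coin is also gold) = P(GG | gold drawn) = 2/3
print(table.loc["GG", "posterior"])
```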

allendowney, to random

From Probably Overthinking It -- the longevity of dogs is one Simpson's paradox nested inside another:

  1. Across all species, larger animals live longer.
  2. Across dog breeds, smaller breeds live longer.
  3. Within a breed, larger individuals live longer (a simulated sketch follows below).

https://www.nytimes.com/2024/02/01/science/dogs-longevity-health.html?unlocked_article_code=1.r00.lG6A.-kTTSwpukOBV&smid=url-share
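
A simulation with invented numbers shows how patterns (2) and (3) can coexist: breeds with larger average size have shorter average lifespans, while within each breed the relationship runs the other way.

```python
import numpy as np

rng = np.random.default_rng(3)

sizes, spans = [], []
for mean_size, mean_span in [(5, 14), (20, 12), (40, 10)]:
    size = rng.normal(mean_size, 2, size=500)
    # Within a breed, larger individuals live a bit longer.
    span = mean_span + 0.3 * (size - mean_size) + rng.normal(0, 1, 500)
    sizes.append(size)
    spans.append(span)

# Pooled across breeds: negative correlation.
print(np.corrcoef(np.concatenate(sizes), np.concatenate(spans))[0, 1])

# Within each breed: positive correlation.
for size, span in zip(sizes, spans):
    print(np.corrcoef(size, span)[0, 1])
```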

allendowney, to random

The 3rd Edition of Think Python is available now at https://allendowney.github.io/ThinkPython

The print edition is available for preorder, expected to ship in June.

What's new?

  • The entire book is in Jupyter notebooks that run on Colab, so you can read the book, run the code, and work on exercises -- without installing anything.

  • Each chapter includes suggestions for using virtual assistants like ChatGPT to develop, test, and debug programs, and explore additional topics.

  • The examples that use turtle graphics now work in Jupyter notebooks!

  • More testing with doctest and unittest (a minimal example follows below).

And a new, full-color parrot on the cover!
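
On the doctest point: a reminder of what it looks like (this example is not from the book).

```python
def add(a, b):
    """Add two numbers.

    >>> add(2, 3)
    5
    """
    return a + b

if __name__ == "__main__":
    import doctest
    doctest.testmod(verbose=True)
```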

allendowney, to random

What better way to spend Friday afternoon than watching me talk about Chapter 7 of Probably Overthinking It?

"Causation, Collision, and Confusion"

https://www.youtube.com/watch?v=8rUm46mk0Yo

allendowney, to random

Another installment of Data Q&A: Is it OK to compute the mean of a variable on a Likert scale?

Yes and no.

https://www.allendowney.com/blog/2024/05/03/the-mean-of-a-likert-scale/

Next week I'll discuss the correct pronunciation of Likert.
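
One standard argument on the "no" side, separate from the post's full answer: the mean depends on the arbitrary numeric coding of the categories, while the median does not.

```python
import numpy as np

# Responses on a 5-point Likert scale, coded 1..5 (invented data).
responses = np.array([1, 2, 2, 3, 4, 4, 4, 5, 5, 5])
print(np.mean(responses))    # 3.5
print(np.median(responses))  # 4.0

# Recode the top category: the codes are ordinal, so this is "legal".
recode = {1: 1, 2: 2, 3: 3, 4: 4, 5: 10}
recoded = np.array([recode[r] for r in responses])
print(np.mean(recoded))      # 5.0 -- the mean moves with the coding
print(np.median(recoded))    # 4.0 -- the median stays put
```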

allendowney, to random

I was at Google today to give a talk about Chapter 7 of Probably Overthinking It: Causation, Collision, and Confusion.

I'll post the video when it's available, but in the meantime, the slides are here: https://docs.google.com/presentation/d/e/2PACX-1vT3Wb80roqlKxQTQQlug4cRTKIZ304S453OehgE7Xpomed2OdG1xQEDGUo6el5Wfkrhfzl8Dbb79rxe/pub

allendowney, to random

This week's installment of Data Q&A is about testing differences in the 85th percentile.

https://www.allendowney.com/blog/2024/04/28/testing-percentiles/

Different models yield different p-values, but that's OK -- they don't have to be precise.
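
One model you might use -- not necessarily one of the post's -- is a permutation test. A sketch:

```python
import numpy as np

rng = np.random.default_rng(6)

def perm_test_percentile(x, y, q=85, n_perm=2001):
    """Permutation test for a difference in the q-th percentile."""
    observed = np.percentile(x, q) - np.percentile(y, q)
    pooled = np.concatenate([x, y])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = (np.percentile(pooled[:len(x)], q)
                - np.percentile(pooled[len(x):], q))
        count += abs(diff) >= abs(observed)
    return count / n_perm

x = rng.normal(0.0, 1, size=200)
y = rng.normal(0.3, 1, size=200)
print(perm_test_percentile(x, y))
```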

allendowney, to random

The latest installment in the Data Q&A series is about estimating percentiles, the limits of bootstrapping, and quantifying uncertainty due to missing data.

https://www.allendowney.com/blog/2024/04/26/small-percentiles-and-missing-data/
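
A generic sketch of the kind of bootstrap the post discusses (not the post's actual code): near the edge of the data, very few points inform a small percentile, so the intervals widen and the bootstrap starts to break down.

```python
import numpy as np

rng = np.random.default_rng(4)
data = rng.normal(size=1000)

def bootstrap_percentile_ci(data, q, n_boot=1001):
    """Bootstrap a 95% CI for the q-th percentile."""
    stats = [np.percentile(rng.choice(data, size=len(data)), q)
             for _ in range(n_boot)]
    return np.percentile(stats, [2.5, 97.5])

print(bootstrap_percentile_ci(data, 0.2))   # small percentile: wide CI
print(bootstrap_percentile_ci(data, 50))    # median: much tighter
```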

allendowney,

@avehtari Good question -- I'm not sure. Some of the reduction in ESS (effective sample size) is because we're estimating such a small percentile, I think. But yes, there's a ton of structure in the data that the bootstrap is ignoring. Hmm...

allendowney,

@avehtari I was thinking about this on my morning run and I have a new theory -- the reduced ESS is a consequence of using KDE. Any values more than a few bandwidths away from the estimate contribute nothing.

Still not sure how much better we would do with a model that takes into account the autocorrelation. Might have to do the experiment.

allendowney,

@avehtari Thanks for looking into this! There are a couple of things I'm finding confusing here. One is that the CI you got is substantially wider than the one I got. Why is that?

allendowney,

@avehtari The other is what you said about the tails -- I expected the Gaussian tail of the KDE kernel to match the tail of the data pretty well, and the attached figure suggests that it does.

allendowney,

@avehtari A Pareto tail would be much thicker, wouldn't it?

allendowney,

@avehtari Hmm. I think the number of things not making sense to me has exceeded the number of things that can be cleared up in this medium :(

allendowney,

@avehtari OK, but doesn't the figure in my previous message indicate that the Gaussian tail of the KDE fits the data well over the range of the data? If the values below that range are a little smaller or a lot smaller, that would not affect the 0.2 percentile.

allendowney, to random

Which plot indicates a stronger relationship?

Discussion here:
https://www.allendowney.com/blog/2024/04/21/what-does-strength-mean/
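
Part of what makes the question tricky is that "strength" can mean the slope or the correlation, and the two can disagree. A sketch with invented data:

```python
import numpy as np

rng = np.random.default_rng(5)
x = rng.normal(size=1000)

# Dataset A: shallow slope, little noise -- high correlation.
y_a = 0.2 * x + rng.normal(scale=0.05, size=1000)

# Dataset B: steep slope, lots of noise -- lower correlation.
y_b = 2.0 * x + rng.normal(scale=2.0, size=1000)

for label, y in [("A", y_a), ("B", y_b)]:
    slope = np.polyfit(x, y, 1)[0]
    r = np.corrcoef(x, y)[0, 1]
    print(f"{label}: slope={slope:.2f}, r={r:.2f}")
```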

allendowney,

@Biff_Bruise Glad to hear it is useful. The response to the Data Q&A series has been very positive -- and it is really fun to work on!
