CCochard,
@CCochard@mastodon.social avatar

2019 - BBC article "AAAS: Machine learning 'causing science crisis'"
The two main points are:
1- ML can find patterns that don't hold up in other datasets (see the sketch at the end of this post)
2- no understanding of the uncertainties in ML results.

My question is: are those issues intrinsic to ML, or are they user-based?

I understand the article is a few years old and that things change quickly in such a dynamic field
https://www.bbc.com/news/science-environment-47267081

#MachineLearning #Reproducibility
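A minimal sketch of point 1, on made-up data (everything below is pure noise, and the scikit-learn calls are just one plausible way to show the effect): if you screen many candidate features against the outcome and then evaluate on the same samples, you get a seemingly strong "pattern" that evaporates on an independent dataset.

# Sketch of point 1: a "pattern" found in pure noise does not carry over to new data.
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2000))          # 100 samples, 2000 candidate features, all noise
y = rng.integers(0, 2, size=100)          # random binary labels

# Pick the 10 features most "associated" with y, then fit and score on those same samples.
selector = SelectKBest(f_classif, k=10).fit(X, y)
clf = LogisticRegression().fit(selector.transform(X), y)
print("accuracy on the data used for selection:", clf.score(selector.transform(X), y))

# The same selected features and model, applied to an independent noise dataset,
# fall back to roughly chance level.
X_new = rng.normal(size=(100, 2000))
y_new = rng.integers(0, 2, size=100)
print("accuracy on fresh data:", clf.score(selector.transform(X_new), y_new))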

CCochard,
@CCochard@mastodon.social avatar

@HydrePrever is this related to what we were talking about last time, about students (aka future scientists) not wanting to know what's under the hood?

HydrePrever,
@HydrePrever@mathstodon.xyz avatar

@CCochard it's clear to me that some ML methods are "black box by design". Many users (not only students) are happy with them the way they are. The pressure to produce (papers in academic research, deliverables for the customer in stats consulting organisations) certainly has something to do with it...

CCochard,
@CCochard@mastodon.social avatar

@HydrePrever That's interesting!
Are you saying that some ML code is designed to be a black box, and that it would be a real hassle to understand the system if one tried to?
Or maybe with less intent: it wasn't designed with the idea that people would want to look inside the black box, so there's no such functionality?

HydrePrever,
@HydrePrever@mathstodon.xyz avatar

@CCochard it's somewhere in between. I'm no specialist, but I'd say that some methods learn too many parameters for the resulting predictor to be interpretable, e.g. random forests or neural networks. One of the most popular ways to try to peek under the hood is to identify important variables, for example by removing one variable at a time and looking at how much the prediction quality drops. But I'm really not competent, @krazykitty may have a different opinion.
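A minimal sketch of that variable-importance idea, using scikit-learn's permutation_importance (which shuffles a column rather than removing it, but gets at the same question); the dataset here is synthetic and purely illustrative:

# Sketch: permutation importance on a random forest (synthetic data).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# 5 features, only the first 2 carry real signal (shuffle=False keeps them first).
X, y = make_classification(n_samples=1000, n_features=5, n_informative=2,
                           n_redundant=0, shuffle=False, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

# Shuffle one feature at a time and measure how much the test score drops.
result = permutation_importance(model, X_test, y_test, n_repeats=20, random_state=0)
for i, (mean, std) in enumerate(zip(result.importances_mean, result.importances_std)):
    print(f"feature {i}: score drop {mean:.3f} +/- {std:.3f}")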

krazykitty,
@krazykitty@mamot.fr avatar

deleted_by_author

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    CCochard,
    @CCochard@mastodon.social avatar

    @krazykitty @HydrePrever Thank you for the explanations! Do you have any reasonably accessible resources on the limits of ML? (I am decently skilled at math, for a physicist...)

    CCochard,
    @CCochard@mastodon.social avatar

    @krazykitty @HydrePrever In a weird turn of events, my use of ML so far has been to find small features, to see whether it was worth developing a physics-based model to understand them.
    The advantage of using PCA (in the case I am thinking of) is that I reduced the number of parameters compared to the physics-based model. Seeing that there's potentially something interesting makes the time investment a lot more worthwhile.
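    A rough sketch of that kind of PCA screening, on synthetic data (the numbers and names are made up; this is just the general shape of the workflow): if a handful of components explain most of the variance, there is low-dimensional structure that may be worth a physics-based model.

    # Sketch: quick PCA screening before investing in a physics-based model.
    import numpy as np
    from sklearn.decomposition import PCA
    from sklearn.preprocessing import StandardScaler

    rng = np.random.default_rng(0)
    # Hypothetical measurements: 200 samples of 30 correlated observables
    # generated from only 2 underlying degrees of freedom, plus noise.
    latent = rng.normal(size=(200, 2))
    mixing = rng.normal(size=(2, 30))
    data = latent @ mixing + 0.1 * rng.normal(size=(200, 30))

    # Standardize, then keep enough components to explain ~95% of the variance.
    scaled = StandardScaler().fit_transform(data)
    pca = PCA(n_components=0.95).fit(scaled)
    print("components kept:", pca.n_components_)
    print("explained variance ratios:", np.round(pca.explained_variance_ratio_, 3))

    # The reduced coordinates are what you'd eyeball for interesting features.
    reduced = pca.transform(scaled)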

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    HydrePrever,
    @HydrePrever@mathstodon.xyz avatar

    @krazykitty @CCochard yep it's different but related: if you don't care much about what's under the hood, then when you get the occasion to see that under the hood there's a hotchpotch of pipes and commutators, you're comforted in your decision not to worry about it

    jaztrophysicist,

    deleted_by_author

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    CCochard,
    @CCochard@mastodon.social avatar

    @krazykitty @HydrePrever Really? I am not saying that the results would be published coming straight out of ML, but it motivates the development of our traditional models, which are extremely time-consuming.

    I struggle to see how it's different from me plotting my data and checking that there's indeed something there before going through an in-depth analysis. But maybe that's the problem with naive ML users.

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    jaztrophysicist, (edited)

    deleted_by_author

    dioux70,

    @jaztrophysicist @krazykitty
    In the same vein, when I was in uni (mind you, I'm old, this was a time before deep learning "won", when all kinds of AI approaches were still competing and image recognition was still a huge challenge), the following tale (possibly an urban legend) circulated:

    A lab had been approached by the military to make an automatic threat classification algo for vehicles - basically, tell whether a photo contains a tank or only toothless trucks.

    They tried a neural network approach: scanned the supplied photos, added the metadata, divided the files into two sets, a learning set and a test set, and let the network learn. And, miracle! The test set gave nearly 100% correct results!

    They were not dumb; they knew that was suspiciously good, especially in those old days when it was a challenge to get any sufficiently complex neural network to run in a practical time on the available processing power - you usually had to trade precision for feasibility. But they couldn't find any bug: the same algo worked as expected on other problems, they had cross-checked the separation of the learning and testing sets, and they had verified that the algo was only using the pixels and not metadata, file paths or anything like that. And time was running out. So they declared the prototype ready for testing.

    And when the military came in with new photos, it all came crashing down with a 50% correct result: basically, the algo was doing no better than a randomiser.

    After pulling their hair out trying to find a bug, a frustrated programmer decided to rework everything from scratch and started pinning all the learning-set photos on a break room wall, grouping them by similarity. And there, the problem dawned on them.

    The military photographer, instructed to provide "one set of tank photos and one set of truck photos", didn't bother sorting the photos after development. They just set out one day to photograph tanks, the next day to photograph trucks, labeled the rolls, and were done with it. (This happened when serious photographers still used film rolls, not digital.) And the weather had changed during the night.

    So the "tank detection algo" was actually just a "blue sky detection algo"...

    @CCochard @HydrePrever
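    A toy sketch of that failure mode (obviously not the original system; the "brightness" feature and all numbers are invented): when a nuisance variable is perfectly correlated with the label in the supplied photos, a classifier can ace a held-out split drawn from the same rolls and still drop to chance on genuinely new data.

    # Toy sketch of the "blue sky detector": the label is confounded with
    # brightness in the supplied photos, but not in the new ones.
    import numpy as np
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split

    rng = np.random.default_rng(0)

    def fake_photos(n, confounded):
        labels = rng.integers(0, 2, size=n)                    # 0 = truck, 1 = tank
        if confounded:
            brightness = labels + 0.1 * rng.normal(size=n)     # tanks shot on the bright day
        else:
            brightness = rng.normal(size=n)                     # brightness unrelated to label
        pixels = rng.normal(size=(n, 20))                      # "pixels" carrying no real signal
        return np.column_stack([brightness, pixels]), labels

    X, y = fake_photos(2000, confounded=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
    clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    print("held-out photos from the same rolls:", clf.score(X_test, y_test))

    X_new, y_new = fake_photos(2000, confounded=False)
    print("the military's new photos:", clf.score(X_new, y_new))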

    gboussard,
    @gboussard@mamot.fr avatar

    @dioux70 @jaztrophysicist @krazykitty @CCochard @HydrePrever
    #ShitInShitOut!
    You can't do anything with a shitty dataset, no matter how much you tune the algorithm…

    krazykitty,
    @krazykitty@mamot.fr avatar

    deleted_by_author

    gboussard,
    @gboussard@mamot.fr avatar

    @krazykitty @dioux70 @jaztrophysicist @CCochard @HydrePrever
    Is there a real-world dataset that is not garbage? (Real question)

    jaztrophysicist,

    deleted_by_author
