"The Data Liberation Project is an initiative to identify, obtain, reformat, clean, document, publish, and disseminate government datasets of public interest."
[#commemorations] I'm currently finalising a #jeudedonnees (dataset) on national celebrations and commemorations in #France since 1970, so over the next few days I'll be posting a little #quizz on the topic ⤵️
I connected #Raleigh's #dataset of #trees to a #Wikipedia API query that finds nearby items of interest, and then to an OpenAI API query so the trees could describe themselves and the area around them.
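The post doesn't share any code. Here is a minimal sketch of the first two steps, assuming the "nearby items" lookup uses MediaWiki's `list=geosearch` query module. The tree fields (`species`, `address`) and the prompt wording are hypothetical, and the OpenAI call itself is left out.

```python
from urllib.parse import urlencode

# Public MediaWiki endpoint for English Wikipedia.
WIKIPEDIA_API = "https://en.wikipedia.org/w/api.php"


def geosearch_url(lat: float, lon: float, radius_m: int = 500, limit: int = 10) -> str:
    """Build a MediaWiki geosearch query for Wikipedia pages near a point."""
    params = {
        "action": "query",
        "list": "geosearch",
        "gscoord": f"{lat}|{lon}",  # "lat|lon", URL-encoded by urlencode
        "gsradius": radius_m,       # search radius in metres (API maximum is 10000)
        "gslimit": limit,           # maximum number of pages to return
        "format": "json",
    }
    return f"{WIKIPEDIA_API}?{urlencode(params)}"


def nearby_titles(response: dict) -> list[str]:
    """Extract page titles from a geosearch JSON response."""
    return [page["title"] for page in response.get("query", {}).get("geosearch", [])]


def tree_prompt(tree: dict, titles: list[str]) -> str:
    """Compose a first-person prompt for a language model.

    The 'species' and 'address' keys are hypothetical; the real
    Raleigh tree dataset may use different field names.
    """
    places = ", ".join(titles) if titles else "nothing notable"
    return (
        f"You are a {tree['species']} tree at {tree['address']}. "
        f"Nearby Wikipedia articles: {places}. "
        "Describe yourself and your surroundings in two sentences."
    )
```

In use, you would fetch `geosearch_url(lat, lon)` for each tree, pass the decoded JSON to `nearby_titles`, and send the string from `tree_prompt` to whichever chat-completion endpoint you use. Building the prompt separately from the HTTP calls keeps each step easy to inspect without network access.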
Researchers teach an #AI to write better chart captions.
A new #dataset can help scientists develop automatic systems that generate richer, more descriptive captions for online charts, with the aim of making them more accessible to #blind people.
The MIT researchers found that #MachineLearning models trained for autocaptioning with their dataset consistently generated captions that were precise, semantically rich, and described data trends and complex patterns.
Has anybody done a Subject Access Request to Nectar/Tesco Clubcard? Given that should have pretty much all the grocery and fuel purchase for the last 10+ years it should make an interesting inflation data set
AI-TRIGGER WARNING: I've asked ChatGPT to revise my writing because it was ass (writing a stream of coherent-looking text is not my forte). Proceed at your own discretion...
You might be a #linguist or an #ML #engineer, working on things like data specifications, filtering or pre-processing, or training #ASR, #STT or #TTS models; or you might work in #fairness or #bias evaluation.
If so, I’d love your help to understand current #dataset #documentation practices, and what we can do to make them better, as part of my #PhD #research 🤓 ⌨️ 🎤
The #survey takes 10-20 minutes to complete, and you can opt in to win one of 3 gift cards valued at AUD $50 each.
Research Protocol 2021/427 approved by the #ANU Human Research Ethics Committee
Do you work with #voice or #speech #data? You might contribute data, write data specifications for collection, perform filtering or pre-processing, train #ASR or #TTS models, or design or perform evaluations of #ML speech models.
If so, I’d love your help to understand current #dataset #documentation practices, and what we can do to make them better, as part of my #PhD #research
The #survey takes 10-20 minutes to complete, and you can opt in to win one of 3 gift cards valued at AUD $50 each.
Research Protocol 2021/427 approved by the #ANU Human Research Ethics Committee
Delighted to be able to publicise a paper that was presented at the @ALTAnlp 2023 Workshop at the end of last year, co-authored with my #PhD supervisor, Associate Professor @eltwilliams, and written as part of my research at #ANU School of Cybernetics.
Titled "Right the docs: Characterising voice dataset documentation practices used in machine learning", it combines exploratory interviews and documentation analysis to characterise how large voice datasets (e.g. #LibriSpeech, @mozilla's #CommonVoice, and several others) document their #metadata.
Unsurprisingly, it finds that current #dataset #documentation practices do not meet the needs of the #ML practitioners who use these datasets.
We show, once again, in the words of Nithya Sambasivan - "everyone wants to do the model work, but nobody wants to do the data work" ...
Reid, K., Williams, E.T., 2023. Right the docs: Characterising voice dataset documentation practices used in machine learning, in: Muresan, S., Chen, V., Casey, K., David, V., Nina, D., Koji, I., Erik, E., Stefan, U. (Eds.), Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association. Association for Computational Linguistics, Melbourne, Australia, pp. 51–66.
"The Arsenical Books Database — part of the Winterthur Museum and the University of Delaware’s Poison Book Project — has identified hundreds of examples of 19th-century books that used [green pigments containing arsenic] in their covers and other binding components."
Need help saving Reddit threads to Obsidian (for post-blackout reasons)