KathyReid, (edited ) to ML
@KathyReid@aus.social avatar

Delighted to be able to publicise a paper that was presented at the @ALTAnlp 2023 Workshop at the end of last year, co-authored with my #PhD supervisor, Associate Professor @eltwilliams, and written as part of my research at #ANU School of Cybernetics.

Titled "Right the docs: Characterising voice dataset documentation practices used in machine learning", it combines both exploratory interviews and documentation analysis to characterise how large voice datasets - e.g. #LibriSpeech, @mozilla's #CommonVoice, and several others, document their #metadata.

Unsurprisingly, it finds that the #dataset #documentation practices seen currently do not meet the needs of the #ML practitioners who use these datasets.

We show, once again, in the words of Nithya Sambasivan - "everyone wants to do the model work, but nobody wants to do the data work" ...

https://aclanthology.org/2023.alta-1.6/

#RightTheDocs #WriteTheDocs

Citation:

Reid, K., Williams, E.T., 2023. Right the docs: Characterising voice dataset documentation practices used in machine learning, in: Muresan, S., Chen, V., Casey, K., David, V., Nina, D., Koji, I., Erik, E., Stefan, U. (Eds.), Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association. Association for Computational Linguistics, Melbourne, Australia, pp. 51–66.

  • All
  • Subscribed
  • Moderated
  • Favorites
  • megavids
  • thenastyranch
  • rosin
  • GTA5RPClips
  • osvaldo12
  • love
  • Youngstown
  • slotface
  • khanakhh
  • everett
  • kavyap
  • mdbf
  • DreamBathrooms
  • ngwrru68w68
  • provamag3
  • magazineikmin
  • InstantRegret
  • normalnudes
  • tacticalgear
  • cubers
  • ethstaker
  • modclub
  • cisconetworking
  • Durango
  • anitta
  • Leos
  • tester
  • JUstTest
  • All magazines