KathyReid (@KathyReid@aus.social)

Delighted to be able to publicise a paper that was presented at the @ALTAnlp 2023 Workshop at the end of last year, co-authored with my supervisor, Associate Professor @eltwilliams, and written as part of my research at the School of Cybernetics.

Titled "Right the docs: Characterising voice dataset documentation practices used in machine learning", it combines both exploratory interviews and documentation analysis to characterise how large voice datasets - e.g. , @mozilla's , and several others, document their .

Unsurprisingly, it finds that the practices seen currently do not meet the needs of the practitioners who use these datasets.

We show, once again, that, in the words of Nithya Sambasivan, "everyone wants to do the model work, but nobody wants to do the data work".

https://aclanthology.org/2023.alta-1.6/

Citation:

Reid, K., Williams, E.T., 2023. Right the docs: Characterising voice dataset documentation practices used in machine learning, in: Muresan, S., Chen, V., Casey, K., David, V., Nina, D., Koji, I., Erik, E., Stefan, U. (Eds.), Proceedings of the 21st Annual Workshop of the Australasian Language Technology Association. Association for Computational Linguistics, Melbourne, Australia, pp. 51–66.
