#ASR - kbin.social

KathyReid, 16 days ago to philosophy

Folks, I'm starting my post-#PhD job search low-key on the side while I write up my #thesis.

I have an odd collection of skills - #Linux, #Python, #Jupyter, #pandas, #DevRel - and I've done a lot of work in team leadership and management, and have led a multi-million-dollar not-for-profit in the past. Keynote speaker.

My speciality is #voice and #speech AI, more on the #ASR side with models like #Whisper.

I'm looking for something that harnesses all of these skills - and it will be a senior role with senior pay, given my experience, qualifications and proven capability. I have time and will be discerning about my next step.

Job titles that might fit here would be Senior Research Engineer, Engineering Lead, Lead AI Engineer or similar.

Looking for fully remote work, with one day a fortnight max in #Melbourne, AU. If you don't believe in #RemoteWork or #WFH, we're not a good fit.

Super keen on something full time rather than splitting my attention over multiple part-time roles.

Looking to start around August, so a fair amount of lead time.

Keen on organisations that have strong values alignment - #FAIR and #CARE data use, #EthicalAI, AI for social good.

No crypto, no web3, no deepfake stuff.

Check out my LinkedIn for more info on my background:
https://www.linkedin.com/in/kathyreid/

#FediHired #FediJobs #GetFediHired

KathyReid, 2 months ago to mastodon

A warm welcome to #Mastodon to @thorstenvoice - one of the best communicators about #ASR #TTS and #STT in the world. His #OpenSource #German #Deutsche dataset is in use in many places.

Please make Thorsten welcome 👋

#Introduction

thorstenvoice, 1 month ago

Thanks @potungthul for your nice welcome 😊.

To clear up the hashtags a little bit:
Think of the components of a voice assistant / smartspeaker.

You need #stt (speech-to-text) or #asr (automatic speech recognition) on the "input" side of a user request and #tts (text-to-speech) on the "output" side.

To throw in another technology - #nlp (natural language processing) is used in the "middle" to really understand what the user request is all about.

cc: @KathyReid
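
The input / middle / output split described above can be sketched as a tiny pipeline. This is only an illustration of the architecture: every function below is a hypothetical placeholder (no real speech model is called), and the "audio" is just UTF-8 text standing in for a recording.

```python
from dataclasses import dataclass, field


@dataclass
class Intent:
    """What the NLP stage thinks the user wants (hypothetical structure)."""
    name: str
    slots: dict = field(default_factory=dict)


def speech_to_text(audio: bytes) -> str:
    """STT/ASR stage, the "input" side. Placeholder: a real system runs a speech model."""
    return audio.decode("utf-8")  # assumption: the bytes already contain the words


def understand(text: str) -> Intent:
    """NLP stage, the "middle". Placeholder: a toy keyword rule, not real NLP."""
    if "weather" in text.lower():
        return Intent("get_weather")
    return Intent("unknown", {"utterance": text})


def respond(intent: Intent) -> str:
    """Decide what to say back for a given intent."""
    if intent.name == "get_weather":
        return "Here is the weather forecast."
    return "Sorry, I did not understand that."


def text_to_speech(text: str) -> bytes:
    """TTS stage, the "output" side. Placeholder: a real system synthesises audio."""
    return text.encode("utf-8")


def handle_request(audio: bytes) -> bytes:
    """Chain the stages: STT/ASR -> NLP -> TTS."""
    text = speech_to_text(audio)
    intent = understand(text)
    return text_to_speech(respond(intent))


print(handle_request(b"What is the weather today?"))
```

Swapping any placeholder for a real engine leaves the other stages untouched, which is the main reason the three hashtags are worth keeping apart.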

KathyReid, 3 months ago to datascience

For folks who work in #DataScience, what's the easiest way for me to calculate the #CosineSimilarity of two strings? I'm looking at sklearn cosine_similarity first.

Related to hallucination detection in #ASR - low cosine similarity indicative of hallucination.
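
One possible starting point, as a minimal sketch rather than a recommendation: a dependency-free bag-of-words cosine similarity. The sklearn route would first need the strings turned into vectors (for example with `CountVectorizer` or `TfidfVectorizer`) before calling `cosine_similarity`; the version below does the same arithmetic by hand. The function names, the tokenisation rule and the sample sentences are all assumptions made up for illustration, and the post does not say which two strings would be compared.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and split it into simple word tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


def cosine_similarity_strings(a: str, b: str) -> float:
    """Cosine similarity of two strings using raw word counts as vectors."""
    vec_a, vec_b = Counter(tokenize(a)), Counter(tokenize(b))
    # Only words present in both strings contribute to the dot product.
    dot = sum(vec_a[t] * vec_b[t] for t in vec_a.keys() & vec_b.keys())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty string has no direction to compare
    return dot / (norm_a * norm_b)


# Made-up sample strings, not real transcripts.
reference = "the quick brown fox jumps over the lazy dog"
hypothesis = "the quick brown fox jumped over a lazy dog"
unrelated = "buy premium golf clubs today"

print(cosine_similarity_strings(reference, hypothesis))  # high: mostly shared words
print(cosine_similarity_strings(reference, unrelated))   # zero: no shared words
```

Word-count vectors only measure vocabulary overlap, so a paraphrase scores low even when the meaning matches; sentence embeddings would be the usual next step if that matters.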

KathyReid, 4 months ago to technology

Here's a #BookReview I wrote of Tobias Dengel and Karl Weber's "The Sound of the Future" - which claims that #voice #technology like #ASR, #TTS and #synthetic #speech are transformative, and that businesses should start to invest heavily in them.

While the book covers a lot of ground, it leaves many more critical questions unanswered in its unabashed techno-optimism.

https://blog.kathyreid.id.au/2024/01/04/book-review-the-sound-of-the-future-by-tobias-dengel-with-karl-weber/

governa, 6 months ago to ai

Automatic Speech Recognition #Ai Assistant :raspberrypi:

Turning a #RaspberryPi 4B into a satellite for a self-hosted language model, all with a sprinkle of #ASR and #NordVPN Meshnet

https://hackaday.io/project/193635-automatic-speech-recognition-ai-assistant

Kiloku, 7 months ago to random

Does anyone here know how to use #kaldi or other speech recognition stuff?

I tried whisper.cpp but it apparently can only use OpenAI's models, so it's not an option, on ethical grounds.

I want to implement cross-platform voice commands into #Freespace Open, as it currently only works in Windows with the Microsoft SAPI.

(boosts welcome)

#speechrecognition #ASR

chenzi, 7 months ago to linguistics

Are you looking for a Cantonese forced aligner? Check out my new #Kaldi (source) tutorial on training models of HK Cantonese: 🌟 https://chenzixu.rbind.io/resources/3asr/sr3/

I also replicated the process of training acoustic models for HK Cantonese in a streamlined MFA workflow. It is easily applicable to many other languages. Check out the MFA tutorial: 🌟 https://chenzixu.rbind.io/resources/3asr/sr4/

#linguistics #phonetics #ASR #academic #speechrecognition

KathyReid, 8 months ago to OpenAI

For folks who work with #ASR #SpeechRecognition, specifically #Whisper from #OpenAI - I have heard some anecdotal evidence of transcription with the medium-en model returning paragraphs of "junk" content, like weather reports and adverts for golfing supplies.

I have three confirmed reports from transcripts of interviews on unrelated topics, and am curious whether there are other (as yet unreported) similar instances?

If so, please let me know - DM for email address.

Boosts appreciated.

KathyReid, 1 year ago to random

ICYMI: Do you work with #voice or #speech #data?

You might be a #linguist, or an #ML #engineer, doing things like data specifications, filtering or pre-processing or training #ASR, #STT or #TTS models, or you might work in #fairness or #bias evaluation.

If so, I’d love your help to understand current #dataset #documentation practices, and what we can do to make them better as part of my #PhD #research 🤓 ⌨️ 🎤

The #survey takes 10-20 minutes to complete, and you can opt in to win one of 3 gift cards valued at $AUD 50 each.

Research Protocol 2021/427 approved by #ANU Human Research Ethics Committee

Boosts appreciated 💕

https://anu.au1.qualtrics.com/jfe/form/SV_cSFODa5osYtm96e

dougholton, 1 year ago to accessibility

Automatically generated video captions are not sufficiently accessible as they lack proper punctuation, capitalization, and spelling, as explained further in this #NoMoreCraptions video by Rikki Poynter: https://youtu.be/-O4YcVQt5NM
Until YouTube and others improve their automatic captions, here's a Python script that runs OpenAI's #Whisper locally on your computer:
https://github.com/aichr/yt-whisper
See also https://github.com/sanchit-gandhi/whisper-jax
#accessibility #a11y #asr
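
As a rough sketch of the captioning step: once a local transcription run has produced timestamped segments, they still need to be written out in a caption format such as SRT. The dictionary keys `start`, `end` and `text` below mirror, as far as I understand it, the segments returned by the openai-whisper package's `transcribe` call; the helper names and the sample segments are made up for illustration, and no model is run here.

```python
def format_timestamp(seconds: float) -> str:
    """Format a time in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = round(seconds * 1000)
    hours, millis = divmod(millis, 3_600_000)
    minutes, millis = divmod(millis, 60_000)
    secs, millis = divmod(millis, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def segments_to_srt(segments: list[dict]) -> str:
    """Turn a list of {'start', 'end', 'text'} segments into SRT caption text."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        start = format_timestamp(seg["start"])
        end = format_timestamp(seg["end"])
        # Each SRT cue: a counter, a time range, then the caption text.
        blocks.append(f"{index}\n{start} --> {end}\n{seg['text'].strip()}\n")
    return "\n".join(blocks)  # a blank line separates consecutive cues


# Hypothetical segments standing in for real transcription output.
sample_segments = [
    {"start": 0.0, "end": 2.5, "text": " Hello, and welcome."},
    {"start": 2.5, "end": 5.04, "text": " Today we talk about captions."},
]

print(segments_to_srt(sample_segments))
```

The resulting text can be saved with a `.srt` extension and uploaded alongside the video in place of the automatic captions.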

KathyReid, 1 year ago to random

Do you work with #voice or #speech #data? You might contribute data, write data specifications for collection, perform filtering or pre-processing, train #ASR or #TTS models, or design or perform evaluations on #ML speech models.

If so, I’d love your help to understand current #dataset #documentation practices, and what we can do to make them better as part of my #PhD #research

The #survey takes 10-20 minutes to complete, and you can opt in to win one of 3 gift cards valued at $AUD 50 each.

Research Protocol 2021/427 approved by #ANU Human Research Ethics Committee

https://anu.au1.qualtrics.com/jfe/form/SV_cSFODa5osYtm96e
