I set Home Assistant's voice assist as the default on my phone, and I'm super impressed with the speed! Sentence recognition still has a way to go (I could improve it myself or wait for someone smarter to do it). I'm really excited for this project; I have no doubt it'll end up in hardware form and be a fully configurable, privacy-conscious voice assistant. #homeassistant #homeautomation #voice #voiceassistant #tts #android
Coqui's awesome Text-to-Speech project needs you: https://github.com/coqui-ai/TTS/issues/2589. Imagine being able to create any speech from a simple text, in multiple languages, with any voice (including voice cloning), based on open source technologies and state-of-the-art algorithms. You can make it real.
random observation but why do so very many #TTS voices make absolutely no distinction between a period and an exclamation mark when the screen reader leaves it up to the TTS voice to interpret them? Question marks, no problem. Exclamation marks ...nope. Just like a period. They are different, folks!
I'd like to use a voice generation tool that isn't deemed harmful or stealing someone else's voice. Is there any sort of "ethical" or community Creative Commons-esque backed model and/or program I can use to turn text into speech, on Linux? Mozilla Voice stuff maybe?
e or gspeak just does not cut it unfortunately with the default voices... Maybe there are better models for that program?
Just a thought. Listening to TTS is like driving. You go fast, and you're generally looking for an ending, a place to be, stuff like that.
Braille is like walking, or running if you can read it very quickly. Some people, like me, read Braille to enjoy the scenery, or to get at all the details. Some people, though, rely on Braille. They cannot drive, so they probably have a lot more stamina or can jog and run rather than walk. The same goes for Braille and audio.
This probably isn't a great analogy, but it's something I thought of.
Some blind Android users really want the Eloquence TTS engine back. It will die when 64-bit phones become the norm. They went as far as seriously debating if they could ask phone carriers to step in. It's sad because Google could easily have licensed Eloquence, put it in a 32-bit ARM container, and there you go. It's sad that Apple is the only big corporation that spent five minutes and thought "Oh hey we have a license for this now, let's containerize this and ship it for VoiceOver." It's sad that Google doesn't inspire confidence in the blind community at large in its ability to uphold an accessible OS and a competitive screen reader. And it's definitely sad that another TTS engine hasn't come along that is any better than Eloquence, which is from the '90s.
For anyone using a #screenReader on #linux: do you have any recommendations? I want to test some of my website designs against screen readers, but I can't get #gnome #orca working properly on #ArchLinux. It never seems to read anything but what's in my bash console; I can't get it to read from my web browser or anything else.
If you have to do Speech-to-Text and Text-to-Speech tasks and don't want to send your data to the Internet, I recommend trying Speech Note (Linux desktop app).
It is easy to use, works offline and supports 57 languages!
So, I've finally bitten the bullet, reverted my jailbreak, and am upgrading to iOS 16. What 3rd-party speech synthesis providers are available for me to use without needing a Mac to install them from source? I know eSpeak-ng is one, but are there any others? #TTS #blind #accessibility #a11y #apple #iPhone #iOS16
Over the past year, I've been experimenting with neural text-to-speech in various forms. I have done hours of experimentation and research, training models and getting varying results along the way. Some of you may have heard of Piper, an open source synthesizer and add-on for NVDA that can be trained by anyone. It is currently in active development, and I have been there from the beginning, testing and evaluating the various versions. For years, I have had a goal to create a high-quality voice that is truly usable by a screen reader user, and yesterday I managed to achieve this. I'm really excited to share Alba, a female Scottish English voice. I'm considering this a beta phase, and I'm looking for feedback to make improvements as needed. Please note that you will most likely get an error upon installation. However, the voice should still show up in NVDA, and I'm working on fixing this as soon as possible.
Link to Piper: https://github.com/rhasspy/piper/tree/v0.1.0
Link to addon: https://github.com/mush42/piper-nvda?ref=building.open-home.io
Link to Alba: https://drive.google.com/file/d/1wZHuIll6aEEFd4OdLBCVcxF7bd3PbQTB/view?usp=share_link #TTS #AI #ScreenReader #Piper
You might be a #linguist, or an #ML #engineer, doing things like data specifications, filtering or pre-processing or training #ASR, #STT or #TTS models, or you might work in #fairness or #bias evaluation.
If so, I’d love your help to understand current #dataset #documentation practices, and what we can do to make them better as part of my #PhD #research 🤓 ⌨️ 🎤
The #survey takes 10-20 minutes to complete, and you can opt in to win one of 3 gift cards valued at AUD $50 each.
Research Protocol 2021/427 approved by #ANU Human Research Ethics Committee
@hl @flohgro with #tts mine is super sarcastic. We call it #snarkhome.
"(Name) left me alone in the office, so I turned out the light. .... (Name) left me alone in the dark."
"The office light got lonely so I Did What Had To Be Done."
"Once upon a time someone left the kitchen light on. I turned it off. That is it. That is the story. I turned off a light."
Do you work with #voice or #speech #data? You might contribute data, write data specifications for collection, perform filtering or pre-processing, train #ASR or #TTS models, or design or perform evaluations on #ML speech models.
If so, I’d love your help to understand current #dataset #documentation practices, and what we can do to make them better as part of my #PhD #research
The #survey takes 10-20 minutes to complete, and you can opt in to win one of 3 gift cards valued at AUD $50 each.
Research Protocol 2021/427 approved by #ANU Human Research Ethics Committee
The result is quite good (though do you really know what your voice sounds like?).
Audiobook narrators still have a unique skill that AI can't quite replicant, I mean replicate. However, with 2 minutes of audio, it can read a book out loud in my voice far better than I can manage. #ai #tts
So, the Neural Microsoft voices are very good for reading books. And what did Microsoft remove from Edge? The ability to read EPUB books. Lol, ah well. Samsung TTS is almost as good.
I've been experimenting with Amazon's Polly service. It's their fancy text-to-sort-of-human-style-speech system. Think "Alexa" but with a variety of voices, genders, and accents.
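If you want to try this yourself, here is a minimal sketch, assuming the boto3 library is installed and AWS credentials and a default region are already configured. The helper names `build_polly_request` and `synthesize_to_file` and the output filename are made up for illustration; `synthesize_speech` and its `Text`, `VoiceId` and `OutputFormat` parameters belong to boto3's Polly client.

```python
from pathlib import Path


def build_polly_request(text: str, voice_id: str = "Brian") -> dict:
    """Collect the keyword arguments for Polly's synthesize_speech call."""
    return {
        "Text": text,
        "VoiceId": voice_id,  # "Brian" is one of Polly's British English voices
        "OutputFormat": "mp3",
    }


def synthesize_to_file(text: str, out_path: str = "polly_output.mp3") -> Path:
    """Send the text to Polly and save the returned audio.

    Assumes AWS credentials and a region are configured; the call is billed
    by AWS and needs network access.
    """
    import boto3  # imported here so build_polly_request works without boto3

    polly = boto3.client("polly")
    response = polly.synthesize_speech(**build_polly_request(text))
    target = Path(out_path)
    # AudioStream is a streaming body; read() returns the raw MP3 bytes.
    target.write_bytes(response["AudioStream"].read())
    return target
```

Calling `synthesize_to_file("Some text to read aloud")` should leave an MP3 in the working directory, provided the account has permission to use Polly.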
Here's "Brian" - their English, male, received pronunciation voice - reading John Betjeman's poem "Slough":
The pronunciation of all the words is incredibly lifelike. If you heard it on the radio, it mi