IoT gets chatty with DeepMind’s cloud text-to-speech
With cloud text-to-speech, users can power call centre voice response systems that hold real-time natural language conversations, and enable IoT devices (e.g. TVs, cars, robots) to talk back.
It can also convert text-based media (e.g. news articles, books) into spoken format (e.g. podcast or audiobook).
Google has only just announced the service, but cloud text-to-speech is already in use at Cisco and IT communications firm Dolphin ONE.
Cloud text-to-speech offers 32 different voices from 12 languages and variants. Google says it correctly pronounces complex text such as names, dates, times and addresses for “authentic sounding speech right out of the gate”.
It also includes a selection of “high-fidelity” voices built using WaveNet, a generative model for raw audio created by DeepMind. WaveNet synthesises more natural-sounding speech and, on average, produces speech audio that people prefer over other text-to-speech technologies.
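For developers wondering what a request for one of these voices might look like, here is a minimal sketch in Python. None of the details come from Google's announcement: the endpoint URL, the JSON field names and the voice name "en-US-Wavenet-D" are assumptions based on the service's public REST interface and may differ between API versions. The helper names are hypothetical, and the sketch only builds the request body and decodes a response; it does not call the network or handle authentication.

```python
import base64
import json

# Assumed REST endpoint; not given in the article and possibly version-dependent.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, language_code="en-US",
                             voice_name="en-US-Wavenet-D",
                             audio_encoding="MP3"):
    """Build the JSON body for a synthesis request (hypothetical helper).

    The default voice name is an assumed example of a WaveNet voice.
    """
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code, "name": voice_name},
        "audioConfig": {"audioEncoding": audio_encoding},
    }


def decode_audio(response_json):
    """Return raw audio bytes, assuming the response carries
    base64-encoded audio in an 'audioContent' field."""
    return base64.b64decode(response_json["audioContent"])


if __name__ == "__main__":
    body = build_synthesize_request("Your parcel arrives on 3 April at 10:30.")
    # This body would be POSTed to SYNTHESIZE_URL with valid credentials.
    print(json.dumps(body, indent=2))
```

Keeping the request builder separate from any HTTP call makes it easy to swap the voice or audio encoding, for example to compare a standard voice with a WaveNet one, without touching the transport code.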