The human voice, with all its subtlety and nuance, is proving to be an exceptionally difficult thing for computers to emulate. Using a powerful new algorithm, a Montreal-based AI startup has developed a voice generator that can mimic virtually any person’s voice, and even add an emotional punch when necessary. The system isn’t perfect, but it heralds a future when voices, like photos, can be easily faked.
When Siri, Alexa, or our GPS talks to us, it’s fairly obvious that we’re being spoken to by a machine. That’s because virtually every text-to-speech system on the market relies on a set of words, phrases, and utterances pre-recorded by voice actors, which are then strung together in Frankenstein-like fashion to produce complete words and sentences. The end result is a vocal delivery that sounds distinctly uninspiring, robotic, and at times laughable. This approach to voice synthesis also means that we’re stuck listening to the same pre-recorded, monotonous voice over and over again.
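To make the "strung together" idea concrete, here is a minimal sketch of that stitching approach in Python. It is an illustration only, not how any particular product works: the `UNITS` inventory, the function names, and the short lists of numbers standing in for audio recordings are all invented for this example.

```python
from typing import Dict, List

# Hypothetical inventory: each key is a word or phrase a voice actor recorded,
# and each value is a stand-in for that recording's audio samples.
UNITS: Dict[str, List[float]] = {
    "turn left": [0.1, 0.2, 0.3],
    "turn": [0.1, 0.2],
    "right": [0.5, 0.6],
    "in": [0.7],
    "one mile": [0.8, 0.9],
}


def select_units(text: str, units: Dict[str, List[float]]) -> List[str]:
    """Cover the text with recorded phrases, preferring the longest match."""
    words = text.lower().split()
    chosen: List[str] = []
    i = 0
    while i < len(words):
        # Try the longest remaining phrase first, then shrink it word by word.
        for j in range(len(words), i, -1):
            phrase = " ".join(words[i:j])
            if phrase in units:
                chosen.append(phrase)
                i = j
                break
        else:
            # No recording exists for this word, so it cannot be spoken.
            raise KeyError(f"no recording covers {words[i]!r}")
    return chosen


def synthesize(text: str, units: Dict[str, List[float]] = UNITS) -> List[float]:
    """Stitch the selected recordings end to end into one sample sequence."""
    samples: List[float] = []
    for phrase in select_units(text, units):
        samples.extend(units[phrase])
    return samples
```

Calling `synthesize("Turn left in one mile")` simply glues three stored recordings together. Nothing in the sketch smooths the joins or adjusts intonation, which hints at why output built this way can sound choppy, and any word missing from the inventory cannot be spoken at all.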