In a major
step towards its "AI first" ambition, Google has developed a
text-to-speech artificial intelligence (AI) system whose human-like
articulation is hard to tell apart from a real voice.
The tech giant's text-to-speech system, called "Tacotron
2", generates computer speech that almost matches the
human voice, technology news website Inc.com reported.
At the Google I/O 2017 developers conference, the company's
Indian-origin CEO Sundar Pichai announced that the internet giant was shifting
its focus from "mobile first" to "AI first" and launched several
products and features, including Google Lens, Smart Reply for Gmail and Google
Assistant for iPhone.
According to a paper published on arXiv.org, the system first
converts the text into a spectrogram, a visual representation of how the speech
should sound.
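To make the idea of a spectrogram concrete, here is a minimal Python sketch. It is not Tacotron 2, which predicts a spectrogram from text; this example only computes one from an existing audio signal to show what the representation contains. The function name, the frame and hop sizes, and the synthetic 1 kHz test tone are all assumptions chosen for illustration.

```python
import cmath
import math


def magnitude_spectrogram(samples, frame_size=64, hop=32):
    """Return a list of frames, each a list of magnitudes per frequency bin."""
    # A Hann window tapers each frame to reduce spectral leakage.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_size - 1))
              for n in range(frame_size)]
    frames = []
    for start in range(0, len(samples) - frame_size + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_size)]
        # Naive DFT, keeping only the non-negative frequency bins.
        bins = []
        for k in range(frame_size // 2 + 1):
            acc = sum(frame[n] * cmath.exp(-2j * math.pi * k * n / frame_size)
                      for n in range(frame_size))
            bins.append(abs(acc))
        frames.append(bins)
    return frames


# Illustrative input: a 1 kHz tone sampled at 8 kHz stands in for speech.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(256)]
spec = magnitude_spectrogram(tone)

# The strongest bin in the first frame, converted back to a frequency in Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 64
```

Each row of `spec` is one slice of time and each column is one frequency band, which is why the result can be drawn as an image. Production systems use a fast Fourier transform and a mel frequency scale rather than the naive transform shown here.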
That spectrogram is then fed to Google's existing WaveNet algorithm,
which turns it into audio and brings AI closer than ever to mimicking
human speech indistinguishably. The algorithm can easily learn different voices
and even generates artificial breaths. "Our model achieves a mean opinion score
(MOS) of 4.53 comparable to a MOS of 4.58 for professionally recorded speech,"
the researchers were quoted as saying.
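A mean opinion score is the average of listener ratings on a scale of 1 to 5. The short Python sketch below shows how such a figure could be computed. The ratings are made up for illustration and are not data from the paper, and the function name and the normal-approximation confidence interval are assumptions rather than the researchers' actual evaluation procedure.

```python
import math
import statistics


def mean_opinion_score(ratings):
    """Return (mean, 95% half-width) for listener ratings on a 1-5 scale."""
    for rating in ratings:
        if not 1 <= rating <= 5:
            raise ValueError(f"rating out of range: {rating}")
    mean = statistics.mean(ratings)
    # Normal-approximation interval; an assumption made for this sketch.
    half_width = 1.96 * statistics.stdev(ratings) / math.sqrt(len(ratings))
    return mean, half_width


# Hypothetical ratings from eight listeners for one audio clip.
example_ratings = [5, 4, 5, 4, 5, 5, 4, 5]
mos, margin = mean_opinion_score(example_ratings)
```

With scores as close as 4.53 and 4.58, the size of the margin matters: if the intervals overlap, listeners cannot be said to reliably prefer one over the other.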
On the basis of its audio samples, Google claimed that
"Tacotron 2" can detect from context the difference between the noun
"desert" and the verb "desert," as well as the noun
"present" and the verb "present," and alter its
pronunciation accordingly.
It can place emphasis on capitalised words and apply the proper
inflection when asking a question rather than making a statement, the company
said in the paper.
Meanwhile, Google's engineers did not reveal many details,
but they left a big clue that lets developers figure out how far they have
come in developing this system.
According to the report, each of the '.wav' file samples has a
filename containing either the term "gen" or "gt."
Based on the paper, it's highly probable that "gen"
indicates speech generated by Tacotron 2 and "gt" is real human
speech. ("GT" likely stands for "ground truth," a machine
learning term that basically means "the real deal".)
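The naming clue can be checked mechanically. The Python sketch below sorts sample filenames by whether they contain "gen" or "gt" as a separate token. The filenames are hypothetical, since the report does not list the real ones, and the meaning attached to each term is the report's guess, not something Google has confirmed.

```python
import re


def classify_sample(filename):
    """Guess whether a sample is synthesised or recorded from its filename."""
    # Split on anything that is not a letter or digit so that "gen" inside
    # a longer word such as "agent" is not mistaken for the marker.
    tokens = re.split(r"[^a-z0-9]+", filename.lower())
    if "gen" in tokens:
        return "generated"
    if "gt" in tokens:
        return "ground truth"
    return "unknown"


# Hypothetical filenames, used only to illustrate the convention.
labels = {name: classify_sample(name)
          for name in ["sample_01_gen.wav", "sample_01_gt.wav", "agent.wav"]}
```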