Aiming to connect more people speaking different languages, Microsoft plans to roll out a beta version of its speech-translation technology on Skype later this year.
The new Skype Translator will first be available as a Windows 8 beta app before the end of this year, said Gurdeep Pall, corporate vice president of Skype and Lync.
The translator is an offshoot of a Microsoft research project designed to enable fluent conversations between speakers of different languages. It is an example of why Microsoft invests in basic research, Pall said.
At a conference hosted by news website Re/code, Pall gave a demo of "near real-time audio translation" from English to German and vice versa, combining Skype voice and instant messaging technologies with Microsoft Translator text translation and neural network-based speech recognition. In the demo, he conversed in English with another Microsoft employee, Diana Heinrichs, who spoke German.
The technology was publicly demonstrated in 2012 in Tianjin, China, by former Microsoft Research head Rick Rashid, whose speech was translated from English to Mandarin.
Starting with the Windows version and a few languages, Microsoft plans to add more languages and support for a variety of computers and devices that people use to connect to Skype, Re/code said.
Bringing the translator to a limited beta later this year has required advances in translation, speech recognition, and language processing, along with contributions from Microsoft engineering and research teams around the world, according to Microsoft Research.
Teams at Microsoft Research have collaborated on the use of artificial neural networks for large-vocabulary speech recognition. This approach is said to have greater potential than current commercially available speech-recognition technology, such as voice-to-text software, which requires training the system and can handle only a small vocabulary if it is to be accurate.
"The ultimate goal of automatic speech recognition is to deliver out-of-the-box, speaker-independent speech-recognition services--a system that does not require user training to perform well for all users under all conditions," said Microsoft Research's Janie Chang.
Other companies have also been working on speech translation, including Google and NTT DoCoMo.