The Samsung Galaxy S4 will be shipping in 155 countries by the end of next month, and its real-time voice translation to help people communicate across borders may be one of its most ambitious features.
Translating voice conversations in real time has been a goal for devices and software for years, and Samsung claims to have reached it. With its built-in S-Translator app, the Galaxy S4 promises to capture words spoken in one language and then reproduce them in another language, all at the speed of a conversation. Samsung says it will support 10 languages as soon as the phone hits the streets.
There's a reason real-time voice translation has been in the works for so long. It involves three distinct processes, each of which is at least moderately hard to do well. In 2008, networking colossus Cisco Systems promised its own translation system within a year, designed to go into its TelePresence videoconferencing platform. The company had to backtrack a year later, saying the job was harder than expected. Cisco is still working toward such a feature today.
Though S-Translator and other tools have improved and will keep getting smarter, technology hasn't yet eliminated the language barrier, analysts and researchers say. Languages are too complex and open to misinterpretation.
"Even if human beings are doing this, real-time translation remains pretty hard, and I don't think we've seen a breakthrough," Opus Research analyst Dan Miller said.
S-Translator is designed for text messaging and email as well as voice, but it's the face-to-face scenarios that generate the "wow" factor. At the Galaxy S4 launch at Radio City Music Hall, actors dramatized the capabilities of S-Translator with a skit in which an American backpacker asked a man in Shanghai which bus to take to a museum. The backpacker spoke the question in English into his Galaxy S4, and it came back out in spoken Mandarin. After the man heard the question, he spoke an answer into the phone, and his words were converted into English text.
Translating conversations requires three separate processes: converting speech to text, translating those written words into another language, and then converting the translated text back into speech, said Ananth Sankar, a distinguished engineer in Cisco's Collaboration and Technology Group. The first one is an especially hard nut to crack when it comes to natural conversations, according to Sankar.
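The three-stage pipeline Sankar describes can be sketched in a few lines of code. This is purely illustrative: the function names, the tiny phrase table, and the tagged-string stand-in for synthesized audio are all invented for the example, not anything from Samsung's or Cisco's actual systems.

```python
# Toy sketch of the three stages: speech -> text, text -> translated
# text, translated text -> speech. Each stub stands in for a full
# engine (speech recognizer, machine translator, speech synthesizer).

def speech_to_text(audio_transcript):
    # A real recognizer decodes audio; here the "audio" is already a
    # transcript standing in for recognized speech.
    return audio_transcript.strip().lower()

# Hypothetical phrase table; real translation uses statistical or
# neural models, not a lookup.
PHRASE_TABLE = {
    "which bus goes to the museum?": "¿Qué autobús va al museo?",
}

def translate(text, table):
    # Fall back to the original text when no translation is known.
    return table.get(text, text)

def text_to_speech(text):
    # A real synthesizer produces audio; we just tag the string.
    return f"<spoken: {text}>"

def translate_conversation(audio_transcript, table):
    text = speech_to_text(audio_transcript)      # stage 1
    translated = translate(text, table)          # stage 2
    return text_to_speech(translated)            # stage 3

print(translate_conversation("Which bus goes to the museum?", PHRASE_TABLE))
```

The point of the sketch is the chaining: an error in any stage propagates to the next, which is why each one has to be solid before the end-to-end experience feels conversational.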
At the heart of the problem is the way we talk to people, compared with the way we talk to audiences or to computers, Sankar said. He means the "ums" and "ahs," the false starts and self-corrections that break up the fluency of natural speech. They make it much harder for software to interpret a conversation than a command or dictation, he said.
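The kinds of disfluencies Sankar points to can be illustrated with a simplified filter. This is a crude sketch under obvious assumptions: real recognizers model disfluencies statistically inside the decoder, not with a hand-written filler list, and the word list here is invented for the example.

```python
import re

# Hypothetical filler-word list; real systems learn these patterns.
FILLERS = {"um", "uh", "ah", "er", "hmm"}

def strip_disfluencies(transcript):
    """Remove filler words and immediate word repetitions (a very
    rough proxy for false starts) from a lowercase transcript."""
    words = re.findall(r"[\w']+", transcript.lower())
    cleaned = []
    for w in words:
        if w in FILLERS:
            continue  # drop an "um"/"uh"-style filler
        if cleaned and cleaned[-1] == w:
            continue  # drop an immediate repetition ("the the")
        cleaned.append(w)
    return " ".join(cleaned)

print(strip_disfluencies("Um, which, uh, which bus goes to the the museum?"))
```

Even this toy version shows why conversation is harder than dictation: the system has to decide which words were meant and which were noise before it can translate anything.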