Voice to text in multiple languages for speech-enabled apps
Rebecca Ray in Programming, Monday, November 13, 2017
Tips on integrating multi-lingual speech recognition into your mobile apps.
With mobile devices so commonplace and being used by billions of people who did not grow up with keyboards, voice-driven apps in less common languages can’t come soon enough. Approximately 46% of internet users in India - the fastest growing e-commerce market in the world - now consume content in languages such as Hindi, Bengali, Tamil, and Urdu.
Developers need to be aware that localizing for multilingual conversations is not the same as translating text in user interfaces. Why? Because voice input is more unstructured due to dialects, accents, speech disorders, and ambient sound.
Here are eight steps to help you get started on delivering spoken interactions in multiple languages:
1.) Recognize that voice-driven platforms are not perfect - and never will be.
Introduced to speech’s potential by the conversational computers in decades of sci-fi films and TV series, humans are understandably frustrated when Alexa or Siri responds with “I don’t understand” to even simple requests. In turn, users raise their voices, speak more slowly, or dumb down their queries - tactics that often don’t lead to the results they want with humans either. Speech platforms must process arbitrarily complex utterances in a variety of accents, phrasings, and dialects. Accordingly, they won’t get everything right. That’s why suppliers are adding inference engines and machine learning software to improve language processing.
2.) Expect speech platforms to rapidly evolve.
There will be winners and losers, with some technology developers already moving beyond mobile phones to allow people to speak to their cars, trucks, and home appliances. Such applications will require support for even more languages as firms offer their products and services to ever-expanding consumer classes worldwide. Keep your eyes open for new players to emerge in China, India, Japan, and Russia.
3.) Acquire expertise.
Find, hire, or outsource development and localization staff to work on speech integration. You will need help evaluating solutions and integrating them with the rest of your development efforts. Because the tools to build virtual assistants such as Alexa or Siri into your apps are just coming to market, you won’t find a lot of available talent. Thus, you may have to educate your best developers or language services providers to develop the expertise required to incorporate these speech platforms.
4.) Invite the people who translate your products and services to the design table.
Make it easier to deliver world-ready, voice-enabled capabilities by learning from colleagues who already understand what’s required during the initial development phase. If your team could benefit from training in how to enable multilingual and locale support for speech, your translators are your best friends - whether they are full-time employees or contractors working for your translation provider.
5.) Take advantage of commercial software.
Virtual assistants already offer conversation platforms for a variety of devices. Evaluate what’s available and select the one that best meets your development requirements. For cross-platform apps, you will need one that runs on multiple devices, e.g., Google’s Assistant, Microsoft’s Cortana, or SoundHound’s Hound. Confirm their support for the foreign languages that you require and quiz prospective suppliers about their future plans.
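One way to make the language check concrete is to compare the locales your app requires against the list each supplier publishes. The sketch below is only an illustration: the locale lists are hypothetical placeholders, not the actual language coverage of any platform named above, and the helper names `primary_language` and `coverage` are invented for this example. It separates an exact locale match from a match on the language alone, since a platform that handles Bengali for Bangladesh may not be tuned for Bengali as spoken in India.

```python
def primary_language(tag: str) -> str:
    """Return the lowercase primary language subtag of a BCP 47-style tag."""
    return tag.replace("_", "-").split("-")[0].lower()


def coverage(required, supported):
    """Classify each required locale as 'exact', 'language-only', or 'missing'."""
    supported_norm = {s.replace("_", "-").lower() for s in supported}
    supported_langs = {primary_language(s) for s in supported}
    report = {}
    for tag in required:
        norm = tag.replace("_", "-").lower()
        if norm in supported_norm:
            report[tag] = "exact"
        elif primary_language(tag) in supported_langs:
            # Same language, different region: expect accent and dialect gaps.
            report[tag] = "language-only"
        else:
            report[tag] = "missing"
    return report


# Hypothetical data for illustration only, not real supplier coverage.
REQUIRED_LOCALES = ["hi-IN", "bn-IN", "ta-IN", "ur-PK"]
PLATFORM_A_LOCALES = ["en-US", "hi-IN", "bn-BD", "ta-IN"]

if __name__ == "__main__":
    for locale, status in coverage(REQUIRED_LOCALES, PLATFORM_A_LOCALES).items():
        print(f"{locale}: {status}")
```

Locales reported as "language-only" or "missing" are the ones worth raising when you quiz a supplier about its roadmap.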
6.) Work within the platform’s ecosystem.
Today’s mobile apps are largely single-function, so conversational capability will be one of the first shared services that many will incorporate. We expect apps to be increasingly redeveloped around microservices. Code your own apps to be self-describing or interrogable so that they can more easily be discovered by others. Providing more metadata about what they do will enhance the ecosystem.
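As a rough illustration of what "self-describing" could mean in practice, an app might publish a small machine-readable descriptor listing what it can do and in which locales. Everything in this sketch is an assumption made for the example: the `AppDescriptor` and `Capability` classes, the field names, and the app identifier are hypothetical and do not correspond to any particular platform's manifest format.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class Capability:
    """One thing the app can do, with the locales in which it accepts speech."""
    name: str
    description: str
    locales: list = field(default_factory=list)


@dataclass
class AppDescriptor:
    """Hypothetical self-description that other services could query."""
    app_id: str
    capabilities: list = field(default_factory=list)

    def describe(self) -> str:
        """Serialize the descriptor to JSON, keeping non-ASCII text readable."""
        return json.dumps(asdict(self), ensure_ascii=False, sort_keys=True)

    def supports(self, capability: str, locale: str) -> bool:
        """Report whether the named capability is offered in the given locale."""
        return any(
            c.name == capability and locale in c.locales
            for c in self.capabilities
        )


# Hypothetical example app and capability.
descriptor = AppDescriptor(
    app_id="com.example.rides",
    capabilities=[
        Capability("book_ride", "Book a ride by voice", ["hi-IN", "en-IN"]),
    ],
)

if __name__ == "__main__":
    print(descriptor.describe())
    print(descriptor.supports("book_ride", "hi-IN"))
```

A voice platform or another app could read such a descriptor to decide whether to route a spoken request to your app in a given language.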
7.) Be a vocal partner of these platforms.
Besides offering feedback, ask for the ability to integrate and process your automated speech recognition (ASR) training data so that you can tune the speech module to meet your app requirements. This will be difficult for some suppliers to deliver because of privacy or business model reasons. For example, Google has kept its machine translation service closed to training by users. However, don’t let that keep you from adding it to your list of requirements.
8.) Prepare for content creation and style to transform.
Text written for conversation is not the same as text that is read in silence. Voice means no hands and the ability to summarize large amounts of written content very quickly - we can converse with machines and understand their responses much faster than we can type, especially in Arabic, Chinese, and Hindi. Therefore, you may need to replace textual instructions with video, or at least render them more simply so that drivers only need to listen - rather than read - as they drive along the street.
You may not be quite ready yet to deliver speech-enabled apps in foreign languages. However, the appeal of using conversation to interact more intelligently with digital devices may eventually become too strong for you to ignore. So, get ready now by educating yourself and your team on the possibilities for integrating conversational capabilities. And reach out to localizers to ensure that you understand how to build in world-readiness for more than one language.
This content is made possible by a guest author, or sponsor; it is not written by and does not necessarily reflect the views of App Developer Magazine's editorial staff.