How to Add Voice to Your Mobile App

December 03, 2018

Consumers enjoy using voice-activated technology in a variety of situations, and it's becoming easier than ever for companies to provide reliable voice features. Learn 7 key reasons your business should invest in voice technology for your app.

Recent studies all point to the same truth: We’re on the verge of a voice revolution.

A growing number of people enjoy speech interactions with their smartphones, digital assistants, and household devices. Soon, voice-enabled gadgets will dominate at home and at work.

According to the June 2018 Voice Shopping Consumer Adoption Report, one in five American adults has already tried shopping via voice. More than half of consumers (56%) used smartphones to access voice technology.

In 2016, voice searches accounted for 20% of all Google mobile queries. By 2020, that share is projected to approach 50%, and experts predict that the frequency of voice interactions will only increase.

As CEO of a software development company, I’ve been keeping a close eye on this trend. I belong to the 42% of smart speaker owners who have bought more than one voice-equipped device.

In this article, I will share 7 benefits of voice technology, as well as insight into how voice works and ways to integrate voice into mobile apps.

7 Benefits of Voice Technology

There are 7 key factors that make voice appealing to consumers. Voice is:

  1. Natural and intuitive
  2. Faster than typing
  3. Hands-free
  4. Multilingual
  5. Simple
  6. Capable of improving customer satisfaction
  7. Fun

1. Voice Is Natural

Spoken language is the most natural, fast, and effective method of human communication.

Voice technology, however, still poses a challenge to hardware, which can’t always pick up on accents, pauses, mispronunciations, and background noise.

Fortunately, the situation is getting better every year. In 2017, Google reported reaching human parity in speech recognition, with about 95% accuracy (a 4.9% word error rate).

Voice technology is getting better, leading to fewer errors.

Two other tech giants, IBM and Microsoft, reached word error rates of 5.5% and 5.1%, respectively. These results are very close to the human error rate (5% or lower).
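To make the word error rate figures above concrete, here is a minimal sketch of how the metric is commonly computed: the number of word substitutions, deletions, and insertions needed to turn the recognized text into the reference, divided by the number of words in the reference. The function name and the sample sentences are my own illustrations, not taken from any vendor's evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / words in the reference."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion: reference word was missed
                curr[j - 1] + 1,     # insertion: extra word was recognized
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


# Illustrative sentences: one substituted word and one dropped word
# out of six reference words.
wer = word_error_rate(
    "turn on the kitchen lights please",
    "turn on the chicken lights",
)
print(f"Word error rate: {wer:.1%}")
```

A 5% word error rate means roughly one wrong word in every twenty spoken, which is why the figures above are described as close to human performance.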

As time moves on, technology will increasingly mirror natural human language, improving user experience with voice technology.

2. Voice Is Fast

People like using voice technology because it saves time.

Researchers from Stanford University tested two popular methods of interaction with smartphones: speech recognition and typing on touchscreen. The volunteers spoke either English or Mandarin Chinese.

In both cases, speech recognition software analyzed voice input nearly three times faster than people managed to type.

For now, speed is the number one reason people like voice search, while not having to type takes second place.

3. Voice Is Hands-Free

Voice-driven technologies make daily life easier and more convenient – even when your hands aren’t free.

Research shows that people take advantage of vocal commands when they can’t use their hands. Below is the list of situations that most often trigger voice searching:

  • Driving (58%)
  • Hands full (45%)
  • Hands dirty (39%)
  • Phone out of reach (22%)

Voice technology allows you to interact with your mobile device when your hands are full or it would be dangerous to focus on the screen.

4. Voice Is Multilingual

Voice-activated platforms are constantly developing their multilingual abilities.

By the end of 2018, Google Assistant alone will speak over 30 languages on 95% of all Android mobiles.

No matter what language you speak, chances are you can use voice technology.

5. Voice Is Simple

Voice commands spare you the trouble of browsing through long lists of contacts or endless folders. So it's no surprise that 12% of Americans use speech interaction with devices just to avoid confusing menus.

Businesses also benefit from this simplicity, as they skip spending time and money on designing complex multi-level menus.

6. Voice Enhances Customer Satisfaction

For businesses, voice technology offers new opportunities for customer engagement, retention, and satisfaction.

A 2018 survey conducted by DAC, the largest digital media agency in North America, showed that nearly two-thirds of Americans (63%) use voice assistants to find the opening hours of local shops and offices.

What do Americans look for via voice search assistants?

Additionally, over two-fifths of voice assistant users (43%) use them to make online purchases.

The survey also reveals that the share of voice shoppers is even higher among millennials (70%).

More and more people use voice search as an easy and effective way to get information and buy things online. You should take the technology seriously if you want to rank high in search results and attract new customers, especially young ones.

7. Voice Is Fun

It may sound odd, but nearly one in five American adults admitted that they like voice assistants because it’s just fun to use them.

More than two-thirds of people (67%) access entertainment through voice technology, using voice platforms to play music and videos.

Popular culture has even picked up on voice technology. In 2017, Saturday Night Live ran a sketch featuring comedians playfully riffing on common ways older generations interact with Amazon’s Alexa.

Options for Investing in Voice Technology

As voice technology’s popularity grows, so does the number of options for businesses interested in adopting it.

The 3 main options for adopting voice technology:

  1. Cloud solutions
  2. Embedded solutions
  3. Third-party solutions

You can select one or a combination of these resources to adopt voice technology for your business.

Decide Between a Cloud or Embedded Solution

First, you should choose one of the two existing deployment models: cloud solutions and embedded solutions.

Cloud solutions allow the majority of voice recognition and text-to-speech tasks to take place in the cloud. Your app will be lightweight and have a minimal effect on performance.

On the negative side, cloud solutions depend on the quality of a user’s internet connection.

In an embedded solution, users can access your app’s voice features offline, avoiding network latency (the delays that occur during real-time communication with the cloud platform).

Embedded solutions, however, add to your app's size and can run more slowly because they require a significant portion of a smartphone's resources. Developers can limit the number of voices and languages to reduce the final size by 10 times or even more.

Overall, cloud-based solutions are more popular, especially when it comes to online shopping, which needs a stable web connection.
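If you combine the two models, your app needs a rule for deciding which one handles a given request. Below is a minimal sketch of such a rule, assuming the app can measure its own connectivity. Every name here (`Backend`, `DeviceState`, `choose_backend`) and the 300 ms latency threshold are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Backend(Enum):
    CLOUD = "cloud"
    EMBEDDED = "embedded"


@dataclass
class DeviceState:
    online: bool                    # is there any network connection?
    round_trip_ms: float            # measured latency to the cloud service
    embedded_model_installed: bool  # is an on-device voice engine bundled?


def choose_backend(state: DeviceState, max_latency_ms: float = 300.0) -> Backend:
    """Prefer the cloud on a good connection, otherwise fall back on-device."""
    if state.online and state.round_trip_ms <= max_latency_ms:
        return Backend.CLOUD
    if state.embedded_model_installed:
        return Backend.EMBEDDED
    if state.online:
        # Slow connection, but the cloud is the only engine available.
        return Backend.CLOUD
    raise RuntimeError("no voice backend available: offline and no embedded model")


# A user driving through an area with poor coverage.
print(choose_backend(DeviceState(online=True, round_trip_ms=900.0,
                                 embedded_model_installed=True)))
```

The trade-off stays the same as described above: the cloud branch keeps the app lightweight, while the embedded branch keeps voice features working offline.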

Use Third-Party Resources

Designing speech recognition algorithms from scratch can take more time and resources than the average company can spare.

Fortunately, tech giants like Google, Apple, and Microsoft are releasing AI resources for app developers worldwide.

Various speech-connected libraries and SDKs are available online. Certain tools are cross-platform, while others are for Android or iOS only.

Top resources you should know include:

  • Siri Shortcuts: Since July 2018, Apple has allowed developers to extend the capabilities of Siri within their own apps. Siri Shortcuts is a long-awaited feature that creates shortcuts to the most in-demand functions. It also enables you to add custom voice commands. At the time of writing, the Siri Shortcuts toolkit is available for beta testing by registered iOS developers only.
  • Google Cloud Text-to-Speech API: Google has made the voice engine used in Google Assistant and Google Maps publicly available. The 2018 version boasts 32 variants in 14 languages.
  • Azure Speech APIs: At 2018’s Build Developer Conference in Seattle, Microsoft promised to combine its 4 speech-related tools (Bing Speech, Speaker Recognition, Custom Speech Service and Translation Speech) into a unified service. Currently, they are still available as separate APIs.
  • Amazon Transcribe: This API converts speech to text, detects different speakers, and even allows for custom improvement of the speech recognition. (For example, you can customize it for a specific dialect.) The main drawback is that the tool supports only 2 languages: English and Spanish.
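To give a feel for what working with one of these services involves, here is a rough sketch of preparing a request for the Google Cloud Text-to-Speech REST API and decoding its reply. The endpoint and field names reflect my understanding of the v1 `text:synthesize` method, so check them against the current reference before relying on them. The helper names are my own, and authentication and the actual HTTP call are deliberately left out.

```python
import base64
import json

# Assumed v1 REST endpoint; confirm against Google's current documentation.
SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text: str, language_code: str = "en-US") -> dict:
    """Build the JSON body for a text-to-speech request (hypothetical helper)."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def extract_audio(response_json: str) -> bytes:
    """Decode the base64 audio carried in the response's audioContent field."""
    return base64.b64decode(json.loads(response_json)["audioContent"])


body = build_synthesize_request("Your order has shipped.")
print(json.dumps(body, indent=2))
# Sending `body` to SYNTHESIZE_URL requires credentials and an HTTP client,
# which are outside the scope of this sketch.
```

The other services in the list follow a similar pattern: you send text or audio along with a few settings, and you get back audio or a transcript.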

Microsoft, Amazon Web Services (AWS), and Google charge fees for using their APIs, though they let developers test the tools for free for a limited number of hours per month.

In some cases, integrating voice means just adding 3 lines of code. The ease of implementation, however, shouldn’t be the only deciding factor. Above all else, you should weigh the revenue your app brings in against the money you spend on speech services.
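A quick back-of-the-envelope calculation can help with that balance. The sketch below assumes a simple pricing model with a free monthly allowance followed by a flat per-minute rate. All of the numbers are hypothetical placeholders rather than any provider's actual prices, so substitute the figures from the pricing page of the service you choose.

```python
import math


def monthly_speech_cost(minutes_used: float, free_minutes: float,
                        price_per_minute: float) -> float:
    """Cost of one month of speech processing after the free allowance."""
    billable_minutes = max(0.0, minutes_used - free_minutes)
    return billable_minutes * price_per_minute


def users_needed_to_break_even(monthly_cost: float,
                               revenue_per_user: float) -> int:
    """Smallest number of paying users whose revenue covers the speech bill."""
    if revenue_per_user <= 0:
        raise ValueError("revenue_per_user must be positive")
    return math.ceil(monthly_cost / revenue_per_user)


# Hypothetical inputs: 5,000 users speaking 12 minutes each per month,
# 60 free minutes, $0.02 per additional minute, $1.50 revenue per paying user.
cost = monthly_speech_cost(minutes_used=5_000 * 12, free_minutes=60,
                           price_per_minute=0.02)
print(f"Estimated monthly speech bill: ${cost:,.2f}")
print(f"Paying users needed to cover it: {users_needed_to_break_even(cost, 1.50)}")
```

If the break-even number looks unrealistic for your audience, that is a signal to limit voice to your most valuable features or to consider an embedded engine.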

Ultimately, you have plenty of solutions to choose from.

All Companies Should Explore Voice Technology

Adoption rates for voice technology are rising rapidly. For now, people primarily ‘speak’ with their devices at home or in the office. But the trend will soon impact all areas of life, changing the way we work, get information, and spend and make money.

Human-machine voice communication is not just a trend – it’s part of the future of commerce. By understanding the benefits of voice technology, as well as options for implementing it, you can make an informed decision for your business.

To stay competitive, you should include the innovation in your business development plan – the sooner, the better.


About the Author

Dmitriy Sushko is a co-founder and CEO of DA-14 Software Development, a strategic IT partner for startups and mid-sized organizations. He is a seasoned leader and entrepreneur with a strong background in web and mobile app development, software architecture, and database design. He is a technology enthusiast who is passionate about putting technology to work, creating innovative solutions, and helping businesses move ahead with their ideas.
