With the growing appeal of smart speakers and voice input, consumers will increasingly expect audio content. That’s why there’s a budding competition between Google and Amazon to turn text into spoken words. Companies and brands will need to adapt to the spoken-word web.
The Third Wave Of Personal Computing
We are in the midst of the third wave of personal computing innovation.
- First was the introduction of the personal computer itself, with its keystroke-driven command line interface,
- Then the graphical user interface and the mouse were introduced,
- We are now living through the age of camera capture and voice activation.
Voice activation, at its most demanding, requires the push of a button, as with smartphone apps; with smart speakers, it requires merely a wake word. Compared with camera capture, voice activation is by far the more natural way of communicating via technology.
The Present Is Voice Activation
The demand for voice-activated devices has already been demonstrated, and it’s growing.
Consumers like the ease of use and access to information smart speakers provide:
One in six Americans now own a smart speaker.
As is typical for the adoption of new technologies, younger people are more likely to adopt them than older people:
Nearly half of American smartphone owners use a voice-activated app at least once a week.
Smart speaker owners are using voice commands in place of typing or swiping, and that behavior is prompting them to use voice-activated apps more often on their smartphones.
The top use for 60% of smart speaker owners is to answer general questions.
And they are asking for specific types of information depending upon the time of day.
Two thirds of Amazon Echo owners have asked Alexa to read them the news.
Amazon got a jump start on the competition, so it currently dominates market share, but Google Home owns a sizable chunk, and Apple recently released its HomePod speaker.
Smart Speaker Ecosystems
Science fiction writer William Gibson once observed that the future is here, it’s just unevenly distributed.
If the future is the spoken word, then it is imperative for those producing content to understand how it is distributed; turns out, it’s distributed unevenly.
Let’s take a look at the ecosystems of each major smart speaker platform.
Apple HomePod Ecosystem
While late to the game, Apple has significant assets to apply to its new smart speaker product.
With Apple, you start with music. The company is boasting about the HomePod’s high-fidelity sound. With iTunes, HomePod owners can stream their personal music libraries to the speakers. And then there’s Apple Music, the company’s streaming service.
Finally, via iTunes, HomePod owners have access to the largest podcast directory on the planet.
We can expect tight integration with Apple’s other products, of course.
And then there’s Siri, the personal digital assistant that is practically synonymous with voice activation. Unfortunately, Apple’s voice technology seriously lags behind Amazon and Google. Walt Mossberg summed up my experiences with Siri for The Verge in 2016:
On too many occasions, Siri either gets things wrong, doesn’t know the answer, or can’t verbalize it. Instead, it shows you a web search result, even when you’re not in a position to read it.
As a smart speaker, HomePod answered 52.3% of queries correctly compared to recent tests of Google Home at 81%, Alexa at 64%, and Cortana at 57%.
Siri can always serve up search results, as it often does, but on a smart speaker that’s not merely useless, it’s practically insulting.
Google Home Ecosystem
Google, conversely, can read search results…aloud. The company has 20 years’ worth of content it has indexed for its search engine.
The company has been collecting and analyzing voice data since at least 2008, when it introduced voice search capabilities to its iPhone app. The New York Times reported at the time:
Users of the free application, which Apple is expected to make available as soon as Friday through its iTunes store, can place the phone to their ear and ask virtually any question, like “Where’s the nearest Starbucks?” or “How tall is Mount Everest?” The sound is converted to a digital file and sent to Google’s servers, which try to determine the words spoken and pass them along to the Google search engine.
On top of voice recognition, add language recognition. In 2015, TechCrunch reported that Google added voice capabilities to Translate to offer real-time voice translation. The Google Translate app for Android and iOS supports more than 100 languages and can translate 37 languages via photo, 32 via voice in “conversation mode”, and 27 via real-time video in “augmented reality mode.”
So, Google’s got real cred in the areas of voice recognition and analysis. Take that technical acumen and apply it to its database of basically the whole web, and you’ve got a vast ecosystem of answers to an infinite number of questions. The Guardian‘s Samuel Gibbs illustrates:
Home really shines when you ask it something obscure or out of the ordinary.
For instance, did you know foxes mate in January and give birth 53 days later? And that urban fox cubs venture out into the open from about April? Neither did I, until Assistant read out the information from Dartford.gov.uk, an answer that was apparently trustworthy and from a site that’s relatively local to where I live.
Assistant is capable of answering questions directly from Google’s built-in encyclopedia, but it’s also capable of performing web searches to find the answer to the question you asked. It’s incredibly powerful.
Google Home’s ability to find the correct answer to the question posed and read it back arguably makes it the most versatile and valuable of the smart speakers on the market.
Amazon Echo Ecosystem
As the unquestioned leader in the smart speaker market, Amazon is no slouch when it comes to voice recognition and analysis.
In addition to the data set it has already accumulated through its line of Echo speakers, Amazon has the eCommerce dimension that no one else really has. Amazon has my credit card number, it knows where I live, and it has my purchase history since I became a customer. It also has the largest database of commerce-related search queries conducted within its store. It perfected frictionless commerce on the Web and now it is working to perfect voice commerce.
Where Apple has iTunes/Apple Music and Google has YouTube/Google Play, Amazon has its own Prime Music. Check off the music side of the ledger.
Aside from music, though, Amazon’s biggest content assets are the Washington Post and Audible.com. Developers can build apps (or Skills, as Amazon calls them) for its speakers, but users have to install them, so that’s a barrier to getting third-party content into the system.
It is that content deficit Amazon appears to be addressing with the release of the Amazon Polly WordPress plugin.
Polly is a text-to-speech component of Amazon Web Services that uses advanced deep learning technologies to synthesize speech that sounds like a human voice.
Polly uses machine learning technologies under the hood to deliver more life-like speech. For example, Polly understands that the word “live” would be pronounced differently based on its usage. In the phrases “I live in Seattle” and “Live from New York,” the word is spelled the same but is not spoken in the same way. That means the voices sound more natural than some other, more basic text-to-speech engines.
The Polly speech engine launched with 47 male and female voices and support for 24 languages.
The technology’s capabilities have also evolved, with added support for things like whispering, speech marks, a timbre effect, and dynamic range compression. These sorts of voice technology advancements are also things that make Alexa sound more natural, too.
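Effects like whispering are expressed through the SSML markup Polly accepts. Here is a minimal sketch: the `<amazon:effect name="whispered">` tag is Polly’s documented SSML, while the helper function is a hypothetical convenience wrapper, not part of any Amazon SDK.

```python
def whispered_ssml(text: str) -> str:
    """Wrap plain text in an SSML document asking Polly to whisper it."""
    return f'<speak><amazon:effect name="whispered">{text}</amazon:effect></speak>'

ssml = whispered_ssml("Live from New York")
print(ssml)
# The resulting document would be sent to Polly with TextType="ssml",
# e.g. via boto3: polly.synthesize_speech(Text=ssml, TextType="ssml",
#                                         VoiceId="Joanna", OutputFormat="mp3")
```

Because the effect lives in the markup rather than in the audio pipeline, a plugin like Polly’s can layer these refinements onto ordinary blog text without the author touching any audio tooling.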
The plugin is free but Amazon Polly is not.
The Amazon Polly free tier includes 5 million characters per month for speech or Speech Marks requests, for the first 12 months, starting from the first request for speech. The pay-as-you-go option is $4.00 per 1 million characters for speech requests (when outside the free tier).
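To see what that pricing means in practice, here is a back-of-the-envelope cost sketch using only the figures quoted above (5 million free characters per month for the first 12 months, then $4.00 per 1 million characters); the function name and the example post volumes are illustrative assumptions.

```python
FREE_TIER_CHARS = 5_000_000   # free characters per month, first 12 months
PRICE_PER_MILLION = 4.00      # USD per 1 million characters, speech requests

def monthly_polly_cost(chars: int, free_tier: bool = True) -> float:
    """Estimated monthly cost in USD for synthesizing `chars` characters."""
    billable = max(0, chars - FREE_TIER_CHARS) if free_tier else chars
    return billable / 1_000_000 * PRICE_PER_MILLION

# A blog publishing 30 posts of ~5,000 characters each (~150,000 chars/month)
# stays well inside the free tier:
print(monthly_polly_cost(150_000))      # 0.0
# A large site synthesizing 20 million characters a month pays for 15 million:
print(monthly_polly_cost(20_000_000))   # 60.0
```

For a typical blogger, in other words, a year of audio versions of every post costs essentially nothing.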
The plugin’s “Pollycast” feature taps into this service to give bloggers the ability to essentially turn their text-based posts into podcast episodes, with a feed customized for iTunes. That’s a massive benefit for content creators at a very low price point.
It has the added allure of opening up a massive new audience to bloggers via iTunes and other audio aggregators.
The benefit to Amazon, however, is substantial. Not only does the plugin provide a massive data set upon which to train and refine its text-to-speech technology, it also provides a massive new source of audio content to pipe into the Echo, a bulwark against Google’s current content advantage.
How are you planning for the spoken-word Web?