Being human: How voice technology is impacting content creation | Heart Internet Blog – Focusing on all aspects of the web

The spoken word is on the rise. The tapping and typing of touchscreen technology is, slowly but surely, giving ground to voice. People are increasingly turning to the voice assistants on their phones and smart speakers to navigate their environment through speech.

Every one of the big players — Amazon, Apple, Google, Microsoft, Samsung — wants a piece of the action. “The last 12 months have been explosive for smart speakers, which have surged into the mass market for two reasons”, Ben Stanton, an analyst at Canalys, told the Guardian earlier this year. “Firstly, smart speakers have become the central control hubs of the smart home ecosystem. Secondly, and most importantly, the price of smart speakers has fallen drastically”.

What’s also true is that the machines are simply getting better at talking. The development of natural language processing is humanising the way our devices interact with us. Although current use of voice assistant technology remains limited – finding directions, setting a timer, telling the weather, playing a song – it won’t be long before we are consulting our devices for more complex research or even emotional advice.

This is beginning to have an impact on the way we create content. In the past decade, online content creators have moved sharply in the direction of longer-form content. But does long form cut it for voice? How do we create content that is optimised for both screen-based search and voice?

Voice versus screen

Perhaps the biggest distinction to draw between voice and screen search is in the way results get returned.

With screen search, the user is presented with a list of results. With voice search, we don’t look at a screen unless we choose to. The results are spoken back to us by the machine.

This is where voice’s present limitations kick in. Voice can list out top results for a simple query: ask your smart speaker to name the countries of Europe, and away it goes like a dog playing fetch.

But if you wish to dig deeper, say by asking it to spell out the differences between European countries, the device is far less adept. This compares unfavourably with screen-based search, where we are presented with a variety of results that we can visually scan for nuance and complexity.

Image: Bing search results for the query “difference between countries in Europe”. Microsoft’s Bing does a good job of contextualising a complex search query.

Let’s take another example: Amazon’s Echo, currently the best-selling device in the smart speaker category, is quick to provide a list of top-rated restaurants for a local area. It is not, however, able to explain why it recommends particular establishments. The information is out there on the internet – largely in the form of customer reviews – and is easily accessible through screen-based search, but Amazon isn’t yet prepared to venture a response to this more complex query.

Of course, it is only a matter of time before Amazon sufficiently optimises its database to retrieve customer reviews and deliver insights based on what it reads.

But don’t hold your breath: the process underpinning this is complicated. It requires your device not only to identify and read relevant customer reviews (natural language processing) but also to reformulate what it has read in a way that helps answer your original question (natural language generation).

This involves more than repeating verbatim what it retrieves from its database. The technology needs to be able to generate insights that answer your question based on its contextual understanding of the subject.
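To make those two steps concrete, here is a deliberately crude Python sketch. It is not how Amazon or any other vendor does it: “reading” is reduced to keyword matching, standing in for natural language processing, and “answering” is a fixed sentence template, standing in for natural language generation. The aspect keywords, the list of positive words and the function names are all hypothetical.

```python
from collections import Counter

# Hypothetical aspects a diner might care about, each with a few trigger words.
ASPECTS = {
    "food": {"food", "dish", "pasta", "pizza", "tasty", "delicious"},
    "service": {"service", "staff", "waiter", "friendly"},
    "price": {"price", "cheap", "value", "affordable"},
}
# Hypothetical list of words treated as praise.
POSITIVE_WORDS = {"great", "good", "excellent", "delicious", "tasty", "friendly", "value", "affordable"}


def tokenise(review: str) -> set[str]:
    """Lower-case a review and split it into a set of words, dropping commas and full stops."""
    return set(review.lower().replace(",", " ").replace(".", " ").split())


def extract_praised_aspects(reviews: list[str]) -> Counter:
    """Crude 'reading' step: count reviews that use positive language and mention each aspect."""
    counts: Counter = Counter()
    for review in reviews:
        words = tokenise(review)
        if not words & POSITIVE_WORDS:
            continue  # no praise at all, so this review contributes nothing
        for aspect, keywords in ASPECTS.items():
            if words & keywords:
                counts[aspect] += 1
    return counts


def generate_answer(name: str, reviews: list[str]) -> str:
    """Crude 'writing' step: turn the counts into one sentence a speaker could read out."""
    counts = extract_praised_aspects(reviews)
    if not counts:
        return f"I couldn't find a clear reason why people recommend {name}."
    top_aspects = [aspect for aspect, _ in counts.most_common(2)]
    reasons = " and ".join(f"its {aspect}" for aspect in top_aspects)
    return f"Reviewers mostly recommend {name} for {reasons}."


reviews = [
    "Delicious pasta and friendly staff.",
    "Great food, a bit noisy.",
    "Excellent value for the price.",
]
print(generate_answer("Luigi's", reviews))
# prints: Reviewers mostly recommend Luigi's for its food and its service.
```

Even this toy shows why the real problem is hard: it only knows the words it was given, it ignores negation (“not friendly” would still count as praise) and it can only ever produce one kind of sentence.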

Think snippets

Reaching a point where machines can naturally generate insights and engage us in the kind of conversation we would expect when speaking with a friend or calling a customer service representative is still a few years away. It is the Holy Grail for artificial intelligence researchers, and the prize for the companies that dominate this space will be huge.

For now, one thing voice assistants enthusiastically embrace is text snippets. These are the clear, authoritative definitions that find their way into, for example, the knowledge panels and answer boxes that Google places alongside and at the top of its search results.

Image: Google search results for the query “what is hypoglycemia”. Google works hard to bring as much relevant information as possible to the top of its search results.

Snippets work well for screen-based search, and they work well for voice search too. You have a direct question (“What is hypoglycemia?”) and a Wiki-style answer that would normally appear at the top of the results page, which your smart device can simply read back to you.
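The question-and-answer pattern behind snippets can be sketched in a few lines of Python. This is a toy under stated assumptions, not how Google or any assistant actually selects snippets: the page content, the `MAX_SPOKEN_WORDS` limit and the exact-match lookup are all hypothetical. What it illustrates is why a clear question followed by a self-contained first sentence is so easy to voice back.

```python
import re
from typing import Optional

# Hypothetical page content: each question is paired with a short answer
# whose first sentence stands on its own.
PAGE = {
    "what is hypoglycemia": (
        "Hypoglycemia is a condition in which the level of sugar in the blood drops too low. "
        "Common signs include shakiness, sweating and confusion."
    ),
}

MAX_SPOKEN_WORDS = 30  # assumed cap on how much an assistant will read aloud


def normalise(question: str) -> str:
    """Lower-case a question and drop punctuation so 'What is hypoglycemia?' matches the key."""
    return re.sub(r"[^a-z0-9 ]", "", question.lower()).strip()


def spoken_snippet(question: str) -> Optional[str]:
    """Return the first sentence of the matching answer, if it is short enough to speak."""
    answer = PAGE.get(normalise(question))
    if answer is None:
        return None  # the page doesn't answer this question
    # Split after the first full stop, question mark or exclamation mark.
    first_sentence = re.split(r"(?<=[.!?])\s+", answer.strip(), maxsplit=1)[0]
    if len(first_sentence.split()) > MAX_SPOKEN_WORDS:
        return None  # too long to read back comfortably
    return first_sentence


print(spoken_snippet("What is hypoglycemia?"))
```

If the answer opened with a long preamble, or hid the definition in a table or image, there would be no short, self-contained sentence for the assistant to pick up.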

So what does this mean for content creation? How can you structure digital content both with screen and voice audiences in mind?

According to reporting from Search Engine Journal, John Mueller, webmaster trends analyst at Google, recommends that content creators and SEOs make it easy for search engines to understand quickly what their material is about. He also advises that content be “written in a way that can be read aloud”.

“If you write naturally and you write in a clear kind of language that’s consistent across the type of queries you want to target, that’s the type of information that we could pick up for voice as well,” he commented in a hangout earlier this year.

Don’t over-optimise for voice, though. An extensive collection of snippets that might be perfect for voice won’t do well for screen consumption. People want to read well-constructed information that flows, not a series of disparate entries written with database retrieval in mind.

By the same token, don’t make it difficult for the machines to get to your information. Too much data buried in tables or overlaid on images will likely be overlooked by voice assistants.

Speaking up

There is little doubt that we are moving into an era of more interactive and conversational content. The days of people blankly staring into their devices for hours on end are numbered.

Design your content with this in mind. Avoid corporate speak, and think in terms of what people might be asking about your industry and how your organisation can answer these questions. Loosen up your content and give it voice.

It seems perhaps topsy-turvy, but the more human, helpful, and natural we can make ourselves and our online content, the more we will appeal to the machines. Whoever would have thought that being human is possibly the best thing we have going for ourselves these days?
