Everybody is talking about smart speakers. I know people who have one. Yet I have always wondered what kind of value smart speakers can bring. Reflecting on my own behaviour, I couldn’t picture myself talking to my smartphone, and I wondered: are smart speakers a fad and the next big disillusionment (like virtual reality was)?
In today’s article I’ve compiled a series of statistics to build a solid argument for why I think the smart speaker trend may be a fad and why smart speakers may in fact be the new tablet. If you are a smart speaker expert and want to contradict me, please do so at the end of this article. Criticism is helpful.
Smart speakers on the hype cycle
Technological hypes fade away as fast as they came.
2017 was the year of deep learning, machine learning and virtual assistants. In 2018 virtual assistants were already on the downward path. Astonishingly, none of the trends featured in 2017 and 2018 are due to mature in the next 2 years.
Where are smart speakers? They tap into many different technologies, of course, but they belong to the more general trend called “connected homes”, which is due to mature in the next 5 to 10 years and is on the downward path of the 2018 Gartner hype cycle graph.
Market research statistics about smart speakers
- How many smart speakers have been sold so far? 100 million smart speaker devices had been sold by the end of 2018.
- How many devices can be connected to Alexa? At the beginning of 2018, 4,000 devices could be connected to Alexa; today more than 20,000 different devices can be controlled through Alexa.
- What are smart speakers used for? Different statistics exist that answer this question. Despite variations, the trends are clear: listening to music or the radio ranks among the top uses, while online purchasing is the least used functionality.
- Deloitte’s 2019 Predictions report gives the following figures:
  - 60% to play music
  - 52% for weather updates
  - 39% to set alarms
  - 39% to search for general information
  - 38% for amusement
- At the 2018 IBC conference, Fabrice Rousseau (Head of Alexa Skills at Amazon) mentioned the following figures:
  - 90% to ask a question
  - 65% to listen to radio
  - 5% to make an online purchase
5 Limits of current smart speakers
Limit #1: languages
The first limit is a technological one: language recognition. Currently 95% of smart speakers have been sold in English-speaking countries. Speech recognition used to struggle with the various English pronunciations, but the technology has improved.
The situation depicted by the video below might already belong to the past.
Limit #2: smart speakers are not profitable
Sales of smart speakers are driven by promotions and huge discounts. Smart speakers are heavily subsidized by manufacturers in the hope of making them the next platform for all in-home uses.
The real platforms, however, are not smart speakers. The three most-used devices today are smartphones, televisions and computers.
Limit #3: talking to a machine is not a natural user habit
The Deloitte Predictions 2019 report highlights how weakly voice control has taken hold as a general habit among technology users. Their survey is a very simple one but the results are powerful. It shows unambiguously that a vast majority of smartphone, tablet and computer users are NOT using the voice assistant functionality of their devices.
While I can only speculate on the reasons behind this, I suspect that it’s a combination of two things: users still prefer to type because voice recognition performs worse (see limit #4 below), and self-image. Honestly, don’t we look ridiculous when we TRY to speak to a machine that barely understands voice instructions and most of the time returns unexpected results?
Limit #4: typing may still be more efficient
The other limit of smart speakers (and of any voice-controlled device, in fact) is that voice recognition technology is far from perfect and that a fair percentage of voice queries are not understood. If you have tried the smart assistant on your smartphone, tablet or computer, or have ever used voice control, haven’t you felt embarrassed talking to a machine in the hope that it would return what you were expecting? The risk of disappointing results (which materializes more often than not) causes great frustration for users, who eventually prefer to keep typing to search for information. This is reflected in the 4th rank of “searching information” in the list of usages. In short, owners of smart speakers use them for riskless queries which, unfortunately, also happen to be those with the least added value.
Limit #5: privacy issues
Several scandals have harmed the image of voice-controlled systems. Embedded microphones have been accused of spying on users. Samsung’s smart TV was one of the first cases. A more recent case was Google’s Nest alarm system, which had an embedded microphone that no one was aware of. Clearly, smart speakers and their microphones listening to our home environment can be perceived as a threat, since no one really knows what is captured and how it is used (besides voice control).
Smart speakers: the next big disillusionment?
I love the conclusion of Deloitte’s report: “the combination of music, weather and alarm setting makes smart speakers look much more like an updated bedside or kitchen radio than a fundamentally disruptive device”.
I would have put it differently, however (but with equal irony): smart speakers are the new VR revolution.
Smart speakers have penetrated the market because of low prices, but they are nothing more than a niche device for micro needs that will soon be forgotten in a drawer (just like your tablet).
The smart speaker / smart assistant revolution is not happening yet. I think it will take many more years for users to trust smart speakers: not only the privacy side of the device but also the reliability of the embedded technology and the fulfilment of its promise of efficiency. Those are prerequisites for users to accept the image they will one day project of themselves talking to a machine.