The speech APIs are the latest in a series of AI-based technologies Baidu has released publicly, including facial recognition, optical character recognition, natural language processing and others. In September, the company also open-sourced its deep learning framework PaddlePaddle, an easy-to-use platform that allows developers to apply deep learning to their products and services.
“We are at the dawn of the AI era. By opening our AI technologies, we will make it easier for everyone to create AI-enabled applications,” says Andrew Ng, chief scientist of Baidu.
The newly released speech technologies are being used in a range of products and services from Baidu and its partners. Long Utterance Speech Recognition enables products to automatically transcribe long audio clips such as interviews, speeches and lectures. Far-Field Speech Recognition enables products such as voice-controlled televisions to recognize speech from audio sources up to 16 feet away.
Baidu’s deep learning-based Expressive Speech Synthesis provides a collection of realistic voices, differing in tone and accent, that devices can use to read audio books or news aloud, a service already available in Baidu’s products to enhance users’ experiences. With Wake Word technology (an earlier version of which was previously released), developers can create customized short words or phrases that can be spoken to “wake up” devices, with no additional user input needed. For example, a user can take a selfie with his or her phone just by uttering the word “cheese.”
The four APIs are new additions to Baidu’s existing speech API families, which include Speech Recognition, Speech Synthesis, Wake Word and User-Defined Semantics. Baidu launched its first speech recognition API in 2013 and has since seen rapid growth in speech use, both within Baidu and among its partners.
In just three years, daily requests for speech recognition grew from 5 million in 2013 to 140 million this year, and daily requests for speech synthesis now stand at 200 million. Meanwhile, the number of developers using Baidu’s speech system has grown from 10,000 in 2014 to 140,000 this year.
“Speech technologies will revolutionize how we interact with technology. Baidu is accelerating this change by opening up its speech technologies to everyone,” says Adam Coates, director of Baidu’s Silicon Valley AI Lab.