Deeptone Acoustic API

Machine-learning-powered voice analytics API, using the latest DeepTone™ models.

About OTO

After Apple acquired the SRI International spin-off Siri, a group of SRI scientists decided to continue pushing the frontiers of speech understanding by combining their deep expertise in behavioral science and artificial intelligence. Fascinated by the technology and the potential of voice analytics, Teo Borschberg and Nicolas Perony founded OTO as an SRI International spin-off.

About this API

This API allows you to process entire audio files as well as audio streams using a variety of our models. Each model is optimized to give insights on a certain voice property. Currently, the following models are available:

  • speech - can detect music, human speech or other sounds in the provided audio
  • gender - can classify the gender of the speaker
  • arousal - can classify the speaker's level of arousal (low, neutral, high)
  • speech-rt - can detect speech, music or other sounds in the provided audio. Compared to the Speech model, the SpeechRT model can react to changes in the audio much faster. The SpeechRT model is our default speech detection model currently, as it offers the best performance on common use cases.
  • emotions - can classify the emotion of the speaker (happy, irritated, neutral, tired)

Additionally, since finding voice properties only works if there is voice to classify, all the models are combined with our Volume and SpeechRT models. That way we can also detect silence and speech in provided audio and return more relevant results.
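The idea of combining a model's output with volume and speech detection can be illustrated with a minimal sketch. This is not the service's actual logic: the `Frame` fields, the label names "silence" and "no_speech", and the threshold value are all assumptions made up for illustration.

```python
from dataclasses import dataclass

# Hypothetical loudness threshold, chosen only for this illustration.
SILENCE_THRESHOLD_DB = -50.0


@dataclass
class Frame:
    """One analysis window with assumed per-model outputs."""
    volume_db: float    # assumed output of a volume model
    speech_label: str   # assumed output of a speech detector: "speech", "music" or "other"
    model_label: str    # output of the requested model, e.g. "happy"


def gate_frame(frame: Frame) -> str:
    """Report the model label only when there is audible speech to classify."""
    if frame.volume_db < SILENCE_THRESHOLD_DB:
        return "silence"
    if frame.speech_label != "speech":
        return "no_speech"
    return frame.model_label


frames = [
    Frame(volume_db=-70.0, speech_label="other", model_label="neutral"),
    Frame(volume_db=-20.0, speech_label="music", model_label="happy"),
    Frame(volume_db=-18.0, speech_label="speech", model_label="irritated"),
]
labels = [gate_frame(f) for f in frames]
```

In this sketch a voice-property label is kept only for the third frame, where the audio is both loud enough and detected as speech.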

When submitting a file processing job, you need to either provide the URL of the WAV file you would like to process or send the data of a local WAV file directly with the request. The response will contain a link under which the results will be available once the processing job is finished.
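The submit-then-fetch flow described above might look roughly like the following sketch. The request layout, the parameter names (`models`, `url`), and the helper names `build_job_request` and `wait_for_results` are hypothetical and not taken from the API documentation; the `fetch` callable stands in for whatever HTTP client you use, so consult the documentation for the real endpoints and fields.

```python
import time
from typing import Callable, Optional

# Model names as listed on this page.
KNOWN_MODELS = {"speech", "gender", "arousal", "speech-rt", "emotions"}


def build_job_request(models, url: Optional[str] = None,
                      wav_bytes: Optional[bytes] = None) -> dict:
    """Describe a file processing job (hypothetical request layout)."""
    unknown = set(models) - KNOWN_MODELS
    if unknown:
        raise ValueError(f"unknown models: {sorted(unknown)}")
    # The job takes either a WAV URL or the WAV data itself, not both.
    if (url is None) == (wav_bytes is None):
        raise ValueError("provide exactly one of url or wav_bytes")
    request = {"params": {"models": list(models)}, "body": None}
    if url is not None:
        request["params"]["url"] = url
    else:
        request["body"] = wav_bytes
    return request


def wait_for_results(fetch: Callable[[str], Optional[dict]], result_link: str,
                     attempts: int = 10, delay_s: float = 1.0) -> dict:
    """Poll the result link until fetch returns something other than None."""
    for _ in range(attempts):
        result = fetch(result_link)
        if result is not None:
            return result
        time.sleep(delay_s)
    raise TimeoutError(f"no results at {result_link} after {attempts} attempts")
```

Passing `fetch` in as a parameter keeps the polling loop independent of any particular HTTP library and makes it easy to try out with a stub before calling the real service.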

Simple Transparent Pricing

No long term commitments. One click upgrade/downgrade or cancellation. No questions asked.

Free Plan

No credit cards required
50 requests per day, 250 requests per month

Gold Plan

Monthly subscription
250 requests per day, 7,500 requests per month

Diamond Plan

Monthly subscription
1,250 requests per day, 36,000 requests per month

Custom Plan

Monthly subscription
Fully customizable
Speaker Insights from audio
Acoustic Insights from audio
Detect emotions in audio
Detect behaviors in audio

Ready to try it out?

We offer a free plan. No credit cards required!
