After Apple acquired the SRI International spin-off Siri, a group of SRI scientists decided to continue pushing the frontiers of speech understanding by combining their deep expertise in behavioral science and artificial intelligence. Fascinated by the technology and the potential of voice analytics, Teo Borschberg and Nicolas Perony founded OTO as an SRI International spin-off.
About this API
This API allows you to process entire audio files as well as audio streams using a variety of our models. Each model is optimized to give insights on a certain voice property. Currently, the following models are available:
- speech - can detect music, human speech or other sounds in the provided audio
- gender - can classify the gender of the speaker
- arousal - can classify the speaker's level of arousal (low, neutral, high)
- speech-rt - can detect speech, music or other sounds in the provided audio. Compared to the Speech model, the SpeechRT model reacts much faster to changes in the audio. The SpeechRT model is currently our default speech detection model, as it offers the best performance on common use cases.
- emotions - can classify the emotion of the speaker (happy, irritated, neutral, tired)
Additionally, since voice properties can only be classified when there is voice in the audio, all the models are combined with our Volume and SpeechRT models. That way we can also detect silence and speech in the provided audio and return more relevant results.
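One way to picture this combination is as a gate placed in front of the voice-property label. The sketch below is only an illustration and not the service's actual implementation: the segment structure, its keys ("volume", "speech", "label") and the label values are all hypothetical.

```python
def gate_by_voice(segments):
    """Return one label per segment, keeping the classifier label only for speech.

    Each segment is a hypothetical dict with three keys:
      "volume": "silence" or "sound"         (assumed Volume model output)
      "speech": "speech", "music" or "other" (assumed SpeechRT model output)
      "label":  the voice-property label, e.g. "happy" (assumed classifier output)
    """
    gated = []
    for seg in segments:
        if seg["volume"] == "silence":
            # Nothing audible, so there is no voice to classify.
            gated.append("silence")
        elif seg["speech"] != "speech":
            # Audible, but not speech: report what was detected instead.
            gated.append(seg["speech"])
        else:
            gated.append(seg["label"])
    return gated
```

With a gate like this, an emotion label would only be reported for segments that actually contain speech.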
When submitting a file processing job, you need to either provide the URL of the WAV file you would like to process or send the data of a local WAV file directly with the request. The response will contain a link under which the results will be available once the processing job has finished.
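The two ways of supplying audio can be sketched as a small helper that builds the request body. This is a minimal sketch under stated assumptions: the field names ("model", "url", "data"), the base64 encoding of local file data, and the "result_url" response field are hypothetical and not taken from the API. Only the model names come from the list above.

```python
import base64
import json

# Model names as listed in this description.
AVAILABLE_MODELS = {"speech", "gender", "arousal", "speech-rt", "emotions"}


def build_job_payload(model, url=None, wav_bytes=None):
    """Build a JSON-serializable body for a file processing job.

    The field names "model", "url" and "data" are hypothetical.
    """
    if model not in AVAILABLE_MODELS:
        raise ValueError(f"unknown model: {model!r}")
    # Exactly one audio source: a remote URL or the bytes of a local WAV file.
    if (url is None) == (wav_bytes is None):
        raise ValueError("provide exactly one of url or wav_bytes")
    payload = {"model": model}
    if url is not None:
        payload["url"] = url
    else:
        # Assumption: local file data is sent base64-encoded.
        payload["data"] = base64.b64encode(wav_bytes).decode("ascii")
    return payload


def results_link(response_body):
    """Read the results link from a job response (hypothetical "result_url" field)."""
    return json.loads(response_body)["result_url"]
```

A client would send the payload to the job endpoint and then poll the returned link until the results are available.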