Speech-to-text (ASR)
Automatic speech recognition that converts spoken audio into written text.
Speech-to-text, or automatic speech recognition (ASR), turns an audio stream into transcribed words. In a speech translation pipeline it is the first stage: the spoken source language must be recognised before it can be translated.
Streaming ASR emits words as they are heard rather than waiting for the end of a sentence, which is essential for low-latency, real-time translation.
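The difference between streaming and batch recognition can be illustrated with a toy sketch. It is not a real recogniser: each "audio chunk" is assumed to be already labelled with the word it contains, and the names `batch_asr`, `streaming_asr`, and `counting_source` are hypothetical helpers made up for this example. The point is only to show how much audio each approach has to consume before its first output is available.

```python
from typing import Iterable, Iterator, List


def batch_asr(chunks: Iterable[str]) -> str:
    """Toy batch recogniser: consumes the whole utterance, then returns it."""
    return " ".join(chunks)


def streaming_asr(chunks: Iterable[str]) -> Iterator[str]:
    """Toy streaming recogniser: yields each word as soon as its chunk arrives."""
    for word in chunks:
        yield word


def counting_source(words: List[str], consumed: List[str]) -> Iterator[str]:
    """Simulated audio source that records which chunks have been read so far."""
    for word in words:
        consumed.append(word)
        yield word


# Assumption: one chunk per word, purely for illustration.
utterance = ["good", "morning", "everyone"]

# Streaming: the first word is available after a single chunk has been read.
consumed_streaming: List[str] = []
first_word = next(streaming_asr(counting_source(utterance, consumed_streaming)))

# Batch: nothing is returned until every chunk has been read.
consumed_batch: List[str] = []
transcript = batch_asr(counting_source(utterance, consumed_batch))

print(first_word, len(consumed_streaming))  # first word and chunks read so far
print(transcript, len(consumed_batch))      # full transcript and chunks read
```

In this sketch the streaming path hands its first word to the next stage after one chunk, while the batch path reads the entire utterance first. A downstream translation step would see the same gap, which is why streaming matters for real-time use.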