Vision & Speech

Multimodal

Image understanding, transcription, and voice assistants for apps.

Image understanding and visual analysis

Accurate speech-to-text transcription

Push-to-talk voice assistant experiences

Natural text-to-speech voices
