"Speech To Text" technology allows the transcription of voice into text from an audio or video source.
Different levels of transcription exist:
- Voice detection
- Language detection
- Phonetic transcription
- Partial transcription
- Complete transcription
Voxalead is a technology demonstrator developed by the ExaLabs team that searches inside multimedia content (audio and video).
The demo uses a Speech-to-Text transcription module from LIMSI (Computer Science Laboratory for Mechanics and Engineering Sciences, a CNRS laboratory) to transcribe German and French political speeches.
Note: LIMSI can transcribe the following languages: French, English, German, Spanish, Arabic and Chinese. It is possible to interface with other suppliers to handle other languages or to provide transcriptions.

The demo is composed of two parts:
The player displays the transcribed text next to the video. Clicking on a word jumps the video to the exact moment that word is spoken, and search terms are shown in boldface (here: Yann Arthus-Bertrand).
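The player behaviour described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Voxalead implementation: the `TimedWord` structure, the function names, the sample timings and the use of `<b>` tags for boldface are all hypothetical.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class TimedWord:
    """One transcribed word with its time code (hypothetical structure)."""
    text: str
    start: float  # seconds from the start of the video


def seek_time(words, index):
    """Return the playback position to jump to when word `index` is clicked."""
    return words[index].start


def word_at(words, position):
    """Return the index of the last word that started at or before `position`."""
    starts = [w.start for w in words]
    return max(bisect_right(starts, position) - 1, 0)


def render(words, query):
    """Render the transcript, wrapping words that match a query term in <b>."""
    terms = {t.lower() for t in query.split()}
    rendered = []
    for w in words:
        bare = w.text.lower().strip(".,;:!?")
        rendered.append(f"<b>{w.text}</b>" if bare in terms else w.text)
    return " ".join(rendered)


# Invented sample data: a six-word transcript with made-up time codes.
transcript = [
    TimedWord("we", 0.0),
    TimedWord("must", 0.4),
    TimedWord("act", 0.9),
    TimedWord("on", 1.3),
    TimedWord("climate", 1.6),
    TimedWord("change", 2.2),
]

html = render(transcript, "climate change")
jump_to = seek_time(transcript, 4)      # user clicks on "climate"
current = word_at(transcript, 1.0)      # word being spoken at 1.0 s
```

Storing a start time for each word is what makes both directions cheap: a click maps a word to a playback position, and a binary search maps a playback position back to the word to highlight.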

The next version will feature a new player and add time-coded comments. It will then be possible to index those comments and search within them.
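As a rough idea of what indexing and searching time-coded comments could involve, here is a minimal inverted-index sketch in Python. It is an assumption-laden illustration, not a description of the planned feature or of Exalead's indexing engine: the `Comment` structure, the tokenization rule and the sample comments are invented for the example.

```python
import re
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Comment:
    """A user comment attached to a position in the video (hypothetical)."""
    time: float  # seconds from the start of the video
    text: str


def tokenize(text):
    """Lower-case the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(comments):
    """Map each token to the set of comment positions that contain it."""
    index = defaultdict(set)
    for i, comment in enumerate(comments):
        for token in tokenize(comment.text):
            index[token].add(i)
    return index


def search(index, comments, query):
    """Return comments containing every query token, ordered by time code."""
    tokens = tokenize(query)
    if not tokens:
        return []
    hits = set.intersection(*(index.get(t, set()) for t in tokens))
    return sorted((comments[i] for i in hits), key=lambda c: c.time)


# Invented sample data.
comments = [
    Comment(95.0, "Good point about energy policy"),
    Comment(12.5, "Energy prices are discussed here"),
    Comment(40.0, "Applause"),
]
index = build_index(comments)
energy_hits = [c.time for c in search(index, comments, "energy")]
policy_hits = [c.time for c in search(index, comments, "energy policy")]
```

Because each hit carries its time code, a search result can link straight to the matching moment in the video, in the same way as a clicked word in the transcript.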
Different parameters influence the quality of the transcription. To address these parameters, several modules exist, each specialized, for example, by language or by signal origin.
The openness of Exalead's technological platform allows integration with all specialized technology suppliers on the market. The Exalabs team is open to proposals from partners who specialize in these domains. To get in touch with the team, click here.
Try the demo here: http://voxaleadnews.labs.exalead.com/