Azure Ai Transcribing
This project demonstrates and documents my attempts to create a video transcription service using Azure AI.
Creating the Azure AI resource
- I created a plain Azure AI Services resource from the Azure Portal
- I chose Sweden Central as the location and Standard S0 as the pricing tier
- I am using an Azure account attached to a Visual Studio license, so I have roughly 150€ of free credits
- here's how the portal looked afterwards: [screenshot]
- all audio files need to be in a specific format (16 kHz, mono, 16-bit PCM WAV); I am using ffmpeg to convert them:
ffmpeg -i [INPUT].mp3 -acodec pcm_s16le -ac 1 -ar 16000 [OUTPUT].wav
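If there are several recordings to convert, the same ffmpeg command can be wrapped in a small loop. This is only a sketch: the function names wav_path_for and convert_all, and the recordings input folder in the usage note, are made up for illustration and are not part of this repo.

```shell
# Map an input path such as recordings/talk.mp3 to <out_dir>/talk.wav.
# (hypothetical helper, not part of the app)
wav_path_for() {
  local input="$1"
  local out_dir="$2"
  local name
  name="$(basename "$input")"
  # ${name%.*} strips only the last extension, so my.talk.mp3 -> my.talk
  printf '%s/%s.wav\n' "$out_dir" "${name%.*}"
}

# Convert every .mp3 in in_dir to 16 kHz, mono, 16-bit PCM WAV in out_dir.
# (hypothetical helper, not part of the app)
convert_all() {
  local in_dir="$1"
  local out_dir="$2"
  local input
  mkdir -p "$out_dir"
  for input in "$in_dir"/*.mp3; do
    # When nothing matches, the glob stays literal; skip it.
    [ -e "$input" ] || continue
    # Same flags as the single-file command above; -y overwrites existing
    # output and -nostdin keeps ffmpeg from reading the terminal in a loop.
    ffmpeg -nostdin -y -i "$input" -acodec pcm_s16le -ac 1 -ar 16000 \
      "$(wav_path_for "$input" "$out_dir")"
  done
}
```

After sourcing the functions, something like convert_all recordings Data would write one .wav per .mp3 into the Data folder.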
Running this app
- first get your key and region from the Azure Portal
- then set the environment variables and start the app
export SPEECH_KEY=your_key
export SPEECH_REGION=your_region
dotnet run
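A forgotten variable is easy to miss, so a small pre-flight check before dotnet run can help. This is a sketch under assumptions: missing_speech_vars and run_app are hypothetical names, and I am assuming the app reads exactly the two variables shown above.

```shell
# Print the names of the required variables that are unset or empty,
# separated by spaces; prints an empty line when both are set.
# (hypothetical helper, not part of the app)
missing_speech_vars() {
  local missing=""
  [ -n "${SPEECH_KEY:-}" ] || missing="$missing SPEECH_KEY"
  [ -n "${SPEECH_REGION:-}" ] || missing="$missing SPEECH_REGION"
  # Strip the leading space left by the concatenation above.
  printf '%s\n' "${missing# }"
}

# Start the app only when both variables are present.
# (hypothetical wrapper around the dotnet run step above)
run_app() {
  local missing
  missing="$(missing_speech_vars)"
  if [ -n "$missing" ]; then
    echo "Set these variables first: $missing" >&2
    return 1
  fi
  dotnet run
}
```

Calling run_app from the project folder then either reports what is missing or starts the app as before.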
Using this app
- You need to fill the Data folder with some .wav files
- The app will then display those files and you can choose which one to transcribe
- The app will then display the transcription
Experimentation results & thoughts
- I tried transcribing three different files of varying lengths
- It seems there is a limit on how long the audio file can be
- It might be that the silence detection is too strict
- Yeah, the documentation says as much:
This example uses the RecognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected.
- The results that I do get are very readable
- Transcribing entire files needed a different code solution, since RecognizeOnceAsync stops after a single utterance