azure-ai/AzureAi.Transcriber/README.md
2024-05-08 15:32:46 +03:00

Azure AI Transcribing

This project demonstrates and documents my attempts to create a video transcription service using Azure AI.

Creating the Azure AI Resource

  • I created a plain Azure AI Services resource from the Azure Portal
  • I chose Sweden Central as the location and Standard S0 as the pricing tier
  • I am using an Azure account attached to a Visual Studio license, so I have roughly 150€ of free credits
  • Here's how the portal looked afterwards

[Image: Azure AI resource]

  • All audio files need to be in a specific format (16 kHz, mono, 16-bit PCM WAV); I use ffmpeg to convert them:
    ffmpeg -i [INPUT].mp3 -acodec pcm_s16le -ac 1 -ar 16000 [OUTPUT].wav
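
To convert a whole folder instead of one file at a time, the same ffmpeg call can be wrapped in a loop. This is only a sketch: the `./input` source folder, the `./Data` location, and the `wav_path` helper are hypothetical names chosen for illustration, not part of this project.

```shell
#!/bin/sh
# Convert every .mp3 in SRC_DIR to 16 kHz, mono, 16-bit PCM WAV in OUT_DIR.
# SRC_DIR and OUT_DIR are assumptions; adjust them to your own layout.
SRC_DIR="${SRC_DIR:-./input}"
OUT_DIR="${OUT_DIR:-./Data}"

# Map an input path such as input/talk.mp3 to <out_dir>/talk.wav.
wav_path() {
    name=$(basename "$1" .mp3)
    printf '%s/%s.wav\n' "$2" "$name"
}

for src in "$SRC_DIR"/*.mp3; do
    # When nothing matches, the glob stays literal, so skip it.
    [ -e "$src" ] || continue
    mkdir -p "$OUT_DIR"
    # -n tells ffmpeg never to overwrite an existing output file.
    ffmpeg -n -i "$src" -acodec pcm_s16le -ac 1 -ar 16000 \
        "$(wav_path "$src" "$OUT_DIR")"
done
```

Run it as `SRC_DIR=~/Downloads OUT_DIR=./Data sh convert.sh` (the script name is also just an example).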

Running this app

  • First, get your key and region from the Azure Portal
  • Then set the environment variables and run the app
    export SPEECH_KEY=your_key
    export SPEECH_REGION=your_region 
    
    dotnet run
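
If the app fails in a confusing way, it may simply be missing one of the two variables. A small guard like the following can catch that before `dotnet run` starts. The `require_env` helper is a hypothetical name for this sketch and is not part of the project.

```shell
#!/bin/sh
# Succeed only if every named environment variable is set and non-empty.
# require_env is a hypothetical helper, not something the app provides.
require_env() {
    missing=0
    for name in "$@"; do
        # Indirect lookup of the variable whose name is in $name (POSIX sh).
        eval "value=\${$name:-}"
        if [ -z "$value" ]; then
            echo "missing environment variable: $name" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Start the app only when both settings are present.
if require_env SPEECH_KEY SPEECH_REGION; then
    dotnet run
fi
```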

Using this app

  • Put some .wav files in the Data folder
  • The app lists those files and lets you choose which one to transcribe
  • The app then displays the transcription
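
Before starting the app, it can help to confirm that the files in the Data folder really are in the expected format. The sketch below uses ffprobe (usually installed alongside ffmpeg) and assumes the folder is at `./Data`; `matches_speech_format` is a hypothetical helper name.

```shell
#!/bin/sh
# Read ffprobe key=value lines on stdin and succeed only if they describe
# 16 kHz, mono, 16-bit PCM audio. Hypothetical helper for this sketch.
matches_speech_format() {
    info=$(cat)
    for want in codec_name=pcm_s16le sample_rate=16000 channels=1; do
        printf '%s\n' "$info" | grep -qx "$want" || return 1
    done
    return 0
}

# The ./Data path is an assumption about where the app's folder lives.
for wav in ./Data/*.wav; do
    [ -e "$wav" ] || continue
    if ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name,sample_rate,channels \
        -of default=noprint_wrappers=1 "$wav" | matches_speech_format; then
        echo "ok: $wav"
    else
        echo "needs conversion: $wav"
    fi
done
```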

Experimentation results & thoughts

  • I tried transcribing three files of varying lengths
  • There seems to be a limit on how long the audio file can be
  • It might also be that the silence detection is too strict
    • Yes, the documentation says as much:

      "This example uses the RecognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected."

  • The results that I do get are very readable
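  • Transcribing entire files needed a different code solution, because RecognizeOnceAsync stops after a single utterance (the Speech SDK's continuous recognition is the usual alternative)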