tomkarho/azure-ai

tomkarho c0bb7c4cd3

Notes

2024-05-08 15:32:46 +03:00


Azure AI Transcribing

This project demonstrates and documents my attempts to build a video transcription service using Azure AI.

Creating Azure AI Resource

I created a plain Azure AI Services resource from the Azure Portal
I chose Sweden Central as the location and Standard S0 as the pricing tier
I am using an Azure account attached to a Visual Studio license, so I have roughly 150€ of free credits
Here's how the portal looked afterwards

All audio files need to be in a specific format (16-bit PCM WAV, mono, 16 kHz); I am using ffmpeg to convert them:

    ffmpeg -i [INPUT].mp3 -acodec pcm_s16le -ac 1 -ar 16000 [OUTPUT].wav
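
Converting files one at a time gets tedious, so here is a minimal sketch of a batch version that applies the same ffmpeg flags to every .mp3 in a folder. The function names `wav_name_for` and `convert_all` are my own hypothetical helpers and not part of this repository. The sketch assumes ffmpeg is on the PATH.

```shell
# Hypothetical helper: map an input audio path to a .wav path in the output folder.
wav_name_for() {
  local input="$1" out_dir="$2" base
  base="$(basename "$input")"
  # Strip only the last extension, so "my.talk.mp3" becomes "my.talk.wav".
  printf '%s/%s.wav\n' "$out_dir" "${base%.*}"
}

# Hypothetical helper: convert every .mp3 in a source folder with the same
# flags as the single-file command (16-bit PCM, mono, 16 kHz).
convert_all() {
  local src_dir="$1" out_dir="$2" input
  mkdir -p "$out_dir"
  for input in "$src_dir"/*.mp3; do
    [ -e "$input" ] || continue   # skip when the glob matches nothing
    ffmpeg -i "$input" -acodec pcm_s16le -ac 1 -ar 16000 \
      "$(wav_name_for "$input" "$out_dir")"
  done
}
```

Something like `convert_all ./recordings ./Data` would then fill the Data folder, where `./recordings` is only a placeholder for wherever the source files live.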

Running this app

First, get your key and region from the Azure Portal
Then set the environment variables and run the app

    export SPEECH_KEY=your_key
    export SPEECH_REGION=your_region 
    
    dotnet run
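
A missing key or region is easy to overlook, so a small pre-flight check before `dotnet run` may save some confusion. This is only a sketch: `check_speech_env` is a hypothetical helper name, and it only verifies that the two variables are non-empty, not that the values are valid.

```shell
# Hypothetical pre-flight check: fail early if either variable is unset or empty.
check_speech_env() {
  local missing=0 name value
  for name in SPEECH_KEY SPEECH_REGION; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

With this in place the app could be started as `check_speech_env && dotnet run`, so it only launches when both values are present.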

Using this app

Fill the Data folder with some .wav files
The app then lists those files and lets you choose which one to transcribe
The app then displays the transcription

Experimentation results & thoughts

I used three files of varying lengths and tried to transcribe each
It seems there is a limit on how long the audio file can be
It might be that the silence detection is too strict
- Yeah, the documentation says as much:

This example uses the RecognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected.

The results that I do get are very readable
Transcribing entire files needed a different code solution than the single-shot RecognizeOnceAsync call
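
Given the 30-second limit quoted above, it can help to know up front which files will be cut off. The sketch below is an assumption on my part and not something the app does: `audio_duration` and `exceeds_limit` are hypothetical helpers, and the first relies on ffprobe (shipped with ffmpeg) being on the PATH.

```shell
# Hypothetical helper: print a file's duration in seconds using ffprobe.
audio_duration() {
  ffprobe -v error -show_entries format=duration \
    -of default=noprint_wrappers=1:nokey=1 "$1"
}

# Hypothetical helper: succeed when a duration in seconds (may be fractional)
# is strictly greater than the limit. The limit defaults to 30, taken from the
# documentation quote about RecognizeOnceAsync.
exceeds_limit() {
  awk -v d="$1" -v limit="${2:-30}" 'BEGIN { exit !(d + 0 > limit + 0) }'
}
```

For example, `exceeds_limit "$(audio_duration Data/example.wav)" && echo "too long for a single call"` would flag a file, where `Data/example.wav` is a placeholder name. Shorter files can still be truncated, since recognition also stops when silence is detected.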