Azure AI Transcribing
=====================

This project demonstrates and documents my attempts to build a video transcription service
using Azure AI.

## Creating Azure AI Resource
- I created a plain *Azure AI Services* resource from the Azure Portal
- I chose Sweden Central as the location and Standard S0 as the pricing tier
- I am using an Azure account attached to a Visual Studio license, so I have roughly 150€ of free credits
- Here's how the portal looked afterwards
- All audio files need to be in a specific format; I am using ffmpeg to convert them to 16 kHz, mono, 16-bit PCM WAV

```shell
ffmpeg -i [INPUT].mp3 -acodec pcm_s16le -ac 1 -ar 16000 [OUTPUT].wav
```

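With several source files, the same conversion can be run in a loop. This is a minimal sketch, assuming the sources are `.mp3` files in a hypothetical `input` folder and the converted files should land in the `Data` folder. The `convert_all` function name and the `FFMPEG` override variable are made up for this example and are not part of the project.

```shell
#!/bin/sh
# Convert every .mp3 in a source folder to 16 kHz mono 16-bit PCM WAV.
# convert_all, the default folder names and the FFMPEG variable are
# assumptions made for this sketch.
convert_all() {
    src_dir=${1:-input}
    out_dir=${2:-Data}
    mkdir -p "$out_dir"
    for src in "$src_dir"/*.mp3; do
        # Skip the unexpanded pattern when the folder has no .mp3 files
        [ -e "$src" ] || continue
        name=$(basename "$src" .mp3)
        # -y overwrites existing output, -nostdin stops ffmpeg from reading the terminal
        "${FFMPEG:-ffmpeg}" -nostdin -y -i "$src" \
            -acodec pcm_s16le -ac 1 -ar 16000 "$out_dir/$name.wav"
    done
}
```

Calling `convert_all input Data` after defining the function would write one `.wav` per `.mp3`, keeping the base file name.
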
## Running this app

- First, get your Key and Region values from the Azure Portal
- Then set the environment variables and run the app

```shell
export SPEECH_KEY=your_key
export SPEECH_REGION=your_region

dotnet run
```

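Since the key and region are passed through the environment, a missing variable is easy to overlook. This is a minimal sketch of a pre-flight check. The `check_speech_env` helper is hypothetical and not part of this project.

```shell
#!/bin/sh
# Verify that both variables are set and non-empty before starting the app.
# check_speech_env is a hypothetical helper introduced for this sketch.
check_speech_env() {
    missing=0
    for var in SPEECH_KEY SPEECH_REGION; do
        # eval gives portable indirect expansion; var only takes the two fixed names above
        eval "value=\${$var:-}"
        if [ -z "$value" ]; then
            echo "Missing environment variable: $var" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

It could be chained as `check_speech_env && dotnet run`, so the app only starts when both values are present.
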
## Using this app
- Fill the `Data` folder with some .wav files
- The app then lists those files so you can choose which one to transcribe
- The app then displays the transcription

### Experimentation results & thoughts
- I tried transcribing three files of varying lengths
- There seems to be a limit on how long the audio file can be
- It might be that the silence detection is too strict
- The documentation confirms as much:
  > This example uses the RecognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected.
- The results that I do get are very readable
- Transcribing entire files needs a different code solution than `RecognizeOnceAsync`

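To see in advance which files run past the 30-second limit quoted above, the duration can be estimated from the file size. This is a rough sketch, assuming the files were produced by the ffmpeg command in this README (16 kHz, mono, 16-bit PCM) and carry a plain 44-byte WAV header. The `wav_seconds` and `list_long_files` names are made up for this example. Shorter files can still be cut off early, since recognition also stops at the first detected silence.

```shell
#!/bin/sh
# Estimate the duration of a 16 kHz mono 16-bit PCM WAV file in whole seconds.
# 16000 samples/s * 2 bytes per sample * 1 channel = 32000 bytes of audio per second.
# Assumes a 44-byte header; extra metadata chunks make the estimate slightly high.
wav_seconds() {
    size=$(( $(wc -c < "$1") ))
    echo $(( (size - 44) / 32000 ))
}

# Print the .wav files in a folder (default: Data) that are longer than 30 seconds.
list_long_files() {
    for f in "${1:-Data}"/*.wav; do
        # Skip the unexpanded pattern when the folder has no .wav files
        [ -e "$f" ] || continue
        secs=$(wav_seconds "$f")
        if [ "$secs" -gt 30 ]; then
            echo "$f: about ${secs}s, longer than a single RecognizeOnceAsync call covers"
        fi
    done
}
```

Running `list_long_files Data` only flags candidates. It does not split or transcribe anything.
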