azure-ai/AzureAi.Transcriber/README.md

Azure AI Transcribing
=====================

This project demonstrates and documents my attempts to build a video transcription service
using Azure AI.

## Creating Azure AI Resource
- I created a plain *Azure AI Services* resource from the Azure Portal
- I chose Sweden Central as the location and Standard S0 as the pricing tier
- I am using an Azure account attached to a Visual Studio license, so I have roughly 150€ of free credits
- Here's how the portal looked after creation

![Azure AI resource](note-resources/azure-ai-resources-list-after-creation.png)

- All audio files need to be in a specific format (16 kHz, mono, 16-bit PCM WAV, matching the flags in the command below); I am using ffmpeg to convert them

```shell
ffmpeg -i [INPUT].mp3 -acodec pcm_s16le -ac 1 -ar 16000 [OUTPUT].wav
```
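
To convert a whole folder rather than one file at a time, the command above can be wrapped in a loop. This is only a minimal sketch: the function names `wav_name` and `convert_all`, and the use of `Data` as the default output folder, are assumptions made for illustration. The ffmpeg flags are the same ones used above.

```shell
# Hypothetical helper: map an input file to its .wav path in the output folder.
# Example: wav_name "recordings/talk.mp3" Data  prints  Data/talk.wav
wav_name() {
  base=$(basename "$1")
  printf '%s/%s.wav\n' "${2:-Data}" "${base%.*}"
}

# Hypothetical helper: convert every .mp3 in a source folder
# to 16 kHz, mono, 16-bit PCM WAV files.
convert_all() {
  src_dir=$1
  out_dir=${2:-Data}
  mkdir -p "$out_dir"
  for f in "$src_dir"/*.mp3; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    ffmpeg -i "$f" -acodec pcm_s16le -ac 1 -ar 16000 "$(wav_name "$f" "$out_dir")"
  done
}
```

Calling `convert_all recordings Data` would then write one `.wav` per `.mp3` found in `recordings`.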

## Running this app

- First, get your Key and Region from the Azure Portal
- Then set the environment variables and run the app

```shell
export SPEECH_KEY=your_key
export SPEECH_REGION=your_region

dotnet run
```
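
Since the app presumably reads its credentials from `SPEECH_KEY` and `SPEECH_REGION`, a small guard can catch a missing variable before the app starts. This is a sketch only; `check_speech_env` is a hypothetical helper name and not part of the project.

```shell
# Hypothetical helper: return non-zero if a required variable is unset or empty.
check_speech_env() {
  _missing=0
  for _name in SPEECH_KEY SPEECH_REGION; do
    # Read the variable whose name is stored in $_name (empty if unset).
    eval "_value=\${$_name:-}"
    if [ -z "$_value" ]; then
      echo "Missing environment variable: $_name" >&2
      _missing=1
    fi
  done
  return "$_missing"
}
```

It could be used as `check_speech_env && dotnet run`, so the app only starts when both values are present.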

## Using this app
- You need to fill the `Data` folder with some `.wav` files
- The app will then display those files and you can choose which to transcribe
- The app will then display the transcription
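
Before running the app, it may help to confirm that the files in `Data` match the format produced by the ffmpeg command earlier (16 kHz, mono, `pcm_s16le`). The sketch below assumes `ffprobe` (shipped with ffmpeg) is installed; the function names are hypothetical and chosen only for this example.

```shell
# Print codec, sample rate and channel count of the first audio stream.
probe_audio() {
  ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name,sample_rate,channels \
    -of default=noprint_wrappers=1 "$1"
}

# Succeed only if the probe output (passed as $1) describes
# 16 kHz, mono, 16-bit PCM audio.
matches_target_format() {
  for expected in codec_name=pcm_s16le sample_rate=16000 channels=1; do
    printf '%s\n' "$1" | grep -qx "$expected" || return 1
  done
}

# Report which .wav files in a folder (default: Data) still need conversion.
check_data_folder() {
  for f in "${1:-Data}"/*.wav; do
    [ -e "$f" ] || continue   # skip when the glob matched nothing
    if matches_target_format "$(probe_audio "$f")"; then
      echo "ok: $f"
    else
      echo "needs conversion: $f"
    fi
  done
}
```

Running `check_data_folder` from the project folder would list each file with its status.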

### Experimentation results & thoughts
- I tried transcribing three files of varying lengths
- It seems there is a limit on how long the audio file can be
- It might be that the silence detection is too strict
  - Yeah, the documentation says as much:
    > This example uses the RecognizeOnceAsync operation to transcribe utterances of up to 30 seconds, or until silence is detected.
- The results that I do get are very readable
- Transcribing entire files needed a different code solution