Based on this article https://www.unimelb.edu.au/accessibility/automatic-speech-recognition/getting-started-with-microsoft-azure-speech-to-text it seems that audio needs to be in a very specific format.
"The out of the box speech-to-text Service is available for quick real-time Speech-to-text service and transcription of WAV audio file(s) (16kHz or 8kHz, 16-bit, and mono PCM)."
By the way, the official documentation is remarkably mum about this requirement.
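
Since the requirement is easy to miss, it can help to check a file's header before sending it anywhere. The following is a minimal sketch using Python's standard `wave` module. The function name `wav_format_problems` and the constants are my own hypothetical names, and the allowed values are simply taken from the quote above rather than from official documentation.

```python
import wave

# Constraints copied from the quoted article (assumption: they are accurate
# for the out-of-the-box service): 16 kHz or 8 kHz, 16-bit, mono PCM.
ALLOWED_SAMPLE_RATES = (8000, 16000)
REQUIRED_SAMPLE_WIDTH_BYTES = 2  # 16-bit samples
REQUIRED_CHANNELS = 1            # mono


def wav_format_problems(path):
    """Return a list of mismatches between the WAV header and the quoted format.

    An empty list means the header matches. Hypothetical helper, not part of
    any Azure SDK. Note that wave.open() itself raises wave.Error for files
    it cannot parse as PCM WAV.
    """
    problems = []
    with wave.open(path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()
        sample_rate = wav.getframerate()

    if channels != REQUIRED_CHANNELS:
        problems.append(f"expected mono, got {channels} channels")
    if sample_width != REQUIRED_SAMPLE_WIDTH_BYTES:
        problems.append(f"expected 16-bit samples, got {sample_width * 8}-bit")
    if sample_rate not in ALLOWED_SAMPLE_RATES:
        problems.append(f"expected 8000 or 16000 Hz, got {sample_rate} Hz")
    return problems
```

Calling `wav_format_problems("clip.wav")` (with `clip.wav` standing in for your own export) returns an empty list when the header matches, and otherwise one message per mismatch, which tells you what to change in the export settings.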

## Language switching
Let's switch to Finnish and try this again.
As source data we use the video from this article: https://yle.fi/a/74-20080518. The audio
was recorded with Audacity and then exported as WAV.

Finnish is a notoriously difficult language to learn (or so I've heard), and my experiences with various translation solutions have left a lot to be desired. Here's the result for the short news clip.

I would say the accuracy of these results is amazing compared to what other solutions could do even five or so years ago. Granted, I haven't had the need to do anything like this, so maybe I am hyping over nothing, but still, pretty good.