Python Speech Recognition – Artificial Intelligence
What is Python Speech Recognition?
Speech recognition has come a long way: from systems that supported a single speaker and vocabularies of around a dozen words, to systems that recognize multiple speakers and possess huge vocabularies in many languages. The basic pipeline is this: a microphone converts speech from physical sound into an electrical signal, an analogue-to-digital converter turns that signal into digital data, and finally one or more models transcribe the audio to text. In a Hidden Markov Model (HMM), for example, the speech signal is divided into roughly 10-millisecond fragments.
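As a rough illustration of those 10-millisecond fragments (a sketch only; the 16 kHz sample rate and the random stand-in signal are assumptions, not part of any library shown later), splitting a signal into frames with NumPy looks like this:

import numpy as np

sample_rate = 16000                      # assumed: 16,000 samples per second
signal = np.random.randn(sample_rate)    # stand-in for one second of audio
frame_len = int(0.010 * sample_rate)     # 10 ms -> 160 samples per frame
usable = len(signal) // frame_len * frame_len
frames = signal[:usable].reshape(-1, frame_len)
print(frames.shape)                      # (100, 160): 100 frames of 10 ms each

Each of those frames is then scored against the model's acoustic units.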
a. Available APIs in Python Speech Recognition
With Python, we have several APIs available:
- apiai
- assemblyai
- google-cloud-speech
- pocketsphinx
- SpeechRecognition
- watson-developer-cloud
- wit
Some Python packages, like wit and apiai, offer more than just basic speech recognition. Here, though, we will demonstrate SpeechRecognition, which is easier to get started with: it ships with a hard-coded default API key for the Google Web Speech API.
b. Supported File Types in Python Speech Recognition
- WAV- PCM/LPCM format
- AIFF
- AIFF-C
- FLAC
c. Prerequisites for Python Speech Recognition
You can use pip to install this-
pip install SpeechRecognition
To test the installation, you can import the package in the interpreter and check its version-
>>> import speech_recognition as sr
>>> sr.__version__
'3.8.1'
We also need a sample audio file to work with; the examples below use one saved as demo.wav.
Reading an Audio File in Python
a. The Recognizer class
First, we make an instance of the Recognizer class.
>>> r = sr.Recognizer()
With Recognizer, we have a method for each API-
- recognize_bing()- Microsoft Bing Speech
- recognize_google()- Google Web Speech API
- recognize_google_cloud()- Google Cloud Speech
- recognize_houndify()- Houndify
- recognize_ibm()- IBM Speech to Text
- recognize_sphinx()- CMU Sphinx
- recognize_wit()- Wit.ai
Except for recognize_sphinx(), which works offline with CMU Sphinx, all of these methods require an Internet connection.
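Because of that, it is worth wrapping any of these calls in error handling. Here is a minimal sketch using the demo.wav file introduced above and the record() pattern shown in the next section: sr.UnknownValueError is raised when the audio cannot be understood, and sr.RequestError when the API is unreachable (for example, when there is no Internet connection).

import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile('demo.wav') as source:
    audio = r.record(source)

try:
    print(r.recognize_google(audio))
except sr.UnknownValueError:
    print('The API could not understand the audio')
except sr.RequestError as e:
    print('API request failed: {}'.format(e))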
b. Capturing data with record()
Here, the AudioFile instance works as a context manager that opens the file and reads its contents; record() then captures the data into an AudioData instance.
>>> demo = sr.AudioFile('demo.wav')
>>> with demo as source:
...     audio = r.record(source)
To confirm this, try:
>>> type(audio)
<class 'speech_recognition.AudioData'>
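Within a single with block, record() keeps track of its position in the file stream, so consecutive calls should capture consecutive segments. A small sketch (the four-second duration is an arbitrary choice):

>>> with demo as source:
...     first = r.record(source, duration=4)     # roughly the first four seconds
...     second = r.record(source, duration=4)    # roughly the next four seconds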
c. Recognizing Speech in the Audio
Finally, you can call recognize_google() to perform the transcription.
>>> r.recognize_google(audio)
"The Purge can use within The Smurfs the sheet without playback Mount delivery date habitat of a Vow these days it's okay microwave devices are installed in Windows to use of lemons next find the password on the site that the houses such hard core in a garbage for the study core exercises talking is hard disk"
You can also transcribe audio in a different language by using the language parameter-
>>> r.recognize_google(audio, language='ro-RO')    # for Romanian
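If you want more than the single best guess, recognize_google() also accepts a show_all argument; with show_all=True it should return the full recognition response, including alternative transcriptions, rather than a plain string. The exact structure of that response depends on the API:

>>> r.recognize_google(audio, show_all=True)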
Reading a Segment of Audio
When you only want to transcribe part of your audio file, you can use two arguments: offset, which tells record() where to begin (in seconds), and duration, which tells it how long to record (also in seconds).
>>> with demo as source:
...     audio = r.record(source, offset=4, duration=3)
>>> r.recognize_google(audio)
'clear the sheet without me back'
Note that slicing into the middle of a phrase causes issues at the extremes. The recognizer heard only 'murfs' (the tail end of 'Smurfs'), which it transcribed as 'clear', and it heard 'me back' instead of 'playback' because of the noise in the audio.
If we set the offset to 3.3,
>>> with demo as source:
...     audio = r.record(source, offset=3.3, duration=3)
>>> r.recognize_google(audio)
'clear the sheet with Ok'
But check what happens when we set the offset to 2.5-
>>> with demo as source:
...     audio = r.record(source, offset=2.5, duration=3)
>>> r.recognize_google(audio)
'National thanks'
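Building on offset and duration, here is a small hypothetical helper (the function name, the total_seconds parameter, and the ten-second chunk size are all illustrative, not part of the SpeechRecognition library) that transcribes a longer file in fixed-length chunks:

import speech_recognition as sr

def transcribe_in_chunks(path, total_seconds, chunk_seconds=10):
    r = sr.Recognizer()
    pieces = []
    offset = 0
    while offset < total_seconds:
        # reopen the file each time so offset stays relative to the start
        with sr.AudioFile(path) as source:
            audio = r.record(source, offset=offset, duration=chunk_seconds)
        try:
            pieces.append(r.recognize_google(audio))
        except sr.UnknownValueError:
            pieces.append('[unintelligible]')
        offset += chunk_seconds
    return ' '.join(pieces)

print(transcribe_in_chunks('demo.wav', total_seconds=30))    # 30 is an assumed length, not the real one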
Python Speech Recognition – Dealing with Noise
Okay, let's face it: there will always be some noise, no matter how professional the equipment you use to record your audio. So let's learn to deal with it. The method adjust_for_ambient_noise() reads the first second of a file stream and calibrates the recognizer to the audio's noise level. The calibration consumes that part of the audio, so it doesn't make it into the transcription.
>>> with demo as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.record(source, offset=2.5, duration=3)
>>> r.recognize_google(audio)
'clear the sheet'
We can pass this method a duration argument telling it how long (in seconds) it should listen for noise when calibrating the recognizer. Let's see how it produces two entirely different outputs for a difference as small as 0.005 seconds-
>>> with demo as source:
...     r.adjust_for_ambient_noise(source, duration=0.51)
...     audio = r.record(source, offset=2.5, duration=3)
>>> r.recognize_google(audio)
'National thanks'

>>> with demo as source:
...     r.adjust_for_ambient_noise(source, duration=0.515)
...     audio = r.record(source, offset=2.5, duration=3)
>>> r.recognize_google(audio)
'clear the sheet'
As you can see, adjust_for_ambient_noise() is definitely not a miracle worker. To get around this, you can preprocess the audio with audio-editing software such as Audacity.
Working With Microphones
To work with your own voice in speech recognition, you need the PyAudio package. You can install it with pip-
pip install PyAudio
Alternatively, you can download a prebuilt binary wheel and install it with pip. Download link-
https://www.lfd.uci.edu/~gohlke/pythonlibs/#pyaudio
Then:
pip install [file_name_for_binary]
For example:
pip install PyAudio-0.2.11-cp37-cp37m-win32.whl
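To check that PyAudio installed correctly, importing it in the interpreter should work without errors; recent versions also expose the bundled PortAudio version (a quick sketch):

>>> import pyaudio
>>> pyaudio.get_portaudio_version_text()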
a. The Microphone class
Just as we used Recognizer for audio files, we need the Microphone class for real-time speech data. Since we installed new packages, let's exit the interpreter and open a new session.
>>> import speech_recognition as sr
>>> r = sr.Recognizer()
Now, let’s create an instance of Microphone.
>>> mic = sr.Microphone()
Microphone has a static method to list out all microphones available-
>>> sr.Microphone.list_microphone_names()
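To actually capture speech, the usual pattern is to calibrate for ambient noise and then call listen(), which records until it detects a pause; the resulting AudioData can be passed to any recognize_*() method just as before. A minimal sketch (speak into the default microphone while it runs):

>>> with mic as source:
...     r.adjust_for_ambient_noise(source)
...     audio = r.listen(source)
>>> r.recognize_google(audio)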