How to Convert Speech to Text in Python

Speech recognition is the ability of computer software to identify words and phrases in spoken language and convert them to human-readable text. In this tutorial, we will use the SpeechRecognition Python library to do that.

First, let’s install the library:

pip3 install SpeechRecognition
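The package is published on PyPI as SpeechRecognition and imported as speech_recognition. Microphone input additionally relies on PyAudio (pip3 install pyaudio). Below is a small standard-library sketch that checks whether both are importable. The check_modules() helper is hypothetical and is not part of either library.

```python
import importlib.util


def check_modules(names):
    """Map each top-level module name to whether it can be found in this environment."""
    # find_spec() locates a module without importing it; it returns None if absent
    return {name: importlib.util.find_spec(name) is not None for name in names}


# "speech_recognition" is the import name of the SpeechRecognition package;
# "pyaudio" is only needed for reading from a microphone.
print(check_modules(["speech_recognition", "pyaudio"]))
```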

Okay, open up a new Python file and follow along:

import speech_recognition as sr

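The nice thing about this library is that it supports several recognition engines, including CMU Sphinx (which works offline), Google Speech Recognition, Google Cloud Speech API, Wit.ai and IBM Speech to Text.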

We're going to use Google Speech Recognition here, as it doesn't require you to supply an API key.

Reading from a File

Make sure you have an audio file in the current directory that contains English speech:

filename = "16-122828-0002.wav"

This file was taken from the LibriSpeech dataset, but you can use any audio file you want; just change the filename. Now, let's initialize our speech recognizer:

# initialize the recognizer
r = sr.Recognizer()
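Before loading your own recording, note that sr.AudioFile accepts WAV (PCM), AIFF/AIFF-C and FLAC files. For a WAV file, Python's standard wave module is a quick way to check what you are feeding the recognizer. The describe_wav() helper below is a hypothetical convenience and is not part of SpeechRecognition.

```python
import wave


def describe_wav(path):
    """Return basic properties of a PCM WAV file using only the standard library."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }


# Usage:
# print(describe_wav("16-122828-0002.wav"))
```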

The code below loads the audio file and converts the speech into text using Google Speech Recognition:

# open the file
with sr.AudioFile(filename) as source:
    # listen for the data (load audio to memory)
    audio_data = r.record(source)
    # recognize (convert from speech to text)
    text = r.recognize_google(audio_data)
    # print the transcription
    print(text)

Reading from the Microphone

You can also have the recognizer listen to your microphone for 5 seconds and then try to convert that speech into text.

This works much like the file-based code, but it uses a Microphone() object to read audio from the default microphone, passes the duration parameter to record() so that reading stops after 5 seconds, and then uploads the audio data to Google to get the output text.

You can also use the offset parameter of record() to skip the first offset seconds of audio before recording starts.

You can also recognize other languages by passing the language parameter to recognize_google(). For instance, to recognize Spanish speech, you would use:

text = r.recognize_google(audio_data, language="es-ES")

Check out the supported languages in this Stack Overflow answer.


As you can see, converting speech to text with this library is quick and simple. The library is widely used in practice; to master it, check out its official documentation.

