Pocketsphinx Speech to Text Tutorial in Python
This tutorial shows how to use PocketSphinx for speech to text in Python. Using CMU Sphinx with Python is straightforward once all the required packages are installed.
What is CMU Sphinx and Pocketsphinx?
CMU Sphinx, or Sphinx for short, is a group of speech recognition systems developed at Carnegie Mellon University [Wikipedia].
PocketSphinx is a version of Sphinx specialized for embedded systems. It is a popular version of Sphinx for mobile phone development.
How to Install PocketSphinx?
To install PocketSphinx, you first need to install the Sphinx base package on your machine. Go to the Sphinx website and download the package for your operating system.
- Install Sphinx base from https://github.com/cmusphinx/sphinxbase
- Then install PocketSphinx from https://github.com/cmusphinx/pocketsphinx
- Finally, use pip to install the PocketSphinx library for Python:
pip install pocketsphinx
If the command above doesn’t work, you may need to upgrade pip first. In that case, use the following commands:
python -m pip install --upgrade pip setuptools wheel
pip install --upgrade pocketsphinx
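Before moving on, it can help to confirm that the package is visible to the Python interpreter you plan to use, since pip sometimes installs into a different environment. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this illustration and is not part of PocketSphinx.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the current interpreter can locate a top-level module.

    Hypothetical helper for this tutorial; it only checks that the module
    can be found, not that it imports or works correctly.
    """
    return importlib.util.find_spec(module_name) is not None


if is_installed("pocketsphinx"):
    print("pocketsphinx is available to this interpreter")
else:
    print("pocketsphinx was not found; check which Python and pip you are using")
```

If the second message appears even though pip reported success, running pip as python -m pip install pocketsphinx with the same interpreter is usually worth trying.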
Now you can write some code to test your setup. The following is a first test program you can use in Python.
# Code retested by KhalsaLabs
# You can use your own audio file in code
# Raw or wav files would work perfectly
# For mp3 files, you need to modify code (add codec)

from __future__ import print_function
import os
from pocketsphinx import Pocketsphinx, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

config = {
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

ps = Pocketsphinx(**config)
ps.decode(
    audio_file=os.path.join(data_path, 'goforward.raw'),  # add your audio file here
    buffer_size=2048,
    no_search=False,
    full_utt=False
)

print(ps.hypothesis())
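If you swap in your own WAV file and get poor or empty results, the audio format is a common culprit. The sketch below uses the standard wave module to inspect a file before decoding. It assumes the default en-us model expects mono, 16-bit, 16 kHz audio; that is an assumption of this example, so check the documentation of the model you actually use. The function names describe_wav and matches_expected_format are made up for this illustration.

```python
import wave


def describe_wav(path):
    """Return (channels, sample_width_in_bytes, sample_rate_hz) for a WAV file."""
    with wave.open(path, "rb") as wav:
        return wav.getnchannels(), wav.getsampwidth(), wav.getframerate()


def matches_expected_format(path, channels=1, sample_width=2, sample_rate=16000):
    """Check a WAV file against an assumed target format.

    The defaults (mono, 2 bytes per sample, 16000 Hz) are an assumption
    about the default en-us model, not values read from PocketSphinx.
    """
    return describe_wav(path) == (channels, sample_width, sample_rate)
```

For example, calling matches_expected_format on your recording (the file name is up to you) returns False for a typical 44.1 kHz stereo file, which would then need to be resampled and mixed down before decoding.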
I will explain how the test program works, step by step, in another post.