Pocket Sphinx Python - KhalsaLabs

Pocketsphinx Speech to Text Tutorial in Python

This tutorial shows how to use PocketSphinx for speech to text in Python. Using CMU Sphinx with Python is straightforward once all the relevant packages are installed.


What is CMU Sphinx and Pocketsphinx?
CMU Sphinx, or Sphinx for short, is a group of speech recognition systems developed at Carnegie Mellon University [Wikipedia].
PocketSphinx: A version of Sphinx specialized for embedded systems. It is a popular choice for mobile phone development.

How to Install PocketSphinx?
To install PocketSphinx, you first need to install the Sphinx base package (SphinxBase) on your machine. Go to the Sphinx website and download the package for your operating system.

  • Install SphinxBase from https://github.com/cmusphinx/sphinxbase
  • Then install PocketSphinx from https://github.com/cmusphinx/pocketsphinx
  • Use pip to install the PocketSphinx library for Python:
    pip install pocketsphinx

    If that doesn’t work, you may need to upgrade pip first. Use the following commands:

    python -m pip install --upgrade pip setuptools wheel
    pip install --upgrade pocketsphinx
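
Before writing any recognition code, it can help to confirm that Python can actually find the package installed by the pip commands above. The following is a minimal sketch using only the standard library; the helper name is_installed is made up for this illustration and is not part of PocketSphinx.

```python
import importlib.util


def is_installed(module_name):
    """Return True if the import system can locate the top-level module.

    Hypothetical helper for this tutorial; it only looks the module up
    and does not import it.
    """
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("pocketsphinx"):
        print("pocketsphinx is importable")
    else:
        print("pocketsphinx was not found; try the pip commands again")
```

If the check reports that the package was not found, the most likely cause is that pip installed it for a different Python interpreter than the one running the script.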

Now you can write your first test program. The following Python program is a good starting point.

# Code retested by KhalsaLabs
# You can use your own audio file in this code
# Raw or WAV files should work as they are
# For MP3 files, you need to modify the code (add a codec)

from __future__ import print_function
import os
from pocketsphinx import Pocketsphinx, get_model_path, get_data_path

model_path = get_model_path()
data_path = get_data_path()

# Paths to the acoustic model, language model and pronunciation dictionary
config = {
    'hmm': os.path.join(model_path, 'en-us'),
    'lm': os.path.join(model_path, 'en-us.lm.bin'),
    'dict': os.path.join(model_path, 'cmudict-en-us.dict')
}

ps = Pocketsphinx(**config)

# Add your audio file here
audio_file = os.path.join(data_path, 'goforward.raw')

# Decode the audio file and print the recognized text
ps.decode(audio_file=audio_file)
print(ps.hypothesis())
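
If you want to try your own WAV recording, it is worth checking its format first. The sketch below assumes that the default en-us model expects 16 kHz, 16-bit, mono PCM audio (the format of the bundled goforward.raw sample); audio in another format tends to produce poor or empty results. The helper name wav_to_raw and its parameters are invented for this illustration and are not part of PocketSphinx. It uses only the standard library wave module.

```python
import wave


def wav_to_raw(wav_path, raw_path, expected_rate=16000):
    """Copy the PCM samples of a WAV file into a headerless .raw file.

    Hypothetical helper: raises ValueError unless the file is mono,
    16-bit, and sampled at expected_rate (assumed to be 16000 Hz here).
    """
    with wave.open(wav_path, "rb") as wav:
        channels = wav.getnchannels()
        sample_width = wav.getsampwidth()  # bytes per sample
        rate = wav.getframerate()
        if (channels, sample_width, rate) != (1, 2, expected_rate):
            raise ValueError(
                "expected mono 16-bit %d Hz audio, got %d channel(s), "
                "%d-bit, %d Hz"
                % (expected_rate, channels, sample_width * 8, rate)
            )
        # Read every frame; this drops the WAV header and keeps only samples
        frames = wav.readframes(wav.getnframes())

    with open(raw_path, "wb") as raw:
        raw.write(frames)
    return raw_path
```

The returned path can then be used in place of goforward.raw as the audio_file in the program above. Failing loudly on a format mismatch is a deliberate choice here: resampling is left to a dedicated audio tool rather than guessed at in a few lines of Python.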


I will explain how the code works, step by step, in another post.

04 comments on “Pocketsphinx Speech to Text Tutorial in Python”

  • John Barney

    I am writing an application for my wife, who is blind. I am using Python 3.6 with the free ‘Google Speech Recognition API’. It does everything I want, but with the web being so SLOW it takes minutes to respond.
    Can you help? I think I need an offline product to help with the speed issue.
    Note: this is an unpaid gig, so I am looking for a free license.
    Thank you in advance!
    Cheers, John

    • Harman Singh

      Hi John, I saw your comment a long time ago, but the website was being bombarded with spam comments and I was unable to reply to you. Were you able to build something, or to make your program work faster?
