May 2025

Installing Whisper: A Step-by-Step Setup Guide

OpenAI Whisper is a powerful transcription tool, and this guide walks you through installing and setting it up step by step.


Whisper is one of the most powerful tools for automatic speech recognition. Developed by OpenAI, it’s free to use, supports multiple languages, and works impressively well even with noisy recordings. But how do you actually install it?

In this guide, you’ll get clear, easy-to-follow instructions on how to install Whisper on Windows, macOS, or Linux — no programming experience required.

Preparation: What You Need Before Installing Whisper

Before installing Whisper, you need three things in place: Python, a virtual environment (optional but recommended), and FFmpeg. No coding skills are required, just a bit of patience.

Install Python

Whisper runs on Python. Recommended versions: 3.8 to 3.11.

Download it from python.org
On Windows, make sure to check “Add Python to PATH” during installation
Verify the installation (on macOS and Linux, use python3 if python is not found):

```bash
python --version
```
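If you prefer to check from inside Python (for example, in the interpreter you will later use for Whisper), a minimal sketch like the following compares the running version with the 3.8 to 3.11 range recommended above. The function name is our own, and the range only mirrors this guide's recommendation; newer Whisper releases may accept newer Python versions.

```python
import sys

# Recommended range from this guide (inclusive); adjust if your Whisper release supports more.
MIN_VERSION = (3, 8)
MAX_VERSION = (3, 11)


def python_version_supported(version=None):
    """Return True if the (major, minor) version lies in the recommended range."""
    if version is None:
        version = sys.version_info[:2]
    major_minor = tuple(version[:2])
    return MIN_VERSION <= major_minor <= MAX_VERSION


current = ".".join(str(part) for part in sys.version_info[:3])
print(f"Python {current}: within recommended range = {python_version_supported()}")
```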

Set Up a Virtual Environment (Recommended)

A virtual environment keeps the Whisper setup isolated and clean.

```bash
# Create the environment (Windows, macOS, or Linux)
python -m venv whisper-env

# Activate it on macOS/Linux
source whisper-env/bin/activate

# Activate it on Windows (Command Prompt; in PowerShell run whisper-env\Scripts\Activate.ps1)
whisper-env\Scripts\activate.bat
```
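To confirm that activation worked, a short Python check can compare sys.prefix with sys.base_prefix, which differ inside an environment created by the standard venv module. This is a sketch for venv only (tools such as conda manage environments differently), and the function name is illustrative.

```python
import sys


def in_virtualenv():
    """Return True when this interpreter runs inside a venv-created environment."""
    # venv points sys.prefix at the environment folder, while sys.base_prefix
    # keeps pointing at the Python installation the environment was created from.
    return sys.prefix != sys.base_prefix


print("Interpreter:", sys.executable)
print("Inside a virtual environment:", in_virtualenv())
```

If the second line prints False after you activated the environment, the script is being run by a different Python than the one in whisper-env.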

Install FFmpeg

Whisper relies on FFmpeg to handle audio formats.

Windows (using Chocolatey):

```bat
choco install ffmpeg
```

macOS (using Homebrew):

```bash
brew install ffmpeg
```

Linux (Debian/Ubuntu):

```bash
sudo apt update && sudo apt install ffmpeg
```

Verify installation:

```bash
ffmpeg -version
```
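Whisper runs the ffmpeg executable it finds on PATH to decode audio, so it is worth checking that Python sees the same FFmpeg your terminal does. A minimal sketch using only the standard library (the helper name is our own):

```python
import shutil
import subprocess


def ffmpeg_version_line():
    """Return the first line of `ffmpeg -version`, or None if FFmpeg is not on PATH."""
    path = shutil.which("ffmpeg")  # same PATH lookup a shell performs
    if path is None:
        return None
    completed = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    lines = completed.stdout.splitlines()
    return lines[0] if lines else None


line = ffmpeg_version_line()
print(line if line else "FFmpeg not found on PATH")
```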

Installing Whisper: Step by Step

Step 1: Install Whisper via pip

With Python and FFmpeg set up (and your virtual environment activated), install Whisper:

```bash
pip install -U openai-whisper
```

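If the installation fails (for example, with errors mentioning tiktoken or Rust), upgrade pip and try again:

```bash
pip install --upgrade pip
```

If dependencies still fail to compile, install the Rust toolchain from rustup.rs. The tiktoken dependency needs it when no prebuilt wheel is available for your platform.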

Step 2: Run a Test Transcription

Place an audio file (e.g., example.mp3) in your working directory. Then create a Python script named transcribe.py:

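```python
import whisper

# Load the "small" model; it is downloaded automatically on first use
model = whisper.load_model("small")

# Transcribe the audio file in the current working directory
result = model.transcribe("example.mp3")

# Print the full transcript as a single string
print(result["text"])
```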

Run it:

```bash
python transcribe.py
```

The model will download automatically on first use.
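Besides result["text"], the dictionary returned by transcribe() contains a "segments" list whose entries carry "start" and "end" times in seconds plus the segment "text". The sketch below prints those as timestamped lines; the helper names and the MM:SS.ss format are our own choices, and only transcribe_file needs Whisper installed.

```python
def format_timestamp(seconds):
    """Format seconds as MM:SS.ss (hours are folded into minutes)."""
    minutes, secs = divmod(seconds, 60)
    return f"{int(minutes):02d}:{secs:05.2f}"


def format_segments(result):
    """Turn the dict returned by model.transcribe() into timestamped lines."""
    lines = []
    for segment in result.get("segments", []):
        start = format_timestamp(segment["start"])
        end = format_timestamp(segment["end"])
        lines.append(f"[{start} --> {end}] {segment['text'].strip()}")
    return "\n".join(lines)


def transcribe_file(path, model_name="small"):
    """Load a Whisper model and return the timestamped transcript of `path`."""
    import whisper  # imported here so the formatting helpers work without Whisper

    model = whisper.load_model(model_name)
    result = model.transcribe(path)
    return format_segments(result)
```

With Whisper installed, print(transcribe_file("example.mp3")) prints one line per segment. Whisper also ships a command-line tool (for example, whisper example.mp3 --model small) if you would rather not write a script at all.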

Installing Whisper by Operating System

Windows

1. Install Python and FFmpeg

Make sure both are on your PATH so that the python and ffmpeg commands work from any terminal.

2. Create and Activate a Virtual Environment

```bat
python -m venv whisper-env
whisper-env\Scripts\activate.bat
```

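3. Install PyTorch with CUDA (Optional) and Whisper

For GPU acceleration on an NVIDIA card, install the CUDA build of PyTorch first, then Whisper. Without an NVIDIA GPU, run only the second command: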

```bat
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
pip install -U openai-whisper
```

4. Test Transcription

Run the transcribe.py script from Step 2. If errors occur, check the audio format and file integrity, or try a simpler file.

macOS

1. Install Homebrew, Python, and FFmpeg

If Homebrew isn't installed yet, get it from brew.sh first, then run:

```bash
brew install python@3.11
brew install ffmpeg
```

2. Create Virtual Environment

```bash
# Use the Homebrew Python 3.11 from step 1 (plain python3 may point to another version)
python3.11 -m venv whisper-env
source whisper-env/bin/activate
```

3. Install Whisper

```bash
pip install -U openai-whisper
```

4. Test Transcription

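Same process as Windows: run transcribe.py. PyTorch supports Metal acceleration (MPS) on Apple Silicon Macs (M1, M2, and later), but openai-whisper's support for the MPS device has been inconsistent across versions, and by default Whisper runs on the CPU there. Test MPS on your setup before relying on it for speed with larger models.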
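To see which backend PyTorch reports on your machine, a sketch like the one below chooses a device string that can be passed to whisper.load_model(..., device=...). MPS is opt-in here as a precaution, because openai-whisper's support for the MPS device has varied between versions; the function and its allow_mps flag are our own, not part of Whisper.

```python
def pick_device(allow_mps=False):
    """Return "cuda", "mps", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # no PyTorch, so no GPU backend to report
    if torch.cuda.is_available():
        return "cuda"
    # Apple Silicon (MPS) is only chosen when explicitly allowed.
    mps = getattr(torch.backends, "mps", None)
    if allow_mps and mps is not None and mps.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print("Selected device:", device)
# With Whisper installed: model = whisper.load_model("small", device=device)
```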

Linux (Debian/Ubuntu)

1. Install FFmpeg and Python

```bash
sudo apt update
sudo apt install ffmpeg python3 python3-pip python3-venv
```

2. Create and Activate Virtual Environment

```bash
python3 -m venv whisper-env
source whisper-env/bin/activate
```

3. Install Whisper

```bash
pip install -U openai-whisper
```

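4. Optional: PyTorch with CUDA for NVIDIA GPUs

On x86_64 Linux, the default PyTorch wheels from PyPI already include CUDA support. To use a specific CUDA build instead, install it before step 3 so Whisper reuses it; if Whisper is already installed, run pip uninstall torch first.

```bash
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118
```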

5. Run Test Script

Run transcribe.py as described in Step 2.

Whisper Model Sizes and Memory Requirements

Whisper downloads the model when first used. Choose based on your needs:

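tiny: Fast, lowest accuracy (39M parameters, ~1 GB VRAM)
base: Basic balance (74M parameters, ~1 GB VRAM)
small: Solid for general use (244M parameters, ~2 GB VRAM)
medium: High quality (769M parameters, ~5 GB VRAM)
large: Maximum accuracy (1550M parameters, ~10 GB VRAM)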
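Whisper's README gives rough GPU memory (VRAM) needs of about 1 GB for tiny and base, 2 GB for small, 5 GB for medium, and 10 GB for large. A small lookup built on those figures can suggest a starting model; the table and helper are our own illustration, and the numbers are approximate.

```python
# Approximate VRAM needs in GB, as listed in the openai/whisper README,
# ordered from smallest to largest model.
APPROX_VRAM_GB = [
    ("tiny", 1),
    ("base", 1),
    ("small", 2),
    ("medium", 5),
    ("large", 10),
]


def largest_model_for(vram_gb):
    """Return the largest model whose approximate VRAM need fits, or None."""
    best = None
    for name, needed in APPROX_VRAM_GB:
        if needed <= vram_gb:
            best = name
    return best


print(largest_model_for(4))  # a 4 GB GPU -> small
```

whisper.available_models() returns the model names your installed version knows about, which, depending on the version, may include variants such as large-v3.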

Downloaded models are stored in ~/.cache/whisper (on Windows, %USERPROFILE%\.cache\whisper).
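To check which checkpoints are already downloaded, the sketch below lists the .pt files in the cache folder. It assumes Whisper's default location logic, which in the versions we know of honors XDG_CACHE_HOME and otherwise uses ~/.cache; if you pass download_root to whisper.load_model, point cached_models() at that folder instead. Both helper names are hypothetical.

```python
import os
from pathlib import Path


def whisper_cache_dir():
    """Default folder Whisper downloads models to (assumes no download_root override)."""
    base = os.getenv("XDG_CACHE_HOME", os.path.join(os.path.expanduser("~"), ".cache"))
    return Path(base) / "whisper"


def cached_models(cache_dir=None):
    """Return {file name: size in MB} for the .pt checkpoints in the cache folder."""
    folder = Path(cache_dir) if cache_dir is not None else whisper_cache_dir()
    if not folder.is_dir():
        return {}
    return {
        item.name: round(item.stat().st_size / 1_000_000, 1)
        for item in sorted(folder.glob("*.pt"))
    }


for name, size_mb in cached_models().items():
    print(f"{name}: {size_mb} MB")
```

Deleting a .pt file from this folder is a simple way to free disk space; Whisper downloads it again the next time that model is loaded.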

Common Errors and How to Fix Them

FFmpeg Not Found

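Check that FFmpeg is on your PATH by running this in the same terminal you use for Whisper. If the command is not found, add FFmpeg's bin folder to PATH (or reinstall it) and open a new terminal.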

```bash
ffmpeg -version
```

“No module named 'whisper'”

Make sure your virtual environment is active before running scripts, and that Whisper was installed inside it with pip install -U openai-whisper.
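This error usually means the script runs under a different interpreter than the one Whisper was installed into. A minimal diagnostic (our own helper, standard library only) shows the interpreter path, whether a venv is active, and whether the openai-whisper package is visible:

```python
import importlib.metadata
import importlib.util
import sys


def whisper_install_report():
    """Summarize which interpreter is running and whether it can see Whisper."""
    report = {
        "python": sys.executable,
        "in_venv": sys.prefix != sys.base_prefix,
        "whisper_importable": importlib.util.find_spec("whisper") is not None,
        "openai_whisper_version": None,
    }
    try:
        # The pip package is named "openai-whisper" even though you import "whisper".
        report["openai_whisper_version"] = importlib.metadata.version("openai-whisper")
    except importlib.metadata.PackageNotFoundError:
        pass
    return report


for key, value in whisper_install_report().items():
    print(f"{key}: {value}")
```

If whisper_importable is True but openai_whisper_version is None, an unrelated package that is also named whisper may be installed and shadowing OpenAI's.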

CUDA Not Recognized

Install a PyTorch build whose CUDA version your NVIDIA driver supports (for example cu118 or cu121). PyTorch's Get Started page generates the correct install command for your system.
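A quick way to tell a CPU-only PyTorch build from a CUDA build is to ask PyTorch directly. The sketch below (helper name ours) reads torch.version.cuda, which is None on CPU-only builds, and torch.cuda.is_available(), which additionally needs a working NVIDIA driver:

```python
def cuda_report():
    """Report what the installed PyTorch build says about CUDA."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "built_with_cuda": None, "cuda_available": False, "gpu": None}
    available = torch.cuda.is_available()
    return {
        # Wheels from download.pytorch.org usually carry a suffix such as "+cu118".
        "torch": torch.__version__,
        "built_with_cuda": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": available,
        "gpu": torch.cuda.get_device_name(0) if available else None,
    }


for key, value in cuda_report().items():
    print(f"{key}: {value}")
```

If built_with_cuda is set but cuda_available is False, the driver is usually the problem rather than PyTorch itself.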

Transcription Fails

Verify that the audio file is supported and not DRM-protected. Formats like MP3, WAV, FLAC, M4A, OGG, and AAC usually work. If a file still fails, re-encode it with FFmpeg (for example, ffmpeg -i input.m4a output.wav) and try again.
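Before digging deeper, it helps to rule out simple causes. This sketch (our own helper; the extension list mirrors the formats named above) flags missing, empty, or unusually named files. It cannot detect DRM or corrupt data; for that, run ffprobe on the file (it ships with most FFmpeg packages) and read its output.

```python
from pathlib import Path

# Formats this guide reports as usually working; FFmpeg can decode many more.
COMMON_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a", ".ogg", ".aac"}


def audio_precheck(path):
    """Return a list of likely problems with `path` (empty list means none found)."""
    file = Path(path)
    if not file.is_file():
        return [f"{file} does not exist or is not a regular file"]
    problems = []
    if file.stat().st_size == 0:
        problems.append(f"{file} is empty")
    if file.suffix.lower() not in COMMON_EXTENSIONS:
        problems.append(f"{file.suffix or 'no extension'} is not in the common formats list")
    return problems
```

An empty list only means these basic checks passed; decoding can still fail on damaged files.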

Conclusion: Install Whisper or Choose an Alternative?

With just a few setup steps, Whisper runs on most modern computers. It's a robust tool for transcription in podcasting, research, or content creation, and it's free and works fully offline once a model has been downloaded.

If you'd prefer a no-setup experience, tools like Sally offer Whisper's capabilities with added AI summaries, CRM integration, and a plug-and-play UI.

Voice recognition has never been this accessible. Want to save time? Try Sally for free today.

Jan Bettinger

COO & CPO

“In a fast-paced business world, it is essential to document information accurately and quickly. AI transcription provides a reliable method of tracking meetings and discussions at all times.”
