Get More Out of Your Video Calls: Convert Recordings to Transcripts with FFMPEG and OpenAI Whisper

Last updated on Mar 15, 2023

Are you tired of sifting through hours of video call recordings to find that one crucial detail? Do you need help taking accurate notes during a fast-paced virtual meeting? If so, converting your video call recordings to transcripts can save you time and improve your productivity.

This blog post will show you how to use FFMPEG and OpenAI Whisper to turn your Zoom, Google Meet, Microsoft Teams, or other video call recordings into easily searchable and accessible transcripts.

What are FFMPEG and OpenAI Whisper?

FFMPEG is a powerful open-source tool that allows you to manipulate and convert video and audio files in various formats. OpenAI Whisper is an advanced transcription tool that uses AI to transcribe audio and video files with remarkable accuracy.

How to Convert Video Call Recordings to Transcripts Using FFMPEG and OpenAI Whisper

1) Export your video call recording as an MP4 file

Before converting your video call recording to a transcript, you must export it as an MP4 file. Zoom, Google Meet, and Microsoft Teams each describe how to download or export a recording in their help documentation.

2) Install FFMPEG

FFMPEG is a command-line tool; you’ll need to download and install it on your computer before using it. You can download FFMPEG for free from the official website or install it using a package manager like apt on Ubuntu and Windows WSL. For Ubuntu and Windows WSL users, it’s as easy as:

sudo apt update
sudo apt install -y ffmpeg

3) Convert the MP4 file to an audio-only file using FFMPEG

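Once you have installed FFMPEG, open your terminal and navigate to the folder where your MP4 file is saved. Then, enter the following command to extract the audio track into an audio-only file. The -vn option drops the video, and -acodec copy copies the audio stream without re-encoding it, which assumes the recording's audio is already AAC (the usual case for MP4 files):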

ffmpeg -i input.mp4 -vn -acodec copy output.aac
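
The copy command above only works when the audio track is already AAC. The following is a minimal sketch, not a definitive recipe, of a helper that checks the codec with ffprobe (installed alongside ffmpeg) and falls back to re-encoding when the track is something else. The function name extract_audio and the choice of 16 kHz mono WAV for the fallback are assumptions made for illustration; Whisper resamples its input to 16 kHz mono internally, so that format loses nothing Whisper would use.

```shell
# Sketch: extract the audio track, copying it only when it is already AAC.
# extract_audio is a hypothetical helper name, not part of FFmpeg.
extract_audio() {
    input="$1"
    # Ask ffprobe for the codec name of the first audio stream (e.g. "aac" or "opus").
    codec=$(ffprobe -v error -select_streams a:0 \
        -show_entries stream=codec_name \
        -of default=noprint_wrappers=1:nokey=1 "$input")
    if [ "$codec" = "aac" ]; then
        # AAC audio can be copied unchanged into an .aac file.
        ffmpeg -i "$input" -vn -acodec copy "${input%.*}.aac"
    else
        # Any other codec is re-encoded to 16 kHz mono 16-bit PCM WAV.
        ffmpeg -i "$input" -vn -ac 1 -ar 16000 -c:a pcm_s16le "${input%.*}.wav"
    fi
}

# Usage:
#   extract_audio input.mp4
```

If the helper produces a .wav file instead of an .aac file, pass that file to Whisper in step 5.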

4) Install OpenAI Whisper

Whisper is a general-purpose speech recognition model trained on a large dataset of diverse audio. It is also a multi-task model that can perform multilingual speech recognition, speech translation, and language identification. The model is open source; you can download it from its GitHub repository or install it as a command line tool through Python.

To install the model as a command line tool, install it with pipx. This tool, maintained by the Python Packaging Authority, installs and runs Python applications in isolated environments to avoid dependency conflicts:

# install pipx
python3 -m pip install --user pipx
python3 -m pipx ensurepath

# install openai whisper
pipx install openai-whisper

If you plan to run Whisper on an NVIDIA GPU, you will also need nvidia-cudnn, which can be installed on Ubuntu and Windows WSL using:

sudo apt update
sudo apt install -y nvidia-cudnn
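
Before moving on, it may be worth confirming that both command-line tools are reachable from your terminal. The sketch below uses a small helper, require_tool, which is a hypothetical name introduced here for illustration. Note that the PATH change made by pipx ensurepath only takes effect in a new terminal session, so whisper may appear missing until you reopen the terminal.

```shell
# require_tool is a hypothetical helper name used only for this sketch.
require_tool() {
    # command -v succeeds only when the name resolves to something runnable.
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1"
    else
        echo "missing: $1" >&2
        return 1
    fi
}

# Usage:
#   require_tool ffmpeg && require_tool whisper
```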

5) Process the Audio File with OpenAI Whisper

Now that you have an audio file of your video call recording, you can process it with OpenAI Whisper to generate a transcript. A computer with a modern NVIDIA GPU (e.g., a gaming PC) is strongly recommended; Whisper also runs on a CPU, but much more slowly. Run the following:

whisper output.aac

This will automatically download the small model (the default at the time of writing) and export the transcript into several formats: TXT, JSON, SRT, TSV, and VTT:

.
├── output.aac.json
├── output.aac.srt
├── output.aac.tsv
├── output.aac.txt
└── output.aac.vtt
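
The whisper command in this step relies on defaults. As a hedged sketch, the wrapper below spells the main options out explicitly: the model, the spoken language, a single output format, an output folder, and the device. The function name transcribe_call, the transcripts folder, and the assumption that the call was held in English are all illustrative choices; the flags themselves are options of the whisper command line tool.

```shell
# transcribe_call is a hypothetical wrapper name; adjust the options to taste.
transcribe_call() {
    audio="$1"
    device="${2:-cuda}"   # pass "cpu" when no NVIDIA GPU is available
    if [ "$device" = "cpu" ]; then
        # Half precision is disabled on CPU, where Whisper falls back to FP32 anyway.
        whisper "$audio" --model small --language en \
            --output_format srt --output_dir transcripts \
            --device cpu --fp16 False
    else
        whisper "$audio" --model small --language en \
            --output_format srt --output_dir transcripts \
            --device "$device"
    fi
}

# Usage:
#   transcribe_call output.aac        # GPU
#   transcribe_call output.aac cpu    # CPU only, slower
```

Setting the language skips automatic language detection, and choosing one output format keeps the folder to a single transcript file per recording.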

Why is This Workflow Useful?

Converting your video call recordings to transcripts offers several benefits, including:

Time savings: Rather than spending hours sifting through a long recording, you can quickly search and find the information you need in a transcript.
Improved accuracy: Transcripts are often more accurate than notes taken during a fast-paced virtual meeting.
Accessibility: Transcripts make it easier for people with hearing impairments to participate in virtual meetings and events.
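
To make the time-savings point concrete, here is a minimal sketch of searching a transcript for a keyword. It assumes the SRT output from step 5, where each line of spoken text is usually preceded by its timestamp line; find_in_transcript is a hypothetical helper name, and the keyword in the usage line is only an example.

```shell
# find_in_transcript is a hypothetical helper name used only for this sketch.
find_in_transcript() {
    keyword="$1"
    srt_file="$2"
    # -i ignores case; -B 1 also prints the line before each match,
    # which in an SRT file is usually the timestamp of that sentence.
    grep -i -B 1 -- "$keyword" "$srt_file"
}

# Usage:
#   find_in_transcript "budget" output.aac.srt
```

Each match is printed together with the time range at which it was spoken, so you can jump straight to that point in the recording.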

Converting your video call recordings to transcripts using FFMPEG and OpenAI Whisper is a simple and effective way to save time, improve accuracy, and increase accessibility. By following the steps outlined in this blog post, you can quickly turn virtual meetings into searchable and accessible transcripts.