Get More Out of Your Video Calls: Convert Recordings to Transcripts with FFMPEG and OpenAI Whisper
Are you tired of sifting through hours of video call recordings to find that one crucial detail? Do you need help taking accurate notes during a fast-paced virtual meeting? If so, converting your video call recordings to transcripts can save you time and improve your productivity.
This blog post will show you how to use FFMPEG and OpenAI Whisper to turn your Zoom, Google Meet, Microsoft Teams, or other video call recordings into easily searchable and accessible transcripts.
What are FFMPEG and OpenAI Whisper?
FFMPEG is a powerful open-source tool that allows you to manipulate and convert video and audio files in various formats. OpenAI Whisper is an advanced transcription tool that uses AI to transcribe audio and video files with remarkable accuracy.
How to Convert Video Call Recordings to Transcripts Using FFMPEG and OpenAI Whisper
1) Export your video call recording as an MP4 file
Before converting your video call recording to a transcript, you must export it as an MP4 file. See the links below for instructions:
- Zoom: Finding and viewing local recordings
- Google Meet: Record a video meeting
- Microsoft Teams: Play and share a meeting recording in Teams
2) Install FFMPEG
FFMPEG is a command-line tool; you’ll need to download and install it on your computer before using it. You can download FFMPEG for free from the official website or install it using a package manager like apt on Ubuntu and Windows WSL. For Ubuntu and Windows WSL users, it’s as easy as:
sudo apt update
sudo apt install -y ffmpeg
3) Convert the MP4 file to an audio-only file using FFMPEG
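Once you have installed FFMPEG, open your terminal and navigate to the folder where your MP4 file is saved. Then, enter the following command, which drops the video stream (-vn) and copies the audio track as-is without re-encoding (-acodec copy). Writing the copy to a .aac file assumes the recording's audio track is AAC, which is common for MP4 recordings: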
ffmpeg -i input.mp4 -vn -acodec copy output.aac
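If you are not sure which audio codec your recording uses, the following sketch checks it with ffprobe (installed alongside FFMPEG) before deciding whether to copy the stream. The helper names audio_extension and extract_audio are hypothetical, and the fallback to a 16 kHz mono WAV file is an assumption chosen because Whisper resamples audio to that format internally anyway:

```shell
# Map an audio codec name (as reported by ffprobe) to a file extension that
# can hold a straight copy of that stream. Anything else maps to "wav",
# which here means "re-encode instead of copying".
audio_extension() {
  case "$1" in
    aac) echo "aac" ;;
    mp3) echo "mp3" ;;
    *)   echo "wav" ;;
  esac
}

# Extract the audio track of the given video file into output.<ext>.
extract_audio() {
  input="$1"
  # Ask ffprobe for the codec name of the first audio stream only.
  codec=$(ffprobe -v error -select_streams a:0 \
    -show_entries stream=codec_name \
    -of default=noprint_wrappers=1:nokey=1 "$input")
  ext=$(audio_extension "$codec")
  if [ "$ext" = "wav" ]; then
    # Unknown codec: re-encode to 16 kHz mono PCM audio.
    ffmpeg -i "$input" -vn -ac 1 -ar 16000 "output.wav"
  else
    # Known codec: copy the stream without re-encoding.
    ffmpeg -i "$input" -vn -acodec copy "output.$ext"
  fi
}

# Only run when the example input file is actually present.
if [ -f input.mp4 ]; then
  extract_audio input.mp4
fi
```

For a typical recording with an AAC audio track, this does the same thing as the command above and produces output.aac.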
4) Install OpenAI Whisper
Whisper is a general-purpose speech recognition model trained on a large dataset of diverse audio. It is also a multi-task model that can perform multilingual speech recognition, speech translation, and language identification. The model is open source and can be downloaded from its GitHub repository or installed as a command-line tool through Python.
To install the model as a command-line tool, use pipx. This tool from the Python Packaging Authority (PyPA) allows you to install and run Python applications in isolated environments to avoid dependency issues:
# install pipx
python3 -m pip install --user pipx
python3 -m pipx ensurepath
# install openai whisper
pipx install openai-whisper
Note that you will also require nvidia-cudnn, which can be installed on Ubuntu and Windows WSL using:
sudo apt update
sudo apt install -y nvidia-cudnn
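Before moving on, it can help to confirm that everything installed so far is reachable from your terminal. The sketch below is one way to do that; have_tool is a hypothetical helper, and checking for nvidia-smi assumes the NVIDIA driver tools are on your PATH, which may not hold on every setup:

```shell
# Return success if the named command can be found on the PATH.
have_tool() {
  command -v "$1" >/dev/null 2>&1
}

# Report which of the tools used in this post are available.
for tool in ffmpeg whisper nvidia-smi; do
  if have_tool "$tool"; then
    echo "found:   $tool"
  else
    echo "missing: $tool"
  fi
done
```

If whisper is reported as missing right after installation, opening a new terminal usually picks up the PATH change made by pipx ensurepath.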
5) Process the Audio File with OpenAI Whisper
Now that you have an audio file of your video call recording, you can process it with OpenAI Whisper to generate a transcript. A computer with a modern NVIDIA GPU (e.g., a gaming PC) is strongly recommended; Whisper can also run on a CPU, but transcription will be much slower. Run the following:
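A minimal sketch of this step, assuming the audio file from step 3 is named output.aac and the whisper command from step 4 is on your PATH. Passing --model small explicitly is an assumption made here so that the model choice is visible, and transcribe is a hypothetical wrapper function, not part of Whisper:

```shell
# Transcribe one audio file with the Whisper command-line tool.
# --model selects which Whisper model to use for the transcription.
transcribe() {
  whisper "$1" --model small
}

# Only run when the audio file and the whisper command are both available.
if [ -f output.aac ] && command -v whisper >/dev/null 2>&1; then
  transcribe output.aac
fi
```

The same command can be typed directly as whisper output.aac --model small. Other options such as --language and --output_dir are also available if you want to set the spoken language or write the transcript files to a different folder.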
This will automatically download the small model and export the transcript into several formats, including TXT, JSON, SRT, and VTT:
.
├── output.aac.json
├── output.aac.srt
├── output.aac.tsv
├── output.aac.txt
└── output.aac.vtt
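Once the files exist, ordinary command-line tools are enough to search them. The sketch below searches an SRT file so that each hit is shown together with its timestamp. The sample file and its contents are made up for illustration, and find_in_srt is a hypothetical helper name:

```shell
# Create a tiny stand-in SRT file (hypothetical content, for illustration only).
cat > sample.srt <<'EOF'
1
00:00:00,000 --> 00:00:04,000
Welcome everyone to the weekly sync.

2
00:00:04,000 --> 00:00:09,500
The budget review is moved to Thursday.

3
00:00:09,500 --> 00:00:13,000
Please send your slides before the meeting.
EOF

# Case-insensitive search that also prints the line before each match,
# which in an SRT file is the timestamp of the matching caption.
find_in_srt() {
  grep -i -B 1 "$1" "$2"
}

find_in_srt "budget" sample.srt
# prints:
# 00:00:04,000 --> 00:00:09,500
# The budget review is moved to Thursday.
```

To search a real transcript, replace sample.srt with the SRT file Whisper produced, for example output.aac.srt.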
Why is This Workflow Useful?
Converting your video call recordings to transcripts offers several benefits, including:
- Time savings: Rather than spending hours sifting through a long recording, you can quickly search and find the information you need in a transcript.
- Improved accuracy: Transcripts are often more accurate than notes taken during a fast-paced virtual meeting.
- Accessibility: Transcripts make it easier for people with hearing impairments to participate in virtual meetings and events.
Converting your video call recordings to transcripts using FFMPEG and OpenAI Whisper is a simple and effective way to save time, improve accuracy, and increase accessibility. By following the steps outlined in this blog post, you can quickly and easily turn virtual meetings into searchable and accessible transcripts.