Implementation
1. Getting an OpenAI API Key
- Navigate to https://platform.openai.com/signup and sign up for an account.
- Navigate to “Personal” -> “View API Keys”
- Click on “Create new Secret Key”
2. Install the OpenAI module
In your command line, run:
pip install openai
3. Generating the Transcript
Modify and run the following Python script:
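from openai import OpenAI

# Paste your secret key here (keep it out of version control).
client = OpenAI(api_key="YOUR_API_KEY_HERE")

# Open the audio in binary mode; the with-block closes the file afterwards.
with open("./your-file-to-transcribe.mp3", "rb") as f:
    transcript = client.audio.transcriptions.create(file=f, model="whisper-1")

print(transcript.text)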
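If you would rather not paste the key into the script, here is a minimal sketch of one alternative. It relies on the OpenAI client falling back to the OPENAI_API_KEY environment variable when no api_key argument is given. The helper names (build_client, transcribe_file, save_transcript) are made up for this illustration and are not part of the library.

```python
from pathlib import Path


def build_client():
    """Create the client; with no api_key argument the library
    falls back to the OPENAI_API_KEY environment variable."""
    from openai import OpenAI  # imported lazily so the helpers below load without it
    return OpenAI()


def transcribe_file(client, audio_path, model="whisper-1"):
    """Send one audio file for transcription and return the plain text."""
    with open(audio_path, "rb") as f:
        result = client.audio.transcriptions.create(file=f, model=model)
    return result.text


def save_transcript(text, audio_path):
    """Write the transcript next to the audio file, e.g. talk.mp3 -> talk.txt."""
    out_path = Path(audio_path).with_suffix(".txt")
    out_path.write_text(text, encoding="utf-8")
    return out_path


# Usage (needs the openai package and a valid key in OPENAI_API_KEY):
# client = build_client()
# text = transcribe_file(client, "./your-file-to-transcribe.mp3")
# print(save_transcript(text, "./your-file-to-transcribe.mp3"))
```

Passing the client in as an argument keeps the transcription helper easy to try out with a stand-in object before spending any API credits.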
Example
Back Story from 2018
In 2018 I wrote a blog post titled Transcribing Speech to Text with Python and Google Cloud Speech API. Back then, the task was complex because the API only accepted audio clips of up to 60 seconds. To work around this, I had to split my audio into smaller files, transcribe them individually, and then combine the results into a single text file, taking care to preserve their order.
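To make that workaround concrete, here is a rough sketch of its bookkeeping. This is not the 2018 code: the function names are hypothetical, and the actual audio slicing and API calls are left out. It only shows how fixed-length chunk boundaries can be computed and how per-chunk transcripts can be joined back in order.

```python
def chunk_boundaries(total_seconds, chunk_seconds=60):
    """Return (start, end) pairs, in seconds, that cover the whole recording."""
    if chunk_seconds <= 0:
        raise ValueError("chunk_seconds must be positive")
    boundaries = []
    start = 0
    while start < total_seconds:
        end = min(start + chunk_seconds, total_seconds)
        boundaries.append((start, end))
        start = end
    return boundaries


def combine_transcripts(indexed_parts):
    """Join per-chunk transcripts by chunk index, whatever order they arrived in."""
    ordered = sorted(indexed_parts, key=lambda pair: pair[0])
    return " ".join(text.strip() for _, text in ordered)


# A 150-second recording split into chunks of at most 60 seconds.
print(chunk_boundaries(150))  # [(0, 60), (60, 120), (120, 150)]

# Chunks transcribed out of order are still stitched together correctly.
print(combine_transcripts([(1, "how are you"), (0, "hello there")]))  # hello there how are you
```

Tagging each transcript with its chunk index is what keeps the final text in order, even if the chunks are transcribed in parallel or retried.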
Today, OpenAI announced the release of the Whisper API, and I decided to compare it to my results from 2018.
Note: My implementation in 2018 was flawed. For example, I did a hard split at every 30-second mark instead of splitting at a point of silence. Google deserves a lot of credit for having this amazing technology 5 years ago, and it has since updated its API to accept longer files.
This post is not comparing Google to OpenAI. It is comparing how much the developer experience and the quality of audio transcription have improved over time.
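For illustration, splitting at a point of silence could look roughly like the sketch below. It assumes the loudness of the recording has already been measured for short frames (for example, one value per 100 ms), which is not shown here. The function name and the sample numbers are hypothetical.

```python
def quietest_frame_near(loudness, target, search_radius):
    """Return the index of the quietest frame within search_radius of target.

    Ties are broken in favour of the frame closest to the target.
    """
    if not loudness:
        raise ValueError("loudness must not be empty")
    lo = max(0, target - search_radius)
    hi = min(len(loudness), target + search_radius + 1)
    if lo >= hi:
        raise ValueError("target is outside the recording")
    return min(range(lo, hi), key=lambda i: (loudness[i], abs(i - target)))


# Made-up loudness values, one per frame; a hard split would cut at frame 4.
loudness = [0.8, 0.7, 0.9, 0.1, 0.6, 0.05, 0.9, 0.8]
print(quietest_frame_near(loudness, target=4, search_radius=1))  # prints 5
```

Cutting at frame 5 instead of frame 4 moves the split into the quietest nearby moment, which makes it less likely that a word is chopped in half.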
In 2018 I had 3 types of audio files:
- My wife, a native English speaker, reading something out loud for about 1.5 minutes as if she were dictating to Siri, recorded on an iPhone 6s.
- A snippet from a sports radio station.
- A famous (but very low-quality) 1934 speech by Winston Churchill, titled The Threat of Nazi Germany.
My 2018 version did well on the high-quality audio, but struggled with the Churchill speech.
OpenAI excelled on all three files.
Furthermore, my 2018 implementation required some engineering. The OpenAI version is just a few lines of glue code in Python.
See for yourself.
A Speech by Winston Churchill Transcribed
Many people think that the best way to escape war is to dwell upon its horrors and to imprint them vividly upon the minds of the younger generation. They flaunt the grisly photographs before their eyes. They fill their ears with tales of carnage. They dilate upon the ineptitude of generals and admirals. They denounce the crime and insensate folly of human strife. Now all this teaching ought to be very useful in preventing us from attacking or invading any other country, if anyone outside a madhouse wished to do so. But how would it help us if we were attacked or invaded ourselves? That is the question we have to ask. Would the invaders consent to visit Lord Beaverbrook’s exhibition or listen to the impassioned appeals of Mr. Lloyd George? Would they agree to meet that famous South African, General Smuts, and have their inferiority complex removed in friendly, reasonable debate? I doubt it. I have borne responsibility for the safety of this country in grievous times. I gravely doubt it. But even if they did, I am not so sure we should convince them and persuade them to go back quietly home. They might say, it seems to me, you are rich, we are poor. You seem well fed, we are hungry. You have been victorious, we have been defeated. You have valuable colonies, we have none. You have your navy, where is ours? You have had the past, let us have the future. Above all, I fear, they would say, you are weak and we are strong. After all, my friends, only a few hours away by air there dwells a nation of nearly 70 million of the most educated, industrious, scientific, disciplined people in the world, who are being taught from childhood to think of war as a glorious exercise and death in battle as the noblest fate for man. There is a nation which has abandoned all its liberties in order to augment its collective strength. 
There is a nation which with all its strength and virtue is in the grip of a group of ruthless men preaching a gospel of intolerance and racial pride, unrestrained by law, by parliament, or by public opinion. In that country all pacifist speeches, all morbid war books are forbidden or suppressed and their authors rigorously imprisoned. From their new table of commandment they have omitted, thou shalt not kill. It is but 20 years since these neighbors of ours fought almost the whole world and almost defeated them. Now they are rearming with the utmost speed. And ready to their hands is this new lamentable weapon of the air, against which our navy is no defense, and before which women and children, the weak and frail, the pacifist and the jingo, the warrior and the civilian, the front-line trenches and the cottage home, all lie in equal and impartial peril. Nay, worse still, for with the new weapon has come a new method, or rather has come back the most brutish methods of ancient barbarism, namely the possibility of compelling the submission of races by terrorizing and torturing their civil population. And worst of all, the more civilized the country is, the larger and more splendid its city, the more intricate the structure of its social and economic life, the more it is vulnerable, the more it is at the mercy of those who may make it their prey. Now these are facts, hard, grim, indisputable facts, and in face of these facts I ask again, what are we to do?
Whisper results in 2023
Note: There are missing sentences and funny words like "DVD" on the left, and a near-perfect transcription on the right.
Conclusion
The future is now.
I encourage you to glance through my post from 2018 and compare how much effort it took me to go through a similar exercise back then.
In 2023, this whole exercise and the blog post took me about an hour to complete.
It was a special moment for me to listen to Winston Churchill give his historic speech and follow along, line by line, with a near-perfect transcription generated by OpenAI. The future is now…