Here are the steps to install the Speech to Text Python program:
-
Make sure that you have Python 3 and pip installed on your system.
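If you are not sure what is installed, a quick check like the one below can help. This is only a sketch: the function names `python_is_v3` and `pip_available` are made up for illustration, and it only reports on the interpreter that runs the snippet.

```python
import importlib.util
import shutil
import sys


def python_is_v3() -> bool:
    """Return True when this interpreter is Python 3."""
    return sys.version_info.major == 3


def pip_available() -> bool:
    """Return True when pip can be imported or a pip executable is on PATH."""
    importable = importlib.util.find_spec("pip") is not None
    on_path = shutil.which("pip") is not None or shutil.which("pip3") is not None
    return importable or on_path
```

Calling `print(python_is_v3(), pip_available())` in a Python shell should show two booleans; both need to be True before continuing.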
-
Visit https://platform.openai.com/account/api-keys and create an API key if you do not have one. Note that you also have to set up a payment method, because the API is not free, although it is reasonably priced (1 hour of transcription costs $0.36 at the time of writing).
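To get a feel for what a recording will cost, the price quoted above can be turned into a small estimate. This is a rough sketch that assumes the $0.36 per hour figure from this post still applies and that billing is simply proportional to audio length; the name `estimate_cost_usd` is made up for illustration, so check the current pricing page before relying on it.

```python
# Price taken from this post at the time of writing; it may have changed since.
PRICE_PER_HOUR_USD = 0.36


def estimate_cost_usd(audio_seconds: float) -> float:
    """Estimate the transcription cost for a recording of the given length."""
    hours = audio_seconds / 3600
    return round(hours * PRICE_PER_HOUR_USD, 4)
```

For example, `estimate_cost_usd(60)` gives the estimated cost of one minute of audio, which works out to $0.006 at this rate.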
-
Save the API key as an OS environment variable by following this guide: https://help.openai.com/en/articles/5112595-best-practices-for-api-key-safety. For Linux (bash), the commands are the following:
$ echo "export OPENAI_API_KEY='PASTE-YOUR-KEY-HERE'" >> ~/.bash_profile
$ source ~/.bash_profile
Verify that you now have the OpenAI API key as an OS environment variable:
$ echo $OPENAI_API_KEY
If your API key is displayed in your terminal then you are good to go.
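If you would rather not print the full key to your terminal, the same check can be done from Python with most of the key hidden. This is a minimal sketch; the helper name `masked_api_key` and the masking format are my own choices, not part of any OpenAI tooling.

```python
import os


def masked_api_key(env_var: str = "OPENAI_API_KEY") -> str:
    """Return the key stored in env_var with the middle part hidden."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"{env_var} is not set in this shell session")
    if len(key) <= 8:
        # Too short to show any part of it safely
        return "*" * len(key)
    return f"{key[:3]}...{key[-4:]}"
```

Running `print(masked_api_key())` in a new terminal shows only the first three and last four characters if the variable is set, and raises an error if it is not.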
-
Create a Python virtual environment in the folder where you want to run the program (these commands are for Linux; they should be similar on Mac, and on Windows you will need to look up the slight differences):
$ pip install virtualenv
$ mkdir speech2text
$ virtualenv speech2text
$ source speech2text/bin/activate
-
Install the Python libraries inside your virtualenv:
$ pip install gradio pyautogui openai
-
Copy and paste the code below into a file and save it with a .py extension, e.g. speech2text.py. The Gradio and OpenAI parts of the code are from https://www.linkedin.com/pulse/create-talking-bot-new-chatgpt-whisper-api-using-python-leo-wang, and I have customized it slightly to add the PyAutoGUI functionality:
import os

import gradio as gr
import openai
import pyautogui

# Set the region where the text to be selected and copied will appear.
# Use a graphics program or other tool to get the coordinates of
# where you would like your transcription to appear.
TEXT_REGION = (55, 182, 500, 200)  # (left, top, width, height)

# Load the API key from your OS environment
openai.api_key = os.environ["OPENAI_API_KEY"]


# Note: You need to be using OpenAI Python v0.27.0 for the code below to work
def transcribe(audio):
    print(audio)
    # Rename the recording so that the file has a .wav extension
    wav_path = audio + ".wav"
    os.rename(audio, wav_path)
    # Open the file in a with block so it is closed after the upload
    with open(wav_path, "rb") as audio_file:
        transcript = openai.Audio.transcribe("whisper-1", audio_file)
    # Convert the "text" value to a string
    textStr = str(transcript["text"])
    # Run process_text
    process_text(textStr)
    # Return value to gr.Interface
    return transcript["text"]


def process_text(textStr):
    print("Response: " + textStr)
    # Move and click text box / document / whatever
    pyautogui.moveTo(TEXT_REGION[0] + 10, TEXT_REGION[1] + 10, duration=0.5)
    pyautogui.click()
    # Loop the string to simulate typing
    for char in textStr:
        pyautogui.typewrite(char)


recSend = gr.Interface(
    fn=transcribe,
    inputs=gr.Audio(source="microphone", type="filepath"),
    outputs="text",
)
recSend.launch()
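The comment in the code says it needs OpenAI Python v0.27.0. As far as I know, `openai.Audio.transcribe` belongs to the 0.x interface of the library and was removed in 1.0 and later, and newer Gradio releases renamed the `source` argument of `gr.Audio`, so it is worth checking what pip actually installed. The sketch below uses only the standard library; the helper names `installed_version` and `uses_legacy_openai` are made up for illustration.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def uses_legacy_openai(version: Optional[str]) -> bool:
    """Return True for 0.x versions, which the code above was written for."""
    return version is not None and version.split(".")[0] == "0"


# Print what is installed in the active environment
for name in ("openai", "gradio", "pyautogui"):
    print(name, installed_version(name))
```

If `uses_legacy_openai(installed_version("openai"))` returns False, one option is to install the version named in the code comment inside the virtualenv with `pip install openai==0.27.0`.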
If you just want the response in the Gradio web interface, without moving the mouse and typing the output somewhere, you can comment out the line “process_text(textStr)”. If you want the PyAutoGUI functionality, make sure to set the coordinates.
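If you do not have a graphics program at hand, PyAutoGUI itself can report where the mouse pointer is, which is one way to find values for TEXT_REGION. The sketch below is an assumption-laden helper, not part of the original program: `click_point`, `region_from_corners` and `read_mouse_position` are names I made up, and only `pyautogui.position()` is a real PyAutoGUI call.

```python
import time


def click_point(region, offset=10):
    """Return the (x, y) point the program clicks for a given TEXT_REGION."""
    left, top, _width, _height = region
    return (left + offset, top + offset)


def region_from_corners(top_left, bottom_right):
    """Build a (left, top, width, height) tuple from two corner points."""
    (x1, y1), (x2, y2) = top_left, bottom_right
    return (x1, y1, x2 - x1, y2 - y1)


def read_mouse_position(delay_seconds=5):
    """Wait, then return the current mouse position as (x, y)."""
    import pyautogui  # imported here because it needs a graphical display

    time.sleep(delay_seconds)  # time to move the mouse to the target spot
    pos = pyautogui.position()
    return (pos.x, pos.y)
```

Call `read_mouse_position()` once with the pointer on the top-left corner of the target area and once on the bottom-right corner, then pass both results to `region_from_corners` to get a value for TEXT_REGION.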
-
Save the file and run it in the terminal using gradio, NOT python:
$ gradio speech2text.py
You will now have a Gradio interface running on http://127.0.0.1:7860/ (localhost). Open that in your web browser, click “Record from microphone”, and then click “Submit”.
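If the page does not load, it can help to check whether anything is listening on that address at all. This is a small sketch using only the standard library; the function name `is_listening` is made up, and the defaults assume the host and port shown above.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 7860, timeout: float = 1.0) -> bool:
    """Return True when a TCP connection to host:port can be opened."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections and timeouts
        return False
```

Running `print(is_listening())` in a second terminal while the program is running should print True; False suggests the program did not start or is using a different port.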
This program can obviously be customized further to automate it to your needs, and I will probably make some changes myself, but the PyAutoGUI functionality already makes it quite useful in all its simplicity.