Here’s a step-by-step tutorial on how to use the OpenAI API to generate text using GPT-3 and save the results to a pandas DataFrame:
- Import the required libraries: To get started, import the following libraries: “os”, “openai”, “pandas”, and “numpy”.
- Set up your OpenAI API key: You’ll need to create an account with OpenAI and obtain an API key to use their language models. Once you have your API key, you can store it in an environment variable or in a file on your local machine. In this tutorial, we’ll assume that you’ve stored your API key as an environment variable called “OPENAI_API_KEY”.
- Create an empty pandas DataFrame: Before generating any text, create an empty pandas DataFrame to store the results. It needs two columns, named “Text” and “Correction”.
- Define the input prompt: In this tutorial, we’ll use the input prompt “Correct this to standard English:mum not school back” as an example. You can define your own input prompt as a string variable.
- Generate the text: To generate the corrected text, call the “openai.Completion.create()” method, which is available in versions of the “openai” Python package before 1.0. The method takes the “engine” to use, the “prompt” to send to the language model, and several sampling parameters that control the model’s output.
- Extract the generated text: Once the text has been generated, you can extract it from the response using the “text” attribute of the first choice in the response.
- Add the prompt and generated text to the DataFrame: Finally, add the input prompt and the generated text as a new row in the pandas DataFrame. Note that “DataFrame.append” was deprecated in pandas 1.4 and removed in pandas 2.0; “pandas.concat” is its replacement.
How to use GPT-3 in Python with the OpenAI API
First import the following libraries: “os”, “openai”, “pandas”, and “numpy”.
import os
import openai
import pandas as pd
import numpy as np
Add the API key as an environment variable
%env OPENAI_API_KEY=your_openai_api_key
env: OPENAI_API_KEY=your_openai_api_key
Set the OpenAI API key by reading it from the environment variable with “os.getenv”
openai.api_key = os.getenv("OPENAI_API_KEY")
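Because “os.getenv” returns None when the variable is not set, a missing key would only surface later as an authentication error. The following is a minimal sketch of an early check. It assumes a hypothetical helper named “load_api_key”, which is not part of the “openai” package.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Return the API key stored in env_var, or raise if it is missing or blank.

    load_api_key is a hypothetical helper for this tutorial, not an openai function.
    """
    key = os.getenv(env_var)
    if key is None or not key.strip():
        raise RuntimeError(f"Environment variable {env_var} is not set")
    # Strip stray whitespace that can sneak in when the variable is set by hand.
    return key.strip()
```

With this helper, the assignment above could be written as openai.api_key = load_api_key(), so a missing key fails immediately with a clear message.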
Create an empty pandas DataFrame called “savedpred” with two columns named “Text” and “Correction”.
savedpred=pd.DataFrame(columns=['Text','Correction'])
The code below generates a correction for a given prompt using OpenAI’s GPT-3 language model. The prompt and the generated correction are later added as a new row to the “savedpred” DataFrame.
The prompt is stored in the “gpt_prompt” variable and sent to the model with the following parameters:
# Define the prompt
gpt_prompt = "Correct this to standard English:mum not school back"
# Generate the response
response = openai.Completion.create(
    # Specify which pre-trained language model to use for generating text
    engine="text-davinci-002",
    prompt=gpt_prompt,
    # Select parameters
    temperature=0.5,
    max_tokens=256,
    top_p=1.0,
    frequency_penalty=0.0,
    presence_penalty=0.0
)
PARAMETER MEANINGS
engine: the pre-trained language model to use (“text-davinci-002”)
prompt: the text sent to the model
temperature: controls the randomness of sampling; lower values make the output more deterministic, higher values make it more varied
max_tokens: the maximum number of tokens to generate; a token is a chunk of text that is often shorter than a word
top_p: the nucleus sampling threshold; the model samples only from the most likely tokens whose cumulative probability reaches top_p, and 1.0 keeps every token
frequency_penalty: penalizes tokens in proportion to how often they have already appeared, which discourages verbatim repetition
presence_penalty: penalizes tokens that have already appeared at least once, which nudges the model toward new topics
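To make the sampling parameters more concrete, here is a small NumPy sketch of how they could act on a model’s raw token scores (logits). The function names “adjust_logits” and “token_probabilities” are hypothetical, and the penalty step follows the adjustment described in OpenAI’s API documentation; the actual server-side implementation is not public, so treat this as an illustration only.

```python
import numpy as np


def adjust_logits(logits, counts, frequency_penalty=0.0, presence_penalty=0.0):
    """Lower the logits of tokens that already appear in the generated text.

    counts[j] is how many times token j has been generated so far.
    """
    logits = np.asarray(logits, dtype=float)
    counts = np.asarray(counts, dtype=float)
    # The frequency penalty grows with each repeat; the presence penalty is
    # applied once to any token that has appeared at all.
    return logits - counts * frequency_penalty - (counts > 0) * presence_penalty


def token_probabilities(logits, temperature=1.0, top_p=1.0):
    """Turn logits into sampling probabilities using temperature and top_p."""
    logits = np.asarray(logits, dtype=float)
    # Temperature rescales the logits: values below 1 sharpen the distribution,
    # values above 1 flatten it. This sketch assumes temperature > 0.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()

    # Nucleus sampling: keep the smallest set of most likely tokens whose
    # cumulative probability reaches top_p, then renormalize.
    order = np.argsort(probs)[::-1]
    cumulative = np.cumsum(probs[order])
    cutoff = np.searchsorted(cumulative, top_p) + 1
    keep = order[:cutoff]
    filtered = np.zeros_like(probs)
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()
```

For example, token_probabilities([2.0, 1.0, 0.0], top_p=0.5) puts all of the probability on the first token, because that token alone already covers more than half of the probability mass.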
Save and keep track of the different model predictions in a table.
The generated text is then added to the “savedpred” DataFrame as a new row, with the prompt in the “Text” column and the generated correction in the “Correction” column. The row is added with “pd.concat”, because “DataFrame.append” is deprecated and was removed in pandas 2.0.
# Create a one-row DataFrame with the prompt and response
new_row = pd.DataFrame([{'Text': gpt_prompt, 'Correction': response['choices'][0]['text'].strip()}])
# Add the new row to the DataFrame
savedpred = pd.concat([savedpred, new_row], ignore_index=True)
savedpred
| | Text | Correction |
|---|---|---|
| 0 | Correct this to standard English:mum not schoo… | Mom isn’t back from school yet. |
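The extraction and row-building steps can be tried without calling the API. The sketch below uses “mock_response”, a hypothetical stand-in that contains only the fields this tutorial reads from the real response, and a hypothetical helper named “add_prediction” that adds the row with “pd.concat”.

```python
import pandas as pd

# Hypothetical stand-in for the API response; only the fields used here are included.
mock_response = {"choices": [{"text": "\n\nMom isn't back from school yet."}]}


def add_prediction(table, prompt, response):
    """Return a new DataFrame with one extra row for this prompt and its correction."""
    # The generated text lives in the first choice; strip the surrounding whitespace.
    correction = response["choices"][0]["text"].strip()
    new_row = pd.DataFrame([{"Text": prompt, "Correction": correction}])
    return pd.concat([table, new_row], ignore_index=True)


savedpred = pd.DataFrame(columns=["Text", "Correction"])
savedpred = add_prediction(
    savedpred,
    "Correct this to standard English:mum not school back",
    mock_response,
)
```

Because “add_prediction” returns a new DataFrame instead of modifying its argument, the result has to be assigned back to “savedpred”, just as with the “pd.concat” call it wraps.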
PUT IT IN A FUNCTION
query_history = []
def correct_english(prompt):
    # Define the DataFrame to store the results
    savedpred = pd.DataFrame(columns=['Text', 'Correction'])
    # Generate the response
    response = openai.Completion.create(
        # Specify which pre-trained language model to use for generating text
        engine="text-davinci-002",
        prompt=f"Correct this to standard English: {prompt}",
        # Select parameters
        temperature=0.5,
        max_tokens=256,
        top_p=1.0,
        frequency_penalty=0.0,
        presence_penalty=0.0
    )
    # Create a one-row DataFrame with the prompt and response
    new_row = pd.DataFrame([{'Text': prompt, 'Correction': response['choices'][0]['text'].strip()}])
    # Add the new row to the DataFrame (pd.concat replaces the deprecated append)
    savedpred = pd.concat([savedpred, new_row], ignore_index=True)
    # Append the prompt and correction to the query history
    query_history.append((prompt, savedpred['Correction'][0]))
    # Return the corrected text
    return savedpred['Correction'][0]
# Call the function a few times to populate the query history
correct_english("i goes to the store last tueday")
'I went to the store last Tuesday.'
correct_english("im school off tomorr")
"I'm off to school tomorrow."
# Print the query history
for query in query_history:
    print("Prompt:", query[0])
    print("Correction:", query[1])
    print()
Prompt: i goes to the store last tueday
Correction: I went to the store last Tuesday.

Prompt: im school off tomorr
Correction: I'm off to school tomorrow.
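Since the query history is a list of (prompt, correction) tuples, it can also be turned into a table in one step. This is a minimal sketch; the sample list below simply repeats the two results shown above so that the example runs without calling the API, and the name “history_table” is an assumption made for illustration.

```python
import pandas as pd

# Sample history in the same (prompt, correction) shape that correct_english builds.
query_history = [
    ("i goes to the store last tueday", "I went to the store last Tuesday."),
    ("im school off tomorr", "I'm off to school tomorrow."),
]

# Each tuple becomes one row; the column names match the savedpred DataFrame.
history_table = pd.DataFrame(query_history, columns=["Text", "Correction"])
print(history_table.to_string(index=False))
```

Keeping the history as a DataFrame makes it easy to export later, for example with “history_table.to_csv”.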
That’s it! You can repeat the prompt, generation, and save steps with different input prompts to generate more text and add it to the DataFrame. You can also adjust the sampling parameters passed to “openai.Completion.create()”, such as “temperature” and “top_p”, to experiment with different settings.