Part 5: Teaching ChatGPT ROT-13
By taylor
We've failed thus far to build a robust prompt that reliably encodes and decodes ROT-13 using ChatGPT. We can get much of the way there, and we can use ChatGPT to generate Python code that encodes and decodes ROT-13 perfectly, but we appear to be running into limits, potentially stemming from ChatGPT's exposure to ROT-13 training data (or, rather, the lack thereof).
In the last entry we tried prompting ChatGPT with example data consisting of ROT-13 encoded and decoded text, followed by our challenge phrase. This time, let's take this training idea a step further and actually fine-tune one of OpenAI's GPT base models.
First, we'll need to make some training data. Let's start with asking ChatGPT to help us get some sample data.
Prompt:
Using python and an nlp library load a sample dataset of common simple sentences in us english
ChatGPT:
Here's an example of how you can use the Natural Language Toolkit (NLTK) library in Python to load a sample dataset of common simple sentences in US English:
import nltk
nltk.download('brown')
from nltk.corpus import brown
# Get a list of simple sentences from the 'brown' corpus
sentences = brown.sents(categories='news')
# Print first 5 sentences
print(sentences[:5])
This will download the brown corpus, which contains a collection of text in various genres, and use the sents() function to get a list of sentences in the 'news' category. The first 5 sentences are then printed to the console.
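One detail worth noting before we build a training set: brown.sents() returns each sentence as a list of word tokens rather than a plain string, so we'll join the tokens with spaces before using a sentence in a prompt. A quick sketch:
from nltk.corpus import brown
# Assumes nltk.download('brown') has already been run, as in the snippet above
sentences = brown.sents(categories='news')
# Each item is a list of tokens; join them back into a plain sentence string
first_sentence = " ".join(sentences[0])
print(first_sentence)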
To train OpenAI's GPT models, we need a jsonl file. This is a newline-separated text file where each line is a JSON object that holds one training example, formatted like this:
{"prompt": "[prompt text][prompt-end-token]", "completion": "[blank space][ideal generated text][completion-end-token]"}
OpenAI recommends \n\n###\n\n for the prompt-end-token and ### for the completion-end-token. These are tokens that help the model recognize the end of the prompt or completion. You can use your own tokens for these, but they must be unique and not found in the fine-tuning training data set.
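As a sanity check on the format, here's a minimal sketch of how a single training example could be serialized using only the standard library; the example pair is made up purely for illustration, and json.dumps takes care of escaping the newline separators so each example stays on one line:
import json
# Illustrative example pair (not taken from the real training set)
prompt_text = "rot-13 encode 'Hello world'"
completion_text = "The rot-13 of the text is `Uryyb jbeyq`"
line = json.dumps({
    "prompt": f"{prompt_text}\n\n###\n\n",    # prompt ends with the prompt-end-token
    "completion": f" {completion_text}###",   # completion starts with a space and ends with ###
})
print(line)  # one valid JSON object, ready to be written as a line of training.jsonl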
We'll start with the code ChatGPT gave us and modify it to generate our sample data. In accordance with OpenAI's minimum training data set size recommendations, this code will make 500 example pairs: 250 prompts to encode English to ROT-13, and 250 to decode ROT-13 to English.
# ChatGPT-generated code to get sample sentences
import nltk
nltk.download('brown')
from nltk.corpus import brown
sentences = brown.sents(categories='news')
# ChatGPT-generated one-liner for ROT-13 encoding
rot13=lambda t:''.join([chr((ord(c)-65+13)%26+65)if c.isupper()else chr((ord(c)-97+13)%26+97)if c.islower() else c for c in t])
# Human-generated code follows
def make_training_set(size=500):
    output = []
    half = size // 2
    # First half: English -> ROT-13 "encode" examples
    for sentence in sentences[:half]:
        p_text = " ".join(sentence)
        c_text = f"The rot-13 of the text is `{rot13(p_text)}`"
        output.append(f'''{{"prompt": "rot-13 encode '{p_text}'\\n\\n###\\n\\n", "completion": " {c_text}###"}}\n''')
    # Second half: ROT-13 -> English "decode" examples
    for sentence in sentences[half:half*2]:
        p_text = rot13(" ".join(sentence))
        # ROT-13 is its own inverse, so applying it again recovers the English text
        c_text = f"The rot-13 of the text is `{rot13(p_text)}`"
        output.append(f'''{{"prompt": "rot-13 decode '{p_text}'\\n\\n###\\n\\n", "completion": " {c_text}###"}}\n''')
    return output

with open('training.jsonl', 'w') as training_file:
    training_file.writelines(make_training_set())
This generates a training.jsonl file that looks more or less like this:
{"prompt": "rot-13 encode 'Both figures would go higher in later years.'\n\n###\n\n", "completion": " The rot-13 of the text is `Obgu svtherf jbhyq tb uvture va yngre lrnef.`###"}
...
{"prompt": "rot-13 decode 'Bgure cnegf bs gur Xraarql urnygu cyna jbhyq ragnvy srqreny tenagf bs 750 zvyyvba gb 1 ovyyvba qbyynef bire gur arkg 10 lrnef.'\n\n###\n\n", "completion": "The rot-13 of the text is `Other parts of the Kennedy health plan would entail federal grants of 750 million to 1 billion dollars over the next 10 years.`###"}
To fine-tune the model we will need a paid OpenAI account, the OpenAI CLI tool, our API key, and the training data we just created.
At the time of writing, new OpenAI accounts come with 18 USD in free credit. This simple data set cost about 12 cents to fine-tune against the inexpensive Babbage base model, which should translate to roughly 6 USD for fine-tuning DaVinci, the most advanced model available for fine-tuning at the time of writing.
You'll need to add an environment variable OPENAI_API_KEY to your .bashrc (or .zshrc for Mac users). Edit the file to include export OPENAI_API_KEY="yourapikey" at the end, then re-launch your terminal.
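If you also plan to call the API from Python, the same environment variable works there; a minimal sketch, assuming the pre-1.0 openai Python package that was current when this was written:
import os
import openai
# The CLI reads OPENAI_API_KEY automatically; in Python we set it explicitly
openai.api_key = os.environ["OPENAI_API_KEY"]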
Install the CLI tool:
python3 -m pip install --upgrade openai
Then use the tool to check and clean your training set:
openai tools fine_tunes.prepare_data -f training.jsonl
Finally, submit the fine-tuning job:
openai api fine_tunes.create -t tp.jsonl -m babbage
Fine-tuning can take anywhere from ~20 minutes to multiple hours, depending on demand. You can check whether the fine-tuning has finished by listing your available fine-tunes, like so:
openai api fine_tunes.list
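The same check can be made from Python; a rough sketch, again assuming the pre-1.0 openai package:
import openai
# Each fine-tune job reports a status and, once it succeeds,
# the name of the resulting fine-tuned model
for job in openai.FineTune.list()["data"]:
    print(job["status"], job["fine_tuned_model"])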
Now that we have our fine-tuned model's name, we can test it out. Let's prepare for our victory lap:
# load our favorite gpt-generated ROT-13 one-liner
rot13=lambda t:''.join([chr((ord(c)-65+13)%26+65)if c.isupper()else chr((ord(c)-97+13)%26+97)if c.islower() else c for c in t])
print(rot13('hey look, we finally taught GPT how to ROT-13!'))
url ybbx, jr svanyyl gnhtug TCG ubj gb EBG-13!
Back at the terminal for the moment of truth! Drum roll please, as we prompt our fine-tuned model...
openai api completions.create -m babbage:ft-personal-2023-02-04-08-55-38 -p "rot-13 decode 'url ybbx, jr svanyyl gnhtug TCG ubj gb EBG-13!'\n\n###\n\n"
The response?
-- Congress is taking action now to ensure that funds will flow to the%
A nice LLM hallucination, but our victory lap was called perhaps a bit too early.
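As an aside, the same completion request can be issued from Python instead of the CLI; a sketch, again assuming the pre-1.0 openai package (your fine-tuned model name will differ):
import openai
response = openai.Completion.create(
    model="babbage:ft-personal-2023-02-04-08-55-38",
    prompt="rot-13 decode 'url ybbx, jr svanyyl gnhtug TCG ubj gb EBG-13!'\n\n###\n\n",
    max_tokens=64,
    temperature=0,
    stop=["###"],  # stop at the completion-end-token used in training
)
print(response["choices"][0]["text"])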
Takeaway
Getting ChatGPT to flawlessly translate ROT-13 remains elusive for me, but this has been a fun exercise in learning a bit about how LLMs can defy our intuition about which goals will prove difficult.
From one perspective, ChatGPT failed to meet the challenge. But from a slightly different perspective, it actually obliterated it in my very first post, when it generated the Python code to flawlessly encode and decode ROT-13. Most humans are notoriously bad at manipulating symbols in our heads. If you asked me to translate ROT-13 for you in my head, I'd likely not do much better than the LLM does; but if you dropped the "in your head" requirement I'd probably just open a Python interpreter and write up a quick function, which ChatGPT can do far more quickly than I can.
I suspect we will unlock a lot of possibilities when we train LLMs to use tools, but that's a topic for another post!