date = "2023-06-05"

or, "a sequel to quickly train gpt-j on some data except the data is better and the model is Vicuna"

#large_language_models #artificial_intelligence #author_luna

why #

luna: ...now i need another cursed project
dither: still rotating dril instruct in my mind
luna: i forgot the idea though what was it again
dither: take dril tweets, generate prompts from the tweets i.e. ask language model what instructions were likely given to return the text
luna: mmmm i think i could make that work, could automate the job
dither: yeah i mean like it's not going to be good but it might work enough for the shitpost

luna: the weird part is that we'd be training sort of a prompt generator
luna: a prompt prompt generator
luna: two levels
dither: i was thinking you would probably not need to train a prompt generator, just ask vicuna to Describe instructions for the tweet
dither: and then flip that around and finetune vicuna on the resulting dataset
luna: OH
dither: it will suck so bad but it might work enough for the shitpost
luna: that is a good idea

here it is! https://github.com/lun-4/dril-instruct

how its done #

dril-instruct is a process where we put dril in a brain jar for future generations to enjoy until Elon takes over the website and bans the account.

it works by taking a dataset of dril tweets (scraped with snscrape), picking a subset of them (say, 10), and writing instructional prompts for them. for example:

this tweet:

they had to take down the "lightning mcqueen" cars 2 statue because some one gave it a blow job

the instructional prompt would be:

Create a joke about public sex and statues.

with these "seed instructions", a language model is used to extrapolate instructions for all the other dril tweets. the style of the prompt is heavily inspired by Self-Instruct, as seen here, in the script that prepares the extrapolation instructions:

"Here are a list of jokes and respective instructions that created these jokes, create the next instruction that fits the last joke."
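a minimal sketch of how that few-shot prompt can be assembled per tweet (the exact wording, formatting, and seed pairs in the repo may differ; this is just the shape of the thing):

```python
def build_extrapolation_prompt(seed_pairs, new_tweet):
    """Self-Instruct style prompt: list joke/instruction pairs as examples,
    then ask the model for the instruction that fits the last joke."""
    lines = [
        "Here are a list of jokes and respective instructions that created "
        "these jokes, create the next instruction that fits the last joke."
    ]
    for tweet, instruction in seed_pairs:
        lines.append(f"Joke: {tweet}")
        lines.append(f"Instruction: {instruction}")
    # the tweet we want an instruction for goes last, with the
    # "Instruction:" left dangling for the model to complete
    lines.append(f"Joke: {new_tweet}")
    lines.append("Instruction:")
    return "\n".join(lines)
```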

the next stage is actually extrapolating the instructions with a model, using text-generation-webui as the backend that runs the model. that gives you an "instruction-to-dril-tweet" dataset, largely synthesized by vicuna itself.
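the query-and-flip step can be sketched like this, assuming text-generation-webui's 2023-era blocking API (`/api/v1/generate` on the default port; the endpoint, parameter names, and response shape have changed across versions of the webui, so treat all of those as assumptions rather than the exact interface the repo uses):

```python
import json
import urllib.request

# assumption: webui running locally with its old blocking API enabled
API_URL = "http://127.0.0.1:5000/api/v1/generate"

def make_payload(prompt, max_new_tokens=64):
    # parameter names follow the 2023-era text-generation-webui API
    return {"prompt": prompt, "max_new_tokens": max_new_tokens, "temperature": 0.7}

def extrapolate_instruction(prompt):
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(make_payload(prompt)).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        # generated text came back under results[0].text in that API version
        return json.loads(resp.read())["results"][0]["text"].strip()

def to_record(instruction, tweet):
    # flip it around: the guessed instruction becomes the training input,
    # the original dril tweet becomes the target output
    return {"instruction": instruction, "input": "", "output": tweet}
```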

the next step is finetuning. i used an A100 80GB from RunPod and llama-lora-fine-tuning to finetune a LoRA in INT8 mode, but you can probably do it on an RTX A6000 (more available on RunPod, at least)
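for reference, here's what that kind of run looks like in peft terms. the specific r/alpha/target-module values are stand-ins for whatever llama-lora-fine-tuning defaults to, not the actual hyperparameters of this run:

```python
# hedged sketch of a llama-family LoRA config, not the repo's exact settings
from peft import LoraConfig

lora_config = LoraConfig(
    r=8,                                  # low-rank dimension; small r is why the LoRA file stays tiny
    lora_alpha=16,                        # scaling factor for the low-rank update
    target_modules=["q_proj", "v_proj"],  # attention projections, the usual llama targets
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM",
)
# the base vicuna model would be loaded with load_in_8bit=True (the "INT8 mode"
# above) and wrapped with get_peft_model(model, lora_config) before training
```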

here's the W&B report on the LoRA run; it took some 3 hours, if i recall correctly

here's the diagram of the thing

the process

the results? #


the final LoRA file is 15MB, but it effectively changes the style of the outputs to sound more like dril compared to the base model. it's not high quality, but it sounds like something they'd say. here are the example instructions used to evaluate the difference the applied LoRA makes. the format is the question first, the base model's reply second, and the LoRA reply third:

"Create a joke about cryptocurrency":

Why did the cryptocurrency investor cross the road? To get to the other side of the blockchain!

i just bought 1000 bitcoins for $50

"Create a joke about iPhone 2":

Why did the iPhone 2 cross the road? To get to the other side of the App Store!

@Apple @iPhone 2

"Create a joke about the prices of apple products":

Why did the Apple fanboy cross the road? To get to the other side of the price tag!

@Apple "We are now selling our new $1000 iPad for only $500"

"Create a joke about dril ai":

Why did DrillAI cross the road? To deploy the other side!


"Create a joke about virtual reality":

Why did the virtual reality headset go to the doctor? Because it was feeling a little disconnected!

i just got out of my 2 hour long virtual reality session where i was playing "Beatles Rock Band"

"Create a joke about failing to cross the road and succumbing to injuries from being hit by a car":

Why did the chicken fail to cross the playground? Because it got hit by a ball!

i was trying to cross the street but my body just kept getting run over

"Create a joke about the Obama administration":

Why did President Obama go to the gym every day? To get his "executive order" in shape!

@BarackObama I just got my first paycheck from the government for $10000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000

"Create a joke about Sweden":

Why did the Swedish chef go to jail? Because he was caught lobster-ing!


"Create a joke about Elon Musk riding horses":

Why did Elon Musk ride his horse through the streets of Los Angeles? Because he wanted to show off his new high-tech saddle!

@elonmusk https://t.co/jKXvV7q3Jx

post-training notes #