xesite/blog/robocadey-2022-04-30.markdown

475 lines
19 KiB
Markdown
Raw Permalink Normal View History

---
title: "robocadey: Shitposting as a Service"
date: 2022-04-30
tags:
- gpt2
- machinelearning
- python
- golang
- art
vod:
twitch: https://www.twitch.tv/videos/1471211336
youtube: https://youtu.be/UAd-mWMG198
---
<noscript>
[Hey, you need to enable JavaScript for most of the embedded posts in this
article to work. Sorry about this, we are working on a better solution, but this
is what we have right now.](conversation://Mara/hacker)
</noscript>
What is art? Art is when you challenge the assumptions that people make about a
medium and use that conflict to help them change what they think about that
medium. Let's take "Comedian" by Maurizio Cattelan for example:
![A banana duct-taped to an artist's
canvas](https://cdn.christine.website/file/christine-static/blog/merlin_165616527_d76f38fc-e45d-4913-9780-1cc939750197-superJumbo.jpg)
By my arbitrary definition above, this is art. This takes assumptions that you
have about paintings (you know, that they use paint on the canvas) and discards
them. This lets you change what you think art is. Art is not about the medium or
the things in it. Art is the expression of these things in new and exiting ways.
<xeblog-conv name="Cadey" mood="coffee">Originally I was going to use some
Banksky art here, but for understandable reasons it's quite difficult to get
images of Banksky art.</xeblog-conv>
One of my favorite kinds of art is the "uncanny valley" of realism. Let's take
Death Stranding as an example of this. Death Stranding is a video game that was
released in 2019 for the PlayStation 4 and is one of my favorite games of all
time. The game has a very hyper-realistic art style that is firmly in the
centre of the uncanny valley:
![A picture of Death Stranding gameplay, showing the protagonist Sam Porter
Bridges attempting to climb a sheer cliff face using a rope that another player
left
behind](https://cdn.christine.website/file/christine-static/blog/20220202215156_3.jpg)
This game mixes very realistic scenery with a story about dead bodies turning
into antimatter and you being a UPS delivery person that saves America. This is
art to me. This transformed what a video game could be, even if the entire game
boils down to Kojima themed fetch quests. Oh and trying not to die even though
you can't die but when you die it's really bad.
I want to create this kind of art, and I think I have found a good medium to do
this with. I write a lot on this little independent site called Twitter. This is
one of the main things that I write on, and through the process of the last 8
years or so, I've written a shockingly large amount of things. I post a lot of
weird things there as well as a lot of boring/normal things.
However a lot of my posts boil down to creating a "stream of consciousness", or
using it as a way to help deal with intrusive thoughts. There's a certain art to
this, as it is a candid exchange between the author and the reader. The reader
doesn't get all the context (heck, I doubt that I have all the context lol), but
from there they get to put the pieces together.
So, when thinking about trying to get into the uncanny valley with this kind of
art medium, my mind goes back to the old days on IRC channels. Many IRC channels
run bots to help them run the channel or purely for amusement. One of my
favorite kinds of bots is a [Markov
chain](https://en.wikipedia.org/wiki/Markov_chain) bot. These kinds of bots
learn patterns in text and then try to repeat them at random. With enough
training data, it can be fairly convincing at first glance. However, you need _a
lot_ of training data to get there. More training data than I have ever tweeted.
This ends up creating a situation where the markov bot is right in the uncanny
valley of realism. At first glance it is something that isn't not plausibly
human. It looks like a bot, but it also looks like a human, but it also looks
like a bot. It appears to be in the middle. I like this from an artistic
standpoint because this challenges your assumptions that bots need to be
obviously bots and humans need to be obviously human.
In the past I have ran a service I call `cadeybot`. It took all of my Discord
messages, fed them into a Markov chain, and then attempted to create new
messages as a result. This worked pretty well, but we ran into an issue where it
would basically regurgitate its training data. So when people thought it was
being novel about roasting people, someone would search the chat and find out
that I said those exact words 2 years ago.
This isn't really exciting from an artistic point of view. You could get the
same result from randomly replying with old chat messages without any additional
data in the mix.
I haven't run `cadeybot` in some time because of this. It gets really boring
really fast.
However, I was looking at some DALL-E generated images and then inspiration
struck:
<xeblog-conv name="Mara" mood="hmm">What if I fed all those tweets into
[GPT-2](https://en.wikipedia.org/wiki/GPT-2)?</xeblog-conv>
So I did that. I made [@robocadey@botsin.space](https://botsin.space/@robocadey)
as a fediverse bot that generates new content based on everything I've ever
tweeted.
<iframe src="https://botsin.space/@robocadey/108219835651549836/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="500"
height="245" allowfullscreen="allowfullscreen"></iframe>
## Data
The first step of this is getting all of my tweet data out of Twitter. This
was a lot easier than I thought. All I had to do was submit a GDPR data request,
wait a few days for the cloud to think and then I got a 3 gigabyte zip file full
of everything I've ever tweeted. Cool!
Looking through the dump, I found a 45 megabyte file called `tweets.js`. This
looked like it could be important! So I grabbed it and looked at the first few
lines:
```javascript
$ head tweet.js
window.YTD.tweet.part0 = [
{
"tweet" : {
"retweeted" : false,
"source" : "<a href=\"http://www.bitlbee.org/\" rel=\"nofollow\">BitlBee</a>",
"entities" : {
"hashtags" : [ ],
"symbols" : [ ],
"user_mentions" : [
{
```
So it looks like most of this is really just a giant block of data that's
stuffed into JavaScript so that the embedded HTML can show off everything you've
ever tweeted. Neat, but I only need the tweet contents. We can strip off the
preamble with `sed`, and then grab the first entry out of `tweets.js` with a
command like this:
```json
$ cat tweet.js | sed 's/window.YTD.tweet.part0 = //' | jq .[0]
{
"tweet": {
"retweeted": false,
"source": "<a href=\"http://www.bitlbee.org/\" rel=\"nofollow\">BitlBee</a>",
"entities": {
"hashtags": [],
"symbols": [],
"user_mentions": [
{
"name": "@Lyude@queer.party🌹",
"screen_name": "_Lyude",
"indices": [
"0",
"7"
],
"id_str": "1568160860",
"id": "1568160860"
}
],
"urls": []
},
"display_text_range": [
"0",
"83"
],
"favorite_count": "0",
"in_reply_to_status_id_str": "481634023295709185",
"id_str": "481634194729488386",
"in_reply_to_user_id": "1568160860",
"truncated": false,
"retweet_count": "0",
"id": "481634194729488386",
"in_reply_to_status_id": "481634023295709185",
"created_at": "Wed Jun 25 03:05:15 +0000 2014",
"favorited": false,
"full_text": "@_Lyude but how many licks does it take to get to the centre of a tootsie roll pop?",
"lang": "en",
"in_reply_to_screen_name": "_Lyude",
"in_reply_to_user_id_str": "1568160860"
}
}
```
It looks like most of what I want is in `.tweet.full_text`, so let's make a
giant text file with everything in it:
```sh
sed 's/window.YTD.tweet.part0 = //' < tweets.js \
| jq '.[] | [ select(.tweet.retweeted == false) ] | .[].tweet.full_text' \
| sed -r 's/\s*\.?@[A-Za-z0-9_]+\s*//g' \
| grep -v 'RT:' \
| jq --slurp . \
| jq -r .[] \
| sed -e 's!http[s]\?://\S*!!g' \
| sed '/^$/d' \
> tweets.txt
```
This does a few things:
1. Removes that twitter preamble so jq is happy
2. Removes all at-mentions from the training data (so the bot doesn't go on a
mentioning massacre)
3. Removes the "retweet" prefixed tweets from the dataset
4. Removes all urls
5. Removes all blank lines
This should hopefully cut out all the irrelevant extra crap and let the machine
learning focus on my text, which is what I actually care about.
## Getting It Up
As a prototype, I fed this all into Markov chains. This is boring, but I was
able to graft together a few projects to get that prototype up quickly. After
some testing, I ended up with things like this:
<iframe src="https://botsin.space/@robocadey/108201675365283068/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="500"
height="225" allowfullscreen="allowfullscreen"></iframe>
This was probably the best thing to come out of the Markov chain testing phase,
the rest of it was regurgitating old tweets.
While I was doing this, I got GPT-2 training thanks to [this iPython
notebook](https://colab.research.google.com/github/sarthakmalik/GPT2.Training.Google.Colaboratory/blob/master/Train_a_GPT_2_Text_Generating_Model_w_GPU.ipynb).
I uploaded my 1.5 megabyte tweets.txt file and let the big pile of linear
algebra mix around for a bit.
Once it was done, I got a one gigabyte tarball that I extracted into a new
folder imaginatively named `gpt2`. Now I had the model, all I needed to do was
run it. So I wrote some Python:
```python
#!/usr/bin/env python3
import gpt_2_simple as gpt2
import json
import os
import socket
import sys
from datetime import datetime
sockpath = "/xe/gpt2/checkpoint/server.sock"
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')
if os.path.exists(sockpath):
os.remove(sockpath)
sock = socket.socket(socket.AF_UNIX)
sock.bind(sockpath)
print("Listening on", sockpath)
sock.listen(1)
while True:
connection, client_address = sock.accept()
try:
print("generating shitpost")
result = gpt2.generate(sess,
length=512,
temperature=0.8,
nsamples=1,
batch_size=1,
return_as_list=True,
top_p=0.9,
)[0].split("\n")[1:][:-1]
print("shitpost generated")
connection.send(json.dumps(result).encode())
finally:
connection.close()
server.close()
os.remove("/xe/gpt2/checkpoint/server.sock")
```
And I used a Dockerfile to set up its environment:
```Dockerfile
FROM python:3
RUN pip3 install gpt-2-simple
WORKDIR /xe/gpt2
COPY . .
CMD python3 main.py
```
Then I bind-mounted the premade model into the container and asked it to think
up something for me. I got back a list of replies and then I knew it was good to
go:
```json
[
"oh dear. I don't know if you're the best mannered technologist you've come to expect from such a unique perspective. On the technical side of things, you're a world-class advocate for open source who recently lost an argument over the state of the open source world to bitter enemies like Python.",
"I also like your approach to DNS! One step at a time. More info here: ",
"tl;dr: it's a bunch of random IP addresses and the outcome is a JSON file that you fill out in as you go.",
"datasoftware.reddit.com/r/programmingcirclejerk-memes",
"datasoftware.reddit.com/r/programmingcirclejerk-memes",
"datasoftware.reddit.com/r/programmingcirclejerk-memes",
"datasoftware.reddit.com/r/programmingcirclejerk-memes",
"Oh dear, can we third-person?",
"A group of us is a CVE-1918 impact statement",
"Is that breaking news?",
"Lol datasom shitposting omg ",
"I'm gonna be on the list for #Giving is easy, don't look so far ahead ",
"Oh dear. Welcome to ThePandora: ",
"I use a lot of shift lol",
"I thought you were an orca",
"Foone, my old computer crashed. What happened to your hard drive? ",
"Yeah I know some of those things should be automated, but this is about experimentation and experimentation is what makes me happy",
"Am I? ",
"Experiment is my favorite part of the article",
"Yes I can, scroll past the how to read words videos",
"I was able to see into space but I cannot seen into your eyes",
"This is with a virtual keyboard/MAC address field",
"Yes but with the keymap \"~M\"",
"Yes this is a structural change, I am trying to tease things out a bit. I am trying to make it slightly different sounding with the key mapping. I am trying to make it different sounding sounding.",
"The main thing I am trying to do is make it easy to type backwards. This is going to take experimentation. I am trying to make it slightly different sounding.",
"Is this vehicle of mercy?",
"God i forgot "
]
```
However, this involved using Docker. Docker is decent, but if I have the ability
not to, I don't want to use Docker. A friend of mine named `ckie` saw that I was
using Docker for this and decided to package the `gpt_2_simple` library [into
nixpkgs](https://github.com/NixOS/nixpkgs/pull/170713). They also made it easy
for me to pull it into robocadey's environment and then I ripped out Docker,
never to return.
Now the bot could fly. Here was the first thing it posted after it got online
with GPT-2 in a proper way:
<iframe src="https://botsin.space/@robocadey/108209326706890695/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="500"
height="175" height=allowfullscreen="allowfullscreen"></iframe>
I can't make this up.
## Art Gallery
Here are some of my favorite posts it's made. Most of them could pass off as my
tweets.
<iframe src="https://botsin.space/@robocadey/108209924883002812/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="190" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108212424672000652/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="190" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108215827551779879/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="210" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108218889999336372/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="210" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108218894030986305/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="800"
height="250" allowfullscreen="allowfullscreen"></iframe>
Some of them get somber and are unintentionally a reflection on the state of the
world we find ourselves in.
<iframe src="https://botsin.space/@robocadey/108219835651549836/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="280" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108218522810351900/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="280" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108217161432474717/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="345" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108216170547691864/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="280" allowfullscreen="allowfullscreen"></iframe>
Others are silly.
<iframe src="https://botsin.space/@robocadey/108217116321450713/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="200" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108218107689729996/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="200" allowfullscreen="allowfullscreen"></iframe>
<iframe src="https://botsin.space/@robocadey/108215257978801615/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="180" allowfullscreen="allowfullscreen"></iframe>
I say things like this:
<iframe src="https://pony.social/@cadey/108218301565484230/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
allowfullscreen="allowfullscreen"></iframe><script
src="https://pony.social/embed.js" async="async"></script>
and it fires back with:
<iframe src="https://botsin.space/@robocadey/108218304118515023/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="180" allowfullscreen="allowfullscreen"></iframe>
This is art. It looks like a robot pretending to be a human and just barely
passing at it. This helps you transform your expectations about what human and
bot tweets really are.
<iframe src="https://botsin.space/@robocadey/108213387014890181/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="200" allowfullscreen="allowfullscreen"></iframe>
If you want to influence `robocadey` into giving you an artistic experience,
mention it on the fediverse by adding `@robocadey@botsin.space` to your posts.
It will think a bit and then reply with a brand new post for you.
## Setting It Up
You probably don't want to do this, but if you're convinced you do then here's
some things that may help you.
1. Use the systemd units in `/run` of [github:Xe/x](https://github.com/Xe/x).
2. Put your model into a squashfs volume that you mount to the
`/var/lib/private/xeserv.robocadey-gpt2/checkpoint` folder.
3. Don't expect any warranty, reliability promises or assistance setting this
up. I made this for myself, not for others. Its source code is made available
to make the code part of that art, but the code is not the art that it makes.
Good luck.
---
I guess what I think about art is that it's not just the medium. It's not just
the expression. It's the combination of it all. The expression, the medium, the
circumstances, all of that leads into what art really is. I could say that art
is the intangible expressions, emotions, and whatever that you experience when
looking at things; but that sounds really really pretentious, so let's just say
that art doesn't exist. Well it does, but only in the mind of the viewer.
There's not some objective scale that can say that something is or is not an
art. Art is imagined and we are conditioned to believe that things are or are
not art based on our upbringing.
I feel that as a shitposter my goal is to challenge people's "objective sense"
of what "can" and "can't" be art by sitting right in the middle of the two and
laughing. Projects like `robocadey` are how I make art. It's like what 200 lines
of code at most. You could probably recreate most of it based on the contents of
this post alone. I wonder if part of the art here comes from the fact that most
of this is so iterative yet so novel. Through the iteration process I end up
creating novelty.
You could also say that art is the antidote to the kind of suffering that comes
from the fundamental dissatisfactions that people have with everyday life. By
that defintion, I think that `robocadey` counts as art.
Either way, it's fun to do these things. I hope that this art can help inspire
you to think differently about the world. Even though it's through a chatbot
that says things like this:
<iframe src="https://botsin.space/@robocadey/108215945151030016/embed"
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
height="200" allowfullscreen="allowfullscreen"></iframe>
What is this if not art?