forked from cadey/xesite
shitposting as a service
Signed-off-by: Xe Iaso <me@christine.website>
parent
021f70fd90
commit
dacc7159d7
|
@ -0,0 +1,474 @@
|
|||
---
title: "robocadey: Shitposting as a Service"
date: 2022-04-30
tags:
  - gpt2
  - machinelearning
  - python
  - golang
  - art
vod:
  twitch: https://www.twitch.tv/videos/1471211336
  youtube: https://youtu.be/UAd-mWMG198
---
|
||||
|
||||
<noscript>
|
||||
|
||||
[Hey, you need to enable JavaScript for most of the embedded posts in this
|
||||
article to work. Sorry about this, we are working on a better solution, but this
|
||||
is what we have right now.](conversation://Mara/hacker)
|
||||
|
||||
</noscript>
|
||||
|
||||
What is art? Art is when you challenge the assumptions that people make about a
|
||||
medium and use that conflict to help them change what they think about that
|
||||
medium. Let's take "Comedian" by Maurizio Cattelan for example:
|
||||
|
||||
![A banana duct-taped to an artist's
|
||||
canvas](https://cdn.christine.website/file/christine-static/blog/merlin_165616527_d76f38fc-e45d-4913-9780-1cc939750197-superJumbo.jpg)
|
||||
|
||||
By my arbitrary definition above, this is art. This takes assumptions that you
|
||||
have about paintings (you know, that they use paint on the canvas) and discards
|
||||
them. This lets you change what you think art is. Art is not about the medium or
|
||||
the things in it. Art is the expression of these things in new and exciting ways.
|
||||
|
||||
<xeblog-conv name="Cadey" mood="coffee">Originally I was going to use some
Banksy art here, but for understandable reasons it's quite difficult to get
images of Banksy art.</xeblog-conv>
|
||||
|
||||
One of my favorite kinds of art is the "uncanny valley" of realism. Let's take
|
||||
Death Stranding as an example of this. Death Stranding is a video game that was
|
||||
released in 2019 for the PlayStation 4 and is one of my favorite games of all
|
||||
time. The game has a very hyper-realistic art style that is firmly in the
|
||||
centre of the uncanny valley:
|
||||
|
||||
![A picture of Death Stranding gameplay, showing the protagonist Sam Porter
|
||||
Bridges attempting to climb a sheer cliff face using a rope that another player
|
||||
left
|
||||
behind](https://cdn.christine.website/file/christine-static/blog/20220202215156_3.jpg)
|
||||
|
||||
This game mixes very realistic scenery with a story about dead bodies turning
|
||||
into antimatter and you being a UPS delivery person that saves America. This is
|
||||
art to me. This transformed what a video game could be, even if the entire game
|
||||
boils down to Kojima themed fetch quests. Oh and trying not to die even though
|
||||
you can't die but when you die it's really bad.
|
||||
|
||||
I want to create this kind of art, and I think I have found a good medium to do
|
||||
this with. I write a lot on this little independent site called Twitter. This is
|
||||
one of the main things that I write on, and through the process of the last 8
|
||||
years or so, I've written a shockingly large amount of things. I post a lot of
|
||||
weird things there as well as a lot of boring/normal things.
|
||||
|
||||
However a lot of my posts boil down to creating a "stream of consciousness", or
|
||||
using it as a way to help deal with intrusive thoughts. There's a certain art to
|
||||
this, as it is a candid exchange between the author and the reader. The reader
|
||||
doesn't get all the context (heck, I doubt that I have all the context lol), but
|
||||
from there they get to put the pieces together.
|
||||
|
||||
So, when thinking about trying to get into the uncanny valley with this kind of
|
||||
art medium, my mind goes back to the old days on IRC channels. Many IRC channels
|
||||
run bots to help them run the channel or purely for amusement. One of my
|
||||
favorite kinds of bots is a [Markov
|
||||
chain](https://en.wikipedia.org/wiki/Markov_chain) bot. These kinds of bots
|
||||
learn patterns in text and then try to repeat them at random. With enough
|
||||
training data, it can be fairly convincing at first glance. However, you need _a
|
||||
lot_ of training data to get there. More training data than I have ever tweeted.
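
For a rough idea of how a Markov chain bot works, here is a minimal sketch in Python. It is not the code any of the bots in this post used; `build_chain` and `generate` are names made up for this example, and the three-line corpus is invented. It records which word follows each word, then random-walks that table:

```python
import random
from collections import defaultdict


def build_chain(lines, order=1):
    """Map each run of `order` words to the words seen right after it."""
    chain = defaultdict(list)
    for line in lines:
        words = line.split()
        for i in range(len(words) - order):
            state = tuple(words[i:i + order])
            chain[state].append(words[i + order])
    return chain


def generate(chain, max_words=20, rng=random):
    """Random-walk the chain, starting from a randomly chosen state."""
    state = rng.choice(list(chain.keys()))
    output = list(state)
    while len(output) < max_words:
        followers = chain.get(state)
        if not followers:
            break  # dead end: nothing ever followed this state
        output.append(rng.choice(followers))
        state = tuple(output[-len(state):])
    return " ".join(output)


if __name__ == "__main__":
    # Hypothetical toy corpus; a real bot would use thousands of messages.
    corpus = [
        "i love writing code in go",
        "i love drinking coffee in the morning",
        "writing code in the morning is fun",
    ]
    print(generate(build_chain(corpus)))
```

With a corpus this small, most states have only one possible follower, so the output mostly replays the input. That is the same failure mode a real bot hits when it does not have enough training data.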
|
||||
|
||||
This ends up creating a situation where the Markov bot is right in the uncanny
|
||||
valley of realism. At first glance it is something that isn't not plausibly
|
||||
human. It looks like a bot, but it also looks like a human, but it also looks
|
||||
like a bot. It appears to be in the middle. I like this from an artistic
|
||||
standpoint because this challenges your assumptions that bots need to be
|
||||
obviously bots and humans need to be obviously human.
|
||||
|
||||
In the past I have run a service I call `cadeybot`. It took all of my Discord
|
||||
messages, fed them into a Markov chain, and then attempted to create new
|
||||
messages as a result. This worked pretty well, but we ran into an issue where it
|
||||
would basically regurgitate its training data. So when people thought it was
|
||||
being novel about roasting people, someone would search the chat and find out
|
||||
that I said those exact words 2 years ago.
|
||||
|
||||
This isn't really exciting from an artistic point of view. You could get the
|
||||
same result from randomly replying with old chat messages without any additional
|
||||
data in the mix.
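
One way to catch this kind of regurgitation is to check whether a generated message appears word for word in the training data. This is only a sketch of that idea, not something `cadeybot` did; `is_regurgitated` is a hypothetical helper and the sample messages are invented:

```python
def is_regurgitated(generated, corpus):
    """Return True if `generated` appears verbatim inside any corpus message.

    Case and runs of whitespace are normalised before comparing.
    """
    needle = " ".join(generated.lower().split())
    if not needle:
        return False  # an empty message matches nothing
    return any(needle in " ".join(msg.lower().split()) for msg in corpus)
```

For example, `is_regurgitated("You  are a nerd", ["lol you are a nerd tbh"])` is true, while a sentence that never occurred in the corpus comes back false.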
|
||||
|
||||
I haven't run `cadeybot` in some time because of this. It gets really boring
|
||||
really fast.
|
||||
|
||||
However, I was looking at some DALL-E generated images and then inspiration
|
||||
struck:
|
||||
|
||||
<xeblog-conv name="Mara" mood="hmm">What if I fed all those tweets into
|
||||
[GPT-2](https://en.wikipedia.org/wiki/GPT-2)?</xeblog-conv>
|
||||
|
||||
So I did that. I made [@robocadey@botsin.space](https://botsin.space/@robocadey)
|
||||
as a fediverse bot that generates new content based on everything I've ever
|
||||
tweeted.
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108219835651549836/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="500"
|
||||
height="245" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
## Data
|
||||
|
||||
The first step of this is getting all of my tweet data out of Twitter. This
|
||||
was a lot easier than I thought. All I had to do was submit a GDPR data request,
|
||||
wait a few days for the cloud to think and then I got a 3 gigabyte zip file full
|
||||
of everything I've ever tweeted. Cool!
|
||||
|
||||
Looking through the dump, I found a 45 megabyte file called `tweets.js`. This
|
||||
looked like it could be important! So I grabbed it and looked at the first few
|
||||
lines:
|
||||
|
||||
```javascript
$ head tweet.js
window.YTD.tweet.part0 = [
  {
    "tweet" : {
      "retweeted" : false,
      "source" : "<a href=\"http://www.bitlbee.org/\" rel=\"nofollow\">BitlBee</a>",
      "entities" : {
        "hashtags" : [ ],
        "symbols" : [ ],
        "user_mentions" : [
          {
```
|
||||
|
||||
So it looks like most of this is really just a giant block of data that's
|
||||
stuffed into JavaScript so that the embedded HTML can show off everything you've
|
||||
ever tweeted. Neat, but I only need the tweet contents. We can strip off the
|
||||
preamble with `sed`, and then grab the first entry out of `tweets.js` with a
|
||||
command like this:
|
||||
|
||||
```json
$ cat tweet.js | sed 's/window.YTD.tweet.part0 = //' | jq .[0]
{
  "tweet": {
    "retweeted": false,
    "source": "<a href=\"http://www.bitlbee.org/\" rel=\"nofollow\">BitlBee</a>",
    "entities": {
      "hashtags": [],
      "symbols": [],
      "user_mentions": [
        {
          "name": "@Lyude@queer.party🌹",
          "screen_name": "_Lyude",
          "indices": [
            "0",
            "7"
          ],
          "id_str": "1568160860",
          "id": "1568160860"
        }
      ],
      "urls": []
    },
    "display_text_range": [
      "0",
      "83"
    ],
    "favorite_count": "0",
    "in_reply_to_status_id_str": "481634023295709185",
    "id_str": "481634194729488386",
    "in_reply_to_user_id": "1568160860",
    "truncated": false,
    "retweet_count": "0",
    "id": "481634194729488386",
    "in_reply_to_status_id": "481634023295709185",
    "created_at": "Wed Jun 25 03:05:15 +0000 2014",
    "favorited": false,
    "full_text": "@_Lyude but how many licks does it take to get to the centre of a tootsie roll pop?",
    "lang": "en",
    "in_reply_to_screen_name": "_Lyude",
    "in_reply_to_user_id_str": "1568160860"
  }
}
```
|
||||
|
||||
It looks like most of what I want is in `.tweet.full_text`, so let's make a
|
||||
giant text file with everything in it:
|
||||
|
||||
```sh
sed 's/window.YTD.tweet.part0 = //' < tweets.js \
  | jq '.[] | [ select(.tweet.retweeted == false) ] | .[].tweet.full_text' \
  | sed -r 's/\s*\.?@[A-Za-z0-9_]+\s*//g' \
  | grep -v 'RT:' \
  | jq --slurp . \
  | jq -r .[] \
  | sed -e 's!http[s]\?://\S*!!g' \
  | sed '/^$/d' \
  > tweets.txt
```
|
||||
|
||||
This does a few things:
|
||||
|
||||
1. Removes that twitter preamble so jq is happy
|
||||
2. Removes all at-mentions from the training data (so the bot doesn't go on a
|
||||
mentioning massacre)
|
||||
3. Removes the "retweet" prefixed tweets from the dataset
|
||||
4. Removes all urls
|
||||
5. Removes all blank lines
|
||||
|
||||
This should hopefully cut out all the irrelevant extra crap and let the machine
|
||||
learning focus on my text, which is what I actually care about.
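
The same cleanup steps can be sketched in Python, which makes each one easier to see on its own. This is an approximation of the shell pipeline above, not a drop-in replacement: `clean_tweets` and the constant names are made up for this example, and whitespace handling around mentions may differ slightly from what `sed` does.

```python
import json
import re

# The JavaScript assignment that wraps the JSON array in the archive file.
PREAMBLE = "window.YTD.tweet.part0 = "
# Same patterns as the sed expressions in the shell pipeline.
MENTION = re.compile(r"\s*\.?@[A-Za-z0-9_]+\s*")
URL = re.compile(r"https?://\S*")


def clean_tweets(raw_js):
    """Yield cleaned lines of tweet text from the archive file's contents."""
    entries = json.loads(raw_js.replace(PREAMBLE, "", 1))
    for entry in entries:
        tweet = entry["tweet"]
        # Keep only tweets explicitly marked as not retweeted.
        if tweet.get("retweeted") is not False:
            continue
        text = MENTION.sub("", tweet["full_text"])
        # "RT @someone: ..." turns into "RT: ..." once mentions are stripped.
        if "RT:" in text:
            continue
        text = URL.sub("", text)
        for line in text.split("\n"):
            if line:  # drop blank lines
                yield line
```

Writing the result out would then be something like `open("tweets.txt", "w").write("\n".join(clean_tweets(open("tweets.js").read())))`.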
|
||||
|
||||
## Getting It Up
|
||||
|
||||
As a prototype, I fed this all into Markov chains. This is boring, but I was
|
||||
able to graft together a few projects to get that prototype up quickly. After
|
||||
some testing, I ended up with things like this:
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108201675365283068/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="500"
|
||||
height="225" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
This was probably the best thing to come out of the Markov chain testing phase;
the rest of it was regurgitating old tweets.
|
||||
|
||||
While I was doing this, I got GPT-2 training thanks to [this iPython
|
||||
notebook](https://colab.research.google.com/github/sarthakmalik/GPT2.Training.Google.Colaboratory/blob/master/Train_a_GPT_2_Text_Generating_Model_w_GPU.ipynb).
|
||||
I uploaded my 1.5 megabyte tweets.txt file and let the big pile of linear
|
||||
algebra mix around for a bit.
|
||||
|
||||
Once it was done, I got a one gigabyte tarball that I extracted into a new
|
||||
folder imaginatively named `gpt2`. Now I had the model, all I needed to do was
|
||||
run it. So I wrote some Python:
|
||||
|
||||
```python
#!/usr/bin/env python3

import gpt_2_simple as gpt2
import json
import os
import socket

sockpath = "/xe/gpt2/checkpoint/server.sock"

# Load the fine-tuned model once at startup; every request reuses the session.
sess = gpt2.start_tf_sess()
gpt2.load_gpt2(sess, run_name='run1')

# Remove a stale socket left behind by a previous run.
if os.path.exists(sockpath):
    os.remove(sockpath)

sock = socket.socket(socket.AF_UNIX)
sock.bind(sockpath)

print("Listening on", sockpath)
sock.listen(1)

try:
    while True:
        connection, client_address = sock.accept()
        try:
            print("generating shitpost")
            # Split the sample into lines, dropping the first and last ones.
            result = gpt2.generate(sess,
                                   length=512,
                                   temperature=0.8,
                                   nsamples=1,
                                   batch_size=1,
                                   return_as_list=True,
                                   top_p=0.9,
                                   )[0].split("\n")[1:][:-1]
            print("shitpost generated")
            connection.sendall(json.dumps(result).encode())
        finally:
            connection.close()
finally:
    # Clean up the socket when the server is interrupted.
    sock.close()
    os.remove(sockpath)
```
|
||||
|
||||
And I used a Dockerfile to set up its environment:
|
||||
|
||||
```Dockerfile
FROM python:3
RUN pip3 install gpt-2-simple
WORKDIR /xe/gpt2
COPY . .
CMD python3 main.py
```
|
||||
|
||||
Then I bind-mounted the premade model into the container and asked it to think
|
||||
up something for me. I got back a list of replies and then I knew it was good to
|
||||
go:
|
||||
|
||||
```json
[
  "oh dear. I don't know if you're the best mannered technologist you've come to expect from such a unique perspective. On the technical side of things, you're a world-class advocate for open source who recently lost an argument over the state of the open source world to bitter enemies like Python.",
  "I also like your approach to DNS! One step at a time. More info here: ",
  "tl;dr: it's a bunch of random IP addresses and the outcome is a JSON file that you fill out in as you go.",
  "datasoftware.reddit.com/r/programmingcirclejerk-memes",
  "datasoftware.reddit.com/r/programmingcirclejerk-memes",
  "datasoftware.reddit.com/r/programmingcirclejerk-memes",
  "datasoftware.reddit.com/r/programmingcirclejerk-memes",
  "Oh dear, can we third-person?",
  "A group of us is a CVE-1918 impact statement",
  "Is that breaking news?",
  "Lol datasom shitposting omg ",
  "I'm gonna be on the list for #Giving is easy, don't look so far ahead ",
  "Oh dear. Welcome to ThePandora: ",
  "I use a lot of shift lol",
  "I thought you were an orca",
  "Foone, my old computer crashed. What happened to your hard drive? ",
  "Yeah I know some of those things should be automated, but this is about experimentation and experimentation is what makes me happy",
  "Am I? ",
  "Experiment is my favorite part of the article",
  "Yes I can, scroll past the how to read words videos",
  "I was able to see into space but I cannot seen into your eyes",
  "This is with a virtual keyboard/MAC address field",
  "Yes but with the keymap \"~M\"",
  "Yes this is a structural change, I am trying to tease things out a bit. I am trying to make it slightly different sounding with the key mapping. I am trying to make it different sounding sounding.",
  "The main thing I am trying to do is make it easy to type backwards. This is going to take experimentation. I am trying to make it slightly different sounding.",
  "Is this vehicle of mercy?",
  "God i forgot "
]
```
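
Asking the server for a post is just a matter of connecting to that Unix socket and reading until it hangs up. Here is a minimal client sketch in Python; `request_post` is a hypothetical name for this example, and the default path simply mirrors `sockpath` from the server script above:

```python
import json
import socket


def request_post(sockpath="/xe/gpt2/checkpoint/server.sock"):
    """Connect to the generator's Unix socket and return its list of lines."""
    with socket.socket(socket.AF_UNIX) as client:
        client.connect(sockpath)
        chunks = []
        while True:
            chunk = client.recv(4096)
            if not chunk:
                break  # the server closes the connection after one reply
            chunks.append(chunk)
    # The reply is a single JSON array of strings.
    return json.loads(b"".join(chunks).decode())
```

The server starts generating as soon as a connection is accepted, so the client does not need to send anything. Expect the call to block for a while as the model thinks.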
|
||||
|
||||
However, this involved using Docker. Docker is decent, but if I have the ability
|
||||
not to, I don't want to use Docker. A friend of mine named `ckie` saw that I was
|
||||
using Docker for this and decided to package the `gpt_2_simple` library [into
|
||||
nixpkgs](https://github.com/NixOS/nixpkgs/pull/170713). They also made it easy
|
||||
for me to pull it into robocadey's environment and then I ripped out Docker,
|
||||
never to return.
|
||||
|
||||
Now the bot could fly. Here was the first thing it posted after it got online
|
||||
with GPT-2 in a proper way:
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108209326706890695/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="500"
|
||||
height="175" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
I can't make this up.
|
||||
|
||||
## Art Gallery
|
||||
|
||||
Here are some of my favorite posts it's made. Most of them could pass as my
|
||||
tweets.
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108209924883002812/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="190" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108212424672000652/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="190" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108215827551779879/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="210" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108218889999336372/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="210" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108218894030986305/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="800"
|
||||
height="250" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
Some of them get somber and are unintentionally a reflection on the state of the
|
||||
world we find ourselves in.
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108219835651549836/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="280" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108218522810351900/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="280" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108217161432474717/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="345" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108216170547691864/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="280" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
Others are silly.
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108217116321450713/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="200" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108218107689729996/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="200" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108215257978801615/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="180" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
I say things like this:
|
||||
|
||||
<iframe src="https://pony.social/@cadey/108218301565484230/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
allowfullscreen="allowfullscreen"></iframe><script
|
||||
src="https://pony.social/embed.js" async="async"></script>
|
||||
|
||||
and it fires back with:
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108218304118515023/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="180" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
This is art. It looks like a robot pretending to be a human and just barely
|
||||
passing at it. This helps you transform your expectations about what human and
|
||||
bot tweets really are.
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108213387014890181/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="200" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
If you want to influence `robocadey` into giving you an artistic experience,
|
||||
mention it on the fediverse by adding `@robocadey@botsin.space` to your posts.
|
||||
It will think a bit and then reply with a brand new post for you.
|
||||
|
||||
## Setting It Up
|
||||
|
||||
You probably don't want to do this, but if you're convinced you do then here are
some things that may help you.
|
||||
|
||||
1. Use the systemd units in `/run` of [github:Xe/x](https://github.com/Xe/x).
|
||||
2. Put your model into a squashfs volume that you mount to the
|
||||
`/var/lib/private/xeserv.robocadey-gpt2/checkpoint` folder.
|
||||
3. Don't expect any warranty, reliability promises or assistance setting this
|
||||
up. I made this for myself, not for others. Its source code is made available
|
||||
to make the code part of that art, but the code is not the art that it makes.
|
||||
|
||||
Good luck.
|
||||
|
||||
---
|
||||
|
||||
I guess what I think about art is that it's not just the medium. It's not just
|
||||
the expression. It's the combination of it all. The expression, the medium, the
|
||||
circumstances, all of that leads into what art really is. I could say that art
|
||||
is the intangible expressions, emotions, and whatever that you experience when
|
||||
looking at things; but that sounds really really pretentious, so let's just say
|
||||
that art doesn't exist. Well it does, but only in the mind of the viewer.
|
||||
|
||||
There's not some objective scale that can say that something is or is not
art. Art is imagined and we are conditioned to believe that things are or are
|
||||
not art based on our upbringing.
|
||||
|
||||
I feel that as a shitposter my goal is to challenge people's "objective sense"
|
||||
of what "can" and "can't" be art by sitting right in the middle of the two and
|
||||
laughing. Projects like `robocadey` are how I make art. It's like what 200 lines
|
||||
of code at most. You could probably recreate most of it based on the contents of
|
||||
this post alone. I wonder if part of the art here comes from the fact that most
|
||||
of this is so iterative yet so novel. Through the iteration process I end up
|
||||
creating novelty.
|
||||
|
||||
You could also say that art is the antidote to the kind of suffering that comes
|
||||
from the fundamental dissatisfactions that people have with everyday life. By
|
||||
that definition, I think that `robocadey` counts as art.
|
||||
|
||||
Either way, it's fun to do these things. I hope that this art can help inspire
|
||||
you to think differently about the world. Even though it's through a chatbot
|
||||
that says things like this:
|
||||
|
||||
<iframe src="https://botsin.space/@robocadey/108215945151030016/embed"
|
||||
class="mastodon-embed" style="max-width: 100%; border: 0" width="400"
|
||||
height="200" allowfullscreen="allowfullscreen"></iframe>
|
||||
|
||||
What is this if not art?
|