VTubing on Linux

Signed-off-by: Xe <me@christine.website>
Cadey Ratio 2022-01-15 14:21:00 -05:00
parent b0e2ed1da8
commit 061d069f72
4 changed files with 384 additions and 0 deletions


@ -0,0 +1,233 @@
---
title: VTubing on Linux
date: 2022-01-15
series: vtuber
tags:
- envtuber
- nixos
- yearofthelinuxdesktop
---
In my [last post](/blog/vtubing-setup-2022-01-13) I went through my VTubing
setup on Windows and all the "generations" of setup that I've done over the last
year. Thanks to the meddling of a certain nerd who is in the chat watching me
write this, I have figured out a way to run this setup on Linux. The ultimate
goal for this phase is to get all of this running on my work laptop so I can use
it as a webcam. However, this post only covers the Linux setup.

## Differences Between OSes

On Windows, this setup is really straightforward. VSeeFace provides a [webcam
driver](https://www.vseeface.icu/#virtual-camera) that makes the output of the
VSeeFace app pretend to be a USB webcam. Google Meet, OBS and the like can then
pick it up as if it were a normal webcam. The overall flow looks like this:

![The webcam connects over USB to VSeeFace, VSeeFace pretends to be a webcam to
OBS and OBS sends video frames to
Twitch.](/static/blog/vtubing-linux/windows.svg)

This doesn't work on Linux at all, though. At the moment there's no real way to
get VSeeFace (a Windows application built with Unity) to pretend to be a webcam
directly.

[Pedantically, you can probably get away with doing this using a combination of
PipeWire, Video4Linux or some other incarnation like that, but the main point
here is that VSeeFace is a Windows app and I don't think it's possible to make
Linux-specific calls like that. Feel free to prove me
wrong.](conversation://Mara/hacker)

So instead, VSeeFace has to output directly to OBS. This makes the flow look
something like this:

![The webcam connects over USB to OpenSeeFace, OpenSeeFace sends UDP packets to
VSeeFace, OBS grabs the VSeeFace window via XComposite, OBS then sends video
frames to Twitch.](/static/blog/vtubing-linux/nixos.svg)

The main difference is that, for some reason, VSeeFace running under Wine can't
capture the webcam directly. This isn't a problem, however, because
[OpenSeeFace](https://github.com/emilianavt/OpenSeeFace) can capture the webcam
and send the face tracking data over UDP directly to VSeeFace instead. OBS can
then grab the VSeeFace window via XComposite like normal.

[There may be a way to do this in Wayland, however we haven't figured that out
yet. Please let me know if you figure out a way to get this working in
Wayland.](conversation://Mara/hacker)

One of the major usability differences here is that OpenSeeFace supports blink
tracking. At the same time, though, my avatar reopens its eyes really slowly
after I blink. There's probably a slider I need to adjust to make this
less...horrible, but overall it does work! Interestingly, I don't get this on
Windows.

[Kieto, his eyes closed!](conversation://Numa/delet)
## Failed Attempts
One of the biggest stumbling blocks was the fact that VSeeFace is distributed as
a 64-bit application. Somehow my naive use of Wine in its default configuration
created a 32-bit Wine prefix (that is when I learned that 32-bit and 64-bit
prefixes both exist and that they are mutually incompatible), which made it
impossible to launch VSeeFace because Wine rejected it for being a 64-bit
program.

I went through several rounds of nuking `~/.wine`, trying to run it again,
setting various weird environment variables and setting build overrides. It was
a catastrophe.

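If you're not sure which kind of prefix you ended up with, here is a minimal sketch of a shell helper for checking. The `wine_prefix_arch` name is hypothetical and not part of Wine; the helper assumes the prefix's `system.reg` contains an `#arch=win32` or `#arch=win64` line, which is what recent Wine versions write when they create a prefix.

```shell
# Hypothetical helper (not part of Wine): print the architecture of a Wine
# prefix, e.g. "win32" or "win64". Assumes the prefix's system.reg has an
# "#arch=..." line, as recent Wine versions write when creating a prefix.
wine_prefix_arch() {
  # Default to $WINEPREFIX, falling back to ~/.wine like Wine itself does.
  local prefix="${1:-${WINEPREFIX:-$HOME/.wine}}"
  local reg="$prefix/system.reg"
  if [ ! -f "$reg" ]; then
    echo "no Wine prefix found at $prefix" >&2
    return 1
  fi
  # Print only the value after "#arch=" from the first matching line.
  sed -n 's/^#arch=//p' "$reg" | head -n 1
}
```

Running `wine_prefix_arch ~/.wine` should print `win64` for a prefix that can run VSeeFace. If it prints `win32`, moving that prefix aside and letting `wine64` create a fresh one (optionally with `WINEARCH=win64` set to be explicit) is probably less painful than the trial and error described above.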
Other people have reported that you need to use
[Lutris](https://dumbotaku.com/info/401) to install and use VSeeFace on Linux.
This did not work. This did not work at all. Trying to do it this way on a NixOS
machine was an absolute waste of my time, and it was demoralizing and
frustrating.

[I think it has to do with the fact that Lutris really really really really
wants to have its own special snowflake vendored copies of Wine/Proton and it
will fight you if you try to have your way otherwise.](conversation://Cadey/coffee)

Then I realized that I was doing all this on my work laptop. This laptop is
fairly standard, but also incredibly cursed in its own unique and fun ways. It
shipped with Windows, but also with all the annoying "screw you for wanting to
use Linux" settings turned on. Getting to the point where a NixOS ISO would boot
was an exercise in tedium and randomly flipping settings on and off.

So at the request of the aforementioned meddler, I tried running VSeeFace on my
gaming tower.

It worked first try.

[AAAAAA](conversation://Cadey/coffee)
## How To Make This Creative Abomination Come To Fruition on NixOS

The easiest part of getting all this working is downloading VSeeFace. Just
[download the .zip](https://www.vseeface.icu/) from the main page and extract it
into your Downloads folder.

Then you need to add the following to your `configuration.nix` file:
```nix
# ...
environment.systemPackages = with pkgs; [
  # vseeface
  wine64
  winetricks
];
# ...
```
Rebuild your system (for example with `nixos-rebuild switch`) and Wine will be
in your `$PATH` as `wine64`. Now you need to install the Arial font using
winetricks:
```console
$ env WINE=wine64 winetricks arial
```
This will take a moment to create your Wine prefix in `~/.wine` and populate it
with the needed fonts. VSeeFace uses the Arial font everywhere in the UI, so
this is not an optional step.
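Since Arial is mandatory, it can be worth checking that winetricks actually put it in place before launching VSeeFace. This is a small sketch with a hypothetical helper name (`has_arial`); it assumes the fonts end up in `drive_c/windows/Fonts` inside the prefix and matches file names case-insensitively, since their exact capitalization may vary.

```shell
# Hypothetical check: succeed if the Wine prefix contains Arial font files.
# Assumes winetricks copied them into drive_c/windows/Fonts in the prefix;
# the match is case-insensitive because file name case may vary.
has_arial() {
  local prefix="${1:-${WINEPREFIX:-$HOME/.wine}}"
  local fonts="$prefix/drive_c/windows/Fonts"
  [ -d "$fonts" ] || return 1
  # Succeed only if at least one arial*.ttf file is present.
  [ -n "$(find "$fonts" -maxdepth 1 -iname 'arial*.ttf' -print -quit)" ]
}
```

For example, `has_arial ~/.wine && echo ok` prints `ok` if at least one `arial*.ttf` file is present.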
Now clone OpenSeeFace somewhere:
```console
$ git clone https://github.com/emilianavt/OpenSeeFace ~/tmp/OpenSeeFace
```
Then copy this `shell.nix` file into the root of the git repo:
```nix
{ pkgs ? import <nixpkgs> { } }:

(pkgs.buildFHSUserEnv {
  name = "pipzone";
  targetPkgs = pkgs:
    (with pkgs; [
      python39
      python39Packages.pip
      python39Packages.virtualenv
      libGL
      libGLU
      glib
    ]);
  runScript = "bash";
}).env
```
Then run `nix-shell` to activate an environment that will pretend to be a normal
Linux system and paste in these commands to set up the Python environment:
```
python -m venv .venv
source .venv/bin/activate
pip3 install onnxruntime opencv-python pillow numpy
```
This installs the dependencies into a Python virtual environment in `.venv`.

[We can't really use a normal Nix packaging flow here because <a
href="https://github.com/jonringer/nixpkgs/commit/bc2b132f98b48220fa5ec148aa2ba170aeb9a891">onnxruntime
was removed from nixpkgs</a>. This is okay, though; we can hack around
this!](conversation://Mara/hacker)

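Before starting the tracker, a quick import check inside the activated venv can save you a confusing traceback later. This is a sketch with a hypothetical helper name; note that the import names differ from the pip package names (`opencv-python` is imported as `cv2` and `pillow` as `PIL`).

```shell
# Hypothetical helper: print each Python module that fails to import and
# return non-zero if any did. Run it with the .venv activated. The Python
# interpreter can be overridden via $PYTHON (defaults to "python").
# Import names differ from pip names: opencv-python -> cv2, pillow -> PIL.
missing_py_modules() {
  local py="${PYTHON:-python}" mod status=0
  for mod in "$@"; do
    if ! "$py" -c "import $mod" 2>/dev/null; then
      echo "missing: $mod"
      status=1
    fi
  done
  return "$status"
}
```

Inside the `nix-shell` with `.venv` activated, `missing_py_modules onnxruntime cv2 PIL numpy` should print nothing and exit successfully if everything is installed.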

Then you can run OpenSeeFace:
```console
$ python facetracker.py -c 0 -W 1280 -H 720 --discard-after 0 --scan-every 0 --no-3d-adapt 1 --max-feature-updates 900
```
This will show many lines that look something like this:
```
Took 20.50ms (detect: 0.00ms, crop: 0.82ms, track: 17.70ms, 3D points: 1.93ms)
Confidence[0]: 0.9148 / 3D fitting error: 12.7974 / Eyes: O, O
```
This output is a dump of the face tracker's per-frame state: timings,
confidence, 3D fitting error and eye state. The tracking data itself goes to
VSeeFace over UDP, and VSeeFace turns it into movement instructions for your
waifu.
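If the tracking looks jittery, the confidence values in that output are a useful signal. Here is a sketch of a filter (hypothetical name, arbitrary default threshold of 0.8) that assumes lines look exactly like the `Confidence[0]: ...` sample above and prints only frames below the threshold.

```shell
# Hypothetical filter: read facetracker.py output on stdin and print the
# confidence of every frame below a threshold. Assumes lines shaped like
#   Confidence[0]: 0.9148 / 3D fitting error: 12.7974 / Eyes: O, O
# The default threshold of 0.8 is arbitrary, chosen only for illustration.
low_confidence() {
  local threshold="${1:-0.8}"
  awk -v t="$threshold" '
    /^Confidence\[/ {
      # With default whitespace splitting, $2 is the number after "Confidence[N]:".
      if ($2 + 0 < t + 0) print "low confidence:", $2
    }'
}
```

For example: `python -u facetracker.py -c 0 -W 1280 -H 720 --discard-after 0 --scan-every 0 --no-3d-adapt 1 --max-feature-updates 900 | low_confidence 0.7`. The `-u` flag turns off Python's output buffering, which otherwise tends to delay lines when stdout is a pipe rather than a terminal.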

Finally, add a Window Capture (Xcomposite) source in OBS pointed at the VSeeFace
window, and stream that to Twitch as usual.
## Nice Wrapper Script
[All these instructions are lame, I just wanna get it done
fast!](conversation://Numa/delet)

You can get this all running with a super hacky script like this!
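```shell
#!/usr/bin/env nix-shell
#! nix-shell -p wget -p git -p unzip -p winetricks -p wine64 -i bash
set -e # stop at the first command that fails
mkdir -p ~/tmp/VTubing
cd ~/tmp/VTubing
# Download and unpack VSeeFace.
wget https://github.com/emilianavt/VSeeFaceReleases/releases/download/v1.13.37b/VSeeFace-v1.13.37b.zip
unzip VSeeFace-v1.13.37b.zip
# Set up the Wine prefix with Arial, which the VSeeFace UI needs.
WINE=wine64 winetricks arial
# Fetch OpenSeeFace and a shell.nix for it.
git clone https://github.com/emilianavt/OpenSeeFace
wget -O OpenSeeFace/shell.nix https://gist.githubusercontent.com/Xe/d739fd94c81c1690645c8f4607058488/raw/100c8c5e43ed8dc4b19b890173234ff28b0f9c7e/shell.nix
# Run VSeeFace in the background...
(cd VSeeFace && wine64 VSeeFace.exe) &
# ...and drop into the OpenSeeFace environment in the foreground, where you
# can set up the venv and start facetracker.py as described above.
(cd OpenSeeFace && nix-shell)
wait
```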
This will get you everything set up and ready to go in a flash! No warranty.

[You should really do this automagically with Nix.](conversation://Mara/hmm)

[Yes, I should, but that is for another day. This day is not today.](conversation://Cadey/coffee)

---

I'm really glad that I have this working on Linux. I've felt really bad about
being known as a Linux enthusiast while all of my streams visibly use Windows.
That said, it's totally valid to start out on Windows because it's easier. This
stuff is baroque and complicated. Hopefully this makes the path a bit clearer if
you want to do VTubing on Linux like I am.

This article was written live on Twitch! Check out the stream VOD
[here](https://www.twitch.tv/videos/1264594247), and in a few days it will be live on YouTube
[here](https://youtu.be/cSR1ZA012aQ). Follow [my channel](https://twitch.tv/princessxen)
and get notified when I go live with more writing.


@ -1,6 +1,7 @@
---
title: How I VTuber
date: 2022-01-13
series: vtuber
tags:
- ENVtuber
---


@ -0,0 +1,75 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.40.1 (20161225.0304)
-->
<!-- Title: G Pages: 1 -->
<svg width="316pt" height="318pt"
viewBox="0.00 0.00 316.17 317.60" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 313.6)">
<title>G</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-313.6 312.174,-313.6 312.174,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster_1</title>
<polygon fill="#d3d3d3" stroke="#d3d3d3" points="8,-8 8,-230.8 230,-230.8 230,-8 8,-8"/>
<text text-anchor="middle" x="119" y="-214.2" font-family="Times,serif" font-size="14.00" fill="#000000">NixOS</text>
</g>
<!-- webcam -->
<g id="node1" class="node">
<title>webcam</title>
<ellipse fill="none" stroke="#000000" cx="80" cy="-291.6" rx="43.4183" ry="18"/>
<text text-anchor="middle" x="80" y="-287.4" font-family="Times,serif" font-size="14.00" fill="#000000">webcam</text>
</g>
<!-- losf -->
<g id="node3" class="node">
<title>losf</title>
<ellipse fill="#ffffff" stroke="#ffffff" cx="80" cy="-180" rx="64.2436" ry="18"/>
<text text-anchor="middle" x="80" y="-175.8" font-family="Times,serif" font-size="14.00" fill="#000000">OpenSeeFace</text>
</g>
<!-- webcam&#45;&gt;losf -->
<g id="edge1" class="edge">
<title>webcam&#45;&gt;losf</title>
<path fill="none" stroke="#000000" d="M80,-273.1715C80,-255.539 80,-228.6924 80,-208.3391"/>
<polygon fill="#000000" stroke="#000000" points="83.5001,-208.0855 80,-198.0856 76.5001,-208.0856 83.5001,-208.0855"/>
<text text-anchor="middle" x="93.6129" y="-243" font-family="Times,serif" font-size="14.00" fill="#000000">USB</text>
</g>
<!-- twitch -->
<g id="node2" class="node">
<title>twitch</title>
<ellipse fill="none" stroke="#000000" cx="273" cy="-107" rx="35.3489" ry="18"/>
<text text-anchor="middle" x="273" y="-102.8" font-family="Times,serif" font-size="14.00" fill="#000000">twitch</text>
</g>
<!-- lvsf -->
<g id="node4" class="node">
<title>lvsf</title>
<ellipse fill="#ffffff" stroke="#ffffff" cx="114" cy="-34" rx="50.3567" ry="18"/>
<text text-anchor="middle" x="114" y="-29.8" font-family="Times,serif" font-size="14.00" fill="#000000">VSeeFace</text>
</g>
<!-- losf&#45;&gt;lvsf -->
<g id="edge2" class="edge">
<title>losf&#45;&gt;lvsf</title>
<path fill="none" stroke="#000000" d="M82.7781,-161.739C85.736,-143.2703 90.8362,-113.9646 97.0042,-89 99.2096,-80.0738 102.0249,-70.4601 104.7268,-61.8057"/>
<polygon fill="#000000" stroke="#000000" points="108.0909,-62.7767 107.8034,-52.1857 101.4236,-60.6444 108.0909,-62.7767"/>
<text text-anchor="middle" x="110.9979" y="-102.8" font-family="Times,serif" font-size="14.00" fill="#000000">UDP</text>
</g>
<!-- lobs -->
<g id="node5" class="node">
<title>lobs</title>
<ellipse fill="#ffffff" stroke="#ffffff" cx="192" cy="-180" rx="29.6339" ry="18"/>
<text text-anchor="middle" x="192" y="-175.8" font-family="Times,serif" font-size="14.00" fill="#000000">OBS</text>
</g>
<!-- lobs&#45;&gt;twitch -->
<g id="edge4" class="edge">
<title>lobs&#45;&gt;twitch</title>
<path fill="none" stroke="#000000" d="M208.7833,-164.8744C220.0653,-154.7066 235.1385,-141.1221 247.8461,-129.6695"/>
<polygon fill="#000000" stroke="#000000" points="250.3124,-132.1585 255.3977,-122.8638 245.6261,-126.9586 250.3124,-132.1585"/>
</g>
<!-- lobs&#45;&gt;lvsf -->
<g id="edge3" class="edge">
<title>lobs&#45;&gt;lvsf</title>
<path fill="none" stroke="#000000" d="M180.9095,-162.9172C174.0943,-152.1868 165.337,-137.9693 158.2258,-125 146.7004,-103.9802 134.8947,-79.5101 126.383,-61.2421"/>
<polygon fill="#000000" stroke="#000000" points="129.5067,-59.658 122.1336,-52.0495 123.1527,-62.5952 129.5067,-59.658"/>
<text text-anchor="middle" x="193.3871" y="-102.8" font-family="Times,serif" font-size="14.00" fill="#000000">XComposite</text>
</g>
</g>
</svg>


@ -0,0 +1,75 @@
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN"
"http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<!-- Generated by graphviz version 2.40.1 (20161225.0304)
-->
<!-- Title: G Pages: 1 -->
<svg width="288pt" height="340pt"
viewBox="0.00 0.00 288.17 340.43" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink">
<g id="graph0" class="graph" transform="scale(1 1) rotate(0) translate(4 336.4313)">
<title>G</title>
<polygon fill="#ffffff" stroke="transparent" points="-4,4 -4,-336.4313 284.174,-336.4313 284.174,4 -4,4"/>
<g id="clust1" class="cluster">
<title>cluster_0</title>
<polygon fill="#d3d3d3" stroke="#d3d3d3" points="8,-8 8,-253.6313 202,-253.6313 202,-8 8,-8"/>
<text text-anchor="middle" x="105" y="-237.0313" font-family="Times,serif" font-size="14.00" fill="#000000">windows</text>
</g>
<!-- webcam -->
<g id="node1" class="node">
<title>webcam</title>
<ellipse fill="none" stroke="#000000" cx="66" cy="-314.4313" rx="43.4183" ry="18"/>
<text text-anchor="middle" x="66" y="-310.2313" font-family="Times,serif" font-size="14.00" fill="#000000">webcam</text>
</g>
<!-- wvsf -->
<g id="node3" class="node">
<title>wvsf</title>
<ellipse fill="#ffffff" stroke="#ffffff" cx="66" cy="-202.8313" rx="50.3567" ry="18"/>
<text text-anchor="middle" x="66" y="-198.6313" font-family="Times,serif" font-size="14.00" fill="#000000">VSeeFace</text>
</g>
<!-- webcam&#45;&gt;wvsf -->
<g id="edge1" class="edge">
<title>webcam&#45;&gt;wvsf</title>
<path fill="none" stroke="#000000" d="M66,-296.0028C66,-278.3703 66,-251.5237 66,-231.1704"/>
<polygon fill="#000000" stroke="#000000" points="69.5001,-230.9168 66,-220.9168 62.5001,-230.9169 69.5001,-230.9168"/>
<text text-anchor="middle" x="79.6129" y="-265.8313" font-family="Times,serif" font-size="14.00" fill="#000000">USB</text>
</g>
<!-- twitch -->
<g id="node2" class="node">
<title>twitch</title>
<ellipse fill="none" stroke="#000000" cx="245" cy="-129.8313" rx="35.3489" ry="18"/>
<text text-anchor="middle" x="245" y="-125.6313" font-family="Times,serif" font-size="14.00" fill="#000000">twitch</text>
</g>
<!-- wcd -->
<g id="node4" class="node">
<title>wcd</title>
<ellipse fill="#ffffff" stroke="#ffffff" cx="104" cy="-45.4156" rx="46.4831" ry="29.3315"/>
<text text-anchor="middle" x="104" y="-49.6156" font-family="Times,serif" font-size="14.00" fill="#000000">Webcam</text>
<text text-anchor="middle" x="104" y="-32.8156" font-family="Times,serif" font-size="14.00" fill="#000000">Driver</text>
</g>
<!-- wvsf&#45;&gt;wcd -->
<g id="edge2" class="edge">
<title>wvsf&#45;&gt;wcd</title>
<path fill="none" stroke="#000000" d="M70.3591,-184.7737C76.2169,-160.5076 86.7858,-116.7257 94.5186,-84.6925"/>
<polygon fill="#000000" stroke="#000000" points="97.9756,-85.2867 96.92,-74.7446 91.1711,-83.644 97.9756,-85.2867"/>
</g>
<!-- wobs -->
<g id="node5" class="node">
<title>wobs</title>
<ellipse fill="#ffffff" stroke="#ffffff" cx="164" cy="-202.8313" rx="29.6339" ry="18"/>
<text text-anchor="middle" x="164" y="-198.6313" font-family="Times,serif" font-size="14.00" fill="#000000">OBS</text>
</g>
<!-- wobs&#45;&gt;twitch -->
<g id="edge4" class="edge">
<title>wobs&#45;&gt;twitch</title>
<path fill="none" stroke="#000000" d="M180.7833,-187.7056C192.0653,-177.5379 207.1385,-163.9534 219.8461,-152.5008"/>
<polygon fill="#000000" stroke="#000000" points="222.3124,-154.9898 227.3977,-145.6951 217.6261,-149.7899 222.3124,-154.9898"/>
</g>
<!-- wobs&#45;&gt;wcd -->
<g id="edge3" class="edge">
<title>wobs&#45;&gt;wcd</title>
<path fill="none" stroke="#000000" d="M157.2339,-185.0797C147.9295,-160.6688 130.9424,-116.1015 118.655,-83.8645"/>
<polygon fill="#000000" stroke="#000000" points="121.8385,-82.3893 115.0063,-74.2917 115.2975,-84.8825 121.8385,-82.3893"/>
<text text-anchor="middle" x="166.8745" y="-125.6313" font-family="Times,serif" font-size="14.00" fill="#000000">Webcam</text>
</g>
</g>
</svg>
