xesite/blog/fun-with-redirection-2021-0...

383 lines
14 KiB
Markdown

---
title: Fun with Redirection
date: 2021-09-22
author: Twi
tags:
- shell
- redirection
- osdev
---
When you're hacking in the shell or in a script, sometimes you want to change
how the output of a command is routed. Today I'm gonna cover common shell
redirection tips and tricks that I use every day at work and how it all works
under the hood.
Let's say you're trying to capture the output of a command to a file, such as
`uname -av`:
```console
$ uname -av
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
```
You could copy that to the clipboard and paste it into a file, but there is a
better way thanks to the `>` operator:
```console
$ uname -av > uname.txt
$ cat uname.txt
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
```
Let's say you want to run this on a few machines and put all of the output into
`uname.txt`. You could write a shell script loop like this:
```sh
# make sure the file doesn't already exist
rm -f uname.txt
for host in shachi chrysalis kos-mos ontos pneuma
do
ssh $host -- uname -av >> uname.txt
done
```
Then `uname.txt` should look like this:
```
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
Linux chrysalis 5.10.63 #1-NixOS SMP Wed Sep 8 06:49:02 UTC 2021 x86_64 GNU/Linux
Linux kos-mos 5.10.45 #1-NixOS SMP Fri Jun 18 08:00:06 UTC 2021 x86_64 GNU/Linux
Linux ontos 5.10.52 #1-NixOS SMP Tue Jul 20 14:05:59 UTC 2021 x86_64 GNU/Linux
Linux pneuma 5.10.57 #1-NixOS SMP Sun Aug 8 07:05:24 UTC 2021 x86_64 GNU/Linux
```
Now let's say you want to extract all of the hostnames from that `uname.txt`.
The pattern of the file seems to specify that fields are separated by spaces and
the hostname seems to be the second space-separated field in each line. You can
use the `cut` command to select that small subset from each line, and you can
feed the `cut` command's standard input using the `<` operator:
```console
$ cut -d ' ' -f 2 < uname.txt
shachi
chrysalis
kos-mos
ontos
pneuma
```
[It's worth noting that a lot of these core CLI utilities are built on the idea
that they are _filters_, or things that take one infinite stream of text in on
one end and then return another stream of text out the other
end. This is done through a channel called "standard input/output", where
standard input refers to input to the command and standard output refers to the
output of the command.](conversation://Mara/hacker)
[That's a great metaphor, let's build onto it using the `|` (pipe)
operator. The pipe operator lets you pipe the standard output of one command to
the standard input of another.](conversation://Cadey/enby)
[You mentioned that you can pass files as input and output for commands, does
this mean that standard input and standard output are
files?](conversation://Mara/happy)
[Precisely! They are just files that are automatically open for every process.
Usually commands will output to standard out and some will also accept input via
standard in.](conversation://Cadey/enby)
[Doesn't that have some level of overhead though? Isn't it expensive to spin up
a whole heckin' `cat` process for that?](conversation://Mara/hmm)
[Not on any decent system made in the last 20 years. This may have some impact
on Windows (because they have core architectural mistakes that make processes
take up to 100 milliseconds to spin up), but this is about Unix/Linux. I think
these should work on Windows too if you use Cygwin, but if you're using WSL you
shouldn't have any real issues there](conversation://Cadey/coffee)
Let's say we want to rewrite that `cut` command above to use pipes. You could
write it like this:
```sh
cat uname.txt | cut -d ' ' -f 2
```
[The mnemonic we use for remembering the `cut` command is that fields are
separated by the `d`elimiter and you cut out the nth
`f`ield/s.](conversation://Mara/hacker)
This will get you the exact same output:
```console
$ cat uname.txt | cut -d ' ' -f 2
shachi
chrysalis
kos-mos
ontos
pneuma
```
Personally I prefer writing shell pipelines like that as it makes it a bit
easier to tack on more specific selectors or operations as you go along. For
example, if you wanted to sort them you could pipe the result to `sort`:
```console
$ cat uname.txt | cut -d ' ' -f 2 | sort
chrysalis
kos-mos
ontos
pneuma
shachi
```
This lets you gradually build up a shell pipeline as you drill down to the data
you want in the format you want.
[I wanted to save this compiler error to a file but it didn't work. I tried
doing this:](conversation://Mara/hmm)
```console
$ rustc foo.rs > foo.log
```
But the output printed to the screen instead of the file:
```console
$ rustc foo.rs > foo.log
error: expected one of `!` or `::`, found `main`
--> foo.rs:1:5
|
1 | fun main() {}
| ^^^^ expected one of `!` or `::`
error: aborting due to previous error
$ cat foo.log
$
```
This happens because there are actually _two_ output streams per program. There
is the standard out stream and there is also a standard error stream. The reason
that standard error exists is so that you can see if any errors have happened if
you redirect standard out.
Sometimes standard out may not be a stream of text, say you have a compressed
file you want to analyze and there's an issue with the decompression. If the
decompressor wrote its errors to the standard output stream, it could confuse or
corrupt your analysis.
However, we can redirect standard error in particular by modifying how we
redirect to the file:
```console
$ rustc foo.rs 2> foo.log
$ cat foo.log
error: expected one of `!` or `::`, found `main`
--> foo.rs:1:5
|
1 | fun main() {}
| ^^^^ expected one of `!` or `::`
error: aborting due to previous error
```
[Where did the `2` come from?](conversation://Mara/wat)
So I mentioned earlier that redirection modifies the standard input and output
of programs. This is not entirely true, but it was a convenient half-truth to
help build this part of the explanation.
For every process on a Unix-like system (such as Linux and macOS), the kernel
stores a list of active file-like objects. This includes real files on the
filesystem, pipes between processes, network sockets, and more. When a program
reads or writes a file, they tell the kernel which file they want to use by
giving it a number index into that list, starting at zero. Standard in/out/error
are just the conventional names for the first three open files in the list, like
this:
| File Descriptor | Purpose |
| :------ | :------- |
| 0 | Standard input |
| 1 | Standard output |
| 2 | Standard error |
Shell redirection simply changes which files are in that list of open files when
the program starts running.
That is why you use a `2` there, because you are telling the shell to change
file descriptor number `2` of the `rustc` process to point to the filesystem
file `foo.log`, which in turn makes the standard error of `rustc` get written to
that file for you.
In turn, this also means that `cat foo.txt > foo2.txt` is actually a shortcut
for saying `cat foo.txt 1> foo2.txt`, but the `1` can be omitted there because
standard out is usually the "default" output that most of these kind of
pipelines cares about.
[How would I get both standard output and standard error in the same
file?](conversation://Mara/hmm)
The cool part about the `>` operator is that it doesn't just stop with output to
files on the desk, you can actually have one file descriptor get pointed to
another. Let's say you have a need for both standard out and standard error to
go to the same file. You can do this with a command like this:
```
$ rustc foo.rs > foo.log 2>&1
```
This tells the shell to point standard out to `foo.log`, and then standard
error to standard out (which is now `foo.log`). There's a footgun here though;
the order of the redirects matters. Consider the following:
```
$ rustc foo.rs 2>&1 > foo.log
error: expected one of `!` or `::`, found `main`
--> foo.rs:1:5
|
1 | fun main() {}
| ^^^^ expected one of `!` or `::`
error: aborting due to previous error
$ cat foo.log
$ # foo.log is empty, why???
```
We wanted to redirect stderr to `foo.log`, but that didn't happen. Why? Well,
the shell considers our redirects one at a time from left to right. When the
shell sees `2>&1`, it hasn't considered `> foo.log` yet, so standard out (`1`)
is still our terminal. It dutifully redirects stderr to the terminal, which is
where it was already going anyway. Then it sees `1 > foo.log`, so it redirects
standard out to `foo.log`. That's the end of it though. It doesn't
retroactively redirect standard error to match the new standard out, so our
errors get dumped to our terminal instead of the file.
Confusing right? Lucky for us, there's a short form that redirects both at the
same time, making this mistake impossible:
```
$ rustc foo.rs &> foo.log
```
This will put standard out and standard error to `foo.log` the same way that
`> foo.log 2>&1` will.
[Will that work in every shell?](conversation://Mara/hmm)
[It's a bourne shell (`bash`) extension, but I've tested it in `zsh` and `fish`.
You can also do `&|` to pipe both standard out and standard error at the same
time in the same way you'd do `2>&1 | whatever`.](conversation://Cadey/enby)
You can also use this with `>>`:
```
$ rustc foo.rs &>> foo.log
$ cat foo.log
error: expected one of `!` or `::`, found `main`
--> foo.rs:1:5
|
1 | fun main() {}
| ^^^^ expected one of `!` or `::
error: aborting due to previous error
error: expected one of `!` or `::`, found `main`
--> foo.rs:1:5
|
1 | fun main() {}
| ^^^^ expected one of `!` or `::`
error: aborting due to previous error
```
[How do I redirect standard in to a file?](conversation://Mara/hmm)
Well, you don't. Standard in is an input, so you can change where it comes
_from_, not where it goes.
But, maybe you want to make a copy of a program's input and send it somewhere
else. There is a way to do _that_ using a command called `tee`. `tee` copies
its standard input to standard output, but it also writes a second copy to a
file. For example:
```console
$ dmesg | tee dmesg.txt | grep 'msedge'
[ 70.585463] traps: msedge[4715] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[ 70.702544] traps: msedge[4745] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[ 70.806296] traps: msedge[4781] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[ 70.918095] traps: msedge[4889] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[ 71.031938] traps: msedge[4926] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[ 71.138974] traps: msedge[4935] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[ 1169.163603] traps: msedge[35719] trap invalid opcode ip:556a93951c4c sp:7ffc533f35c0 error:0 in msedge[556a8ec26000+952d000]
[ 1213.301722] traps: msedge[36054] trap invalid opcode ip:55a245960c4c sp:7ffe6d169b40 error:0 in msedge[55a240c35000+952d000]
[10963.234459] traps: msedge[104732] trap invalid opcode ip:55fdb864fc4c sp:7ffc996dfee0 error:0 in msedge[55fdb3924000+952d000]
```
This would put the output of the `dmesg` command (read from kernel logs) into
`dmesg.txt`, as well as sending it into the grep command. You might want to do
this when debugging long command pipelines to see exactly what is going into a
program that isn't doing what you expect.
Redirections also work in scripts too. You can also set "default" redirects for
every command in a script using the `exec` command:
```sh
exec > out.log 2> error.log
ls
rustc foo.rs
```
This will have the file listing from `ls` written to `out.log` and any errors
from `rustc` written to `error.log`.
A lot of other shell tricks and fun is built on top of these fundamentals. For
example you can take a folder, zip it up and then unzip it over on another
machine using a command like this:
```
$ tar cz ./blog | ssh pneuma tar xz -C ~/code/christine.website/blog
```
This will run `tar` to create a compressed copy of the `./blog` folder and then
pipe that to tar on another computer to extract that into
`~/code/christine.website/blog`. It's just pipes and redirection all the way
down! Deep inside `ssh` it's really just piping output of commands back and
forth over an encrypted network socket. Connecting to an IRC server is just
piping in and out data to the chat server, even more so if you use TLS to
connect there. In a way you can model just about everything in Unix with pipes
and file descriptors because that is the cornerstone of its design: Everything
is a file.
[This doesn't mean it's literally a file on the disk, it means you can _interact
with_ just about everything using the same system interface as you do with
files. Even things like hard disks and video cards.](conversation://Mara/hacker)
Here's a fun thing to do. Using [`curl`](https://curl.se/) to read the contents
of a URL and [`jq`](https://stedolan.github.io/jq/) to select out bits from a
JSON stream, you can make a script that lets you read the most recent title from
my blog's [JSONFeed](/blog.json):
```sh
#!/usr/bin/env bash
# xeblog-post.sh
curl -s https://xeiaso.net/blog.json | jq -r '.items[0] | "\(.title) \(.url)"'
```
At the time of writing this post, here is the output I get from this command:
```
$ ./xeblog-post.sh
Anbernic RG280M Review https://xeiaso.net/blog/rg280m-review
```
What else could you do with pipes and redirection? The cloud's the limit!
---
Thanks to violet spark, cadence, and AstroSnail for looking over this post and
fact-checking as well as helping mend some of the brain dump and awkward
wording into more polished sentences.