---
title: Fun with Redirection
date: 2021-09-22
author: Twi
tags:
- shell
- redirection
- osdev
---

When you're hacking in the shell or in a script, sometimes you want to change
how the output of a command is routed. Today I'm gonna cover common shell
redirection tips and tricks that I use every day at work and how it all works
under the hood.

Let's say you're trying to capture the output of a command to a file, such as
`uname -av`:

```console
$ uname -av
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
```
You could copy that to the clipboard and paste it into a file, but there is a
better way thanks to the `>` operator:

```console
$ uname -av > uname.txt
$ cat uname.txt
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
```
Let's say you want to run this on a few machines and put all of the output into
`uname.txt`. You could write a shell script loop like this:

```sh
# make sure the file doesn't already exist
rm -f uname.txt

for host in shachi chrysalis kos-mos ontos pneuma
do
  # >> appends to uname.txt instead of overwriting it on every iteration
  ssh "$host" -- uname -av >> uname.txt
done
```
Then `uname.txt` should look like this:

```
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
Linux chrysalis 5.10.63 #1-NixOS SMP Wed Sep 8 06:49:02 UTC 2021 x86_64 GNU/Linux
Linux kos-mos 5.10.45 #1-NixOS SMP Fri Jun 18 08:00:06 UTC 2021 x86_64 GNU/Linux
Linux ontos 5.10.52 #1-NixOS SMP Tue Jul 20 14:05:59 UTC 2021 x86_64 GNU/Linux
Linux pneuma 5.10.57 #1-NixOS SMP Sun Aug 8 07:05:24 UTC 2021 x86_64 GNU/Linux
```
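The loop above leans on the difference between `>` and `>>`: `>` empties the
file before writing, while `>>` adds to whatever is already there. Here is a
minimal sketch of that difference, using a throwaway file name (`demo.txt`,
picked just for this example):

```sh
# demo.txt is a hypothetical scratch file, not something from the post
echo "first" > demo.txt     # creates demo.txt (or empties it if it exists)
echo "second" > demo.txt    # overwrites, so "first" is gone
echo "third" >> demo.txt    # appends a new line after "second"

cat demo.txt                # shows "second" and then "third"
```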
Now let's say you want to extract all of the hostnames from that `uname.txt`.
The fields in each line are separated by spaces, and the hostname is the second
space-separated field. You can use the `cut` command to select that field from
each line, and you can feed the `cut` command's standard input using the `<`
operator:

```console
$ cut -d ' ' -f 2 < uname.txt
shachi
chrysalis
kos-mos
ontos
pneuma
```
[It's worth noting that a lot of these core CLI utilities are built on the idea
that they are _filters_, or things that take one stream of text in on one end
and then return another stream of text out the other end. This is done through
a channel called "standard input/output", where standard input refers to input
to the command and standard output refers to the output of the
command.](conversation://Mara/hacker)

[That's a great metaphor, let's build onto it using the `|` (pipe)
operator. The pipe operator lets you pipe the standard output of one command to
the standard input of another.](conversation://Cadey/enby)

[You mentioned that you can pass files as input and output for commands, does
this mean that standard input and standard output are
files?](conversation://Mara/happy)

[Precisely! They are just files that are automatically open for every process.
Usually commands will output to standard out and some will also accept input via
standard in.](conversation://Cadey/enby)

[Doesn't that have some level of overhead though? Isn't it expensive to spin up
a whole heckin' `cat` process for that?](conversation://Mara/hmm)

[Not on any decent system made in the last 20 years. This may have some impact
on Windows (because they have core architectural mistakes that make processes
take up to 100 milliseconds to spin up), but this is about Unix/Linux. I think
these should work on Windows too if you use Cygwin, but if you're using WSL you
shouldn't have any real issues there.](conversation://Cadey/coffee)

Let's say we want to rewrite that `cut` command above to use pipes. You could
write it like this:

```sh
cat uname.txt | cut -d ' ' -f 2
```
[The mnemonic we use for remembering the `cut` command is that fields are
separated by the `d`elimiter and you cut out the nth
`f`ield/s.](conversation://Mara/hacker)

This will get you the exact same output:

```console
$ cat uname.txt | cut -d ' ' -f 2
shachi
chrysalis
kos-mos
ontos
pneuma
```
Personally I prefer writing shell pipelines like that as it makes it a bit
easier to tack on more specific selectors or operations as you go along. For
example, if you wanted to sort them you could pipe the result to `sort`:

```console
$ cat uname.txt | cut -d ' ' -f 2 | sort
chrysalis
kos-mos
ontos
pneuma
shachi
```
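As a sketch of tacking on more stages, here is the same idea run against a small
stand-in for `uname.txt`. The sample lines and the file names `sample.txt` and
`counts.txt` are made up for this example. It pulls out the third field (the
kernel version) and counts how often each one shows up:

```sh
# sample.txt is a hypothetical, trimmed-down stand-in for uname.txt
printf '%s\n' \
  'Linux shachi 5.13.15' \
  'Linux chrysalis 5.10.63' \
  'Linux kos-mos 5.10.45' \
  'Linux ontos 5.10.63' > sample.txt

# cut picks field 3, sort puts duplicates next to each other,
# uniq -c counts each run, and sort -rn lists the most common version first
cut -d ' ' -f 3 < sample.txt | sort | uniq -c | sort -rn > counts.txt

cat counts.txt
```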
This lets you gradually build up a shell pipeline as you drill down to the data
you want in the format you want.

[I wanted to save this compiler error to a file but it didn't work. I tried
doing this:](conversation://Mara/hmm)

```console
$ rustc foo.rs > foo.log
```
But the output printed to the screen instead of the file:

```console
$ rustc foo.rs > foo.log
error: expected one of `!` or `::`, found `main`
 --> foo.rs:1:5
  |
1 | fun main() {}
  |     ^^^^ expected one of `!` or `::`

error: aborting due to previous error

$ cat foo.log
$
```
This happens because there are actually _two_ output streams per program. There
is the standard out stream and there is also a standard error stream. The reason
that standard error exists is so that you can still see any errors that happen
when you redirect standard out.

Sometimes standard out may not be a stream of text. Say you have a compressed
file you want to analyze and there's an issue with the decompression. If the
decompressor wrote its errors to the standard output stream, it could confuse or
corrupt your analysis.

However, we can redirect standard error in particular by modifying how we
redirect to the file:

```console
$ rustc foo.rs 2> foo.log
$ cat foo.log
error: expected one of `!` or `::`, found `main`
 --> foo.rs:1:5
  |
1 | fun main() {}
  |     ^^^^ expected one of `!` or `::`

error: aborting due to previous error
```
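To see the two streams being split apart without needing `rustc`, here is a
small sketch. The `noisy` function and the file names are invented for this
example. It writes one line to standard out and one line to standard error, and
each stream is then sent to its own file:

```sh
# noisy is a hypothetical stand-in for any program that uses both streams
noisy() {
  echo "this is normal output"
  echo "this is an error" >&2   # >&2 sends this line to standard error
}

# standard out goes to out.txt, standard error goes to err.txt
noisy > out.txt 2> err.txt

cat out.txt
cat err.txt
```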
[Where did the `2` come from?](conversation://Mara/wat)

So I mentioned earlier that redirection modifies the standard input and output
of programs. This is not entirely true, but it was a convenient half-truth to
help build this part of the explanation.

For every process on a Unix-like system (such as Linux and macOS), the kernel
stores a list of active file-like objects. This includes real files on the
filesystem, pipes between processes, network sockets, and more. When a program
reads or writes a file, it tells the kernel which file it wants to use by
giving it a numeric index into that list, starting at zero. Standard in/out/error
are just the conventional names for the first three open files in the list, like
this:

| File Descriptor | Purpose |
| :------ | :------- |
| 0 | Standard input |
| 1 | Standard output |
| 2 | Standard error |

Shell redirection simply changes which files are in that list of open files when
the program starts running.

That is why you use a `2` there, because you are telling the shell to change
file descriptor number `2` of the `rustc` process to point to the filesystem
file `foo.log`, which in turn makes the standard error of `rustc` get written to
that file for you.

In turn, this also means that `cat foo.txt > foo2.txt` is actually a shortcut
for saying `cat foo.txt 1> foo2.txt`, but the `1` can be omitted there because
standard out is usually the "default" output that most of these kinds of
pipelines care about.

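Nothing stops a script from using descriptors beyond the first three. As a rough
sketch (the descriptor number `3` and the file name `notes.txt` are arbitrary
picks for this example), you can ask the shell to open an extra file descriptor
with its `exec` builtin, write to it by number, and then close it again:

```sh
exec 3> notes.txt               # open notes.txt for writing as file descriptor 3
echo "written via fd 3" >&3     # point this command's standard out at fd 3
echo "also via fd 3" 1>&3       # the same thing with the 1 spelled out
exec 3>&-                       # close file descriptor 3 again

cat notes.txt
```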
[How would I get both standard output and standard error in the same
file?](conversation://Mara/hmm)

The cool part about the `>` operator is that it doesn't just stop with output to
files on the disk, you can actually have one file descriptor get pointed to
another. Let's say you have a need for both standard out and standard error to
go to the same file. You can do this with a command like this:

```
$ rustc foo.rs > foo.log 2>&1
```
This tells the shell to point standard out to `foo.log`, and then standard
error to standard out (which is now `foo.log`). There's a footgun here though;
the order of the redirects matters. Consider the following:

```
$ rustc foo.rs 2>&1 > foo.log
error: expected one of `!` or `::`, found `main`
 --> foo.rs:1:5
  |
1 | fun main() {}
  |     ^^^^ expected one of `!` or `::`

error: aborting due to previous error
$ cat foo.log
$ # foo.log is empty, why???
```
We wanted to redirect stderr to `foo.log`, but that didn't happen. Why? Well,
the shell considers our redirects one at a time from left to right. When the
shell sees `2>&1`, it hasn't considered `> foo.log` yet, so standard out (`1`)
is still our terminal. It dutifully redirects stderr to the terminal, which is
where it was already going anyway. Then it sees `> foo.log` (short for
`1> foo.log`), so it redirects standard out to `foo.log`. That's the end of it
though. It doesn't retroactively redirect standard error to match the new
standard out, so our errors get dumped to our terminal instead of the file.

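You can check this left-to-right behaviour without a compiler. In this sketch,
`both` is a made-up function that writes one line to each stream, and the
`> leaked.txt` on the surrounding braces stands in for the terminal so that
whatever escapes can be inspected afterwards. All of the file names are invented
for this example:

```sh
# both is a hypothetical stand-in for a program that uses both streams
both() {
  echo "to stdout"
  echo "to stderr" >&2
}

# Right order: standard out moves to the file first, then standard error follows it.
both > right.txt 2>&1

# Wrong order: standard error is pointed at the *old* standard out (leaked.txt
# here, the terminal in real life) before standard out moves to wrong.txt.
{ both 2>&1 > wrong.txt; } > leaked.txt

cat right.txt    # has both lines
cat wrong.txt    # only has the standard out line
cat leaked.txt   # the standard error line that escaped
```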
Confusing, right? Lucky for us, there's a short form that redirects both at the
same time, making this mistake impossible:

```
$ rustc foo.rs &> foo.log
```
This sends both standard out and standard error to `foo.log`, the same way that
`> foo.log 2>&1` does.

[Will that work in every shell?](conversation://Mara/hmm)

[It's a `bash` extension rather than part of POSIX `sh`, but I've tested it in
`zsh` and `fish`. You can also do `|&` to pipe both standard out and standard
error at the same time in the same way you'd do `2>&1 | whatever` (`fish`
spells that one `&|`).](conversation://Cadey/enby)

You can also use this with `>>`:

```
$ rustc foo.rs &>> foo.log
$ cat foo.log
error: expected one of `!` or `::`, found `main`
 --> foo.rs:1:5
  |
1 | fun main() {}
  |     ^^^^ expected one of `!` or `::`

error: aborting due to previous error

error: expected one of `!` or `::`, found `main`
 --> foo.rs:1:5
  |
1 | fun main() {}
  |     ^^^^ expected one of `!` or `::`

error: aborting due to previous error
```
[How do I redirect standard in to a file?](conversation://Mara/hmm)

Well, you don't. Standard in is an input, so you can change where it comes
_from_, not where it goes.

But maybe you want to make a copy of a program's input and send it somewhere
else. There is a way to do _that_ using a command called `tee`. `tee` copies
its standard input to standard output, but it also writes a second copy to a
file. For example:

```console
$ dmesg | tee dmesg.txt | grep 'msedge'
[   70.585463] traps: msedge[4715] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[   70.702544] traps: msedge[4745] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[   70.806296] traps: msedge[4781] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[   70.918095] traps: msedge[4889] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[   71.031938] traps: msedge[4926] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[   71.138974] traps: msedge[4935] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
[ 1169.163603] traps: msedge[35719] trap invalid opcode ip:556a93951c4c sp:7ffc533f35c0 error:0 in msedge[556a8ec26000+952d000]
[ 1213.301722] traps: msedge[36054] trap invalid opcode ip:55a245960c4c sp:7ffe6d169b40 error:0 in msedge[55a240c35000+952d000]
[10963.234459] traps: msedge[104732] trap invalid opcode ip:55fdb864fc4c sp:7ffc996dfee0 error:0 in msedge[55fdb3924000+952d000]
```

This puts the output of the `dmesg` command (which reads the kernel logs) into
`dmesg.txt`, as well as sending it into the `grep` command. You might want to do
this when debugging long command pipelines to see exactly what is going into a
program that isn't doing what you expect.

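Here is a smaller sketch of the same debugging trick that doesn't need kernel
logs. The input lines and the file names `before-grep.txt` and `matches.txt` are
invented for this example:

```sh
# tee saves a copy of everything flowing through the pipe into before-grep.txt,
# then passes the same lines along to grep untouched
printf '%s\n' apple banana cherry | tee before-grep.txt | grep 'an' > matches.txt

cat before-grep.txt   # all three lines, exactly what grep was given
cat matches.txt       # only the line that matched
```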
Redirections work in scripts too. You can also set "default" redirects for
every command in a script using the `exec` command:

```sh
exec > out.log 2> error.log

ls
rustc foo.rs
```
This will have the file listing from `ls` written to `out.log` and any errors
from `rustc` written to `error.log`.

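If only part of a script should be redirected, one approach is to stash the
original standard out on a spare descriptor first and restore it afterwards.
This is a minimal sketch, assuming descriptor `3` is free and using a made-up
file name (`script.log`):

```sh
exec 3>&1             # keep a copy of the original standard out on fd 3
exec > script.log     # from here on, standard out goes to script.log

echo "logged"         # lands in script.log

exec 1>&3 3>&-        # point standard out back at the copy, then close fd 3
echo "back on the original standard out"
```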
A lot of other shell tricks and fun is built on top of these fundamentals. For
example you can take a folder, zip it up and then unzip it over on another
machine using a command like this:

```
$ tar cz ./blog | ssh pneuma tar xz -C ~/code/christine.website/blog
```
This will run `tar` to create a compressed copy of the `./blog` folder and then
pipe that to `tar` on another computer to extract that into
`~/code/christine.website/blog`. It's just pipes and redirection all the way
down! Deep inside `ssh` it's really just piping output of commands back and
forth over an encrypted network socket. Connecting to an IRC server is just
piping data in and out of the chat server, even more so if you use TLS to
connect there. In a way you can model just about everything in Unix with pipes
and file descriptors because that is the cornerstone of its design: Everything
is a file.

[This doesn't mean it's literally a file on the disk, it means you can _interact
with_ just about everything using the same system interface as you do with
files. Even things like hard disks and video cards.](conversation://Mara/hacker)

Here's a fun thing to do. Using [`curl`](https://curl.se/) to read the contents
of a URL and [`jq`](https://stedolan.github.io/jq/) to select out bits from a
JSON stream, you can make a script that lets you read the most recent title from
my blog's [JSONFeed](/blog.json):

```sh
#!/usr/bin/env bash
# xeblog-post.sh

curl -s https://xeiaso.net/blog.json | jq -r '.items[0] | "\(.title) \(.url)"'
```
At the time of writing this post, here is the output I get from this command:

```
$ ./xeblog-post.sh
Anbernic RG280M Review https://xeiaso.net/blog/rg280m-review
```
What else could you do with pipes and redirection? The cloud's the limit!

---

Thanks to violet spark, cadence, and AstroSnail for looking over this post and
fact-checking as well as helping mend some of the brain dump and awkward
wording into more polished sentences.