Fun with redirection
Signed-off-by: Christine Dodrill <me@christine.website>
This commit is contained in:
parent
0c6eb0474c
commit
dc07e6d938
|
@ -0,0 +1,350 @@
|
|||
---
|
||||
title: Fun with Redirection
|
||||
date: 2021-09-22
|
||||
author: Twi
|
||||
tags:
|
||||
- shell
|
||||
- redirection
|
||||
- osdev
|
||||
---
|
||||
|
||||
When you're hacking in the shell or in a script, sometimes you want to change
|
||||
how the output of a command is routed. Today I'm gonna cover common shell
|
||||
redirection tips and tricks that I use every day at work and how it all works
|
||||
under the hood.
|
||||
|
||||
Let's say you're trying to capture the output of a command to a file, such as
|
||||
`uname -av`:
|
||||
|
||||
```console
|
||||
$ uname -av
|
||||
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
|
||||
```
|
||||
|
||||
You could copy that to the clipboard and paste it into a file, but there is a
|
||||
better way thanks to the `>` operator:
|
||||
|
||||
```console
|
||||
$ uname -av > uname.txt
|
||||
$ cat uname.txt
|
||||
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
|
||||
```
|
||||
|
||||
Let's say you want to run this on a few machines and put all of the output into
|
||||
`uname.txt`. You could write a shell script loop like this:
|
||||
|
||||
```sh
|
||||
# make sure the file doesn't already exist
|
||||
rm -f uname.txt
|
||||
|
||||
for host in shachi chrysalis kos-mos ontos pneuma
|
||||
do
|
||||
ssh $host -- uname -av >> uname.txt
|
||||
done
|
||||
```
|
||||
|
||||
Then `uname.txt` should look like this:
|
||||
|
||||
```
|
||||
Linux shachi 5.13.15 #1-NixOS SMP Wed Sep 8 06:50:21 UTC 2021 x86_64 GNU/Linux
|
||||
Linux chrysalis 5.10.63 #1-NixOS SMP Wed Sep 8 06:49:02 UTC 2021 x86_64 GNU/Linux
|
||||
Linux kos-mos 5.10.45 #1-NixOS SMP Fri Jun 18 08:00:06 UTC 2021 x86_64 GNU/Linux
|
||||
Linux ontos 5.10.52 #1-NixOS SMP Tue Jul 20 14:05:59 UTC 2021 x86_64 GNU/Linux
|
||||
Linux pneuma 5.10.57 #1-NixOS SMP Sun Aug 8 07:05:24 UTC 2021 x86_64 GNU/Linux
|
||||
```
|
||||
|
||||
Now let's say you want to extract all of the hostnames from that `uname.txt`.
|
||||
The pattern of the file seems to specify that fields are separated by spaces and
|
||||
the hostname seems to be the second space-separated field in each line. You can
|
||||
use the `cut` command to select that small subset from each line, and you can
|
||||
feed the `cut` command's standard input using the `<` operator:
|
||||
|
||||
```console
|
||||
$ cut -d' ' -f2 < uname.txt
|
||||
shachi
|
||||
chrysalis
|
||||
kos-mos
|
||||
ontos
|
||||
pneuma
|
||||
```
|
||||
|
||||
[It's worth noting that a lot of these core CLI utilities are built on the idea
|
||||
that they are _filters_, or things that take one infinite stream of text in on
|
||||
one end and then return another stream of text out the other
|
||||
end. This is done through a channel called "standard input/output", where
|
||||
standard input refers to input to the command and standard output refers to the
|
||||
output of the command.](conversation://Mara/hacker)
|
||||
|
||||
[That's a great metaphor, let's build onto it using the `|` (pipe)
|
||||
operator. The pipe operator lets you pipe the standard output of one command to
|
||||
the standard input of another.](conversation://Cadey/enby)
|
||||
|
||||
[You mentioned that you can pass files as input and output for commands, does
|
||||
this mean that standard input and standard output are
|
||||
files?](conversation://Mara/happy)
|
||||
|
||||
[Precisely! They are just files that are automatically open for every process.
|
||||
Usually commands will output to standard out and some will also accept input via
|
||||
standard in.](conversation://Cadey/enby)
|
||||
|
||||
[Doesn't that have some level of overhead though? Isn't it expensive to spin up
|
||||
a whole heckin' `cat` process for that?](conversation://Mara/hmm)
|
||||
|
||||
[Not on any decent system made in the last 20 years. This may have some impact
|
||||
on Windows (because they have core architectural mistakes that make processes
|
||||
take up to 100 milliseconds to spin up), but this is about Unix/Linux. I think
|
||||
these should work on Windows too if you use Cygwin, but if you're using WSL you
|
||||
shouldn't have any real issues there](conversation://Cadey/coffee)
|
||||
|
||||
Let's say we want to rewrite that `cut` command above to use pipes. You could
|
||||
write it like this:
|
||||
|
||||
```sh
|
||||
cat uname.txt | cut -d' ' -f2
|
||||
```
|
||||
|
||||
[The mnemonic we use for remembering the `cut` command is that fields are
|
||||
separated by the `d`elimiter and you cut out the nth
|
||||
`f`ield/s. You can use ](conversation://Mara/hacker)
|
||||
|
||||
This will get you the exact same output:
|
||||
|
||||
```console
|
||||
$ cat uname.txt | cut -d' ' -f2
|
||||
shachi
|
||||
chrysalis
|
||||
kos-mos
|
||||
ontos
|
||||
pneuma
|
||||
```
|
||||
|
||||
Personally I prefer writing shell pipelines like that as it makes it a bit
|
||||
easier to tack on more specific selectors or operations as you go along. For
|
||||
example, if you wanted to sort them you could pipe the result to `sort`:
|
||||
|
||||
```console
|
||||
$ cat uname.txt | cut -d' ' -f2 | sort
|
||||
chrysalis
|
||||
kos-mos
|
||||
ontos
|
||||
pneuma
|
||||
shachi
|
||||
```
|
||||
|
||||
This lets you gradually build up a shell pipeline as you drill down to the data
|
||||
you want in the format you want.
|
||||
|
||||
[I wanted to save this compiler error to a file but it didn't work. I tried
|
||||
doing this:](conversation://Mara/hmm)
|
||||
|
||||
```console
|
||||
$ rustc foo.rs > foo.log
|
||||
```
|
||||
|
||||
But the output printed to the screen instead of the file:
|
||||
|
||||
```console
|
||||
$ rustc foo.rs > foo.log
|
||||
error: expected one of `!` or `::`, found `main`
|
||||
--> foo.rs:1:5
|
||||
|
|
||||
1 | fun main() {}
|
||||
| ^^^^ expected one of `!` or `::`
|
||||
|
||||
error: aborting due to previous error
|
||||
|
||||
$ cat foo.log
|
||||
$
|
||||
```
|
||||
|
||||
This happens because there are actually _two_ output streams per program. There
|
||||
is the standard out stream and there is also a standard error stream. The reason
|
||||
that standard error exists is so that you can see if any errors have happened if
|
||||
you redirect standard out.
|
||||
|
||||
Sometimes standard out may not be a stream of text, say you have a compressed
|
||||
file you want to analyze and there's an issue with the decompression. If the
|
||||
decompressor wrote its errors to the standard output stream, it could confuse or
|
||||
corrupt your analysis.
|
||||
|
||||
However, we can redirect standard error in particular by modifying how we
|
||||
redirect to the file:
|
||||
|
||||
```console
|
||||
$ rustc foo.rs 2> foo.log
|
||||
$ cat foo.log
|
||||
error: expected one of `!` or `::`, found `main`
|
||||
--> foo.rs:1:5
|
||||
|
|
||||
1 | fun main() {}
|
||||
| ^^^^ expected one of `!` or `::`
|
||||
|
||||
error: aborting due to previous error
|
||||
```
|
||||
|
||||
[Where did the `2` come from?](conversation://Mara/wat)
|
||||
|
||||
So I mentioned earlier that redirection modifies the standard input and output
|
||||
of programs. This is not entirely true, but it was a convenient half-truth to
|
||||
help build this part of the explanation.
|
||||
|
||||
For every process on a Unix-like system (such as Linux and macOS), the kernel
|
||||
stores a list of active file-like objects. This includes real files on the
|
||||
filesystem, pipes between processes, network sockets, and more. When a program
|
||||
reads or writes a file, they tell the kernel which file they want to use by
|
||||
giving it a number index into that list, starting at zero. Standard in/out/error
|
||||
are just the conventional names for the first three open files in the list, like
|
||||
this:
|
||||
|
||||
| File Descriptor | Purpose |
|
||||
| :------ | :------- |
|
||||
| 0 | Standard input |
|
||||
| 1 | Standard output |
|
||||
| 2 | Standard error |
|
||||
|
||||
Shell redirection simply changes which files are in that list of open files when
|
||||
the program starts running.
|
||||
|
||||
That is why you use a `2` there, because you are telling the shell to change
|
||||
file descriptor number `2` of the `rustc` process to point to the filesystem
|
||||
file `foo.log`, which in turn makes the standard error of `rustc` get written to
|
||||
that file for you.
|
||||
|
||||
In turn, this also means that `cat foo.txt > foo2.txt` is actually a shortcut
|
||||
for saying `cat foo.txt 1> foo2.txt`, but the `1` can be omitted there because
|
||||
standard out is usually the "default" output that most of these kind of
|
||||
pipelines cares about.
|
||||
|
||||
[How would I get both standard output and standard error in the same
|
||||
file?](conversation://Mara/hmm)
|
||||
|
||||
The cool part about the `>` operator is that it doesn't just stop with output to
|
||||
files on the desk, you can actually have one file descriptor get pointed to
|
||||
another. Let's say you have a need for both standard out and standard error to
|
||||
go to the same file. You can do this with a command like this:
|
||||
|
||||
```
|
||||
$ rustc foo.rs 2>&1 > foo.log
|
||||
```
|
||||
|
||||
This tells the shell to point standard error to standard out and then the
|
||||
combined output to `foo.log`. There's a short form of this too:
|
||||
|
||||
```
|
||||
$ rustc foo.rs &> foo.log
|
||||
```
|
||||
|
||||
[Where can I expect to use that?](conversation://Mara/hmm)
|
||||
|
||||
[It's a bourne shell extension, but I've tested it in `zsh` and `fish`. You can
|
||||
also do `&|` to pipe both standard out and standard error at the same time in
|
||||
the same way you'd do `2>&1 | whatever`.](conversation://Cadey/enby)
|
||||
|
||||
That will put standard out and standard error to `foo.log` the same way that
|
||||
`2>&1 > foo.log` will. You can also use this with `>>`:
|
||||
|
||||
```
|
||||
$ rustc foo.rs &>> foo.log
|
||||
$ cat foo.log
|
||||
error: expected one of `!` or `::`, found `main`
|
||||
--> foo.rs:1:5
|
||||
|
|
||||
1 | fun main() {}
|
||||
| ^^^^ expected one of `!` or `::
|
||||
|
||||
error: aborting due to previous error
|
||||
|
||||
error: expected one of `!` or `::`, found `main`
|
||||
--> foo.rs:1:5
|
||||
|
|
||||
1 | fun main() {}
|
||||
| ^^^^ expected one of `!` or `::`
|
||||
|
||||
error: aborting due to previous error
|
||||
```
|
||||
|
||||
[How do I redirect standard in to a file?](conversation://Mara/hmm)
|
||||
|
||||
The answer there is not directly! There is a workaround in the form of a tool
|
||||
called `tee` which outputs its standard in to both standard out and a file. For
|
||||
example:
|
||||
|
||||
```console
|
||||
$ dmesg | tee dmesg.txt | grep 'msedge'
|
||||
[ 70.585463] traps: msedge[4715] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
|
||||
[ 70.702544] traps: msedge[4745] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
|
||||
[ 70.806296] traps: msedge[4781] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
|
||||
[ 70.918095] traps: msedge[4889] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
|
||||
[ 71.031938] traps: msedge[4926] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
|
||||
[ 71.138974] traps: msedge[4935] trap invalid opcode ip:5630ddcedc4c sp:7ffd41f67700 error:0 in msedge[5630d8fc2000+952d000]
|
||||
[ 1169.163603] traps: msedge[35719] trap invalid opcode ip:556a93951c4c sp:7ffc533f35c0 error:0 in msedge[556a8ec26000+952d000]
|
||||
[ 1213.301722] traps: msedge[36054] trap invalid opcode ip:55a245960c4c sp:7ffe6d169b40 error:0 in msedge[55a240c35000+952d000]
|
||||
[10963.234459] traps: msedge[104732] trap invalid opcode ip:55fdb864fc4c sp:7ffc996dfee0 error:0 in msedge[55fdb3924000+952d000]
|
||||
```
|
||||
|
||||
This would put the output of the `dmesg` command (read from kernel logs) into
|
||||
`dmesg.txt`, as well as sending it into the grep command. You might want to do
|
||||
this when debugging long command pipelines to see exactly what is going into a
|
||||
program that isn't doing what you expect.
|
||||
|
||||
Redirections also work in scripts too. You can also set "default" redirects for
|
||||
every command in a script using the `exec` command:
|
||||
|
||||
```sh
|
||||
exec > out.log 2> error.log
|
||||
|
||||
ls
|
||||
rustc foo.rs
|
||||
```
|
||||
|
||||
This will have the file listing from `ls` written to `out.log` and any errors
|
||||
from `rustc` written to `error.log`.
|
||||
|
||||
A lot of other shell tricks and fun is built on top of these fundamentals. For
|
||||
example you can take a folder, zip it up and then unzip it over on another
|
||||
machine using a command like this:
|
||||
|
||||
```
|
||||
$ tar cz ./blog | ssh pneuma tar xz -C ~/code/christine.website/blog
|
||||
```
|
||||
|
||||
This will run `tar` to create a compressed copy of the `./blog` folder and then
|
||||
pipe that to tar on another computer to extract that into
|
||||
`~/code/christine.website/blog`. It's just pipes and redirection all the way
|
||||
down! Deep inside `ssh` it's really just piping output of commands back and
|
||||
forth over an encrypted network socket. Connecting to an IRC server is just
|
||||
piping in and out data to the chat server, even more so if you use TLS to
|
||||
connect there. In a way you can model just about everything in Unix with pipes
|
||||
and file descriptors because that is the cornerstone of its design: Everything
|
||||
is a file.
|
||||
|
||||
[This doesn't mean it's literally a file on the disk, it means you can _interact
|
||||
with_ just about everything using the same system interface as you do with
|
||||
files. Even things like hard disks and video cards.](conversation://Mara/hacker)
|
||||
|
||||
Here's a fun thing to do. Using [`curl`](https://curl.se/) to read the contents
|
||||
of a URL and [`jq`](https://stedolan.github.io/jq/) to select out bits from a
|
||||
JSON stream, you can make a script that lets you read the most recent title from
|
||||
my blog's [JSONFeed](/blog.json):
|
||||
|
||||
```sh
|
||||
#!/usr/bin/env bash
|
||||
# xeblog-post.sh
|
||||
|
||||
curl -s https://christine.website/blog.json | jq -r '.items[0] | "\(.title) \(.url)"'
|
||||
```
|
||||
|
||||
At the time of writing this post, here is the output I get from this command:
|
||||
|
||||
```
|
||||
$ ./xeblog-post.sh
|
||||
Anbernic RG280M Review https://christine.website/blog/rg280m-review
|
||||
```
|
||||
|
||||
What else could you do with pipes and redirection? The cloud's the limit!
|
||||
|
||||
---
|
||||
|
||||
Thanks to violet spark for looking over this post and fact-checking as well as
|
||||
helping mend some of the brain dump and awkward wording into more polished
|
||||
sentences.
|
|
@ -70,6 +70,14 @@ let Config =
|
|||
, twitter = Some "BeJustFine"
|
||||
, inSystem = True
|
||||
}
|
||||
, Author::{
|
||||
, name = "Nicole"
|
||||
, handle = "Twi"
|
||||
, picUrl = None Text
|
||||
, link = None Text
|
||||
, twitter = None Text
|
||||
, inSystem = True
|
||||
}
|
||||
]
|
||||
, port = defaultPort
|
||||
, clackSet = [ "Ashlynn" ]
|
||||
|
|
Loading…
Reference in New Issue