Pipe Viewer – A Unix Utility You Should Know About

catonmat.net

236 points by lnyan 2 months ago


mort96 - 2 months ago

Pipe viewer is excellent. I use it all the time.

As of version 1.8.10[1], which includes my merge request[2] to add an '--output' option, it has even completely replaced my use of 'dd' for writing disk images: 'sudo pv -Yo /dev/mmcblk0 whatever.img' is nicer, has much better progress indication, automatically selects a more sensible buffer size, and begets fewer groans from UNIX neckbeards, than the old 'sudo dd of=/dev/mmcblk0 if=whatever.img'. (The '-Y' causes pv to sync after each write, which greatly improves progress indication in Linux.)

It's useful for much more, of course. I use it for progress when compressing files ('pv blah | gzip ...'), when uploading files to the web ('pv blah | curl --upload-file - ...'; curl doesn't show progress when uploading, for whatever reason), or just when I wanna see that something is happening during an operation that would otherwise take a while (even something like a slow 'du -h /some/path | sort -h' benefits from a 'pv' squeezed in the middle just to indicate that something is happening).

[1] https://codeberg.org/a-j-wood/pv/releases/tag/v1.8.10

[2] https://codeberg.org/a-j-wood/pv/pulls/90
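
For reference, the compression pattern above as a small wrapper. This is only a sketch: the function name `gz_with_progress` is made up here, and it assumes `pv` and `gzip` are on the PATH.

```shell
# Hypothetical helper: compress a file while pv reports progress on stderr.
# Because pv opens the file itself, it knows the total size and can show
# a percentage and an ETA.
gz_with_progress() {
    pv "$1" | gzip -c > "$1.gz"
}
```

Something like `gz_with_progress access.log` would leave the original in place and write `access.log.gz` next to it.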

heinrich5991 - 2 months ago

There's also `progress` which works for tools mainly operating on a single file, but unlike `pv`, you don't have to start the tool differently. It'd e.g. work nicely for the `gzip` example. Just call `progress` on a different terminal while `gzip` is running.

arghwhat - 2 months ago

> This example shows that the access.log file is being read at the speed of 37.4MB/s but gzip is writing data at only 1.74MB/s. We can immediately calculate the compression rate. It's 37.4/1.74 = 21x!

No you cannot:

1. Compression algorithms buffer a lot and tend to perform large burst writes, in particular a large write on the final flush. Instantaneous measurements will therefore not be particularly useful.

2. Compression ratio refers to the average across an entire file, as entropy is not uniform across the input unless you are compressing just noise or just zeros. Some sections might compress extremely well while others do not and end up consuming more space than the input due to overhead.
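
To make point 2 concrete: a ratio that means something comes from the total byte counts once compression has finished, not from two instantaneous rates. A minimal sketch; the function name `compression_ratio` is made up for illustration, and it assumes `gzip`, `wc` and `awk` are available.

```shell
# Hypothetical helper: whole-file compression ratio (input bytes / output bytes).
compression_ratio() {
    in_bytes=$(wc -c < "$1")            # size of the uncompressed input
    out_bytes=$(gzip -c "$1" | wc -c)   # size after compressing the whole file
    awk -v i="$in_bytes" -v o="$out_bytes" 'BEGIN { printf "%.2f\n", i / o }'
}
```

Running `compression_ratio access.log` prints a single number for the entire file, which is the figure the rates shown by pv can only hint at.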

jcul - 2 months ago

pv is great.

It has a limit parameter so you can limit the speed. Great if you don't want to saturate some link or have additional costs for uploading above a certain rate per hour/day.

Also useful for testing behaviour on slow filesystems / connections.

It can take a pid argument too, -d IIRC, which will get it to display progress info for all the open file descriptors of a running process.

Really useful as a quick way to check what an I/O process is doing if it appears to be stuck.
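
A sketch of both features as I understand them from pv's man page: `-L` sets the rate limit and `-d` watches the file descriptors of an existing process. The helper names `throttled_copy` and `watch_pid` are invented for this example.

```shell
# Hypothetical helper: copy SRC to DST at no more than RATE bytes per second.
# pv's -L option sets the rate limit; suffixes such as K or M are accepted.
throttled_copy() {
    rate=$1 src=$2 dst=$3
    pv -L "$rate" "$src" > "$dst"
}

# Hypothetical helper: show progress for the open file descriptors of a
# process that is already running (pv's -d option takes a PID).
watch_pid() {
    pv -d "$1"
}
```

For instance, `throttled_copy 1M big.iso /mnt/slow/big.iso` would cap the copy at about one mebibyte per second, and `watch_pid 1234` would attach to a process with PID 1234 (both the paths and the PID are placeholders).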

darkwater - 2 months ago

Pipe viewer? What's that? Let me check the post...oh, it's good old pv! Never noticed it had a full name, damn Unix utilities with their short names!

NelsonMinar - 2 months ago

I love pv but how much does adding the pipe affect overhead? I feel like most of my big jobs I want to measure are on things where you want the program to have direct access to the underlying file or storage. `pv somefile | dd` is going to be slower than `dd somefile`. At least I think so? I have no idea what modern Linux I/O can optimize.

Also does pv necessitate doing single threaded I/O?

codetrotter - 2 months ago

I used to love pv, but ZFS send and recv hung many times when I had pv in the pipeline, and I was never sure why. Since then I've stopped using pv there and use the verbose flag of the ZFS command on the receive side instead. It provides enough output for me to see that it's progressing, and I haven't had those kinds of problems since.

Searching now, it seems that up until recently this was indeed a problem other people were having too. For example, in the recent forum thread https://forums.freebsd.org/threads/zfs-send-hanging.94994/ they were discussing which version this was fixed in, and someone said that the latest version available to them from packages was still a bit too old.

throwaway127482 - 2 months ago

I like to use pv as a quick and dirty operations per second counter. Sometimes I will write a program or script that does a bunch of things in parallel (e.g. RPCs to a service I'm working on), and prints one line of output for every operation completed. Then I pipe that output to pv using the --line-mode (-l) option to count lines instead of bytes. It shows how many lines are being printed per second, which roughly counts operations per second. (IIRC, you also need to send pv's output to /dev/null to prevent the tool's output from clobbering pv's fancy display.)

Fun stuff! Especially when combined with GNU parallel, in cases where the thing I'm measuring isn't already parallelized, and I want to be lazy.
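
The line-counting trick looks roughly like this. In pv's man page the flag is `-l` (`--line-mode`); `-r` shows the current rate and `-a` the average. `run_jobs` is a stand-in for whatever prints one line per completed operation, and both function names are made up.

```shell
# Stand-in workload (hypothetical): prints one line per finished "operation".
run_jobs() {
    i=0
    while [ "$i" -lt 1000 ]; do
        echo "op $i done"
        i=$((i + 1))
    done
}

# Hypothetical wrapper: count lines instead of bytes (-l), show the current
# (-r) and average (-a) rate on stderr, and discard the lines themselves so
# they do not clobber pv's display.
ops_per_second() {
    pv -l -r -a > /dev/null
}
```

Used as `run_jobs | ops_per_second`, the rate pv prints is lines per second, which approximates operations per second.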

6c696e7578 - 2 months ago

A little more typing, but I find dd present on most systems already, so I tend to do this:

  tar ... | dd status=progress | ...

emptiestplace - 2 months ago

> The obvious way to do it is:

> $ gzip -c access.log > access.log.gz

Is it?

ElevenLathe - 2 months ago

See also vipe(1) from the wonderful moreutils: https://manpages.debian.org/stretch/moreutils/vipe.1.en.html

sirjaz - 2 months ago

We have that in PowerShell also: show-progress

thanatos519 - 2 months ago

Yes! My `,pv` is approximately: (probably a better way to make the k, but I stop once something is adequate; maybe I just need to make a `,,kof`)

    tar cpS "$@" --sort=name | pv -bratpes $(du -cks "$@"|sed -n '/.total$/ s/.total$//p')k

Which gives me progress bars for big copies like:

    ,pv files/ | tar xp -C /destination

    ,pv files/ | ssh overthere tar xp -C /destination

kazinator - 2 months ago

If you want to drill into text flowing through a pipeline, I made a different tool called Pipe Watch (pw).

https://www.kylheku.com/cgit/pw/about

dustfinger - 2 months ago

In addition to pv, I also recommend learning about watch [1], which allows you to watch things change over time.

[1]: https://linux.die.net/man/1/watch

dazzawazza - 2 months ago

Also available on FreeBSD https://www.freshports.org/sysutils/pv

082349872349872 - 2 months ago

tangentially, tee(1) can also be useful for deciding which update in the middle of a long pipeline broke things
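
For example, dropping a `tee` between stages keeps a copy of the intermediate data so each stage can be inspected afterwards. The stages below are invented for illustration.

```shell
snapshot=$(mktemp)   # where the intermediate data gets saved

# Save what the sort stage produced, then carry on with the pipeline as usual.
printf 'b\na\nb\n' | sort | tee "$snapshot" | uniq -c

# If the final output looks wrong, check what the sort stage handed over:
cat "$snapshot"
```

Moving the `tee` one stage at a time along a long pipeline narrows down where the data first goes wrong.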

c0deR3D - 2 months ago

Got me wondering, how does it work?

mac3n - 2 months ago

see also

https://gitlab.com/mac3n/pipeleak
