Pipe Viewer

I’ve got a rather large dataset that needs a lot of processing, over several iterations. It’s a 20 GB gzipped file of flat text, and I’m impatient and don’t like not knowing things!

My new favourite Linux command-line tool, pv (pipe viewer), is totally awesome. Drop it between the stages of a pipeline and each stage gets its own named progress line. Check this out:

 pv -cN source < urls.gz |
   zcat | pv -cN zcat |
   perl -lne '($a,$b,$c,$d) = split /\||\t/; print $b unless $b =~ /ac\.uk/; print $c unless $c =~ /ac\.uk/' | pv -cN perl |
   gzip | pv -cN gzip > hosts.gz

 zcat: 93.4GiB 1:33:18 [26.6MiB/s] [ <=> ]
 perl: 85.7GiB 1:33:18 [25.3MiB/s] [ <=> ]
 source: 13.2GiB 1:33:17 [3.57MiB/s] [===============================================> ] 67% ETA 0:44:41
 gzip: 12.7GiB 1:33:18 [3.51MiB/s] [ <=> ]
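If you want to try the same shape of pipeline without a 20 GB file, here is a small sketch. The sample rows are made up: I'm assuming a layout of pipe- and tab-separated fields with hostnames in the second and third positions, which is only what the split in the one-liner implies. The `meter` helper is hypothetical too; it wraps `pv -cN` and falls back to `cat` when pv isn't installed. `-N` gives each meter its name and `-c` lets several pv processes share the terminal without overwriting each other. Only the `source` meter can draw a real bar with an ETA, because it reads straight from a file and so knows the total size.

```shell
# Made-up sample rows: fields separated by "|" or a tab (assumed layout).
printf 'id1|www.example.com\tcdn.example.net|x\n'     >  sample.txt
printf 'id2|www.foo.ac.uk\tmirror.example.org|y\n'    >> sample.txt
gzip -c sample.txt > sample.gz

# Hypothetical helper: a named pv meter, or plain cat if pv is missing.
if command -v pv >/dev/null 2>&1; then
  meter() { pv -cN "$1"; }
else
  meter() { cat; }
fi

# Same stages as the real run: decompress, filter out ac.uk hosts, recompress.
meter source < sample.gz |
  zcat | meter zcat |
  perl -lne '($a,$b,$c,$d) = split /\||\t/; print $b unless $b =~ /ac\.uk/; print $c unless $c =~ /ac\.uk/' | meter perl |
  gzip | meter gzip > sample_hosts.gz

# Show what survived the filter.
zcat < sample_hosts.gz
```

On the sample above, the three hosts that aren't under ac.uk come out the other end and `www.foo.ac.uk` is dropped.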
