perl can be considered a general-purpose programming language. A full tutorial on perl is beyond the scope of this tutorial; only some useful tools and one-liners are introduced here.
While perl is a bit more difficult to master than awk, it is often considerably faster, so it is handy on large files even when the same task could be done in awk.
perl programs can be given directly on the command line without creating a separate program file. The -e flag indicates that the program code follows. For example:
$ perl -e 'print "hello world\n"'
Since we will be using perl for data processing, we will often rely on the -p and -n flags. Both flags automatically loop over every line of the files named on the command line (or of standard input, if no files are given) and run the provided code on each line. -p also prints each line after processing, while -n prints nothing unless the code does so explicitly (useful if you only want to print certain lines).
In the example below, we prepend and append an X to each line of the input. Note that unlike awk, perl does NOT remove the trailing newline from each line, so we use the chomp function to remove it. Many perl built-in functions, chomp included, operate on the default variable $_ when no argument is given, so we don't need to pass it to chomp.
$ perl -n -e 'chomp; print "X $_ X\n"' input_users.txt
X USER_ID:ACCT_NUM:USER_NAME X
X 3:5555:David X
X 5:789:Carol X
...
If we are only transforming the rows of the input, we can rely on the -p flag and its automatic printing.
$ perl -p -e 's/Carol/Susan/' input_users.txt
USER_ID:ACCT_NUM:USER_NAME
3:5555:David
5:789:Susan
11:5555:Susan
...
The above works because the s/../../ substitution command reads from and writes to $_ by default, and the automatic print of -p displays $_. We could also modify $_ directly as part of the program:
$ perl -p -e '$_ = "X" . $_' input_users.txt
XUSER_ID:ACCT_NUM:USER_NAME
X3:5555:David
X5:789:Carol
X11:5555:Carol
...
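When working with delimited data (such as colon- or comma-separated columns), it is often handy to use the -a (auto-split) command line option. This option splits each input line and stores the resulting fields in the array @F. By default -a splits on whitespace; the -F flag specifies a different field separator. Note that a plain split does not understand quoted CSV fields.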
$ perl -F':' -ane 'print "$F[1]\n"' input_users.txt
ACCT_NUM
5555
789
5555
...
Similar to awk, perl also supports the special BEGIN and END blocks, which hold code that runs at the beginning and end of processing, respectively.
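The example below produces the sum of all values in column 2 ($F[1]; array indices start at 0). Because the two -e fragments are joined into a single program, the per-line statement must be terminated before the END block; here it is wrapped in a bare {} block, although ending it with a semicolon works equally well. Also note that we did not need to initialize $s to zero in a BEGIN block, since an undefined variable is treated as 0 in numeric context.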
$ perl -F':' -an -e '{$s += $F[1]}' -e 'END {print "sum = $s\n"}' input_users.txt
sum = 84919
It is often necessary to modify the contents of a file in place rather than create a new file. perl makes this easy with the -i flag.
$ cp input_users.txt x.txt
$ perl -pi -e 's/Frank/Fred/' x.txt
$ diff input_users.txt x.txt
5c5
< 1:1234:Frank
---
> 1:1234:Fred
$ rm x.txt
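It is also possible to keep a backup of the original file: with -i.bak, the original contents are saved to x.txt.bak before x.txt is overwritten. Since x.txt was removed above, we recreate it first:
$ cp input_users.txt x.txt
$ perl -p -i.bak -e 's/Frank/Fred/' x.txt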
TODO: document common special variables (e.g. $_, $/, ...)