Update on Awk One-Liners Explained: String and Array Creation

This is an update post to my three-part article Awk One-Liners Explained.

I received an email from Eric Pement (the original author of the Awk one-liners), who told me that he had just published a new version of the awk1line.txt file. I ran a diff and found seven new one-liners in it!

The new file has two new sections, "String Creation" and "Array Creation", and it also updates the "Selective Printing of Certain Lines" section. I'll explain the new one-liners in this article.

Here is the latest version of awk1line.txt – awk1line-new.txt.

Eric Pement's original Awk one-liner collection consists of five sections, which I explained in my previous three articles. Here is how the whole series is organized:

1. File spacing (explained in part one).
2. Numbering and calculations (explained in part one).
3. Text conversion and substitution (explained in part two).
4. Selective printing of certain lines (explained in part three).
5. Selective deletion of certain lines (explained in part three).
6. String creation, array creation, and an update on selective printing of certain lines (explained in this part).
7. Release of Awk One-Liners Explained e-book.

Awesome news: I have written an e-book based on this article series. Check it out:

Awk book

Okay, let's roll with the new one-liners!

String Creation

1. Create a string of a specific length (generate a string of x's of length 513).

awk 'BEGIN { while (a++<513) s=s "x"; print s }'

This one-liner uses the special "BEGIN { }" block, which is executed before Awk reads any input. In this block, a while loop appends the character "x" to the variable "s" 513 times (the uninitialized variable "a" starts at 0, so "a++<513" is true for a = 0, 1, ..., 512). After the loop finishes, "s" is printed. Because the program consists of only a BEGIN block, with no main body and no END block, Awk exits right after executing it without reading any input.

This one-liner just prints the 513 x's, but you can use the same technique to build a string for anything you need in the BEGIN, main, or END blocks.
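If you need a different length or a different character, one small variation (not part of the original collection) is to pass both values in with awk's standard -v option instead of hard-coding them. The names n and c are illustrative:

```shell
# n (length) and c (character) are illustrative names passed in with -v.
# The BEGIN loop appends c to s exactly n times, then prints s.
s=$(awk -v n=513 -v c=x 'BEGIN { while (i++ < n) s = s c; print s }')
printf '%s\n' "${#s}"   # prints 513
```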

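Unfortunately, this is not the most efficient way to do it: the loop performs one concatenation per character, so the number of steps grows linearly with the length. My friend waldner (who, by the way, wrote a guest post on 10 Awk Tips, Tricks and Pitfalls) showed me a solution that needs only a logarithmic number of steps (based on the idea of recursive squaring, that is, doubling the string at each level of recursion):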

function rep(str, num,     remain, result) {
    if (num < 2) {
        remain = (num == 1)
    } else {
        remain = (num % 2 == 1)
        result = rep(str, (num - remain) / 2)
    }
    return result result (remain ? str  : "")
}

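This function can be used as follows (its definition must be part of the same Awk program as the code that calls it, for example placed before the BEGIN block):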

awk 'BEGIN { s = rep("x", 513) }'
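As a minimal runnable sketch, here is the whole program in one piece, with waldner's function placed before the BEGIN block. Capturing the result in the shell variable s is just for demonstration:

```shell
s=$(awk '
# rep(str, num) returns str repeated num times by halving num at each level.
# remain and result are locals, declared as extra parameters by convention.
function rep(str, num,     remain, result) {
    if (num < 2) {
        remain = (num == 1)          # base case: 0 or 1 copies, result is ""
    } else {
        remain = (num % 2 == 1)      # an odd count needs one extra copy
        result = rep(str, (num - remain) / 2)
    }
    return result result (remain ? str : "")
}
BEGIN { s = rep("x", 513); print s }')
printf '%s\n' "${#s}"   # prints 513
```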

2. Insert a string of a specific length at a certain character position (insert 49 x's after the 6th character).

gawk --re-interval 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^.{6}/,"&" s) }; 1'

This one-liner works only with GNU Awk, because it uses the interval expression ".{6}" in the Awk program's body. Interval expressions were not traditionally available in awk, which is why older versions of gawk need the "--re-interval" option to enable them (since gawk 4.0 they are enabled by default).

For those who do not know what interval expressions are: they specify how many times the preceding item of a regular expression must repeat. For example, ".{6}" matches any six characters (the dot "." matches any character). The interval expression "b{2,4}" matches at least two, but no more than four, "b" characters. To repeat a whole word, group it with parentheses: "(foo){4}" matches "foo" repeated four times, that is, "foofoofoofoo".

The one-liner starts the same way as the previous one: it creates a 49-character string "s" in the BEGIN block. Next, for each line of input, it calls the sub() function, which replaces the first 6 characters with themselves followed by "s". In the replacement string, "&" stands for the part of the line that the regular expression matched, so '"&" s' means the matched text followed by the contents of the variable "s". The "1" at the end of the one-liner prints the modified line: "1" is a pattern that is always true, a pattern without an action defaults to "print", and "print" on its own is shorthand for "print $0".

The same can be achieved with standard Awk:

awk 'BEGIN{ while(a++<49) s=s "x" }; { sub(/^....../,"&" s) }; 1'

Here we just match six characters ("......") at the beginning of the line and replace them with themselves followed by the contents of the variable "s".
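To see what happens on real input, here is a small run on made-up lines, using 3 x's instead of 49 to keep the output readable. The second line is shorter than six characters, which shows that sub() leaves such lines alone and the trailing 1 prints them unchanged:

```shell
# Insert 3 x's after the 6th character of each line (standard awk, no intervals).
out=$(printf 'abcdefghij\nabc\n' |
  awk 'BEGIN{ while(a++<3) s=s "x" }; { sub(/^....../,"&" s) }; 1')
printf '%s\n' "$out"
# prints:
# abcdefxxxghij
# abc
```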

Inserting a string at the 29th position, for example, gets tedious: you would have to type "." twenty-nine times ("............................."). In that case, better use GNU Awk and write ".{29}".
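If you want to stay portable, one workaround (not part of the original collection) is to build the run of dots at runtime. Awk treats a string passed as the first argument of sub() as a dynamic regular expression, so "^" followed by 29 dots can be generated in a loop. The variable names pos, n, and re are illustrative:

```shell
# Insert n x's after character position pos without interval expressions.
out=$(printf '1234567890abcdefghijklmnopqrstuvwxyz\n' |
  awk -v pos=29 -v n=5 '
    BEGIN {
        while (a++ < n) s = s "x"              # the string to insert
        re = "^"
        for (i = 0; i < pos; i++) re = re "."  # "^" followed by pos dots
    }
    { sub(re, "&" s) }
    1')
printf '%s\n' "$out"   # prints 1234567890abcdefghijklmnopqrsxxxxxtuvwxyz
```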

Once again, my friend waldner corrected me and pointed me to the Awk Feature Comparison chart. According to the chart, the original one-liner with ".{6}" should also work with POSIX awk, BusyBox awk, and Solaris awk.

Array Creation

3. Create an array from string.

split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")

This is not a one-liner per se but a technique for creating an array from a string. The split(Str, Arr, Regex) function is used to do that. It splits the string Str into fields separated by the regular expression Regex and puts the fields in the array Arr, as Arr[1], Arr[2], ..., Arr[N]. The split() function itself returns N, the number of fields the string was split into.

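In this piece of code, the Regex is simply a space character " ", the array is month, and the string is "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec". A single space is a special case: it splits on runs of whitespace (spaces, tabs, and newlines) and ignores leading and trailing whitespace. After the split, month[1] is "Jan", month[2] is "Feb", ..., month[12] is "Dec".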
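A quick way to check both the return value of split() and the special handling of a single-space separator is to run it in a BEGIN block. The second call, with the hypothetical array name tmp, splits a string padded with extra blanks:

```shell
# split() returns the number of fields and fills month[1]..month[n].
# With " " as the separator, leading, trailing, and repeated blanks are skipped.
out=$(awk 'BEGIN {
    n = split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    k = split("  Jan   Feb  ", tmp, " ")
    print n, month[1], month[12], k
}')
printf '%s\n' "$out"   # prints 12 Jan Dec 2
```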

4. Create an array named "mdigit", indexed by strings.

for (i=1; i<=12; i++) mdigit[month[i]] = i

This is another array creation technique rather than a real one-liner. It creates a reverse lookup array. Remember from the previous "one-liner" that month[1] was "Jan", ..., month[12] was "Dec". Now we want to do the reverse lookup and find the number of each month. To do that, we create a reverse lookup array "mdigit", such that mdigit["Jan"] = 1, ..., mdigit["Dec"] = 12.

It's really trivial: we loop over month[1], month[2], ..., month[12] and set mdigit[month[i]] to i. This way mdigit["Jan"] = 1, and so on.
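To see the two arrays working together, here is a hypothetical use that converts month names to numbers with mdigit. The "day month year" input format is an assumption made for this example:

```shell
# Build month[] with split(), then the reverse lookup mdigit[], and use it
# to rewrite "day month year" lines as year-month-day.
out=$(printf '15 Mar 2009\n1 Dec 2008\n' | awk '
BEGIN {
    split("Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec", month, " ")
    for (i = 1; i <= 12; i++) mdigit[month[i]] = i   # e.g. mdigit["Mar"] = 3
}
{ printf "%s-%02d-%02d\n", $3, mdigit[$2], $1 }')
printf '%s\n' "$out"
# prints:
# 2009-03-15
# 2008-12-01
```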

Selective Printing of Certain Lines

5. Print all lines where 5th field is equal to "abc123".

awk '$5 == "abc123"'

This one-liner uses idiomatic Awk: if the given expression is true, Awk prints the line. The fifth field is referenced by "$5", and it is checked to be equal to "abc123". If it is, the expression is true and the line gets printed.

Unwinding this idiom, this one-liner is equivalent to:

awk '{ if ($5 == "abc123") { print $0 } }'
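Running both forms on the same made-up input shows that they print exactly the same lines:

```shell
# Made-up input: only the first and third lines have "abc123" as field 5.
input='a b c d abc123 f
a b c d xyz f
a b c d abc123'
short=$(printf '%s\n' "$input" | awk '$5 == "abc123"')
long=$(printf '%s\n' "$input" | awk '{ if ($5 == "abc123") { print $0 } }')
printf '%s\n' "$short"
# prints:
# a b c d abc123 f
# a b c d abc123
```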

6. Print any line where field #5 is not equal to "abc123".

awk '$5 != "abc123"'

This is exactly the same as the previous one-liner, except that it negates the comparison: if the fifth field "$5" is not equal to "abc123", the line is printed.

Unwinding it, it's equivalent to:

awk '{ if ($5 != "abc123") { print $0 } }'

Another way is to literally negate the whole previous one-liner:

awk '!($5 == "abc123")'
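One detail worth seeing on real input: a line with fewer than five fields has an empty $5, which is not equal to "abc123", so such lines are printed too. Both negated forms behave the same way on this made-up input:

```shell
# Made-up input: the last line has only two fields, so its $5 is empty.
input='a b c d abc123
a b c d xyz
short line'
out1=$(printf '%s\n' "$input" | awk '$5 != "abc123"')
out2=$(printf '%s\n' "$input" | awk '!($5 == "abc123")')
printf '%s\n' "$out1"
# prints:
# a b c d xyz
# short line
```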

7. Print all lines whose 7th field matches a regular expression.

awk '$7  ~ /^[a-f]/'

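This is also idiomatic Awk. It uses the "~" operator to test whether the seventh field "$7" matches the regular expression "^[a-f]". This regular expression matches strings that start with a lowercase letter a, b, c, d, e, or f, so the one-liner prints all lines whose seventh field starts with one of those letters. To print the lines whose seventh field does not match, use the "!~" operator: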

awk '$7 !~ /^[a-f]/'

This one-liner negates the previous one and prints all lines whose seventh field does not start with a lowercase letter a, b, c, d, e, or f.

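Another way to write almost the same thing is:

awk '$7 ~ /^[^a-f]/'

Here we negated the group of letters [a-f] by putting "^" right after the opening bracket, which turns it into [^a-f], a bracket expression that matches any single character except a through f. That's a regex trick worth knowing. The two versions differ in one edge case: if a line has fewer than seven fields, $7 is empty, so "$7 !~ /^[a-f]/" is true, while "$7 ~ /^[^a-f]/" is false because there is no first character to match.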
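The three regex one-liners can be compared side by side on a few made-up lines. The last line is there to show the edge case where the two negated versions disagree: with fewer than seven fields, $7 is the empty string, which does not match ^[a-f], but also has no first character for ^[^a-f] to match:

```shell
# Made-up input: the last line has only three fields, so its $7 is empty.
input='1 2 3 4 5 6 apple
1 2 3 4 5 6 grape
1 2 3 4 5 6 fig
too few fields'
match=$(printf '%s\n' "$input" | awk '$7 ~ /^[a-f]/')      # apple and fig lines
nomatch=$(printf '%s\n' "$input" | awk '$7 !~ /^[a-f]/')   # grape and too few fields
negclass=$(printf '%s\n' "$input" | awk '$7 ~ /^[^a-f]/')  # grape line only
printf '%s\n' "$nomatch" "$negclass"
```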

Awk one-liners explained e-book

I just wrote my first e-book, called Awk One-Liners Explained. I improved the explanations of the one-liners in this article series, added new one-liners, and added three new chapters:

Introduction to Awk One-liners
Summary of Awk Special Variables
Idiomatic Awk

The book is here:

Awk book

What's next?

If you liked this series, then here's some more Awk stuff I've created:

Awk cheat sheet – summary of Awk variables, functions and command line arguments.
10 Awk Tips, Tricks and Pitfalls - Waldner's guest post.
Awk YouTube Video Downloader – a YouTube video downloader written in GNU Awk.

Have fun with this and see you next time!