Take a look at this Perl regular expression:

perl -lne '(1x$_) =~ /^(11+?)\1+$/ && print "$_ is composite"'

This regular expression matches only the composite numbers (and doesn't match prime numbers). Can you figure out how it works? I'll give the explanation below but try to figure it out yourself. Here is what happens when you run it:

$ perl -lne '(1x$_) =~ /^(11+?)\1+$/ && print "$_ is composite"' 1 2 3 4 is composite 5 6 is composite 7 8 is composite 9 is composite 10 is composite 11 12 is composite 13

(solution below)

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

.

As you can see, it matches non-prime numbers. Here is how it works.

First, the given number (in the `$_`

variable) is converted in its unary representation via `(1x$_)`

. For example, the number `5`

gets converted into `1x5`

, which is `11111`

(`1`

repeated `5`

times).

Next, this unary string gets tested against the regular expression. If it matches, the number is a composite, otherwise it's a prime number.

The regular expression `^(11+?)\1+$`

works this way:

It determines if two or more `1`

s repeatedly make up the whole number. If two or more `1`

s repeatedly make up the whole number, the regex matches, which means that the number is composite. Otherwise it's a prime.

Let's look at what the regex does on the numbers `5`

and `4`

.

The number `5`

in unary representation is `11111`

. The `(11+?)`

matches the first two ones `11`

. The back-reference `\1`

becomes `11`

and the whole regex now becomes `^11(11)+$`

. It can't match five ones (it can match two, four, six, eight ones, etc.), therefore it fails. But since it used `+?`

, it backtracks and matches the first three ones `111`

. The back-reference becomes `111`

and the whole regex becomes `^111(111)+$`

. It doesn't match again. This repeats for `1111`

and `11111`

, which also don't match, therefore the whole regex doesn't match and the number is a prime.

The number `4`

in unary representation is `1111`

. The `(11+?)`

matches the first two ones `11`

. The back-reference `\1`

becomes `11`

and the regex becomes `^11(11)+$`

. It matches the unary string, therefore the number is not a prime.

Very clever.

I didn't invent this regular expression, it was invented in 1998 by Abigail.

Don't take this regular expression too seriously, it's actually neither a regular expression (as defined in automata theory), nor a way to check if a number is a composite number or a prime number. It's just an awesome thing that Perl and advanced regular expression engines can do.

Also, see this insightful article called The Prime That Wasn't by Andrei Zmievski for a discussion about how this regex fails for larger numbers because of backtracking.

Also, if you want to learn more about Perl one-liners, check out my Perl One-Liners Explained article series and download the perl1line.txt file.

See you next time!