Long time since the last golf. Inspired by the recent announcement of a Perl Golf book I took part in a Polish golf that was announced on the mailing list.
Given an input string that has been "encrypted" with ROT-n on STDIN and a dictionary of words (sequences of letters A-Za-z, not of \w) in @ARGV, the program needs to output the original plaintext to STDOUT.
(Formal rules).
My best solution was 62 characters, but I figured out about an hour before the golf ended that it was actually broken, and didn't have time to figure out anything better than the 65.44 below, which is currently good for a second place. The apparent winning solution of 63 doesn't seem to work either, for unrelated reasons. So the explanation might be for the winning entry, or it might not.
#!perl -p0
You know the drill. -p handles reading the input and printing the output. Use -0 to read the input in one go, instead of a line at a time.
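For readers who have not used these switches, here is a rough sketch of the effect of -0. It assumes the input contains no NUL characters, and the helper name read_records is made up for illustration. With no digits after it, -0 sets the input record separator $/ to the NUL character, so ordinary text comes back as a single record.

```perl
use strict;
use warnings;

# Hypothetical helper: read all records from a string the way -0 would,
# i.e. with the NUL character as the record separator.
sub read_records {
    my ($text) = @_;
    open my $in, '<', \$text or die "cannot open in-memory handle: $!";
    local $/ = "\0";        # what a bare -0 sets $/ to
    my @records = <$in>;    # list context: every record at once
    close $in;
    return @records;
}
```

Text with several lines but no NUL comes back as one record, which is why the main body of the program sees the whole input in $_ at once.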
INIT{%a=map{pop,1}@ARGV}
In the INIT block, pop all command line parameters to make -p read from STDIN. Use the removed arguments as keys in a hash table for detecting dictionary words. Using the symbol table with something like $$_=1while$_=pop would save a few characters, but that's incorrect since $ARGV is automatically set to '-' on entering the main loop, so the word ARGV would always look like a dictionary word.
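An un-golfed sketch of the same idea, for comparison. The name build_dictionary is invented here, and the sketch takes the word list as an argument instead of popping @ARGV, so emptying @ARGV is left to the caller.

```perl
use strict;
use warnings;

# Hypothetical helper: turn a list of words into a lookup hash.
# Keys are case-sensitive, just like the %a hash in the golfed program.
sub build_dictionary {
    my @words = @_;
    my %known = map { $_ => 1 } @words;
    return \%known;
}
```

A caller would do something like my $known = build_dictionary(@ARGV); @ARGV = (); where the second statement plays the role of the pop in the golfed version: with @ARGV empty, the implicit input loop falls back to STDIN.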
$a{$&}||y/B-ZA-Gb-za/A-z/while/\pL+/g
At the start of the main body $_ contains the whole ROT-n text. On the first iteration /\pL+/g will match the first word (letters only; \pL is essentially [a-zA-Z]). //g works differently in scalar context than in list context: it will only match once per call, but the next call will start at the location in the string where the last match ended. If a match was found it returns true, otherwise false.
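A small sketch of the scalar versus list behaviour described above. The function names are made up for illustration, and pos() is used only to show where the next scalar match would resume.

```perl
use strict;
use warnings;

# Scalar context: one match per call; pos() records where the next
# call will resume. Returns [word, position after the word] pairs.
sub words_with_positions {
    my ($text) = @_;
    my @found;
    while ($text =~ /(\pL+)/g) {
        push @found, [$1, pos($text)];
    }
    return @found;
}

# List context: a single call returns every match.
sub all_words {
    my ($text) = @_;
    my @words = $text =~ /\pL+/g;
    return @words;
}
```

For the string "ab, cd" the scalar loop runs twice, first stopping after "ab" and then after "cd", while the list version hands back both words in one go.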
In the body of the while we first check if the word we matched is in the dictionary. If it isn't (i.e. $a{$&} is untrue) $_ obviously isn't plaintext yet, so we rotate it by one step with y///. This contains the only tricky bits in the program:
- Changing $_ causes the scalar //g to be reset, and start matching from the start of the string.
- Doing the rotation backwards (A -> Z, B -> A, ..., Z -> Y) instead of the more intuitive direction (A -> B, B -> C, ... Z -> A) allows writing the transliteration in a way that saves one character.
There are six characters ([\]^_`) between Z and a. By adding six extra characters into the right place on the left side of the transliteration operation (with -G) we can use the range A-z on the right side, instead of specifying separate ranges for upper- and lowercase letters. The extra B to G are harmless, since y/// ignores a character that has already appeared earlier on the left side. Compare:
y/A-Za-z/B-ZAb-za/
y/B-ZA-Gb-za/A-z/
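To check that the compact form really is a plain backward rotation, and to see the whole loop without the golfing, here is a sketch. The sub names and the 25-rotation guard are my additions for illustration; the golfed program has no such guard and would loop forever if no rotation matched the dictionary.

```perl
use strict;
use warnings;

# Backward rotation by one step, using the compact transliteration.
sub rotate_back {
    my ($text) = @_;
    $text =~ y/B-ZA-Gb-za/A-z/;
    return $text;
}

# The same rotation with separate upper- and lowercase ranges.
sub rotate_back_plain {
    my ($text) = @_;
    $text =~ y/A-Za-z/ZA-Yza-y/;
    return $text;
}

# Un-golfed version of the main loop. Returns nothing if no rotation
# makes every word a dictionary word (guard added for this sketch).
sub decode {
    my ($text, @dictionary) = @_;
    my %known = map { $_ => 1 } @dictionary;
    my $rotations = 0;
    while ($text =~ /(\pL+)/g) {
        next if $known{$1};             # word is fine, try the next one
        return if ++$rotations > 25;    # all 26 variants tried
        $text = rotate_back($text);     # modifying $text resets pos($text)
    }
    return $text;
}
```

Calling decode("Uryyb, jbeyq!", qw(Hello world)) walks back through thirteen rotations before both words are found in the dictionary.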
FWIW, the 65.48 by Piotr Fusik is by far the coolest solution. Wish I'd thought of that...