Factor: a stack-based programming language

,

As you may already know, I’m a big fan of stack-based languages such as Forth, functional languages such as Haskell and reflexive languages such as Smalltalk. You can imagine how happy I was when I discovered Factor a few days ago: it combines all those aspects.

Today, a friend sent me someone email signature and asked me if I could decipher it:

01101001001000000110011001110101011000110110101101100101011001000010
00000111100101101111011101010111001000100000011011010110111101101101
00001101000010100110000101101110011001000010000001101110011011110111
01110010000001100110011011110111001000100000011110010110111101110101

As any programmer on the Earth would have, I immediately assumed that those were ASCII codes printed in binary format. I had a Factor interactive shell opened in one of my windows, so I cut and pasted the whole string (surrounded by quotes) and entered:

8 group [ 2 base> ] map >string print

and the cleartext version appeared instantly. All in all, it took me around 20 seconds to uncover the original text using Factor.

How does this work? Factor is a stack-based language, meaning that data are put onto a stack and words (equivalent of functions in other languages) use the data on the stack and put results there. Factor is a (dynamically) typed language: complex data can be pushed onto the stack, while untyped languages such as Forth can only push numbers there.

Writing the string pushes it on the stack. Using

8 group

takes the string on the top of the stack, considers it as a sequence (a succession of characters), group them by eight and returns an array of strings of length 8. At this point, there is only one element on the stack: an array of eight-characters-long strings.

Then

[ 2 base> ]

pushes another element on the top of the stack: a quotation (the equivalent of a lambda expression in functional languages), which is a block containing code. The base> word consumes two elements from the stack, a string S and a number B, and pushes back a number which is the value represented by S in base B. For example, the expression

"01101001" 2 base>

will let the value 105 on the stack, as 01101001 in binary represents 105 in decimal.

map

takes a sequence and a quotation. It represents the classical map operation in functional languages: it applies the quotation to every element of the sequence and gathers the results in a new sequence. As a consequence, each eight-characters-long string gets transformed into its decimal representation. At the end, we end up with a single element on the stack, which is an array containing all the ASCII codes of the sentence.

>string

transforms a sequence of ASCII codes into the corresponding string. Then

print

is similar to C’s puts and prints the string on the standard output while recognizing special sequences such as and .

The content of the unencrypted text itself is not important; my point is that Factor is very compact and its stack-oriented nature helps writing concise and clear programs. For example, here is one of the many possible implementations of the reverse functionality: it takes a string from the stack and lets its ASCII representation in binary onto the stack.

[ 2 >base 8 CHAR: 0 pad-head ] [ append ] map-reduce

I’m sure that you’re thrilled to know that “Hello, world!” encodes as

0100100001100101011011000110110001101111001011000010
0000011101110110111101110010011011000110010000100001

(edited on 2009-06-13 to use pad-head and map-reduce)

If you like this post, you can send some bitcoin dust to 17Kr97KJNWrsAmgr3wJBVnLAuWqCCvAdMq.
blog comments powered by Disqus