Note: It has been pointed out that a problem occurs when encoding strings of consecutive identical characters, e.g. 1111111111. I will fix this when I get a chance.

Try out LZW data compression using my PHP implementation below: encode the article taken at random from Wikipedia, or input your own text.

LZW doesn't work well with short strings. Try encoding 'hello'!

The encoded output will hopefully be shorter than the input!
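
To see why short strings do poorly, here is a minimal sketch of a textbook LZW encoder. It is not the code behind this page: the function name lzw_encode_sketch is made up for illustration, and the starting dictionary of all 256 single-byte strings is an assumption.

```php
<?php
// Hypothetical helper for illustration only, not this page's implementation.
function lzw_encode_sketch(string $input): array
{
    // Assumption: the dictionary starts with every single-byte string.
    $dict = [];
    for ($i = 0; $i < 256; $i++) {
        $dict[chr($i)] = $i;
    }
    $next = 256;   // next free code
    $word = '';    // longest match found so far
    $codes = [];

    $len = strlen($input);
    for ($i = 0; $i < $len; $i++) {
        $x = $input[$i];
        if (isset($dict[$word . $x])) {
            // word+x is already known: keep extending the match.
            $word .= $x;
        } else {
            // Emit the code for word, learn word+x, restart from x.
            $codes[] = $dict[$word];
            $dict[$word . $x] = $next++;
            $word = $x;
        }
    }
    if ($word !== '') {
        $codes[] = $dict[$word];   // flush the final match
    }
    return $codes;
}

print_r(lzw_encode_sketch('hello'));
```

With 'hello' this gives five codes, one per input character, because no two-character sequence repeats, so nothing is saved. A run such as 'aaaaaa' does better: it comes out as three codes.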

The first 16 bits of the output are used to store the number of bits used to encode each codeword.
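
A rough sketch of how such a 16-bit header could work, assuming every codeword is written at one fixed width chosen from the largest code. The function names and the use of a string of '0' and '1' characters to stand in for the bit stream are illustrative assumptions, not the code behind this page.

```php
<?php
// Hypothetical helpers for illustration only.
function pack_codes_sketch(array $codes): string
{
    // Assumption: one fixed width, just wide enough for the largest code.
    $width = $codes ? strlen(decbin(max($codes))) : 1;

    // First 16 bits: the width itself, zero-padded on the left.
    $bits = str_pad(decbin($width), 16, '0', STR_PAD_LEFT);

    // Then each codeword, padded to exactly $width bits.
    foreach ($codes as $code) {
        $bits .= str_pad(decbin($code), $width, '0', STR_PAD_LEFT);
    }
    return $bits;
}

function unpack_codes_sketch(string $bits): array
{
    // Read the width back from the 16-bit header.
    $width = bindec(substr($bits, 0, 16));
    $codes = [];
    for ($pos = 16; $pos + $width <= strlen($bits); $pos += $width) {
        $codes[] = bindec(substr($bits, $pos, $width));
    }
    return $codes;
}

$packed = pack_codes_sketch([104, 101, 108, 108, 111]);
print_r(unpack_codes_sketch($packed));
```

The decoder reads the width first, so it knows where each codeword ends without any separators in the stream.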

Fixed version note
You will notice on this old version that if you encode and then decode the text, the result is not the same as the original. The algorithm looked perfect, but after a while of debugging I finally noticed that the problem was caused by the PHP function in_array, which I used to determine whether word+x is in the dictionary (see LZW algorithm).

The following code demonstrates the problem:
$myarray = array(" 1", "a");
echo in_array("1", $myarray); //echo in_array("1", $myarray, true); will work correctly
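The output is 1 (true). It looks as if in_array trims the string " 1" so that it equates to "1", but what actually happens is that in_array compares loosely (==) by default, and PHP treats both "1" and " 1" as numeric strings and compares them as numbers. The fix was simple: pass true as the strict parameter of in_array, so that it compares with === and the two strings no longer match. :-)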
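
The difference between the two modes can be checked side by side. The last lookup is an alternative sketch, not what this page does: it keeps dictionary entries as array keys and uses isset, which also avoids the loose comparison. The variable names are made up for illustration.

```php
<?php
$dictionary = [" 1", "a"];

// Default (loose) mode: "1" == " 1" is true, because both are
// numeric strings and PHP compares them as numbers.
$loose = in_array("1", $dictionary);

// Strict mode: "1" === " 1" is false, because the strings differ.
$strict = in_array("1", $dictionary, true);

// Alternative sketch: store entries as keys and look them up with isset().
// The key " 1" is not the same key as "1", so this is also false.
$byKey = [" 1" => 0, "a" => 1];
$viaKey = isset($byKey["1"]);

var_dump($loose, $strict, $viaKey);
```

A keyed lookup also avoids scanning the whole dictionary on every check, which matters as the LZW dictionary grows.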

