Chosen- and Known-Plaintext Attacks on Encrypted Magic Folders
Mike Stay
Jul 2005: I demonstrated this attack to the author of Encrypted Magic Folders four years ago; after seeing the attack, he was convinced to change to strong encryption and currently uses 128-bit Blowfish. This attack applies only to the version being sold in 2001.

Encrypted Magic Folders is a very popular shareware title from PC-Magic. Magic Folders, its predecessor, was designed to intercept file IO requests in Windows and return "File Not Found" if the file was in a marked folder. It does this reasonably well, although booting into DOS exposes all the files. To solve this problem, PC-Magic chose to roll their own encryption. Reading the documentation about EMF's encryption is frightening: as one cypherpunk put it, "Smells like snake oil, looks like snake oil, it even has bits of snake scales in it." Here we show that EMF is susceptible to a known-plaintext attack that needs as few as 768 bytes.

The encryption uses a rather complicated process to initialize two 256-byte tables. That process is unimportant; all we need to know is that the first table contains random bytes, while the second is a random permutation. This means that in the first table, some bytes may occur more than once while others will not appear at all, and in the second table each byte appears exactly once.

The decryption process is as follows:

P = t2[ ( ( C ^ t1[ (pos & 0xFF) ^ 0xFF ] ) - t1[ pos % modlen] ) & 0xFF ],

where P is the plaintext byte, C is the corresponding ciphertext byte, pos is the position in the file, modlen is a known quantity between 176 and 193 that is stored in a specific file in a hidden directory created by EMF, ^ is XOR, & is AND, and % is mod.

Note that the plaintext byte only depends on the ciphertext byte and the position for a given password; therefore, if one encrypts two documents with the same password, anywhere that they have two bytes in the ciphertext that match, the plaintext bytes will also match. Note also that the encryption repeats after

LCM( 256, modlen )

bytes, where LCM is the least common multiple. This can be as little as 768 bytes for modlen=192; it will not exceed 49K.

This means that if I have the victim encrypt 256 files of length 49K, each filled with a single byte, I can decrypt every file encrypted under that password simply by finding which file contains the matching ciphertext byte.

Note now that since t2 is a permutation, if two plaintext characters at different positions are the same, then the index into t2 will also be the same. Given a few plaintext bytes that are all the same (P in what follows), we can write equations like the following:

P = t2[ ( ( C ^ t1[ a ] ) - t1[ b ] ) & 0xFF ],

P = t2[ ( ( C' ^ t1[ a' ] ) - t1[ b' ] ) & 0xFF ],

P = t2[ ( ( C'' ^ t1[ a ] ) - t1[ b' ] ) & 0xFF ].

imply

( ( C ^ t1[ a ] ) - t1[ b ] ) & 0xFF = ( ( C' ^ t1[ a' ] ) - t1[ b' ] ) & 0xFF = ( ( C'' ^ t1[ a ] ) - t1[ b' ] ) & 0xFF.

The a's and b's are known, since they only depend on the position. We can expect to find equations like the third that include a and b' since a repeats every 256 bytes and b repeats every modlen bytes.

If we guess t1[a] and t1[b], then we can calculate the index i into t2. The third equation must also equal i, since it gave the same plaintext byte and there is only one occurrence of that byte in t2. We know C'' and we have guessed t1[a], so we can calculate t1[b']. Once we have calculated t1[b'], we can move on to the second equation and solve for t1[a'] in the same way.

With enough plaintext, we can get equations that include all possible locations in t1; guessing two bytes allows us to derive what the other bytes must be. We can filter wrong guesses by throwing out ones that cause conflicts--that is, we have already derived values for two entries in the table, but when we use them in one of our equations, we do not get the right index.

Once we have a complete, consistent table for t1, we can take each ciphertext byte, calculate its index into t2 given the position of the ciphertext byte, and set the value at that index to equal the corresponding plaintext byte.

How much plaintext is enough? We need at least 256 bytes of the plaintext to be the same, since there are 256 entries in the table and we need one equation for each entry. We also need equations where we have one byte we already know, so let's be generous and double it to 512.

If all the plaintext bytes were the same, we'd only know one index into t2. To find each of the bytes in t2, we need one of each plaintext byte. The minimum plaintext requirement, therefore, is on the order of 768 bytes, where the first 512 are the same and the last 256 cover all possible bytes. We expect that in a random plaintext, we would need around N log N = 256 * 8 = 2K bytes to find one of each byte for filling t2, and around 64 K to find 256 of the same byte. Fortunately, many file formats repeat bytes very often; headers of OLE documents like Word and Excel are full of FF's, for instance, so we often can get by with less.

A known-plaintext attack can therefore be mounted with as few as 768 bytes, though for random files we expect the attack to require around 64K on average.

Yet again, the axiom holds true: use public crypto algorithms and protocols.