Breaking a Monoalphabetic Encryption System Using a Known Plaintext Attack

We recall that monoalphabetic substitution is a system of encryption where every occurrence of a particular plaintext letter is replaced by the same cyphertext letter. For instance, Caesar substitution is monoalphabetic while Vigenere is not. A 2x2 Hill encryption is a monoalphabetic substitution acting on pairs of letters. Keep in mind that the definition of a monoalphabetic substitution allows for the possibility that two distinct plaintext letters are replaced by the same cyphertext letter. However, to break this system using a known plaintext attack, we will require that any two distinct plaintext letters are replaced by two distinct cyphertext letters.
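To make the definition concrete, here is a minimal Python sketch. It is only an illustration and is not the Applet's code; the names encrypt, is_one_to_one and caesar are our own. A key is stored as a dictionary from uppercase plaintext letters to lowercase cyphertext letters, and the second function checks the extra requirement that distinct plaintext letters receive distinct cyphertext letters.

```python
import string

def encrypt(plaintext, key):
    """Replace every plaintext letter found in the key; leave other characters alone."""
    return "".join(key.get(ch, ch) for ch in plaintext.upper())

def is_one_to_one(key):
    """True if no two plaintext letters share a cyphertext letter."""
    return len(set(key.values())) == len(key)

# A Caesar substitution (shift by 3) written as a monoalphabetic key.
caesar = {p: string.ascii_lowercase[(i + 3) % 26]
          for i, p in enumerate(string.ascii_uppercase)}

print(encrypt("ATTACK", caesar))             # dwwdfn
print(is_one_to_one(caesar))                 # True
print(is_one_to_one({"A": "x", "B": "x"}))   # False: A and B collide
```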

To encrypt (decrypt) using the Applet below, simply cut and paste your plaintext (cyphertext) into the textarea, select Encrypt (Decrypt) and start entering your key. The plaintext (cyphertext) will be encoded (decoded) as you enter the key. To enter a key, first click in the square below (above) the plaintext (cyphertext) letter you want to encrypt (decrypt). The square should now be highlighted in yellow. Now type the corresponding cyphertext (plaintext) letter. The square to the right should now be highlighted. To delete a letter, click on the appropriate box and simply press Back Space or Del. The Space Bar and Arrow keys can be used to cycle through the key without editing it. Notice that letters that do not yet occur in the key are shaded gray.

IMPORTANT: The applet allows for the partial encryption/decryption of a monoalphabetic substitution. To this end, we will use the convention that plaintext letters are always in uppercase and cyphertext letters are always in lowercase.
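The case convention can be sketched in a few lines of Python. This is a hypothetical helper of our own (partial_decrypt), not the Applet's implementation: cyphertext letters that appear in the partial key are replaced by their uppercase plaintext letters, and everything else stays in lowercase.

```python
def partial_decrypt(cyphertext, key):
    """key maps uppercase plaintext letters to lowercase cyphertext letters."""
    inverse = {c: p for p, c in key.items()}
    # Known letters become uppercase plaintext; unknown ones stay lowercase.
    return "".join(inverse.get(ch, ch) for ch in cyphertext)

partial_key = {"T": "q", "E": "j"}
print(partial_decrypt("qfj qjy", partial_key))   # TfE TEy
```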

To break a monoalphabetic substitution using a known plaintext attack, we can take advantage of the fact that any string of letters in the original plaintext message is replaced by a string of cyphertext letters with the same pattern of repetitions. In other words, if two letters of plaintext are the same, then their corresponding letters of cyphertext are the same, and if two letters of plaintext are distinct, then their corresponding letters of cyphertext must also be distinct. To illustrate this, if we know that the word "AMMUNITION" appears in the plaintext, then we can look for strings of 10 consecutive letters of cyphertext that have the following pattern:

The 2nd and 3rd letters are the same
The 5th and 10th letters are the same (and different from the 2nd letter)
The 6th and 8th letters are the same (and different from the 2nd and 5th letters)
All other letters are distinct.
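The pattern search described above can be sketched in Python as follows. The function names pattern and find_matches are our own, and we assume that spaces and punctuation in the cyphertext are ignored, so that a match may run across a word boundary; the Applet may handle this differently.

```python
def pattern(text):
    """Number each letter by the order of its first appearance."""
    first_seen = {}
    return tuple(first_seen.setdefault(ch, len(first_seen)) for ch in text)

def find_matches(cyphertext, word):
    """Return every run of consecutive cyphertext letters with the pattern of word."""
    letters = "".join(ch for ch in cyphertext.lower() if ch.isalpha())
    target = pattern(word.upper())
    n = len(word)
    return [letters[i:i + n]
            for i in range(len(letters) - n + 1)
            if pattern(letters[i:i + n]) == target]

print(pattern("AMMUNITION"))   # (0, 1, 1, 2, 3, 4, 5, 4, 6, 3)
print(find_matches("zz dppxqlwlrq zz", "AMMUNITION"))   # ['dppxqlwlrq']
```

Two strings have the same pattern exactly when equal letters sit in the same positions, which is the condition listed above for "AMMUNITION".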

Once we have found all possible matches, we can use a chi-squared statistic to determine which one is the most likely match for the known plaintext.
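The exact statistic used by the Applet is not spelled out here, so the following Python sketch shows only one plausible way to score a match. For each plaintext letter in the known word, it compares how often the proposed cyphertext letter occurs in the whole cyphertext with how often the plaintext letter is expected to occur in English. The frequency table holds approximate, commonly quoted percentages, and the name chi_squared is our own. A smaller value indicates a better fit.

```python
from collections import Counter

# Approximate English letter frequencies in percent (an assumption of this sketch).
ENGLISH_FREQ = {
    "E": 12.7, "T": 9.1, "A": 8.2, "O": 7.5, "I": 7.0, "N": 6.7, "S": 6.3,
    "H": 6.1, "R": 6.0, "D": 4.3, "L": 4.0, "C": 2.8, "U": 2.8, "M": 2.4,
    "W": 2.4, "F": 2.2, "G": 2.0, "Y": 2.0, "P": 1.9, "B": 1.5, "V": 1.0,
    "K": 0.8, "J": 0.15, "X": 0.15, "Q": 0.10, "Z": 0.07,
}

def chi_squared(cyphertext, word, match):
    """Sum of (observed - expected)^2 / expected over the letter pairs of a match."""
    letters = [ch for ch in cyphertext.lower() if ch.isalpha()]
    counts = Counter(letters)
    total = len(letters)
    score = 0.0
    # Each distinct (plaintext, cyphertext) pair contributes once.
    for p, c in set(zip(word.upper(), match.lower())):
        expected = total * ENGLISH_FREQ[p] / 100
        score += (counts[c] - expected) ** 2 / expected
    return score

# Toy cyphertext of 100 letters: q, h and j occur about as often as T, H and E.
sample = "q" * 9 + "h" * 6 + "j" * 13 + "z" * 72
print(chi_squared(sample, "THE", "qhj") < chi_squared(sample, "THE", "zqh"))   # True
```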

The Applet below is programmed to illustrate this codebreaking process.

Upon pressing the Random Cyphertext button, the Applet will display some text which was encrypted using a Monoalphabetic Substitution with a randomly selected key.
Press the Break button. Enter a word that you know/believe to be part of the original plaintext message and press Search or the Return key. The Applet will calculate all possible matches for the search word and display a list (in groups of 10) in increasing order according to the corresponding chi-squared statistic. To sort the list instead in decreasing order according to the number of times a match occurs, press the "Frequency" button.
To see a list of the single letter frequencies taken from the cyphertext, enter a one-letter search word and press Search.
Some possible matches might conflict with the parts of the key you have already determined. In this case, one asterisk is placed next to the statistic for each conflict it has with your current key. For example, suppose that you have already determined T->q and E->j and you search for the word "THE". You should only be interested in the possible matches of the form "q_j". So if "qfk" is a possible match, the Applet will put one asterisk (*) next to this match since it conflicts with your assumption that E->j. Likewise, "qjy" would have two asterisks next to it since it violates E->j in two different ways, namely H->j and E->y.
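The asterisk count in this example can be reproduced with a short Python sketch. The function count_conflicts is our own illustration, and it assumes that each distinct pair of plaintext and cyphertext letters in a match contributes at most one conflict; the Applet's exact counting rule may differ in other cases.

```python
def count_conflicts(word, match, key):
    """Count the letter pairs of a match that disagree with the current key."""
    inverse = {c: p for p, c in key.items()}
    conflicts = 0
    for p, c in set(zip(word.upper(), match.lower())):
        # Conflict if p is already sent elsewhere, or c is already used by another letter.
        if key.get(p, c) != c or inverse.get(c, p) != p:
            conflicts += 1
    return conflicts

key = {"T": "q", "E": "j"}
print(count_conflicts("THE", "qfj", key))   # 0
print(count_conflicts("THE", "qfk", key))   # 1  (E->k)
print(count_conflicts("THE", "qjy", key))   # 2  (H->j and E->y)
```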
To list only those possible matches that do not conflict with the current key, select "No" next to "Show All".
Once you have selected a possible match for your search word, press the Okay button, and this selection will be incorporated into the current key and the message will be partially decrypted.
Now, hopefully, you will have enough of the message decoded to start making some guesses as to what the remaining letters of cyphertext stand for.

Here's an example of how this codebreaking process might take place.