Project 1C: Code Cracking Improvements

In our last assignment, you implemented a one-shot candidate key based on the observed frequency distributions. Most likely, your candidate key did not recover the original message. For this assignment, we will implement a program that has a much better chance of cracking messages that were encrypted using any one of our substitution cihpers. Our goals include:

Implementation guidelines

As was the case in Project 1B, in order to crack messages encrypted with a substitution cipher, our strategy will be to find the frequencies of letters in the cipher text, and use those frequencies to help us guess the key that gave rise to the text. However, we will usually need to try more than just one candidate key. To do that systematically, we will implement a search through the space of candidate keys.

I am assuming that your search method operates with respect to three key pieces of information: the partial key consisting of the subset of ciphertext letters assigned to given plaintext letters; the remaining unassigned ciphertext letters that have not be committed to a plain text letter; and the unassigned plaintext letters to which a remaining ciphertext letter could be assigned. This conceptualization leads no a nice recursive formulation of the algorithm. (Of course, you need to deal with avoiding unintended side-effects, etc.)

However, your program should still crack messages that contain a small number of misspelled words or words that do not appear in the dictionary.

Requirements

  1. Create a WordWeaver. Write a class, WordWeaver, that creates an efficient means to access all words that have a given letter in a given position. For each one of some maximum number of letter positions (say, seven), the class creates a HashMap<Character,HashSet<String>> that maps characters in a certain position to all of the words that have that letter in that position. For example, the HashSet associated with the character ‘a’ in the second position might have the words: bat, cat, rat, ball, lathe, sack, and so on. The point of this class is to help support the evaluation of candidate keys (discussed below). Your WordWeaver class must include a method, public int possibleWords(String aPartialWord), that returns the number of words that are consistent with the given partial word For example, the (plaintext) partial word, ‘___xzj_er’, cannot become a valid word no matter which letters are selected for the remaining underscores; thus, the possibleWord method would return 0.
  2. Evaluate a (potentially partial) candidate key. In your SubstitutionCracker class, add a method public static int evalCandidateKey(String canKey, String cipherText) that uses the given candidate key to decode the cipherText, breaks that into space-delimited ‘words’, and calls WordWeaver's possibleWords method on each such word. The value of the candidate key should be some combination of the values of each word. Exactly how you combine the values that possibleWords returns for each word is up to you. You are invited, but not required, to use additional factors in your evaluation of candidate keys. Such factors might include the plausibility of a candidate key based on the frequency rank of the letter assignments. That is, a key that maps the most frequent letter in plaintext to the least frequent letter in the ciphertext is unlikely to be correct.
  3. Implement beam-search through space of candidate keys. With your WordWeaver and evaluation function in place, you are ready to search for the key that will decode a cipher text. Our strategy here is to work our way through letter-assignments in decreasing order of their frequencies as they appear in English. That is, you will first explore assignments for the space character, followed by the second most frequent, and so on. Add a method, public static void beamSearch(List candidateKeys, String ciphertext, int beamWidth), to your SubstitutionCracker class. The list of candidateKeys parameter will be our beam; its length may be less than but will never exceed our beam wdith. We will initially call beamSearch with a blank key; that is, 29 underscores. The method should (a) construct successor keys for each candidate in the list (initially only one), (b) evaluate each candidate key using evalCandidateKey, (c) sort the resulting list of keys by their evaluation scores, (d) prune the sorted list to the beam width (if necessary) dropping the candidates with the lowest scores, and (e) finally repeating the entire process either recursively or iteratively. This will terminate when all of the letters in the ciphertext have been assigned. At this point, you should print out the decodings of the ciphertext that are made by each candidate key in the order determined by your evaluation function.
  4. Decode my encrypted message. With all these modifications to your SubstitutionCracker class in place, modify your main method to accept a command-line argument that is the filename of a ciphertext. With the contents of that file, run your beamSearch method which should hopefully output the correct plaintext first, or near the top of the output decodings. You should experiment with different beam widths. I will provide two ciphertexts on Eureka that you should be able to crack.

Submission Instructions:

On your machine where you are doing your homework, create a folder called <your email name> followed by “P02”. For example, someone with email address “cjones” would create a folder called “cjonesP02”. Inside that folder, place plain text file(s) containing your answers to any written questions required of you. Also, place whatever Java and documentation files are necessary for your programming project in the same folder. Finally, either tar or zip the folder so that when I extract it, the folder ”<emailname>P02“ will be created. Finally, submit via Eureka.