Project 1B: Code Cracking

For this assignment, we will implement a program that cracks messages encrypted using any one of our substitution ciphers. Our goals include:

Implementation guidelines

In order to crack messages encrypted with a substitution cipher, our strategy will be to find the frequencies of letters in the cipher text, and use those frequencies to help us guess the key that gave rise to the text. To do that, we will also need an estimate on the frequencies of letters in standard English text. With the frequencies in hand, we can establish an initial correspondence between letters in the cipher text and the plain text based on frequencies of those letters in the respective texts. This correspondence represents a candidate key. We can evaluate the plausibility of candidate keys by checking the words that appear in the plain text resulting from the key against a dictionary of English words. You'll want to use the dictionary of words that I provide on our Eureka page.

Requirements

  1. Letter frequencies. Write the class, FrequencyMap, that records mappings between characters and their frequencies in a given document. Your class should have a single constructor that consumes a String argument that is the filename of a document for which we want its frequency map. You should design your class with an eye toward extracting the characters in order from most to least frequent. As a Map, your implementation can be simplified by not reinventing the wheel; the TreeMap class may be helpful; that is, by either implementing or extending a Map. Consequently, your map will naturally consist of Map.Entry<Character,Integer>s.
  2. Letters ordered by frequencies. A TreeMap (for example) orders its map entries by a natural ordering that makes comparison of Keys efficient. Once we have determined the frequencies, we want to access the letters in an order that is decreasing with their frequency. Augment your FrequencyMap class with a method, public List<Map.Entry<Character,Integer>> getSortedEntries() that returns a list of character-integer map entries ordered by decreasing integer counts (which represent our frequencies). You should implement a Comparator<Map.Entry<Character,Integer>> class to support this.
  3. Starting the cracking program. Write another class, SubstitutionCracker, that will use your FrequencyMap class and will eventually handle the cracking of an encrypted message. For now, we will generate and evaluate only a single candidate decrypted message. Your SubstitutionCracker class should take a single command-line argument that represents the filename of an encrypted text that we want to crack.
    • Find frequency distributions. By using your FrequencyMap class, find the frequencies of Crime and Punishment, by Fyodor Dostoevsky; we will be using this text as our measure of the frequency of letters in plain English. With another frequency map, determine the frequencies of letters in the encrypted message that is provided as the command-line argument; this encrypted text should be generated using your SubstitutionCipher class from part 1 of our project.
    • Create candidate key. Using the frequencies in Crime and Punishment and in your cipher text, create a candidate key by matching the most frequent characters, the second-most frequent, and so on. Because your cipher text may not use every character in our 29-letter character set, you should introduce the underscore, ‘_’, as an ‘unspecified’ key letter; this will turn out to be helpful in part three of our project.
    • Evaluate candidate key. We can evaluate candidate keys based on how they transform the cipher text into plain text. If the resulting plain text has many words that appear an our dictionary of English words (provided on Eureka), then we place some confidence in the candidate key. When the proportion of such words is high we or more likely to have found the key by which the cipher text was created. Evaluate the words (space-separated) generated by the candidate key found in the previous step. Report (i.e., display) the number of words found in the dictionary relative to the total number of words in the resulting plain text message.

Submission Instructions:

On your machine where you are doing your homework, create a folder called <your email name> followed by “P1B”. For example, someone with email address “cjones” would create a folder called “cjonesP1B”. Inside that folder, place plain text file(s) containing your answers to any written questions required of you. Also, place whatever Java and documentation files are necessary for your programming project in the same folder. Finally, either tar or zip the folder so that when I extract it, the folder ”<emailname>P1B“ will be created. Finally, submit via Eureka.