In recent years, several peer-reviewed journals and conferences have been embarrassed by accepting randomly generated research papers. You can read about one example at this site (with link to actual paper). While the results are impressive in an amusing sort of way, it is instructive to discover just how far we can get with extremely naive methods. In this exercise, you will implement your own ‘random’ text generator, and we will try to get a sense of how much additional work would be necessary to get a randomly generated paper accepted. In the process, I am hoping that, among other things, you will:
As in other exercises, you may use the language of your choice with the proviso that it run on my laptop, etc. As always, your code must be well documented.
You should write a one-page report on your activities and findings for this assignment. Your report should serve as a stand-alone document; thus, it should describe the problem or focus, the approach that you employed, and how well that approach performed. However, you should weight the description toward the fourth requirement above. You may include short generated sequences for unigram, bigram, and trigram models, but I am more interested in seeing 30 or so words generated by a trigram model set against improved text of equal length based on your efforts in step 4. You are welcome to include figures if you think they contribute to the report; however, make sure your picture really is worth a thousand words.
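For orientation, here is a minimal sketch of the kind of n-gram generator that could produce the unigram, bigram, and trigram samples mentioned above. It is one possible baseline and not the required design: the function names `build_ngram_model` and `generate`, the whitespace tokenization, and the restart-on-dead-end behavior are all assumptions made for illustration.

```python
import random
from collections import defaultdict


def build_ngram_model(tokens, n):
    """Map each (n-1)-token context to the list of tokens that follow it.

    Followers are stored with repetition, so sampling uniformly from the
    list reproduces the observed frequencies. For n == 1 the only context
    is the empty tuple, which gives a plain unigram model.
    """
    model = defaultdict(list)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        model[context].append(tokens[i + n - 1])
    return model


def generate(model, n, length, seed=None):
    """Generate `length` tokens by repeatedly sampling a follower of the
    last n-1 tokens (hypothetical helper, not part of the assignment)."""
    rng = random.Random(seed)
    contexts = list(model.keys())
    output = list(rng.choice(contexts))  # start from a random context
    while len(output) < length:
        context = tuple(output[len(output) - (n - 1):])
        followers = model.get(context)
        if followers:
            output.append(rng.choice(followers))
        else:
            # Dead end (context seen only at the end of the corpus):
            # assumption: restart from a fresh random context.
            output.extend(rng.choice(contexts))
    return output[:length]


if __name__ == "__main__":
    corpus = "the cat sat on the mat and the cat ran".split()
    trigram_model = build_ngram_model(corpus, 3)
    print(" ".join(generate(trigram_model, 3, 30, seed=0)))
```

Storing followers as a list with duplicates keeps the sketch short; a count table with weighted sampling would use less memory on a large corpus.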
I am providing a modified template file that you should use to format your one-page report. (If you would rather use LaTeX, you may use the style file from the ACM that was linked previously.) Your affiliation should be “Westmont”. Whether you use LaTeX or Word, you should use the template with only the following modifications:
Include a README file with your final submission. It should serve as an index to the files that you are submitting, and include instructions for running your program.
You should bundle your files (code, README, and report pdf) in a gzipped tar file. Name your gzipped tar file with your Westmont email name and “P5” (no spaces); for example, someone named Eva Bailey might create a folder called “evabaileyP5” or “ebaileyP5”. When I open your submission, your files should be contained within an easily identifiable sub-directory.
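If you would like to script the bundling step, the following sketch uses Python's standard `tarfile` module. It assumes your files already sit in a folder named after your email name plus “P5”, and that the archive takes the same name with a `.tar.gz` extension; the extension and the helper name `bundle_submission` are assumptions, not requirements stated above.

```python
import tarfile
from pathlib import Path


def bundle_submission(folder, archive_name=None):
    """Create a gzipped tar file containing `folder` as its top-level
    sub-directory. Hypothetical helper; the default archive name
    (<folder>.tar.gz) is an assumption."""
    folder = Path(folder)
    archive = Path(archive_name or f"{folder.name}.tar.gz")
    with tarfile.open(archive, "w:gz") as tar:
        # arcname keeps every file under one identifiable sub-directory.
        tar.add(folder, arcname=folder.name)
    return archive


if __name__ == "__main__":
    # Example folder name taken from the naming convention above.
    print(bundle_submission("ebaileyP5"))
```

Listing the result with `tar -tzf ebaileyP5.tar.gz` should show every path beginning with `ebaileyP5/`.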