EA Part 3: Crossover Reproduction and Diversity

Updates


In Part 1 and Part 2 of the project, we relied on mutation to move individuals closer to the target or into a more fit configuration. This only works (at all) if there is a probabilistic bias between the forward and backward mutation rates. In some ways, this is cheating. Another way we can evolve toward our target is through crossover reproduction. In this third and final installment of the project, we will implement such an approach. We will also be interested in the amount of diversity in our population and how that amount varies over time.

Objectives

In this second project deliverable, you will, among other things:

Preliminaries and Definitions

Crossover

Instead of generating offspring purely via mutation, we introduce crossover reproduction. This mechanism takes two parents (chosen by some method such as fitness-proportional selection) and exchanges their genetic material, producing two new children. In order to exchange and combine the respective material, the crossover operator makes some number of cuts. For explanatory purposes we will consider a single cut but you should be able to extend this to any given number of cuts.

In the single-cut case, we randomly select some position along the genetic sequence where we will crossover. Then for one offspring, we take the genetic sequence to the left of the cut from the first parent and attach it to the second parent's sequence on the right of the cut. The second child is the opposite: the genetic sequence to the left of the cut point from the second parent is attached to the first parent's sequence on the right of the cut. As an example, suppose the first parent has the sequence "ALKHINKS" and the second has "METSHLOT". Further suppose we randomly decide to crossover between the third and fourth letters. Then we would take "ALK" from the first parent and "SHLOT" from the second parent giving rise to "ALKSHLOT" as the first child. The second child consists of the opposite parts giving rise to "METHINKS".

Diversity

We also want to consider the role of diversity in an evolving population. How do we even understand diversity? Is it a property of individuals or of populations? We could say that an individual, on average, is either similar or different from most other individuals in the population. A population that has many individuals who are generally different from others in the population could be said to be diverse.

We can measure the uniqueness of an individual by how widely its genes are shared by others in the population. A gene that most of the population shares would not contribute to an individual's uniqueness measure. We can compute uniqueness by comparing an individual's genetic sequence to the sequences of all the other individuals in the population. The average uniqueness across the population can represent the population diversity. We want a measure where high values of uniqueness leads to high values of diversity.

Provisions

For this portion of the project, we will change our representation of genes and genetic sequences. Let a gene be a single-character string and a genetic sequence be a string with a letter for each gene. The alphabet should still be the 26 uppercase letters and space (but strings instead of symbols). Hint: You will want to browse the String functions provided in Advanced Student Language.

Requirements

This assignment requires more independence on your part. The required changes below may necessitate additional other changes to existing functions that we wrote before and of course new helper functions. Examples of changes to existing functions would include adding additional parameters to their signatures and headers. You are free to do this even if a contract was specified in a previous project deliverable. However, you should continue to use the two-part structure definition (define-struct org (code fit)), where the code is now a string. And any functions prescribed below should conform to the instructions.

This is an individual project; working together must be limited to asking specific questions about Racket functionality. Let me stress this -- do not share code and do not discuss your solutions.

As always, follow the design recipe and make sure your code conforms to these specifications. Function names and argument order must be followed precisely. I may test your program with code of my own so errors at testing time will result in significant penalties even if your code ‘works’ with your own tests. As with Part 2, you should use the abstract functions whenever possible instead of implementing a special case of the recursion; failure to do so will result in minor deductions.

  1. Genetic sequences as strings. Make the necessary changes to alter the representation of genetic sequences from lists of symbols to strings.
  2. Survival. From one generation to the next, some fraction of the population will survive. Modify your next-gen function so that the new population returned contains the appropriate fraction of survivors.
  3. Crossover. Write the function, crossover, that consumes two genetic sequences and a list of numbers representing the crossover cut-points. Your function should return a list of two genetic sequences for use in the two offspring.
  4. Offspring. Modify your offspring function so that it consumes two individuals, the target, and the number of crossover cut-points; it should produce a list of two individuals based on the crossover operator and their fitness with respect to the target.
  5. Mutation. Genetic algorithms traditionally rely more on crossover reproduction than on mutation. A typical rate for mutation would be 0.01 or there abouts. Note, we will no longer use forward or backward mutation. Mutation should happen as follows: for each individual, a random number is generated and compared to the mutation rate. Either the individual escapes unchanged or it has exactly one gene, selected at random, changed to a random letter selected from the alphabet.
  6. Diversity. Write the function uniqueness that consumes an individual and a population and returns a number represesnting the uniqueness of the individual within the population. You should compare the given individual to each individual in the population and for each gene accumulate the number of other individuals that have the same gene (in the corresponding position). When going through the population, you should not compare the individual with itself. The uniqueness value can be in the range from 0 to p*g where p is the size of the population and g is the length of the genetic sequence.
    Next write the function diversity-stats that consumes a population and returns a list of four numbers representing the average-uniqueness, the variance in uniqueness over the population, the minimum and the maximum uniqueness values, respectively.
  7. Revisions to the GUI. As before, I want you to organize your code according to the MVC paradigm. The requirements above should appear in your Model section, the added gui-items below should appear in your View section, and the needed changes to the call-back function should appear in your Control section.
    • Add a text input for entering the fraction of the population that (probabilistically) survives from one generation to the next. A good value might be 0.2, but you should experiment with different values in order to find what works best.
    • Add a text input for entering the number of cut points to be used during crossover.
    • Add a text for entering the mutation rate.
    • Add message items to display the four diversity statistics.

Submission

Do not forget to include the standard acknowledgements header at the top of your file. To grade this deliverable, I will review your code (including signatures, purpose statements, indentation, etc.), and then attempt to run it and interact with your GUI. Submit a single file named, “P3.rkt”, which should contain your code fulfilling the requirements above as well as your acknowledgements.