EA Part 1: Weasels

Updates


In this project, you will simulate natural selection for an organism in an environment. An organism, or individual, that is fit with respect to its environment will tend to flourish and have numerous offspring. Your model will demonstrate the power of mutation and natural selection as a search mechanism that can find solutions to complex problems in the form of highly fit individuals by stochastically exploring a space of genetic sequences. The project is divided into multiple deliverables. This page describes the first of these deliverables.

Objectives

In this first project deliverable, you will, among other things:

Preliminaries and Definitions

To start, we will model an individual's genetic material as a sequence of symbols corresponding to the characters of our alphabet plus the ‘Space’ (27 possible genes). An individual's size is the length of its genetic sequence. A population will consist of some number of individuals. We will hold the size of the population and of individuals fixed at specific values.

We model the environment as a particular sequence of letters. That is, we say that an individual whose genetic code is identical to the target sequence is perfectly fit. If the target sequence is (W E A S E L) and an organism had the sequence (K X E J N P) we would say the organism is completely unfit; that is, it has none of the right letters in the right place. Note: we will assume that an individual's size is always the same as the size of the target.

Within a population of individuals, a fit inidividual will reproduce more effectively than unfit individuals. If natural selection is an effective heuristic for guiding search through the space of individuals, then over time, a population will tend to adapt to and converge on the environment (target). In our case, the adaptation means that individuals' genetic codes should converge on the target sequence.

In our simplified model, organisms or individuals reproduce by mutation. A gene (single symbol) will tend to be preserved from one generation to the next if it fits the environment well. Similarly, genes that do not fit the target tend to be replaced with mutations in the next generation. To simulate probabilities, we will be using a random number generator. The Racket function (random n) will return a pseudo-random integer in the range [0,n-1]. This will be helpful.

Ultimately, we will want to simulate a series of generations and the changes within a population over time. You will write a Racket program to perform this simulation. As always, you should follow the design recipe, organize your functions in meaningful groups, and define global variables to facilitate future modifications of your code.

Provisions

The following data definitions for a gene, an alphabet, and a genetic-sequence, structure and data definitions for an individual and data definition for a population are provided for you.

We will represent our environment, or target, as a genetic-sequence.

Requirements

This is an individual project; working together must be limited to asking specific questions about Racket functionality. Let me stress this -- do not share code and do not discuss your solutions.

As always, follow the design recipe and make sure your code conforms to these specifications. Function names and argument order must be followed precisely. I may test your program with code of my own so errors at testing time will result in significant penalties even if your code ‘works’ with your own tests. You are encouraged to use the abstract functions wherever appropriate.

  1. Individual fitness. Because the fitness value of an individual is so important in several respects, we are representing an individual as the combination of its genetic code and the fitness of that genetic code. But before we get there, we need a function that will compute the fitness of a genetic-sequence. Write the function fit, that consumes two genetic sequences, one for a particular individual and the second representing the target, and returns a number that represents the fitness as the number of genes in the individual that match the target by position.
  2. random-individual: genetic-sequence -> individual. Write a function that consumes the target sequence and returns an individual (i.e., a make-org). Hint: You may want to use a local in writing this function.
  3. random-population: genetic-sequence number -> population. Write a function that creates a random population; that is, it creates a population of random individuals. Your function should consume a genetic-sequence representing the target or environment, and a number representing the desired size of the population.
  4. Generational modeling: part 0. In the second part of the project we will simulate the natural selection and evolution of our population. For that, you will need to write a function, offspring, that will consume an individual and the target genetic-sequence. This function will produce a new individual probabilistically based on the match between the given individual's genetic sequence and the target sequence. We have two defined mutation rates; one we'll call forward mutation which is the probability that a mismatched gene gets mutated from its current value, and the other a backward mutation rate, which is the probability that a gene matching the target gets mutated into an unmatched value. You should define global identifiers for these two probabilities so they can be easily changed. We might have a forward rate of 90% and a backward rate of 5%. Each gene in the sequence is independently subject to mutation. That is, each gene should be checked as matching the target or not, and then then for each have a weighted coin tossed that determines if the gene gets mutated. The particular weighting -- forward or backward -- depends on whether the gene does not match the target (forward) or already does (backward). In order to keep things simple, we will allow the possibility that a gene mutates to itself; that is, we will not require that when mutated, the gene always changes to a new value.
  5. Graphical User Interface. You should organize your code according to the MVC framework discussed in class and in the text. Although you will have written a function that creates a population, we will be using that in our second deliverable. For this part of the project, you will write a GUI that lets you interact with a single individual. Create a GUI that has three components: a message area for displaying the genetic code of the individual, a message area for reporting the fitness of the individual, and a button that replaces the individual with its offspring. That is, when a user clicks the button, the current individual should be passed as an argument to offspring and the resulting individual should become the current individual and should be displayed in the GUI. To accomplish this, use the extended GUI teachpack, egui.rkt (available on Eureka), instead of the standard gui.rkt teachpack provided in the HtDP teachpacks. This extended teachpack provides one additional function, message-contents, that operates on message gui-items. The function, message-contents, consumes a message gui-item and returns its current contents. You can use this to extract the current sequence and pass it to your offspring function. Of course, you will need to do the appropriate conversions between symbols, strings, etc.

Submission

Do not forget to include the standard acknowledgements header at the top of your file. To grade this deliverable, I will review your code (including signatures, purpose statements, indentation, etc.), and then attempt to run it and interact with your GUI. Submit a single file named, “P1.rkt”, which should contain your code fulfilling the requirements above as well as your acknowledgements.