CS010 Introduction to Computer Science I
Fall 2007

Project 2.1
WEASEL:
A Simulation of Mutation and Natural Selection

In this project, you will simulate natural selection for an organism in an environment. An organism that is genetically fit with respect to its environment will tend to flourish and have numerous offspring. Your model will demonstrate the power of mutation and natural selection as a means of guiding a search for fit individuals through a space of genetic sequences. The project is divided into two separate deliverables. This page describes the first of them.

#### Objectives

In this project, you will, among other things:
• practice using abstract functions
• gain experience in modeling real-world problem domains
• encounter a new form of recursion (part 2)
• build a GUI by which one can interact with your program (part 2)

#### Overview

We will model an organism as a genetic sequence of symbols drawn from the 26 letters of our alphabet plus the symbol 'Space (27 possible genes). An organism's size is the length of its genetic sequence. We model the environment as a particular target genetic sequence: an organism whose genetic code is identical to the target sequence is said to be perfectly fit. If the target sequence were {W,E,A,S,E,L} and an organism had the sequence {K,X,E,J,N,P}, we would say the organism is completely unfit, because no gene matches the target gene in the same position. Within a population of individuals, fit organisms will reproduce more effectively than unfit ones. If natural selection is an effective heuristic for guiding search through the space of individuals, then over time a population will tend to adapt to the environment (target). In our case, adaptation means that individuals' genetic codes should converge on the target sequence.

In our simplified model, organisms reproduce by mutation. For our purposes, a gene (single symbol) will tend to be preserved from one generation to the next if it fits the environment. Similarly, genes that do not fit the target tend to be replaced with mutations in the next generation. To simulate probabilities, we will use a random number generator. The Scheme function (random n) returns a pseudo-random integer in the range [0,n-1], with each value equally likely. You will need it both to choose random genes and to decide whether a gene mutates.
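
As a small illustration of how (random n) can stand in for a probability, here is a minimal sketch. It assumes DrScheme's built-in random; the names FORWARD-PERCENT and happens? are hypothetical and are not part of the required interface.

```scheme
;; FORWARD-PERCENT is a hypothetical name: a probability written as a percentage.
(define FORWARD-PERCENT 70)

;; happens? : number -> boolean
;; Returns true with probability percent/100, where percent is an integer in [0,100].
;; (random 100) yields one of 0..99, each equally likely, so exactly
;; `percent` of those 100 outcomes are less than `percent`.
(define (happens? percent)
  (< (random 100) percent))

(random 27)                 ; an integer from 0 to 26, one value per possible gene
(happens? FORWARD-PERCENT)  ; true about 70% of the time
```

Working with integer percentages is only one possible design; it keeps every comparison in whole numbers.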

Ultimately, we want to simulate a series of generations and the changes within a population over time. You will write a Scheme program to perform this simulation. As always, you should follow the design recipe, organize your functions in meaningful groups, and define global variables to facilitate adjustments to your simulation.

#### Provisions

First, we provide a formal data definition for an individual.

Our alphabet will be the symbols: 'A, 'B, ... 'Z, 'Space

A gene is a symbol from our alphabet

A genetic-sequence is either:
1. empty, or
2. (cons s i) where s is a gene and i is a genetic-sequence
(Or as we would write at this point of the semester, a (listof gene).)

(define-struct org (code fit))
An individual is a structure: (make-org c f) where c is a
genetic-sequence and f is a number representing the fitness

We will represent our environment, or target, as a genetic-sequence.
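
To make the data definitions concrete, here are a few example values. The names TARGET, unfit-org, and partial-org are hypothetical examples only, and the structure definition is repeated so the snippet stands alone.

```scheme
(define-struct org (code fit))

;; An example target (environment): a genetic-sequence of size 6.
(define TARGET (list 'W 'E 'A 'S 'E 'L))

;; No position matches TARGET, so the fitness is 0.
(define unfit-org (make-org (list 'K 'X 'E 'J 'N 'P) 0))

;; Positions 1, 2, and 6 match TARGET (W, E, L), so the fitness is 3.
(define partial-org (make-org (list 'W 'E 'X 'Space 'A 'L) 3))

(org-fit partial-org)          ; 3
(length (org-code unfit-org))  ; 6, the organism's size
```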

#### Requirements

1. First, you will need a function, generate-sequence, that consumes a number describing the length of the organism's genetic code, and returns a genetic-sequence of the appropriate length.  The genetic code should be selected randomly.
2. You need to write code to compute the fitness of a particular genetic-sequence with respect to the environment.  We will model fitness as the number of genes in a given sequence that are the same as the corresponding genes in the target.  Write the function, fit, that consumes two genetic sequences, one for a particular critter and the second representing the target, and returns a number that represents the fitness, or the number of genes in the individual that match the target.
3. Next we need a function to create individuals.  Write the function, generate-individual, which is analogous to generate-sequence.  It consumes a target genetic-sequence and returns a randomly generated individual whose genetic code has the same length as the target and whose fitness is computed with respect to that target.
4. You need to write a function, offspring, that will consume an individual and a target genetic-sequence, and will produce a semi-randomly generated individual based on the match between the genetic material of the given individual and the target.  There are two fixed mutation rates: the forward mutation rate, which is the probability that a mismatched gene gets mutated, and the backward mutation rate, which is the probability that a matching gene gets mutated. You should define global variables for these two probabilities.  For example, we might have a forward rate of 70% and a backward rate of 10%.  To keep things simple, we will allow the possibility that a gene mutates to itself; that is, we will not require that a mutated gene always changes.
5. Let us define a population as a list of individuals.
6. (updated 10/21) Write a function, make-population, that creates a random population. Your function should consume a genetic-sequence representing the target or environment, and a number representing the size of the population to create.
7. Create the next generation. From one generation to the next, we want to reflect the respective fitness of individuals in the population.  An individual's fitness will influence how many offspring it yields.  We will probabilistically fill the population representing the next generation according to the individuals' fitness in the current generation.  An individual that is very fit is very likely to have multiple offspring in the new population; an unfit individual is unlikely to have offspring and will probably die off.  Suppose we have ten individuals in our population with fitnesses as follows: 15, 11, 8, 8, 7, 4, 3, 2, 2, and 0.  In this case, we have a total fitness of 60.  Each individual in the next generation has a 15/60 = 1/4 chance of being a descendant of the first individual in the current population (the one with fitness 15). Similarly, it has a 2/60 = 1/30 chance of being descended from the 8th individual, and the same chance of being descended from the 9th. Write the function, next-gen, that consumes a population and a target genetic-sequence and creates a new population of the same size by probabilistically selecting existing individuals according to their fitness as described above, and then generating an offspring for each selected individual.  As already suggested, an individual with high fitness will probably be selected multiple times to generate offspring.
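
One way to picture the fitness-weighted selection in requirement 7 is as a raffle: each individual holds as many consecutive tickets as its fitness, and a random ticket number picks the parent. The sketch below works on a plain list of fitness numbers rather than on individuals. The names pick-index, FITS, and TOTAL are hypothetical, and the sketch assumes a total fitness greater than 0. It is one possible approach, not the required design.

```scheme
;; pick-index : (listof number) number -> number
;; Given a non-empty list of fitness values and a ticket r in [0, total-1],
;; returns the 0-based position of the individual that holds ticket r.
;; Each individual holds as many consecutive tickets as its fitness.
(define (pick-index fits r)
  (cond
    [(< r (first fits)) 0]
    [else (+ 1 (pick-index (rest fits) (- r (first fits))))]))

;; The fitness values from the example in requirement 7.
(define FITS (list 15 11 8 8 7 4 3 2 2 0))
(define TOTAL (apply + FITS))  ; 60

(pick-index FITS 0)    ; 0: tickets 0..14 belong to the first individual
(pick-index FITS 15)   ; 1: tickets 15..25 belong to the second individual
(pick-index FITS 59)   ; 8: the last ticket belongs to the 9th individual
(pick-index FITS (random TOTAL))  ; a fitness-weighted random position
```

Because the first individual holds 15 of the 60 tickets, it is chosen with probability 1/4, and the individual with fitness 0 holds no tickets and is never chosen.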

#### Submission instructions

Use the standard HW template at the top of your submitted project. Remember that you must acknowledge help that you receive and help that you provide. Follow the recipe and good coding style conventions. Upload your project to the Eureka site by the due date and time.