CS125 Project, Part 3

Due Friday, October 20, 2006

For this assignment, everyone needs their PostgreSQL account. Your account name will be your Westmont email address.  Your password was announced in class and is identified on Eureka. 

Project 3 Deliverable

  1. Please attach a copy of your E/R schema from the Part 1 of your project. If you have modified your design because of my feedback (or any other reason), please hand in the modified design instead; the new design will not be graded but will be compared with your relational design.
  2. Write an SQL database schema for your project, using the CREATE TABLE commands.  Pick suitable datatypes for each attribute. Page 292-293 of the text gives you the principal options regarding types, but PostgreSQL specifics can be easily found in the documentation. In your submission, include a file, "createschema.sql", that contains the SQL commands you use to create your database schema (it is a good idea to keep this file for the balance of the course).
  3. Write another file, "destroyschema.sql", that will drop all of the relations that you created in "createschema.sql".
  4. In yet another file, "runqueries.sql", use the INSERT command to insert a few tuples into each of your relations. 
  5. Develop a substantial amount of data for your database and load it into the appropriate relations using the postgres copy command. (Try '\h copy' in psql for information on how to copy data from files to tables.)  Place the "copy" command inside your "runqueries.sql" file after the inserts.  To create the actual data, write a program in any programming language you like that creates large files of records in a format acceptable to the Postgres copy command, then load the data into your Project relations. If you are using real data for your project, your program will need to transform the data into files of records conforming to your project schema. The rest of you will write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values--strings especially--to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than a "toy" database. The data you generate and load should be on the order of:

If the semantics of your application includes relations that are expected to be relatively small (e.g., schools within a university), it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

(a)  Although you have not (yet) declared keys in your relations, in many cases you probably know that an attribute or set of attributes in a relation will serve as a key. If so, be sure not to generate duplicate values for these attributes.

(b)  Your project almost certainly includes relations that are expected to join with each other. For example, you may have a Student relation with attribute courseNo that's expected to join with attribute number in relation Course. In generating data, be sure to generate values that actually do join--otherwise all of your interesting queries will have empty results! One way to guarantee joinability is to generate the values in one relation, then use the generated values in one relation to select joining values for the other relation. For example, you could generate course numbers first (either sequentially or randomly), then use these numbers to fill in the courseNo values in the student relation.

6.  Include a README file with information about each of the files you're submitting.

Don't forget to save a copy of your work for reference as you do Part 4 of the Database Project.

Submission
Hopefully, you are working on your project in a folder with sub-directories for each deliverable.  Regardless, create a folder consisting of your name and "p3" for this deliverable.  In this folder, place your E/R diagram (reflecting any changes you have made), the file for creating your database schema, any files needed to generating random data, and a README file that identifies the purpose of each file.  Zip or tar this folder and its contents so that when I unpack it, it will create a folder with your emailname and "p3".  I expect to be able to unpack your file, visit your directory, look at your README file, load your database schema creation file with "\i createschema.sql", run your queries with "\i runqueries.sql", run any other ad hoc queries, and then remove you schema with "\i destroyschema.sql". 

Acknowledgements:  Most of the material for the project descriptions and assignments has been taken from Professor Ullman's website.