CS125 Project, Part 4

Due Friday, October 29, 2004

On-Line Submission of Part 4

For this assignment, everyone needs their PostgreSQL account. Your account name will be the same as your Westmont student email account (typically your first initial and first seven letters of last name, but not always).  Your password will be the final six (6) digits of your student ID number.  I will inform you when the accounts have been activated.

You should submit your project assignments from this one onward as email.  When multiple files are involved in an assignment, please bundle them with tar or zip. Please include a README file explaining what is in the directory and what role the files play in answering the various parts of the assignment.  Please put your name in the README file.

You should include a script for creating the tables and for populating them with data.  Include data files if your script uses the copy command.  For this and every subsequent deliverable, you need to provide everything needed to create and populate your database from scratch.  That is, I will create a fresh database and then load your scripts.  If I encounter errors, it will negatively impact your grade.  You should include a README file with general information and any instructions specific to your database.

Project 4 Deliverable

(a)
Write an SQL database schema for your project, using the CREATE TABLE commands.  Pick suitable datatypes for each attribute. Page 292-293 of the text gives you the principal options regarding types, but PostgreSQL specifics can be easily found in the documentation. Hand in one (1) file with the commands you use to create your database schema (it is a good idea to keep this file for the balance of the course). Show the response of psql to a request to describe each of your relation schemas. For example, to see the schema for relation Foo type
     \d Foo
(b)
Execute five INSERT commands to insert tuples into one of your relations. Show the response of psql and the relation that results when you issue a SELECT * command. 

(c)
Develop a substantial amount of data for your database and load it into your relations using the postgres copy command. (Try '\h copy' in psql for information on how to copy data from files to tables.) To create the data, write a program in any programming language you like that creates large files of records in a format acceptable to the Postgres copy command, then load the data into your Project relations. If you are using real data for your project, your program will need to transform the data into files of records conforming to your project schema. The rest of you will write a program to fabricate data: your program will generate either random or nonrandom (e.g., sequential) records conforming to your schema. Note that it is both fine and expected for your data values--strings especially--to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of realistic size, rather than a "toy" database. The data you generate and load should be on the order of:

If the semantics of your application includes relations that are expected to be relatively small (e.g., schools within a university), it is fine to use some small relations, but please ensure that you have relations of the sizes prescribed above as well. When writing a program to fabricate data, there are two important points to keep in mind:

(1)
Although you have not (yet) declared keys in your relations, in many cases you probably know that an attribute or set of attributes in a relation will serve as a key. If so, be sure not to generate duplicate values for these attributes.

(2)
Your project almost certainly includes relations that are expected to join with each other. For example, you may have a Student relation with attribute courseNo that's expected to join with attribute number in relation Course. In generating data, be sure to generate values that actually do join--otherwise all of your interesting queries will have empty results! One way to guarantee joinability is to generate the values in one relation, then use the generated values in one relation to select joining values for the other relation. For example, you could generate course numbers first (either sequentially or randomly), then use these numbers to fill in the courseNo values in the student relation.

Turn in your program code for generating or transforming data, a small sample of the records generated for each relation (5 or so records per relation), and a script showing the loading of your data into Postgres. 

Acknowledgements:  Most of the material for the project descriptions and assignments has been taken from Professor Ullman's website.