Project Guidelines, Part 2
For this assignment, everyone needs their PostgreSQL account on vmwardrobe.
Your user accounts and database names will be the same as your Westmont email address.
Your password is announced in class and you should change it as soon as possible.
- Please attach a copy of your (possibly revised) project proposal from Part 1,
including the (i) motivation, (ii) E/R diagram, and (iii) conversions to relations.
If you have modified your design because of my feedback (or any other reason),
please submit the modified design instead together with explanation.
This should be included as a separate file in your submission.
- Write an SQL database schema for your project, using the CREATE TABLE commands.
Pick suitable datatypes for each attribute.
Section 2.3 of your text provides examples and additional information.
Specifics can be easily found in the
Submit a file, createschema.sql, that contains
the SQL commands you use to create your database schema
(As with your proposal and E/R diagram,
you should keep and maintain this file for the balance of the project
as you will submit it each time).
- Write and include another file, destroyschema.sql,
that will drop all of the relations that you created in createschema.sql.
- In yet another file, runqueries.sql,
use the INSERT command to insert a few tuples into each of your relations.
The data in these tuples can be entirely fictional
but should reflect the intended use of your database.
Develop a substantial amount of data for your database and load
it into the appropriate relations using the psql copy command.
(Try ‘\h copy’ in psql for information on how to copy data from files to tables.)
Place the copy command(s) inside your runqueries.sql file
after the explicit inserts from the previous step.
To create the actual data that is copied,
write, and include in your submission,
a program in your favorite programming language
that creates large files of tuples or records in a format acceptable to the Postgres copy command.
(Make sure that your code is adequately documented!)
After generating and saving this data to a file referenced by your copy command(s),
load the resulting file into your Project relations.
If you are using real data for your project,
your program will need to transform the data into files of records conforming to your project schema.
Otherwise, you will write a program to fabricate data;
your program will generate either random or nonrandom (e.g., sequential) records
that conform to your schema.
Note that it is both fine and expected for your data values -- strings especially -- to be meaningless gibberish.
The point of generating large amounts of data is so that you can experiment with
a database of reasonable size, rather than a ‘toy’ database.
The data you generate and load should be on the order of:
Some of your relations probably will be relatively small
(e.g., schools within a university).
However, your project will have several relations
with sizes on the order of those just mentioned.
When writing a program to fabricate data, there are two important points to keep in mind:
(a) Although you have not (yet) declared keys in your database schema,
you should already know what they are.
(Note, you did declare keys in your E/R diagrams and relation schemas.)
Be sure not to generate duplicate values for these attributes when designing your data-generation program.
(b) Your project certainly includes relations that are
expected to join with each other.
For example, you may have a Student relation with attribute courseNumber
that is expected to join with attribute number in relation Course.
In generating data, be sure to generate values that actually do
join -- otherwise all of your interesting queries will have empty results!
One way to guarantee joinability is to generate the values in one
relation, then use the generated values in one relation to select
joining values for the other relation.
For example, you could generate course
numbers first (either sequentially or randomly), then use these
numbers to fill in the courseNumber values in the Student relation.
- At least two relations with a few thousand tuples each,
- At least one additional relation with hundreds of tuples,
- The rest having 10 to 20 tuples.
- Include a README manifest file with information about each of the files you're submitting.
Refer as needed to the information under the heading,
Some Mechanics for Project Assignments given in Project 1.
your tarred or zipped submission should create a folder consisting of your Westmont username
(and optionally ‘p2’ for this deliverable).
Within that folder,
place your E/R diagram (reflecting any changes you have made);
the files createscheama.sql, destroyschema.sql and runqueries.sql;
any program and data files that get loaded by your copy commands;
and a README file that briefly describes each of the included files.
I expect to be able to unpack your file, visit your directory, look at your README file,
and to create and populate your database from within psql
using “\i createschema.sql” and “\i runqueries.sql”.
I should also be able to remove your database schema with “\i destroyschema.sql”.