Project Guidelines, Part 2

For this assignment, everyone needs their PostgreSQL account on vmwardrobe. Your user accounts and database names will be the same as your Westmont email address.

  1. (2pt) Please include a copy of your (possibly revised) project proposal from Part 1, including the (i) motivation, (ii) E/R diagram, and (iii) conversions to relations. If you have modified your design because of my feedback (or any other reason), please submit the modified design together with explanation. This should be included as a separate file or files in your submission.
  2. (4pt) Write an SQL database schema for your project, using the CREATE TABLE commands. Pick suitable datatypes for each attribute. Section 2.3 of your text provides examples and additional information. Specifics can be easily found in the PostgreSQL documentation. Submit a file, createschema.sql, that contains the SQL commands you use to create your database schema (As with your proposal and E/R diagram, you should keep and maintain this file for the balance of the project as you will submit it each time). From the psql client, you can load your createschema.sql (as well as other .sql files) using the ‘\i filename’ command for a file containing sql commands. Although you declared keys in your E/R diagrams, do not declare keys in your database schema.
  3. (2pt) Write and include another file, dropschema.sql, that will drop all of the relations that you created in createschema.sql.
  4. (5pt) In yet another file, populatesmall.sql, use the INSERT command to insert on the order of ten tuples into each of your relations. The data in these tuples can be entirely fictional but should look realistic and reflect the intended use of your database.
  5. (6pt) Develop a substantial amount of data for your database and load it into the appropriate relations using the psql copy command. (Try ‘\h copy’ in psql for information on how to copy data from files to tables.) Place the copy command(s) inside a file, populatelarge.sql (which you can in turn load using the ‘\i populatelarge.sql’ command).
    If you are using real data for your project, your program will need to transform the data into files of records conforming to your project schema. Otherwise, you will write a program using an appropriate language to fabricate data (make sure that your code is adequately documented!); your program will generate either random or nonrandom (e.g., sequential) records that conform to your schema. Note that it is both fine and expected for your data values -- strings especially -- to be meaningless gibberish. The point of generating large amounts of data is so that you can experiment with a database of reasonable size, rather than a ‘toy’ database. After generating and saving this data to files referenced by your copy command(s), load the resulting file into your relations. The data you generate and load should be on the order of:
    • At least two relations with a tens of thousands of tuples each,
    • At least one additional relation with thousands of tuples,
    • The rest having hundreds of tuples.
    Some of your relations probably will be relatively small (e.g., schools within a university). However, your project will have several relations with sizes on the order of those just mentioned.
    When writing a program to fabricate data, there are two important points to keep in mind: (a) Although you have not (yet) declared keys in your database schema, you should already know what they are. (Note, you did declare keys in your E/R diagrams and relation schemas.) Be sure not to generate duplicate values for these attributes when designing your data-generation program. (b) Your project certainly includes relations that are expected to join with each other. For example, you may have a Student relation with attribute major that is expected to join with attribute deptID in relation Department. When generating data, be sure to generate values that actually do join -- otherwise all of your interesting queries will have empty results! One way to guarantee joinability is to generate the values for one relation, and then use the generated values to provide joining values for another relation. For example, you could first generate department names and IDs for the Department relation and then randomly choose from these generated values to fill in the major values in the Student relation.
  6. (1pt) Include a README manifest file with information about each of the files you're submitting.


Refer as needed to the information under the heading, Some Mechanics for Project Assignments givin in Project 1. When extracted, your tarred or zipped submission should create a folder having the name of your Westmont username (and optionally ‘p2’ for this deliverable). Within that folder, place your E/R diagram (reflecting any changes you have made); the files createscheama.sql, dropschema.sql, populatesmall.sql, and populatelarge.sql, and a README file that briefly describes each of the included files; any program you have written should be found within a sub-folder, src; data files that get loaded by your copy commands should be in the top-level folder with your sql scripts. I expect to be able to unpack your file, visit your directory, look at your README file, and to create and populate your database from within psql using “\i createschema.sql” and “\i populatesmall.sql”. Similarly, I should be able to remove your database schema with “\i dropschema.sql”. I should also be able to generate new data using your program in the src folder; make certain the README and your code contains sufficient documentation for generating new data.