Project Guidelines, Part 3

For this assignment, everyone needs to be using their PostgreSQL account. You should already be familiar with accessing your database at this point.

As always, when multiple files are involved in an assignment, please bundle them with tar or zip. Include your scripts from Project 2 for schema creation, population, and destruction, together with any data files that are necessary for your runqueries.sql script. Note that you should include any source program that you wrote to generate artificial data, but the resulting data should be stored in a separate file. That is, populating your database should not require running your data generation program. You should also include (possibly revised) versions of your proposal, E/R diagram, and conversion to relations with FDs.

Together with the elements above and your {a-d}.sql files described below, be sure to include a README file listing the contents of your directory and the role the respective files play in answering the various parts of the assignment. The README should mention and explain any changes that you have made to prior project deliverables. Please put your name at the top of the README file.


  1. (4pt) Come up with five questions that you would like to have answered by your database. Write five SQL queries, using the select-from-where construct, that answer those questions based on the data in your database. Before each query, write a comment describing what the query accomplishes. To receive full credit, several of your queries should exhibit some interesting feature of SQL; for example, queries over more than one relation, or subqueries. Your queries should make sense and return meaningful results when your database contains data only from populatesmall.sql. Initial debugging is much easier when you're operating on small amounts of meaningful data. Once you're confident that your queries are working, run them on your complete database. If you discover that most or all of your ‘interesting’ queries return an empty answer on your large database, check whether you followed the instructions in the last deliverable for generating data values that join properly. You may need to modify your data generator accordingly. If a query returns more than a few dozen tuples, you must truncate the results using a LIMIT clause at the end of the query. Submit a file, a.sql, that contains your SQL queries. I should be able to load and run your file, a.sql, using the \i command in psql.
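
     For example, using the hypothetical Drivers and Dealers relations that appear in part (4) below (substitute tables and attributes from your own schema), an entry in a.sql might look like:

             -- Which dealers does driver Joe frequent, and where are they located?
             -- (Joins two relations; truncated with LIMIT in case of many matches.)
             SELECT Dealers.name, Dealers.address
             FROM Drivers, Dealers
             WHERE Drivers.name = 'joe'
             AND Drivers.frequents = Dealers.name
             LIMIT 20;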
  2. (4pt) Write five data modification commands on your database. Most of these commands should be ‘interesting’, in the sense that they involve some complex feature, such as inserting the result of a query, updating several tuples at once, or deleting a set of tuples that is more than one but less than all the tuples in a relation. As with the queries in (1), you might want to try out your commands on small data before running them on your full database. Also, before each modification, write a comment describing what it accomplishes. Create a file, b.sql, that contains these SQL modification commands.
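
     For illustration, again using the hypothetical Drivers and Dealers relations (FrequentCustomers, rating, and city are made-up names), entries in b.sql might look like:

             -- Insert the result of a query: record every driver who frequents 'mack'.
             INSERT INTO FrequentCustomers(name)
             SELECT Drivers.name FROM Drivers WHERE Drivers.frequents = 'mack';

             -- Update several tuples at once: raise the rating of all dealers in one city.
             UPDATE Dealers SET rating = rating + 1 WHERE city = 'Santa Barbara';

             -- Delete a proper subset of tuples: remove drivers with no dealer.
             DELETE FROM Drivers WHERE frequents IS NULL;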
  3. (4pt) Create two views on top of your database schema. Again, these views should be meaningful with respect to the purpose for which you are building your database. Your file, c.sql, should contain your CREATE VIEW statements. Also, include a query involving each view (again, truncate the response using LIMIT if there are more than a few tuples produced). As before, document the purpose of the query. Finally, update your dropschema.sql script so that tables involved in your views have the CASCADE option.
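
     As a sketch, a view over the hypothetical Drivers and Dealers relations, together with a query on it, might look like:

             -- View: each dealer paired with the number of drivers who frequent it.
             CREATE VIEW DealerPopularity AS
             SELECT Dealers.name, COUNT(Drivers.name) AS num_drivers
             FROM Dealers LEFT JOIN Drivers ON Drivers.frequents = Dealers.name
             GROUP BY Dealers.name;

             -- Query on the view: the five most popular dealers.
             SELECT * FROM DealerPopularity ORDER BY num_drivers DESC LIMIT 5;

     and the corresponding statement in dropschema.sql would become DROP TABLE Dealers CASCADE; so that the dependent view is dropped along with the table.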
  4. (6pt) In part (1) you may have discovered that some queries run very slowly over your large database. An important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to quickly find all tuples in R with a given value for attribute A. This index is useful if a value of A is specified by your query (in the where-clause). It may also be useful if A is involved in a join that compares it to some other attribute. For example, in the query
          SELECT Dealers.address
          FROM Drivers, Dealers
          WHERE Drivers.name = 'joe'
          AND Drivers.frequents = Dealers.name;
    we might use an index on Drivers.name to help us more quickly find the tuple for driver Joe.

    In Postgres, you can create an index with the command:

            CREATE INDEX <IndexName>
            ON <RelName>(<Attribute List>);
    • Note: Postgres creates indexes automatically when you declare an attribute (or set of attributes) PRIMARY KEY or UNIQUE. This is why I asked you not to declare keys in Project 2. Thus, if you did not follow those instructions, you may need to remove those declarations before you will observe the expected speed-up from your indexes.

    If the attribute list contains more than one attribute, then the index requires values for all the listed attributes to find a tuple. That situation might be helpful if the attributes together form a key. An illustration of the CREATE INDEX command for a single attribute is

            CREATE INDEX DriverInd ON Drivers(name);
            CREATE INDEX DealerInd ON Dealers(name);
    which creates the two indexes mentioned above. To get rid of an index, you can say DROP INDEX <IndexName>;. Notice that each index must have a name, even though we only refer to the name when we want to drop the index.

    Create at least two useful indexes for your PDA (other than indexes that PostgreSQL would create automatically for you because of key or unique declarations). Consider the types of queries that you expect will be run, and in that light decide where an index could be beneficial. Run your queries from part (1) on your large database with and without the indexes. To time your commands, you may preface a query with EXPLAIN ANALYZE, such as:

            EXPLAIN ANALYZE <query>;
    WARNING: "EXPLAIN ANALYZE" will actually run your query in order to give you the timing information. That is fine for a generic SELECT-FROM-WHERE query, but it could upset things if your query involves DELETE, INSERT, or UPDATE. In that case, wrap the command in a transaction and roll it back so the modification does not actually persist:
            BEGIN; EXPLAIN ANALYZE <query>; ROLLBACK;

    Naturally these times may be affected by external factors such as system load, etc. Still, you should notice a difference between the execution times with indexes and the times without. Turn in d.sql containing your index creation and deletion statements with timed queries placed appropriately so that I can run your file on your database and see the resulting execution times with and without indexes. Include comments in your file discussing what you did to achieve significant difference, the differences you observed, and your explanation for those differences.
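
     As a rough sketch (using the hypothetical Drivers/Dealers query and index names from above), d.sql might be organized as follows:

             -- Timing without indexes (run in a fresh psql session; see the note below).
             EXPLAIN ANALYZE
             SELECT Dealers.address
             FROM Drivers, Dealers
             WHERE Drivers.name = 'joe'
             AND Drivers.frequents = Dealers.name;

             -- Create indexes supporting both the selection and the join.
             CREATE INDEX DriverInd ON Drivers(name);
             CREATE INDEX DealerInd ON Dealers(name);

             -- Time the same query again with the indexes in place.
             EXPLAIN ANALYZE
             SELECT Dealers.address
             FROM Drivers, Dealers
             WHERE Drivers.name = 'joe'
             AND Drivers.frequents = Dealers.name;

             -- Drop the indexes so the file can be rerun from scratch.
             DROP INDEX DriverInd;
             DROP INDEX DealerInd;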

    Note: Sometimes, students discover that indexes appear to slow down the execution of queries. There are two issues you should consider:

    1. A query involving several relations will not speed up unless indexes to support all of the selections and joins are available. A single index may only increase the number of disk I/O's needed (to get index blocks), without affecting the query as a whole.
    2. The second time you run a query may take much less time than the first, because by then the data is cached in the main memory of the database server. If many students are using Postgres at the same time, the machine's cache will probably evict your data within a few seconds. We suggest that you perform only one timing experiment per session; at a minimum, exit from psql and start it again between experiments.
  5. (2pt) Include your README with appropriate content.


Refer as needed to the information under the heading Some Mechanics for Project Assignments given in Project 1. When extracted, your tarred or zipped submission should create a folder having the name of your Westmont username (optionally followed by ‘p3’ for this deliverable). Within that folder, in addition to everything submitted for previous project deliverables, place your {a-d}.sql files; do not forget to include a README file that briefly describes each of the included files. I expect to be able to unpack your file, visit your directory, look at your README file, create and populate your database from within psql as in Project 2, and then run your query files as “\i a.sql” etc.