CS125 Project, Part 4

Due Monday, November 6, 2006

(updated 10/30/2006)

On-Line Submission of Part 4

For this assignment, everyone needs their PostgreSQL account.  You should already be familiar with accessing your database at this point.

You should submit your project assignments from this one onward via Eureka.  When multiple files are involved in an assignment, please bundle them with tar or zip. Please include a README file explaining what is in the directory and what role the files play in answering the various parts of the assignment.  Please put your name in the README file.

If you made any changes to your database schema, data generator, script, etc., you should include new versions in your deliverable and appropriate explanations in your README file. 

Project 4 Deliverable

Write five queries for your database, using the select-from-where construct of SQL. To receive full credit, all but perhaps one of your queries should exhibit some interesting feature of SQL: queries over more than one relation, or subqueries, for example. We suggest that you experiment with your SQL commands on a small database (e.g., your hand-created database), before running them on the large database that you loaded in the last deliverable. Initial debugging is much easier when you're operating on small amounts of data. Once you're confident that your queries are working, run them on your complete database. If you discover that most or all of your ``interesting'' queries return an empty answer on your large database, check whether you followed the instructions in the last deliverable for generating data values that join properly. You may need to modify your data generator accordingly. Turn in a file, 'a.sql', that contains your SQL queries.  I should be able to load and run your file, a.sql, by using the \i command in psql.  Please use the 'LIMIT' modifier to your select statements so that no more than 20 tuples are returned for any given query.
Write five data modification commands on your PDA database. Most of these commands should be "interesting", in the sense that they involve some complex feature, such as inserting the result of a query, updating several tuples at once, or deleting a set of tuples that is more than one but less than all the tuples in a relation. As with the queries in (1), you might want to try out your commands on small data before trying it on your full database. Create a file, b.sql, that contains these modifications.

Create two views on top of your database schema. Your c.sql file should contain your CREATE VIEW statements.  Also, provide a query involving each view (again, truncate the response using LIMIT if there are more than a few tuples produced).
In part (a) you probably discovered that some queries run very slowly over your large database.  An important technique for improving the performance of queries is to create indexes. An index on an attribute A of relation R allows the database to find quickly all tuples in R with a given value for attribute A. This index is useful if a value of A is specified by your query (in the where-clause). It may also be useful if A is involved in a join that equates it to some other attribute. For example, in the query
     SELECT Dealers.address
FROM Drivers, Dealers
WHERE Drivers.name = 'joe'
AND Drivers.frequents = Dealers.name;
we might use an index on Drivers.name to help us find the tuple for driver Joe quickly. We might also like an index on Dealers.name, so we can take all the dealers Joe shops at and quickly find the tuples for those dealers to read their addresses.

In Postgres, you can get an index by the command:

     CREATE INDEX <IndexName> ON <RelName>(<Attribute List>);


If the attribute list contains more than one attribute, then the index requires values for all the listed attributes to find a tuple. That situation might be helpful if the attributes together form a key, for example. An illustration of the CREATE INDEX command is

     CREATE INDEX DriverInd ON Drivers(name);
CREATE INDEX DealerInd ON Dealers(name);
which creates the two indexes mentioned above. To get rid of an index, you can say DROP INDEX followed by the name of the index. Notice that each index must have a name, even though we only refer to the name if we want to drop the index.

Create at least two useful indexes for your PDA. Run your queries from part (1) on your large database with the indexes and without the indexes. To time your commands, you may preface your query with EXPLAIN ANALYZE, such as:

        EXPLAIN ANALYZE <query>;

WARNING: "EXPLAIN ANALYZE" will actually run your query in order to give you the timing information.  That is fine for a generic SELECT-FROM-WHERE query, but could upset things if your query involves DELETE, INSERT, or UPDATE.  If that is the case, you may want to use the following approach:


Naturally these times may be affected by external factors such as system load, etc. Still, you should see a dramatic difference between the execution times with indexes and the times without. Turn in d.sql containing your index creation and deletion statements with timed queries placed appropriately so that I can run your file on your database and see the resulting execution times with and without indexes.

Note: Often, students discover that indexes appear to slow down the execution of queries. There are two issues you should consider:

    1. A query involving several relations will not speed up unless indexes to support all of the selections and joins are available. A single index may only increase the number of disk I/O's needed (to get index blocks), without affecting the query as a whole.
    2. The second time you run a query may take much less time than the first, because the second time data is cached in main memory. Data may be cached in the main memory of the database server. If many students are using Postgres at the same time, the machine's cache will probably drop your data if you wait a few seconds. We suggest that you perform only one timing experiment in a session; at least exit from psql and start it again.

Acknowledgements:  Most of the material for the project descriptions and assignments has been taken from Professor Ullman's website.