This week we shall write and run some SQL queries.
It is suggested that you begin to follow the routine of loading your
database from your load file, running queries or other commands from
a file that contains those SQL statements, and then deleting your data
so it doesn't clutter up the database all week and (worse) you don't
forget and load the same tuples several times into your relations.
Remember that SQL thinks of relations as bags, and so will happily let
you insert the same tuple as many times as you ask it to.
To clean out a relation R without deleting its schema, use
DELETE FROM R;.
You should submit a zip or tar file that expands to create a
identified by your Postgres user name followed by "p5". Please
name the tar/zip file in the same way (followed by the appropriate file
extension). Your directory should contain files
named "a.sql" through "d.sql" containing queries as described
below. You should also include files ('initsmall.sql' and
'initlarge.sql') that will restore your database. That is,
provide data files and loading scripts that create tables and populates
those tables. I should be able to create a new database and then
run your initialization files and then run your a through d sql files.
Write five queries on your PDA
using the select-from-where construct of SQL.
To receive full credit, all but perhaps one of your queries should
some interesting feature of SQL: queries over more than one relation,
or subqueries, for example.
We suggest that you experiment with your SQL
commands on a small database (e.g., your hand-created database),
before running them on the large database that you loaded in PDA part
Initial debugging is much easier when you're operating on small
amounts of data. Once you're confident that your queries are
working, run them on your complete database.
If you discover that most or all of your ``interesting'' queries
return an empty answer on your large database, check whether you
followed the instructions in Assignment #3 for generating data
values that join properly. You may need to modify your data
Turn in a file, 'a.sql', that
contains your SQL queries. I should be able to load and run your
file, a.sql, by using the \i command in psql. Please use the
'LIMIT' modifier to your select statements so that no more than 20
tuples are returned for any given query.
Write five data modification commands
on your PDA
Most of these commands
should be ``interesting,'' in the sense that they involve some complex
feature, such as inserting the result of a query, updating several
at once, or deleting a set of tuples that is more than one but less
all the tuples in a relation. As with the queries in (1), you might
want to try out your commands on small data before trying it on your
database. Create a file, b.sql,
that contains these modifications.
Create two views on top of your
database schema. Your c.sql file
should contain your CREATE VIEW statements.
Also, provide a query involving each view (again, truncate
the response using LIMIT if there are more than a few tuples produced).
In part (1) you probably discovered
that some queries run very
slowly over your large database. An important
technique for improving the performance of queries is to create
indexes. An index on an attribute A of relation R
database to find quickly all tuples in R with a given value for
This index is useful if a value of A is specified by your query
(in the where-clause).
It may also be useful if A is involved in a join that equates
to some other attribute.
For example, in the query
SELECT Dealers.address FROM Drivers, Dealers WHERE Drivers.name = 'joe' AND Drivers.frequents = Dealers.name;
we might use an index on Drivers.name
to help us find the
tuple for driver Joe quickly.
We might also like an index on Dealers.name, so we can take
the dealers Joe shops at and quickly find the tuples for those dealers
read their addresses.
In Postgres, you can get an index by the
CREATE INDEX <IndexName> ON <RelName>(<Attribute List>);
Postgres (and most commercial DBMS) creates indexes automatically
when you declare an
attribute(s) PRIMARY KEY or UNIQUE.
Thus, some indexes may exist before you create them and may also be
making certain queries run faster than than they would with no indexes
Be sure to take this factor into account when trying to explain
differences in running times. If you are unable to discover queries
whose running-time is
by the indexes that you declare, you could try temporarily removing
key declarations, to prevent the automatic creation of indexes.
If the attribute list contains more than
one attribute, then the index
requires values for all the listed attributes to find a tuple.
That situation might be helpful if the attributes together form a key,
An illustration of the CREATE INDEX command is
CREATE INDEX DriverInd ON Drinvers(name); CREATE INDEX DealerInd ON Dealers(name);
which creates the two indexes mentioned
To get rid of an index, you can say DROP INDEX followed by
name of the index.
Notice that each index must have a name, even though we only refer to
the name if we want to drop the index.
Create at least two useful indexes for
your PDA. Run your
from part (1) on your large database with the indexes and without the
To time your commands, you may preface your query with EXPLAIN ANALYZE,
EXPLAIN ANALYZE <query>;
WARNING: "EXPLAIN ANALYZE" will actually run your query in order
to give you the timing information. That is fine for a generic
SELECT-FROM-WHERE query, but could upset things if your query involves
DELETE, INSERT, or UPDATE. If that is the case, you may want to
use the following approach:
BEGIN; EXPLAIN ANALYZE <query>; ROLLBACK;
Naturally these times may be affected by
such as system load, etc. Still, you should see a dramatic difference
between the execution times with indexes and the times without. Turn
in d.sql containing your index creation and deletion statements
with timed queries placed appropriately so that I can run your file on
your databasea and see the resulting execution times with and without
Note: Often, students discover that
indexes appear to slow
execution of queries.
There are two issues you should consider:
A query involving several relations will not speed up unless
support all of the selections and joins are available.
A single index may only increase the number of disk I/O's needed
(to get index blocks), without
affecting the query as a whole.
The second time you run a query may take much less time than
because the second time data is cached in main memory.
Data may be cached in the main memory of the database server.
If many students are using Postgres at the same time, the
cache will probably drop your data if you wait a few seconds.
We suggest that you perform only one timing experiment in
a session; at least exit from psql and start it again.
If you are getting strange results, you may have to wait several
of the material for the project descriptions and assignments has been
taken from Professor Ullman's website.