Slides

Create a (usually smaller) lookup 'copy' of the table

CREATE INDEX <index_name>
ON <table> USING btree | hash
(field_name1 ASC | DESC, field_name2) NULLS FIRST | LAST
INCLUDE (<field_name3>, <field_name4>);

Advantages of Indexes

Regular queries can use this smaller copy. It can make reads faster
Cutting out unneeded columns
- Fewer pages for the engine to read
- Uses less memory & CPU
Sorts can be pre-empted speeding up the query
Indexes can be filtered making them even faster when used

Disadvantages of Indexes

Extra overhead. Data changes happen in multiple places (slower writes)
More storage space is needed
Inefficient when used on small tables
Inefficient on tables with a lot of NULL values

If we take our previous query and hit F7:

SELECT sh.date_id, AVG(sd.items_sold)
FROM "sequel-mart-schema"."Sales_Header" AS sh
INNER JOIN "sequel-mart-schema"."Sales_Detail" AS sd 
	ON sh.sale_id = sd.sale_id
WHERE sh.date_sale >= '2021-08-01'
GROUP BY sh.date_id
ORDER BY sh.date_id;

We'll get an Execution Plan that shows:

Sales_Head and Sales_Detail being scanned and joining
Aggregate (to cover the AVG)
Sort (to cover the ORDER BY)

If we add EXPLAIN ANALYZE we'll get a summary of timings and costs

Add an Index

Let's see if we can speed this query up by adding an index:

What do we choose?

It can involve trial and error but basic guidelines include columns involved in:

Joins
Ordering
Filtering
Select (more likely to be used in the INCLUDE section)

If we add an index to Sales_Detail we may want to:

Put it on the sale_id column as this is involved in the JOIN
Include the items_sold column as this is used in the SELECT

CREATE INDEX sales_detail_sale_id
  ON "sequel-mart-schema"."Sales_Detail" (sale_id)
  INCLUDE (items_sold);

Checking the Execution Plan

Let's have another look at our Execution Plan with F7:

SELECT sh.date_id, AVG(sd.items_sold)
FROM "sequel-mart-schema"."Sales_Header" AS sh
INNER JOIN "sequel-mart-schema"."Sales_Detail" AS sd 
	ON sh.sale_id = sd.sale_id
WHERE sh.date_sale >= '2021-08-01'
GROUP BY sh.date_id
ORDER BY sh.date_id;

We'll see the following differences:

Sales_Detail scans the much smaller index instead
The 'Hash Join' is now a 'Nested Loop Join'
- Nested Loops are like an Excel 'VLOOKUP' function
- They work best when one table or index is very small

Now if we add EXPLAIN ANALYZE we see the effect on the resources associated with Sales_Detail:

Measure	Table Scan	Index Scan
...	...	...
Preparation
Planning Time	0.144ms	0.346ms
...	...	...
Table / Index
Scan Cost (Total)	20.50	1.83
Scan Time (Total)	0.087ms	0.003ms
Width (Characters)	6	4
...	...	...
Hash Join / Nested Loop Join
Scan Cost (Total)	539.98	235.49
Scan Time (Total)	1.966ms	0.653ms
...	...	...
Execution
Execution Time	2.223ms	0.981ms

This index has made the query ~56% faster and uses fewer resources

Your results may differ depending on your computer

It may be worth keeping this index if:

The extra 'copy' of the data doesn't slow down INSERTS, UPDATES and DELETES considerably
The query is likely to be run a lot. It's often not worth keeping indexes for infrequently run queries

Notes:

These costings and times are small because the tables are small
- They scale up and become noticable when a table contains millions or even billions of records
There is usually more
'Planning Time' is slowed because indexes give PostgreSQL more options to consider
- Balance whether this extra Planning Time is more than made up for by a quicker 'Execution Time'

Dropping an Index

You can use the DROP INDEX command to remove an index

DROP INDEX <schema_name>.<index_name>;

For example, to drop the index we've just created run:

DROP INDEX "sequel-mart-schema".sales_detail_sale_id;

1. Execution Plan (No Index)

2. EXPLAIN ANALYZE (No Index)

3. CREATE INDEX ON Sale_Detail

4. Execution Plan (Index Included)

5. EXPLAIN ANALYZE (Index Included)

6. Drop Index