Create a (usually smaller) lookup 'copy' of the table

CREATE INDEX <index_name>
ON <table> USING btree | hash
(field_name1 ASC | DESC, field_name2) NULLS FIRST | LAST
INCLUDE (<field_name3>, <field_name4>);
Advantages of Indexes
  • Regular queries can use this smaller copy. It can make reads faster
  • Cutting out unneeded columns
    • Fewer pages for the engine to read
    • Uses less memory & CPU
  • Sorts can be pre-empted speeding up the query
  • Indexes can be filtered making them even faster when used
Disadvantages of Indexes
  • Extra overhead. Data changes happen in multiple places (slower writes)
  • More storage space is needed
  • Inefficient when used on small tables
  • Inefficient on tables with a lot of NULL values

If we take our previous query and hit F7:

SELECT sh.date_id, AVG(sd.items_sold)
FROM "sequel-mart-schema"."Sales_Header" AS sh
INNER JOIN "sequel-mart-schema"."Sales_Detail" AS sd 
	ON sh.sale_id = sd.sale_id
WHERE sh.date_sale >= '2021-08-01'
GROUP BY sh.date_id
ORDER BY sh.date_id;

We'll get an Execution Plan that shows:

  • Sales_Head and Sales_Detail being scanned and joining
  • Aggregate (to cover the AVG)
  • Sort (to cover the ORDER BY)

If we add EXPLAIN ANALYZE we'll get a summary of timings and costs


Add an Index

Let's see if we can speed this query up by adding an index:

What do we choose?

It can involve trial and error but basic guidelines include columns involved in:

  • Joins
  • Ordering
  • Filtering
  • Select (more likely to be used in the INCLUDE section)

If we add an index to Sales_Detail we may want to:

  • Put it on the sale_id column as this is involved in the JOIN
  • Include the items_sold column as this is used in the SELECT
CREATE INDEX sales_detail_sale_id
  ON "sequel-mart-schema"."Sales_Detail" (sale_id)
  INCLUDE (items_sold);
Checking the Execution Plan

Let's have another look at our Execution Plan with F7:

SELECT sh.date_id, AVG(sd.items_sold)
FROM "sequel-mart-schema"."Sales_Header" AS sh
INNER JOIN "sequel-mart-schema"."Sales_Detail" AS sd 
	ON sh.sale_id = sd.sale_id
WHERE sh.date_sale >= '2021-08-01'
GROUP BY sh.date_id
ORDER BY sh.date_id;

We'll see the following differences:

  • Sales_Detail scans the much smaller index instead
  • The 'Hash Join' is now a 'Nested Loop Join'
    • Nested Loops are like an Excel 'VLOOKUP' function
    • They work best when one table or index is very small

Now if we add EXPLAIN ANALYZE we see the effect on the resources associated with Sales_Detail:

Measure Table Scan Index Scan
... ... ...
Preparation
Planning Time 0.144ms 0.346ms
... ... ...
Table / Index
Scan Cost (Total) 20.50 1.83
Scan Time (Total) 0.087ms 0.003ms
Width (Characters) 6 4
... ... ...
Hash Join / Nested Loop Join
Scan Cost (Total) 539.98 235.49
Scan Time (Total) 1.966ms 0.653ms
... ... ...
Execution
Execution Time 2.223ms 0.981ms

This index has made the query ~56% faster and uses fewer resources

Your results may differ depending on your computer

It may be worth keeping this index if:

  • The extra 'copy' of the data doesn't slow down INSERTS, UPDATES and DELETES considerably
  • The query is likely to be run a lot. It's often not worth keeping indexes for infrequently run queries

Notes:

  • These costings and times are small because the tables are small
    • They scale up and become noticable when a table contains millions or even billions of records
  • There is usually more
  • 'Planning Time' is slowed because indexes give PostgreSQL more options to consider
    • Balance whether this extra Planning Time is more than made up for by a quicker 'Execution Time'

Dropping an Index

You can use the DROP INDEX command to remove an index

DROP INDEX <schema_name>.<index_name>;

For example, to drop the index we've just created run:

DROP INDEX "sequel-mart-schema".sales_detail_sale_id;
1. Execution Plan (No Index) 2. EXPLAIN ANALYZE (No Index) 3. CREATE INDEX ON Sale_Detail 4. Execution Plan (Index Included) 5. EXPLAIN ANALYZE (Index Included) 6. Drop Index