PostgreSQL: Analyze it first!
I recently wrote an article about comparing Hash and B-tree indexes. Unfortunately, I made a mistake, and now it’s time to make it right.
In this article, I will show you the PL/pgSQL script with a significant issue that is hard to catch. Then, I will show you my debugging story, including some of the PostgreSQL internals. And last but not least, the learned lesson and advice for you to avoid the same mistakes.
Let’s look at the wrong procedure.
CREATE OR REPLACE FUNCTION test(length integer, count numeric)
RETURNS TABLE (
sample_length integer,
unique_ratio decimal, -- in percentage
hash_index_size bigint, -- in kilobytes
btree_index_size bigint, -- in kilobytes
column_size bigint, -- in kilobytes
hash_select_query decimal, -- in milliseconds
btree_select_query decimal, -- in milliseconds
hash_insert_query decimal, -- in milliseconds
btree_insert_query decimal -- in milliseconds
) AS
$$
DECLARE
strings varchar[];
BEGIN
CREATE TABLE IF NOT EXISTS hash_table(example varchar);
CREATE TABLE IF NOT EXISTS btree_table(example varchar);
INSERT INTO hash_table (SELECT random_string(length) FROM generate_series(1, count));
INSERT INTO btree_table (SELECT example FROM hash_table);
CREATE INDEX IF NOT EXISTS hash_index ON hash_table USING hash(example);
CREATE INDEX IF NOT EXISTS btree_index ON btree_table USING btree(example);
ANALYSE hash_table;
ANALYSE btree_table;
strings := array_agg(random_string(length)) FROM generate_series(1, 100);
RETURN QUERY
SELECT (SELECT length(example) FROM hash_table LIMIT 1),
round(count(DISTINCT example)::decimal / count(*) * 100, 2) AS unique_ratio,
pg_relation_size('hash_index') / 1024 AS hash_index_size,
pg_relation_size('btree_index') / 1024 AS btree_index_size,
pg_table_size('hash_table') / 1024 AS column_size,
benchmark('SELECT example FROM hash_table WHERE example = $1', strings) AS hash_select_query,
benchmark('SELECT example FROM btree_table WHERE example = $1', strings) AS btree_select_query,
benchmark('INSERT INTO hash_table VALUES($1)', strings) AS hash_insert_query,
benchmark('INSERT INTO btree_table VALUES($1)', strings) AS btree_insert_query
FROM hash_table;
DROP TABLE IF EXISTS hash_table;
DROP TABLE IF EXISTS btree_table;
END
$$ LANGUAGE plpgsql;
It seemed fine back then, so I used it to generate plenty of charts. Below, I’m presenting one of them just for a better context.
Do you see anything suspicious? Neither did I. Before I say what’s wrong here, let’s recall together the definition of the Hash index.
Hash indexes store a 32-bit hash code derived from the indexed column’s value.
Are you following now where I’m going?
If the hash index stores only hash indexes of indexed values, why would index size differ for strings with different lengths? Why does this happening?
We will start our research by running only the part below from our initial script and check some statistics using pageinspect
extension.
CREATE EXTENSION pageinspect;
CREATE TABLE IF NOT EXISTS hash_table(example varchar);
-- For the research, we will use 10.000 strings with a length of 1024 characters.
INSERT INTO hash_table (
SELECT random_string(1024) FROM generate_series(1, 10000)
);
CREATE INDEX IF NOT EXISTS hash_index ON hash_table USING hash(example);
SELECT * FROM hash_metapage_info(get_raw_page('hash_index', 0));
From the list, two things are interesting in our case: ffactor
and maxbucket
.
ffactor
determines when an index should allocate more space for saved data.maxbucket
shows the current number of allocated buckets (a list of sorted hash codes with pointers to rows in the table).
Putting it simply and skipping some of the optimizations, as this deserves another article, PostgreSQL allocates a new space when the number of indexed rows is bigger than ffactor
multiplied by maxbucket + 1
(source link).
Interestingly, in our case, the created index has a default ffactor
equal to 307 (75% from maximum 409) with maxbucket
equal to 639. But why so many buckets if we only index 10.000 rows?
Now, using the pageinspect
extension, let’s see how many rows (tuples) are in every bucket.
SELECT (hash_page_stats(get_raw_page('hash_index', generate_series))).*
FROM generate_series(1, 10);
What a surprise! Our pages are about 90% free. It is super inefficient; what could cause this? To answer that question, let’s look at the index build function.
Here is an interesting piece of code.
ffactor = HashGetTargetPageUsage(rel) / item_width;
/* keep to a sane range */
if (ffactor < 10)
ffactor = 10;
It seems that fallback ten was used in our case because there were no statistics available yet.
Let’s run this code and see if we catch any differences.
CREATE TABLE hash1_table(example varchar);
CREATE TABLE hash2_table(example varchar);
INSERT INTO hash1_table (SELECT random_string(1024) FROM generate_series(1, 10000));
INSERT INTO hash2_table (SELECT example FROM hash_table1);
SELECT relname, n_tup_ins, n_live_tup, n_ins_since_vacuum
FROM pg_stat_user_tables WHERE relname IN ('hash1_table', 'hash2_table');
There are no statistics available on the second table.
Unfortunately, both tables would have an inefficient number of buckets in hash indexes anyway.
SELECT * FROM hash_metapage_info(get_raw_page('hash1_table_idx', 0))
UNION ALL
SELECT * FROM hash_metapage_info(get_raw_page('hash2_table_idx', 0));
Nevertheless, let’s see how adding analyze
after filling the table would affect the index.
CREATE TABLE hash_table(example varchar);
INSERT INTO hash_table (
SELECT random_string(1024) FROM generate_series(1, 10000)
);
ANALYZE hash_table;
CREATE INDEX hash_table_idx ON hash_table USING hash(example);
SELECT * FROM hash_metapage_info(get_raw_page('hash_table_idx', 0));
That’s a completely different story. The number of allocated buckets is less, meaning the whole index size is smaller.
Back to the beginning, let’s fix the procedure to compare Hash and B-tree indexes by adding analyze
after filling the tables. This will be the correct chart representing results for 10.000 rows with different given lengths.
The results now make more sense! And they are much more inspiring because now the Hash index takes 30x times less memory than the B-tree index (for 10.000 strings with 1024 characters lengths).
analyze
is a powerful instrument that helps PostgreSQL do its work efficiently. So it’s better to call it frequently than rarely, especially when you’re doing benchmarks. Furthermore, I encourage you to check your indexes and maybe reindex them (after analyzing, of course)!
Based on the above research, I think I will make a second attempt at the “Hash vs. B-tree index” article.
Follow me to be notified of my upcoming articles.