пятница, 4 декабря 2009 г.

Тестирование PostgreSQL 8.3 на больших таблицах: 40М записей

Продолжение статьи

Тестирование PostgreSQL 8.3 на больших таблицах: денормализация (10М записей). Параметры сервера и PostgreSQL те же.

Нормализованная таблица без справочников (40M записей)

select count(*) from tmp_facts7 where a1=2 and a2=2 and a3=2 and a4=2;
count
-------
391
(1 row)

explain analyze select * from tmp_facts7 where a1=2 and a2=2 and a3=2 and a4=2;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on tmp_facts7 (cost=102820.61..123140.53 rows=307 width=16) (actual time=1240.778..1250.728 rows=391 loops=1)
Recheck Cond: ((a4 = 2) AND (a3 = 2) AND (a2 = 2))
Filter: (a1 = 2)
-> BitmapAnd (cost=102820.61..102820.61 rows=5832 width=0) (actual time=1238.241..1238.241 rows=0 loops=1)
-> Bitmap Index Scan on tmp_facts7_a4_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=371.444..371.444 rows=2224826 loops=1)
Index Cond: (a4 = 2)
-> Bitmap Index Scan on tmp_facts7_a3_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=371.881..371.881 rows=2223561 loops=1)
Index Cond: (a3 = 2)
-> Bitmap Index Scan on tmp_facts7_a2_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=371.275..371.275 rows=2221344 loops=1)
Index Cond: (a2 = 2)
Total runtime: 1251.841 ms
(11 rows)


SELECT relname, relpages, reltuples FROM pg_class
WHERE NOT relname LIKE 'pg_%' ORDER BY relpages DESC;
relname | relpages | reltuples
-----------------------------------+----------+-------------
tmp_facts7 | 216217 | 4.00001e+07
tmp_facts7_a4_idx | 87777 | 4.00001e+07
tmp_facts7_a3_idx | 87777 | 4.00001e+07
tmp_facts7_a2_idx | 87777 | 4.00001e+07
tmp_facts7_a1_idx | 87777 | 4.00001e+07


Нормализованная таблица со справочниками (40M записей + 10,100,1k,10k записей в справочниках)

select count(*)
from tmp_facts7
join tmp_facts7_a1 on tmp_facts7.a1=tmp_facts7_a1.id
join tmp_facts7_a2 on tmp_facts7.a2=tmp_facts7_a2.id
join tmp_facts7_a3 on tmp_facts7.a3=tmp_facts7_a3.id
join tmp_facts7_a4 on tmp_facts7.a4=tmp_facts7_a4.id
where tmp_facts7_a1.name='00' and tmp_facts7_a2.name='00' and tmp_facts7_a3.name='00' and tmp_facts7_a4.name='00';
count
-------
391
(1 row)

explain analyze
select count(*)
from tmp_facts7
join tmp_facts7_a1 on tmp_facts7.a1=tmp_facts7_a1.id
join tmp_facts7_a2 on tmp_facts7.a2=tmp_facts7_a2.id
join tmp_facts7_a3 on tmp_facts7.a3=tmp_facts7_a3.id
join tmp_facts7_a4 on tmp_facts7.a4=tmp_facts7_a4.id
where tmp_facts7_a1.name='00' and tmp_facts7_a2.name='00' and tmp_facts7_a3.name='00' and tmp_facts7_a4.name='00';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=124783.94..124783.95 rows=1 width=0) (actual time=1267.201..1267.202 rows=1 loops=1)
-> Nested Loop (cost=104399.33..124783.94 rows=1 width=0) (actual time=1217.377..1267.021 rows=391 loops=1)
Join Filter: (tmp_facts7.a1 = tmp_facts7_a1.id)
-> Nested Loop (cost=104399.33..124782.80 rows=1 width=4) (actual time=1217.140..1234.743 rows=6768 loops=1)
-> Nested Loop (cost=0.00..19.81 rows=1 width=12) (actual time=0.070..0.096 rows=1 loops=1)
-> Nested Loop (cost=0.00..11.53 rows=1 width=8) (actual time=0.056..0.078 rows=1 loops=1)
-> Seq Scan on tmp_facts7_a2 (cost=0.00..3.25 rows=1 width=4) (actual time=0.010..0.029 rows=1 loops=1)
Filter: (name = '00'::text)
-> Index Scan using tmp_facts7_a4_name_idx on tmp_facts7_a4 (cost=0.00..8.27 rows=1 width=4) (actual time=0.044..0.046 rows=1 loops=1)
Index Cond: (tmp_facts7_a4.name = '00'::text)
-> Index Scan using tmp_facts7_a3_name_idx on tmp_facts7_a3 (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.016 rows=1 loops=1)
Index Cond: (tmp_facts7_a3.name = '00'::text)
-> Bitmap Heap Scan on tmp_facts7 (cost=104399.33..124660.93 rows=5832 width=16) (actual time=1217.066..1229.133 rows=6768 loops=1)
Recheck Cond: ((tmp_facts7.a4 = tmp_facts7_a4.id) AND (tmp_facts7.a3 = tmp_facts7_a3.id) AND (tmp_facts7.a2 = tmp_facts7_a2.id))
-> BitmapAnd (cost=104399.33..104399.33 rows=5832 width=0) (actual time=1214.610..1214.610 rows=0 loops=1)
-> Bitmap Index Scan on tmp_facts7_a4_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=363.302..363.302 rows=2224826 loops=1)
Index Cond: (tmp_facts7.a4 = tmp_facts7_a4.id)
-> Bitmap Index Scan on tmp_facts7_a3_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=364.123..364.123 rows=2223561 loops=1)
Index Cond: (tmp_facts7.a3 = tmp_facts7_a3.id)
-> Bitmap Index Scan on tmp_facts7_a2_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=366.062..366.062 rows=2221344 loops=1)
Index Cond: (tmp_facts7.a2 = tmp_facts7_a2.id)
-> Seq Scan on tmp_facts7_a1 (cost=0.00..1.12 rows=1 width=4) (actual time=0.001..0.003 rows=1 loops=6768)
Filter: (tmp_facts7_a1.name = '00'::text)
Total runtime: 1268.177 ms
(24 rows)

select count(*)
from tmp_facts7
where
tmp_facts7.a1 IN (select id from tmp_facts7_a1 where name='00') and
tmp_facts7.a2 IN (select id from tmp_facts7_a2 where name='00') and
tmp_facts7.a3 IN (select id from tmp_facts7_a3 where name='00') and
tmp_facts7.a4 IN (select id from tmp_facts7_a4 where name='00');
count
-------
391
(1 row)

explain analyze
select count(*)
from tmp_facts7
where
tmp_facts7.a1 IN (select id from tmp_facts7_a1 where name='00') and
tmp_facts7.a2 IN (select id from tmp_facts7_a2 where name='00') and
tmp_facts7.a3 IN (select id from tmp_facts7_a3 where name='00') and
tmp_facts7.a4 IN (select id from tmp_facts7_a4 where name='00');
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=298421.25..298421.26 rows=1 width=0) (actual time=6530.685..6530.686 rows=1 loops=1)
-> Nested Loop IN Join (cost=34817.31..298421.24 rows=1 width=0) (actual time=493.943..6530.065 rows=391 loops=1)
Join Filter: (tmp_facts7.a2 = tmp_facts7_a2.id)
-> Hash IN Join (cost=34817.31..298417.98 rows=1 width=4) (actual time=471.609..6381.263 rows=6898 loops=1)
Hash Cond: (tmp_facts7.a1 = tmp_facts7_a1.id)
-> Hash IN Join (cost=34816.17..298416.82 rows=4 width=8) (actual time=471.453..6320.043 rows=124045 loops=1)
Hash Cond: (tmp_facts7.a3 = tmp_facts7_a3.id)
-> Nested Loop (cost=34807.89..298393.49 rows=4000 width=12) (actual time=471.244..5245.973 rows=2224826 loops=1)
-> HashAggregate (cost=8.28..8.29 rows=1 width=4) (actual time=0.066..0.068 rows=1 loops=1)
-> Index Scan using tmp_facts7_a4_name_idx on tmp_facts7_a4 (cost=0.00..8.27 rows=1 width=4) (actual time=0.061..0.062 rows=1 loops=1)
Index Cond: (name = '00'::text)
-> Bitmap Heap Scan on tmp_facts7 (cost=34799.61..272069.32 rows=2105271 width=16) (actual time=471.174..3375.398 rows=2224826 loops=1)
Recheck Cond: (tmp_facts7.a4 = tmp_facts7_a4.id)
-> Bitmap Index Scan on tmp_facts7_a4_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=370.219..370.219 rows=2224826 loops=1)
Index Cond: (tmp_facts7.a4 = tmp_facts7_a4.id)
-> Hash (cost=8.27..8.27 rows=1 width=4) (actual time=0.050..0.050 rows=1 loops=1)
-> Index Scan using tmp_facts7_a3_name_idx on tmp_facts7_a3 (cost=0.00..8.27 rows=1 width=4) (actual time=0.047..0.049 rows=1 loops=1)
Index Cond: (name = '00'::text)
-> Hash (cost=1.12..1.12 rows=1 width=4) (actual time=0.013..0.013 rows=1 loops=1)
-> Seq Scan on tmp_facts7_a1 (cost=0.00..1.12 rows=1 width=4) (actual time=0.009..0.011 rows=1 loops=1)
Filter: (name = '00'::text)
-> Seq Scan on tmp_facts7_a2 (cost=0.00..3.25 rows=1 width=4) (actual time=0.004..0.019 rows=1 loops=6898)
Filter: (tmp_facts7_a2.name = '00'::text)
Total runtime: 6531.779 ms
(24 rows)


select count(*)
from tmp_facts7,tmp_facts7_a1,tmp_facts7_a2,tmp_facts7_a3,tmp_facts7_a4
where tmp_facts7.a1=tmp_facts7_a1.id and tmp_facts7.a2=tmp_facts7_a2.id and tmp_facts7.a3=tmp_facts7_a3.id and tmp_facts7.a4=tmp_facts7_a4.id
and tmp_facts7_a1.name='00' and tmp_facts7_a2.name='00' and tmp_facts7_a3.name='00' and tmp_facts7_a4.name='00';
count
-------
391
(1 row)


explain analyze
select count(*)
from tmp_facts7,tmp_facts7_a1,tmp_facts7_a2,tmp_facts7_a3,tmp_facts7_a4
where tmp_facts7.a1=tmp_facts7_a1.id and tmp_facts7.a2=tmp_facts7_a2.id and tmp_facts7.a3=tmp_facts7_a3.id and tmp_facts7.a4=tmp_facts7_a4.id
and tmp_facts7_a1.name='00' and tmp_facts7_a2.name='00' and tmp_facts7_a3.name='00' and tmp_facts7_a4.name='00';

QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=124783.94..124783.95 rows=1 width=0) (actual time=1261.212..1261.213 rows=1 loops=1)
-> Nested Loop (cost=104399.33..124783.94 rows=1 width=0) (actual time=1211.897..1261.036 rows=391 loops=1)
Join Filter: (tmp_facts7.a1 = tmp_facts7_a1.id)
-> Nested Loop (cost=104399.33..124782.80 rows=1 width=4) (actual time=1211.666..1228.751 rows=6768 loops=1)
-> Nested Loop (cost=0.00..19.81 rows=1 width=12) (actual time=0.074..0.101 rows=1 loops=1)
-> Nested Loop (cost=0.00..11.53 rows=1 width=8) (actual time=0.061..0.082 rows=1 loops=1)
-> Seq Scan on tmp_facts7_a2 (cost=0.00..3.25 rows=1 width=4) (actual time=0.013..0.032 rows=1 loops=1)
Filter: (name = '00'::text)
-> Index Scan using tmp_facts7_a4_name_idx on tmp_facts7_a4 (cost=0.00..8.27 rows=1 width=4) (actual time=0.046..0.048 rows=1 loops=1)
Index Cond: (tmp_facts7_a4.name = '00'::text)
-> Index Scan using tmp_facts7_a3_name_idx on tmp_facts7_a3 (cost=0.00..8.27 rows=1 width=4) (actual time=0.011..0.015 rows=1 loops=1)
Index Cond: (tmp_facts7_a3.name = '00'::text)
-> Bitmap Heap Scan on tmp_facts7 (cost=104399.33..124660.93 rows=5832 width=16) (actual time=1211.587..1223.046 rows=6768 loops=1)
Recheck Cond: ((tmp_facts7.a4 = tmp_facts7_a4.id) AND (tmp_facts7.a3 = tmp_facts7_a3.id) AND (tmp_facts7.a2 = tmp_facts7_a2.id))
-> BitmapAnd (cost=104399.33..104399.33 rows=5832 width=0) (actual time=1209.131..1209.131 rows=0 loops=1)
-> Bitmap Index Scan on tmp_facts7_a4_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=362.778..362.778 rows=2224826 loops=1)
Index Cond: (tmp_facts7.a4 = tmp_facts7_a4.id)
-> Bitmap Index Scan on tmp_facts7_a3_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=362.364..362.364 rows=2223561 loops=1)
Index Cond: (tmp_facts7.a3 = tmp_facts7_a3.id)
-> Bitmap Index Scan on tmp_facts7_a2_idx (cost=0.00..34273.29 rows=2105271 width=0) (actual time=362.790..362.790 rows=2221344 loops=1)
Index Cond: (tmp_facts7.a2 = tmp_facts7_a2.id)
-> Seq Scan on tmp_facts7_a1 (cost=0.00..1.12 rows=1 width=4) (actual time=0.001..0.003 rows=1 loops=6768)
Filter: (tmp_facts7_a1.name = '00'::text)
Total runtime: 1262.187 ms
(24 rows)


Выводы

Замечу, что мощность выборки в последних тестах увеличилась вчетверо, равно как и общее число записей в основной таблице. При этом вчетверо упала скорость выборок. Действительно неприятно, т.к. для отображения пользователю от десяти до полусотни записей зачастую нужно выбрать из исходной таблицы в разы большее их число, в связи с дальнейшей группировкой и агрегацией исходных данных.

Комментариев нет:


(C) Alexey Pechnikov aka MBG, mobigroup.ru