понедельник, 23 ноября 2009 г.

Degradation of indexing speed in SQLite 3.6.20

Hardware

laptop with Intel(R) Core(TM)2 Duo CPU T5470 1.60GHz (working as 800MHz CPU for low heating)
1 GB RAM
HDD Western Digital Scorpio Black 2.5" 7200 rpm 16Mb (FS ext3)
OS Debian (lenny+testing+sid).

SQLite build
My debian package has these options to compile SQLite 3.6.20:

DEB_OPT_FLAG := -O2 -fno-strict-aliasing \
-DSQLITE_CORE=1 \
-DSQLITE_ENABLE_COLUMN_METADATA \
-DSQLITE_ENABLE_STAT2 \
-DSQLITE_DEFAULT_RECURSIVE_TRIGGERS=1 \
-DSQLITE_ENABLE_COMPRESS \
-DSQLITE_ENABLE_FTS3 \
-DSQLITE_ENABLE_INET \
-DSQLITE_ENABLE_MD5 \
-DSQLITE_ENABLE_RTREE \
-DSQLITE_ENABLE_TABLEFUNC \
-DSQLITE_ENABLE_VIRTUALTEXT \
-DSQLITE_ENABLE_ICU \
-DSQLITE_ENABLE_UUID \
-DSQLITE_ENABLE_VERSIONING \
-DSQLITE_SOUNDEX \
-DSQLITE_TEMP_STORE=3 \
-DSQLITE_DEFAULT_PAGE_SIZE=4096 \
-DSQLITE_DEFAULT_CACHE_SIZE=128000 \
-DSQLITE_DEFAULT_TEMP_CACHE_SIZE=8000 \
-DSQLITE_DEFAULT_FILE_FORMAT=4

LDFLAGS=-luuid -lz -ldl `icu-config --ldflags`


You can get binary and sources packages from http://mobigroup.ru/debian/pool/main/s/sqlite3/ (current build is 3.6.20-mobigroup.1).

Test results

┌────────────────────┬────┬───┬─────┬──────┬──────┬───────────────────────────┐
│Groups count (N) │1 │10 │100 │1000 │10000 │ │
├────────────────────┴────┴───┴─────┴──────┴──────┴───────────────────────────┤
│DB population │
├────────────────────┬────┬───┬─────┬──────┬──────┬───────────────────────────┤
│Creation time │2s │20s│205s │2225s │21329s│./test.tcl groups N │
│DB size │2,6M│26M│255M │2,6G │26G │items 25000 │
├────────────────────┴────┴───┴─────┴──────┴──────┴───────────────────────────┤
│Indexing after DB population │
├────────────────────┬────┬───┬─────┬──────┬──────┬───────────────────────────┤
│Single-column │ │ │ │ │ │ │
│indexing time │<1s │5s │49s │574s │ │CREATE INDEX single_idx on │
│DB size │2,9M│29M│290M │2,9G │- │test(destcode) │
├────────────────────┼────┼───┼─────┼──────┼──────┼───────────────────────────┤
│Analyze time │<1s │<1s│28s │588s │- │ANALYZE │
├────────────────────┴────┴───┴─────┴──────┴──────┴───────────────────────────┤
│Indexing after DB population │
├────────────────────┬────┬───┬─────┬──────┬──────┬───────────────────────────┤
│Multi-column │ │ │ │ │ │CREATE INDEX complex_idx │
│indexing time │1s │6s │80s │>5000s│ │on test │
│DB size │4,0M│40M│397M │ - │- │(destcode,name,delete_date)│
├────────────────────┼────┼───┼─────┼──────┼──────┼───────────────────────────┤
│Analyze time │<1s │4s │211s │- │- │ANALYZE │
├────────────────────┴────┴───┴─────┴──────┴──────┴───────────────────────────┤
│Single-column indexing with DB population │
├────────────────────┬────┬───┬─────┬──────┬──────┬───────────────────────────┤
│ │ │ │ │ │ │./test.tcl groups N │
│Creation time/DB │3s │29s│658s │>5000s│ │items 25000 index single │
│size │2,9M│29M│290M │ - │- │ │
├────────────────────┴────┴───┴─────┴──────┴──────┴───────────────────────────┤
│Multi-column indexing with DB population │
├────────────────────┬────┬───┬─────┬──────┬──────┬───────────────────────────┤
│ │ │ │ │ │ │./test.tcl groups N │
│Creation time/DB │3s │36s│1875s│ │ │items 25000 index complex │
│size │4,0M│40M│ 397M│- │- │ │
└────────────────────┴────┴───┴─────┴──────┴──────┴───────────────────────────┘


test.tcl

#!/usr/bin/tclsh8.5
# Use as ./test.tcl groups X items Y [index single|complex]
package require sqlite3

array set opts $argv

puts "Test of $opts(groups) with $opts(items) items in each group"

catch {file delete test$opts(groups).db}
catch {file delete test$opts(groups).db-journal}
sqlite3 db test$opts(groups).db
db eval {PRAGMA cache_size=16000}
#db eval {PRAGMA synchronous=OFF}

db eval {
CREATE TABLE test ( -- справочник направлений для телефонии
id INTEGER PRIMARY KEY AUTOINCREMENT NOT NULL,
save_date REAL NOT NULL DEFAULT (julianday('now')),
delete_date REAL,
name text collate russian not null, -- группа направлений
destname text collate russian not null, -- название направления
destcode integer not null, -- префикс E.164 номера телефона
rcode text collate NOCASE not null, -- код региона
price real not null, -- стоимость минуты разговора
currency text collate NOCASE not null default 'RUB'
)
}
if {[info exists opts(index)] && $opts(index) eq {single}} {
db eval {CREATE INDEX single_idx on test(destcode)}
}
if {[info exists opts(index)] && $opts(index) eq {complex}} {
db eval {CREATE INDEX complex_idx on test(destcode,name,delete_date)}
}

for {set j 0} {$j<[expr $opts(groups)]} {incr j} {
set name "Группа направлений $j"
db transaction {
for {set i 0} {$i<$opts(items)} {incr i} {
set destname "Направление $i"
set destcode [string range [expr round(1000000000000000*rand())] 0 [expr round(15*rand())]]
set price [expr round(10000*rand())/100.]
# puts "insert into test (name,destname,destcode,rcode,price) values ($name,$destname,$destcode,'',$price)"
db eval {insert into test (name,destname,destcode,rcode,price) values ($name,$destname,$destcode,'',$price)}
}
}
}
db close


Upd.

I did perform additional test for speed of select query. Is used the real query from telephony billing system.

Select Time for
select rowid from test where name='Группа направлений 0' and destcode=500 and delete_date IS NULL

┌────────────────────┬────┬───┬─────┬──────┬──────┐
│Groups count (N) │1 │10 │100 │1000 │10000 │
├────────────────────┼────┼───┼─────┼──────┼──────┤
│without index │0.1s│1s │8s │78s │796s │
├────────────────────┼────┼───┼─────┼──────┼──────┤
│single index │0s │0s │0.08s│0.4s │- │
└────────────────────┴────┴───┴─────┴──────┴──────┘

Note: the query returns 1...3 rows. Time is measured by SQLite timer as user+sys times.

The test results
There is no speed degradation of no-indexed database.
As temporary fix of indexing problem we can use single-column realtime index and create complex index after full population of database (I split database monthly).

Upd.
I did perform the tests on server hardware (with 8 Gb RAM). As we can see the index creating speed degradation is result of index size more than SQLite page cache size.


$ time ./test.tcl groups 1000 items 25000; \
time sqlite3 test1000.db "pragma cache_size=500000;CREATE INDEX single_idx on test(destcode);"
Test of 1000 with 25000 items in each group

real 10m7.098s
user 9m36.932s
sys 0m17.545s

real 3m5.097s
user 2m55.639s
sys 0m4.392s

$ time ./test.tcl groups 2000 items 25000; \
time sqlite3 test2000.db "pragma cache_size=500000;CREATE INDEX single_idx on test(destcode);";
Test of 2000 with 25000 items in each group

real 21m7.168s
user 19m13.756s
sys 0m33.994s

real 6m42.785s
user 6m12.483s
sys 0m17.749s

$ > time ./test.tcl groups 4000 items 25000; \
time sqlite3 test4000.db "pragma cache_size=500000;CREATE INDEX single_idx on test(destcode);";
Test of 4000 with 25000 items in each group

real 40m36.081s
user 38m28.492s
sys 1m11.944s

real 13m42.090s
user 12m48.424s
sys 0m28.150s

$ time ./test.tcl groups 8000 items 25000; \
time sqlite3 test8000.db "pragma cache_size=500000;CREATE INDEX single_idx on test(destcode);"
Test of 8000 with 25000 items in each group

real 80m36.335s
user 76m46.752s
sys 2m9.436s
^CError: interrupted

real 214m12.327s
user 23m39.909s
sys 1m18.873s


The last index size must be more than 2Gb and can't be effectively created with cache_size equal to 2Gb. This command interrupted manually by Ctrl+C keys.

Комментариев нет:


(C) Alexey Pechnikov aka MBG, mobigroup.ru