The small signature for sqlite3-rdiff

Depends: current test version of sqlite3-rdiff

The sqlite3-rdiff utility produces a signature file of size about 10% of original database size. Yes, it's better than coping an entire database but it's not good enough for production use. So I wrote a new algorithm that can build the small signatures. The algorithm calculates checksums for set of rows and so with N rows in each set the signature size will be decreased by the factor of N! Of course the delta file size will be increased but only a little for most databases.

My code uses a hack


select rowid/N as rowid, sum(murmurhash('$unixepoch',$cols)) from ... group by rowid


This solution is not good and may be used for testing only. This code will fail with N>16.


$ time ./sqlite3-rdiff signature slave.db slave.db.signature
signature slave.db slave.db.signature --table-name % --rows-per-hash 1
=7 system_config
=160 center
=7 role
=1 macroregion
=988 point
=7 region
=1841 user
=2867166 document_status
=25 comment
=250985 document_comment
=0 operation
=16 datatype
=7 type_template
=51 template
=370 template_field
=511335 document
=1593416 document_field
=757015 document_file
=7336 document_photo
=16 status_switch
=491129 message
=7 system_config_history
=160 center_history
=7 role_history
=1 macroregion_history
=988 point_history
=7 region_history
=1841 user_history
=2869992 document_status_history
=25 comment_history
=251191 document_comment_history
=0 operation_history
=16 datatype_history
=7 type_template_history
=51 template_history
=370 template_field_history
=511335 document_history
=1593416 document_field_history
=757015 document_file_history
=7336 document_photo_history
=16 status_switch_history

real 0m55.309s
user 0m53.075s
sys 0m1.444s


$ time ./sqlite3-rdiff delta slave.db.signature master.db slave.db.delta
delta slave.db.signature master.db slave.db.delta --table-name % --rows-per-hash 1
-0 +0 system_config
-0 +0 center
-0 +0 role
-0 +0 macroregion
-0 +0 point
-0 +0 region
-0 +0 user
-687 +2922 document_status
-0 +0 comment
-50 +230 document_comment
-0 +0 operation
-0 +0 datatype
-0 +0 type_template
-0 +0 template
-0 +0 template_field
-0 +550 document
-0 +1348 document_field
-0 +970 document_file
-0 +2 document_photo
-0 +0 status_switch
-0 +4443 message
-0 +0 system_config_history
-0 +0 center_history
-0 +0 role_history
-0 +0 macroregion_history
-0 +0 point_history
-0 +0 region_history
-0 +0 user_history
-0 +4202 document_status_history
-0 +0 comment_history
-0 +230 document_comment_history
-0 +0 operation_history
-0 +0 datatype_history
-0 +0 type_template_history
-0 +0 template_history
-0 +0 template_field_history
-0 +550 document_history
-0 +1348 document_field_history
-0 +970 document_file_history
-0 +2 document_photo_history
-0 +0 status_switch_history

real 4m20.282s
user 3m18.276s
sys 0m5.944s



$ time ./sqlite3-rdiff --rows-per-hash 16 signatureN slave.db slave.db.signature16
signature slave.db slave.db.signature16 --table-name % --rows-per-hash 16
=1 system_config
=19 center
=1 role
=1 macroregion
=90 point
=1 region
=185 user
=179198 document_status
=3 comment
=15687 document_comment
=0 operation
=2 datatype
=1 type_template
=10 template
=57 template_field
=45270 document
=227657 document_field
=63430 document_file
=495 document_photo
=3 status_switch
=30697 message
=1 system_config_history
=11 center_history
=1 role_history
=1 macroregion_history
=62 point_history
=1 region_history
=116 user_history
=179375 document_status_history
=2 comment_history
=15700 document_comment_history
=0 operation_history
=2 datatype_history
=1 type_template_history
=4 template_history
=24 template_field_history
=31959 document_history
=99589 document_field_history
=47314 document_file_history
=459 document_photo_history
=2 status_switch_history

real 3m40.539s
user 1m40.978s
sys 0m8.097s


$ time ./sqlite3-rdiff --rows-per-hash 16 deltaN slave.db.signature16 master.db slave.db.delta16
delta slave.db.signature16 master.db slave.db.delta16 --table-name % --rows-per-hash 16
-0 +0 system_config
-0 +0 center
-0 +0 role
-0 +0 macroregion
-0 +0 point
-0 +0 region
-0 +0 user
-5808 +8042 document_status
-0 +0 comment
-480 +654 document_comment
-0 +0 operation
-0 +0 datatype
-0 +0 type_template
-0 +0 template
-0 +0 template_field
-16 +553 document
-16 +1352 document_field
-16 +974 document_file
-4 +4 document_photo
-0 +0 status_switch
-16 +4452 message
-0 +0 system_config_history
-0 +0 center_history
-0 +0 role_history
-0 +0 macroregion_history
-0 +0 point_history
-0 +0 region_history
-0 +0 user_history
-16 +4211 document_status_history
-0 +0 comment_history
-16 +238 document_comment_history
-0 +0 operation_history
-0 +0 datatype_history
-0 +0 type_template_history
-0 +0 template_history
-0 +0 template_field_history
-16 +558 document_history
-16 +1357 document_field_history
-16 +978 document_file_history
-11 +11 document_photo_history
-0 +0 status_switch_history

real 3m28.369s
user 1m51.347s
sys 0m7.928s



$ ls -lh|grep slave.db|awk '{print $5 "\t" $8}'
1.5G slave.db
3.4M slave.db.delta
4.3M slave.db.delta16
160M slave.db.signature
13M slave.db.signature16

Comments

Popular posts from this blog

Открытый софт для научных расчетов

Счетчики в SQLite

Модем Huawei E1550 в debian