sqlite3-rdiff: master-slave replication for SQLite
Link: http://mobigroup.ru/files/sqlite-ext/sqlite3-rdiff
Depends: tcl 8.5, sqlite3, murmurhash SQLite extension
I'm glad to annonce the sqlite3-diff utility for SQLite replication. Are used the ROWID value as unique key of row and murmurhash for build signatures for each row. It's enough for master-slave replication. The INTEGER PRIMARY KEY is not mandatory becouse sqlite3-rdiff may store the ROWID values.
The signature and delta files are valid SQLite3 databases too. They can be dumped/restored, updated and analyzed by any SQLite3 client application.
Hash collisions are resolved by using unique salt for each signature. So after first sync the theoretical frequency of collision (when any record is changed but the hash is same) is about 10^-9, after second sync - 10^-18, after N syncs - 10^-N*9. The salt is unix epoch time which is saved by using "PRAGMA user_version" in signature and delta databases.
Note: the master-master replication is now unsupported becouse is can't be so universal and are needed some conflict-resolution strategies. But it possible with uuid extension to have really unique identifier for each record and use it as key for master-master replication (and of course the ROWID values must be ignored). But the signature and delta files will be too big.
Usage
Example.
The test on my Core2 Duo laptop.
We can check the success of replication by starting replication again. There are no new changes in replicated database and so the we may get the empty patch.
All right, the patch is empty.
Upd.
The test on the Core Quad desktop with single 10 000 rpm HDD. Database work.db.copy is used for some tests and has different schema but the sqlite3-rdiff work fine on it.
Depends: tcl 8.5, sqlite3, murmurhash SQLite extension
I'm glad to annonce the sqlite3-diff utility for SQLite replication. Are used the ROWID value as unique key of row and murmurhash for build signatures for each row. It's enough for master-slave replication. The INTEGER PRIMARY KEY is not mandatory becouse sqlite3-rdiff may store the ROWID values.
The signature and delta files are valid SQLite3 databases too. They can be dumped/restored, updated and analyzed by any SQLite3 client application.
Hash collisions are resolved by using unique salt for each signature. So after first sync the theoretical frequency of collision (when any record is changed but the hash is same) is about 10^-9, after second sync - 10^-18, after N syncs - 10^-N*9. The salt is unix epoch time which is saved by using "PRAGMA user_version" in signature and delta databases.
Note: the master-master replication is now unsupported becouse is can't be so universal and are needed some conflict-resolution strategies. But it possible with uuid extension to have really unique identifier for each record and use it as key for master-master replication (and of course the ROWID values must be ignored). But the signature and delta files will be too big.
Usage
NAME
sqlite3-rdiff - compute and apply signature-based row differences for SQLite3 databases
SYNOPSYS
sqlite3-rdiff [options] signature old-file signature-file
sqlite3-rdiff [options] delta signature-file new-file delta-file
sqlite3-rdiff [options] patch old-file delta-file result-file
DESCRIPTION
Signature is solving by murmurhash2 algorithm for full row. Hash collision is resolved by using unique salt for each replication session.
OPTIONS:
--table-name regular expression for table names, use ? as any single symbol and % for any set of symbols.
AUTHOR
Alexey Pechnikov <pechnikov@mobigroup.ru>
SEE ALSO
The main rdiff-backup web page is at http://geomapx.blogspot.com/2009/12/sqlite3-rdiff-master-slave-replication.html.
Example.
The test on my Core2 Duo laptop.
# add test triggers to block table inserts
# the replication utility may disable triggers when patching database
$ sqlite3 2009-11-01.db "create trigger telephony_log_i before insert on telephony_log \
begin select raise(ABORT,'trigger fire');end"
# make slave database with some changes
$ cp 2009-11-01.db 2009-11-01.db.slave
$ sqlite3 2009-11-01.db.slave "delete from telephony_log where rowid%100000=0;"
$ sqlite3 2009-11-01.db.slave "delete from telephony_log_error where rowid%10=0;"
# how many rows are deleted in previous commands?
$ sqlite3 2009-11-01.db "select count(*) from telephony_log where rowid%100000=0;"
14
$ sqlite3 2009-11-01.db "select count(*) from telephony_log_error where rowid%10=0;"
41
$ time ./sqlite3-rdiff signature 2009-11-01.db.slave 2009-11-01.db.signature
signature 2009-11-01.db.slave 2009-11-01.db.signature --table-name %
=1465379 telephony_log
=303709 telephony_log_rating
=377 telephony_log_error
=140 telephony_log_counter
real 0m40.361s
user 0m28.850s
sys 0m0.936s
$ time ./sqlite3-rdiff delta 2009-11-01.db.signature 2009-11-01.db 2009-11-01.db.delta
delta 2009-11-01.db.signature 2009-11-01.db 2009-11-01.db.delta --table-name %
-0 +14 telephony_log
-0 +0 telephony_log_rating
-0 +41 telephony_log_error
-0 +0 telephony_log_counter
real 1m50.502s
user 1m34.338s
sys 0m1.268s
$ time ./sqlite3-rdiff patch 2009-11-01.db.slave 2009-11-01.db.delta 2009-11-01.db.replicated
patch 2009-11-01.db.slave 2009-11-01.db.delta 2009-11-01.db.replicated --table-name %
-0 +14 telephony_log (2 triggers disabled)
-0 +0 telephony_log_rating (0 triggers disabled)
-0 +41 telephony_log_error (0 triggers disabled)
-0 +0 telephony_log_counter (0 triggers disabled)
real 0m24.624s
user 0m0.168s
sys 0m2.764s
$ ls -lh|grep 2009-11-01.db|awk '{print $5 "\t" $9}'
342M 2009-11-01.db
44K 2009-11-01.db.delta
342M 2009-11-01.db.replicated
23M 2009-11-01.db.signature
342M 2009-11-01.db.slave
We can check the success of replication by starting replication again. There are no new changes in replicated database and so the we may get the empty patch.
$ ./sqlite3-rdiff signature 2009-11-01.db.replicated 2009-11-01.db.signature2
signature 2009-11-01.db.replicated 2009-11-01.db.signature2 --table-name %
=1465393 telephony_log
=303709 telephony_log_rating
=418 telephony_log_error
=140 telephony_log_counter
$ ./sqlite3-rdiff delta 2009-11-01.db.signature2 2009-11-01.db 2009-11-01.db.delta2
delta 2009-11-01.db.signature2 2009-11-01.db 2009-11-01.db.delta2 --table-name %
-0 +0 telephony_log
-0 +0 telephony_log_rating
-0 +0 telephony_log_error
-0 +0 telephony_log_counter
$ ./sqlite3-rdiff patch 2009-11-01.db.replicated 2009-11-01.db.delta2 2009-11-01.db.replicated2
patch 2009-11-01.db.replicated 2009-11-01.db.delta2 2009-11-01.db.replicated2 --table-name %
-0 +0 telephony_log (1 triggers disabled)
-0 +0 telephony_log_rating (0 triggers disabled)
-0 +0 telephony_log_error (0 triggers disabled)
-0 +0 telephony_log_counter (0 triggers disabled)
All right, the patch is empty.
Upd.
The test on the Core Quad desktop with single 10 000 rpm HDD. Database work.db.copy is used for some tests and has different schema but the sqlite3-rdiff work fine on it.
$ time ./sqlite3-rdiff signature work.db.copy work.db.signature
signature work.db.copy work.db.signature --table-name %
=7 system_config
=160 center
=7 role
=1 macroregion
=988 point
=7 region
=1841 user
=2863845 document_status
=25 comment
=250561 document_comment
=0 operation
=16 datatype
=7 type_template
=51 template
=370 template_field
=510515 document
=1591502 document_field
=755659 document_file
=7329 document_photo
=16 status_switch
=486684 message
=7 system_config_history
=160 center_history
=7 role_history
=1 macroregion_history
=988 point_history
=7 region_history
=1841 user_history
=2863845 document_status_history
=25 comment_history
=250561 document_comment_history
=0 operation_history
=16 datatype_history
=7 type_template_history
=51 template_history
=370 template_field_history
=510515 document_history
=1591502 document_field_history
=755659 document_file_history
=7329 document_photo_history
=16 status_switch_history
real 1m58.328s
user 0m53.179s
sys 0m4.604s
$ time ./sqlite3-rdiff delta work.db.signature work.db work.db.delta
delta work.db.signature work.db work.db.delta --table-name %
-0 +0 system_config
-0 +0 center
-0 +0 role
-0 +0 macroregion
-0 +0 point
-0 +0 region
-0 +0 user
-995 +4316 document_status
-0 +0 comment
-205 +629 document_comment
-0 +0 operation
-0 +0 datatype
-0 +0 type_template
-0 +0 template
-0 +0 template_field
-0 +820 document
-0 +1914 document_field
-0 +1356 document_file
-0 +7 document_photo
-0 +0 status_switch
-0 +4445 message
-0 +0 system_config_history
-0 +0 center_history
-0 +0 role_history
-0 +0 macroregion_history
-0 +0 point_history
-0 +0 region_history
-0 +0 user_history
-0 +6147 document_status_history
-0 +0 comment_history
-0 +630 document_comment_history
-0 +0 operation_history
-0 +0 datatype_history
-0 +0 type_template_history
-0 +0 template_history
-0 +0 template_field_history
-0 +820 document_history
-0 +1914 document_field_history
-0 +1356 document_file_history
-0 +7 document_photo_history
-0 +0 status_switch_history
real 5m11.592s
user 3m23.989s
sys 0m5.704s
$ time ./sqlite3-rdiff patch work.db.copy work.db.delta work.db.copy
patch work.db.copy work.db.delta work.db.copy --table-name %
-0 +0 system_config (3 triggers disabled)
-0 +0 center (3 triggers disabled)
-0 +0 role (3 triggers disabled)
-0 +0 macroregion (3 triggers disabled)
-0 +0 point (3 triggers disabled)
-0 +0 region (3 triggers disabled)
-0 +0 user (3 triggers disabled)
-995 +4316 document_status (3 triggers disabled)
-0 +0 comment (3 triggers disabled)
-205 +629 document_comment (3 triggers disabled)
-0 +0 operation (3 triggers disabled)
-0 +0 datatype (3 triggers disabled)
-0 +0 type_template (3 triggers disabled)
-0 +0 template (3 triggers disabled)
-0 +0 template_field (3 triggers disabled)
-0 +820 document (3 triggers disabled)
-0 +1914 document_field (3 triggers disabled)
-0 +1356 document_file (3 triggers disabled)
-0 +7 document_photo (3 triggers disabled)
-0 +0 status_switch (3 triggers disabled)
-0 +4445 message (0 triggers disabled)
system_config_versioning
center_versioning
role_versioning
macroregion_versioning
point_versioning
region_versioning
user_versioning
document_status_versioning
comment_versioning
document_comment_versioning
operation_versioning
datatype_versioning
type_template_versioning
template_versioning
template_field_versioning
document_versioning
document_field_versioning
document_file_versioning
document_photo_versioning
status_switch_versioning
real 0m0.726s
user 0m0.500s
sys 0m0.124s
$ ls -lh|grep work.db|awk '{print $5 "\t" $8}'
1.5G work.db
2.0G work.db.copy
4.0M work.db.delta
159M work.db.signature
Comments