Mjönir: storing GUIDs/UUIDs in binary
-
Eero af Heurlin
Mjönir: storing GUIDs/UUIDs in binary
Wed June 24 2009 16:45:07 UTCAs discussed in linköping: GUID/UUIDs should be stored in binary and
converted to ascii representation on API layer.
Pros:
+ Significantly less storage used
+ Smaller indexes
+ Faster searches (I'd guess, since it's not a text search)
Cons:
- Not backwards compatible (it doesn't need to be...)
- Debugging on DB level becomes slightly more painful (but this matters
not to users or non-core developers...)
And to those exclaiming that what kind of brainfart is this, sorting
them in text ?!?!: "historical reasons".
/Rambo
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Thu June 25 2009 10:05:06 UTCEero af Heurlin writes:
Hi!
> As discussed in linköping: GUID/UUIDs should be stored in binary and
> converted to ascii representation on API layer.
>
> Pros:
>
> + Significantly less storage used
> + Smaller indexes
> + Faster searches (I'd guess, since it's not a text search)
SQLite doesn't have binary data type.
There's only BLOB which is considered as is.
I am not sure if it's text search or not in such case.
> Cons:
>
> - Not backwards compatible (it doesn't need to be...)
I am afraid it has to be backward compatible in Mjolnir.
But this couldn't be a problem as we can introduce UUID property with
GUID in parallel.
> - Debugging on DB level becomes slightly more painful (but this matters
> not to users or non-core developers...)
Debugging is always difficult tusk, so I wouldn't say it's con.
If needed, we could create some specific tool for debugging.
But of course, first important thing is if binary will be easily
portable across db providers.
Piotras
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Thu June 25 2009 10:45:05 UTCPiotr Pokora wrote:
>> Pros:
>>
>> + Significantly less storage used
>> + Smaller indexes
>> + Faster searches (I'd guess, since it's not a text search)
>
> SQLite doesn't have binary data type.
> There's only BLOB which is considered as is.
> I am not sure if it's text search or not in such case.
>
BLOB must be a binary search, but it's entirely another question if
sqlite allows indexing on BLOBs.
Anyways the installations that use sqlite should not have so large DBs
that the index would be absolutely required since the raw binary search
should be fast anyway.
This is relatively easy to benchmark though: take a large production db
dump and copy it somewhere, drop all indexes and make new column
guid_binary and populate with the binary versions then start comparing.
>> Cons:
>>
>> - Not backwards compatible (it doesn't need to be...)
>
> I am afraid it has to be backward compatible in Mjolnir.
> But this couldn't be a problem as we can introduce UUID property with
> GUID in parallel.
>
You mean the mRFC0042 UUIDs ?
Anyways incompatibility exists only on db level (as I suggested on API
level hex presentations would be used) and conversion script is trivial.
Surely there are other changes between mjölnir and vinland that require
database updates.
>> - Debugging on DB level becomes slightly more painful (but this matters
>> not to users or non-core developers...)
>
> Debugging is always difficult tusk, so I wouldn't say it's con.
>
It's a trivial matter, but con anyway.
/Rambo
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Sun July 19 2009 15:55:06 UTCMade some tests on my Nexenta (GNU/OpenSolaris) midgard zone (not really
a VM but close).
dumped repligard table with --opt from my personal installation:
60M Jul 19 21:50 repligard.sql.orig
Made a database 'test' and imported the .orig file as the 'repligard'
table, then made the table with binary guids:
cp repligard.sql.orig repligard_bin.sql
sed -i.sed -r -e "s/'([0-9a-f]{32,80})'/UNHEX('\1')/g" repligard_bin.sql
sed -i -e 's/repligard/repligard_bin/' repligard_bin.sql
and import the repligard_bin.sql to the test db.
root@midgard:/var/tmp# ls -lah /var/lib/mysql/test/
total 125M
drwx------ 2 mysql mysql 9 Jul 19 22:07 .
drwxr-xr-x 6 mysql mysql 11 Jul 19 22:01 ..
-rw-rw---- 1 mysql mysql 65 Jul 19 22:01 db.opt
-rw-rw---- 1 mysql mysql 35M Jul 19 22:03 repligard.MYD
-rw-rw---- 1 mysql mysql 39M Jul 19 22:03 repligard.MYI
-rw-rw---- 1 mysql mysql 8.9K Jul 19 22:03 repligard.frm
-rw-rw---- 1 mysql mysql 28M Jul 19 22:07 repligard_bin.MYD
-rw-rw---- 1 mysql mysql 25M Jul 19 22:07 repligard_bin.MYI
-rw-rw---- 1 mysql mysql 8.9K Jul 19 22:07 repligard_bin.frm
Here we can significantly smaller data and especially index (371414 rows).
mysql> SELECT id,realm FROM repligard WHERE guid IN
('eaaa5ae56694c00d0ec29cef87108881',
'c1eb0a70cfda11db95e7c514e61ac918c918',
'f3c8548acfda11db95e7c514e61ac918c918',
'b20217eccfdb11db95e7c514e61ac918c918',
'66a37932cfde11db95e7c514e61ac918c918');
+-----+-------------------------------+
| id | realm |
+-----+-------------------------------+
| 195 | org_routamc_photostream_photo |
| 107 | org_routamc_photostream_photo |
| 74 | org_routamc_photostream_photo |
| 30 | person |
| 81 | org_routamc_photostream_photo |
+-----+-------------------------------+
5 rows in set (0.00 sec)
mysql>
mysql> SELECT id,realm FROM repligard_bin WHERE guid IN
(UNHEX('eaaa5ae56694c00d0ec29cef87108881'),
UNHEX('c1eb0a70cfda11db95e7c514e61ac918c918'),
UNHEX('f3c8548acfda11db95e7c514e61ac918c918'),
UNHEX('b20217eccfdb11db95e7c514e61ac918c918'),
UNHEX('66a37932cfde11db95e7c514e61ac918c918'));
+-----+-------------------------------+
| id | realm |
+-----+-------------------------------+
| 195 | org_routamc_photostream_photo |
| 107 | org_routamc_photostream_photo |
| 74 | org_routamc_photostream_photo |
| 30 | person |
| 81 | org_routamc_photostream_photo |
+-----+-------------------------------+
5 rows in set (0.00 sec)
Seems my table is not large enough and since the guid is the primary key
I can't delete the index to compare scanning speed.
However this completely artificial case is interesting:
mysql> SELECT id,realm FROM repligard WHERE guid >
'66a37932cfde11db95e7c514e61ac918c918' AND guid <
'c1eb0a70cfda11db95e7c514e61ac918c918';
<snip>
131975 rows in set (0.38 sec)
mysql> SELECT id,realm FROM repligard_bin WHERE guid >
UNHEX('66a37932cfde11db95e7c514e61ac918c918') AND guid <
UNHEX('c1eb0a70cfda11db95e7c514e61ac918c918');
<snip>
131973 rows in set (0.46 sec)
Interesting as in on what grounds is the text lookup faster than pure
binary, the resultsets are mostly the same but for some reason the text
comparison returns two more results.
Another query set:
mysql> select hex(guid),realm from repligard_bin where id=153;
+--------------------------------------+---------------------------------+
| hex(guid) | realm |
+--------------------------------------+---------------------------------+
| 0F5AB82FE6C83AECE4C969470B0A50B8 | element |
| E7B5E0C2E1A395EC735A33DF4F422EFC | snippet |
| 0846BED617A584BA828CF1742454B0E1 | record_extension |
| 731CE1D8B4BDE32E73E2C70406837A91 | page |
| 5AAF11E51C5D9956011240EB4B2AD470 | eventmember |
| EE163297585E93555B249E3A96CFD9C5 | member |
| 0713CB44547977C27BB09C82251A6EF5 | language |
| ACAA16866EC65E86DC38C27D716585C1 | pageelement |
| 5C2DE948CFCE11DBAE09CDEA25059E589E58 | midcom_core_privilege |
| 1664DE8ACFDD11DB95E7C514E61AC918C918 | org_routamc_photostream_photo |
| 16671B6ECFDD11DB95E7C514E61AC918C918 | org_routamc_photostream_photo_i |
| 1AB42A2CCFDD11DB95E7C514E61AC918C918 | org_routamc_gallery_photolink |
| 346607C059FC11DCAF7A7BB29D5E73347334 | midcom_services_at_entry_db |
| A1BAEF6C1E2B11DEB3F4DF0A8C25EA8EEA8E | midcom_core_login_session |
+--------------------------------------+---------------------------------+
14 rows in set (0.25 sec)
mysql> select guid,realm from repligard where id=153;
+--------------------------------------+---------------------------------+
| guid | realm |
+--------------------------------------+---------------------------------+
| 0f5ab82fe6c83aece4c969470b0a50b8 | element |
| e7b5e0c2e1a395ec735a33df4f422efc | snippet |
| 0846bed617a584ba828cf1742454b0e1 | record_extension |
| 731ce1d8b4bde32e73e2c70406837a91 | page |
| 5aaf11e51c5d9956011240eb4b2ad470 | eventmember |
| ee163297585e93555b249e3a96cfd9c5 | member |
| 0713cb44547977c27bb09c82251a6ef5 | language |
| acaa16866ec65e86dc38c27d716585c1 | pageelement |
| 5c2de948cfce11dbae09cdea25059e589e58 | midcom_core_privilege |
| 1664de8acfdd11db95e7c514e61ac918c918 | org_routamc_photostream_photo |
| 16671b6ecfdd11db95e7c514e61ac918c918 | org_routamc_photostream_photo_i |
| 1ab42a2ccfdd11db95e7c514e61ac918c918 | org_routamc_gallery_photolink |
| 346607c059fc11dcaf7a7bb29d5e73347334 | midcom_services_at_entry_db |
| a1baef6c1e2b11deb3f4df0a8c25ea8eea8e | midcom_core_login_session |
+--------------------------------------+---------------------------------+
14 rows in set (0.26 sec)
Here the binary version is slightly faster (don't know if the time spent
in the HEX function is included), probably because there is only half as
much data to read from the binary guid.
The table is too small to affect insert performance:
mysql> INSERT into repligard set
guid='a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae',realm='midcom_core_login_session',id=-1;
Query OK, 1 row affected (0.00 sec)
mysql> INSERT into repligard_bin set
guid=UNHEX('a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae'),realm='midcom_core_login_session',id=-1;
Query OK, 1 row affected (0.00 sec)
CONCLUSION:
There is no clear winner from performance POV on this setup and dataset.
I think it would be worth the time to generate a few million guids and
do some more tests with larger tables (tepheikk: how many guids in Qaiku
repligard table ?)
Since these tables fit easily to memory and my ZFS setup in general is
very fast (raidz2 across 8 SATA disks, plenty of cache memory) I/O is
not an issue here, though I have a feeling that if we were to hit disks
(on inserts for example) the binary version which is much more efficient
space-wise would be as much faster.
/Rambo
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Sun July 19 2009 16:05:06 UTCEero af Heurlin wrote:
>>> - Debugging on DB level becomes slightly more painful (but this matters
>>> not to users or non-core developers...)
>> Debugging is always difficult tusk, so I wouldn't say it's con.
>>
>
> It's a trivial matter, but con anyway.
>
mainly it means select * is not an option (since you will get raw
binary, possibly with control character that mess your terminal), so you
need to specify the columns you want and if you want see guid say
'HEX(guid)' to get hex presentation.
But yes, trivial annoyance and doesn't matter at all to anyone but those
messing about the core.
/Rambo
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Sun July 19 2009 16:45:06 UTCEero af Heurlin wrote:
> Made some tests on my Nexenta (GNU/OpenSolaris) midgard zone (not really
> a VM but close).
>
> dumped repligard table with --opt from my personal installation:
>
> 60M Jul 19 21:50 repligard.sql.orig
>
> Made a database 'test' and imported the .orig file as the 'repligard'
> table, then made the table with binary guids:
>
> cp repligard.sql.orig repligard_bin.sql
> sed -i.sed -r -e "s/'([0-9a-f]{32,80})'/UNHEX('\1')/g" repligard_bin.sql
> sed -i -e 's/repligard/repligard_bin/' repligard_bin.sql
>
Forgot to mention: manually edited the guid from varchar(80) to
varbinary(40).
/Rambo
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Sun July 19 2009 20:12:38 UTCPosting here via web interface as well since it seems the mailing-list lurker does not like the attachments in the original mail.
Tested with repligard table of ~3M rows.
- created new test database, created the two SQL files (about 1.1G uncompressed)
- time mysql -p<REMOVED> test <large_repligard.sql
real 18m58.036s user 0m19.756s sys 0m1.495s
- time mysql -p<REMOVED> test <large_repligard_bin.sql
real 15m32.668s user 0m20.710s sys 0m1.533s
- ls -lah /var/lib/mysql/test/
total 1.2G drwx------ 2 mysql mysql 9 Jul 20 00:01 . drwxr-xr-x 6 mysql mysql 11 Jul 19 23:40 .. -rw-rw---- 1 mysql mysql 65 Jul 19 23:40 db.opt -rw-rw---- 1 mysql mysql 326M Jul 19 23:44 repligard.MYD -rw-rw---- 1 mysql mysql 371M Jul 19 23:59 repligard.MYI -rw-rw---- 1 mysql mysql 9.7K Jul 19 23:40 repligard.frm -rw-rw---- 1 mysql mysql 269M Jul 20 00:04 repligard_bin.MYD -rw-rw---- 1 mysql mysql 243M Jul 20 00:17 repligard_bin.MYI -rw-rw---- 1 mysql mysql 9.7K Jul 20 00:01 repligard_bin.frm
Significantly smaller index file here. - Test script
#!/bin/bash MYSQL_PASS="my_pass" ROWS=`mysql -B -p$MYSQL_PASS test -e 'select count(*) from repligard' | grep -v [a-z]` AWKSCRIPT="BEGIN{ srand() } END{ for(i=1;i<=100;i++) print randint($ROWS) } function randint(n) {return int(n * rand()) + 1}" for i in `echo | awk "$AWKSCRIPT"`; do # mysql -p$MYSQL_PASS test -e "SELECT guid,id,realm FROM repligard LIMIT $i,1;" # mysql -p$MYSQL_PASS test -e "SELECT HEX(guid) as guid,id,realm FROM repligard_bin LIMIT $i,1;" echo "SELECT guid,id,realm FROM repligard LIMIT $i,1;" echo "SELECT HEX(guid) as guid,id,realm FROM repligard_bin LIMIT $i,1;" doneThis gives out a bunch (200) of SQL statements for getting (same)random rows from the tables which we then execute in the mysql shell to get the timing info (see attachment random_seeks). Now these statements are basically stupid seeks, but they show that when we get to millions of rows varchar vs varbinary starts to matter from performance POV as well. - Other test script
#!/bin/bash MYSQL_PASS="my_pass" ROWS=`mysql -B -p$MYSQL_PASS test -e 'select count(*) from repligard' | grep -v [a-z]` AWKSCRIPT="BEGIN{ srand() } END{ for(i=1;i<=100;i++) print randint($ROWS) } function randint(n) {return int(n * rand()) + 1}" for i in `echo | awk "$AWKSCRIPT"`; do GUID=`mysql -B -p$MYSQL_PASS test -e "SELECT guid FROM repligard LIMIT $i,1;" | grep -v 'guid'` echo "SELECT guid,id,realm FROM repligard WHERE guid='$GUID';" echo "SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('$GUID');" doneAnother output of statements we try in mysql shell (after restarting the server to clear caches), attachment random_guids. Here we see basically no difference between the two tables. - Inserts
mysql> INSERT into repligard set guid='a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae',realm='midcom_core_login_session',id=-1; Query OK, 1 row affected (0.00 sec) mysql> INSERT into repligard_bin set guid=UNHEX('a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae'),realm='midcom_core_login_session',id=-1; Query OK, 1 row affected (0.00 sec)Inserting single rows shows no difference here (probably goes to cache/queue) - Updates (server restarted between to clear caches)
mysql> UPDATE repligard SET realm='midcom_core_login_session_upd' WHERE guid='a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae'; Query OK, 1 row affected (0.01 sec) Rows matched: 1 Changed: 1 Warnings: 0 mysql> UPDATE repligard_bin SET realm='midcom_core_login_session_upd' WHERE guid=UNHEX('a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae'); Query OK, 1 row affected (0.00 sec) Rows matched: 1 Changed: 1 Warnings: 0Second try (again restart in between) showed no time difference here, however I enabled profiling:mysql> show profiles; +----------+------------+-----------------------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+-----------------------------------------------------------------------------------------------------------------------------+ | 1 | 0.00083000 | UPDATE repligard SET realm='midcom_core_login_session_upd' WHERE guid='a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae' | | 2 | 0.00026675 | UPDATE repligard_bin SET realm='midcom_core_login_session_upd' WHERE guid=UNHEX('a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae') | | 3 | 0.00024950 | UPDATE repligard SET realm='midcom_core_login_session_upd' WHERE guid='a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae' | | 4 | 0.00018250 | UPDATE repligard_bin SET realm='midcom_core_login_session_upd' WHERE guid=UNHEX('a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae') | | 5 | 0.00020675 | UPDATE repligard SET realm='midcom_core_login_session_upd' WHERE guid='a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae' | | 6 | 0.00016825 | UPDATE repligard_bin SET realm='midcom_core_login_session_upd' WHERE guid=UNHEX('a1baef6c1e2b11deb3f4df0a8c25ea8eea8eaeae') | +----------+------------+-----------------------------------------------------------------------------------------------------------------------------+ 6 rows in set (0.00 sec)Here the binary version is consistently significantly faster, while at it I did some GUID seeks with profiling enabled:+----------+------------+---------------------------------------------------------------------------------------------------------------+ | Query_ID | Duration | Query | +----------+------------+---------------------------------------------------------------------------------------------------------------+ | 1 | 0.00004125 | set profiling_history_size=100 | | 2 | 0.00211800 | SELECT guid,id,realm FROM repligard WHERE guid='67d07320a39e11dcb09c5d955494086f086f' | | 3 | 0.00074200 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('67d07320a39e11dcb09c5d955494086f086f') | | 4 | 0.00023200 | SELECT guid,id,realm FROM repligard WHERE guid='110e571c9d5111dcb7b5affb63d5b221b221' | | 5 | 0.00017700 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('110e571c9d5111dcb7b5affb63d5b221b221') | | 6 | 0.00077450 | SELECT guid,id,realm FROM repligard WHERE guid='817cd724bc8e11dca45e0b66222a75d875d8' | | 7 | 0.00057650 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('817cd724bc8e11dca45e0b66222a75d875d8') | | 8 | 0.00053175 | SELECT guid,id,realm FROM repligard WHERE guid='2df131ba895011dd89d383f0b77c3e2c3e2c' | | 9 | 0.00029725 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('2df131ba895011dd89d383f0b77c3e2c3e2c') | | 10 | 0.00023000 | SELECT guid,id,realm FROM repligard WHERE guid='b2175400712711de92bc3350a43365cb65cb' | | 11 | 0.00086900 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('b2175400712711de92bc3350a43365cb65cb') | | 12 | 0.00106700 | SELECT guid,id,realm FROM repligard WHERE guid='b91d206030d411deb29f897c8053de46de46' | | 13 | 0.00046800 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('b91d206030d411deb29f897c8053de46de46') | | 14 | 0.00004500 | SELECT guid,id,realm FROM repligard WHERE guid='67d07320a39e11dcb09c5d955494086f086f' | | 15 | 0.00002525 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('67d07320a39e11dcb09c5d955494086f086f') | | 16 | 0.00002850 | SELECT guid,id,realm FROM repligard WHEREguid='110e571c9d5111dcb7b5affb63d5b221b221' | | 17 | 0.00003000 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('110e571c9d5111dcb7b5affb63d5b221b221') | | 18 | 0.00002800 | SELECT guid,id,realm FROM repligard WHEREguid='817cd724bc8e11dca45e0b66222a75d875d8' | | 19 | 0.00002875 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('817cd724bc8e11dca45e0b66222a75d875d8') | | 20 | 0.00002725 | SELECT guid,id,realm FROM repligard WHERE guid='2df131ba895011dd89d383f0b77c3e2c3e2c' | | 21 | 0.00002825 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('2df131ba895011dd89d383f0b77c3e2c3e2c') | | 22 | 0.00002725 | SELECT guid,id,realm FROM repligard WHERE guid='b2175400712711de92bc3350a43365cb65cb' | | 23 | 0.00002775 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('b2175400712711de92bc3350a43365cb65cb') | | 24 | 0.00002700 | SELECT guid,id,realm FROM repligard WHERE guid='b91d206030d411deb29f897c8053de46de46' | | 25 | 0.00002625 | SELECT HEX(guid) as guid,id,realm FROMrepligard_bin WHERE guid=UNHEX('b91d206030d411deb29f897c8053de46de46') | | 26 | 0.00004475 | SELECT guid,id,realm FROM repligard WHERE guid='67d07320a39e11dcb09c5d955494086f086f' | | 27 | 0.00002875 | SELECT HEX(guid) as guid,id,realm FROMrepligard_bin WHERE guid=UNHEX('67d07320a39e11dcb09c5d955494086f086f') | | 28 | 0.00003350 | SELECT guid,id,realm FROM repligard WHERE guid='110e571c9d5111dcb7b5affb63d5b221b221' | | 29 | 0.00002775 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('110e571c9d5111dcb7b5affb63d5b221b221') | | 30 | 0.00002800 | SELECT guid,id,realm FROM repligard WHERE guid='817cd724bc8e11dca45e0b66222a75d875d8' | | 31 | 0.00002800 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('817cd724bc8e11dca45e0b66222a75d875d8') | | 32 | 0.00002700 | SELECT guid,id,realm FROM repligard WHERE guid='2df131ba895011dd89d383f0b77c3e2c3e2c' | | 33 | 0.00002725 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('2df131ba895011dd89d383f0b77c3e2c3e2c') | | 34 | 0.00002750 | SELECT guid,id,realm FROM repligard WHERE guid='b2175400712711de92bc3350a43365cb65cb' | | 35 | 0.00002675 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('b2175400712711de92bc3350a43365cb65cb') | | 36 | 0.00002800 | SELECT guid,id,realm FROM repligard WHERE guid='b91d206030d411deb29f897c8053de46de46' | | 37 | 0.00002550 | SELECT HEX(guid) as guid,id,realm FROM repligard_bin WHERE guid=UNHEX('b91d206030d411deb29f897c8053de46de46') | +----------+------------+---------------------------------------------------------------------------------------------------------------+Again the binary version is faster even when we get result from cache.
/Rambo
-
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Wed July 22 2009 16:40:07 UTCHi!
I do not see any performance boost.
Here are my tests:
1481970 rows
mysql> create table guids (guid varchar(80));
Query OK, 0 rows affected (0.00 sec)
mysql> insert into guids (guid) SELECT guid from repligard;
Query OK, 1481970 rows affected (56.39 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
mysql> create table guids_bin (guid BLOB);
Query OK, 0 rows affected (0.01 sec)
mysql> insert into guids_bin (guid) SELECT hex(guid) from repligard;
Query OK, 1481970 rows affected (2.04 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
mysql> select count(guid) from guids;
+-------------+
| count(guid) |
+-------------+
| 1481970 |
+-------------+
1 row in set (0.28 sec)
mysql> select count(guid) from guids_bin;
+-------------+
| count(guid) |
+-------------+
| 1481970 |
+-------------+
1 row in set (0.31 sec)
Size:
-rw-rw---- 1 mysql mysql 8560 07-22 17:00 guids_bin.frm
-rw-rw---- 1 mysql mysql 118471204 07-22 17:01 guids_bin.MYD
-rw-rw---- 1 mysql mysql 1024 07-22 17:01 guids_bin.MYI
-rw-rw---- 1 mysql mysql 8560 07-22 16:57 guids.frm
-rw-rw---- 1 mysql mysql 65163488 07-22 17:00 guids.MYD
-rw-rw---- 1 mysql mysql 1024 07-22 17:00 guids.MYI
-rw-rw---- 1 mysql mysql 8,4K 07-22 17:00 guids_bin.frm
-rw-rw---- 1 mysql mysql 113M 07-22 17:01 guids_bin.MYD
-rw-rw---- 1 mysql mysql 1,0K 07-22 17:01 guids_bin.MYI
-rw-rw---- 1 mysql mysql 8,4K 07-22 16:57 guids.frm
-rw-rw---- 1 mysql mysql 63M 07-22 17:00 guids.MYD
-rw-rw---- 1 mysql mysql 1,0K 07-22 17:00 guids.MYI
mysql> CREATE INDEX guid_idx ON guids(guid);
Query OK, 1481970 rows affected (3.82 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
mysql> CREATE INDEX guid_idx ON guids_bin(guid (80));
Query OK, 1481970 rows affected (9.60 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
-rw-rw---- 1 mysql mysql 8,4K 07-22 17:17 guids_bin.frm
-rw-rw---- 1 mysql mysql 113M 07-22 17:17 guids_bin.MYD
-rw-rw---- 1 mysql mysql 113M 07-22 17:17 guids_bin.MYI
-rw-rw---- 1 mysql mysql 8,4K 07-22 17:14 guids.frm
-rw-rw---- 1 mysql mysql 63M 07-22 17:14 guids.MYD
-rw-rw---- 1 mysql mysql 59M 07-22 17:14 guids.MYI
mysql> select guid from guids limit 1 offset 765438;
+--------------------------------------+
| guid |
+--------------------------------------+
| 8186ea0039c711dd98e0bd1e904e6cc66cc6 |
+--------------------------------------+
1 row in set (0.46 sec)
mysql> select guid from guids where
guid='8186ea0039c711dd98e0bd1e904e6cc66cc6';
+--------------------------------------+
| guid |
+--------------------------------------+
| 8186ea0039c711dd98e0bd1e904e6cc66cc6 |
+--------------------------------------+
1 row in set (0.00 sec)
mysql> select unhex(guid) from guids_bin limit 1 offset 765438;
+--------------------------------------+
| unhex(guid) |
+--------------------------------------+
| 8186ea0039c711dd98e0bd1e904e6cc66cc6 |
+--------------------------------------+
1 row in set (0.13 sec)
mysql> select guid from guids limit 1 offset 765438;
+--------------------------------------+
| guid |
+--------------------------------------+
| 8186ea0039c711dd98e0bd1e904e6cc66cc6 |
+--------------------------------------+
1 row in set (0.46 sec)
mysql> select guid from guids where
guid='8186ea0039c711dd98e0bd1e904e6cc66cc6';
+--------------------------------------+
| guid |
+--------------------------------------+
| 8186ea0039c711dd98e0bd1e904e6cc66cc6 |
+--------------------------------------+
1 row in set (0.00 sec)
mysql> show profiles;
+----------+------------+------------------------------------------------------------------------------------------+
| Query_ID | Duration | Query
|
+----------+------------+------------------------------------------------------------------------------------------+
| 1 | 0.00035800 | select unhex(guid) from guids_bin where
guid=hex('8186ea0039c711dd98e0bd1e904e6cc66cc6') |
| 2 | 0.00027800 | select guid from guids where
guid='8186ea0039c711dd98e0bd1e904e6cc66cc6' |
+----------+------------+------------------------------------------------------------------------------------------+
Triple records:
mysql> insert into guids_bin (guid) SELECT hex(concat(guid, 'bintest'))
from repligard;
Query OK, 1481970 rows affected (1 min 39.57 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
mysql> insert into guids_bin (guid) SELECT hex(concat(guid, 'bintest'))
from repligard;
Query OK, 1481970 rows affected (4 min 14.41 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
mysql> insert into guids (guid) SELECT concat(guid, 'bintest') from
repligard;
Query OK, 1481970 rows affected (1 min 18.53 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
mysql> insert into guids (guid) SELECT concat(guid, 'bintest') from
repligard;
Query OK, 1481970 rows affected (2 min 27.13 sec)
Records: 1481970 Duplicates: 0 Warnings: 0
mysql> select guid from guids where
guid='8186ea0039c711dd98e0bd1e904e6cc66cc6';
+--------------------------------------+
| guid |
+--------------------------------------+
| 8186ea0039c711dd98e0bd1e904e6cc66cc6 |
+--------------------------------------+
1 row in set (0.00 sec)
mysql> select unhex(guid) from guids_bin where
guid=hex('8186ea0039c711dd98e0bd1e904e6cc66cc6');
+--------------------------------------+
| unhex(guid) |
+--------------------------------------+
| 8186ea0039c711dd98e0bd1e904e6cc66cc6 |
+--------------------------------------+
1 row in set (0.00 sec)
Profiles;
0.00029300 | select guid from guids where
guid='8186ea0039c711dd98e0bd1e904e6cc66cc6' |
0.00031500 | select unhex(guid) from guids_bin where
guid=hex('8186ea0039c711dd98e0bd1e904e6cc66cc6') |
Piotras
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Wed July 22 2009 17:00:07 UTCPiotr Pokora writes:
> Hi!
>
> I do not see any performance boost.
> Here are my tests:
15 guid constraints used with IN operator:
0.00070600 | select guid from guids where guid IN
('8186ea0039c711dd98e0bd1e904e6cc66cc6',
'8186eaae313611dc8d0191189c9ae004e004',
'8186fb3e975711dc9a983face84444684468',
'8186fee8bb3011dc902a4d6860a894179417',
'8187021e8b4511dcb625714486f4990d990d',
'8187142c8a9a11db8b33c5f15e14e7a2e7a2', '818718760e7711dc9b225 |
0.00077800 | select unhex(guid) from guids_bin where guid IN
(hex('8186ea0039c711dd98e0bd1e904e6cc66cc6'),
hex('8186eaae313611dc8d0191189c9ae004e004'),
hex('8186fb3e975711dc9a983face84444684468'),
hex('8186fee8bb3011dc902a4d6860a894179417'),
hex('8187021e8b4511dcb625714486f4990d990d'), hex('8187142c8a9a11db8b33c |
Every table has 4,5M records.
Should I make another tests? I am bit confused cause I do not see and
performance difference.
Piotras
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev -
Re: [midgard-dev] Mjönir: storing GUIDs/UUIDs in binary
Wed July 22 2009 17:15:06 UTCPiotr Pokora writes:
> Piotr Pokora writes:
>> Hi!
>>
>> I do not see any performance boost.
>> Here are my tests:
>
One more interesting thing:
-rw-rw---- 1 mysql mysql 8,4K 07-22 19:05 guids_bin.frm
-rw-rw---- 1 mysql mysql 385M 07-22 19:05 guids_bin.MYD
-rw-rw---- 1 mysql mysql 292M 07-22 19:06 guids_bin.MYI
-rw-rw---- 1 mysql mysql 8,4K 07-22 17:14 guids.frm
-rw-rw---- 1 mysql mysql 198M 07-22 18:06 guids.MYD
-rw-rw---- 1 mysql mysql 243M 07-22 18:12 guids.MYI
The more guids stored as binary the larger table is.
Piotras
_______________________________________________
dev mailing list
dev@lists.midgard-project.org
http://lists.midgard-project.org/mailman/listinfo/dev
