This is a follow-up to my last post about data compression. After encoding my numerical data in a compact CSV format, I apply data compression before storing it on disk. I did a quick study of the two algorithms available in the standard Python library, gzip and bzip2. The results are shown below. The original message is 537,776 bytes.
Gzip compression results
Compression level | Compressed size (bytes) | Compression time | Decompression time |
9 | 183,019 | 179 ms | 5.51 ms |
6 | 184,532 | 125 ms | 5.48 ms |
3 | 203,105 | 38.2 ms | 5.54 ms |
Bzip2 compression results
Compression level | Compressed size (bytes) | Compression time | Decompression time |
9 | 152,283 | 84.3 ms | 29 ms |
6 | 152,283 | 84.9 ms | 29 ms |
3 | 157,065 | 80.6 ms | 26.9 ms |
1 | 166,949 | 79.8 ms | 26.7 ms |
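A measurement like the one in the tables above could be scripted roughly as follows. This is only a minimal sketch, not the exact script behind the numbers in this post: the helper names (`make_sample_csv`, `benchmark`, `run_study`) are made up for illustration, the sample data is a synthetic stand-in for my real data set, and it assumes Python 3.2 or later, where `gzip.compress` and `gzip.decompress` are available.

```python
import bz2
import gzip
import time

# Both modules expose compress(data, compresslevel) and decompress(data).
CODECS = {
    "gzip": (gzip.compress, gzip.decompress),
    "bz2": (bz2.compress, bz2.decompress),
}


def make_sample_csv(rows=2000):
    # Hypothetical stand-in for the real data, which is not shown in the post.
    lines = ["%d,%.2f,%d" % (i, i * 0.37 % 100, i * 7919 % 1000) for i in range(rows)]
    return "\n".join(lines).encode("ascii")


def benchmark(compress, decompress, data, level, repeat=5):
    """Return (compressed size, best compress seconds, best decompress seconds)."""
    compressed = compress(data, level)
    compress_times = []
    decompress_times = []
    for _ in range(repeat):
        start = time.perf_counter()
        compress(data, level)
        compress_times.append(time.perf_counter() - start)

        start = time.perf_counter()
        restored = decompress(compressed)
        decompress_times.append(time.perf_counter() - start)
    # Sanity check: the round trip must give back the original bytes.
    assert restored == data
    return len(compressed), min(compress_times), min(decompress_times)


def run_study(data, levels=(9, 6, 3, 1)):
    """Benchmark every codec at every level; keys are (codec name, level)."""
    results = {}
    for name, (compress, decompress) in CODECS.items():
        for level in levels:
            results[(name, level)] = benchmark(compress, decompress, data, level)
    return results


if __name__ == "__main__":
    data = make_sample_csv()
    print("original size:", len(data), "bytes")
    for (name, level), (size, c, d) in run_study(data).items():
        print(f"{name} level {level}: {size} bytes, "
              f"compress {c * 1000:.2f} ms, decompress {d * 1000:.2f} ms")
```

Taking the best of several runs reduces noise from other processes. Absolute timings will differ from the tables depending on the machine, the Python version and the data.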
Surprisingly, bzip2 compresses faster than gzip at level 9 (84.3 ms versus 179 ms). Unfortunately, compression speed is the least important factor for me. Compression ratio and decompression speed are far more important: compression is done only once, but fetching and decompressing the data will be done many times. At level 9, the bzip2 output is about 17% smaller than the gzip output (152,283 versus 183,019 bytes), but it takes roughly five times longer to decompress (29 ms versus 5.51 ms). It is hard for me to choose between the better compression ratio of bzip2 and the faster decompression of gzip. For now I think I will stick with gzip.
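To make the trade-off concrete, the level 9 rows of the two tables can be turned into compression ratios with a few lines of Python. The sizes and times are copied from the tables; the `summarize` helper is just an illustrative name.

```python
ORIGINAL_SIZE = 537_776  # bytes, size of the original message

# Level 9 results copied from the tables:
# (compressed size in bytes, decompression time in ms)
LEVEL_9 = {
    "gzip": (183_019, 5.51),
    "bzip2": (152_283, 29.0),
}


def summarize(original_size, results):
    """Compute compression ratio and space saved for each codec."""
    summary = {}
    for name, (size, decompress_ms) in results.items():
        summary[name] = {
            "ratio": original_size / size,
            "saved_pct": 100 * (1 - size / original_size),
            "decompress_ms": decompress_ms,
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(ORIGINAL_SIZE, LEVEL_9).items():
        print(f"{name}: ratio {row['ratio']:.2f}, "
              f"saved {row['saved_pct']:.1f}%, "
              f"decompress {row['decompress_ms']} ms")
```

Both codecs shrink the data to roughly a third of its original size, so the extra space saved by bzip2 has to be weighed against paying the slower decompression on every read.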
2010.04.21