$Id: usage.txt 103 2006-01-08 18:07:53Z jojo $ In this document, some samples of the PyLZMA library will be given. First, we need to import the module:: >>> import pylzma The easiest usage is compression and decompression in one step:: >>> compressed = pylzma.compress('Hello world!') >>> pylzma.decompress(compressed) 'Hello world!' For compression, additional parameters can be specified:: >>> compressed = pylzma.compress('Hello world!', dictionary=10) >>> pylzma.decompress(compressed) 'Hello world!' Other available parameters are: dictionary Dictionary size (Range 0-28, Default: 23 (8MB)) The maximum value for dictionary size is 256 MB = 2^28 bytes. Dictionary size is calculated as DictionarySize = 2^N bytes. For decompressing file compressed by LZMA method with dictionary size D = 2^N you need about D bytes of memory (RAM). fastBytes Range 5-255, default 128 Usually big number gives a little bit better compression ratio and slower compression process. literalContextBits Range 0-8, default 3 Sometimes literalContextBits=4 gives gain for big files. literalPosBits Range 0-4, default 0 This switch is intended for periodical data when period is equal 2^N. For example, for 32-bit (4 bytes) periodical data you can use literalPosBits=2. Often it's better to set literalContextBits=0, if you change the literalPosBits switch. posBits Range 0-4, default 2 This switch is intended for periodical data when period is equal 2^N. algorithm Compression mode 0 = fast, 1 = normal, 2 = max (Default: 2) The lower the number specified for algorithm, the faster compression will perform. multithreading Use multithreading if available? (Default yes) Currently, multithreading is only available on Windows platforms. matchfinder Matchfinder algorithm to use. Possible values are bt2, bt3, bt4, bt4b, pat2r, pat2, pat2h, pat3h, pat4h, hc3, hc4 (Default: bt4). Compression ratio for all bt* and pat* almost the same. Algorithms from hc* group don't provide good compression ratio, but they often work pretty fast in combination with fast mode (algorithm=0). Methods from bt* group require less memory than methods from pat* group. Usually bt4 works faster than any pat*, but for some types of files pat* can work faster. Memory requirements depend from dictionary size (parameter "d" in table below). ===== ============ ======================================================= MF_ID Memory Description ===== ============ ======================================================= bt2 d*9.5 + 1MB Binary Tree with 2 bytes hashing. bt3 d*9.5 + 65MB Binary Tree with 2-3(full) bytes hashing. bt4 d*9.5 + 6MB Binary Tree with 2-3-4 bytes hashing. bt4b d*9.5 + 34MB Binary Tree with 2-3-4(big) bytes hashing. pat2r d*26 + 1MB Patricia Tree with 2-bits nodes, removing. pat2 d*38 + 1MB Patricia Tree with 2-bits nodes. pat2h d*38 + 77MB Patricia Tree with 2-bits nodes, 2-3 bytes hashing. pat3h d*62 + 85MB Patricia Tree with 3-bits nodes, 2-3 bytes hashing. pat4h d*110 +101MB Patricia Tree with 4-bits nodes, 2-3 bytes hashing. hc3 d*5.5 + 1MB Hash Chain with 2-3 bytes hashing. hc4 d*5.5 + 6MB Hash Chain with 2-3-4 bytes hashing. ===== ============ ======================================================= eos Should the `End Of Stream` marker be written? (Default yes) You can save some bytes if the marker is omitted, but the total uncompressed size must be stored by the application and used when decompressing: >>> compressed1 = pylzma.compress('Hello world!', eos=1) >>> compressed2 = pylzma.compress('Hello world!', eos=0) >>> len(compressed1) > len(compressed2) True >>> pylzma.decompress(compressed2) Traceback (most recent call last): ... ValueError: data error during decompression >>> pylzma.decompress(compressed2, maxlength=12) 'Hello world!' If you don't know the total uncompressed size, you can use the compatibility decompression function from pylzma version 0.0.3. Be aware that this old method is slower than the new decompression function, so you should use `pylzma.decompress` whenever possible. >>> pylzma.decompress_compat(compressed2) 'Hello world!' If you need to compress larger amounts of data, you should use the streaming version of the library. If supports compressing any file-like objects:: >>> from cStringIO import StringIO >>> fp = StringIO('Hello world!') >>> c_fp = pylzma.compressfile(fp, eos=1) >>> compressed = '' >>> while True: ... tmp = c_fp.read(1) ... if not tmp: break ... compressed += tmp ... >>> pylzma.decompress(compressed) 'Hello world!' Using a similar technique, you can decompress large amounts of data without keeping everything in memory:: >>> from cStringIO import StringIO >>> fp = StringIO(pylzma.compress('Hello world!')) >>> obj = pylzma.decompressobj() >>> plain = '' >>> while True: ... tmp = fp.read(1) ... if not tmp: break ... plain += obj.decompress(tmp) ... >>> plain += obj.flush() >>> plain 'Hello world!' However this only works for streams that contain the `End Of Stream` marker. You must provide the size of the decompressed data if you don't include the EOS marker:: >>> from cStringIO import StringIO >>> fp = StringIO(pylzma.compress('Hello world!', eos=0)) >>> obj = pylzma.decompressobj(maxlength=13) >>> plain = '' >>> while True: ... tmp = fp.read(1) ... if not tmp: break ... plain += obj.decompress(tmp) ... >>> plain += obj.flush() Traceback (most recent call last): ... ValueError: data error during decompression >>> obj.reset(maxlength=12) >>> fp.seek(0) >>> plain = '' >>> while True: ... tmp = fp.read(1) ... if not tmp: break ... plain += obj.decompress(tmp) ... >>> plain += obj.flush() >>> plain 'Hello world!' Please note that the compressed data is not compatible to the lzma.exe command line utility! To get compatible data, you can use the following utility function:: >>> import struct >>> from cStringIO import StringIO >>> def compress_compatible(data): ... c = pylzma.compressfile(StringIO(data)) ... # LZMA header ... result = c.read(5) ... # size of uncompressed data ... result += struct.pack('