205 lines
6.3 KiB
Plaintext
205 lines
6.3 KiB
Plaintext
$Id: usage.txt 103 2006-01-08 18:07:53Z jojo $
|
|
|
|
In this document, some samples of the PyLZMA library will be given.
|
|
|
|
|
|
First, we need to import the module::
|
|
|
|
>>> import pylzma
|
|
|
|
|
|
The easiest usage is compression and decompression in one step::
|
|
|
|
>>> compressed = pylzma.compress('Hello world!')
|
|
>>> pylzma.decompress(compressed)
|
|
'Hello world!'
|
|
|
|
|
|
For compression, additional parameters can be specified::
|
|
|
|
>>> compressed = pylzma.compress('Hello world!', dictionary=10)
|
|
>>> pylzma.decompress(compressed)
|
|
'Hello world!'
|
|
|
|
|
|
Other available parameters are:
|
|
|
|
dictionary
|
|
Dictionary size (Range 0-28, Default: 23 (8MB))
|
|
|
|
The maximum value for dictionary size is 256 MB = 2^28 bytes.
|
|
Dictionary size is calculated as DictionarySize = 2^N bytes.
|
|
For decompressing file compressed by LZMA method with dictionary
|
|
size D = 2^N you need about D bytes of memory (RAM).
|
|
|
|
fastBytes
|
|
Range 5-255, default 128
|
|
|
|
Usually big number gives a little bit better compression ratio and slower
|
|
compression process.
|
|
|
|
literalContextBits
|
|
Range 0-8, default 3
|
|
|
|
Sometimes literalContextBits=4 gives gain for big files.
|
|
|
|
literalPosBits
|
|
Range 0-4, default 0
|
|
|
|
This switch is intended for periodical data when period is equal 2^N.
|
|
For example, for 32-bit (4 bytes) periodical data you can use
|
|
literalPosBits=2. Often it's better to set literalContextBits=0, if you
|
|
change the literalPosBits switch.
|
|
|
|
posBits
|
|
Range 0-4, default 2
|
|
|
|
This switch is intended for periodical data when period is equal 2^N.
|
|
|
|
algorithm
|
|
Compression mode 0 = fast, 1 = normal, 2 = max (Default: 2)
|
|
|
|
The lower the number specified for algorithm, the faster compression will
|
|
perform.
|
|
|
|
multithreading
|
|
Use multithreading if available? (Default yes)
|
|
|
|
Currently, multithreading is only available on Windows platforms.
|
|
|
|
matchfinder
|
|
Matchfinder algorithm to use. Possible values are bt2, bt3, bt4, bt4b,
|
|
pat2r, pat2, pat2h, pat3h, pat4h, hc3, hc4 (Default: bt4).
|
|
|
|
Compression ratio for all bt* and pat* almost the same. Algorithms from hc*
|
|
group don't provide good compression ratio, but they often work pretty fast
|
|
in combination with fast mode (algorithm=0). Methods from bt* group require
|
|
less memory than methods from pat* group. Usually bt4 works faster than
|
|
any pat*, but for some types of files pat* can work faster.
|
|
|
|
Memory requirements depend from dictionary size (parameter "d" in table below).
|
|
|
|
===== ============ =======================================================
|
|
MF_ID Memory Description
|
|
===== ============ =======================================================
|
|
bt2 d*9.5 + 1MB Binary Tree with 2 bytes hashing.
|
|
bt3 d*9.5 + 65MB Binary Tree with 2-3(full) bytes hashing.
|
|
bt4 d*9.5 + 6MB Binary Tree with 2-3-4 bytes hashing.
|
|
bt4b d*9.5 + 34MB Binary Tree with 2-3-4(big) bytes hashing.
|
|
pat2r d*26 + 1MB Patricia Tree with 2-bits nodes, removing.
|
|
pat2 d*38 + 1MB Patricia Tree with 2-bits nodes.
|
|
pat2h d*38 + 77MB Patricia Tree with 2-bits nodes, 2-3 bytes hashing.
|
|
pat3h d*62 + 85MB Patricia Tree with 3-bits nodes, 2-3 bytes hashing.
|
|
pat4h d*110 +101MB Patricia Tree with 4-bits nodes, 2-3 bytes hashing.
|
|
hc3 d*5.5 + 1MB Hash Chain with 2-3 bytes hashing.
|
|
hc4 d*5.5 + 6MB Hash Chain with 2-3-4 bytes hashing.
|
|
===== ============ =======================================================
|
|
|
|
eos
|
|
Should the `End Of Stream` marker be written? (Default yes)
|
|
You can save some bytes if the marker is omitted, but the total uncompressed
|
|
size must be stored by the application and used when decompressing:
|
|
|
|
>>> compressed1 = pylzma.compress('Hello world!', eos=1)
|
|
>>> compressed2 = pylzma.compress('Hello world!', eos=0)
|
|
>>> len(compressed1) > len(compressed2)
|
|
True
|
|
|
|
>>> pylzma.decompress(compressed2)
|
|
Traceback (most recent call last):
|
|
...
|
|
ValueError: data error during decompression
|
|
|
|
>>> pylzma.decompress(compressed2, maxlength=12)
|
|
'Hello world!'
|
|
|
|
If you don't know the total uncompressed size, you can use the compatibility
|
|
decompression function from pylzma version 0.0.3. Be aware that this old
|
|
method is slower than the new decompression function, so you should use
|
|
`pylzma.decompress` whenever possible.
|
|
|
|
>>> pylzma.decompress_compat(compressed2)
|
|
'Hello world!'
|
|
|
|
|
|
If you need to compress larger amounts of data, you should use the streaming
|
|
version of the library. If supports compressing any file-like objects::
|
|
|
|
>>> from cStringIO import StringIO
|
|
>>> fp = StringIO('Hello world!')
|
|
>>> c_fp = pylzma.compressfile(fp, eos=1)
|
|
>>> compressed = ''
|
|
>>> while True:
|
|
... tmp = c_fp.read(1)
|
|
... if not tmp: break
|
|
... compressed += tmp
|
|
...
|
|
>>> pylzma.decompress(compressed)
|
|
'Hello world!'
|
|
|
|
|
|
Using a similar technique, you can decompress large amounts of data without
|
|
keeping everything in memory::
|
|
|
|
>>> from cStringIO import StringIO
|
|
>>> fp = StringIO(pylzma.compress('Hello world!'))
|
|
>>> obj = pylzma.decompressobj()
|
|
>>> plain = ''
|
|
>>> while True:
|
|
... tmp = fp.read(1)
|
|
... if not tmp: break
|
|
... plain += obj.decompress(tmp)
|
|
...
|
|
>>> plain += obj.flush()
|
|
>>> plain
|
|
'Hello world!'
|
|
|
|
|
|
However this only works for streams that contain the `End Of Stream` marker.
|
|
You must provide the size of the decompressed data if you don't include the
|
|
EOS marker::
|
|
|
|
>>> from cStringIO import StringIO
|
|
>>> fp = StringIO(pylzma.compress('Hello world!', eos=0))
|
|
>>> obj = pylzma.decompressobj(maxlength=13)
|
|
>>> plain = ''
|
|
>>> while True:
|
|
... tmp = fp.read(1)
|
|
... if not tmp: break
|
|
... plain += obj.decompress(tmp)
|
|
...
|
|
>>> plain += obj.flush()
|
|
Traceback (most recent call last):
|
|
...
|
|
ValueError: data error during decompression
|
|
|
|
>>> obj.reset(maxlength=12)
|
|
>>> fp.seek(0)
|
|
>>> plain = ''
|
|
>>> while True:
|
|
... tmp = fp.read(1)
|
|
... if not tmp: break
|
|
... plain += obj.decompress(tmp)
|
|
...
|
|
>>> plain += obj.flush()
|
|
>>> plain
|
|
'Hello world!'
|
|
|
|
|
|
Please note that the compressed data is not compatible to the lzma.exe command
|
|
line utility! To get compatible data, you can use the following utility
|
|
function::
|
|
|
|
>>> import struct
|
|
>>> from cStringIO import StringIO
|
|
|
|
>>> def compress_compatible(data):
|
|
... c = pylzma.compressfile(StringIO(data))
|
|
... # LZMA header
|
|
... result = c.read(5)
|
|
... # size of uncompressed data
|
|
... result += struct.pack('<Q', len(data))
|
|
... # compressed data
|
|
... return result + c.read()
|
|
|