entropy/pylzma/doc/usage.txt
2009-04-02 13:45:25 +02:00


$Id: usage.txt 103 2006-01-08 18:07:53Z jojo $
This document gives some usage examples for the PyLZMA library.
First, we need to import the module::
>>> import pylzma
The easiest usage is compression and decompression in one step::
>>> compressed = pylzma.compress('Hello world!')
>>> pylzma.decompress(compressed)
'Hello world!'
For compression, additional parameters can be specified::
>>> compressed = pylzma.compress('Hello world!', dictionary=10)
>>> pylzma.decompress(compressed)
'Hello world!'
The available parameters are:
dictionary
Dictionary size (Range 0-28, Default: 23 (8MB))
The maximum dictionary size is 256 MB = 2^28 bytes.
The dictionary size is calculated as DictionarySize = 2^N bytes.
To decompress a file that was compressed by the LZMA method with a dictionary
size of D = 2^N, you need about D bytes of memory (RAM).
fastBytes
Range 5-255, default 128
A larger number usually gives a slightly better compression ratio but a slower
compression process.
literalContextBits
Range 0-8, default 3
Sometimes literalContextBits=4 gives a gain for big files.
literalPosBits
Range 0-4, default 0
This switch is intended for periodic data when the period is equal to 2^N.
For example, for 32-bit (4 bytes) periodic data you can use
literalPosBits=2. It is often better to set literalContextBits=0 if you
change the literalPosBits switch.
posBits
Range 0-4, default 2
This switch is intended for periodic data when the period is equal to 2^N.
algorithm
Compression mode 0 = fast, 1 = normal, 2 = max (Default: 2)
The lower the number specified for algorithm, the faster the compression
runs.
multithreading
Use multithreading if available? (Default yes)
Currently, multithreading is only available on Windows platforms.
matchfinder
Matchfinder algorithm to use. Possible values are bt2, bt3, bt4, bt4b,
pat2r, pat2, pat2h, pat3h, pat4h, hc3, hc4 (Default: bt4).
The compression ratio is almost the same for all bt* and pat* algorithms.
Algorithms from the hc* group don't provide a good compression ratio, but they
often work pretty fast in combination with fast mode (algorithm=0). Methods
from the bt* group require less memory than methods from the pat* group.
Usually bt4 works faster than any pat*, but for some types of files pat* can
work faster.
Memory requirements depend on the dictionary size (parameter "d" in the table
below).
===== ============ =======================================================
MF_ID Memory       Description
===== ============ =======================================================
bt2   d*9.5 + 1MB  Binary Tree with 2 bytes hashing.
bt3   d*9.5 + 65MB Binary Tree with 2-3(full) bytes hashing.
bt4   d*9.5 + 6MB  Binary Tree with 2-3-4 bytes hashing.
bt4b  d*9.5 + 34MB Binary Tree with 2-3-4(big) bytes hashing.
pat2r d*26 + 1MB   Patricia Tree with 2-bits nodes, removing.
pat2  d*38 + 1MB   Patricia Tree with 2-bits nodes.
pat2h d*38 + 77MB  Patricia Tree with 2-bits nodes, 2-3 bytes hashing.
pat3h d*62 + 85MB  Patricia Tree with 3-bits nodes, 2-3 bytes hashing.
pat4h d*110 +101MB Patricia Tree with 4-bits nodes, 2-3 bytes hashing.
hc3   d*5.5 + 1MB  Hash Chain with 2-3 bytes hashing.
hc4   d*5.5 + 6MB  Hash Chain with 2-3-4 bytes hashing.
===== ============ =======================================================
eos
Should the `End Of Stream` marker be written? (Default yes)
You can save some bytes if the marker is omitted, but the total uncompressed
size must be stored by the application and used when decompressing:
>>> compressed1 = pylzma.compress('Hello world!', eos=1)
>>> compressed2 = pylzma.compress('Hello world!', eos=0)
>>> len(compressed1) > len(compressed2)
True
>>> pylzma.decompress(compressed2)
Traceback (most recent call last):
...
ValueError: data error during decompression
>>> pylzma.decompress(compressed2, maxlength=12)
'Hello world!'
If you don't know the total uncompressed size, you can use the compatibility
decompression function from pylzma version 0.0.3. Be aware that this old
method is slower than the new decompression function, so you should use
`pylzma.decompress` whenever possible.
>>> pylzma.decompress_compat(compressed2)
'Hello world!'
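The memory figures given in the parameter list above can be turned into a
rough estimate. The following is only a sketch and not part of PyLZMA: the
names `dictionary_bytes` and `compression_memory` are made up for
illustration, the coefficients are copied from the matchfinder table, and it
assumes that "MB" in that table means 2^20 bytes.

```python
# Illustrative helpers, not part of the pylzma API.
MB = 1 << 20  # assumption: "MB" in the matchfinder table means 2**20 bytes

# matchfinder -> (multiplier for d, fixed overhead in MB), from the table
MATCHFINDER_MEMORY = {
    'bt2': (9.5, 1), 'bt3': (9.5, 65), 'bt4': (9.5, 6), 'bt4b': (9.5, 34),
    'pat2r': (26, 1), 'pat2': (38, 1), 'pat2h': (38, 77),
    'pat3h': (62, 85), 'pat4h': (110, 101),
    'hc3': (5.5, 1), 'hc4': (5.5, 6),
}


def dictionary_bytes(n):
    """Return the dictionary size in bytes for the setting dictionary=n."""
    if not 0 <= n <= 28:
        raise ValueError('dictionary must be in the range 0-28')
    return 1 << n  # DictionarySize = 2^N bytes


def compression_memory(matchfinder, n):
    """Estimate the memory (in bytes) needed to compress with these settings."""
    multiplier, overhead_mb = MATCHFINDER_MEMORY[matchfinder]
    return int(multiplier * dictionary_bytes(n)) + overhead_mb * MB


# Defaults from the parameter list: dictionary=23, matchfinder=bt4
print(dictionary_bytes(23) // MB)           # 8
print(compression_memory('bt4', 23) // MB)  # 82
```

Decompression needs much less: about `dictionary_bytes(n)` bytes, as noted
for the dictionary parameter.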
If you need to compress larger amounts of data, you should use the streaming
version of the library. It supports compressing any file-like object::
>>> from cStringIO import StringIO
>>> fp = StringIO('Hello world!')
>>> c_fp = pylzma.compressfile(fp, eos=1)
>>> compressed = ''
>>> while True:
...     tmp = c_fp.read(1)
...     if not tmp: break
...     compressed += tmp
...
>>> pylzma.decompress(compressed)
'Hello world!'
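Reading one byte at a time keeps the example above short, but it is likely to
be slow for real data. The following sketch runs the same loop with a larger
chunk size and joins the pieces once at the end. `read_all` and its
`chunk_size` default are illustrative names, not part of PyLZMA; the only
assumption is that the object has the `read(size)` method used above. An
in-memory file stands in for the compressed stream so that the sketch runs
without PyLZMA.

```python
import io


def read_all(fileobj, chunk_size=64 * 1024):
    """Read fileobj until it is exhausted and return everything as bytes."""
    chunks = []
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break  # an empty result signals the end of the stream
        chunks.append(chunk)
    return b''.join(chunks)


# Stand-in demonstration: any object with read(size) works the same way.
data = read_all(io.BytesIO(b'Hello world!'), chunk_size=5)
```

With PyLZMA, the call would look like
`read_all(pylzma.compressfile(fp, eos=1))`.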
Using a similar technique, you can decompress large amounts of data without
keeping everything in memory::
>>> from cStringIO import StringIO
>>> fp = StringIO(pylzma.compress('Hello world!'))
>>> obj = pylzma.decompressobj()
>>> plain = ''
>>> while True:
...     tmp = fp.read(1)
...     if not tmp: break
...     plain += obj.decompress(tmp)
...
>>> plain += obj.flush()
>>> plain
'Hello world!'
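The example above still collects the whole result in `plain`. To avoid
holding the output in memory as well, each piece can be written to a
destination file as soon as it is produced. The sketch below assumes only
that the decompressor has the `decompress()` and `flush()` methods shown
above; `decompress_stream` is an illustrative name, and
`zlib.decompressobj()` is used as a stand-in with the same two methods so
that the sketch runs without PyLZMA.

```python
import io
import zlib


def decompress_stream(src, dst, decompressor, chunk_size=64 * 1024):
    """Feed src to decompressor chunk by chunk and write the output to dst."""
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break
        dst.write(decompressor.decompress(chunk))
    # Write whatever the decompressor still holds back.
    dst.write(decompressor.flush())


# Stand-in demonstration with zlib instead of pylzma.
src = io.BytesIO(zlib.compress(b'Hello world!'))
dst = io.BytesIO()
decompress_stream(src, dst, zlib.decompressobj(), chunk_size=4)
```

With PyLZMA, the third argument would be `pylzma.decompressobj()`.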
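However, this only works for streams that contain the `End Of Stream` marker.
If the EOS marker is not included, you must provide the exact size of the
decompressed data. In the following example, a wrong size (13) raises an
error, while the correct size (12) works::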
>>> from cStringIO import StringIO
>>> fp = StringIO(pylzma.compress('Hello world!', eos=0))
>>> obj = pylzma.decompressobj(maxlength=13)
>>> plain = ''
>>> while True:
...     tmp = fp.read(1)
...     if not tmp: break
...     plain += obj.decompress(tmp)
...
>>> plain += obj.flush()
Traceback (most recent call last):
...
ValueError: data error during decompression
>>> obj.reset(maxlength=12)
>>> fp.seek(0)
>>> plain = ''
>>> while True:
...     tmp = fp.read(1)
...     if not tmp: break
...     plain += obj.decompress(tmp)
...
>>> plain += obj.flush()
>>> plain
'Hello world!'
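Since the size has to be stored by the application, one simple possibility
is to put it in front of the compressed data. This is only a sketch of one
such scheme: the 8-byte little-endian prefix and the names `pack_with_size`
and `unpack_with_size` are hypothetical choices, not something PyLZMA
defines.

```python
import struct

SIZE_FORMAT = '<Q'  # hypothetical choice: 8-byte little-endian size prefix
SIZE_LENGTH = struct.calcsize(SIZE_FORMAT)


def pack_with_size(compressed, uncompressed_size):
    """Prefix the compressed data with the size of the original data."""
    return struct.pack(SIZE_FORMAT, uncompressed_size) + compressed


def unpack_with_size(blob):
    """Split a blob from pack_with_size into (uncompressed_size, compressed)."""
    if len(blob) < SIZE_LENGTH:
        raise ValueError('blob is too short to contain a size prefix')
    (uncompressed_size,) = struct.unpack(SIZE_FORMAT, blob[:SIZE_LENGTH])
    return uncompressed_size, blob[SIZE_LENGTH:]


# Placeholder bytes stand in for the result of pylzma.compress(..., eos=0).
blob = pack_with_size(b'<compressed bytes>', 12)
size, payload = unpack_with_size(blob)
```

The recovered `size` is the value to pass as `maxlength` to
`pylzma.decompress` or `pylzma.decompressobj`.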
Please note that the compressed data is not compatible with the lzma.exe
command line utility! To get compatible data, you can use the following
utility function::
>>> import struct
>>> from cStringIO import StringIO
>>> def compress_compatible(data):
...     c = pylzma.compressfile(StringIO(data))
...     # LZMA header
...     result = c.read(5)
...     # size of uncompressed data
...     result += struct.pack('<Q', len(data))
...     # compressed data
...     return result + c.read()
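
The layout written by `compress_compatible` can be checked by unpacking the
first 13 bytes again. The sketch below relies on the LZMA SDK's .lzma file
format rather than on anything stated in this document: it assumes that the
5-byte header consists of one properties byte, encoded as
(posBits * 5 + literalPosBits) * 9 + literalContextBits, followed by the
dictionary size as a 4-byte little-endian integer. `parse_compatible_header`
is an illustrative name, not part of PyLZMA.

```python
import struct


def parse_compatible_header(blob):
    """Decode the 13 bytes that precede the compressed data."""
    if len(blob) < 13:
        raise ValueError('need at least 13 bytes of header')
    # 1 properties byte, 4-byte dictionary size, 8-byte uncompressed size
    props, dictionary_size, size = struct.unpack('<BIQ', blob[:13])
    if props >= 9 * 5 * 5:
        raise ValueError('invalid properties byte')
    return {
        'literalContextBits': props % 9,
        'literalPosBits': (props // 9) % 5,
        'posBits': props // 45,
        'dictionary_size': dictionary_size,
        'uncompressed_size': size,
    }


# Header built by hand for the default settings listed earlier
# (literalContextBits=3, literalPosBits=0, posBits=2, dictionary=23)
# and a 12-byte input such as 'Hello world!'.
sample = struct.pack('<BIQ', (2 * 5 + 0) * 9 + 3, 1 << 23, 12)
info = parse_compatible_header(sample)
```

Here `info['dictionary_size']` is 8388608 (2^23) and
`info['uncompressed_size']` is 12.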