3.1 KiB
% LZMA2 format
The LZMA2 format supports flushing, parallel encoding or decoding. Chunks of data that cannot be compressed are copied as such.
Dictionary Size
LZMA2 requires information about the size of the dictionary. This is provided by a single byte.
Bits | Mask | Description |
---|---|---|
0-5 | 0x3F | Dictionary Size |
6-7 | 0xC0 | Reserved for future use; Must be zero |
The dictionary size is encoded with a one-bit mantissa and five-bit exponent. The smallest dictionary size is 4 KiB and the biggest is 4 GiB
- 1 B.
Raw Value | Mantissa | Exponent | Dictionary size |
---|---|---|---|
0 | 2 | 11 | 4 KiB |
1 | 3 | 11 | 6 KiB |
2 | 2 | 12 | 8 KiB |
3 | 3 | 12 | 12 KiB |
... | ... | ... | ... |
36 | 2 | 29 | 1024 MiB |
37 | 3 | 29 | 1536 MiB |
38 | 2 | 30 | 2048 MiB |
39 | 3 | 30 | 3072 MiB |
40 | 2 | 31 | 4096 MiB - 1B |
For test purposes we add the dictionary size byte as first byte of an LZMA2 stream.
Chunks
An LZMA2 stream is a sequence of chunks. Each chunk is preceded by a control byte and other information.
Following the C implementation in the LZMA SDK the control byte can be described as such:
Chunk header | Description |
---|---|
00000000 |
End of LZMA2 stream |
00000001 U U |
Uncompressed chunk, reset dictionary |
00000010 U U |
Uncompressed chunk, no reset of dictionary |
100uuuuu U U C C |
LZMA, no reset |
101uuuuu U U C C |
LZMA, reset state |
110uuuuu U U C C S |
LZMA, reset state, new properties |
111uuuuu U U C C S |
LZMA, reset state, new properties, reset dictionary |
The symbols used are described by following table.
Symbol | Description |
---|---|
u | uncompressed size bit |
U | uncompressed size byte |
C | uncompressed size byte |
S | properties byte |
A dictionary reset requires always new properties. If this is an uncompressed chunk the properties need to be provided in the next compressed chunk. New properties require a reset of the state.
A dictionary reset puts the current position to zero. Uncompressed data is written into the dictionary.
The uncompressed size and compressed size are given in big-endian byte order. The values need to be incremented for the actual size. So a chunk with 1 byte uncompressed data will store size 0 in the uncompressed bits and bytes.
The properties byte provides the parameters pb, lc, lp using following formula:
S = (pb * 5 + lp) * 9 + lc
This is same encoding used for LZMA. For LZMA2 following condition has been introduced:
lc + lp <= 4.
The parameters are defined as follows:
Name | Range | Description |
---|---|---|
lc | [0,8] | number of literal context bits |
lp | [0,4] | number of literal pos bits |
pb | [0,4] | the number of pos bits |