% LZMA2 format

The LZMA2 format supports flushing as well as parallel encoding and decoding. Chunks of data that cannot be compressed are stored uncompressed.

Dictionary Size

LZMA2 requires information about the size of the dictionary. This is provided by a single byte.

| Bits | Mask | Description                           |
|:-----|:-----|:--------------------------------------|
| 0-5  | 0x3F | Dictionary Size                       |
| 6-7  | 0xC0 | Reserved for future use; must be zero |

The dictionary size is encoded with a one-bit mantissa and a five-bit exponent. The smallest dictionary size is 4 KiB and the largest is 4 GiB - 1 B.

| Raw Value | Mantissa | Exponent | Dictionary size |
|:----------|:---------|:---------|:----------------|
| 0         | 2        | 11       | 4 KiB           |
| 1         | 3        | 11       | 6 KiB           |
| 2         | 2        | 12       | 8 KiB           |
| 3         | 3        | 12       | 12 KiB          |
| ...       | ...      | ...      | ...             |
| 36        | 2        | 29       | 1024 MiB        |
| 37        | 3        | 29       | 1536 MiB        |
| 38        | 2        | 30       | 2048 MiB        |
| 39        | 3        | 30       | 3072 MiB        |
| 40        | 2        | 31       | 4096 MiB - 1 B  |
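
The table above can be turned into a small decoding routine. The following is a minimal sketch, not the library's actual code: the function name `decodeDictSize` and the error messages are hypothetical, and rejecting raw values above 40 is an assumption based on the table ending at 40.

```go
package main

import "errors"

// decodeDictSize converts the dictionary size byte into a size in bytes.
// The name and the error texts are illustrative only.
func decodeDictSize(b byte) (uint64, error) {
	// Bits 6-7 are reserved and must be zero.
	if b&0xC0 != 0 {
		return 0, errors.New("lzma2: reserved bits in dictionary size byte are set")
	}
	raw := uint64(b & 0x3F)
	// Assumption: values beyond the last table row are invalid.
	if raw > 40 {
		return 0, errors.New("lzma2: dictionary size value out of range")
	}
	if raw == 40 {
		// Largest size: 4 GiB - 1 B.
		return 1<<32 - 1, nil
	}
	mantissa := 2 | (raw & 1) // 2 for even raw values, 3 for odd ones
	exponent := raw/2 + 11    // two raw values share each exponent
	return mantissa << exponent, nil
}
```

With this sketch, raw value 0 yields 4096 (4 KiB) and raw value 3 yields 12288 (12 KiB), matching the table rows.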

For test purposes we add the dictionary size byte as the first byte of an LZMA2 stream.

Chunks

An LZMA2 stream is a sequence of chunks. Each chunk is preceded by a control byte and other information.

Following the C implementation in the LZMA SDK, the control byte can be described as follows:

| Chunk header         | Description                                            |
|:---------------------|:-------------------------------------------------------|
| `00000000`           | End of LZMA2 stream                                    |
| `00000001 U U`       | Uncompressed chunk, reset dictionary                   |
| `00000010 U U`       | Uncompressed chunk, no reset of dictionary             |
| `100uuuuu U U C C`   | LZMA, no reset                                         |
| `101uuuuu U U C C`   | LZMA, reset state                                      |
| `110uuuuu U U C C S` | LZMA, reset state, new properties                      |
| `111uuuuu U U C C S` | LZMA, reset state, new properties, reset dictionary    |

The symbols used are described in the following table.

| Symbol | Description            |
|:-------|:-----------------------|
| u      | uncompressed size bit  |
| U      | uncompressed size byte |
| C      | compressed size byte   |
| S      | properties byte        |

A dictionary reset always requires new properties. If the reset happens in an uncompressed chunk, the properties must be provided in the next compressed chunk. New properties require a reset of the state.

A dictionary reset sets the current position to zero. The data of uncompressed chunks is written into the dictionary as well.
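
The control byte rules can be sketched as a classifier that reports which resets a chunk requests. This is an illustration under assumptions: the type `chunkInfo` and the function `classifyControl` are hypothetical names, and treating control bytes 0x03 through 0x7F as invalid is inferred from their absence in the chunk header table.

```go
package main

import "fmt"

// chunkInfo summarizes what a control byte requests (hypothetical type).
type chunkInfo struct {
	EOS        bool // end of the LZMA2 stream
	Compressed bool // LZMA chunk rather than an uncompressed one
	ResetState bool
	NewProps   bool // a properties byte S follows the sizes
	ResetDict  bool
}

// classifyControl interprets the control byte of a chunk header.
func classifyControl(c byte) (chunkInfo, error) {
	switch {
	case c == 0x00:
		return chunkInfo{EOS: true}, nil
	case c == 0x01:
		return chunkInfo{ResetDict: true}, nil
	case c == 0x02:
		return chunkInfo{}, nil
	case c&0x80 == 0:
		// Assumption: values not listed in the table are errors.
		return chunkInfo{}, fmt.Errorf("lzma2: invalid control byte %#02x", c)
	}
	// Bits 5-6 select the reset level of a compressed chunk.
	mode := (c >> 5) & 0x3
	return chunkInfo{
		Compressed: true,
		ResetState: mode >= 1,
		NewProps:   mode >= 2,
		ResetDict:  mode == 3,
	}, nil
}
```

Because each reset level includes the previous ones, a single two-bit comparison is enough; a decoder could additionally track whether properties are still pending after an uncompressed chunk that reset the dictionary.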

The uncompressed size and compressed size are given in big-endian byte order. The stored values are one less than the actual sizes, so they must be incremented by one when read. For example, a chunk with 1 byte of uncompressed data stores the value 0 in the uncompressed size bits and bytes.
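
As a sketch of the size encoding, the two helpers below read the sizes from the header bytes of a compressed and an uncompressed chunk. The function names are hypothetical. In a compressed chunk the five `u` bits of the control byte form the most significant bits of the uncompressed size, which follows from the big-endian order.

```go
package main

import "errors"

// compressedChunkSizes reads the sizes from a header of the form
// 1xxuuuuu U U C C. Both stored values are one less than the actual size.
func compressedChunkSizes(hdr []byte) (unpacked, packed int, err error) {
	if len(hdr) < 5 || hdr[0]&0x80 == 0 {
		return 0, 0, errors.New("lzma2: not a compressed chunk header")
	}
	unpacked = int(hdr[0]&0x1F)<<16 | int(hdr[1])<<8 | int(hdr[2])
	packed = int(hdr[3])<<8 | int(hdr[4])
	return unpacked + 1, packed + 1, nil
}

// uncompressedChunkSize reads the size from a header of the form
// 0000000x U U, where the control byte is 0x01 or 0x02.
func uncompressedChunkSize(hdr []byte) (int, error) {
	if len(hdr) < 3 || (hdr[0] != 0x01 && hdr[0] != 0x02) {
		return 0, errors.New("lzma2: not an uncompressed chunk header")
	}
	return (int(hdr[1])<<8 | int(hdr[2])) + 1, nil
}
```

It follows from the field widths that a compressed chunk holds at most 2 MiB of uncompressed data in at most 64 KiB of compressed data, and an uncompressed chunk holds at most 64 KiB.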

The properties byte provides the parameters pb, lc and lp using the following formula:

S = (pb * 5 + lp) * 9 + lc

This is the same encoding that is used for LZMA. For LZMA2 the following condition has been introduced:

lc + lp <= 4.

The parameters are defined as follows:

| Name | Range | Description                    |
|:-----|:------|:-------------------------------|
| lc   | [0,8] | number of literal context bits |
| lp   | [0,4] | number of literal pos bits     |
| pb   | [0,4] | number of pos bits             |
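
The formula for S can be inverted with integer division and remainders. The following sketch shows both directions and checks the LZMA2 condition on lc and lp; the type `props` and the function names are hypothetical and not taken from the library.

```go
package main

import "errors"

// props holds the three LZMA parameters (hypothetical type).
type props struct{ LC, LP, PB int }

// encodeProps computes S = (pb * 5 + lp) * 9 + lc.
func encodeProps(p props) byte {
	return byte((p.PB*5+p.LP)*9 + p.LC)
}

// decodeProps inverts the formula and applies the LZMA2 restriction.
func decodeProps(s byte) (props, error) {
	// The largest valid value is (4*5+4)*9+8 = 224.
	if s >= 9*5*5 {
		return props{}, errors.New("lzma2: properties byte out of range")
	}
	v := int(s)
	p := props{LC: v % 9, LP: (v / 9) % 5, PB: v / 45}
	if p.LC+p.LP > 4 {
		return props{}, errors.New("lzma2: lc + lp must not exceed 4")
	}
	return p, nil
}
```

For example, the parameters lc = 3, lp = 0, pb = 2 give S = (2 * 5 + 0) * 9 + 3 = 93, or 0x5D, and decoding 0x5D returns the same three values.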