ddr_lzo(1) | LZO de/compression plugin for dd_rescue | ddr_lzo(1) |
NAME
ddr_lzo - Data de/compression plugin for dd_rescue
SYNOPSIS
-L lzo[=option[:option[:...]]]
or
-L /path/to/libddr_lzo.so[=option[:option[:...]]]
DESCRIPTION
About
LZO is an algorithm that de/compresses data. It is tuned for speed (especially decompression speed) and trades the size of the compressed file for it to some degree. There are variants with slow compression (yet still very fast decompression) available though. See the algorithm parameter below.
This plugin has been written for dd_rescue and uses the plugin interface from it. See the dd_rescue(1) man page for more information on dd_rescue.
OPTIONS
Options are passed using dd_rescue option passing syntax: The name of the plugin (lzo) is optionally followed by an equal sign (=) and options are separated by a colon (:). the lzo plugin also allows for most options to be abbreviated to five or six letters. See the EXAMPLES section below.
Compression or decompression
The lzo dd_rescue plugin (subsequently referred to as just ddr_lzo
which reflects the variable parts of the filename libddr_lzo.so) choses
compression or decompression mode automatically if one of the input/output
files has an [lt]zo suffix; otherwise you may specify compr[ess] or
decom[press] parameters on the command line.
The parameter opt[imize] will tell ddr_lzo to do an optimization pass
after compression. This might speed up decompression by a few percent when
creating compressed data with high compression levels and large block
sizes.
The plugin also supports the parameter bench[mark] ; if it's specified, it will output some information about CPU usage and resulting compression or decompression bandwidth. (For small files, the numbers become meaningless due to jitter and limited time resolution -- ddr_lzo will skip the output if the numbers are very tiny.)
De/compression algorithm
The lzo plugin supports a number of the (de)compression algorithms
from liblzo2. You can specify which one you want to use by passing
algo=XXX , where XXX can be lzo1x_1, lzo1x_1_15, lzo1x_999,
lzo1x_1_11, lzo1x_1_12, lzo1y_1, lzo1y_999, lzo1f_1, lzo1f_999, lzo1b_1 ...
lzo1b_9, lzo1b_99, lzo1b_999, lzo2a_999. Pass algo=help to get a list
of available algorithms. Consult the liblzo documentation for more
information on the algorithms. Note that only the first three are supported
by lzop (it can decompress the first five though, as they're all handled by
the same decompression routine).
The default (lzo1x_1) is a good choice for fast compression and very fast
decompression and ensures compatibility with lzop. For higher compression
you might want to chose lzo1x_999, which is very slow but lzop compatible or
lzo2a_999, which is twice as fast, but not compatible with lzop.
Debugging
The debug flag will cause the ddr_lzo to output information about blocks and other internal data. It's meant for debugging purposes.
Finally there is also a flags=XXXX parameter. This sets the flags field in the header (default is 0x03000403) and is used for testing only. It is not sanity checked and you can easily set values that will break decompression or cause ddr_lzo to abort. Really only use for development purposes when you know meaning of the various bits.
Error recovery
On compression, when input bytes can't be read, ddr_lzo will encode holes in the compressed output file -- these will be skipped over on decompression.
On decompression, erroneous blocks can be detected by the
checksums (most often) or by the decompressor. The lzo plugin tries to
continue in that case if the block header that specifies de/compressed
lengths is intact. It will then result in a block being skipped over (hole)
and the decompression will be continued with the next block. This avoids
corrupt data to end up in the output file (or preexisting, potentially good
data there being overwritten).
The behaviour can be modified by specifying the nodisc[ard] option.
When given, the decompressor's output (filled up with zeros if too short for
the block) will be written to the output file. Even if we know that the data
is incorrect, with some luck, parts of the block may actually be valid.
When the block headers are corrupt, your situation is desperate, as you will have lost the remainder of the file. To recover pieces after such a block header corruption, ddr_lzo supports the search option. With it, the plugin will search the input file (starting from the position given in dd_rescue with -s) for data that looks like a block header and if a valid looking header is found, it will start decompressing from that position. (If you can't find the data you look for, you might actually study the output generated with the debug flag.)
Supported dd_rescue features
dd_rescue supports appending to files with the -x/--extend option. If ddr_lzo is loaded and the output file is an existing .lzo file, the new data will be appended in the format specified by the existing LZOP header. If the header does not indicate a multipart (archive) file, the EOF marker will be overwritten, so that a valid .lzo file is created. Otherwise a new part will be appended.
When dd_rescue can't read data or a sizable amount of zero-filled
data is found and the -a/--sparse option is active, then dd_rescue will
create sparse files (files with holes inside). This is an optimization to
save space -- the holes are interpreted as zeroes again on normal reads, so
this is transparent. The holes also can be useful to ensure that good data
is not overwritten with zeroes when data couldn't be read.
When the lzo module gets fed holes in compression mode, it will encode them in
the compressed output file in a special way (using lzop multipart feature,
as lzop unfortunately chokes on blocks with 0 compressed length). On
decompression, the holes will result in the data being jumped over again
(creating a hole in the output file, if no data preexists at the
location).
lzop compatibility
The plugin uses the lzo1x_1 algorithm by default (just like lzop
does by default) and generates adler32 checksums to allow detecting data
corruption. The compressed files are compatible with lzop and ddr_lzo should
handle files generated by lzop.
Multipart (archive) files from lzop are decompressed to ONE output file in the
order they are stored.
Multipart files created by the lzo plugin to encode holes will be extracted to
several files from lzop. The holes are encoded in the filenames (with a
sequence number and the hole size up to 1TB; use the timestamp for huge
holes), so a proper assembly of the fragments is possible even without
ddr_lzo.
lzop only supports the lzo1x_ family of algorithms. If you chose
another algorithm to compress data with ddr_lzo, it will set the
needed_version_to_extract field in the resulting lzop file to ddr_lzo's own
version (1.789) to indicate incompatibility with lzop (as of 1.03).
lzop by default uses block sizes of 256kiB (on Unix systems), but supports
de/compression with smaller block sizes as well. It needs to be recompiled
to support block sizes up to a possible maximum of 64MiB. Thus staying below
or at 256kiB is recommended; even when lzop compatibility is no concern,
blocks larger than 16MiB are not recommended, see below.
Blocksize considerations
When decompressing, the (soft) block size chosen in dd_rescue must
be sufficient (at least half the size of the blocksize used when
compressing); if you chose too small blocks, ddr_lzo will warn and exit.
For compression, the chosen (soft)blocksize in dd_rescue will determine the
size of blocks to be fed to the lzo??_?_compress() routines. Larger block
sizes will typically result in slightly better compression ratios, though
the returns on increasing the block size quickly diminish after 64k.
The default from dd_rescue (128kiB) is a good choice. It is NOT recommended to
increase the block size too much -- when an lzo file gets corrupted, at
least one block will be lost; larger blocks result in larger damage. Also,
blocks larger than 16MiB will not work well with the error tolerance
features of ddr_lzo. Also note that blocks larger than 256kiB need
recompilation of lzop if you want to be able to use lzop to process the .lzo
files; blocks larger than 64MiB prevent decompression even with a recompiled
lzop.
BUGS/LIMITATIONS
Maturity
The plugin is new as of dd_rescue 1.43. Do not yet rely on data
saved with ddr_lzo as the only backup for valuable data. Also expect some
changes to ddr_lzo in the not too distant future. (This should not break the
file format, as we're following lzop ....)
Compressed data is more sensitive to data corruption than plain data. Note
that the checksums (adler32 or crc32) in the lzop file format do NOT allow
to correct for errors; they just allow a somewhat reliable detection of data
corruption. (Ideally, a 32bit checksum just misses 1 out of 2^32
corruptions; on small changes, crc32 comes a bit closer to the ideal than
adler32. You may pass the crc32 option to use crc32 instead of
adler32 checksums at the expense of some speed -- unfortunately the crc32
polynomial for lzop/gzip/... is not the crc32c polynomial that has hardware
support on many CPUs these days.) Also note that the checksums are NOT
cryptographic hashes; a malicious attacker can easily find modifications of
data that do not alter the checksums. Use MD5 or better SHA-256/SHA-512 for
ensuring integrity against attackers. Use par2 or similar software to create
error correcting codes (Reed-Solomon / Erasure Codes) if you want to be able
to recover data in face of corruption.
Security
While care has been applied to check the result of memory allocations ..., the decompressor code has not been audited and only limited fuzzing has been applied to ensure it's not vulnerable to malicious data -- be careful when you process data from untrusted sources.
EXAMPLES
- dd_rescue -ptAL lzo=algo=lzo1x_1_15:compress,hash=alg=sha256 infile outfile
- compresses data from infile into outfile using the algorithm lzo1x_1_15 and calculates the sha256 hash value of outfile. outfile will have time stamp and access rights copied over from infile and it will be emptied before (if the file happens to exist). The output file won't have encoded holes; errors in the infile will result in zeros.
- dd_rescue -aL MD5,lzo=compr:bench,MD5,lzo=decompress,MD5 infile infile2
- will copy infile to infile2 compressing the data and decompressing it again on the fly. It will output MD5 hashes for the compressed data as well (though it's not stored) and for the two infiles -- the output should be identical, obviously. This command is rather artificial, used for testing. The -a flag makes dd_rescue detect zero blocks and create holes, thus testing hole encoding (sparse files) and decoding as well if the infile has sizable regions filled with zeros.
- dd_rescue -s1M -S0 -L lzo=search,nodiscard infile.lzo outfile
- will search for a lzop block header in infile.lzo starting at position 1MiB into the file and decompress the remainder of the file. On finding corrupted blocks, it will still write the output from the decompressor to outfile.
SEE ALSO
dd_rescue(1) liblzo2 documentation lzop(1)
AUTHOR
Kurt Garloff <kurt@garloff.de>
CREDITS
The liblzo2 library and algorithm has been written by Markus
Oberhumer.
http://www.oberhumer.com/opensource/lzo/
COPYRIGHT
This plugin is under the same license as dd_rescue: The GNU General Public License (GPL) v2 or v3 - at your option.
HISTORY
ddr_lzo plugin was first introduced with dd_rescue 1.43 (May 2014).
Some additional information can be found on
http://garloff.de/kurt/linux/ddrescue/
2014-05-12 | Kurt Garloff |