The most commonly used archive types on Linux and Unix systems are tar.gz and tar.bz2 archives. Note that tar.gz and tar.bz2 archives are simply tar archives compressed with gzip and bzip2, respectively. Working with these files is made very simple by the GNU tar utility, which is included in the base packages of modern distros. In this article, I'll show you how to create and extract compressed archives using tar.
So, how is tar used? Let's start by looking at some of the common tar operations and options. The following tar usage information is taken directly from the tar manual page:
man tar # ©2007 linux.dsplabs.com.au
Usage: tar <operation> [options]

Common Operations:
  -A, --catenate, --concatenate   append tar files to an archive
  -c, --create                    create a new archive
  -r, --append                    append files to the end of an archive
  -x, --extract, --get            extract files from an archive

Common Options:
  -f, --file [HOSTNAME:]F         use archive file or device F (default "-", meaning stdin/stdout)
  -j, --bzip2                     filter archive through bzip2, use to decompress .bz2 files
  -p, --preserve-permissions      extract all protection information
  -v, --verbose                   verbosely list files processed
  -z, --gzip, --ungzip            filter the archive through gzip, use to decompress .gz files
Note that the GNU tar utility does not require a preceding minus for the single-letter options when they are bundled together as the first argument, so tar xzf and tar -xzf are equivalent.
To create a tar archive, the c switch is used. To additionally compress it with gzip, the z option is added; for bzip2 compression, the j switch is used instead. Note that the tar program pipes its output through the gzip or bzip2 tools in order to create the tar.gz or tar.bz2 archives, respectively. OK, to pack a directory called dir into dir.tar, dir.tar.gz and dir.tar.bz2 archives, the following commands are used, respectively.
tar cf dir.tar dir/
tar czf dir.tar.gz dir/
tar cjf dir.tar.bz2 dir/
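To try these commands without touching real data, here is a minimal sketch that builds a throwaway directory and archives it in all three ways. The scratch location from mktemp, the sample file name readme.txt, and the use of the t (list) operation to inspect the results are my own additions for illustration; the sketch also assumes that gzip and bzip2 are installed.

```shell
# Work in a scratch directory so nothing real is touched (assumes mktemp exists).
work=$(mktemp -d)
cd "$work" || exit 1

# Build a small sample directory; the file name and contents are made up.
mkdir dir
for i in 1 2 3 4 5 6 7 8 9 10; do
    echo "line $i of some repetitive sample text" >> dir/readme.txt
done

# The three creation commands from the article.
tar cf  dir.tar     dir/
tar czf dir.tar.gz  dir/
tar cjf dir.tar.bz2 dir/

# The t operation lists the members of an archive without extracting them.
tar tf dir.tar
tar tf dir.tar.gz
tar tf dir.tar.bz2
```

Each of the three listings should show the same two entries, the dir/ directory and the file inside it.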
In the above examples, the f option specifies that the archived version of the dir directory is to be placed in the named archive file. If the f option is not given (and thus the archive name is also omitted), tar falls back to its default archive, which according to the usage text above is "-", i.e. stdout. Writing to stdout is useful if you want to pipe the output of your tar command into another Linux tool; note that recent GNU tar versions refuse to write an archive directly to a terminal. Anyhow, let's have a look at the resulting archives as well as the original directory in a long directory listing.
total 424
drwx------ 3 kamil kamil   4096 Nov 27 22:39 .
drwx------ 3 kamil kamil   4096 Nov 27 22:34 ..
drwx------ 2 kamil kamil   4096 Nov 27 22:36 dir
-rw------- 1 kamil kamil 276480 Nov 27 22:39 dir.tar
-rw------- 1 kamil kamil  83330 Nov 27 22:39 dir.tar.gz
-rw------- 1 kamil kamil  45927 Nov 27 22:39 dir.tar.bz2
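The piping use of tar mentioned earlier can be sketched roughly as follows. Passing - as the archive name makes the choice of stdout (or stdin) explicit instead of relying on the built-in default. The scratch directory and the file names piped.tar.gz and copy are hypothetical, and gzip is assumed to be installed.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir
echo "some sample text" > dir/readme.txt

# Write the archive to stdout ("-") and pipe it through gzip by hand.
# This is roughly what the z option arranges for you.
tar cf - dir/ | gzip > piped.tar.gz

# The result is an ordinary tar.gz archive that tar can list.
tar tzf piped.tar.gz

# Pipe an archive straight into a second tar to copy a directory tree.
mkdir copy
tar cf - dir/ | (cd copy && tar xf -)
```

The last command is a classic way of copying a directory tree while keeping its structure, without ever writing an intermediate archive file.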
Let's also have a look at the size of the original directory using
du -sh dir
From the directory listing above, you can see that by default tar only archives files without compressing them, while the gzip and bzip2 filters achieve quite high compression: dir.tar.gz is about 30% and dir.tar.bz2 about 17% of the size of dir.tar. Note that bzip2 typically achieves better compression than gzip, although it may take longer to do so. Also note that in this case the gzip and bzip2 filters achieve such high compression ratios because the dir directory contains text files. If the directory contained already compressed files, for example images compressed using JPEG, then neither gzip nor bzip2 could do much more in terms of compression.
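The effect of the input type on the compression ratio can be checked with a small experiment. This is only a sketch: it uses random bytes from /dev/urandom as a stand-in for already compressed data such as JPEG images, and the directory names, file names and sizes are arbitrary choices of mine.

```shell
work=$(mktemp -d)
cd "$work" || exit 1

# Highly repetitive text: this should compress very well.
mkdir text
i=0
while [ "$i" -lt 2000 ]; do
    echo "the quick brown fox jumps over the lazy dog" >> text/sample.txt
    i=$((i + 1))
done

# Random bytes: a stand-in for data that is already compressed.
mkdir random
head -c 100000 /dev/urandom > random/sample.bin

# Archive both directories without compression, with gzip, and with bzip2.
for d in text random; do
    tar cf  "$d.tar"     "$d/"
    tar czf "$d.tar.gz"  "$d/"
    tar cjf "$d.tar.bz2" "$d/"
done

# Print the size in bytes of each archive (wc also adds a total line).
wc -c text.tar text.tar.gz text.tar.bz2 random.tar random.tar.gz random.tar.bz2
```

The compressed text archives should come out at a small fraction of text.tar, while the compressed random archives should stay close to the size of random.tar.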
Extracting archives is also very simple. Instead of the c switch, the x switch is used, and the archive name is given as the only other parameter. The commands for archive extraction shown below correspond to the archive creation commands given earlier.
tar xf dir.tar
tar xzf dir.tar.gz
tar xjf dir.tar.bz2
In fact, in most cases tar will figure out what type of archive you are trying to extract (GNU tar recognizes the compression format from the signature bytes at the start of the file), so the filter specifications are not really needed. Hence, the following still works fine.
tar xf dir.tar
tar xf dir.tar.gz
tar xf dir.tar.bz2
However, if you explicitly specify the decoder, then tar will assume that the given archive uses that encoding. If for whatever reason that is not the case, an error will occur.
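The difference between automatic detection and an explicitly chosen filter can be sketched as follows. The scratch directory and the out_auto and out_wrong target directories are hypothetical names. The sketch assumes GNU tar, which detects the format when reading from a regular file and returns a non-zero exit status when the named filter fails; other tar implementations may behave differently.

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir
echo "some sample text" > dir/readme.txt
tar czf dir.tar.gz dir/

# No filter given: tar inspects the file and picks gzip by itself.
mkdir out_auto
(cd out_auto && tar xf ../dir.tar.gz)

# Wrong filter given: tar trusts the j switch and bzip2 rejects the data.
mkdir out_wrong
if (cd out_wrong && tar xjf ../dir.tar.gz) 2>/dev/null; then
    wrong_filter=accepted
else
    wrong_filter=rejected
fi
echo "explicit wrong filter was $wrong_filter"
```

Checking the exit status like this is handy in scripts, where nobody is watching the terminal for error messages.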
The verbose mode
The v switch can be used to enable verbose mode. This can be useful if you would like to see a list of the files being archived or extracted. For example, let's extract the dir.tar.gz archive, with verbose mode enabled, using the following command.
tar xvzf dir.tar.gz
The above command produces a list of inflated files as shown in the following output.
dir/
dir/NVIDIA_DRIVER_README.txt
dir/NVIDIA_LICENSE.txt
dir/readme.txt
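With GNU tar, the verbose listing produced during extraction goes to standard output, so it can be saved for later, for example to know exactly which files an archive created. A small sketch, with a made-up scratch directory and the hypothetical list file name extracted.list:

```shell
work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir
echo "some sample text" > dir/readme.txt
tar czf dir.tar.gz dir/
rm -r dir

# Extract verbosely and keep the list of extracted paths in a file.
tar xvzf dir.tar.gz > extracted.list

# Show what was extracted.
cat extracted.list
```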
Some common errors
Let's take a look at some common errors. As described previously, specifying an incorrect filter type, for example using the following commands,
tar xjf dir.tar
tar xzf dir.tar.bz2
results in the respective error messages shown below.
bzip2: (stdin) is not a bzip2 file.
tar: Child returned status 2
tar: Error exit delayed from previous errors

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
Also, if a file becomes damaged, say truncated, then obviously an error will occur. Let's simulate such a corruption by keeping only the initial 100 bytes of the dir.tar.bz2 archive, using the following command.
head -c100 dir.tar.bz2 > corrupt.tar.bz2
If we now try to extract this archive,
tar xjf corrupt.tar.bz2
then the following error message will be produced.
bzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
        Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Child returned status 2
tar: Error exit delayed from previous errors
Interestingly, we are told to check the archive's integrity, so let's do that.
bzip2 -tvv corrupt.tar.bz2
The bzip2 utility tells us what we already know: part of the file is missing.
  corrupt.tar.bz2:
    [1: huff+mtf file ends unexpectedly

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.
Let's try the second suggestion, the bzip2recover program.
bzip2recover corrupt.tar.bz2
Unfortunately, with only 100 bytes it is not possible to recover anything from the corrupted archive.
bzip2recover 1.0.3: extracts blocks from damaged .bz2 files.
bzip2recover: searching for block boundaries ...
   block 1 runs from 80 to 800 (incomplete)
bzip2recover: sorry, I couldn't find any block boundaries.
Similarly, if we truncate the gzip archive, then gzip will inform us of an unexpected end of file.
gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error exit delayed from previous errors
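On the other hand, truncating an uncompressed tar archive does not necessarily cause error messages during extraction: if the cut happens to fall on a boundary between files, tar may finish silently, whereas a cut in the middle of a file typically makes GNU tar report an unexpected end of the archive. Either way, only the files still fully contained (and not truncated or corrupted) in such an archive will be extracted intact. Such truncation and corruption errors often occur when extracting archives downloaded from the Internet.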
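Since truncated downloads are a common cause of these errors, it can help to test an archive before extracting it. The function below is a sketch under my own assumptions: the name check_archive is hypothetical, the archive type is guessed from the file name only, and for plain tar archives it simply asks tar to list the contents and relies on tar's exit status.

```shell
# check_archive FILE: return 0 if FILE passes a basic integrity test.
check_archive() {
    case "$1" in
        *.tar.gz|*.tgz) gzip -t "$1" ;;             # test the gzip stream
        *.tar.bz2)      bzip2 -t "$1" ;;            # test the bzip2 stream
        *.tar)          tar tf "$1" > /dev/null ;;  # walk through the tar headers
        *) echo "unknown archive type: $1" >&2; return 2 ;;
    esac
}

work=$(mktemp -d)
cd "$work" || exit 1
mkdir dir
i=0
while [ "$i" -lt 200 ]; do
    echo "some sample text, line $i" >> dir/readme.txt
    i=$((i + 1))
done
tar czf good.tar.gz dir/

# Simulate a truncated download by keeping only the first 100 bytes.
head -c 100 good.tar.gz > bad.tar.gz

# Report the state of each archive based on the exit status of the check.
for f in good.tar.gz bad.tar.gz; do
    if check_archive "$f" 2>/dev/null; then
        echo "$f: OK"
    else
        echo "$f: damaged"
    fi
done
```

A check like this only tells you that the compressed stream is complete and self-consistent; it cannot tell you whether the file is the one the publisher intended, for which a published checksum is the better tool.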