tar — how to create and extract tar.gz and tar.bz2 archives 

Introduction

The most commonly used archive types on Linux and Unix systems are tar, tar.gz and tar.bz2. Note that tar.gz and tar.bz2 archives are simply tar archives compressed with gzip and bzip2, respectively. Working with these files is simple with the GNU tar utility, which is included in the base packages of modern distros. In this article, I'll show you how to create and extract compressed archives using tar.

So, how is tar used? Let's start by looking at some of the common tar operations and options.

man tar

The following tar usage information is taken directly from the man pages.

Usage:
   tar <operation> [options]

Common Operations:
   -A, --catenate, --concatenate    append tar files to an archive
   -c, --create                     create a new archive
   -r, --append                     append files to the end of an archive
   -x, --extract, --get             extract files from an archive

Common Options:
   -f, --file [HOSTNAME:]F          use archive file or device F
                                    (default "-", meaning stdin/stdout)
   -j, --bzip2                      filter archive through bzip2,
                                    use to decompress .bz2 files
   -p, --preserve-permissions       extract all protection information
   -v, --verbose                    verbosely list files processed
   -z, --gzip, --ungzip             filter the archive through gzip,
                                    use to decompress .gz files

Note that GNU tar does not require a leading minus when the single-letter options are bundled together as the first argument (thanks to correct, see comments below).
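As a minimal sketch of the three option styles, the commands below create the same archive with bundled letters, with dashed short options, and with the long options from the usage text. The directory and archive names are made up for illustration, and the t (--list) operation, which is not in the excerpt above, is used only to print the archive members.

```shell
# Work in a throwaway directory with a small sample tree (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir
echo "hello" > dir/readme.txt

# Three spellings of the same operation.
tar cf old-style.tar dir/                  # bundled letters, no dash
tar -cf short-style.tar dir/               # short options with a dash
tar --create --file=long-style.tar dir/    # long options

# t (--list) prints the member names; each archive should list the same ones.
tar tf old-style.tar
tar tf short-style.tar
tar tf long-style.tar
```

Whichever style you pick, it is worth staying consistent within a script, since the styles differ in how an option such as f finds its argument.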

Creating archives

To create a tar archive, the c switch is used. To additionally compress it with gzip, the z option is added; for bzip2 compression, the j switch is used instead. Note that the tar program pipes its output through the gzip and bzip2 tools to create the tar.gz and tar.bz2 archives, respectively. To pack a directory called dir into dir.tar, dir.tar.gz and dir.tar.bz2 archives, the following commands are used, respectively.

tar cf dir.tar dir/
tar czf dir.tar.gz dir/
tar cjf dir.tar.bz2 dir/

In the above examples, the f option names the archive file in which the archived (and, with z or j, compressed) copy of the dir directory is placed. If the f option is not given (and thus the archive name is also omitted), tar falls back to its default archive, which according to the usage text above is stdout; giving - as the archive name makes that choice explicit. This is useful if you want to pipe the output of your tar command into another Linux tool. Anyhow, let's have a look at the resulting archives as well as the original directory using ls -la.

total 424
drwx------ 3 kamil kamil   4096 Nov 27 22:39 .
drwx------ 3 kamil kamil   4096 Nov 27 22:34 ..
drwx------ 2 kamil kamil   4096 Nov 27 22:36 dir
-rw------- 1 kamil kamil 276480 Nov 27 22:39 dir.tar
-rw------- 1 kamil kamil  83330 Nov 27 22:39 dir.tar.gz
-rw------- 1 kamil kamil  45927 Nov 27 22:39 dir.tar.bz2

Let's also have a look at the size of the original directory using du -sh dir.

280K    dir

From the above shell output, you can see that by default tar only archives files without compressing them (dir.tar is about the same size as dir), while the gzip and bzip2 filters shrink the archive considerably, to roughly 30% and 17% of the size of dir.tar, respectively. Note that bzip2 typically achieves better compression than gzip, although it may take longer to do so. Also note that the ratios are this high because the dir directory contains text files. If the directory contained already compressed files, for example JPEG images, then neither gzip nor bzip2 could shrink them much further.
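Both points, writing the archive to stdout and the effect of the input data on compression, can be tried out with a rough sketch like the one below. It assumes an archive name of - (stdout) and pipes the archive into wc -c to count its bytes. The directory and file names are made up, and random bytes from /dev/urandom stand in for already compressed data such as JPEG images.

```shell
# Throwaway working directory with two sample trees (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir text random

# Highly repetitive text compresses well.
yes "the quick brown fox jumps over the lazy dog" | head -n 5000 > text/sample.txt

# Random bytes behave like data that has already been compressed.
head -c 200000 /dev/urandom > random/sample.bin

# With "f -" the archive goes to stdout, so wc -c can count its bytes
# without an archive file ever being written.
text_plain=$(tar cf - text | wc -c)
text_gz=$(tar czf - text | wc -c)
random_plain=$(tar cf - random | wc -c)
random_gz=$(tar czf - random | wc -c)

echo "text:   $text_plain bytes as tar, $text_gz bytes as tar.gz"
echo "random: $random_plain bytes as tar, $random_gz bytes as tar.gz"
```

The text tree should shrink to a small fraction of its tar size, while the random tree should stay close to its original size. Swapping z for j repeats the experiment with bzip2.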

Extracting archives

Extracting archives is also very simple. Instead of the c switch, the x switch is used, and the archive name is given as the only other parameter. The commands for archive extraction shown below correspond to the archive creation commands given earlier.

tar xf dir.tar
tar xzf dir.tar.gz
tar xjf dir.tar.bz2

In fact, in most cases tar will work out what type of archive you are trying to extract (GNU tar recognizes the compression format from the signature bytes at the start of the file), so the filter specifications are not really needed. Hence, the following still works fine.

tar xf dir.tar
tar xf dir.tar.gz
tar xf dir.tar.bz2

However, if you explicitly specify the filter, then tar will assume that the archive uses that compression. If for whatever reason that is not the case, an error will occur.
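To see what tar has to go on when no filter is given, here is a small sketch that prints the first two bytes of a gzip-ped archive and then extracts a plain and a gzip-ped archive with the same xf command. gzip streams begin with the bytes 1f 8b, and bzip2 files begin with the ASCII characters BZh. The file names are made up, and the -C (--directory) option, which is not covered in this article, is used only to keep the two extractions apart.

```shell
# Throwaway working directory with a sample tree (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir
seq 1 2000 > dir/numbers.txt

tar cf dir.tar dir/
tar czf dir.tar.gz dir/

# Print the first two bytes of the gzip-ped archive in hex.
gz_magic=$(head -c 2 dir.tar.gz | od -An -tx1 | tr -d ' \n')
echo "dir.tar.gz starts with: $gz_magic"

# Extract both archives with plain "xf" into separate directories.
mkdir from-tar from-gz
tar xf dir.tar -C from-tar
tar xf dir.tar.gz -C from-gz

# The two extracted trees should be identical.
diff -r from-tar from-gz && echo "both archives extracted to the same contents"
```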

The verbose mode

The v switch can be used to enable the verbose mode. This can be useful if you would like to see a list of files being compressed or extracted. For example, let's extract the dir.tar.gz archive, with verbose mode enabled, using the following command.

tar xvzf dir.tar.gz

The above command produces a list of inflated files as shown in the following output.

dir/
dir/NVIDIA_DRIVER_README.txt
dir/NVIDIA_LICENSE.txt
dir/readme.txt
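The v switch works the same way when creating an archive. The sketch below, with made-up file names, creates a gzip-ped archive verbosely and then lists it again with the t (--list) operation, which is not covered in this article and prints the member names without extracting anything. It assumes GNU tar, which prints the verbose list on stdout when the archive is written to a file.

```shell
# Throwaway working directory with a sample tree (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir
echo "first" > dir/a.txt
echo "second" > dir/b.txt

# With v, tar prints each member name as it is added to the archive.
created=$(tar cvzf dir.tar.gz dir/)
echo "$created"

# t (--list) prints the member names of an existing archive without extracting it.
listed=$(tar tzf dir.tar.gz)
echo "$listed"
```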

Some common errors

Let's take a look at some common errors. As described previously, specifying the wrong filter type, for example by using the following commands

tar xjf dir.tar
tar xzf dir.tar.bz2

results in the respective error messages shown below.

bzip2: (stdin) is not a bzip2 file.
tar: Child returned status 2
tar: Error exit delayed from previous errors

gzip: stdin: not in gzip format
tar: Child returned status 1
tar: Error exit delayed from previous errors
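In a script, one way to cope with a wrong filter is to check tar's exit status and retry without any filter, leaving the detection to tar. The following is only a sketch with made-up file names; it hides the error messages of the first attempt by redirecting stderr.

```shell
# Throwaway working directory with a plain, uncompressed archive (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir
echo "hello" > dir/readme.txt
tar cf dir.tar dir/
rm -r dir

# Deliberately ask for gzip decompression on an archive that is not gzip-ped.
if tar xzf dir.tar 2>/dev/null; then
    echo "extracted with the z filter"
else
    echo "z filter failed; retrying without a filter"
    tar xf dir.tar
fi

ls dir/
```

Some tar implementations other than GNU tar may ignore the filter on extraction, in which case the first branch is taken; either way the directory ends up extracted.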

Also, if a file becomes damaged, say truncated, then obviously an error will occur. Let's simulate such corruption by keeping only the initial 100 bytes of the dir.tar.bz2 archive, using the following command.

head -c100 dir.tar.bz2 > corrupt.tar.bz2

If we now try to extract this archive,

tar xjf corrupt.tar.bz2

then the following error message will be produced.

bzip2: Compressed file ends unexpectedly;
        perhaps it is corrupted?  *Possible* reason follows.
bzip2: Inappropriate ioctl for device
        Input file = (stdin), output file = (stdout)

It is possible that the compressed file(s) have become corrupted.
You can use the -tvv option to test integrity of such files.

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

tar: Child returned status 2
tar: Error exit delayed from previous errors

Interestingly, we are told to check the archive's integrity, so let's do that.

bzip2 -tvv corrupt.tar.bz2

The bzip2 utility tells us what we already know… we are missing part of the file.

corrupt.tar.bz2:    [1: huff+mtf file ends unexpectedly

You can use the `bzip2recover' program to attempt to recover
data from undamaged sections of corrupted files.

Let's try the second suggestion, the bzip2recover utility.

bzip2recover corrupt.tar.bz2

Unfortunately, with only 100 bytes it is not possible to recover anything from the corrupted archive.

bzip2recover 1.0.3: extracts blocks from damaged .bz2 files.
bzip2recover: searching for block boundaries ...
   block 1 runs from 80 to 800 (incomplete)
bzip2recover: sorry, I couldn't find any block boundaries.

Similarly, if we truncate the gzip archive and try to extract it, then gzip will inform us of an unexpected end of file.

gzip: stdin: unexpected end of file
tar: Child returned status 1
tar: Error exit delayed from previous errors
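The gzip case can be reproduced along the same lines as the bzip2 one. The sketch below uses made-up file names and the -t option of gzip, which tests a compressed file's integrity without writing any output; it truncates a gzip-ped archive to 100 bytes and compares the test result with that of the intact archive.

```shell
# Throwaway working directory with a sample gzip-ped archive (hypothetical names).
workdir=$(mktemp -d)
cd "$workdir"
mkdir dir
seq 1 2000 > dir/numbers.txt
tar czf dir.tar.gz dir/

# Keep only the first 100 bytes to simulate an interrupted download.
head -c 100 dir.tar.gz > corrupt.tar.gz

# gzip -t exits with a non-zero status if the file is damaged.
if gzip -t dir.tar.gz 2>/dev/null; then intact_status=ok; else intact_status=damaged; fi
if gzip -t corrupt.tar.gz 2>/dev/null; then corrupt_status=ok; else corrupt_status=damaged; fi

echo "dir.tar.gz:     $intact_status"
echo "corrupt.tar.gz: $corrupt_status"
```

Running such a test before extraction is a cheap way to catch a broken download early.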

On the other hand, truncating a plain tar archive does not necessarily cause error messages during extraction; depending on where the cut falls, GNU tar may instead report an unexpected end of the archive. Either way, only the files still fully contained (and not truncated or corrupted) in such an archive will be extracted intact. Such truncation and corruption errors often occur when extracting archives downloaded from the Internet.



14 Responses to “tar — how to create and extract tar.gz and tar.bz2 archives”

  1. correct Says:

    hi

    GNU tar does not require you to pass the minus sign for options, it is redundant.

    Also,

    recent (within the last few years) versions of GNU tar (as shipped for the majority of GNU/Linux distros) do not require you to specify the z or j flag when extracting archives

    eg instead of tar xfz foo.tar.gz or tar xfj foo.tar.bz2 you can simply say

    tar xf foo.tar.gz

    or
    tar xf foo.tar.bz2

  2. Kamil Says:

    Thanks correct,
    I have dropped the minus sign all together. I do mention that the j and z flags are optional when extracting archives (under the Extracting archives section).
    Cheers,
    Kamil

  3. Xeleema Says:

    Also, for those of us who find ourselves without a GNU version of tar present;

    compress (passthru with bzip2 or gzip )
    tar cvf - dirname | bzip2 -9 - > dirname.tar.bz2

    uncompress
    bzip2 -d

  4. Zim Says:

    "To further encode it using gzip compression the j option is also added"

    I believe that should be the "z" option

  5. Kamil Says:

    Fixed. Thanks Zim.

  6. majkel dżekson Says:

    Super, thanks Kamil!

  7. sindikat Says:

    when i try to run

    $ tar xfj foo.tar.bz2

    instead of

    $ tar xjf foo.tar.bz2

    it throws an error:

    tar: j: Cannot open: No such file or directory
    tar: Error is not recoverable: exiting now

    I'm using GNU tar 1.23

  8. Kamil Says:

    hi sindikat, I am using GNU tar 1.20 and both 'tar xjf foo.tar.bz2' and 'tar xfj foo.tar.bz2' work fine for me.

  9. anurag Says:

    thnks for the beautiful representation of the info.

  10. spacerat Says:

    well it's because f needs to be followed by a filename, so it has to stay at the end. If it works both ways then your tar has probably been patched.

  11. Vinny Says:

    Very useful!
    Thanks and cheers from Brazil!

  12. George Baker Says:

    While I was backing up a partition on a system I got the following message:

    outbound2.img: file changed as we read it

    my question is did it stop reading the file at that point or did it continue reading the rest of the file, or did it start over on that file? And finally, how did it know that the file changed?

    Thanks

  13. Vamsi Says:

    Awesome !
    very good coverage of the tar command..

  14. mahotma Says:

    very good.tks.
