Compress and Decompress Files in Linux with gzip and bzip2

This tutorial explains how to compress and decompress files in Linux along with the similarities and differences between gzip and bzip2 commands. Learn how to use the gzip, bzip2, gunzip and bunzip2 commands in Linux with practical examples.

A compressed file not only uses less disk space but also consumes less network bandwidth when it is transferred to another location. Linux includes several compression utilities. This tutorial discusses two of the most popular: gzip and bzip2.

Similarities between gzip and bzip2

Both commands not only work in a similar fashion but also use similar syntax and options to compress and decompress files. For example, the gzip command uses the following syntax.

#gzip [option] [file]

The bzip2 command uses the same syntax.

#bzip2 [option] [file]

Once compression is done, both commands replace the supplied source file with the compressed file. Each command also has a companion command for decompression: gunzip for gzip and bunzip2 for bzip2. To decompress a compressed file, we can use either the companion command or the -d option of the command itself.

The following table lists the supported options and their descriptions.

| Short option | Long option | Supported command | Description |
| --- | --- | --- | --- |
| -h | --help | Both | List all supported options |
| -d | --decompress | Both | Decompress the compressed file |
| -f | --force | Both | Overwrite existing output file |
| -t | --test | Both | Test compressed file integrity |
| -c | --stdout | Both | Write output to the standard output device |
| -q | --quiet | Both | Don’t display noncritical errors and warnings |
| -v | --verbose | Both | Display verbose messages |
| -L | --license | Both | bzip2 displays both software version and license information; gzip displays license information only |
| -V | --version | Both | bzip2 displays both software version and license information; gzip shows version information only |
| -1 | --fast | Both | bzip2 sets the block size to 100k; gzip compresses faster |
| -9 | --best | Both | bzip2 sets the block size to 900k; gzip compresses better |
| -z | --compress | bzip2 only | Force compression |
| -k | --keep | bzip2; gzip 1.6 and later | Keep the original file |
| -s | --small | bzip2 only | Use less memory |
| -l | --list | gzip only | Display compressed and decompressed size |
| -n | --no-name | gzip only | Do not save or restore the original name and time stamp |
| -N | --name | gzip only | Save or restore the original name and time stamp |
| -r | --recursive | gzip only | Operate recursively on directories |
| -S | --suffix=SUF | gzip only | Use suffix SUF on compressed files |

As we can see in the above table:

  • The options -h, -d, -f, -t, -c, -q and -v work the same way in both commands.
  • The options -1, -9, -L and -V work slightly differently in the two commands.
  • The options -z and -s work only in the bzip2 command. The option -k works in bzip2 and, from version 1.6 onward, in gzip.
  • The options -l, -n, -N, -S and -r work only in the gzip command.
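As a brief illustration of the shared options, the sketch below uses -v and -t with gzip. The file names sample.txt and broken.gz and the temporary working directory are assumptions made for this example only. The same two options can be passed to bzip2.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a sample file and compress it; -v reports the compression ratio.
seq 1 5000 > sample.txt
gzip -v sample.txt

# -t checks the integrity of the compressed file without extracting it.
# The exit status is 0 when the file is intact.
if gzip -t sample.txt.gz; then
    intact_status="ok"
else
    intact_status="damaged"
fi
echo "sample.txt.gz: $intact_status"

# Truncate a copy to simulate damage; -t now exits with a non-zero status.
head -c 100 sample.txt.gz > broken.gz
if gzip -t broken.gz 2>/dev/null; then
    broken_status="ok"
else
    broken_status="damaged"
fi
echo "broken.gz: $broken_status"
```

Because -t only sets the exit status and prints errors, it fits well in scripts that need to verify a download or a backup before deleting the source.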

Besides the command line options, there are a few more differences between the two commands. The following table lists those differences.

Differences between gzip command and bzip2 command

| The gzip command | The bzip2 command |
| --- | --- |
| It uses the DEFLATE algorithm. | It uses the Burrows-Wheeler block sorting algorithm. |
| To denote the compressed file, it uses the extension .gz. | To denote the compressed file, it uses the extension .bz2. |
| It compresses files at a higher speed than the bzip2 command. | It provides a higher compression ratio than the gzip command. |
| It doesn’t provide any inbuilt functionality or associated program to recover damaged .gz files. | It provides an additional program, bzip2recover, that can recover damaged .bz2 files. |
| For decompression, it provides the utility gunzip. | For decompression, it provides the utility bunzip2. |
| It supports recursive compression. | It doesn’t support recursive compression. |

This tutorial is the first part of the article "Compressing and archiving explained in Linux". It covers the following RHCSA/RHCE topic.

Archive, compress, unpack, and uncompress files using tar, star, gzip, and bzip2

The other parts of this article are as follows.

Tar command and Syntax Explained with Examples

This tutorial is the second part of the article. It explains the basic usage of the tar command with its syntax and options.

Tar Command Examples in Linux

This tutorial is the last part of the article. It explains how to use the tar command in Linux with practical examples.

gzip, bzip2, gunzip and bunzip2 practical examples

Although both gzip and bzip2 use fairly simple and straightforward options, if you forget an option or are unsure about one, you can list all supported options with the -h option.

To list all supported options of the gzip command, use the following command.

#gzip -h

[Figure: gzip command help]

To list all supported options of the bzip2 command, use the following command.

#bzip2 -h

[Figure: bzip2 command help]

Compressing and decompressing files

Compressing and decompressing files with gzip and bzip2 is relatively simple. To compress a file, specify its name (if the file is located in the current directory) or its full path (if the file is located in another directory) with either command. For example, to compress a file named file_a, we can use either of the following commands.

#gzip file_a
#bzip2 file_a

As explained earlier, both commands replace the supplied file with the compressed file. If we compress file_a with gzip, it is replaced by file_a.gz; if we compress it with bzip2, it is replaced by file_a.bz2.

To decompress a compressed file, we can use the -d option with either command, or use the gunzip command for a file compressed with gzip and the bunzip2 command for a file compressed with bzip2. For example, to decompress the file file_a.bz2, we can use either of the following commands.

#bzip2 -d file_a.bz2
#bunzip2 file_a.bz2

To decompress the file file_a.gz, we can use either of the following commands.

#gzip -d file_a.gz
#gunzip file_a.gz
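The commands above can be tried end to end with a short sketch like the one below. It assumes a throwaway directory and a generated sample file; the reference copy file_a.orig is a hypothetical name used only to check the result.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Create a sample file and keep a reference copy for comparison.
seq 1 1000 > file_a
cp file_a file_a.orig

# gzip replaces file_a with file_a.gz.
gzip file_a
ls

# Both forms restore file_a and remove file_a.gz.
gzip -d file_a.gz      # built-in decompression
gzip file_a            # compress again
gunzip file_a.gz       # companion command

# cmp is silent and returns 0 when the two files are identical.
cmp file_a file_a.orig && echo "round trip preserved the contents"
```

The same sequence works with bzip2, bzip2 -d and bunzip2, with .bz2 in place of .gz.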

The following figure shows compression and decompression with the gzip and gunzip commands.

[Figure: compression and decompression with gzip command]

The following figure shows compression and decompression with the bzip2 and bunzip2 commands.

[Figure: compression and decompression with bzip2]

gzip vs bzip2: which provides a higher compression ratio

The bzip2 command provides a higher compression ratio but takes more time to compress. To verify this in practice, let’s compress a file with both commands and compare the sizes of the compressed files.

[Figure: compare gzip with bzip2]

As we can see in the above figure, the file compressed with gzip is larger than the file compressed with bzip2.

This shows that, for this file, bzip2 achieves a higher compression ratio than gzip. To see the ratios themselves, perform the same compression with the -v option.

[Figure: gzip vs bzip2]

As we can see in the above figure, when we compressed the file file_a with bzip2, the compression ratio was 62.58%. When we compressed the same file with gzip, the compression ratio was 61.6%.
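A comparison like this can be reproduced with a sketch along the following lines. The generated sample file is an assumption; the actual sizes and ratios depend heavily on the data, so the numbers will differ from those quoted above.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Sample input; real-world ratios depend heavily on the data.
seq 1 100000 > file_a

# -c writes to standard output, so file_a itself is left in place
# and both tools compress exactly the same input.
gzip  -c file_a > file_a.gz
bzip2 -c file_a > file_a.bz2

# Print the size of each file in bytes.
wc -c file_a file_a.gz file_a.bz2
```

Prefixing each compression command with time gives a rough view of the speed side of the trade-off as well.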

Redirecting output to a device or file

As we have seen above, by default both commands store the output in a new compressed file and, once compression is done, replace the supplied file with it.

If required, we can send the output to any device, file or custom location with the -c option. The -c option makes the command write its output to the standard output device (the console) and keeps the original file intact.

The following figure shows an example. In this example, a small file is created and the gzip command is used with the -c option to compress it.

[Figure: save gzip output to custom location]

As we can see in the above figure, when the -c option is used, the command writes its output to the console.

We can use the shell redirection operator (>) to store the output in a custom location. For example, the following command compresses two files, small-file and small-file-2, in the supplied sequence and writes the output to a new file small.gz.

#gzip -c small-file small-file-2 > small.gz

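[Figure: redirect output of gzip command]

You can also use this feature to create a single compressed file from multiple files. Note that decompressing such a file produces the contents of all the input files concatenated into one stream, not the separate original files.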
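The following sketch shows what such a combined file contains once it is decompressed. The file contents and the variable name combined are assumptions chosen for illustration.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'first file\n'  > small-file
printf 'second file\n' > small-file-2

# Compress both files into one output file; the originals stay in place.
gzip -c small-file small-file-2 > small.gz

# Decompress to standard output: the result is one stream containing
# the contents of small-file followed by the contents of small-file-2.
combined=$(gunzip -c small.gz)
printf '%s\n' "$combined"
```

The file boundaries are not restored on decompression, which is why multiple files are normally bundled with tar before compression when they need to be extracted separately.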

Getting information from a compressed file

The gzip command, when used with the -l option, scans the supplied compressed file and lists the following information about it.

Compressed size, uncompressed size, compression ratio and uncompressed name

[Figure: display compressed and decompressed size of file]

This option works only with the gzip command. The bzip2 command doesn’t support it.
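A minimal sketch of the -l option follows. The sample file and the variable names original_size and listing are assumptions for this example.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 20000 > file_a
original_size=$(wc -c < file_a | tr -d ' ')
echo "size before compression: $original_size bytes"

gzip file_a

# -l reads the compressed file and prints one header line followed by
# the compressed size, uncompressed size, ratio and uncompressed name.
listing=$(gzip -l file_a.gz)
printf '%s\n' "$listing"
```

The uncompressed size in the listing should match the size printed before compression, which makes -l a quick way to check how much space a file will need once it is extracted.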

Compressing files recursively

Use the -r option with the gzip command to compress all files in a directory and all of its sub-directories. For example, the following command compresses all files in the directory named a_dir and also recursively scans all of its sub-directories, compressing every file it finds there. Each file is compressed individually; the directory is not combined into a single archive.

#gzip -r a_dir

We can also use this option with the gunzip command to decompress all files from a directory and all of its sub-directories recursively.

#gunzip -r a_dir

[Figure: compress directory with gzip]

The bzip2 command neither supports this option nor provides any other option for recursive operation.
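The recursive behaviour of gzip can be observed with a sketch like this one. The directory layout, the file names and the variable compressed_listing are assumptions made for the example.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build a small directory tree with one nested sub-directory.
mkdir -p a_dir/sub_dir
echo "top level" > a_dir/file_1
echo "nested"    > a_dir/sub_dir/file_2

# Compress every regular file under a_dir, each into its own .gz file.
gzip -r a_dir
compressed_listing=$(find a_dir -type f | sort)
printf '%s\n' "$compressed_listing"

# Reverse the operation for the whole tree.
gunzip -r a_dir
find a_dir -type f | sort
```

For bzip2, a common workaround that this tutorial does not cover is to let find hand each file to it, for example find a_dir -type f -exec bzip2 {} +.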

Keeping original file intact

By default, the bzip2 command replaces the supplied input file with the compressed output file. To keep the input file intact, use the -k option with the bzip2 command. For example, the following command keeps the supplied file file_a along with the compressed output file.

#bzip2 -k file_a

[Figure: keep original file after zip]

This option works with the bzip2 command. The gzip command supports -k only from version 1.6 onward; older gzip releases do not support this option.
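The sketch below keeps the original file with both tools. For gzip it relies on the -c option described earlier, which works regardless of the gzip version. The sample file is an assumption.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 1000 > file_a

# bzip2: -k keeps file_a and creates file_a.bz2 next to it.
bzip2 -k file_a

# gzip: -c writes the compressed data to standard output, so the
# original is untouched; this works even where gzip lacks -k.
gzip -c file_a > file_a.gz

ls
```

After these commands the directory holds file_a together with both compressed copies.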

Recovering a damaged compressed file

To recover a damaged compressed file, bzip2 provides a separate tool known as bzip2recover. This tool scans the damaged file, locates the compressed data blocks and writes each one to its own small .bz2 file, so that the intact blocks can be decompressed and the corrupt ones skipped. To understand how this tool works, let’s take an example.

Create a compressed file with bzip2 and open it with a text editor. Add an extra line and save it. Since the file now contains both compressed data and plain text, bzip2 treats it as a corrupt compressed file.

To repair this file, we can use the bzip2recover tool. Once the file is repaired, it can be decompressed with bzip2.

The following figure shows this exercise.

[Figure: recovering corrupt zip file]

The bzip2recover tool works only with bzip2 compression. A file that is compressed with any other utility or tool can’t be repaired with it.
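A recovery run might look like the sketch below. It differs from the exercise above in one respect, which is an assumption of this example: instead of editing the file in a text editor, it overwrites a few bytes in the middle of a multi-block file with dd, so that some blocks stay intact. The names big.txt and recovered.txt are hypothetical.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Build an input large enough to span several blocks.
# -1 selects the 100k block size; -k keeps the original for reference.
seq 1 200000 > big.txt
bzip2 -1 -k big.txt

# Simulate damage by overwriting a few bytes in the middle of the file.
size=$(wc -c < big.txt.bz2 | tr -d ' ')
printf 'XXXXXXXX' | dd of=big.txt.bz2 bs=1 seek=$((size / 2)) conv=notrunc 2>/dev/null

# The damaged file no longer passes the integrity test.
bzip2 -t big.txt.bz2 2>/dev/null || echo "big.txt.bz2 is damaged"

# bzip2recover writes each block it finds to its own rec*.bz2 file.
bzip2recover big.txt.bz2

# Keep only the pieces that pass the integrity test.
: > recovered.txt
for piece in rec*big.txt.bz2; do
    if bzip2 -t "$piece" 2>/dev/null; then
        bzip2 -dc "$piece" >> recovered.txt
    fi
done
wc -c big.txt recovered.txt
```

The recovered file is smaller than the original because the data held in the damaged block is lost; only the blocks that pass bzip2 -t are restored.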

Adjusting speed and compression ratio

We can adjust the speed and the compression ratio in both commands. Both commands support a scale of 1 to 9, where 1 provides the highest speed but the lowest compression ratio and 9 provides the highest compression ratio but the lowest speed. In other words, the compression ratio is traded against speed.

The default value is 6 for gzip and 9 for bzip2. To use any other value, specify it as an option, for example -1 or -9.

The following figure shows how changing this value affects the compression ratio.

[Figure: adjust compress ratio and speed]

Compressing an already compressed file

When we compress a file, the information required to decompress it is stored along with the compressed data. If we compress the file again, this information is added again. Since the data was already compressed the first time, the second pass can hardly shrink it further. So if we compress an already compressed file, we usually end up with a slightly larger file.

Although it is a waste of time and space, if required you can compress an already compressed file again with the -f option.

Let’s take an example. Compress a file with gzip and note its size. Now try to compress it again. Since the file already has the .gz suffix, gzip will not compress it again. Use the -f option to force it. Once the file is compressed again, compare its size with the noted size.

The following figure shows this exercise.

[Figure: compressing an already compressed file]

A file compressed two times also needs to be decompressed two times.
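The exercise can be repeated with a sketch like the one below. The sample file, the reference copy file_a.orig and the variable names are assumptions for this example.

```shell
# Work in a throwaway directory so no real files are touched.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

seq 1 50000 > file_a
cp file_a file_a.orig

# First pass: file_a becomes file_a.gz.
gzip file_a
first_size=$(wc -c < file_a.gz | tr -d ' ')

# Second pass: -f forces gzip to accept a file that already ends in .gz.
gzip -f file_a.gz
second_size=$(wc -c < file_a.gz.gz | tr -d ' ')
echo "after one pass: $first_size bytes, after two passes: $second_size bytes"

# Two decompression passes are needed to get the original back.
gunzip file_a.gz.gz
gunzip file_a.gz
cmp file_a file_a.orig && echo "original contents restored"
```

The two sizes printed in the middle show how little, if anything, the second pass gains on a given file.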

That’s all for this tutorial.
