
Compression and archival in linux

Explanation of common compression and archival techniques

Laxman Vijay, Feb 3, 2023

File archiving is used when one or more files need to be transmitted or stored as efficiently as possible. Linux supports lots of file archival mechanisms. This article describes the most popular ones.

Before looking into those, here is a quick definition of compression and archival:

Compression: Makes the files smaller by removing redundant information.

Archival: Combines multiple files into one, which avoids the overhead of handling each file individually and makes the files easier to transmit.

Put simply, compression reduces size and archival combines files.
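The difference can be seen by measuring sizes. The following is a minimal sketch, assuming tar and gzip are installed; the scratch directory and the file names (a.txt, bundle.tar, and so on) are made up for illustration.

```shell
set -eu
# Hypothetical scratch directory and file names, used only for this demo.
workdir=$(mktemp -d)
cd "$workdir"

# Create three files with highly repetitive content.
for name in a b c; do
    i=0
    while [ "$i" -lt 2000 ]; do
        echo "sample line for file $name"
        i=$((i + 1))
    done > "$name.txt"
done

# Archival only: tar combines the files but does not shrink them.
tar -cf bundle.tar a.txt b.txt c.txt

# Compression: gzip -c writes to stdout, so bundle.tar is left in place.
gzip -c bundle.tar > bundle.tar.gz

total=$(cat a.txt b.txt c.txt | wc -c)
tar_size=$(wc -c < bundle.tar)
gz_size=$(wc -c < bundle.tar.gz)
echo "sum of files: $total bytes"
echo "tar archive:  $tar_size bytes"
echo "gzipped tar:  $gz_size bytes"
```

The plain tar archive should come out slightly larger than the files it contains, because tar adds headers and padding. Only the gzipped version is smaller.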

Compression algorithms:

  • gzip
  • bzip2
  • xz

gzip:

gzip (GNU zip) is a compression tool. It uses the DEFLATE algorithm, which combines Lempel-Ziv (LZ77) matching with Huffman coding. It is quite fast, but the resulting files are usually larger than those produced by bzip2 or xz.

gzip {filename}

The original file is deleted and replaced by the compressed file, which gets a .gz extension.

Decompression of gzipped files:

Decompression is done using the gunzip command.

gunzip {gzipped filename}
  • Including the -l flag shows the compression information (compressed size, uncompressed size, and ratio) of a gzipped file without actually compressing or decompressing it.
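A full round trip might look like the sketch below. It assumes GNU gzip; the file names notes.txt and notes.orig and the scratch directory are hypothetical.

```shell
set -eu
workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Build a repetitive sample file (the name notes.txt is made up).
i=0
while [ "$i" -lt 1000 ]; do
    echo "the same line repeated"
    i=$((i + 1))
done > notes.txt
cp notes.txt notes.orig   # keep a copy to compare against later

gzip notes.txt            # notes.txt is replaced by notes.txt.gz
gzip -l notes.txt.gz      # print compressed size, uncompressed size, ratio
gunzip notes.txt.gz       # notes.txt.gz is replaced by notes.txt

# cmp exits non-zero if the restored file differs from the copy.
cmp notes.txt notes.orig && echo "round trip OK"
```

To keep the original file while compressing, one option is gzip -c notes.txt > notes.txt.gz, which writes the compressed data to standard output instead of replacing the file.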

bzip2:

bzip2 uses a different compression algorithm based on Burrows-Wheeler block sorting, which usually produces smaller files than gzip at the expense of more CPU time.

bzip2 {filename}

Decompression of bzip2'ed files:

Decompression is done using the bunzip2 command.

bunzip2 {bzip2'ed filename}

xz:

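xz uses LZMA2, a variant of the Lempel-Ziv-Markov chain algorithm (LZMA). It usually produces smaller files than both gzip and bzip2. Compression is typically slower than gzip and needs more memory, but decompression is generally faster than bzip2.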

xz {filename}

Decompression of xzed files:

Decompression is done using the unxz command.

unxz {xzed filename}
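To compare the three tools on the same input, something like the following sketch can be used. The file name sample.log and its contents are invented for the demo, and the script skips itself if bzip2 or xz is missing. Actual sizes depend on the data, so none are quoted here.

```shell
set -eu
# Skip the demo if a tool is missing (not every system ships all three).
for tool in gzip bzip2 xz; do
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool is not installed; skipping"; exit 0; }
done

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Made-up log file with plenty of redundancy.
i=0
while [ "$i" -lt 5000 ]; do
    echo "entry $i: status=ok user=demo"
    i=$((i + 1))
done > sample.log

# -c writes to stdout, so sample.log itself is kept.
gzip  -c sample.log > sample.log.gz
bzip2 -c sample.log > sample.log.bz2
xz    -c sample.log > sample.log.xz

# Print the size of the original and of each compressed copy.
for f in sample.log sample.log.gz sample.log.bz2 sample.log.xz; do
    printf '%-16s %s bytes\n' "$f" "$(wc -c < "$f")"
done

# Each tool restores the original content; cmp fails if anything differs.
gunzip  -c sample.log.gz  | cmp - sample.log
bunzip2 -c sample.log.bz2 | cmp - sample.log
unxz    -c sample.log.xz  | cmp - sample.log
echo "all three round trips OK"
```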

Archival:

  • tar
  • zip

tar:

Tar is short for Tape ARchive. The tar command takes in several files and creates a single output file that can be split up again into the original files. The resulting archive file is often called a tarball.

tar {options} -f {archive filename} {files to archive}

The tar command has three modes (pass the appropriate flag to select the mode):

  • Create: Make a new archive out of a series of files. (-c)
  • Extract: Extract files out of an archive. (-x)
  • List: Show the contents without extracting. (-t)

Tar can also compress the resulting archive using the above compression algorithms.

Provide one of the following flags to select the compression algorithm.

  • gzip (-z)
  • bzip2 (-j)
  • xz (-J)

The -v flag can be provided for a verbose result.

Example:

tar -cvJf backup.tar.xz projects/

This creates a tarball of the projects directory, compressed using xz, with the filename backup.tar.xz.

The extension can be anything, but the following names are conventional:

  • for gzip: .tar.gz, .tgz, or .taz
  • for bzip2: .tar.bz2, .tb2, .tbz, .tbz2, or .tz2
  • for xz: .tar.xz or .txz

Listing:

You can list the contents inside the archive without actually extracting using the -t flag.

Example:

tar -tvf backup.tar.xz

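This command lists the contents of backup.tar.xz. No compression flag is needed here, because tar detects the compression format on its own when reading an archive.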
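The create and list modes can be tried together. This is a sketch under a few assumptions: xz is installed, and the projects/ tree with its two readme.txt files is made up for the demo.

```shell
set -eu
command -v xz >/dev/null 2>&1 || { echo "xz is not installed; skipping"; exit 0; }

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# A made-up projects/ tree to archive.
mkdir -p projects/project1 projects/project2
echo "first"  > projects/project1/readme.txt
echo "second" > projects/project2/readme.txt

# Create (-c) an xz-compressed (-J) archive; f comes last so that the
# archive name follows it directly.
tar -cJf backup.tar.xz projects/

# List (-t) the members without extracting anything.
listing=$(tar -tf backup.tar.xz)
echo "$listing"
```

Adding -v to the list command (tar -tvf) also prints permissions, owner, size, and modification time for each member.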

Unarchival:

Unarchival is done by passing the -x flag.

Example:

tar -xvJf backup.tar.xz

This command extracts the archive into the current directory. To extract somewhere else, pass the -C flag, which makes tar change to the specified directory first; the directory must therefore already exist.

tar -C backups -xvJf backup.tar.xz

This will extract into the backups folder.
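The requirement that the target directory exists can be checked with a short sketch. It assumes xz is installed; the projects/ tree and the directory name missing are hypothetical.

```shell
set -eu
command -v xz >/dev/null 2>&1 || { echo "xz is not installed; skipping"; exit 0; }

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Made-up content to archive.
mkdir -p projects/project1
echo "first" > projects/project1/readme.txt
tar -cJf backup.tar.xz projects/

# Extracting into a directory that does not exist fails.
if tar -C missing -xJf backup.tar.xz 2>/dev/null; then
    echo "unexpected: extraction into missing/ succeeded"
else
    echo "tar refused: missing/ does not exist"
fi

# Create the directory first, then extract into it.
mkdir backups
tar -C backups -xJf backup.tar.xz
ls -R backups
```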

In order to extract a specific file/folder from the archive, provide the relative path of the file/folder.

tar -xvJf backup.tar.xz projects/project1

This will extract only projects/project1 from the archive.

It is important to note that the -f flag must be immediately followed by the archive filename. When flags are combined, as in -xvJf, f therefore comes last.
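A quick way to confirm that only the requested folder comes out is to extract into an empty directory. In this sketch the restore directory and the contents of projects/ are made up, and xz is assumed to be installed.

```shell
set -eu
command -v xz >/dev/null 2>&1 || { echo "xz is not installed; skipping"; exit 0; }

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Made-up tree with two sub-projects.
mkdir -p projects/project1 projects/project2
echo "first"  > projects/project1/readme.txt
echo "second" > projects/project2/readme.txt
tar -cJf backup.tar.xz projects/

# Extract only projects/project1 into an empty directory. The path must
# match the member name as it appears in the listing (tar -tf).
mkdir restore
tar -C restore -xJf backup.tar.xz projects/project1
ls -R restore
```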

Zip:

Zip is both an archival and a compression mechanism. The compression is lossless, and the default algorithm is DEFLATE. Zip files are widely supported: Windows and macOS can open them without extra software. Therefore, zip is often preferred over tar when archives are shared across platforms.

zip {options} {output filename} {files to compress}

By default, zip does not descend into folders. Therefore, in order to include the files and subfolders inside a folder, use the -r flag.

Example:

zip files.zip temp*

This will compress all files in the current directory whose names start with temp.

zip -r projects.zip projects

This will recursively compress all files inside the projects folder.
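The effect of -r can be seen by zipping the same folder with and without it. This sketch assumes the Info-ZIP zip and unzip tools, which are separate packages on many distributions; flat.zip and the sample files are hypothetical names.

```shell
set -eu
# zip and unzip are often not installed by default, so skip if missing.
for tool in zip unzip; do
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool is not installed; skipping"; exit 0; }
done

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Made-up files and folders.
mkdir -p projects/project1
echo "hello" > projects/project1/main.txt
echo "notes" > temp1.txt
echo "more"  > temp2.txt

zip -q files.zip temp*            # the shell expands temp* before zip runs
zip -q -r projects.zip projects   # recursive: includes main.txt
zip -q flat.zip projects          # no -r: only the directory entry is stored

echo "with -r:";    unzip -l projects.zip
echo "without -r:"; unzip -l flat.zip
```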

Unzip:

The unzip command is used to extract the contents of the zipped archive.

Example:

unzip projects.zip

To just list the contents without extracting:

unzip -l projects.zip

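To extract specific files or folders from the zip, pass their paths inside the archive. For a folder, use a quoted wildcard so that the files inside it are matched:

unzip projects.zip 'projects/project1/*'

This is similar to tar.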
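Listing and selective extraction can be combined as in the following sketch. It assumes Info-ZIP zip and unzip are installed; the restore directory and the sample tree are made up, and -d (extract into a given directory) is used only to keep the result separate from the source files.

```shell
set -eu
for tool in zip unzip; do
    command -v "$tool" >/dev/null 2>&1 || { echo "$tool is not installed; skipping"; exit 0; }
done

workdir=$(mktemp -d)   # hypothetical scratch directory
cd "$workdir"

# Made-up tree with two sub-projects.
mkdir -p projects/project1 projects/project2
echo "first"  > projects/project1/main.txt
echo "second" > projects/project2/main.txt
zip -q -r projects.zip projects

# List the archive without extracting.
unzip -l projects.zip

# Extract only project1. The pattern is quoted so that unzip, not the
# shell, expands the wildcard against the names stored in the archive.
mkdir restore
unzip -q projects.zip 'projects/project1/*' -d restore
ls -R restore
```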

Thanks for reading :)

