Compression and archival in Linux
Explanation of common compression and archival techniques
File archiving is used when one or more files need to be transmitted or stored as efficiently as possible. Linux supports many file archival mechanisms. This article describes the most popular ones.
Before looking into those, here's a quick definition of compression and archival:
Compression: Makes files smaller by removing redundant information.
Archival: Combines multiple files into one, which avoids handling each file individually and makes the files easier to transmit.
Simply put, compression reduces size and archival combines files.
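The difference can be seen with a small experiment. The following is only a sketch: the file names (a.txt, bundle.tar, and so on) and the sample content are made up for illustration, and the exact byte counts will vary between systems.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Three small, highly repetitive files (hypothetical sample data).
for name in a b c; do
    seq 1 500 | sed "s/.*/sample line for file $name/" > "$name.txt"
done

# Archival: combine the three files into one uncompressed tar file.
tar -cf bundle.tar a.txt b.txt c.txt

# Compression: shrink the archive. -c writes to stdout, so bundle.tar is kept.
gzip -c bundle.tar > bundle.tar.gz

entries=$(tar -tf bundle.tar | wc -l | tr -d ' ')
tar_size=$(wc -c < bundle.tar | tr -d ' ')
gz_size=$(wc -c < bundle.tar.gz | tr -d ' ')
echo "entries in archive: $entries"
echo "bundle.tar: $tar_size bytes, bundle.tar.gz: $gz_size bytes"
```

The archive alone is not smaller than the files it contains; the size reduction comes from the compression step.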
Compression algorithms:
- gzip
- bzip2
- xz
gzip:
gzip (GNU zip) is a compression tool. It uses the DEFLATE algorithm, which combines Lempel-Ziv (LZ77) matching with Huffman coding. It is quite fast, but the resulting files are usually larger than those produced by bzip2 or xz.
gzip {filename}
The original file is deleted and replaced by the compressed file, which gets a .gz suffix.
Decompression of gzipped files:
Decompression is done using the gunzip command.
gunzip {gzipped filename}
- Passing the -l flag to gzip shows the compression information of a gzipped file without actually compressing or decompressing anything.
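A complete round trip might look like the sketch below. The file name notes.txt and its content are assumptions made for the example; cksum is used only to check that the restored file is identical to the original.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample file with repetitive content, so it compresses well.
seq 1 1000 | sed 's/.*/the quick brown fox/' > notes.txt
before=$(cksum < notes.txt)

gzip notes.txt            # replaces notes.txt with notes.txt.gz
ls                        # only notes.txt.gz is left
gzip -l notes.txt.gz      # compressed size, uncompressed size, ratio

gunzip notes.txt.gz       # restores notes.txt and removes notes.txt.gz
after=$(cksum < notes.txt)

# The checksums match if the round trip was lossless.
[ "$before" = "$after" ] && echo "round trip OK"
```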
bzip2:
bzip2 uses a different compression algorithm based on Burrows-Wheeler block sorting, which usually produces smaller files than gzip at the expense of more CPU time.
bzip2 {filename}
Decompression of bzip2'ed files:
Decompression is done using the bunzip2 command.
bunzip2 {bzip2'ed filename}
xz:
xz uses the LZMA2 algorithm, from the Lempel-Ziv-Markov chain (LZMA) family. It usually produces smaller files than both gzip and bzip2. Compression is slower than gzip and needs more memory, but decompression is fast.
xz {filename}
Decompression of xzed files:
Decompression is done using the unxz command.
unxz {xzed filename}
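To get a feel for the trade-offs, the three tools can be run on the same input. This is a rough sketch rather than a benchmark: sample.log and its content are invented for the example, and the resulting sizes depend heavily on the input data and on the tool versions installed.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical sample input: numbered lines of text.
seq 1 20000 | sed 's/$/ some repeated log message/' > sample.log
orig_size=$(wc -c < sample.log | tr -d ' ')
echo "original: $orig_size bytes"

# -c writes the compressed data to stdout, so sample.log is left in place.
for tool in gzip bzip2 xz; do
    "$tool" -c sample.log > "sample.log.$tool"
    size=$(wc -c < "sample.log.$tool" | tr -d ' ')
    echo "$tool: $size bytes"
done
```

Prefixing each compression command with time would additionally show how much CPU time each tool needs.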
Archival:
- tar
- zip
tar:
tar is short for tape archive. The tar command takes in several files and creates a single output file that can be split up again into the original files. The resulting archive is often called a tarball.
tar -f {filename} {options} {files to archive}
The tar command has three main modes (pass the appropriate flag to select the mode):
- Create: Make a new archive out of a series of files. (-c)
- Extract: Extract files out of an archive. (-x)
- List: Show the contents without extracting. (-t)
Tar can also compress the resulting archive using the compression tools described above.
Provide one of the following flags to select the compression algorithm.
- gzip (-z)
- bzip2 (-j)
- xz (-J)
The -v flag can be provided for verbose output.
Example:
tar -cvJf backup.tar.xz projects/
This creates a tarball of the projects folder, compressed using xz, with the filename backup.tar.xz.
The extension can be anything, but the following naming conventions are generally preferred:
- for gzip: .tar.gz, .tgz or .taz
- for bzip2: .tar.bz2, .tb2, .tbz, .tbz2 or .tz2
- for xz: .tar.xz
Listing:
You can list the contents of an archive without actually extracting it using the -t flag.
Example:
tar -tvf backup.tar.xz
This command lists the contents of backup.tar.xz.
Unarchival:
Unarchival is done by passing the -x flag.
Example:
tar -xvJf backup.tar.xz
This command extracts the archive into the current directory. To extract somewhere else, pass the -C flag, which makes tar change to the specified directory first; that directory must therefore already exist.
tar -C backups -xvJf backup.tar.xz
This will extract into the backups folder.
In order to extract a specific file/folder from the archive, provide its relative path as stored in the archive.
tar -xvJf backup.tar.xz projects/project1
This will extract only projects/project1 from the archive.
It is important to note that the -f flag must be immediately followed by the archive filename. When flags are bundled, as in -cvJf, the f therefore has to come last.
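The three modes can be tried end to end in a scratch directory. The sketch below makes a few assumptions: the directory layout is invented to mirror the projects folder used in this article, and gzip (-z) is used instead of xz only because gzip is more likely to be installed; -J works the same way.

```shell
#!/bin/sh
set -eu
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical directory layout, mirroring the projects folder used above.
mkdir -p projects/project1 projects/project2 backups single
echo "first"  > projects/project1/readme.txt
echo "second" > projects/project2/readme.txt

# Create (-c) a gzip-compressed (-z) archive; f comes last, right before the name.
tar -czf backup.tar.gz projects/

# List (-t) the members without extracting anything.
tar -tzf backup.tar.gz

# Extract (-x) everything into backups/, which must already exist.
tar -C backups -xzf backup.tar.gz

# Extract only one folder from the archive into single/.
tar -C single -xzf backup.tar.gz projects/project1
```

After running it, backups/ contains the whole projects tree while single/ contains only projects/project1.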
Zip:
Zip is both an archival and a compression mechanism. The compression is lossless, and the default compression algorithm is DEFLATE. Zip has built-in support in Windows and macOS, so it is more common than tarballs on those platforms and is often preferred over tar when archives are shared with their users.
zip {options} {output filename} {files to compress}
By default, zip does not recurse into folders. Therefore, in order to include the files/subfolders inside a folder, use the -r flag.
Example:
zip files.zip temp*
This will compress all files whose names start with temp into files.zip.
zip -r projects.zip projects
This will recursively compress all files inside the projects folder.
Unzip:
The unzip command is used to extract the contents of a zip archive.
Example:
unzip projects.zip
To just list without extracting,
unzip -l projects.zip
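To extract specific files/folders inside the zip, pass their paths as stored in the archive. For a folder, add a quoted wildcard so that everything inside it is matched:
unzip projects.zip 'projects/project1/*'
This is similar to tar, except that tar accepts the bare folder name.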
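A zip round trip can be sketched in the same way. The directory layout and the out folder are assumptions made for the example, and because zip and unzip are separate packages on many distributions, the script simply skips itself when they are missing.

```shell
#!/bin/sh
set -eu
# zip and unzip are not always installed; skip the demo if either is missing.
if ! command -v zip >/dev/null 2>&1 || ! command -v unzip >/dev/null 2>&1; then
    echo "zip/unzip not installed, skipping"
    exit 0
fi

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

# Hypothetical directory layout for the demo.
mkdir -p projects/project1 projects/project2 out
echo "first"  > projects/project1/readme.txt
echo "second" > projects/project2/readme.txt

# -r recurses into projects/; -q keeps the output quiet.
zip -rq projects.zip projects

# List the contents without extracting.
unzip -l projects.zip

# Extract only project1 into out/ (-d sets the target directory).
# The quotes stop the shell from expanding the wildcard itself.
unzip -q projects.zip 'projects/project1/*' -d out
```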
Thanks for reading :)