Question: How can I tar up about 12GB of files on a Linux box and then split the archive into 700MB chunks?
Answer: You can use either of the following two methods on any box with the GNU tools, or in the Cygwin environment on Windows.
Using tar, gzip, split and cat - Method 1
Compressing & Splitting the files - Method 1
- Create an uncompressed Tar Archive of the files you want
tar -cvf output.tar /path/to/files
- Create an md5sum so you can make sure it survives the split and re-join operations
md5sum -b output.tar > md5sum.txt
- Split it into Chunks
split -d -b 734003200 output.tar your_prefix
Where "-d" means append a decimal suffix (00, 01, 02 etc)
"-b" is the chunk size in Bytes (1 MB is taken as 1024 x 1024 Bytes here; GNU split also accepts a suffix, e.g. "-b 700M")
650 megabytes = 681574400 Bytes
700 megabytes = 734003200 Bytes
"your_prefix" is whatever you want the chunk filenames to begin with
Creates a series of files your_prefix00, your_prefix01 etc
- Gzip the Chunks (note: for greater compression use bzip2)
gzip your_prefix*
- Copy the gzipped chunks and md5sum.txt to CD
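The compress-and-split steps above can be tried end to end on a small scale before committing to 12GB. The following is only a sketch: the temporary directory, the generated sample.bin file and the 100 KB chunk size are assumptions made so that it runs quickly, not part of the original recipe.

```shell
#!/bin/sh
set -eu

# Throwaway directory with generated sample data (a hypothetical stand-in for /path/to/files).
workdir=$(mktemp -d)
cd "$workdir"
mkdir files
head -c 300000 /dev/urandom > files/sample.bin

# 700 MB expressed in bytes, matching the figure passed to split -b.
cd_bytes=$((700 * 1024 * 1024))
echo "700 MB = $cd_bytes bytes"

# Tiny chunk size so the demo runs quickly; use "$cd_bytes" for real CD-sized chunks.
chunk_bytes=$((100 * 1024))

tar -cf output.tar files                           # uncompressed archive
md5sum -b output.tar > md5sum.txt                  # checksum of the whole archive
split -d -b "$chunk_bytes" output.tar your_prefix  # your_prefix00, your_prefix01, ...
gzip your_prefix*                                  # each chunk becomes your_prefixNN.gz
ls -l your_prefix*
```

For the real job, replace the sample data with your own path and pass "$cd_bytes" to split.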
Joining them back together again - Method 1
- Put all the chunks and md5sum.txt in one directory
- Unzip all the gzipped chunks
gunzip your_prefix*
- Join the chunks into a file with the original archive name, because md5sum.txt records the name output.tar
cat your_prefix* > output.tar
- Using md5sum, check the archive integrity against the previously created md5sum.txt
md5sum -c md5sum.txt
- Finally untar to a new location
tar -xvf output.tar
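The re-join steps can be rehearsed the same way. This sketch assumes small generated sample data and hypothetical directory names (restore, extracted). It rebuilds the archive under the name output.tar, since md5sum -c checks whatever filename is recorded in md5sum.txt.

```shell
#!/bin/sh
set -eu

workdir=$(mktemp -d)
cd "$workdir"

# Build small sample chunks first (same steps as the compress-and-split stage).
mkdir files
head -c 300000 /dev/urandom > files/sample.bin
tar -cf output.tar files
md5sum -b output.tar > md5sum.txt
split -d -b 102400 output.tar your_prefix
gzip your_prefix*

# Simulate the destination machine: only the chunks and md5sum.txt arrive.
mkdir restore
mv your_prefix*.gz md5sum.txt restore/
rm output.tar
cd restore

gunzip your_prefix*.gz
# The shell expands the glob in sorted order, so 00, 01, 02 ... are joined in sequence.
cat your_prefix* > output.tar
# md5sum -c looks for the filename recorded in md5sum.txt, here output.tar.
md5sum -c md5sum.txt
mkdir extracted
tar -xf output.tar -C extracted
```

Using tar's -C option keeps the extracted files in a separate directory, so the chunks and the restored tree do not get mixed together.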
Using zip - Method 2
Compressing & Splitting the files - Method 2
- Zip up the files into a zip
zip -r zipfile.zip /path/to/files
- Use zipsplit to create chunks
zipsplit -n 12000000 zipfile.zip
Where -n is the maximum number of bytes in each chunk (for example, 734003200 for 700 MB chunks)
- Copy to CD
Unzipping to a new location - Method 2
- Copy the zip chunks to the same directory
- Change directory to the root of the new location
- Unzip all the chunks individually
for i in /path/to/zipfil[0-9][0-9].zip ; do unzip "$i" ; done
Data integrity
So how do you make sure that what you join back together again is the same as what you started with?
After tarring or zipping the original files, and before splitting the archive, run md5sum on it:
md5sum filename.extension > md5sum.txt
Then, once you have split it up and re-joined it under the same filename, use the md5sum file to check it again:
md5sum -c md5sum.txt
The output of md5sum -c should say:
filename.extension: OK
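To see what the check looks like when something has gone wrong, here is a small sketch that verifies an intact file and then a deliberately damaged copy. The file contents and the appended byte are made up for illustration; filename.extension stands in for your real archive.

```shell
#!/bin/sh
set -u

workdir=$(mktemp -d)
cd "$workdir"

# Hypothetical stand-in for the real archive.
printf 'example archive contents\n' > filename.extension
md5sum filename.extension > md5sum.txt

# Intact file: md5sum -c reports OK and exits with status 0.
md5sum -c md5sum.txt
ok_status=$?

# Simulate damage by appending one byte, then check again.
printf 'x' >> filename.extension
# Damaged file: md5sum -c reports FAILED and exits with a non-zero status.
md5sum -c md5sum.txt
bad_status=$?

echo "intact: $ok_status, damaged: $bad_status"
```

Because the exit status reflects the result, the check can also be used in a script, for example to refuse to untar an archive that did not verify.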
This link was really very helpful. I found the second (zip/unzip) method easy to use on Ubuntu Linux.