Using Linux/Cygwin to split files into chunks for transfer to CD

Written by James McDonald

September 29, 2008

Question: How can I tar up about 12GB of files on a Linux box and then split the archive into 700MB chunks?

Answer: You can use either of the following two methods on any box with the GNU tools installed, or in the Cygwin environment on Windows.

Using tar, gzip, split and cat – Method 1

Compressing & Splitting the files – Method 1
  1. Create an uncompressed Tar Archive of the files you want
    tar -cvf output.tar /path/to/files
  2. Create an md5sum so you can make sure it survives the split and re-join operations
    md5sum -b output.tar > md5sum.txt
  3. Split it into Chunks
    split -d -b 734003200 output.tar your_prefix
    Where “-d” means append a decimal suffix (00, 01, 02 etc)
    “-b” is the chunk size in bytes (multiply the size in megabytes by 1048576, or search for "xxx MB in bytes" to get a conversion)
    650 megabytes = 681574400 bytes
    700 megabytes = 734003200 bytes
    “your_prefix” is whatever you want the chunk filenames to begin with
    Creates a series of files your_prefix00, your_prefix01 etc
  4. Gzip the chunks (note: for greater compression use bzip2)
    gzip your_prefix*
  5. Copy the gzipped chunks to CD
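
The splitting steps above can be sketched as one small script. This is only an illustration under stated assumptions: the scratch directory, the sample file and the 100 KiB chunk size are made up so the demo runs quickly, and the real 700MB value is shown in a comment. It assumes GNU tar, md5sum, split and gzip.

```shell
#!/bin/sh
# Sketch of steps 1-4 on throwaway sample data (all names here are hypothetical).
set -eu

work=$(mktemp -d)                       # scratch directory for the demo
mkdir "$work/files"
# About 300 KB of incompressible data standing in for /path/to/files.
head -c 300000 /dev/urandom > "$work/files/sample.bin"

cd "$work"
tar -cf output.tar files                # step 1: uncompressed archive
md5sum -b output.tar > md5sum.txt       # step 2: checksum taken before splitting

# For real CDs use: chunk_bytes=$((700 * 1024 * 1024))   # 734003200
chunk_bytes=$((100 * 1024))             # 100 KiB so the demo stays small
split -d -b "$chunk_bytes" output.tar your_prefix   # step 3: numbered chunks
gzip your_prefix*                       # step 4: compress each chunk

ls -l your_prefix*                      # your_prefix00.gz, your_prefix01.gz, ...
```

It is probably worth copying md5sum.txt to the CD along with the chunks, since the check after re-joining needs it.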
Joining them back together again – Method 1
  1. Put all the chunks and md5sum.txt in one directory
  2. Unzip all the gzipped chunks
    gunzip your_prefix*
  3. Join the chunks
    cat your_prefix* > output.tar
    (give the joined file the name recorded in md5sum.txt, output.tar, because md5sum -c looks the file up by that name)
  4. Check the archive integrity with md5sum, using the previously created md5sum.txt
    md5sum -c md5sum.txt
  5. Finally untar to a new location
    tar -xvf output.tar
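
As a rough end-to-end check, the sketch below builds a few small chunks and then walks through the joining steps. The two scratch directories and the sample file are hypothetical stand-ins for the source machine and the machine reading the CDs. The joined file is named output.tar here because md5sum -c looks for the filename stored in md5sum.txt.

```shell
#!/bin/sh
# Round-trip sketch on throwaway data; every name below is hypothetical.
set -eu

src=$(mktemp -d)      # stands in for the original machine
dst=$(mktemp -d)      # stands in for the machine reading the CDs

# Setup: make small gzipped chunks the same way as the splitting steps.
mkdir "$src/files"
head -c 250000 /dev/urandom > "$src/files/sample.bin"
( cd "$src" &&
  tar -cf output.tar files &&
  md5sum -b output.tar > md5sum.txt &&
  split -d -b 102400 output.tar your_prefix &&
  gzip your_prefix* )

# Step 1: put all the chunks (and md5sum.txt) in one directory.
cp "$src"/your_prefix*.gz "$src/md5sum.txt" "$dst"
cd "$dst"

gunzip your_prefix*                 # step 2: back to raw chunks
cat your_prefix* > output.tar       # step 3: the glob expands in sorted order
md5sum -c md5sum.txt                # step 4: prints "output.tar: OK" on success
tar -xf output.tar                  # step 5: recreates ./files

cmp "$src/files/sample.bin" "$dst/files/sample.bin" && echo "round trip intact"
```

Using `>` rather than `>>` means a leftover output.tar from an earlier attempt is overwritten instead of being appended to.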

Using zip – Method 2

Compressing & Splitting the files – Method 2
  1. Zip up the files into a zip archive
    zip -r zipfile.zip /path/to/files
  2. Use zipsplit to create chunks
    zipsplit -n 12000000 zipfile.zip

    Where -n is the maximum size of each chunk in bytes (12000000 is about 12MB; use 734003200 for 700MB chunks)
  3. Copy to CD
Unzipping to a new location – Method 2
  1. Copy the zip chunks to the same directory
  2. Change directory to the root of the new location
  3. Unzip all the chunks individually
    for i in /path/to/zipfil[0-9][0-9].zip ; do unzip "$i" ; done

Data integrity

So how do you make sure that what you join back together again is the same as what you started with?

After tarring or zipping the original archive, and before splitting it, run md5sum on it:
md5sum filename.extension > md5sum.txt
Then, once you have split it up and re-joined it under the same filename, use the md5sum file to check it again:
md5sum -c md5sum.txt
The output of md5sum -c should say:
filename.extension: OK
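
To see what the check looks like when something has gone wrong, here is a minimal sketch that damages a file on purpose. The file contents and the scratch directory are made up for the demo, and it assumes GNU md5sum.

```shell
#!/bin/sh
# Sketch: what md5sum -c reports for an intact file and for a damaged one.
set -eu

cd "$(mktemp -d)"
printf 'example archive contents\n' > filename.extension   # hypothetical file
md5sum filename.extension > md5sum.txt

md5sum -c md5sum.txt              # intact: prints "filename.extension: OK"

printf 'x' >> filename.extension  # simulate damage by appending one byte
if md5sum -c md5sum.txt; then
    echo "unexpected: damage not detected"
else
    echo "damage detected, md5sum -c exited non-zero"
fi
```

Because md5sum -c also sets a non-zero exit status on a mismatch, the result can be tested in a script rather than read by eye.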

1 Comment

  1. Sheddu

    This link was really very helpful. I found second zip/unzip method easy to use in ubuntu linux
