munpack — decode base64 mime multi-part email attachments 

The other day I went to scan a document using our Ricoh Aficio 3030 all-in-one photocopier, scanner and fax. The machine can scan documents and email them to you as PDF attachments. Sounds great in theory. Unfortunately, it has one limitation (possibly imposed on it by our network admins): the maximum attachment size is set to 2 MB, so the Aficio 3030 splits any attachment larger than 2 MB into a multi-part attachment and sends each part in a separate email. Thus, after scanning a four-page document I got two emails, as shown in the screenshot below.
Multi-part MIME attachments: Thunderbird emails (picture)
Each of the two emails contains an ASCII attachment file, called just that: attachment. I saved these into two separate files named attachment_part1.mime and attachment_part2.mime. Let's check their sizes by running:

ls -lah attachment_part*

this gives us:

-rw------- 1 kamil kamil 2.0M 2009-02-03 18:17 attachment_part1.mime
-rw------- 1 kamil kamil 354K 2009-02-03 18:17 attachment_part2.mime

The first file is 2 MB and the second is 354 KB. Let's take a look at the contents of attachment_part1.mime using a text editor:

gedit attachment_part1.mime

The result is shown below:
Multi-part MIME attachment: gedit attachment_part1.mime (picture)
So the first attachment file contains a header with some useful information, followed by a data chunk that looks like random gibberish, while the second attachment contains the remainder of the data chunk. Great, but how do I get my scanned PDF? Unfortunately, as far as I know, Thunderbird has no built-in functionality for combining and decoding multi-part MIME attachments spread across several emails. As a side note, writing a Thunderbird plugin in JavaScript that adds this functionality would make an interesting project.

Decoding base64 MIME multi-part email attachments using mpack tools and munpack

The two attachment files (attachment_part1.mime and attachment_part2.mime) contain the parts of the scanned PDF file, encoded using MIME base64 encoding. We can decode the scanned PDF using, for example, the mpack tools, and more specifically the munpack utility that comes with them. To install the mpack tools on Ubuntu, use apt:

sudo apt-get install mpack

On other distros, use your favorite package manager. Once installed, we can simply use the Linux cat command to pipe the contents of the two attachment files into munpack, which will take care of the rest for us:

cat attachment_part1.mime attachment_part2.mime |munpack

The output is as follows:

munpack: reading from standard input
scan.pdf (application/pdf)

That's it: we now have scan.pdf. Let's check its size:

ls -lah scan.pdf

which gives

-rw------- 1 kamil kamil 1.8M 2009-02-03 18:17 scan.pdf

So, in its MIME-encoded form the scanned file takes up about 2.35 MB in total, while the decoded binary file is only 1.8 MB. This makes sense: base64 encoding trades a compact representation for a purely ASCII one. It turns every 3 bytes of binary data into 4 ASCII characters, an overhead of about one third, and line breaks and MIME headers add a little more. OK, now you can use your favorite PDF viewer to check that the file was decoded successfully:

acroread scan.pdf

I used Adobe Acrobat Reader; the screenshot is shown below:
Multi-part MIME attachment: decoded PDF in acroread (picture)

Decoding base64 MIME multi-part email attachments using PHP

Just for fun, we could also use PHP to do the decoding. Before we do that, let's take a look at the first 160 bytes of scan.pdf using the Linux head command:

head -c 160 scan.pdf

The output is:

%PDF-1.3
%Æáóè
4 0 obj
<</Type/XObject
/Subtype/Image
/Width 3308
/Height 4677
/BitsPerComponent 8
/ColorSpace/DeviceGray
/Filter[/DCTDecode]
/Length 907081
>>

As defined by the PDF file format, the file starts with a header line telling us that the PDF version is 1.3. The scan.pdf file also contains an object definition. The object is an image (/Subtype/Image) with properties such as a width of 3308 pixels, a height of 4677 pixels and 8 bits per component, which for a single-channel gray image (/DeviceGray) means 8 bits per pixel. What follows in scan.pdf is the data stream for that image object. /Filter[/DCTDecode] says the pixel values are JPEG (DCT) compressed, and /Length says the stream is 907081 bytes long.

Let's create an ASCII file that contains only the base64 data chunks from attachment_part1.mime and attachment_part2.mime, i.e. with all the header lines removed. Let's call this file attachment_all_parts.base64. The first three lines of that file can be listed using the Linux head command:

head -n 3 attachment_all_parts.base64

which gives the following output:

JVBERi0xLjMKJZKgoooKNCAwIG9iago8PC9UeXBlL1hPYmplY3QKL1N1YnR5cGUvSW1hZ2UK
L1dpZHRoIDMzMDgKL0hlaWdodCA0Njc3Ci9CaXRzUGVyQ29tcG9uZW50IDgKL0NvbG9yU3Bh
Y2UvRGV2aWNlR3JheQovRmlsdGVyWy9EQ1REZWNvZGVdCi9MZW5ndGggOTA3MDgxCj4+CnN0

OK, so here is how we could use the PHP command-line interpreter to base64-decode the three lines above, joined into one string without the newline characters, straight from the shell:

php -r 'echo base64_decode($argv[1]);' 'JVBERi0xLjMKJZKgoooKNCAwIG9iago8PC9UeXBlL1hPYmplY3QKL1N1YnR5cGUvSW1hZ2UKL1dpZHRoIDMzMDgKL0hlaWdodCA0Njc3Ci9CaXRzUGVyQ29tcG9uZW50IDgKL0NvbG9yU3BhY2UvRGV2aWNlR3JheQovRmlsdGVyWy9EQ1REZWNvZGVdCi9MZW5ndGggOTA3MDgxCj4+CnN0'

And the output is:

%PDF-1.3
%Æáóè
4 0 obj
<</Type/XObject
/Subtype/Image
/Width 3308
/Height 4677
/BitsPerComponent 8
/ColorSpace/DeviceGray
/Filter[/DCTDecode]
/Length 907081
>>
st

This matches what we got before with head -c 160 scan.pdf, apart from the trailing st. The three base64 lines encode 3 × 54 = 162 bytes, two more than we printed with head, and those two extra bytes are the start of the stream keyword. This was an easy way to verify that PHP's base64 decoding works. Here is how we would read the whole base64 string from attachment_all_parts.base64 and redirect the decoded output to a PDF file:

php -r 'echo base64_decode(file_get_contents($argv[1]));' attachment_all_parts.base64 > scan.pdf

In its default (non-strict) mode, base64_decode silently skips characters outside the base64 alphabet, so the newlines in the file do no harm. The result is, again, scan.pdf.

Decoding base64 MIME with GNU base64 decoder

Another way to decode base64 MIME data is the GNU base64 tool, which is part of the GNU core utilities (coreutils) and so should already be installed on your system. To check that the start of the PDF file decodes correctly, use the following:

head -c 214 attachment_all_parts.base64 |base64 -d

the output is again:

%PDF-1.3
%????
4 0 obj
<</Type/XObject
/Subtype/Image
/Width 3308
/Height 4677
/BitsPerComponent 8
/ColorSpace/DeviceGray
/Filter[/DCTDecode]
/Length 907081
>>

Note that 160 bytes of decoded content correspond to roughly 214 bytes of encoded content (160 × 4/3 ≈ 213). More precisely, the 214 bytes passed to base64 above are the first two 72-character lines plus 68 characters of the third line, along with two newlines. That is 212 base64 characters, which decode to 212 × 3/4 = 159 bytes. This is because, loosely speaking, base64 maps 3 binary bytes to 4 ASCII characters (i.e. to 4 bytes). To be a bit more precise, base64 splits the input into groups of 6 bits and maps each group to one 8-bit ASCII character. Since there are 2^6 = 64 possible 6-bit values, they are mapped onto a set of 64 ASCII characters (A to Z, a to z, 0 to 9, + and /), hence the name base64.
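As a worked example, take the first three bytes of scan.pdf, %PD, whose encoding JVBE opens the base64 listing earlier in this post. In the standard base64 alphabet, A to Z stand for 0 to 25, a to z for 26 to 51, 0 to 9 for 52 to 61, and + and / for 62 and 63:

```latex
\underbrace{\texttt{\%}}_{\mathtt{0x25}}\,\underbrace{\texttt{P}}_{\mathtt{0x50}}\,\underbrace{\texttt{D}}_{\mathtt{0x44}}
= 00100101\;01010000\;01000100
= \underbrace{001001}_{9}\;\underbrace{010101}_{21}\;\underbrace{000001}_{1}\;\underbrace{000100}_{4}
\;\mapsto\; \texttt{J}\;\texttt{V}\;\texttt{B}\;\texttt{E}
```

The 24 bits of the three input bytes are simply regrouped into four 6-bit indices. Each index then selects one character from the alphabet.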

Now, to decode the whole PDF file using GNU base64, use:

cat attachment_all_parts.base64 |base64 -d >scan.pdf

Note that the -d option stands for decode (equivalently, --decode can be used instead). Without it, the base64 tool encodes its input. See the base64 man page for details:

man base64

Other ways to decode base64 MIME data: mimedecode, C/C++, JavaScript and Perl

Curiously enough, there is another Linux tool that can decode MIME-encoded data. It is called mimedecode and is used as follows:

mimedecode [-h|-d ] < encoded_msg > decoded_msg

Another way would be to write your own decoder in C or C++. As an example, consider the b64 tool written by Bob Trower and available at base64.sourceforge.net. B64 is a complete solution for encoding and decoding base64 MIME data: it can be used from the shell, and it has error checking and command-line option parsing. B64 can be used to encode and decode my name as follows:

echo "kamil" |b64 -e            # just the encoding part
echo "kamil" |b64 -e |b64 -d    # encoding followed by decoding

which gives:

a2FtaWwK
kamil

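where the first string is the base64-encoded version of the second. Note that echo appends a newline, so a2FtaWwK actually encodes the six bytes "kamil\n". Six bytes are exactly two 3-byte groups, so no = padding is needed.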

As an exercise, I have simplified b64 so that it only decodes base64 data, with no command-line options and only minimal error checking. The result is a very simple base64 decoder, b64d. The source code, b64d.c, is shown below:

 
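// Based on code by Bob Trower (url: http://base64.sourceforge.net/b64.c)

#include <stdio.h>

// translation table for decoding, indexed by (ch - 43), i.e. '+' to 'z':
// a valid base64 character maps to (its 6-bit value + 62), '$' marks invalid ones
static const char cd64[] =
    "|$$$}rstuvwxyz{$$$$$$$>?@ABCDEFGHIJKLMNOPQRSTUVW$$$$$$XYZ[\\]^_`abcdefghijklmnopq";

// decode 4 '6-bit' characters into 3 8-bit binary bytes
// (static inline: a plain C99 inline function may fail to link at -O0)
static inline void decode_block(const unsigned char in[4], unsigned char out[3]) {
    out[0] = (unsigned char)(in[0] << 2 | in[1] >> 4);
    out[1] = (unsigned char)(in[1] << 4 | in[2] >> 2);
    out[2] = (unsigned char)(((in[2] << 6) & 0xc0) | in[3]);
}

// populate the base64 buffer with up to 4 6-bit values from stdin, skipping
// line breaks and other invalid characters; stops early at '=' padding or EOF.
// returns the number of values read (0 to 4)
static int populate_block(unsigned char in[4]) {
    int n = 0, ch;                 // int, not char, so that EOF is detected
    while (n < 4 && (ch = getchar()) != EOF) {
        if (ch == '=') break;      // padding marks the end of the data
        if (ch < 43 || ch > 122 || cd64[ch - 43] == '$') continue;
        in[n++] = (unsigned char)(cd64[ch - 43] - 62);
    }
    return n;
}

// main: process input from stdin and output to stdout
int main(void) {
    unsigned char in[4], out[3];
    int n, i;
    while ((n = populate_block(in)) > 0) {
        for (i = n; i < 4; i++) in[i] = 0;            // zero-fill a short last block
        decode_block(in, out);
        for (i = 0; i < n - 1; i++) putchar(out[i]);  // n characters carry n-1 bytes
        if (n < 4) break;                             // short block: end of data
    }
    return 0;
}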
 

The above can be compiled using:

gcc -O3 b64d.c -o b64d

and run like so:

cat attachment_all_parts.base64 |./b64d >scan.pdf

Yet another way would be to use JavaScript; see, for example, Base64 Encoder and Decoder with JavaScript. Perl would also be a very good candidate, in particular its MIME::Decoder module, as in the following simple filter:

 
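#!/usr/bin/perl -w
# no -l switch: it would append a newline to every print and corrupt binary output

use strict;
use MIME::Decoder;

# base64 (not quoted-printable) is the encoding used by these attachments
my $decoder = MIME::Decoder->new('base64') or die "unsupported encoding";
binmode STDOUT;    # write the decoded PDF bytes unchanged
$decoder->decode(\*STDIN, \*STDOUT);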
 

OK, this should suffice for now. I hope you have enjoyed this walk-through on decoding multi-part MIME email attachments.



9 Responses to “munpack — decode base64 mime multi-part email attachments”

  1. illu Says:

    Another Way:
    uudeview FILENAME extracts all Files

  2. Miguel Says:

    Thank you - saved my day ;-)

  3. mk Says:

    Thanks, this saved my day too :)

  4. vossad01 Says:

    This can be done with no extra software too.

    cat attachment_part1.mime attachment_part2.mime > attachment.eml

    Then open attachment.eml with Thunderbird

  5. dowel Says:

    Thank you very much. My day has been saved, too!

  6. ???? Decode Base64 Mail MIME ???? Munpack | Just My Life Says:

    […] : http://linux.dsplabs.com.au/munpack-mime-base64-multi-part-attachment-php-perl-decode-email-pdf-p82/ [Translate] […]

  7. bertelle nicolas Says:

    hi,

    i tested your script with success.
    maybe you can go faster using openssl to encode/decode base64 contents

    nice article at all ;)

    @+++

  8. Will Says:

    Thanks for the pointers.

    Munpack just saved me a lot time!

  9. Imap PDF Anlagen speichern - php.de Says:

[…] I found something interesting here! I'm still not getting any further, but maybe this is food for thought http://linux.dsplabs.com.au/munpack-…email-pdf-p82/ […]
