
pbzip2, Multithreaded bzip2 application

Since I have a MASSIVE amount of parallel processing at my disposal, it comes in really handy when working with extremely large files. This week I recovered a full terabyte drive that had a completely hosed NTFS partition on it.

After dd’ing the entire disk into an image and recovering the files, it’s time to format the drive and put the recovered data back onto the disk. The corrupted drive is physically okay; it appears that the computer either caught a bad virus or a power outage corrupted the NTFS file tables.

I knew the image would compress considerably, since much of the space at the upper end of the imaged drive was still free, so I decided to compress it with bzip2, enough to squeeze both the 100 GB of recovered data and the image back onto the drive for delivery to the customer. Normally this is a pretty hefty task, chewing through a single terabyte file. Knowing that there are multithreaded tar and zip applications, I figured there would probably be one for bzip2 as well (classic zip archives are capped at 4 GB without the ZIP64 extension; bzip2 has no such limit), and sure enough there is: pbzip2, or parallel bzip2. Fortunately for us Sun fans, the OpenCSW repository has it, a mere pkgutil -i pbzip2 away. Sweet.

It’s funny, because I actually started with the standard bzip2. A glance at the load meter triggered the thought, so I stopped the process and restarted it with pbzip2, and things were moving MUCH faster.

PBZIP2 Compressing a 1TB Disk image dumped from dd

I really need to do a video demonstrating the system working under full load, because it’s still incredibly fast thanks to context switching.

It has been working for a few hours now, and the file size is 212 GB. I really wonder what the final recovery file size will be. Thank goodness this isn’t an x86 box and I don’t need to sit here and watch paint dry 😛. This website certainly doesn’t seem any slower to me.

The command structure is the same as that of the single-threaded counterparts, so the only hard part will be remembering to use the parallel versions. If you are on Linux, these are of course available to you as well!
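To show what "same command structure" means in practice, here is a minimal sketch rather than the exact commands I ran. It uses a small file of zeros as a hypothetical stand-in for the disk image, and the variable names (BZ, WORK, IMG) are made up for the example. It picks pbzip2 when it is installed and falls back to plain bzip2 otherwise, and the flags are identical either way.

```shell
#!/bin/sh
# Use pbzip2 when it is installed, otherwise fall back to plain bzip2.
# The same flags work with both tools.
if command -v pbzip2 >/dev/null 2>&1; then
    BZ=pbzip2
else
    BZ=bzip2
fi

# Hypothetical stand-in for the disk image: 1 MiB of zeros in a temp dir.
WORK=$(mktemp -d)
IMG="$WORK/disk.img"
dd if=/dev/zero of="$IMG" bs=1024 count=1024 2>/dev/null

# Compress at the highest level and keep the original (-k).
# The result is written next to the input as disk.img.bz2.
"$BZ" -k -9 "$IMG"

# Check the integrity of the compressed file.
"$BZ" -t "$IMG.bz2"

# Decompress to stdout and compare byte for byte with the original.
"$BZ" -dc "$IMG.bz2" | cmp - "$IMG" && echo "round trip OK"
```

On a real image you would point IMG at the file dd produced and remove the temporary directory afterwards. pbzip2 uses all detected CPUs by default, and its -p option (for example -p8) sets the number of processors to use if you want to leave some headroom.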

I hope you enjoyed 🙂

Things like this have had me thinking about x86 design lately. I was thinking about how awesome it would be to design an ASIC for x86 that acts like the 4-way crossbar (see NUMA) in old SGI machines, tying CPUs together over a high-speed full-duplex serial link. The ASIC could handle the PCI and RAM, and then provide a 4-way crossbar for CPUs, or maybe something denser, say 8-way, so it could route requests for other CPUs even if the pipelines are choked for the 2 local CPUs.

/Geekdream
