Joyent Manta Storage Service: Image Manipulation and Publishing Part 3
This blog post covers how to use the ImageMagick resize commands on the Joyent Manta Storage Service. On Manta I get a 4.7x speedup over my SSD-equipped notebook when converting 0.25 Megapixel images into thumbnails. For the giant-sized Getty Open image originals, I get a 72x speedup resizing them down to 0.25 Megapixels on Manta.
In Part 1 I introduced the Getty Open Image set we are computing on in this how-to. If you followed along with Part 2, you should be all set for this installment. If not, I have provided a catch-up script for the demo of thumbnail creation on Manta, so you can forge ahead!
Here you will:
- Create thumbnails on your local machine using the Unix find -exec construct.
- Create thumbnails on Manta using the mfind mjob mpipe pattern.
- Learn how to share your Manta job example with mjob share.
- Learn how to resize the very large Getty originals and preserve color like a pro.
Catch up!
If you skipped Part 2, let's get you on track first. Have a look at Trevor O's blog for quick Manta install information and very simple image resize examples. You will need to have the Node.js based Manta Command Line Utilities installed on your local Unix/Linux system.
Then you can run this catchup.sh to populate the files you need on Manta and on your local Unix system to try the next steps:
Making Thumbnails Locally
On your local machine, you can make thumbnails out of the images individually using the convert command. If you did Part 2, you probably left your files in ~/var/tmp/500x500_webp/getty; otherwise, if you used the catchup.sh script above, it will report at the end where the files are.
Change to that directory and try one convert process:
convert.jpg -thumbnail 10000@ -strip -quality.png8.png
Here we are using the ImageMagick convert command to resize images to a constant area. I use the constant area argument 10000@ because, as we saw in Part 2, the Getty Open image collection comes with a number of different aspect ratios.
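To get a feel for what a constant-area argument does to images of different shapes, here is a small sketch that estimates the output dimensions. The 4000x3000 style input sizes are hypothetical, and the truncation used here is an assumption; ImageMagick's own rounding may differ by a pixel.

```shell
#!/bin/sh
# thumb_dims WIDTH HEIGHT AREA
# Estimate the output size of a constant-area resize such as 10000@.
# Assumption: dimensions are truncated so the result stays within AREA pixels.
thumb_dims() {
    awk -v w="$1" -v h="$2" -v a="$3" 'BEGIN {
        # a single scale factor preserves the aspect ratio
        s = sqrt(a / (w * h))
        # the tiny epsilon guards against floating-point error before truncation
        printf "%dx%d\n", int(w * s + 1e-9), int(h * s + 1e-9)
    }'
}

thumb_dims 4000 3000 10000   # hypothetical landscape original
thumb_dims 3000 4000 10000   # hypothetical portrait original
thumb_dims 1000 1000 10000   # hypothetical square original
```

Each result covers roughly the same number of pixels, so portrait, landscape and square thumbnails all carry similar visual weight, which a fixed width or fixed height would not give you.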
To convert the whole batch into thumbnails on your local machine, you can execute the convert command within the Unix find command, as I explained for the One-Machine Unix Map in Part 2.
Here I start with the Unix time command to output the overall time for the entire thumbnail creation process for all 4,596 JPEG images on my notebook. When you try this, your notebook will be CPU throttled by the individual convert jobs, each of which spawns multiple threads during the run:
real 5m13.196s
user 4m51.707s
sys 1m18.814s
And you can see the output:
$ ls -al *.* | more
-rw-r--r-- 1 cwvhogue staff 25720 10 Sep 15:14.jpg
-rw-r--r-- 1 cwvhogue staff 5066 19 Sep 12:05.png
-rw-r--r-- 1 cwvhogue staff 31520 10 Sep 15:14.jpg
-rw-r--r-- 1 cwvhogue staff 5623 19 Sep 12:05.png
So it takes a little over 5 minutes on my Mac notebook (with an SSD drive!) to produce the entire set of thumbnails. But this is a tiny example.
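The local batch run described above can be sketched roughly as follows. This is a minimal sketch of the find -exec pattern, not necessarily the exact command that was timed: the function name batch_thumbs and the CONVERT override are hypothetical helpers, and only the -thumbnail 10000@ and -strip options are taken from the single-file example.

```shell
#!/bin/sh
# batch_thumbs DIR
# Create a PNG thumbnail next to every JPEG found under DIR.
# CONVERT is a hypothetical override hook (for example CONVERT=echo gives a
# dry run); by default ImageMagick's convert is used.
batch_thumbs() {
    CONVERT=${CONVERT:-convert} find "$1" -type f -name '*.jpg' -exec sh -c '
        # find hands over one matching path as $1;
        # ${1%.*} strips the extension so .png can be appended
        "$CONVERT" "$1" -thumbnail 10000@ -strip "${1%.*}.png"
    ' sh {} \;
}
```

Run it as batch_thumbs . from the image directory. In bash you can prefix the call with time to get the real, user and sys figures. Because find launches one convert at a time, the files are processed sequentially, which is exactly what the Manta map job avoids.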
Thumbnails on Manta - A Map Process
Thumbnail creation is, in MapReduce parlance, only a map step.
To create the thumbnails on Manta, you will need the Manta directory full of JPEG files as set up in Part 2 or by the catchup.sh script above.
So the Manta thumbnail creation command for this set of images is:
The convert command in this case puts a file named simply out.png into the POSIX read/write filesystem you have access to in /var/tmp during the compute job.
At this stage, the out.png file is not on the Manta Object store. To get it there, I use the && mpipe construct. The mpipe command is a Manta version of pipe, and here I use it to direct the POSIX file output of the convert command, /var/tmp/out.png, to a Manta Object file named the same as the input *.jpg file, but with a bit of Unix shell to replace the .jpg with .png.
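Putting these pieces together, a map-only thumbnail job following the mfind mjob mpipe pattern might look like the sketch below. Treat it as an illustration under assumptions rather than the exact command used for the timings in this post: MANTA_DIR and run_thumbnail_job are hypothetical names, the mfind -t o -n and mjob create -w -m flags are how I understand the node-manta CLI, and the convert options are limited to those shown in the local example.

```shell
#!/bin/sh
# Sketch of the mfind | mjob | mpipe pattern for a map-only thumbnail job.

# Hypothetical placeholder: the Manta directory holding your JPEG files.
MANTA_DIR=${MANTA_DIR:-/$MANTA_USER/public/getty}

# The map command runs once per input object inside the Manta compute zone.
# Single quotes keep $MANTA_INPUT_FILE and $MANTA_INPUT_OBJECT from being
# expanded on your notebook; they are resolved inside the job instead.
MAP_CMD='convert $MANTA_INPUT_FILE -thumbnail 10000@ -strip /var/tmp/out.png && mpipe ${MANTA_INPUT_OBJECT%.*}.png < /var/tmp/out.png'

run_thumbnail_job() {
    # Assumed flags: -t o lists objects only, -n filters names by regex,
    # -w waits for the job to finish, -m supplies the map phase.
    mfind -t o -n '\.jpg$' "$MANTA_DIR" | mjob create -w -m "$MAP_CMD"
}
```

Nothing is submitted until you call run_thumbnail_job, so you can print MAP_CMD first and check the quoting before spending any compute time.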
Here ${MANTA_INPUT_OBJECT%.*} returns the original object file name with the .jpg part removed, and the .png appends the correct file name extension.
If you find these bits of shell commands confusing, try them out first like this:
$ export MIO=/foo/bar.jpg
$ echo ${MIO%.*}.png
/foo/bar.png
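The same suffix-stripping expansion can be wrapped in a tiny helper to experiment with other file names. The function name swap_ext and the sample paths are hypothetical; the expansion itself is standard POSIX shell.

```shell
#!/bin/sh
# swap_ext PATH NEW_EXT
# Replace the final extension of PATH with NEW_EXT.
swap_ext() {
    # ${1%.*} removes the shortest trailing match of ".*",
    # so only the last extension is dropped
    printf '%s.%s\n' "${1%.*}" "$2"
}

swap_ext /foo/bar.jpg png
swap_ext /foo/archive.tar.gz png
```

The first call prints /foo/bar.png and the second prints /foo/archive.tar.png, because a single % strips only the shortest match. A double %% would strip everything from the first dot onward.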
This is the timed session for the thumbnail creation job run on Manta, starting with the 0.25 Megapixel/50% quality JPEG versions of the Getty Open image set:
6bfce965-aa26-e7c8-8350-8101d41e995f
added 1000 inputs to 6bfce965-aa26-e7c8-8350-8101d41e995f
added 1000 inputs to 6bfce965-aa26-e7c8-8350-8101d41e995f
added 1000 inputs to 6bfce965-aa26-e7c8-8350-8101d41e995f
added 1000 inputs to 6bfce965-aa26-e7c8-8350-8101d41e995f
added 596 inputs to 6bfce965-aa26-e7c8-8350-8101d41e995f
real 1m5.881s
user 0m1.746s
sys 0m0.222s
$ mls -l /cwvhogue/public/getty | head
-rwxr-xr-x 1 cwvhogue 25720 Sep 20 10:09.jpg
-rwxr-xr-x 1 cwvhogue 4109 Sep 25 16:36.png
-rwxr-xr-x 1 cwvhogue 31520 Sep 20 10:09.jpg
-rwxr-xr-x 1 cwvhogue 4668 Sep 25 16:36.png
...
Using the real time values above, the job goes from 313 seconds on my notebook to 66 seconds on Manta. The user time for the Manta job reflects how much time my notebook spent on the job: about 1.75 seconds.
While this isn't a huge case, Manta comes out 4.74x faster than my notebook with its SSD drive.
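If you want to check the arithmetic on your own runs, here is a small sketch that converts the real value printed by time into seconds and computes the ratio. The helper names to_seconds and speedup are hypothetical, and the inputs are the figures quoted above, rounded to whole seconds as in the text.

```shell
#!/bin/sh
# to_seconds 5m13.196s
# Convert a "real" value as printed by time (XmY.YYYs) into seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# speedup SLOW FAST
# Ratio of two durations given in seconds.
speedup() {
    awk -v slow="$1" -v fast="$2" 'BEGIN { printf "%.2f\n", slow / fast }'
}

to_seconds 5m13.196s   # notebook run
to_seconds 1m5.881s    # Manta run
speedup 313 66         # whole-second values used in the text
```

The last call prints 4.74, matching the figure above. The same two helpers work for any pair of real times.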
Sharing your Manta mjob with the world
There is a cool Gist-like feature of mjob: the mjob share view. The mjob share feature creates an HTML page that lists the mjob JSON equivalent, a sample of input and output files, and any errors that occurred. To share your mjob, simply substitute your job code and this will make the HTML file in your /public/jobshares directory on the Manta Storage Service. Then you can tweet it!
mjob share 6bfce965-aa26-e7c8-8350-8101d41e995f
creates this page:
us-east.manta.joyent.com/cwvhogue/public/jobshares/6bfce965-aa26-e7c8-8350-8101d41e995f/index.html
The Big Resize Job - The Getty Open Content Originals
So let's look at the huge case, where I had to resize all the Getty originals: over 100 GB of images, the same in number but much larger in size.
With the full-size Getty Image set (the files are roughly 20 MB each, but span sizes from 8 MB to over 300 MB; see Part 1), the ImageMagick convert command to make a single 0.25 Megapixel JPEG image from an original is this:
Here is the test file for the above example:
us-east.manta.joyent.com/mantademo/public/images/getty-open/originals.jpg
Preserving Color
Note that the -resize command is surrounded by two -colorspace commands. This construct applies the -resize operation in RGB space, then converts the result back to sRGB space. This preserves the image colors, which would otherwise be distorted if the resize were performed in sRGB space. You can see on the right that dark color areas are expanded and exaggerated by the resize operation without the -colorspace RGB transformation.
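A tiny worked example shows why the color space matters when pixels are averaged, which is what a resize does. This sketch assumes that -colorspace RGB means linear-light RGB, as in current ImageMagick releases, and uses the standard sRGB transfer function. The helper names avg_srgb and avg_linear are hypothetical, and the black and white gray values are made-up inputs.

```shell
#!/bin/sh
# avg_srgb A B
# Average two 8-bit gray values directly on their sRGB-encoded numbers.
avg_srgb() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.0f\n", (a + b) / 2 }'
}

# avg_linear A B
# Decode both values to linear light, average, then re-encode to sRGB.
avg_linear() {
    awk -v a="$1" -v b="$2" '
        function decode(v,  c) {
            c = v / 255
            return (c <= 0.04045) ? c / 12.92 : ((c + 0.055) / 1.055) ^ 2.4
        }
        function encode(l) {
            return 255 * ((l <= 0.0031308) ? 12.92 * l : 1.055 * l ^ (1 / 2.4) - 0.055)
        }
        BEGIN { printf "%.0f\n", encode((decode(a) + decode(b)) / 2) }'
}

avg_srgb 0 255     # black and white averaged on encoded values
avg_linear 0 255   # black and white averaged in linear light
```

The first call prints 128 and the second prints 188. Averaging the encoded values lands noticeably darker than the physically correct mix, which is consistent with the darkened, exaggerated shadows in the image resized without the color space round trip.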
On the left is a close-up of the largest Getty Image resized after the -colorspace RGB transformation:
convert.jpg -colorspace RGB -resize 250000@ -colorspace sRGB -quality 80 00099001_RGB.jpg
On the right is the same image resized without altering the colorspace settings:
convert.jpg -resize 250000@ -quality 80 00099001_sRGB.jpg
Running Hot With Local Resize of the Getty Originals
To process all the originals, I had to download them to my notebook, then run this command in the directory where I put them, again using the Unix find -exec construct.
So in the case of the originals, convert is run on each file and the find construct processes the list of 4,596 images one at a time. This takes almost four hours to complete on my 2.66 GHz Intel Core i7 Mac notebook. I was not using the SSD drive this time, as it did not have enough space. The job uses all four cores of the processor because convert is multithreaded. And there are errors reported by convert on some of the image files.
Here is the time it took to run this resize job on 101.6 GB of the original Getty Open Content image data on my notebook:
real 228m23.300s
user 541m11.777s
sys 20m11.780s
Here are the results:
-rw-r--r-- 1 cwvhogue admin 23987315 24 Aug 17:23.jpg
-rw-r--r-- 1 cwvhogue admin 48161 19 Sep 13:42 00000201.jpg
-rw-r--r-- 1 cwvhogue admin 22151707 22 Aug 16:06.jpg
-rw-r--r-- 1 cwvhogue admin 56954 19 Sep 13:42 00000301.jpg
-rw-r--r-- 1 cwvhogue admin 19408035 22 Aug 16:06.jpg
-rw-r--r-- 1 cwvhogue admin 44910 19 Sep 13:42 00000401.jpg
Feel like reproducing this yourself? Here is a Gist for downloading the Getty Open originals.
Resize of the Getty Originals on Joyent Manta
The Manta version of the ImageMagick resize command operating on the originals uses the same mfind mjob mpipe pattern as the thumbnail method above. The mjob runs the ImageMagick convert command with the color preservation arguments. Note that here I used the --memory 2048 option to increase the memory of the Manta Storage Service compute job to 2 GB.
47193818-1aab-c64b-bdd6-9440c2c039c1
added 1000 inputs to 47193818-1aab-c64b-bdd6-9440c2c039c1
added 1000 inputs to 47193818-1aab-c64b-bdd6-9440c2c039c1
added 1000 inputs to 47193818-1aab-c64b-bdd6-9440c2c039c1
added 1000 inputs to 47193818-1aab-c64b-bdd6-9440c2c039c1
added 596 inputs to 47193818-1aab-c64b-bdd6-9440c2c039c1
real 3m11.674s
user 0m3.096s
sys 0m0.357s
Again, I can share the results with you with mjob share:
mjob share 47193818-1aab-c64b-bdd6-9440c2c039c1
creates this page:
us-east.manta.joyent.com/cwvhogue/public/jobshares/47193818-1aab-c64b-bdd6-9440c2c039c1/index.html
So the resize job on my Mac takes 13703 seconds and on Manta it takes 191 seconds.
Manta is 72x faster than my notebook at resizing 100 GB of image data.
$ mls -l /cwvhogue/public/art | head
-rwxr-xr-x 1 cwvhogue 23987315 Aug 23 20:27.jpg
-rwxr-xr-x 1 cwvhogue 48157 Sep 25 17:11 00000201_s.jpg
-rwxr-xr-x 1 cwvhogue 22151707 Aug 23 20:27.jpg
-rwxr-xr-x 1 cwvhogue 56967 Sep 25 17:11 00000301_s.jpg
-rwxr-xr-x 1 cwvhogue 19408035 Aug 23 20:27.jpg
-rwxr-xr-x 1 cwvhogue 44910 Sep 25 17:11 00000401_s.jpg
Want to try to reproduce this yourself? This Gist will move some (default 20) or all of the Getty Originals into your own Manta account.
That concludes Part 3. In Part 4 I will roll out a parameterized shell script with a loop that does small, medium and large resize operations, and directs the output into different Manta subdirectories. The script will also be used to extract the XML metadata buried in the JPEG files, which will be used with a MapReduce mjob to craft a file of one-line annotations of each image in the set that you can search with grep.
Post written by Christopher Hogue, Ph.D.