It's been over a decade since I've done anything related to computer vision, so I need to reacquaint myself with progress in the space. I figure reproducing a current paper should get me in the zone. I'm familiar with latent Dirichlet allocation (LDA) as applied to text and I'm aware that it has been applied to images, so that seems like a good place to start.
Wang and Grimson describe an extension to LDA which incorporates spatial correlation, but there is enough to confuse me already, so I want something involving vanilla LDA. Going back a bit further, Sivic et al. describe an application of vanilla LDA to image object recognition. Since LDA applies to discrete documents (that is, something consisting of "words"), the images need to be decomposed into sets of tokens. The approach taken is:
1. Extract feature vectors from local patches of the image.
2. Use vector quantization to reduce the feature vectors to a codebook.
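To make the tokenization concrete, here is a minimal sketch of the vector quantization step in plain C++. It assumes k-means (Lloyd's algorithm) as the quantizer, which is a common choice but not necessarily what either paper uses; the names `Feature`, `buildCodebook`, `nearestCodeword` and `bagOfWords` are hypothetical, and seeding with the first k features is just a way to keep the example deterministic.

```cpp
#include <cassert>
#include <cstddef>
#include <limits>
#include <vector>

using Feature = std::vector<double>;

// Squared Euclidean distance between two equal-length feature vectors.
double squaredDistance(const Feature& a, const Feature& b) {
  assert(a.size() == b.size());
  double sum = 0.0;
  for (std::size_t d = 0; d < a.size(); ++d) {
    const double diff = a[d] - b[d];
    sum += diff * diff;
  }
  return sum;
}

// Index of the codeword closest to x; that index is the "word" for x.
std::size_t nearestCodeword(const std::vector<Feature>& codebook,
                            const Feature& x) {
  assert(!codebook.empty());
  std::size_t best = 0;
  double bestDist = std::numeric_limits<double>::infinity();
  for (std::size_t j = 0; j < codebook.size(); ++j) {
    const double dist = squaredDistance(codebook[j], x);
    if (dist < bestDist) {
      bestDist = dist;
      best = j;
    }
  }
  return best;
}

// Lloyd's k-means, seeded with the first k features (an assumption made
// purely to keep this sketch deterministic).
std::vector<Feature> buildCodebook(const std::vector<Feature>& features,
                                   std::size_t k, int iterations) {
  assert(k > 0 && features.size() >= k);
  std::vector<Feature> codebook(
      features.begin(), features.begin() + static_cast<std::ptrdiff_t>(k));
  const std::size_t dim = features[0].size();
  for (int it = 0; it < iterations; ++it) {
    std::vector<Feature> sums(k, Feature(dim, 0.0));
    std::vector<std::size_t> counts(k, 0);
    // Assignment step: accumulate each feature into its nearest codeword.
    for (const Feature& x : features) {
      const std::size_t j = nearestCodeword(codebook, x);
      for (std::size_t d = 0; d < dim; ++d) sums[j][d] += x[d];
      ++counts[j];
    }
    // Update step: move each codeword to the mean of its members.
    for (std::size_t j = 0; j < k; ++j) {
      if (counts[j] == 0) continue;  // an empty cluster keeps its old centroid
      for (std::size_t d = 0; d < dim; ++d)
        codebook[j][d] = sums[j][d] / static_cast<double>(counts[j]);
    }
  }
  return codebook;
}

// Word-count histogram for one image: the "document" that LDA consumes.
std::vector<int> bagOfWords(const std::vector<Feature>& codebook,
                            const std::vector<Feature>& imageFeatures) {
  std::vector<int> counts(codebook.size(), 0);
  for (const Feature& x : imageFeatures) ++counts[nearestCodeword(codebook, x)];
  return counts;
}
```

Calling `buildCodebook` once over the features pooled from all images and then `bagOfWords` per image yields one word-count vector per image, which is the same shape of input that LDA takes for text.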
Step #1 for Winn et al. is to decompose the image from RGB to CIELAB. CIELAB is a cool encoding of color space which is designed to be roughly perceptually uniform (equal numerical distances correspond approximately to equally noticeable color differences for a human viewer), so that makes sense. Poking around for an implementation of a converter, I came across OpenCV, which has tons of routines in it, including color conversion. Writing a program to split an image into CIELAB components is pretty straightforward:
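#include <iostream>
#include <string>
#include <vector>
#include <opencv2/opencv.hpp>

using namespace cv;

int main (int argc, char** argv)
{
  // Expect exactly one argument: the path of the image to split.
  if (argc != 2) {
    std::cerr << "usage: " << argv[0] << " image" << std::endl;
    return 1;
  }

  std::string file (argv[1]);
  Mat img = imread (file);  // color images are loaded in BGR channel order

  if (img.empty () || img.channels () != 3) {
    std::cerr << "could not load a 3-channel image from " << file << std::endl;
    return 1;
  }

  // Convert from BGR to CIELAB, then split into the L*, a* and b* planes.
  Mat lab;
  cvtColor (img, lab, COLOR_BGR2Lab);
  std::vector<Mat> color;
  split (lab, color);

  // Write each plane out as a single-channel (grayscale) image.
  imwrite (file + ".L.png", color[0]);
  imwrite (file + ".a.png", color[1]);
  imwrite (file + ".b.png", color[2]);

  return 0;
}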
(I guessed that a loaded .jpg image would be in BGR channel order, based upon the documentation for imwrite.) So let's see this in action: here's a picture of a police car.
Here's the L* (lightness) channel:
Here's the a* (magenta vs. green) channel:
Here's the b* (yellow vs. blue) channel:
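To help interpret the three channel images, here is a rough sketch in plain C++ of the arithmetic behind an sRGB to CIELAB conversion, plus the 8-bit packing that lets each channel be written out as a grayscale image. It assumes sRGB input and a D65 white point, with the matrix coefficients and the scaling (L* times 255/100, a* and b* offset by 128) taken from my reading of the OpenCV color conversion documentation; it is not meant to be bit-exact with cvtColor. The function names are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Undo the sRGB transfer curve for one channel value in [0, 1].
double srgbToLinear(double c) {
  return c <= 0.04045 ? c / 12.92 : std::pow((c + 0.055) / 1.055, 2.4);
}

// The CIELAB companding function: cube root with a linear toe near zero.
double labF(double t) {
  const double delta = 6.0 / 29.0;
  return t > delta * delta * delta ? std::cbrt(t)
                                   : t / (3.0 * delta * delta) + 4.0 / 29.0;
}

// 8-bit sRGB in, {L*, a*, b*} out, with L* in [0, 100].
std::array<double, 3> srgbToLab(int r8, int g8, int b8) {
  assert(r8 >= 0 && r8 <= 255 && g8 >= 0 && g8 <= 255 && b8 >= 0 && b8 <= 255);
  const double r = srgbToLinear(r8 / 255.0);
  const double g = srgbToLinear(g8 / 255.0);
  const double b = srgbToLinear(b8 / 255.0);

  // Linear RGB to CIE XYZ (assumed coefficients, D65).
  const double x = 0.412453 * r + 0.357580 * g + 0.180423 * b;
  const double y = 0.212671 * r + 0.715160 * g + 0.072169 * b;
  const double z = 0.019334 * r + 0.119193 * g + 0.950227 * b;

  // Normalize by the reference white, then compand.
  const double xn = 0.950456, zn = 1.088754;
  const double fx = labF(x / xn), fy = labF(y), fz = labF(z / zn);

  return {116.0 * fy - 16.0,   // L*: lightness
          500.0 * (fx - fy),   // a*: negative is green, positive is magenta
          200.0 * (fy - fz)};  // b*: negative is blue, positive is yellow
}

// Squeeze L*, a*, b* into bytes so each can be saved as a grayscale image.
std::array<unsigned char, 3> packLab8(const std::array<double, 3>& lab) {
  const double scaled[3] = {lab[0] * 255.0 / 100.0, lab[1] + 128.0,
                            lab[2] + 128.0};
  std::array<unsigned char, 3> out{};
  for (int i = 0; i < 3; ++i) {
    long v = std::lround(scaled[i]);
    if (v < 0) v = 0;
    if (v > 255) v = 255;
    out[i] = static_cast<unsigned char>(v);
  }
  return out;
}
```

Under this packing, neutral grays land near 128 in the a* and b* images, so the mid-gray regions of those two channels are the parts of the picture with little color, while bright and dark regions mark the opposing hues.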