Sunday, February 21, 2010

Better Image Capture

I tried to figure out which webcams would work from Linux and allow me to control the shutter speed. The existence of setpwc focused me on Logitech webcams. The marketing copy didn't mention shutter speed but did mention something called "RightLight2" technology, which I figured was related (parenthetically, it seems like as a particular technology improves, it becomes less intelligible to technologists; the reason, of course, is that it becomes more intelligible to the non-technologists who outnumber us). Anyway, I bought a Logitech C905, which was like $70. Sadly, it was not compatible with setpwc because it is too new: it uses the uvcvideo driver. However, whatever the default settings are on the C905, the image capture looked good in broad daylight. Plug and play for once ...

The real problem was mounting the camera. My cheap $10 webcam with horrible shutter speed had a perfect suction cup mount. The C905 comes with a clip that I'd hoped would clip to my rear-view mirror but the mirror was too thick on account of fancy auto-dimming logic. So I dismantled the cheap $10 webcam down to the suction cup and then tied the C905 to it. Hey, I paid $10 for a suction cup ...

Now I have some real data: we had to go to a birthday in Culver City today so I did a capture for the trip, which included a police car.

The next step will be some tools to manage monster collections of image training data, because I had to manually binary search through today's pile of captures to find this police car.

Thursday, February 4, 2010

Feature Extraction

I'm still trying to find a Linux-compatible webcam with a shutter speed fast enough to capture motion in direct sunlight. Until then, more armchair computer vision 101 ...

Winn et al. use a standard set of filters for extracting features from images: Gaussians, derivatives of Gaussian (aka DoG), and Laplacians of Gaussian (aka LoG). Here's what they look like:

Laplacian of Gaussian
Y-derivative of Gaussian
X-derivative of Gaussian

OpenCV has a GaussianBlur method which is sufficiently flexible to reproduce Winn et al., but to reproduce their usage of DoG and LoG I wrote my own OpenCV kernels (although maybe a combination of the GaussianBlur method with the Laplacian method and Sobel method would work?). It turns out separable filters are much faster, and everything above except the LoG is separable; in addition, the LoG can be written as the sum of two separable filters, the second derivatives of the Gaussian in x and in y. Thus with something like

#include <cassert>
#include <sstream>
#include <string>
#include <vector>
#include "cv.h"
#include "highgui.h"

#include "cyclopsutil.hh"

using namespace cv;
using namespace cyclops;

int main (int, char** argv)
{
  string file (argv[1]);
  Mat img = imread (file);
  vector<Mat> color (3);

  assert (! img.empty ());
  assert (img.channels () == 3);

  cvtColor (img, img, CV_BGR2Lab);

  split (img, color);

  imwrite (file + ".L.png", color[0]);
  imwrite (file + ".a.png", color[1]);
  imwrite (file + ".b.png", color[2]);

  // only the L (luminosity) channel is filtered below
  for (int sigma = 1; sigma < 16; sigma *= 2)
    {
      std::stringstream out;
      out << sigma;

      Mat conv = color[0].clone ();
      Mat convtwo = color[0].clone ();

      GaussianBlur (color[0], conv, Size (6 * sigma + 1, 6 * sigma + 1), sigma, sigma, BORDER_REPLICATE);
      imwrite (file + ".L.gauss." + out.str () + ".png", conv);

      sepFilter2D (color[0], conv, -1, dygauss_kernelx (sigma), dygauss_kernely (sigma), Point (-1, -1), 0, BORDER_REPLICATE);
      imwrite (file + ".L.dygauss." + out.str () + ".png", conv);

      sepFilter2D (color[0], conv, -1, dxgauss_kernelx (sigma), dxgauss_kernely (sigma), Point (-1, -1), 0, BORDER_REPLICATE);
      imwrite (file + ".L.dxgauss." + out.str () + ".png", conv);

      sepFilter2D (color[0], conv, -1, dyygauss_kernelx (sigma), dyygauss_kernely (sigma), Point (-1, -1), 0, BORDER_REPLICATE);
      imwrite (file + ".L.dyygauss." + out.str () + ".png", conv);

      sepFilter2D (color[0], convtwo, -1, dxxgauss_kernelx (sigma), dxxgauss_kernely (sigma), Point (-1, -1), 0, BORDER_REPLICATE);
      imwrite (file + ".L.dxxgauss." + out.str () + ".png", convtwo);

      // the LoG is the sum of the two second-derivative responses
      imwrite (file + ".L.lapgauss." + out.str () + ".png", conv + convtwo);
    }

  return 0;
}

applied to the luminosity component of our awesome cop car picture, we have:
Gaussian (sigma = 1)
Gaussian (sigma = 8)
DxGaussian (sigma = 1)
DxGaussian (sigma = 8)
DyGaussian (sigma = 1)
DyGaussian (sigma = 8)
Laplacian of Gaussian (sigma = 1)
Laplacian of Gaussian (sigma = 8)

Sunday, January 31, 2010

Initial Data Collection Attempt

Well, I finally got the webcam and the netbook into the car and started doing video capture. The good news is, the suction cup mount on my $10 webcam is perfect for the windshield.

The bad news is, it turns out $10 webcams do not automatically adjust gain. Driving around during the day yields a time-lapse movie of what appears to be the afterlife. Here's a daytime driving still:

I see dead people! Twilight looks a little better:

But many of the stills have a lot of blur in them; perhaps the low-quality CCD has a slow shutter speed.

Maybe if I can find a software shutter speed control then I can both operate during the day and eliminate most of the blur. Otherwise, I'm going to have to pony up for a better camera.

Tuesday, January 26, 2010

Computer Vision 101

Since it's been over a decade since I've done anything computer vision related, I need to acquaint myself with progress in the space. I figure reproducing a current paper should get me in the zone; furthermore, I'm familiar with latent Dirichlet allocation (LDA) as applied to text, and I'm aware that it has been applied to images, so I figure that's a good place to start.

Wang and Grimson describe an extension to LDA which incorporates spatial correlation, but there is enough to confuse me already, so I need something involving vanilla LDA (which I'm familiar with). Going back a bit further, Sivic et al. describe an application of vanilla LDA to image object recognition. Since LDA applies to discrete documents (that is, something consisting of "words"), the images need to be decomposed into sets of tokens. The approach taken is
  1. Extract feature vectors from local patches of the image.
  2. Use vector quantization to reduce the feature vectors to a codebook.
Sivic et al. extract features from interest regions, whereas Wang and Grimson follow the approach of Winn et al. and extract features densely. Interest regions are just another way for me to get confused and screw something up, so I'll defer them for now.

Step #1 for Winn et al. is to convert the image from RGB to CIELAB. CIELAB is a cool encoding of color space which is supposed to mimic the response of the human visual system, so that makes sense. Poking around for an implementation of a converter, I came across OpenCV, which has tons of routines in it, including color conversion. Writing a program to split an image into CIELAB components is pretty straightforward:
#include <cassert>
#include <string>
#include <vector>
#include "cv.h"
#include "highgui.h"

using namespace cv;

int main (int, char** argv)
{
  string file (argv[1]);
  Mat img = imread (file);
  vector<Mat> color (3);

  assert (! img.empty ());
  assert (img.channels () == 3);

  cvtColor (img, img, CV_BGR2Lab);

  split (img, color);

  imwrite (file + ".L.png", color[0]);
  imwrite (file + ".a.png", color[1]);
  imwrite (file + ".b.png", color[2]);

  return 0;
}
(I guessed that a loaded .jpg image would be in BGR channel order, based upon the documentation for imwrite.) So let's see this in action: here's a picture of a police car.

Here's the L (luminosity) channel:

Here's the a* (magenta vs. green) channel:

Here's the b* (yellow vs. blue) channel:

Monday, January 25, 2010

Car Brain!

My New Year's resolution is to give my car a rudimentary visual system, which should give me plenty to blog about.
My initial goal is to train the car to identify police cars, mount 4 cameras to allow for a panoramic view, and have some kind of audible or visual warning. Is identifying a police car feasible? I'm not a computer vision expert, but searching around, these points appear salient:
  • single object class: the question is binary, "does this picture contain a police car?", which makes the problem easier.
  • pose variability is low: the cameras will be mounted at fixed points at my car, and both my car and any police car will (barring the "Dukes of Hazzard" scenario) have all four wheels on the ground, which should moderate the pose variability.
  • volatile illumination: ideally, the detector would operate under all weather conditions, day or night, so that means significant illumination changes.
Well, we'll see how far I get anyway.

Step #1 was to get a brain for my car, so I purchased a refurb Dell Mini 110-1030nr for $280 from Amazon and put Ubuntu Netbook Remix on it. I also got 1 webcam for now.

It is presumably underpowered for the image processing that will be required, but if I get that far that will justify spending more money. First, I need to install it in the car and have it do video capture; from the resulting data I imagine many interesting problems will suggest themselves.

Sunday, June 14, 2009

Did Bush Spend the Peace Dividend?

I'm no fan of Bush, but I find myself arguing against certain misconceptions about his fiscal performance which persist. One such misconception is that he spent an insane amount of money on the military. Nominally (that is, in raw dollar amounts) this appears to be true, but while he did increase funding relative to the Clinton administration, it was still very low by historical standards.

So let's start with a picture. Here is a graph of defense spending as a percentage of GDP from 1940 to 2003. So the end of the cold war did result in a peace dividend (lower defense spending), and this situation persisted through 2003. Bush did increase funding from the low point in 2000, but only to circa 1994 levels.

After 2003 the picture is more complicated because the Iraq war spending is not reported. I took Iraq war spending numbers and combined them with reported military outlays and then divided by nominal GDP. The result is summarized in the following graphic, where I include the levels from corresponding years in previous decades for reference.

Here we can see that even in 2006, with the Iraq war in full swing, we spent less on total military expenditures relative to GDP than during 1976 (with the Vietnam war fully wound down) or 1986 (at the height of the cold war). The 1990s were definitely a period of relatively low military expenditures, but in historical perspective Bush's spending on defense was not very high, even with Iraq war expenditures accounted for (and by the way, the entire practice of keeping things off the budget is very distasteful; shame on the Bush administration for that).

In particular we can still claim a peace dividend during the 2000s relative to the cold war.

Tuesday, June 9, 2009

On the Record

The RIAA has released 2008 Year-End Shipment Statistics, which allows us to appreciate graphically just how screwed the record industry is. Compact Disc sales are plummeting, legal digital downloads are growing, and the net impact is (significantly) negative. Here are the results (adjusted for inflation, which makes the story even worse!):

One popular theory is that, like newspapers, record labels have lost pricing power because they can no longer bundle (in particular, putting one good song per album and charging $13 is a thing of the past). The RIAA (implicitly) dismisses this line of reasoning in the associated notes:
If digital singles are converted into an album equivalent (divided by ten) and added to both CDs and digital albums, the overall album unit decline in 2008 was 14 percent (635 million to 545 million).
See ... fewer overall units means people must be stealing music, because their demand for quality product has not decreased! Quick, pass some laws ...

However, we have established that an album is not equivalent to ten good singles. Let's be generous and say an album is equivalent to 3 good singles. In this case the picture is rosier:

Viewed this way, demand for music is about the same, with the difference attributable to the economic climate. Furthermore, if two singles are considered equivalent to one good album, then demand has actually gone up.

Finally, it's amusing to note that a greatest hits album from the early 1970s is the 2nd highest selling album of all time, with 48 million copies sold, roughly valued at $500 million. Clearly the media consumer had fewer capabilities in the past if they were willing to collectively pay $500 million for someone to assemble a set of previously released tracks into a single physical format.