
Computer Vision (CSE P576) Project 1: Feature Detection and Matching


The project has three parts:  feature detection, description, and matching.

Skeleton  | Test Images

Feature Detection

I will identify points of interest in the image using the Harris corner detection method.

For each point in the image, I compute the Harris matrix H by summing the products of the x and y gradients over a window around the point, weighted by a 5×5 Gaussian filter.  I then compute the corner strength as det(H)/trace(H) with the following function:

void computeHarrisValues(CFloatImage &srcImage, CFloatImage &harrisImage)
{
    int w = srcImage.Shape().width;
    int h = srcImage.Shape().height;

    // Compute the gradient image
    CFloatImage xGradient(srcImage.Shape());
    CFloatImage yGradient(srcImage.Shape());
    Convolve(srcImage, xGradient, ConvolveKernel_SobelX);
    Convolve(srcImage, yGradient, ConvolveKernel_SobelY);

    const int kernelRadius = 2;
    const int kernelWidth = 5;

    // For every pixel not on the border
    for (int y = 0 + kernelRadius; y < h - kernelRadius; y++)
    {
        for (int x = 0 + kernelRadius; x < w - kernelRadius; x++)
        {
            double H[4] = { 0, 0, 0, 0 };

            // Accumulate the Harris matrix over a Gaussian-weighted
            // window centered on this pixel
            for (int i = y - kernelRadius; i <= y + kernelRadius; i++)
            {
                for (int j = x - kernelRadius; j <= x + kernelRadius; j++)
                {
                    double weight = gaussian5x5[kernelWidth * (i - y + kernelRadius) + (j - x + kernelRadius)];
                    double Ix = xGradient.Pixel(j, i, 0);
                    double Iy = yGradient.Pixel(j, i, 0);
                    H[0] += weight * Ix * Ix;
                    H[1] += weight * Ix * Iy;
                    H[2] += weight * Iy * Ix;
                    H[3] += weight * Iy * Iy;
                }
            }

            // Corner strength = det(H) / trace(H), guarding against
            // division by zero
            double determinant = H[0] * H[3] - H[2] * H[1];
            double trace = H[0] + H[3];
            harrisImage.Pixel(x, y, 0) = (float)((trace != 0) ?
                            determinant / trace : 0.0f);
        }
    }
}
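The gaussian5x5 weight table referenced above ships with the skeleton and is not reproduced here.  As an illustration only, a kernel with the same row-major layout could be generated as sketched below; the function name and the choice of σ = 1.0 are my own assumptions and may differ from the skeleton's actual weights.

```cpp
#include <cmath>
#include <vector>

// Build a normalized (2 * radius + 1) x (2 * radius + 1) Gaussian kernel,
// stored row-major: weight(dy, dx) = kernel[width * (dy + radius) + (dx + radius)].
std::vector<double> makeGaussianKernel(int radius, double sigma)
{
    const int width = 2 * radius + 1;
    std::vector<double> kernel(width * width);
    double sum = 0.0;
    for (int dy = -radius; dy <= radius; dy++)
    {
        for (int dx = -radius; dx <= radius; dx++)
        {
            double v = std::exp(-(dx * dx + dy * dy) / (2.0 * sigma * sigma));
            kernel[width * (dy + radius) + (dx + radius)] = v;
            sum += v;
        }
    }
    for (double &v : kernel)
        v /= sum;  // normalize so the weights sum to 1
    return kernel;
}
```

Normalizing the weights to sum to 1 keeps the Harris responses on the same scale regardless of the kernel size, so a single threshold can be used later.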

After the corner strength value is computed for every pixel, I keep the pixels whose corner strength is above a threshold (0.017f) and is a local maximum among the surrounding pixels (within a radius of 3).

void computeLocalMaxima(CFloatImage &srcImage, CByteImage &destImage)
{
    int w = srcImage.Shape().width;
    int h = srcImage.Shape().height;

    const int rad = 3;
    const float threshold = 0.017f;

    for (int y = 0; y < h; y++)
    {
        for (int x = 0; x < w; x++)
        {
            // Check corner strength value of the current pixel
            float currentPixel = srcImage.Pixel(x, y, 0);
            if (currentPixel < threshold)
            {
                destImage.Pixel(x, y, 0) = 0;
                continue;
            }

            bool maximum = true;

            // Determine whether (x, y) is the local maximum within the
            // neighborhood; stop early once a larger neighbor is found
            for (int dy = -rad; dy <= rad && maximum; dy++)
            {
                for (int dx = -rad; dx <= rad; dx++)
                {
                    if (dx == 0 && dy == 0)
                        continue;

                    int yi = y + dy;
                    int xi = x + dx;

                    if (!srcImage.Shape().InBounds(xi, yi))
                        continue;

                    if (currentPixel <= srcImage.Pixel(xi, yi, 0))
                    {
                        maximum = false;
                        break;
                    }
                }
            }

            destImage.Pixel(x, y, 0) = (maximum) ? 255 : 0;
        }
    }
}

Result:

Original Image: Yosemite1.jpg

Harris.tga generated by my code:

harrisMax.tga generated by my code:

Original Image:  Graf / img1.ppm

Harris.tga generated by my code:

harrisMax.tga generated by my code:

Feature Description

After the features are detected, I come up with a descriptor for the feature centered at each interest point.  This descriptor is the representation I use to compare features in different images and decide whether they match.

A Simple Feature Descriptor

I use a small square window (5×5) as the feature descriptor.  This is very easy to implement: I copy the 5×5 window of intensities around the feature into the feature data.  It works well when the images being compared are related by a pure translation.

// Simple Descriptor - using a small square window (say 5x5) as the feature descriptor.
// Get the 5x5 window around the feature and store them in the feature data.
void ComputeSimpleDescriptor(CFloatImage &srcImage, Feature &feature)
{
    for (int y = feature.y - 2; y <= feature.y + 2; y++)
    {
        for (int x = feature.x - 2; x <= feature.x + 2; x++)
        {
            if (srcImage.Shape().InBounds(x, y))
            {
                feature.data.push_back(srcImage.Pixel(x, y, 0));
            }
            else
            {
                feature.data.push_back(0.0);
            }
        }
    }
}

A More Advanced Feature Descriptor: MOPS Descriptor

1.    Calculate the angle of rotation for the feature.
2.    Take a 40×40 square window around the feature.
3.    Rotate the window so the feature orientation is horizontal.
4.    Sample an 8×8 square window centered at the feature.
5.    Intensity-normalize the window by subtracting the mean and dividing by the standard deviation of the window.

void ComputeAdvancedDescriptor(CFloatImage &srcImage, Feature &f)
{
    // Find the dominant orientation of the feature image window.
    f.angleRadians = computeFeatureRotationAngleRadians(srcImage, f.x, f.y);

    // Extract 41x41 pixels around the feature
    CFloatImage splot(41, 41, 1);
    for (int y = -20; y <= 20; y++)
    {
        for (int x = -20; x <= 20; x++)
        {
            if (srcImage.Shape().InBounds(f.x + x, f.y + y))
            {
                splot.Pixel(x + 20, y + 20, 0) = srcImage.Pixel(f.x + x, f.y + y, 0);
            }
            else
            {
                splot.Pixel(x + 20, y + 20, 0) = 0;
            }
        }
    }

    // Rotate to horizontal
    WarpGlobal(splot, splot, CTransform3x3::Rotation(f.angleRadians * 180 / PI), eWarpInterpLinear);

    // The descriptor is a 8x8 window of intensities sampled centered on the feature point.
    CTransform3x3 scaleTransform;
    // Use floating-point division: the integer expression 41 / 8 would
    // truncate to 5 and sample the wrong region.
    scaleTransform[0][0] = scaleTransform[1][1] = 41.0 / 8.0;
    CFloatImage postHomographyImage(8, 8, 1);
    WarpGlobal(splot, postHomographyImage, scaleTransform, eWarpInterpLinear);

    // Fill in the feature descriptor data
    f.data.resize(8 * 8);
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            f.data[8 * y + x] = postHomographyImage.Pixel(x, y, 0);

    // Normalize intensity
    normalizeIntensities(f, 8, 8);
}
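normalizeIntensities is a skeleton helper and is not shown above; the operation it performs is the zero-mean, unit-variance normalization described in step 5.  A minimal self-contained sketch of that step follows; the function name and the use of a plain std::vector are my own, for illustration.

```cpp
#include <cmath>
#include <vector>

// Normalize descriptor values to zero mean and unit standard deviation.
// A flat (constant-intensity) window has zero standard deviation, so it
// is mapped to all zeros instead of dividing by zero.
void normalizeDescriptor(std::vector<double> &data)
{
    if (data.empty())
        return;

    double mean = 0.0;
    for (double v : data)
        mean += v;
    mean /= data.size();

    double variance = 0.0;
    for (double v : data)
        variance += (v - mean) * (v - mean);
    double stddev = std::sqrt(variance / data.size());

    for (double &v : data)
        v = (stddev > 1e-12) ? (v - mean) / stddev : 0.0;
}
```

This normalization makes the descriptor invariant to affine changes in intensity (bias and gain), which is one reason MOPS holds up better than the simple descriptor under illumination changes such as the leuven set.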

Feature Matching

After we detect and describe the noteworthy features in an image, it is time to compare feature descriptors and define a matching strength between features in the two images.

SSD Matching

This matcher uses the SSD distance between two feature vectors, pairing each feature in the first image with the closest feature in the second image.  Note that multiple features in the first image may be matched to the same feature in the second image.

void ssdMatchFeatures(const FeatureSet &f1, const FeatureSet &f2, vector<FeatureMatch> &matches, double &totalScore)
{
    int m = f1.size();
    int n = f2.size();

    matches.resize(m);
    totalScore = 0;

    double d;
    double dBest;
    int idBest;

    for (int i = 0; i < m; i++)
    {
        dBest = 1e100;
        idBest = 0;

        for (int j = 0; j < n; j++)
        {
            d = distanceSSD(f1[i].data, f2[j].data);

            if (d < dBest) {
                dBest = d;
                idBest = f2[j].id;
            }
        }

        matches[i].id1 = f1[i].id;
        matches[i].id2 = idBest;
        matches[i].score = dBest;
        totalScore += matches[i].score;
    }
}
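The distanceSSD helper comes with the skeleton and is not shown here; it computes the sum of squared differences between two descriptor vectors.  A sketch of that computation over plain std::vectors (truncating to the shorter length is my own defensive choice):

```cpp
#include <cstddef>
#include <vector>

// Sum of squared differences between two feature descriptor vectors.
double distanceSSD(const std::vector<double> &a, const std::vector<double> &b)
{
    std::size_t n = (a.size() < b.size()) ? a.size() : b.size();
    double sum = 0.0;
    for (std::size_t k = 0; k < n; k++)
    {
        double diff = a[k] - b[k];
        sum += diff * diff;
    }
    return sum;
}
```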

Ratio Test Matching

This uses the ratio of the best SSD distance to the second-best SSD distance as the score, and matches a feature in the first image with the closest feature in the second image.  Ambiguous matches score close to 1, so they can be filtered out.  It can still match multiple features in the first image to the same feature in the second image.

void ratioMatchFeatures(const FeatureSet &f1, const FeatureSet &f2, vector<FeatureMatch> &matches, double &totalScore)
{
    int m = f1.size();
    int n = f2.size();

    matches.resize(m);
    totalScore = 0;

    double d;
    double dBest;
    double d2ndBest;
    int idBest;
    int id2ndBest;

    for (int i = 0; i < m; i++)
    {
        dBest = d2ndBest = 1e100;
        idBest = id2ndBest = 0;

        for (int j = 0; j < n; j++)
        {
            d = distanceSSD(f1[i].data, f2[j].data);

            if (d < dBest)
            {
                d2ndBest = dBest;
                id2ndBest = idBest;
                dBest = d;
                idBest = f2[j].id;
            }
            else if (d < d2ndBest)
            {
                d2ndBest = d;
                id2ndBest = f2[j].id;
            }
        }

        matches[i].id1 = f1[i].id;
        matches[i].id2 = idBest;
        matches[i].score = dBest / d2ndBest;
        totalScore += matches[i].score;
    }
}

Performance

AUC values (in the table) and ROC curve for the graf images (img1.ppm and img2.ppm)

Type                             AUC value
Simple Descriptor + SSD          0.638798
Simple Descriptor + ratio test   0.671085
MOPS + SSD                       0.675780
MOPS + ratio test                0.720569

AUC values (in the table) and ROC curve for the Yosemite images (Yosemite1.jpg and Yosemite2.jpg)

Type                             AUC value
Simple Descriptor + SSD          0.797740
Simple Descriptor + ratio test   0.854664
MOPS + SSD                       0.964496
MOPS + ratio test                0.970430

AUC values (in the table) and ROC curve for my own images:

Type                             AUC value
Simple Descriptor + SSD          0.805243
Simple Descriptor + ratio test   0.873232
MOPS + SSD                       0.959326
MOPS + ratio test                0.981430

Benchmark Performance (Average AUC)

Simple Descriptor and SSD Test Matching

Bikes      Graf       Leuven     Wall       Average
0.296517   0.570076   0.201767   0.231562   0.3249805

Simple Descriptor and Ratio Test Matching

Bikes      Graf       Leuven     Wall       Average
0.468066   0.542690   0.488406   0.529717   0.50721975

MOPS Descriptor and SSD Test Matching

Bikes      Graf       Leuven     Wall       Average
0.441448   0.614656   0.529404   0.557791   0.53582475

MOPS Descriptor and Ratio Test Matching

Bikes      Graf       Leuven     Wall       Average
0.618224   0.572231   0.692885   0.612083   0.62385575

Strengths and Weaknesses

My implementation (Harris corners + a simplified MOPS descriptor + ratio-test matching) is robust to changes in translation and rotation, but not to changes in scale.  The benchmark shows the best results on image sets that vary in translation (bikes) and illumination (leuven).  To further improve the matcher, I would consider a SIFT-style approach, which is much more robust to scale changes.

