Computer Vision Talks All you want and should know about computer vision is here

19 Aug 2011, 51 comments

Feature descriptor comparison report

This post shares my research on the behavior of several types of feature descriptors. It is an update of the old "Comparison of feature descriptors" post. I've added the brand-new ORB feature descriptor to the test suite, along with the SIFT descriptor. A new version of the LAZY descriptor is included in this test too.

Introduction

For this test I have written a special test framework, which lets me easily add new kinds of descriptors and test cases and generates report data in a CSV-like format. Then I upload the data to Google Docs and create these charts. Five quality tests and one performance test were run for each kind of descriptor.

Test cases

  • Rotation test - this test shows how the feature descriptor depends on feature orientation.
  • Scaling test - this test shows how the feature descriptor depends on feature size.
  • Blur test - this test shows how robust the feature descriptor is against blur.
  • Lighting test - this test shows how robust the feature descriptor is against lighting changes.
  • Pattern detection test - this test performs detection of a planar object (image) in a real video. In contrast to the synthetic tests, this test gives a real picture of the overall stability of a particular descriptor.
  • Performance test - this test measures descriptor extraction time.

All quality tests work in a similar way. From a given source image we generate synthetic test data: transformed images and the corresponding feature points. The transformation algorithm depends on the particular test. For the rotation test, it is a rotation of the source image around its center through 360 degrees; for the scaling test, it is a resize of the image from 0.25x to 2x of the original size. The blur test applies Gaussian blur in several steps, and the lighting test changes the overall picture brightness.
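As a rough illustration of how ground-truth point positions and scale steps for such synthetic tests could be produced, here is a minimal sketch in plain C++. The names `Point2`, `rotateAround` and `scaleSteps` are hypothetical, and the even spacing of the scale steps is an assumption; the actual test framework is not shown in this post.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

struct Point2 { double x, y; };

// Rotate point p around center c by angleDeg degrees.
// With the image y axis pointing down, a positive angle
// appears clockwise on screen.
Point2 rotateAround(Point2 p, Point2 c, double angleDeg) {
    const double kPi = 3.14159265358979323846;
    const double a = angleDeg * kPi / 180.0;
    const double dx = p.x - c.x;
    const double dy = p.y - c.y;
    return { c.x + dx * std::cos(a) - dy * std::sin(a),
             c.y + dx * std::sin(a) + dy * std::cos(a) };
}

// Evenly spaced scale factors from minScale to maxScale, inclusive.
// Even spacing is an assumption made for this sketch.
std::vector<double> scaleSteps(double minScale, double maxScale, int steps) {
    assert(steps >= 2 && minScale > 0.0 && maxScale > minScale);
    std::vector<double> out;
    out.reserve(static_cast<std::vector<double>::size_type>(steps));
    for (int i = 0; i < steps; ++i)
        out.push_back(minScale + (maxScale - minScale) * i / (steps - 1));
    return out;
}
```

For example, `scaleSteps(0.25, 2.0, 8)` yields eight factors starting at 0.25 and ending at 2.0, and `rotateAround` gives the expected location of a source keypoint in the rotated image.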

The pattern detection test deserves special attention. It is run on a very complex and noisy video sequence, so it is a challenging task for any feature descriptor algorithm to show good results in it.

The metric for all quality tests is the percentage of correct matches between the source image and the transformed one. Since we use a planar object, we can easily select the inliers among all matches using homography estimation. I use OpenCV's cvFindHomography function for this. This metric gives very good and stable results. I do no outlier filtering of matches before homography estimation, because that would affect the results in unexpected ways. Descriptors are matched with OpenCV's brute-force matcher.
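The metric can be sketched in plain C++ as follows. This is only an illustration: it assumes the homography is already available as a row-major 3x3 matrix (in the report it is estimated with cvFindHomography), and the names `Pt`, `Homography`, `project` and `percentCorrectMatches`, as well as the pixel threshold parameter, are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

struct Pt { double x, y; };
using Homography = std::array<double, 9>;  // 3x3 matrix, row-major

// Map a point through the homography (projective transform).
Pt project(const Homography& H, Pt p) {
    const double w = H[6] * p.x + H[7] * p.y + H[8];
    assert(w != 0.0);  // the point must not map to infinity
    return { (H[0] * p.x + H[1] * p.y + H[2]) / w,
             (H[3] * p.x + H[4] * p.y + H[5]) / w };
}

// Percentage of matches (source point, matched point) whose reprojection
// error under H is at most maxErrorPx pixels.
double percentCorrectMatches(const std::vector<std::pair<Pt, Pt>>& matches,
                             const Homography& H, double maxErrorPx) {
    if (matches.empty()) return 0.0;
    std::size_t inliers = 0;
    for (const auto& m : matches) {
        const Pt q = project(H, m.first);
        if (std::hypot(q.x - m.second.x, q.y - m.second.y) <= maxErrorPx)
            ++inliers;
    }
    return 100.0 * static_cast<double>(inliers) /
           static_cast<double>(matches.size());
}
```

With a pure translation by 10 pixels in x and four matches of which one lands far from its expected position, a 3-pixel threshold reports 75% correct matches.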

Rotation test

Descriptor's invariance to rotation summary report

In this test I obtained fairly predictable results, because all descriptors are rotation invariant except BRIEF. Slight differences in stability can be explained by the feature orientation calculation algorithm and the nature of each descriptor. A detailed study of why each descriptor behaves exactly as it does takes time and effort; it's a topic for another article. Maybe later on...

Scaling test

Descriptor's invariance to scaling summary report

The SURF and SIFT descriptors demonstrate very good stability in this test because they perform an expensive keypoint size calculation. The other descriptors use a fixed-size descriptor, and you can see what that leads to. Currently I do not have a separate LAZY feature detector for the LAZY descriptor (I use the ORB detector for the tests), but I'm thinking about a lightweight feature detector with feature size calculation, because it's a must-have feature. Actually, scale invariance is much more important than precise orientation calculation.

Blur test

Descriptor's invariance to blur summary report

In this test I tried to simulate the motion blur that can occur if the camera moves suddenly. All descriptors demonstrate good results in this test. By “good” I mean that the more blur is applied, the lower the percentage of correct matches, which is the expected behavior.

Lighting test

Descriptor's invariance to lighting summary report

In the lighting test the transformed images differ only in overall image brightness. All kinds of descriptors work well in this case. The major reason is that all descriptors are extracted normalized, i.e. the L2 norm of the descriptor vector equals 1. This normalization makes the descriptor invariant to brightness changes.
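The effect of unit-length normalization can be shown with a small plain C++ sketch. It only demonstrates the multiplicative case: if every descriptor component is scaled by the same gain, the normalized vectors coincide. A constant additive offset is not removed by normalization alone. The names `l2Normalize` and `maxAbsDiff` are hypothetical and are not taken from any descriptor implementation used in the report.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Scale a descriptor vector to unit L2 norm (zero vectors are returned as-is).
std::vector<double> l2Normalize(const std::vector<double>& v) {
    double sumSq = 0.0;
    for (double x : v) sumSq += x * x;
    const double norm = std::sqrt(sumSq);
    if (norm == 0.0) return v;
    std::vector<double> out;
    out.reserve(v.size());
    for (double x : v) out.push_back(x / norm);
    return out;
}

// Largest component-wise absolute difference between two descriptors.
double maxAbsDiff(const std::vector<double>& a, const std::vector<double>& b) {
    assert(a.size() == b.size());
    double worst = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        const double d = std::fabs(a[i] - b[i]);
        if (d > worst) worst = d;
    }
    return worst;
}
```

Here `l2Normalize({3, 4})` and `l2Normalize({6, 8})` produce the same unit vector, while `l2Normalize({13, 14})`, the first vector with a constant 10 added to each component, does not.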

Pattern detection on real video

Pattern detection test

Detecting the object in real video is the most complex task, since the ground truth contains rotation, scaling and motion blur. Other objects are also present. And finally, the video is not HD quality. These conditions are dictated by the real-world conditions in which computer vision is applied.

As you can see in the diagram, the SIFT and SURF descriptors give the best results. They are still far from ideal, but it’s quite enough for such a challenging video. Unfortunately, scale-covariant descriptors show very bad results in this test, because the pattern image appears at 1:1 scale only at the beginning of the video (the “spike” near frame 20). For the rest of the video sequence the target object moves away from the camera, and scale-covariant descriptors can’t handle this situation.

Performance summary

Descriptor extraction time summary

This chart shows the extraction time for N features. I used a logarithmic scale for the Y axis to make it more readable. For all descriptor extraction algorithms the extraction time depends linearly on the number of features. Local spikes are probably caused by vector resizing or L2 cache misses. This performance test was done on a MacBook Pro 2.2 with a Core 2 Duo 2.13 GHz.
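One simple way to check the linear dependence from the exported timing data is an ordinary least-squares line fit of time against feature count. The sketch below is plain C++ with hypothetical names (`LineFit`, `fitLine`); it is not part of the test framework, and any numbers passed to it here are made up for illustration, not measurements from this report.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct LineFit { double slope, intercept; };

// Ordinary least-squares fit of t = slope * n + intercept,
// where n is the number of features and t the extraction time.
LineFit fitLine(const std::vector<double>& n, const std::vector<double>& t) {
    assert(n.size() == t.size() && n.size() >= 2);
    const double k = static_cast<double>(n.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < n.size(); ++i) {
        sx += n[i];
        sy += t[i];
        sxx += n[i] * n[i];
        sxy += n[i] * t[i];
    }
    const double denom = k * sxx - sx * sx;
    assert(denom != 0.0);  // feature counts must not all be identical
    const double slope = (k * sxy - sx * sy) / denom;
    return { slope, (sy - slope * sx) / k };
}
```

The slope is then the approximate cost per feature and the intercept the fixed overhead; large residuals around the fitted line would correspond to the local spikes mentioned above.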

Further work

  • Add new quality test cases. One additional test I know of for sure is affine transformations. Your ideas for other tests are welcome!
  • Add new kinds of descriptors. I will definitely add an A-SIFT implementation.
  • Create a LAZY detector with feature size and orientation estimation.
  • Improve the LAZY descriptor extraction procedure. I expect at least a 20% performance gain.
  • Generate a matching video for each test to demonstrate the behavior of each descriptor algorithm.

Comments

    • Miguel Algaba

      Nice work Eugene!! I’m working on 3D SLAM using 2D feature matching in conjunction with 3D point clouds, and I found this post really helpful. I have tested FAST, SURF, SIFT, BRIEF and ORB, and I have decided to use SURF for the moment as a compromise between robustness and performance. I would like to see a comparison of the OpenCV GPU SURF against the CPU version; if you can add it, that would be great!

      Best regards!

      • Ricardo

        Congratulations on the comparison report. It is very detailed, and it was very helpful for my research. I hope to see a new report as soon as OpenCV launches the new ORB+BRISK detector in January. :)

        Best regards

    • EKhvedchenya

      Hi!
      Thanks for the comment.
      I also thought about adding GPU versions of the SIFT and SURF descriptors. But one thing prevents me from doing this: all my Macs have ATI video cards :(
      So I’m waiting for a new MBP with NVidia graphics. I am also very interested in the results.

    • Alexander Gorlin

      Hi!

      Thanks for the nice review.

      It would probably be interesting to compare data requirements of the detection algorithms, say, a table with
      - number of bytes per feature;
      - number of features for the sample image;
      - total amount of data used by each algorithm.

      • EKhvedchenya

        Hi, thanks for the comment.
        I agree that some statistical information can be helpful.
        Can you clarify the last option, "total amount of data used by each algorithm"? Did you mean memory consumption?

        • Alexander Gorlin

          Sorry, I haven’t noticed your reply.

          Yes, I mean how much memory (per object to be recognized) is required by each algorithm. This is often a really restricting issue on mobile platforms.

    • Miguel Algaba

      Hi again Eugene, I have tested the GPU SURF version on my MacBook (Nvidia 9400M [16 shaders]) and it runs nearly 3 times faster than on my CPU (Core 2 Duo 2.0 GHz) when detecting keypoints and computing descriptors. I tested on 640×480 grayscale images. I believe it would perform better on bigger images. On the other hand, matching is faster on the CPU (for now).

      • EKhvedchenya

        Thank you for sharing this information!
        I am increasingly convinced that I should add the GPU-accelerated implementations in the next review.

    • dspmania

      Hi!
      All your articles and postings are so useful. Thanks!
      Actually, I’m using OpenCV 2.3 in iOS and trying to use ORB feature detector and descriptor.
      The problem is that BruteForceMatcher is not working with ORB descriptors.
      I found that ORB descriptor’s type is 0, which is ‘uchar’, so I set up BruteForceMatcher like this:
      cv::BruteForceMatcher<cv::L2 > matcher;
      And the number of matched pairs is always zero. Can you let me know how I can use BruteForceMatcher with ORB feature descriptors?

      • EKhvedchenya

        Hi
        Try using the cv::OrbFeatureDetector and cv::OrbDescriptorExtractor classes. These wrappers over cv::ORB implicitly convert the computed descriptors to double type, so you will be able to use them with BruteForceMatcher directly.

        Another workaround is to explicitly convert the descriptor matrix to a floating-point element type using the cv::Mat::convertTo function.

        • dspmania

          Hi

          I tried cv::OrbDescriptorExtractor, but it doesn’t seem to convert descriptors to floating point implicitly. So I used cv::Mat::convertTo to convert them after cv::ORB(). It works fine! Thank you.

    • dspmania

      Oops sorry! I missed some keywords. Here’s my code:

      std::vector keypointsSrc, keypointsDst;
      cv::Mat descriptorsMatSrc, descriptorsMatDst;
      cv::ORB orb;
      orb(imgSrc, cv::Mat(), keypointsSrc, descriptorsMatSrc);
      orb(imgDst, cv::Mat(), keypointsDst, descriptorsMatDst);

      cv::BruteForceMatcher<cv::L2 > matcher;
      std::vector<std::vector > matches1;
      std::vector<std::vector > matches2;
      matcher.knnMatch(descriptorsMatSrc, descriptorsMatDst, matches1, 2); // zero matches…
      matcher.knnMatch(descriptorsMatDst, descriptorsMatSrc, matches2, 2); // zero matches…

      • dspmania

        this is weird… does this editor handle double brackets differently? all double brackets are missing in my code

    • Xiao-Dong

      This is nice work. But I am a little confused by the extraction time chart. In my experience, SURF is much faster than SIFT.

      • EKhvedchenya

        The Y axis of the extraction time chart uses a logarithmic scale.

    • Kenz Dale

      Nice work! Would you consider releasing your test code, so that those of us who have a little less familiarity with OpenCV and image processing in general could have a working example of how to use these various feature descriptors? It would be very interesting to be able to repeat your test, only with images that are more appropriate to a user’s likely environment.

    • kiraff

      Hi

      I’d like to know how you chose the threshold for the matching. Is the total number of matches the same for each method ? Because if the total number of matches is small, a higher percentage of correct matches can be expected.

      I maybe missed something, but basically my question is: how do you make your test “fair” when all the methods have different descriptors length and the distance measure used is not always the same (Euclidean, Hamming,…) ?

    • Francois Berthiaume

      Hi,

      Nice work. I suggest testing the descriptors for robustness to noise (by adding white or colored noise to your images) and to compression. Those degradations are typical of videos taken by surveillance cameras.

      • EKhvedchenya

        Hi, thanks for the tip. It’s easy to add this test case.

    • Huang

      Hello~ thank you for your great job~
      Could you give some advice on building a test framework?

    • Arshiya Mahmoudi

      Hi,
      What is the matching algorithm used? namely:
      NN
      NNDR
      ?
      And How do you match keypoints? one by one? or hash table or…?

    • Anthony Oliver (http://www.ingenuitas.com)

      Thanks for posting this, very informative.

    • Rui Marques

      Great post, but everyone would appreciate more info on the implementations that you used, since opencv doesn’t implement all those descriptors. Can you share the implementation of RIFF or LAZY?

    • Yuan DONG

      very impressive contribution for sharing info

    • Jianwu Flynn

      This work is impressive. I saw it in August 2011. I’m interested in it and want to cite it in my article, but I didn’t see any published version. Can you send me the original published article?

      • EKhvedchenya

        Hi, this is the original article. I haven’t published it.
        So if you want to cite it, refer to its URL.

        • Jianwu Flynn

          Ok, Thanks.

    • Rui Marques

      I think you should write an article about this, there are a few articles evaluating feature descriptors, but most are not as updated as yours. You should probably update it with the new FREAK and maybe BRISK. FREAK looks quite promising. And maybe repeat the tests for SURF, SIFT and ORB as their OpenCV implementations have been updated.

    • Adoniseagle (https://github.com/kikohs/freak)

      Brisk http://www.asl.ethz.ch/people/lestefan/personal/BRISK
      Freak https://github.com/kikohs/freak

      I think the two implementations are compatible with OpenCV, so integrating them into your test will be easy.

      • EKhvedchenya

        Thank you! I’ll include them and generate a new report in a week or so.

        • Aaron Wetzler

          Any update on that report? Really looking forward to seeing an independent comparison

          • Ievgen Khvedchenia (http://computer-vision-talks.com)

            Aaron, I’ll make the new report after I finish the OpenCV Tutorial project. Right now I’m too busy to update the comparison report in parallel.
            But there is good news too! I’m going to open-source the code of the comparison application.

            • Aaron Wetzler

              Great! Obviously no rush. Will check back here in a month or so :)

    • Marcelo

      Awesome article! Is your implementation available somewhere?

      Thanks

      • Ievgen Khvedchenia (http://computer-vision-talks.com)

        The source code for this comparison is still closed since it contains some proprietary code. I was thinking of rewriting it and making it open source. After finishing the OpenCV Tutorial app I will concentrate on this.

    • Choi

      It’s awesome!

      Many review papers show data only for SIFT vs. SURF or SIFT vs. BRIEF :(

      And their system environments are different....

      I really appreciate this good article.

    • karthik

      Hi

      I am new to OpenCV.
      May I know where to get implementations of the SURF, SIFT, FREAK and BRISK detectors?

      Also, may I know how to run them?
      Your timely help will help me a lot.

      THANKS

    • Pingback: A battle of three descriptors: SURF, FREAK and BRISK « Computer Vision Talks

    • ettogawa

      How about FAST with ORB? Did you ever compare it in the lighting test? Which is better?

      • Ievgen Khvedchenia (http://computer-vision-talks.com)

        ORB has its own detector based on FAST. The ORB detector computes orientation, so it works better than the FAST detector in tests. The ORB detector is very sensitive to lighting changes and therefore shows poor results in lighting tests.

    • xamox

      I’ve read a few of your posts, and had asked this in previous post comments. I noticed that in a comment here you said you would release the code for running the performance test. Does it live somewhere online?

      Also, thanks again for writing these posts. They are very informative, and I wish I could find more posts that showed applications of theory.

    • mahesh

      great summary, thanks!

    • Youchang

      Great Post!
      Can you share the code you used for these tests?
