CNN Features Off-the-Shelf: An Astounding Baseline for Recognition
This paper demonstrates that off-the-shelf convolutional neural network (CNN) features, trained on ImageNet for object classification, provide a strong and versatile baseline for a wide range of visual recognition tasks. Combined with a simple classifier such as a linear SVM, and without task-specific fine-tuning, these generic features achieve results competitive with or superior to state-of-the-art methods on image classification, scene recognition, fine-grained recognition, attribute detection, and image retrieval.
Abstract: Generic descriptors from convolutional neural networks are powerful for diverse recognition tasks, achieving superior results compared to state-of-the-art systems.
Introduction: Convolutional neural network features can be exploited for a wide variety of vision tasks without task-specific retraining, demonstrating significant performance gains.
Network Architecture and Training Data: The OverFeat network, trained on ImageNet for object classification, is used to extract generic features for various recognition tasks.
Experimental Setup: Experiments utilize features from OverFeat's first fully connected layer combined with linear SVM classifiers, with optional data augmentation.
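The setup above can be sketched as follows, assuming the CNN features have already been extracted into a NumPy array. This is a minimal illustration and not the paper's implementation: the function names, the hyperparameters (`C`, `epochs`, `lr`), the L2 normalization step, and the plain subgradient-descent trainer are all assumptions made for this sketch. A dedicated SVM solver would normally replace the training loop.

```python
import numpy as np

def l2_normalize(features, eps=1e-12):
    """Scale each row (one feature vector per image) to unit L2 norm."""
    norms = np.linalg.norm(features, axis=1, keepdims=True)
    return features / np.maximum(norms, eps)

def train_one_vs_all_linear_svm(X, y, num_classes, C=1.0, epochs=300, lr=0.1):
    """Train one linear SVM per class with full-batch subgradient descent.

    Hypothetical stand-in for an off-the-shelf SVM solver. Each class k
    minimises 0.5 * ||w_k||^2 + C * mean(hinge loss), with its own images
    labelled +1 and all other images labelled -1.
    """
    n, d = X.shape
    W = np.zeros((num_classes, d))
    b = np.zeros(num_classes)
    # T[i, k] is +1 if sample i belongs to class k, otherwise -1.
    T = np.where(y[:, None] == np.arange(num_classes), 1.0, -1.0)
    for _ in range(epochs):
        margins = T * (X @ W.T + b)
        # Only samples that violate the margin contribute to the subgradient.
        active = np.where(margins < 1.0, T, 0.0)
        W -= lr * (W - C * (active.T @ X) / n)
        b -= lr * (-C * active.sum(axis=0) / n)
    return W, b

def predict(W, b, X):
    """Assign each sample to the class whose classifier responds most strongly."""
    return np.argmax(X @ W.T + b, axis=1)
```

In use, `X` would hold one feature vector per image (for example, activations of the first fully connected layer) and `y` the integer class labels. The classifier stays linear, so any gain over earlier pipelines comes from the features rather than from the learning machinery.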
Image Classification: OverFeat CNN features with linear SVMs significantly outperform previous methods on challenging object and scene classification datasets such as Pascal VOC and MIT indoor scenes.
Object Detection and Fine-Grained Recognition: The paper does not test OverFeat features on object detection directly, though they show promise for the task. The features excel at fine-grained recognition, capturing subtle differences between subclasses.
Fine-Grained Recognition Datasets: The CNN-SVM approach achieves state-of-the-art performance on fine-grained datasets such as CUB 200-2011 birds and Oxford 102 flowers, even without specialized annotations.
Attribute Detection: CNN features demonstrate competitive performance in attribute detection tasks on the UIUC and H3D datasets, outperforming methods that use part-level annotations.
Implementation Details: Experiments employ libsvm and liblinear. Data augmentation includes crops, rotations, and power transforms. At test time, classifier responses are summed over the multiple representations of each image.
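Two of the steps above, the power transform and the summing of responses over several test-time representations, can be sketched as below. The summary gives no exponent, so `alpha=0.5` is only a placeholder assumption. The function names and the sign-preserving form of the transform are also assumptions of this sketch.

```python
import numpy as np

def signed_power_transform(features, alpha=0.5):
    """Component-wise power transform that keeps the sign of each entry.

    alpha is a hypothetical default; the source does not state the exponent.
    """
    return np.sign(features) * np.abs(features) ** alpha

def predict_by_summed_responses(W, b, variants):
    """Classify one test image from several representations of it.

    variants: array of shape (num_variants, d), one feature vector per
              crop or rotation of the same image.
    W, b:     weights (num_classes, d) and biases (num_classes,) of
              already-trained linear classifiers.
    """
    scores = variants @ W.T + b      # one row of class scores per variant
    summed = scores.sum(axis=0)      # sum the responses over all variants
    return int(np.argmax(summed)), summed
```

Summing lets a confident response from one crop outweigh weak responses from the others, whereas majority voting would discard the magnitude of each response.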
Instance Retrieval: CNN features are competitive with established instance retrieval methods on various datasets and, after standard processing steps, outperform low-memory-footprint methods.
Retrieval Results: CNN representations, with or without spatial search and standard processing, achieve strong performance on diverse retrieval benchmarks, particularly against low-memory-footprint methods.
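As a rough illustration of retrieval with such features, the sketch below ranks database images by their distance to a query. The summary does not spell out the processing steps or the spatial search procedure, so everything here is an assumption: each image is represented by features from one or more patches, features are L2-normalized, and the distance between two images is taken as the smallest distance between any query patch and any reference patch. All function names are hypothetical.

```python
import numpy as np

def l2_normalize(x, eps=1e-12):
    """Scale each feature vector (last axis) to unit L2 norm."""
    return x / np.maximum(np.linalg.norm(x, axis=-1, keepdims=True), eps)

def patch_set_distance(query_patches, ref_patches):
    """Smallest Euclidean distance between any query patch and any reference patch.

    Both inputs have shape (num_patches, d). With a single patch per image
    this reduces to a plain whole-image comparison.
    """
    q = l2_normalize(query_patches)
    r = l2_normalize(ref_patches)
    # For unit vectors, squared distance is 2 - 2 * cosine similarity.
    squared = np.maximum(2.0 - 2.0 * (q @ r.T), 0.0)
    return float(np.sqrt(squared).min())

def rank_database(query_patches, database):
    """Return database indices ordered from most to least similar to the query."""
    dists = [patch_set_distance(query_patches, ref) for ref in database]
    return sorted(range(len(database)), key=lambda i: dists[i])
```

Matching patch against patch is one simple way to tolerate an object that fills only part of the query or reference image. The cost is a number of feature comparisons that grows with the product of the two patch counts.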
Conclusion: Off-the-shelf CNN features from OverFeat, combined with simple classifiers, are a powerful and general solution for various visual recognition tasks, establishing a new baseline.