
Building high-level features using large-scale unsupervised learning

An unsupervised deep autoencoder with local receptive fields, pooling, and local contrast normalization learns high-level detectors (faces, cat faces, human bodies) from unlabeled YouTube frames. Using these learned features for ImageNet classification yields 15.8% accuracy on 22k categories, a roughly 70% relative improvement over the prior state of the art.
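As a quick check on what "relative improvement" means here, the short sketch below backs out the earlier accuracy implied by the two figures quoted above (15.8% and roughly 70%). The function names are illustrative, and the implied baseline is a derived estimate, not a number taken from this summary.

```python
def implied_baseline(new_accuracy: float, relative_gain: float) -> float:
    """Earlier accuracy implied by a new accuracy and a relative gain."""
    return new_accuracy / (1.0 + relative_gain)


def relative_improvement(new_accuracy: float, old_accuracy: float) -> float:
    """Relative gain of new_accuracy over old_accuracy, as a fraction."""
    return (new_accuracy - old_accuracy) / old_accuracy


# Figures quoted in the summary: 15.8% accuracy, about 70% relative gain.
baseline = implied_baseline(15.8, 0.70)
print(f"implied prior accuracy: {baseline:.1f}%")
print(f"relative gain check: {relative_improvement(15.8, baseline):.0%}")
```

A relative gain of this size means the absolute accuracy rose by about 6.5 percentage points, not 70 points.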

	Abstract: A nine-layered autoencoder trained on ten million unlabeled internet images learns a face detector that is robust to various transformations and can be used for other high-level object recognition tasks.
	Introduction and Motivation: This work investigates whether high-level, class-specific feature detectors can be learned from unlabeled images alone, inspired by biological systems and motivated by the difficulty of obtaining large labeled datasets.
	Training Set Construction and Large-Scale Approach: A large-scale approach using ten million YouTube videos, a deep autoencoder with local receptive fields, and extensive computational resources addresses prior limitations in unsupervised high-level feature learning.
	Architecture and Learning Objectives: A nine-layered, locally connected autoencoder with pooling and local contrast normalization, comprising around one billion parameters, is designed to learn high-level features from unlabeled data.
	Optimization, Parallelism and Training Details: Model and data parallelism, implemented with a software framework called DistBelief, and asynchronous stochastic gradient descent on a thousand-machine cluster made it possible to train the large-scale autoencoder in three days.
	Experiments on Faces: The trained network learns a face detector with 81.7% accuracy from unlabeled data, demonstrating robustness to transformations and the importance of architectural choices such as local contrast normalization.
	Cat and Human Body Detectors and Discriminative Performance: The network also learns detectors for cat faces and human bodies, and the features learned without supervision significantly improve performance on the ImageNet object recognition task.
	Appendix and Implementation Details: Implementation details of the locally connected network, parallelism strategies, hyperparameter choices, and comparison baselines show that the unsupervised learning approach is robust and scalable.
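
The chapter list names pooling and local contrast normalization as core building blocks of the architecture. The following is a minimal one-dimensional sketch of both operations. It assumes L2 pooling over non-overlapping windows and a uniform (unweighted) normalization window, which are simplifications chosen for illustration; the function names, window sizes, and `eps` floor are hypothetical and not taken from the paper's implementation.

```python
import math


def l2_pool(values, size):
    """L2 pooling: sqrt of the sum of squares over each non-overlapping window.

    Assumption: non-overlapping windows; trailing values that do not fill
    a whole window are dropped.
    """
    return [
        math.sqrt(sum(v * v for v in values[i:i + size]))
        for i in range(0, len(values) - size + 1, size)
    ]


def local_contrast_normalize(values, radius, eps=1e-8):
    """Subtract the local mean, then divide by the local standard deviation.

    Assumption: a uniform window of up to 2 * radius + 1 neighbours,
    truncated at the borders; eps keeps the division finite on flat regions.
    """
    out = []
    for i in range(len(values)):
        window = values[max(0, i - radius): i + radius + 1]
        mean = sum(window) / len(window)
        var = sum((w - mean) ** 2 for w in window) / len(window)
        out.append((values[i] - mean) / max(math.sqrt(var), eps))
    return out


pooled = l2_pool([3.0, 4.0, 0.0, 0.0], size=2)
dim = local_contrast_normalize([1.0, 2.0, 3.0], radius=1)
bright = local_contrast_normalize([10.0, 20.0, 30.0], radius=1)
print(pooled)
print(dim)
print(bright)
```

In this sketch the normalized outputs for the dim and bright signals match, which illustrates why such a step makes later layers less sensitive to overall brightness and contrast.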
