Building high-level features using large-scale unsupervised learning

An unsupervised deep autoencoder with local receptive fields, pooling, and local contrast normalization learns high-level detectors (faces, cat faces, human bodies) from unlabeled YouTube frames. Features from this network raise ImageNet classification accuracy to 15.8% on 22,000 categories, a roughly 70% relative improvement over the prior state of the art.

Abstract

A nine-layered autoencoder trained on ten million unlabeled internet images learns a face detector that is robust to various transformations and can be used for other high-level object recognition tasks.

Introduction and Motivation

This work investigates the feasibility of learning high-level, class-specific feature detectors from unlabeled images, inspired by biological systems and motivated by the challenges of obtaining large labeled datasets.

Training Set Construction and Large-Scale Approach

A large-scale approach using ten million YouTube videos, a deep autoencoder with local receptive fields, and extensive computational resources addresses prior limitations in unsupervised high-level feature learning.

Architecture and Learning Objectives

A nine-layered locally connected autoencoder with pooling and local contrast normalization, comprising around one billion parameters, is designed to learn high-level features from unlabeled data.
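
The pooling and normalization stages named above can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `l2_pool` and `local_contrast_normalize` are hypothetical, the windows use uniform weights, and the normalization divides by a plain local standard deviation, which is a simplifying assumption.

```python
import math

def l2_pool(grid, size):
    """L2 pooling: square root of the sum of squares in each size x size window (stride 1)."""
    rows, cols = len(grid), len(grid[0])
    out = []
    for r in range(rows - size + 1):
        row = []
        for c in range(cols - size + 1):
            total = sum(grid[r + i][c + j] ** 2
                        for i in range(size) for j in range(size))
            row.append(math.sqrt(total))
        out.append(row)
    return out

def local_contrast_normalize(grid, size, eps=1e-2):
    """Subtract the local mean, then divide by the local standard deviation.

    Assumption: uniform window clipped at the borders; eps floors the divisor
    so flat regions do not cause division by zero.
    """
    rows, cols = len(grid), len(grid[0])
    half = size // 2
    out = []
    for r in range(rows):
        row = []
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - half), min(rows, r + half + 1))
                      for j in range(max(0, c - half), min(cols, c + half + 1))]
            mean = sum(window) / len(window)
            var = sum((v - mean) ** 2 for v in window) / len(window)
            row.append((grid[r][c] - mean) / max(eps, math.sqrt(var)))
        out.append(row)
    return out

pooled = l2_pool([[3.0, 4.0], [0.0, 0.0]], 2)
flat = local_contrast_normalize([[5.0] * 3 for _ in range(3)], 3)
```

Pooling gives a response that tolerates small shifts of the input, while normalization removes local brightness and contrast differences, so a constant patch maps to zeros.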

Optimization, Parallelism and Training Details

Model and data parallelism, implemented in a software framework called DistBelief and combined with asynchronous stochastic gradient descent on a thousand-machine cluster, made it possible to train the large-scale autoencoder over three days.
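
The idea behind asynchronous stochastic gradient descent can be sketched as a single-process simulation. This is only a toy under stated assumptions, not DistBelief: the names `grad` and `simulate_async_sgd` are hypothetical, the model is a one-parameter linear fit, and a seeded random scheduler stands in for workers running at different speeds.

```python
import random

def grad(w, shard):
    """Gradient of the mean squared error of y = w * x over one data shard."""
    return sum(2 * (w * x - y) * x for x, y in shard) / len(shard)

def simulate_async_sgd(shards, lr=0.05, ticks=400, seed=0):
    """Simulate workers that push gradients computed from stale parameters.

    Each worker keeps a cached copy of the shared parameter. When it is
    scheduled, it pushes a gradient computed from that (possibly outdated)
    copy and only then fetches the current value, without waiting for the
    other workers.
    """
    rng = random.Random(seed)
    server_w = 0.0                       # parameter held by the "server"
    cached = [server_w] * len(shards)    # each worker's stale copy
    for _ in range(ticks):
        k = rng.randrange(len(shards))   # whichever worker finishes next
        server_w -= lr * grad(cached[k], shards[k])
        cached[k] = server_w             # fetch fresh parameters afterwards
    return server_w

# Toy data drawn from y = 3x, split across four workers (data parallelism).
data = [(x, 3.0 * x) for x in (0.5 + 0.125 * i for i in range(8))]
shards = [data[k::4] for k in range(4)]
w = simulate_async_sgd(shards)
```

Although every update uses parameters that may already be several steps old, the shared parameter still converges toward the true slope in this small example.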

Experiments on Faces

Trained only on unlabeled data, the network learns a face detector that reaches 81.7% accuracy. The experiments demonstrate its robustness to transformations and the importance of architectural choices such as local contrast normalization.

Cat and Human Body Detectors and Discriminative Performance

The network also learns detectors for cat faces and human bodies, and its unsupervised features significantly improve performance on the ImageNet object recognition task.

Appendix and Implementation Details

The appendix gives implementation details of the locally connected network, the parallelism strategies, the hyperparameter choices, and the baselines used for comparison, which together show how the unsupervised learning approach scales.
