Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
This paper introduces Batch Normalization, a technique that normalizes layer inputs to accelerate deep network training, allowing for higher learning rates and improved accuracy.
Abstract: Batch Normalization accelerates deep network training by normalizing layer inputs, enabling higher learning rates, reducing reliance on initialization, and acting as a regularizer to achieve significant accuracy improvements.
Introduction and the Problem of Internal Covariate Shift: Internal covariate shift is the change in the distribution of each layer's inputs during training as the parameters of the preceding layers change. It slows deep network optimization by forcing lower learning rates and careful initialization. Batch Normalization addresses this by stabilizing the distribution of layer inputs.
Normalization via Mini-Batch Statistics and the Batch Normalizing Transform: Batch Normalization normalizes layer inputs using mini-batch statistics and learned scale/shift parameters, making the transform differentiable and providing regularization through mini-batch dependent normalization.
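The transform summarized above can be sketched for a single scalar activation as follows. This is a minimal illustration rather than the paper's implementation: the function name `batch_norm_1d`, the default `eps=1e-5`, and the use of plain Python lists are assumptions made for the example. `gamma` and `beta` stand for the learned scale and shift parameters.

```python
import math

def batch_norm_1d(xs, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation over a mini-batch, then scale and shift.

    xs    : values of the same activation across the mini-batch
    gamma : learned scale (hypothetical default for this sketch)
    beta  : learned shift (hypothetical default for this sketch)
    eps   : small constant added to the variance for numerical stability
    """
    m = len(xs)
    mean = sum(xs) / m                              # mini-batch mean
    var = sum((x - mean) ** 2 for x in xs) / m      # biased mini-batch variance
    x_hat = [(x - mean) / math.sqrt(var + eps) for x in xs]  # normalize
    return [gamma * xh + beta for xh in x_hat]      # scale and shift

ys = batch_norm_1d([1.0, 2.0, 3.0, 4.0])
```

With `gamma=1` and `beta=0` the output has approximately zero mean and unit variance across the mini-batch. Other values of `gamma` and `beta` let the layer recover a different scale and offset if that suits training better.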
Training, Inference, and Practical Considerations for Batch-Normalized Networks: Batch-normalized networks are trained using mini-batch statistics, while inference uses fixed population statistics. The transform adds minimal runtime cost, and training becomes more stable and performs better because the network is less sensitive to the scale of its parameters.
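The split between training and inference can be sketched as below. This is an illustrative sketch under stated assumptions, not a definitive implementation: the class name `BatchNorm1D`, the `momentum` parameter, and the exponential moving-average update are choices made for this example, modeled on how common frameworks track population statistics. The running variance uses the unbiased estimate (factor m / (m - 1)).

```python
import math

class BatchNorm1D:
    """Batch normalization for one scalar activation (hypothetical sketch)."""

    def __init__(self, gamma=1.0, beta=0.0, eps=1e-5, momentum=0.1):
        self.gamma = gamma          # learned scale
        self.beta = beta            # learned shift
        self.eps = eps              # numerical-stability constant
        self.momentum = momentum    # assumed moving-average weight
        self.running_mean = 0.0     # population mean estimate
        self.running_var = 1.0      # population variance estimate

    def forward_train(self, xs):
        """Normalize with mini-batch statistics and update running estimates."""
        m = len(xs)
        mean = sum(xs) / m
        var = sum((x - mean) ** 2 for x in xs) / m
        # Unbiased variance for the population estimate.
        unbiased = var * m / (m - 1) if m > 1 else var
        k = self.momentum
        self.running_mean = (1 - k) * self.running_mean + k * mean
        self.running_var = (1 - k) * self.running_var + k * unbiased
        scale = self.gamma / math.sqrt(var + self.eps)
        return [scale * (x - mean) + self.beta for x in xs]

    def forward_eval(self, x):
        """Normalize one input with the fixed population statistics."""
        scale = self.gamma / math.sqrt(self.running_var + self.eps)
        return scale * (x - self.running_mean) + self.beta

bn = BatchNorm1D(momentum=1.0)      # momentum=1.0 keeps only the last batch
bn.forward_train([1.0, 2.0, 3.0, 4.0])
y = bn.forward_eval(2.5)            # input equal to the stored mean
```

Because `forward_eval` depends only on stored statistics, the output for a given input is deterministic and does not depend on which other examples share the batch. At inference the whole transform is a fixed linear function of the input.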
Experiments, Results on MNIST and ImageNet, and Conclusions: Experiments on MNIST and ImageNet demonstrate that Batch Normalization significantly accelerates training, improves accuracy, and enables training of deeper networks, even with saturating nonlinearities, improving on the best previously published ImageNet classification results.