Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images
This paper shows that state-of-the-art DNNs can output near-100% confidence for images that are completely unrecognizable to humans. The authors generate such images with evolutionary algorithms and gradient ascent. The paper discusses implications for generalization, security, and the divergence between discriminative DNNs and human vision.
Title: Deep neural networks can be fooled by images that are unrecognizable to humans, assigning over 99% confidence to nonsensical inputs.
Methods and Models: The study employs AlexNet and LeNet models, using evolutionary algorithms (direct and CPPN encodings) and gradient ascent to generate images that fool these deep neural networks.
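The gradient ascent idea can be illustrated with a minimal sketch. It assumes a tiny linear softmax classifier as a stand-in for the DNNs, which is a simplification for illustration and not the paper's setup. The names `predict` and `fool_by_gradient_ascent`, the step size, and the random weights are all hypothetical. Starting from noise, the input is moved along the gradient of the log-confidence of a chosen class, and pixels are clipped to the range [0, 1].

```python
import math
import random


def predict(weights, x):
    """Softmax class probabilities of a toy linear classifier (stand-in for a DNN)."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def fool_by_gradient_ascent(weights, target, n_pixels, steps=500, lr=0.05, seed=0):
    """Adjust a noise image so the classifier grows more confident in `target`.

    Returns the final image and a trace of the target-class confidence.
    """
    rng = random.Random(seed)
    x = [rng.random() for _ in range(n_pixels)]  # start from random noise
    trace = []
    for _ in range(steps):
        p = predict(weights, x)
        trace.append(p[target])
        for j in range(n_pixels):
            # For a linear softmax model:
            # d log p[target] / d x[j] = W[target][j] - sum_k p[k] * W[k][j]
            grad_j = weights[target][j] - sum(
                p[k] * weights[k][j] for k in range(len(weights))
            )
            # Gradient step, then clip to the valid pixel range [0, 1].
            x[j] = min(1.0, max(0.0, x[j] + lr * grad_j))
    trace.append(predict(weights, x)[target])
    return x, trace


# Hypothetical toy setup: 3 classes, 16 "pixels", random weights in [-1, 1].
weight_rng = random.Random(1)
toy_weights = [[weight_rng.uniform(-1, 1) for _ in range(16)] for _ in range(3)]
image, trace = fool_by_gradient_ascent(toy_weights, target=2, n_pixels=16)
print("confidence before:", trace[0], "after:", trace[-1])
```

The resulting image is still noise to a human eye, yet the classifier's confidence in the target class rises. With a real DNN the gradient would come from backpropagation rather than the closed form used here.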
Results: Evolved images, particularly those generated by CPPNs, consistently fooled MNIST and ImageNet DNNs with high confidence, often containing discriminative features recognizable by the network but not humans.
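To make the evolutionary search behind these results concrete, here is a rough sketch of a direct-encoding loop. It is a deliberately simplified hill climber that keeps a mutated child whenever the classifier's confidence does not drop. It is not the paper's algorithm, and it does not reproduce the CPPN encoding. The toy linear classifier, the mutation rate, and the names `confidence` and `evolve_fooling_image` are assumptions made for illustration.

```python
import math
import random


def confidence(weights, x, target):
    """Softmax confidence of a toy linear classifier for class `target`."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - top) for z in logits]
    return exps[target] / sum(exps)


def evolve_fooling_image(weights, target, n_pixels, generations=300, rate=0.1, seed=0):
    """Evolve a directly encoded image (one gene per pixel) toward high confidence.

    Returns the best image and the best confidence recorded after each generation.
    """
    rng = random.Random(seed)
    best = [rng.random() for _ in range(n_pixels)]  # random initial image
    best_fit = confidence(weights, best, target)
    history = [best_fit]
    for _ in range(generations):
        # Mutation: resample each pixel independently with probability `rate`.
        child = [rng.random() if rng.random() < rate else v for v in best]
        fit = confidence(weights, child, target)
        # Selection: keep the child only if it is at least as fit as the parent.
        if fit >= best_fit:
            best, best_fit = child, fit
        history.append(best_fit)
    return best, history


# Hypothetical toy setup: 3 classes, 16 "pixels", random weights in [-1, 1].
weight_rng = random.Random(1)
toy_weights = [[weight_rng.uniform(-1, 1) for _ in range(16)] for _ in range(3)]
image, history = evolve_fooling_image(toy_weights, target=1, n_pixels=16)
print("confidence at start:", history[0], "at end:", history[-1])
```

Because fitness is the classifier's own confidence, selection rewards whatever pixel patterns the model responds to, whether or not a human would recognize them.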
Discussion, Implications, and Conclusion: The ease with which DNNs can be fooled highlights a gap between machine and human vision, raising practical concerns and suggesting the need for models that better represent input likelihoods and combine discriminative and generative approaches.