Deep Neural Nets: 33 years ago and 33 years from now

Summary

This article revisits the 1989 paper by Yann LeCun et al., 'Backpropagation Applied to Handwritten Zip Code Recognition,' widely regarded as an early milestone in the practical application of neural networks. The author reproduces the paper's experiments in PyTorch and surveys the progress deep learning has made over the intervening 33 years. The article details the original network's architecture, training process, and experimental results, and compares them against the performance gains delivered by modern hardware and software. The author then improves the original model with modern optimization techniques (e.g., AdamW, data augmentation, dropout), ultimately reducing the error rate by 60%. Finally, the article examines how scaling up the dataset and model size affects performance, and looks ahead to future trends in deep learning, in particular the rise of foundation models and fine-tuning.
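
To make the summary's technical gist concrete, below is a minimal PyTorch sketch of a 1989-style convnet together with the kind of modern tweaks mentioned above (AdamW, dropout, simple shift augmentation). The 16x16 inputs, 12 feature maps, and 30 hidden units follow the original paper, but the dense layer connectivity, the dropout placement, every hyperparameter, and the names TinyConvNet1989, augment_shift, and train_step are illustrative assumptions, not code from the article or its reproduction.

```python
# A minimal, illustrative sketch (not the exact 1989 reproduction): a small
# convnet in the spirit of LeCun et al. (1989) for 16x16 digit images, plus
# the kind of modern tweaks the article mentions (AdamW, dropout, simple
# data augmentation). Hyperparameters here are assumptions chosen for clarity.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyConvNet1989(nn.Module):
    def __init__(self, p_drop: float = 0.25):
        super().__init__()
        # The original H1/H2 layers used 5x5 kernels with stride 2 and a
        # hand-designed sparse connection scheme; dense Conv2d layers are a
        # simplification.
        self.conv1 = nn.Conv2d(1, 12, kernel_size=5, stride=2, padding=2)   # 16x16 -> 8x8
        self.conv2 = nn.Conv2d(12, 12, kernel_size=5, stride=2, padding=2)  # 8x8 -> 4x4
        self.fc1 = nn.Linear(12 * 4 * 4, 30)
        self.fc2 = nn.Linear(30, 10)
        self.drop = nn.Dropout(p_drop)  # modern addition, not in the 1989 net

    def forward(self, x):
        x = torch.tanh(self.conv1(x))
        x = torch.tanh(self.conv2(x))
        x = x.flatten(1)
        x = self.drop(torch.tanh(self.fc1(x)))
        return self.fc2(x)

def augment_shift(images, max_shift=1):
    """Circularly shift each batch by up to `max_shift` pixels (cheap augmentation)."""
    dx, dy = (int(torch.randint(-max_shift, max_shift + 1, (1,))) for _ in range(2))
    return torch.roll(images, shifts=(dy, dx), dims=(2, 3))

model = TinyConvNet1989()
opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-4)

def train_step(images, labels):
    """One optimization step: images are (N, 1, 16, 16), labels are (N,) class ids."""
    model.train()
    logits = model(augment_shift(images))
    loss = F.cross_entropy(logits, labels)  # the 1989 setup used an MSE-style objective instead
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()

# Smoke test with random tensors standing in for the zip-code digits.
x = torch.randn(8, 1, 16, 16)
y = torch.randint(0, 10, (8,))
print(train_step(x, y))
```

The original 1989 setup trained with plain gradient descent, a tanh/MSE-style objective, and no augmentation or dropout; the sketch only shows where the modern substitutions described in the summary would slot in.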