AlexNet

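AlexNet is a convolutional neural network designed by Alex Krizhevsky^[1] and published together with Ilya Sutskever and Krizhevsky's doctoral advisor, Geoffrey Hinton^[2]^[3].

AlexNet competed in the ImageNet Large Scale Visual Recognition Challenge held on September 30, 2012^[4], where it achieved the lowest top-5 error rate, 15.3%, which was 10.8 percentage points lower than that of the runner-up. The main conclusion of the original paper was that the depth of the model was essential to its high performance. That depth made AlexNet computationally expensive, but training on graphics processing units (GPUs) made the computation feasible^[4].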
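To make the metric concrete: under the top-5 criterion, a prediction counts as correct when the true label is among the five classes the model scores highest. The following is a minimal Python sketch of that calculation. The function name `top_k_error` and the toy scores are illustrative assumptions, not the challenge's official evaluation code.

```python
def top_k_error(scores, labels, k=5):
    """Fraction of samples whose true label is NOT among the k highest scores.

    scores: one list of per-class scores per sample (hypothetical data).
    labels: the true class index of each sample.
    """
    if len(scores) != len(labels) or not labels:
        raise ValueError("scores and labels must be non-empty and equally long")
    misses = 0
    for sample_scores, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score.
        ranked = sorted(range(len(sample_scores)),
                        key=lambda c: sample_scores[c], reverse=True)
        if label not in ranked[:k]:
            misses += 1
    return misses / len(labels)


# Toy example with 6 classes and 2 samples (made-up numbers).
toy_scores = [
    [0.10, 0.30, 0.05, 0.20, 0.25, 0.10],  # true class 2 has the lowest score
    [0.50, 0.10, 0.10, 0.10, 0.10, 0.10],  # true class 0 has the highest score
]
toy_labels = [2, 0]
toy_error = top_k_error(toy_scores, toy_labels, k=5)  # one miss out of two
```

In this toy case the first sample is a miss and the second a hit, so the top-5 error is 0.5. A reported figure such as 15.3% is the same ratio computed over the whole evaluation set.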

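Background

AlexNet was not the first fast GPU implementation of a convolutional neural network (CNN) to win an image recognition contest. A CNN on a GPU by K. Chellapilla et al. (2006) was 4 times faster than an equivalent CPU implementation^[5]. A deep CNN by Dan Ciresan et al. (2011) at IDSIA was already 60 times faster^[6] and achieved superhuman performance in August 2011^[7]. Between May 15, 2011 and September 10, 2012, their CNN won no fewer than four image competitions^[8]^[9]. They also substantially improved on the best performance reported in the literature for several image databases^[10].

According to the AlexNet paper^[4], Ciresan's earlier network is "somewhat similar". Both were originally written in CUDA to run with GPU support. In fact, both are variants of the CNN designs introduced by Yann LeCun et al. (1989)^[11]^[12], who applied the backpropagation algorithm to a variant of the "neocognitron", the original CNN architecture proposed by Kunihiko Fukushima^[13]^[14]. The architecture was later modified by the max-pooling method of J. Weng^[15]^[9].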

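Network design

AlexNet contains eight layers. The first five are convolutional layers, some of which are followed by max-pooling layers, and the last three are fully connected layers^[4]. It uses the non-saturating ReLU activation function, which showed better training performance than tanh and sigmoid^[4].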
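The layer structure can be traced with a short shape calculation. The Python sketch below is illustrative only. The kernel sizes, strides, paddings, channel counts and the 227×227 input are the commonly cited single-stream configuration of the network; they are assumptions that this article does not state, and all helper names are hypothetical. The last two functions contrast the gradient of ReLU with that of tanh at a large input, which is what "non-saturating" refers to.

```python
import math


def window_out(size, kernel, stride=1, padding=0):
    """Spatial output size of a convolution or pooling window."""
    return (size + 2 * padding - kernel) // stride + 1


# (name, kind, out_channels, kernel, stride, padding): assumed hyperparameters.
LAYERS = [
    ("conv1", "conv", 96, 11, 4, 0),
    ("pool1", "pool", None, 3, 2, 0),
    ("conv2", "conv", 256, 5, 1, 2),
    ("pool2", "pool", None, 3, 2, 0),
    ("conv3", "conv", 384, 3, 1, 1),
    ("conv4", "conv", 384, 3, 1, 1),
    ("conv5", "conv", 256, 3, 1, 1),
    ("pool5", "pool", None, 3, 2, 0),
]
FC_SIZES = [4096, 4096, 1000]  # assumed widths of the three fully connected layers


def trace_shapes(input_size=227, input_channels=3):
    """Return per-layer output shapes and the flattened size fed to the first FC layer."""
    shapes = []
    size, channels = input_size, input_channels
    for name, kind, out_channels, kernel, stride, padding in LAYERS:
        size = window_out(size, kernel, stride, padding)
        if kind == "conv":
            channels = out_channels  # pooling keeps the channel count
        shapes.append((name, (channels, size, size)))
    flattened = channels * size * size
    for index, width in enumerate(FC_SIZES, start=6):
        shapes.append((f"fc{index}", (width,)))
    return shapes, flattened


def relu_grad(x):
    """Derivative of ReLU: stays at 1 for any positive input."""
    return 1.0 if x > 0 else 0.0


def tanh_grad(x):
    """Derivative of tanh: shrinks toward 0 as |x| grows (saturation)."""
    return 1.0 - math.tanh(x) ** 2


shapes, flattened = trace_shapes()
for name, shape in shapes:
    print(name, shape)
```

Under these assumptions the five convolutional and three fully connected layers give the eight weighted layers described above, with pooling after the first, second and fifth convolutions. At an input of 5.0, `relu_grad` is still 1.0 while `tanh_grad` is below 0.001.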

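Influence

The AlexNet paper is considered one of the most influential in computer vision. It spurred the publication of many more papers that use convolutional neural networks and GPUs to accelerate deep learning^[16]. According to Google Scholar, the AlexNet paper had been cited more than 157,000 times as of mid-2024^[17].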

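Alex Krizhevsky

Alex Krizhevsky (born in Ukraine, raised in Canada) is a computer scientist known for his work on artificial neural networks and deep learning. Shortly after winning the ImageNet 2012 challenge with AlexNet, he and his colleagues sold their startup, DNN Research Inc., to Google^[1]. Krizhevsky left Google in September 2017 after losing interest in the work^[1]. At the company Dessa, Krizhevsky is to advise on and assist with new deep learning techniques^[1]. Many of his papers on machine learning and computer vision are frequently cited by other researchers^[18].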

References

1. Dave Gershgorn. The inside story of how AI got good enough to dominate Silicon Valley. Quartz. 2018-06-18 [2018-10-05]. (Archived from the original on 2019-12-12).
2. The data that transformed AI research—and possibly the world. [2020-01-17]. (Archived from the original on 2017-07-27).
3. ILSVRC2012 Results. [2020-01-17]. (Archived from the original on 2020-01-16).
4. Krizhevsky, Alex; Sutskever, Ilya; Hinton, Geoffrey E. ImageNet classification with deep convolutional neural networks (PDF). Communications of the ACM. 2017-05-24, 60 (6): 84–90 [2020-01-17]. ISSN 0001-0782. doi:10.1145/3065386. (Archived (PDF) from the original on 2017-05-16).
5. Kumar Chellapilla; Sid Puri; Patrice Simard. High Performance Convolutional Neural Networks for Document Processing. Lorette, Guy (ed.). Tenth International Workshop on Frontiers in Handwriting Recognition. Suvisoft. 2006 [2020-01-17]. (Archived from the original on 2020-05-18).
6. Ciresan, Dan; Ueli Meier; Jonathan Masci; Luca M. Gambardella; Jurgen Schmidhuber. Flexible, High Performance Convolutional Neural Networks for Image Classification (PDF). Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence-Volume Volume Two. 2011, 2: 1237–1242 [2013-11-17]. (Archived (PDF) from the original on 2013-11-16).
7. IJCNN 2011 Competition result table. OFFICIAL IJCNN2011 COMPETITION. 2010 [2019-01-14]. (Archived from the original on 2019-01-21).
8. Schmidhuber, Jürgen. History of computer vision contests won by deep CNNs on GPU. 2017-03-17 [2019-01-14]. (Archived from the original on 2018-12-19).
9. Schmidhuber, Jürgen. Deep Learning. Scholarpedia. 2015, 10 (11): 1527–54 [2020-01-17]. CiteSeerX 10.1.1.76.1541. PMID 16764513. doi:10.1162/neco.2006.18.7.1527. (Archived from the original on 2016-04-19).
10. Ciresan, Dan; Meier, Ueli; Schmidhuber, Jürgen. Multi-column deep neural networks for image classification. New York, NY: Institute of Electrical and Electronics Engineers (IEEE). June 2012: 3642–3649. CiteSeerX 10.1.1.300.3283. ISBN 978-1-4673-1226-4. OCLC 812295155. arXiv:1202.2745. doi:10.1109/CVPR.2012.6248110.
11. Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel. Backpropagation Applied to Handwritten Zip Code Recognition. AT&T Bell Laboratories. (Archived copy at the Internet Archive).
12. LeCun, Yann; Léon Bottou; Yoshua Bengio; Patrick Haffner. Gradient-based learning applied to document recognition (PDF). Proceedings of the IEEE. 1998, 86 (11): 2278–2324 [2016-10-07]. CiteSeerX 10.1.1.32.9552. doi:10.1109/5.726791. (Archived from the original (PDF) on 2017-12-15).
13. Fukushima, K. Neocognitron. Scholarpedia. 2007, 2 (1): 1717. doi:10.4249/scholarpedia.1717.
14. Fukushima, Kunihiko. Neocognitron: A Self-organizing Neural Network Model for a Mechanism of Pattern Recognition Unaffected by Shift in Position (PDF). Biological Cybernetics. 1980, 36 (4): 193–202 [2013-11-16]. PMID 7370364. doi:10.1007/BF00344251. (Archived (PDF) from the original on 2014-06-03).
15. Weng, J; Ahuja, N; Huang, TS. Learning recognition and segmentation of 3-D objects from 2-D images. Proc. 4th International Conf. Computer Vision. 1993: 121–128.
16. Deshpande, Adit. The 9 Deep Learning Papers You Need To Know About (Understanding CNNs Part 3). adeshpande3.github.io. [2018-12-04]. (Archived from the original on 2018-11-21).
17. AlexNet paper on Google Scholar.
18. Alex Krizhevsky. Google Scholar Citations. [2020-01-17]. (Archived from the original on 2020-04-17).
