Activation function

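In a computational network, the activation function of a node defines the output of that node given an input or a set of inputs. A standard integrated circuit can be viewed as a digital network of activation functions, each of which is either on (1) or off (0) depending on its input. This resembles the behavior of the linear perceptron in neural networks. However, only nonlinear activation functions allow such networks to solve nontrivial problems with a small number of nodes. In artificial neural networks, this function is also called the transfer function.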
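The sketch below shows a single node computing its output. It assumes the common convention that a node first forms a weighted sum of its inputs plus a bias and then applies the activation function. The function names, the weights, and the bias are illustrative and do not come from the source. With a binary step activation, the chosen weights make the node behave like a digital AND gate, which echoes the on/off circuit analogy above. Swapping in the logistic function gives a smooth, nonlinear output instead.

```python
import math
from typing import Callable, Sequence


def node_output(weights: Sequence[float], bias: float,
                inputs: Sequence[float],
                activation: Callable[[float], float]) -> float:
    """Apply the activation function to the weighted sum of the inputs plus a bias."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return activation(z)


def binary_step(z: float) -> float:
    """On (1.0) for z >= 0, off (0.0) otherwise."""
    return 1.0 if z >= 0 else 0.0


def logistic(z: float) -> float:
    """Smooth, nonlinear alternative to the step function."""
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical weights and bias: with the step activation the node acts like an AND gate.
AND_WEIGHTS = (1.0, 1.0)
AND_BIAS = -1.5

if __name__ == "__main__":
    for a in (0.0, 1.0):
        for b in (0.0, 1.0):
            hard = node_output(AND_WEIGHTS, AND_BIAS, (a, b), binary_step)
            soft = node_output(AND_WEIGHTS, AND_BIAS, (a, b), logistic)
            print(f"inputs=({a:.0f}, {b:.0f})  step={hard:.0f}  logistic={soft:.3f}")
```

The step node outputs 1 only for the input pair (1, 1). The logistic node produces graded values between 0 and 1 for the same weights.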

Single-variable activation functions

Name	Equation	Derivative	Range	Order of continuity^[1]	Monotonic	Derivative monotonic	Approximates identity near the origin
Identity	$f(x)=x$	$f'(x)=1$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes
Binary step	$f(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x\neq 0\\{\text{undefined}}&{\text{for }}x=0\end{cases}}$	$\{0,1\}$	$C^{-1}$	Yes	No	No
Logistic (a type of sigmoid function)	$f(x)=\sigma (x)={\frac {1}{1+e^{-x}}}$ ^[2]	$f'(x)=f(x)(1-f(x))$	$(0,1)$	$C^{\infty }$	Yes	No	No
Hyperbolic tangent	$f(x)=\tanh(x)={\frac {e^{x}-e^{-x}}{e^{x}+e^{-x}}}$	$f'(x)=1-f(x)^{2}$	$(-1,1)$	$C^{\infty }$	Yes	No	Yes
Arctangent	$f(x)=\tan ^{-1}(x)$	$f'(x)={\frac {1}{x^{2}+1}}$	$\left(-{\frac {\pi }{2}},{\frac {\pi }{2}}\right)$	$C^{\infty }$	Yes	No	Yes
Softsign^[1]^[2]	$f(x)={\frac {x}{1+|x|}}$	$f'(x)={\frac {1}{(1+|x|)^{2}}}$	$(-1,1)$	$C^{1}$	Yes	No	Yes
Inverse square root unit (ISRU)^[3]	$f(x)={\frac {x}{\sqrt {1+\alpha x^{2}}}}$	$f'(x)=\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}$	$\left(-{\frac {1}{\sqrt {\alpha }}},{\frac {1}{\sqrt {\alpha }}}\right)$	$C^{\infty }$	Yes	No	Yes
Rectified linear unit (ReLU)	$f(x)={\begin{cases}0&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$[0,\infty )$	$C^{0}$	Yes	Yes	No
Leaky rectified linear unit (Leaky ReLU)	$f(x)={\begin{cases}0.01x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}0.01&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes	Yes	No
Parametric rectified linear unit (PReLU)^[4]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes iff $\alpha \geq 0$	Yes	Yes iff $\alpha =1$
Randomized leaky rectified linear unit (RReLU)^[5]	$f(\alpha ,x)={\begin{cases}\alpha x&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ ^[3]	$f'(\alpha ,x)={\begin{cases}\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	Yes	Yes	No
Exponential linear unit (ELU)^[6]	$f(\alpha ,x)={\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}f(\alpha ,x)+\alpha &{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\alpha ,\infty )$	${\begin{cases}C^{1}&{\text{when }}\alpha =1\\C^{0}&{\text{otherwise}}\end{cases}}$	Yes iff $\alpha \geq 0$	Yes iff $0\leq \alpha \leq 1$	Yes iff $\alpha =1$
Scaled exponential linear unit (SELU)^[7]	$f(\alpha ,x)=\lambda {\begin{cases}\alpha (e^{x}-1)&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$ with $\lambda =1.0507$ and $\alpha =1.67326$	$f'(\alpha ,x)=\lambda {\begin{cases}\alpha e^{x}&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$(-\lambda \alpha ,\infty )$	$C^{0}$	Yes	No	No
S-shaped rectified linear activation unit (SReLU)^[8]	$f_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}t_{l}+a_{l}(x-t_{l})&{\text{for }}x\leq t_{l}\\x&{\text{for }}t_{l}<x<t_{r}\\t_{r}+a_{r}(x-t_{r})&{\text{for }}x\geq t_{r}\end{cases}}$ where $t_{l},a_{l},t_{r},a_{r}$ are parameters	$f'_{t_{l},a_{l},t_{r},a_{r}}(x)={\begin{cases}a_{l}&{\text{for }}x\leq t_{l}\\1&{\text{for }}t_{l}<x<t_{r}\\a_{r}&{\text{for }}x\geq t_{r}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$	No	No	No
Inverse square root linear unit (ISRLU)^[3]	$f(x)={\begin{cases}{\frac {x}{\sqrt {1+\alpha x^{2}}}}&{\text{for }}x<0\\x&{\text{for }}x\geq 0\end{cases}}$	$f'(x)={\begin{cases}\left({\frac {1}{\sqrt {1+\alpha x^{2}}}}\right)^{3}&{\text{for }}x<0\\1&{\text{for }}x\geq 0\end{cases}}$	$\left(-{\frac {1}{\sqrt {\alpha }}},\infty \right)$	$C^{2}$	Yes	Yes	Yes
Adaptive piecewise linear (APL)^[9]	$f(x)=\max(0,x)+\sum _{s=1}^{S}a_{i}^{s}\max(0,-x+b_{i}^{s})$	$f'(x)=H(x)-\sum _{s=1}^{S}a_{i}^{s}H(-x+b_{i}^{s})$ ^[4]	$(-\infty ,\infty )$	$C^{0}$	No	No	No
SoftPlus^[10]	$f(x)=\ln(1+e^{x})$	$f'(x)={\frac {1}{1+e^{-x}}}$	$(0,\infty )$	$C^{\infty }$	Yes	Yes	No
Bent identity	$f(x)={\frac {{\sqrt {x^{2}+1}}-1}{2}}+x$	$f'(x)={\frac {x}{2{\sqrt {x^{2}+1}}}}+1$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes
Sigmoid-weighted linear unit (SiLU)^[11] (also known as Swish^[12])	$f(x)=x\cdot \sigma (x)$ ^[5]	$f'(x)=f(x)+\sigma (x)(1-f(x))$ ^[6]	$[\approx -0.28,\infty )$	$C^{\infty }$	No	No	No
Soft exponential^[13]	$f(\alpha ,x)={\begin{cases}-{\frac {\ln(1-\alpha (x+\alpha ))}{\alpha }}&{\text{for }}\alpha <0\\x&{\text{for }}\alpha =0\\{\frac {e^{\alpha x}-1}{\alpha }}+\alpha &{\text{for }}\alpha >0\end{cases}}$	$f'(\alpha ,x)={\begin{cases}{\frac {1}{1-\alpha (\alpha +x)}}&{\text{for }}\alpha <0\\e^{\alpha x}&{\text{for }}\alpha \geq 0\end{cases}}$	$(-\infty ,\infty )$	$C^{\infty }$	Yes	Yes	Yes iff $\alpha =0$
Sinusoid	$f(x)=\sin(x)$	$f'(x)=\cos(x)$	$[-1,1]$	$C^{\infty }$	No	No	Yes
Sinc	$f(x)={\begin{cases}1&{\text{for }}x=0\\{\frac {\sin(x)}{x}}&{\text{for }}x\neq 0\end{cases}}$	$f'(x)={\begin{cases}0&{\text{for }}x=0\\{\frac {\cos(x)}{x}}-{\frac {\sin(x)}{x^{2}}}&{\text{for }}x\neq 0\end{cases}}$	$[\approx -0.217234,1]$	$C^{\infty }$	No	No	No
Gaussian	$f(x)=e^{-x^{2}}$	$f'(x)=-2xe^{-x^{2}}$	$(0,1]$	$C^{\infty }$	No	No	No
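The derivative column can be checked numerically. The following is a minimal sketch, not taken from any of the cited papers. It transcribes a subset of the rows into plain Python and compares each closed-form derivative with a central finite difference.

A few choices here are illustrative assumptions: the parameter values ($\alpha = 1$ for ISRU, ELU, and ISRLU, and $\alpha = \pm 0.5$ for soft exponential) and the helper names. The SELU constants and the 0.01 leak are the values given in the table. The sample points avoid $x = 0$, where the ReLU-style functions have a kink and the table's derivative value at 0 is a convention.

```python
import math
from typing import Callable, Dict, Tuple

Fn = Callable[[float], float]


def sigmoid(x: float) -> float:
    """Logistic function, arranged so math.exp never overflows for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


# Illustrative parameter choices (assumptions, not prescribed by the table).
ALPHA = 1.0                                 # ISRU, ELU, ISRLU
LEAK = 0.01                                 # Leaky ReLU slope, as in the table
SELU_LAMBDA, SELU_ALPHA = 1.0507, 1.67326   # constants from the SELU row


def elu(x: float) -> float:
    return x if x >= 0 else ALPHA * (math.exp(x) - 1.0)


def soft_exponential(x: float, a: float) -> float:
    if a < 0:
        return -math.log(1.0 - a * (x + a)) / a
    if a == 0:
        return x
    return (math.exp(a * x) - 1.0) / a + a


def soft_exponential_grad(x: float, a: float) -> float:
    if a < 0:
        return 1.0 / (1.0 - a * (a + x))
    return math.exp(a * x)


# name -> (f, f'), each transcribed from the corresponding row of the table.
ACTIVATIONS: Dict[str, Tuple[Fn, Fn]] = {
    "logistic": (sigmoid, lambda x: sigmoid(x) * (1.0 - sigmoid(x))),
    "tanh": (math.tanh, lambda x: 1.0 - math.tanh(x) ** 2),
    "arctan": (math.atan, lambda x: 1.0 / (x * x + 1.0)),
    "softsign": (lambda x: x / (1.0 + abs(x)),
                 lambda x: 1.0 / (1.0 + abs(x)) ** 2),
    "isru": (lambda x: x / math.sqrt(1.0 + ALPHA * x * x),
             lambda x: (1.0 / math.sqrt(1.0 + ALPHA * x * x)) ** 3),
    "relu": (lambda x: max(0.0, x), lambda x: 1.0 if x >= 0 else 0.0),
    "leaky_relu": (lambda x: x if x >= 0 else LEAK * x,
                   lambda x: 1.0 if x >= 0 else LEAK),
    "elu": (elu, lambda x: 1.0 if x >= 0 else elu(x) + ALPHA),
    "selu": (lambda x: SELU_LAMBDA * (x if x >= 0 else SELU_ALPHA * (math.exp(x) - 1.0)),
             lambda x: SELU_LAMBDA * (1.0 if x >= 0 else SELU_ALPHA * math.exp(x))),
    "isrlu": (lambda x: x if x >= 0 else x / math.sqrt(1.0 + ALPHA * x * x),
              lambda x: 1.0 if x >= 0 else (1.0 / math.sqrt(1.0 + ALPHA * x * x)) ** 3),
    # ln(1 + e^x) rewritten as max(x, 0) + ln(1 + e^{-|x|}) to avoid overflow.
    "softplus": (lambda x: max(x, 0.0) + math.log1p(math.exp(-abs(x))), sigmoid),
    "bent_identity": (lambda x: (math.sqrt(x * x + 1.0) - 1.0) / 2.0 + x,
                      lambda x: x / (2.0 * math.sqrt(x * x + 1.0)) + 1.0),
    "silu": (lambda x: x * sigmoid(x),
             lambda x: x * sigmoid(x) + sigmoid(x) * (1.0 - x * sigmoid(x))),
    "soft_exp(-0.5)": (lambda x: soft_exponential(x, -0.5),
                       lambda x: soft_exponential_grad(x, -0.5)),
    "soft_exp(0.5)": (lambda x: soft_exponential(x, 0.5),
                      lambda x: soft_exponential_grad(x, 0.5)),
    "sinc": (lambda x: 1.0 if x == 0 else math.sin(x) / x,
             lambda x: 0.0 if x == 0 else math.cos(x) / x - math.sin(x) / x ** 2),
    "gaussian": (lambda x: math.exp(-x * x),
                 lambda x: -2.0 * x * math.exp(-x * x)),
}


def max_derivative_error(f: Fn, df: Fn,
                         points: Tuple[float, ...] = (-1.3, -0.4, 0.7, 1.9),
                         h: float = 1e-6) -> float:
    """Largest gap between the closed-form derivative and a central difference."""
    return max(abs(df(x) - (f(x + h) - f(x - h)) / (2 * h)) for x in points)


if __name__ == "__main__":
    for name, (f, df) in ACTIVATIONS.items():
        print(f"{name:>16}: max |f' - finite difference| = {max_derivative_error(f, df):.2e}")
```

Every printed error should be far below 1e-5. A larger value would point to a transcription slip in either the table or the code.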

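The approximate lower bounds in the range column for SiLU ($\approx -0.28$) and sinc ($\approx -0.217234$) are the minima of those functions. A small sketch, not drawn from the cited sources, recovers them with a ternary search. The search assumes each function is unimodal on the chosen bracket: $[-5, 0]$ for SiLU and $[\pi, 2\pi]$ for sinc, whose first negative lobe holds its global minimum. The helper name `ternary_min` is hypothetical.

```python
import math
from typing import Callable, Tuple


def sigmoid(x: float) -> float:
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def silu(x: float) -> float:
    return x * sigmoid(x)


def sinc(x: float) -> float:
    return 1.0 if x == 0 else math.sin(x) / x


def ternary_min(f: Callable[[float], float], lo: float, hi: float,
                iters: int = 200) -> Tuple[float, float]:
    """Locate the minimum of a function assumed to be unimodal on [lo, hi]."""
    for _ in range(iters):
        m1 = lo + (hi - lo) / 3
        m2 = hi - (hi - lo) / 3
        if f(m1) < f(m2):
            hi = m2   # the minimum cannot lie in (m2, hi]
        else:
            lo = m1   # the minimum cannot lie in [lo, m1)
    x = (lo + hi) / 2
    return x, f(x)


x_silu, silu_min = ternary_min(silu, -5.0, 0.0)
x_sinc, sinc_min = ternary_min(sinc, math.pi, 2 * math.pi)

if __name__ == "__main__":
    print(f"SiLU minimum {silu_min:.6f} at x = {x_silu:.6f}")  # about -0.278465 at x = -1.278465
    print(f"sinc minimum {sinc_min:.6f} at x = {x_sinc:.6f}")  # about -0.217234 at x = 4.493409
```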
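Notes

^ (Order of continuity column) A function is $C^{0}$ if it is continuous, and $C^{n}$ ($n\geq 1$) if it is $n$ times differentiable and its $n$th derivative is continuous. A function that is $C^{n}$ for every $n$ is called $C^{\infty }$, or smooth. The table uses $C^{-1}$ for a discontinuous function such as the binary step.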

^ (APL row) Here, $H$ is the Heaviside step function.

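^ (RReLU row) $\alpha$ is a random variable drawn from a uniform distribution at training time and fixed to the expected value of that distribution at test time.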

^ (Logistic and SiLU rows) Here, $\sigma$ is the logistic function.
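The RReLU note can be illustrated with a minimal sketch that switches between training and test behavior. The class name, the `training` flag, and the default bounds `lower = 1/8` and `upper = 1/3` are assumptions for illustration. The table only states that $\alpha$ is random during training and fixed to its expected value at test time.

```python
import random
from typing import Optional


class RReLU:
    """Randomized leaky ReLU: f(x) = alpha * x for x < 0 and f(x) = x otherwise.

    Training mode: a fresh alpha ~ Uniform(lower, upper) is drawn for every negative input.
    Test mode: alpha is fixed to the expected value (lower + upper) / 2.
    The default bounds are illustrative assumptions, not values from the table.
    """

    def __init__(self, lower: float = 1 / 8, upper: float = 1 / 3,
                 seed: Optional[int] = None) -> None:
        self.lower = lower
        self.upper = upper
        self.training = True              # hypothetical flag for switching modes
        self._rng = random.Random(seed)

    def __call__(self, x: float) -> float:
        if x >= 0:
            return x
        if self.training:
            alpha = self._rng.uniform(self.lower, self.upper)
        else:
            alpha = (self.lower + self.upper) / 2   # mean of the uniform distribution
        return alpha * x


if __name__ == "__main__":
    act = RReLU(seed=0)
    print([round(act(-1.0), 4) for _ in range(3)])  # random slopes during training
    act.training = False
    print(act(-1.0))   # deterministic: -(1/8 + 1/3) / 2, about -0.229
    print(act(2.0))    # non-negative inputs pass through unchanged
```

Sampling a new $\alpha$ on each call stands in for per-element sampling. A batched implementation would draw one value per element of the input tensor.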

Multivariable activation functions

Name	Equation	Derivatives	Range	Order of continuity
Softmax	$f_{i}({\vec {x}})={\frac {e^{x_{i}}}{\sum _{j=1}^{J}e^{x_{j}}}}$ for $i=1,\ldots ,J$	${\frac {\partial f_{i}({\vec {x}})}{\partial x_{j}}}=f_{i}({\vec {x}})(\delta _{ij}-f_{j}({\vec {x}}))$ ^[7]	$(0,1)$	$C^{\infty }$
Maxout^[14]	$f({\vec {x}})=\max _{i}x_{i}$	${\frac {\partial f}{\partial x_{j}}}={\begin{cases}1&{\text{for }}j={\underset {i}{\operatorname {argmax} }}\,x_{i}\\0&{\text{for }}j\neq {\underset {i}{\operatorname {argmax} }}\,x_{i}\end{cases}}$	$(-\infty ,\infty )$	$C^{0}$

Notes

^ (Softmax row) Here, $\delta _{ij}$ is the Kronecker delta.
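The softmax Jacobian $f_{i}(\delta _{ij}-f_{j})$ and the maxout gradient from the table above can be written out directly, as in the following sketch. Two details are assumptions rather than part of the definitions:

- Subtracting $\max_j x_j$ before exponentiating is a standard numerical safeguard. It leaves the result unchanged because the common factor cancels.
- Ties in `maxout_grad` are broken by taking the first index, since the derivative is not defined when several inputs share the maximum.

All function names are illustrative.

```python
import math
from typing import List, Sequence


def softmax(x: Sequence[float]) -> List[float]:
    """f_i(x) = exp(x_i) / sum_j exp(x_j), computed after subtracting max(x)."""
    m = max(x)
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_jacobian(x: Sequence[float]) -> List[List[float]]:
    """J[i][j] = f_i(x) * (delta_ij - f_j(x)), where delta is the Kronecker delta."""
    f = softmax(x)
    n = len(f)
    return [[f[i] * ((1.0 if i == j else 0.0) - f[j]) for j in range(n)]
            for i in range(n)]


def maxout(x: Sequence[float]) -> float:
    return max(x)


def maxout_grad(x: Sequence[float]) -> List[float]:
    """1 at the argmax position and 0 elsewhere (first index wins on ties)."""
    k = max(range(len(x)), key=lambda i: x[i])
    return [1.0 if j == k else 0.0 for j in range(len(x))]


if __name__ == "__main__":
    x = [1.0, 2.0, 0.5]
    print(softmax(x))          # entries lie in (0, 1) and sum to 1
    jac = softmax_jacobian(x)
    h = 1e-6
    for j in range(len(x)):    # compare column j with a central difference in x_j
        xp = list(x)
        xp[j] += h
        xm = list(x)
        xm[j] -= h
        fd = [(a - b) / (2 * h) for a, b in zip(softmax(xp), softmax(xm))]
        print(j, max(abs(fd[i] - jac[i][j]) for i in range(len(x))))
    print(maxout(x), maxout_grad(x))   # 2.0 [0.0, 1.0, 0.0]
```

Each column of the Jacobian sums to zero, because $\sum_i f_i(\delta_{ij}-f_j) = f_j - f_j\sum_i f_i = 0$. This reflects the fact that the softmax outputs always sum to 1.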

References

[1] Bergstra, James; Desjardins, Guillaume; Lamblin, Pascal; Bengio, Yoshua. "Quadratic polynomials learn better image features". Technical Report 1337. Département d'Informatique et de Recherche Opérationnelle, Université de Montréal. 2009. (Archived from the original on 2018-09-25.)
[2] Glorot, Xavier; Bengio, Yoshua. Understanding the difficulty of training deep feedforward neural networks (PDF). International Conference on Artificial Intelligence and Statistics (AISTATS'10). Society for Artificial Intelligence and Statistics. 2010. (Archived (PDF) from the original on 2017-04-01.)
[3] Carlile, Brad; Delamarter, Guy; Kinney, Paul; Marti, Akiko; Whitney, Brian. Improving Deep Learning by Inverse Square Root Linear Units (ISRLUs). 2017-11-09. arXiv:1710.09967 [cs.LG].
[4] He, Kaiming; Zhang, Xiangyu; Ren, Shaoqing; Sun, Jian. Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification. 2015-02-06. arXiv:1502.01852 [cs.CV].
[5] Xu, Bing; Wang, Naiyan; Chen, Tianqi; Li, Mu. Empirical Evaluation of Rectified Activations in Convolutional Network. 2015-05-04. arXiv:1505.00853 [cs.LG].
[6] Clevert, Djork-Arné; Unterthiner, Thomas; Hochreiter, Sepp. Fast and Accurate Deep Network Learning by Exponential Linear Units (ELUs). 2015-11-23. arXiv:1511.07289 [cs.LG].
[7] Klambauer, Günter; Unterthiner, Thomas; Mayr, Andreas; Hochreiter, Sepp. Self-Normalizing Neural Networks. 2017-06-08. arXiv:1706.02515 [cs.LG].
[8] Jin, Xiaojie; Xu, Chunyan; Feng, Jiashi; Wei, Yunchao; Xiong, Junjun; Yan, Shuicheng. Deep Learning with S-shaped Rectified Linear Activation Units. 2015-12-22. arXiv:1512.07030 [cs.CV].
[9] Agostinelli, Forest; Hoffman, Matthew; Sadowski, Peter; Baldi, Pierre. Learning Activation Functions to Improve Deep Neural Networks. 2014-12-21. arXiv:1412.6830 [cs.NE].
[10] Glorot, Xavier; Bordes, Antoine; Bengio, Yoshua. Deep sparse rectifier neural networks (PDF). International Conference on Artificial Intelligence and Statistics. 2011. (Archived (PDF) from the original on 2018-06-19.)
[11] Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. Retrieved 2018-06-13. (Archived from the original on 2018-06-13.)
[12] Searching for Activation Functions. Retrieved 2018-06-13. (Archived from the original on 2018-06-13.)
[13] Godfrey, Luke B.; Gashler, Michael S. A continuum among logarithmic, linear, and exponential functions, and its potential to improve generalization in neural networks. 7th International Joint Conference on Knowledge Discovery, Knowledge Engineering and Knowledge Management: KDIR. 2016-02-03, 1602: 481–486. Bibcode:2016arXiv160201321G. arXiv:1602.01321.
[14] Goodfellow, Ian J.; Warde-Farley, David; Mirza, Mehdi; Courville, Aaron; Bengio, Yoshua. Maxout Networks. JMLR WCP. 2013-02-18, 28 (3): 1319–1327. Bibcode:2013arXiv1302.4389G. arXiv:1302.4389.