Variational autoencoder

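In machine learning, a variational autoencoder (VAE) is an artificial neural network architecture introduced by Diederik P. Kingma and Max Welling. It belongs to the families of probabilistic graphical models and variational Bayesian methods.^[1]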

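The VAE is related to the autoencoder model because the two have a similar architecture, but they differ significantly in their goals and mathematical formulation. The VAE is a probabilistic generative model in which neural networks are only one component; by function, these networks are divided into an encoder and a decoder. The encoder maps the input variable to a latent space whose values correspond to the parameters of a variational distribution, so that many different samples following the same distribution can be produced. The decoder has roughly the opposite function: it maps from the latent space back to the input space in order to generate data points. Both networks are usually trained together using the reparameterization trick, although the variance of the noise model can also be learned separately.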

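Although this type of model was initially designed for unsupervised learning,^[2]^[3] it has also proven effective for semi-supervised learning^[4]^[5] and supervised learning.^[6]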

Overview of architecture and operation

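A VAE is a generative model with a prior distribution and a noise distribution. Models of this kind are usually trained with the expectation-maximization meta-algorithm. Such a scheme optimizes a lower bound on the data likelihood, because the likelihood itself is usually intractable, and doing so requires finding q-distributions, or variational posteriors. These q-distributions are normally parameterized separately for each data point in an individual optimization process. A VAE instead uses a neural network as an amortized approach to optimize jointly across data points: the network takes the data points themselves as input and outputs the parameters of the variational distribution. Because it maps from a known input space to a low-dimensional latent space, this network is called the encoder.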

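The decoder maps from the latent space back to the input space, for example producing the mean of the noise distribution. A separate neural network that maps to the variance can also be used, but it is usually omitted for simplicity; in that case the variance can be optimized directly by gradient descent.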

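Two terms are needed to optimize the model: the "reconstruction error" and the Kullback–Leibler (KL) divergence. Both come from the free energy expression of the probabilistic model, and therefore differ depending on the noise distribution and the assumed prior of the data (the p-distribution). For example, a standard VAE task such as ImageNet is typically assumed to have Gaussian-distributed noise, whereas tasks such as binarized MNIST require Bernoulli noise. The KL divergence in the free energy expression maximizes the probability mass of the q-distribution that overlaps with the p-distribution, which can unfortunately lead to mode-seeking behaviour. The "reconstruction" term is the remainder of the free energy expression, and its expectation must be computed with a sampling approximation.^[7]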
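To make the dependence on the noise model concrete, the following NumPy sketch computes the reconstruction term as a negative log-likelihood for the two noise models mentioned above. The function names are illustrative and not taken from any library. For the Gaussian case a fixed variance is assumed; with unit variance the term reduces to half the squared error plus a constant, while the Bernoulli case is the binary cross-entropy.

```python
import numpy as np

def gaussian_recon_nll(x, x_mean, sigma=1.0):
    """-ln N(x | x_mean, sigma^2 I), summed over all dimensions of x."""
    x = np.asarray(x, dtype=float)
    x_mean = np.asarray(x_mean, dtype=float)
    d = x.size
    sq_err = np.sum((x - x_mean) ** 2)
    return 0.5 * sq_err / sigma**2 + d * np.log(sigma) + 0.5 * d * np.log(2 * np.pi)

def bernoulli_recon_nll(x, p, eps=1e-7):
    """-ln prod_i p_i^x_i (1 - p_i)^(1 - x_i) for binary x, i.e. binary cross-entropy."""
    x = np.asarray(x, dtype=float)
    p = np.clip(np.asarray(p, dtype=float), eps, 1 - eps)  # avoid log(0)
    return -np.sum(x * np.log(p) + (1 - x) * np.log(1 - p))

# Real-valued data with a Gaussian decoder mean
print(gaussian_recon_nll([0.2, 0.9], [0.1, 1.0]))
# Binarized pixels with Bernoulli decoder probabilities
print(bernoulli_recon_nll([0, 1, 1], [0.1, 0.8, 0.6]))
```

Only the squared-error part of the Gaussian term depends on the decoder output, which is why mean squared error is a common stand-in for it in practice.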

Formulation

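The basic scheme of a VAE. The model receives $x$ as input. The encoder compresses it into the latent space. The decoder receives as input information sampled from the latent space and produces $x'$, which should be as similar as possible to $x$.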

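From the point of view of probabilistic modeling, one wants to maximize the likelihood of the data $x$ under a chosen parameterized probability distribution $p_{\theta }(x)=p(x|\theta )$. This distribution is often taken to be a Gaussian $N(x|\mu ,\sigma )$ with parameters $\mu$ and $\sigma$; as a member of the exponential family, it is easy to work with as a noise distribution. Simple distributions like this are easy to maximize, but if a prior is assumed over a latent variable $z$, the resulting integral can be intractable. Let us find $p_{\theta }(x)$ by marginalizing over $z$: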

$$p_{\theta }(x)=\int _{z}p_{\theta }(x,z)\,dz,$$

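where $p_{\theta }(x,z)$ denotes the joint distribution under $p_{\theta }$ of the observable data $x$ and its latent representation (that is, its encoding) $z$. By the chain rule, the equation can be rewritten as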

$$p_{\theta }(x)=\int _{z}p_{\theta }(x|z)\,p_{\theta }(z)\,dz$$

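In the vanilla VAE, $z$ is usually taken to be a finite-dimensional vector of real numbers, and $p_{\theta }(x|z)$ to be a Gaussian distribution. Then $p_{\theta }(x)$ is a (continuous) mixture of Gaussian distributions.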
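A small numerical illustration of this marginalization, assuming a hypothetical one-dimensional linear decoder for which the mixture can be integrated in closed form: $z\sim N(0,1)$ and $x|z\sim N(az+b,s^{2})$, hence $x\sim N(b,a^{2}+s^{2})$. The sketch estimates $p_{\theta }(x)$ by averaging $p_{\theta }(x|z)$ over samples of $z$ from the prior; all names and numbers are illustrative. With a nonlinear decoder no closed form exists, and this naive estimator generally becomes very inefficient in high dimensions.

```python
import numpy as np

def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian N(mean, var) at x."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mc_marginal_likelihood(x, a, b, s2, n_samples=200_000, seed=0):
    """Estimate p(x) = integral of p(x|z) p(z) dz by averaging p(x|z_k) over z_k ~ N(0, 1)."""
    rng = np.random.default_rng(seed)
    z = rng.standard_normal(n_samples)             # samples from the prior p(z)
    return np.mean(normal_pdf(x, a * z + b, s2))   # each term is one mixture component

# Toy linear decoder (hypothetical): z ~ N(0, 1), x | z ~ N(a z + b, s2), so exactly x ~ N(b, a**2 + s2)
a, b, s2, x = 2.0, 0.5, 0.3, 1.7
estimate = mc_marginal_likelihood(x, a, b, s2)
exact = normal_pdf(x, b, a**2 + s2)
print(estimate, exact)
```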

The relationships between the input data and its latent representation can now be defined as:

Prior $p_{\theta }(z)$
Likelihood $p_{\theta }(x|z)$
Posterior $p_{\theta }(z|x)$

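Unfortunately, computing $p_{\theta }(x)$ is expensive and in most cases intractable. To make the computation feasible, a further function is introduced to approximate the posterior distribution: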

$$q_{\phi }(z|x)\approx p_{\theta }(z|x)$$

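where $\phi$ is the set of real values that parameterize $q$. This is sometimes called amortized inference, because by "investing" in finding a good $q_{\phi }$, one can later infer $z$ from $x$ quickly without computing any integrals.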

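In this way, the problem becomes finding a good probabilistic autoencoder, in which the conditional likelihood distribution $p_{\theta }(x|z)$ is computed by the probabilistic decoder and the approximate posterior distribution $q_{\phi }(z|x)$ is computed by the probabilistic encoder.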

In what follows, the encoder is parameterized as $E_{\phi }$ and the decoder as $D_{\theta }$.
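A minimal sketch of the two parameterized maps, assuming (purely for illustration) single linear layers in NumPy and a diagonal-Gaussian encoder that outputs a mean and a log-variance. Real VAEs use deeper networks, and every name here (`init_params`, `encode`, `decode`) is hypothetical. The same parameters $\phi$ are applied to every input, which is what makes the inference amortized.

```python
import numpy as np

def init_params(x_dim, z_dim, seed=0):
    """Random weights for a hypothetical single-layer encoder (phi) and decoder (theta)."""
    rng = np.random.default_rng(seed)
    phi = {
        "W_mu": rng.normal(0.0, 0.1, (z_dim, x_dim)), "b_mu": np.zeros(z_dim),
        "W_lv": rng.normal(0.0, 0.1, (z_dim, x_dim)), "b_lv": np.zeros(z_dim),
    }
    theta = {"W": rng.normal(0.0, 0.1, (x_dim, z_dim)), "b": np.zeros(x_dim)}
    return phi, theta

def encode(phi, x):
    """E_phi: map a batch x of shape (B, x_dim) to the mean and log-variance of q_phi(z|x)."""
    mu = x @ phi["W_mu"].T + phi["b_mu"]
    log_var = x @ phi["W_lv"].T + phi["b_lv"]
    return mu, log_var

def decode(theta, z):
    """D_theta: map latents z of shape (B, z_dim) to the mean of p_theta(x|z)."""
    return z @ theta["W"].T + theta["b"]

phi, theta = init_params(x_dim=6, z_dim=2)
x_batch = np.random.default_rng(1).normal(size=(5, 6))
mu, log_var = encode(phi, x_batch)  # one shared phi serves every data point
x_mean = decode(theta, mu)
```

Outputting the log-variance rather than the variance is a common design choice that keeps the variance positive without constraining the network output.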

Evidence lower bound (ELBO)

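As in every deep learning problem, a differentiable loss function must be defined in order to update the network weights through backpropagation.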

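For VAEs, the idea is to jointly optimize the generative model parameters $\theta$ and $\phi$ so as to reduce the reconstruction error between the input and the output, and to make $q_{\phi }(z|x)$ as close as possible to $p_{\theta }(z|x)$. Mean squared error and cross-entropy are commonly used as the reconstruction loss.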

As the distance loss between the two distributions, the reverse KL divergence $D_{KL}(q_{\phi }(z|x)\parallel p_{\theta }(z|x))$ is effective at squeezing $q_{\phi }(z|x)$ under $p_{\theta }(z|x)$.^[8]^[9]

The distance loss just defined can be expanded as

$${\begin{aligned}D_{KL}(q_{\phi }(z|x)\parallel p_{\theta }(z|x))&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }(z|x)}{p_{\theta }(z|x)}}\right]\\&=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }(z|x)p_{\theta }(x)}{p_{\theta }(x,z)}}\right]\\&=\ln p_{\theta }(x)+\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {q_{\phi }(z|x)}{p_{\theta }(x,z)}}\right]\end{aligned}}$$
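The identity can be checked numerically in a toy conjugate model where every term is known. The model is an assumption made only for this check: $z\sim N(0,1)$ and $x|z\sim N(az,s^{2})$, so that the exact posterior is Gaussian with variance $v=1/(1+a^{2}/s^{2})$ and mean $v\,ax/s^{2}$, and $\ln p(x)=\ln N(x|0,a^{2}+s^{2})$. The sketch compares the closed-form KL divergence of an arbitrary $q$ with the right-hand side of the last line, whose expectation is estimated by Monte Carlo.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

a, s2, x = 1.5, 0.5, 0.8                  # toy model: z ~ N(0, 1), x | z ~ N(a z, s2)
post_var = 1.0 / (1.0 + a**2 / s2)        # the exact posterior p(z|x) is Gaussian
post_mean = post_var * a * x / s2
log_px = log_normal(x, 0.0, a**2 + s2)    # exact evidence ln p(x)

mq, vq = 0.2, 0.3                         # an arbitrary approximate posterior q(z|x)

# Left-hand side: closed-form KL( N(mq, vq) || N(post_mean, post_var) )
kl_exact = 0.5 * (np.log(post_var / vq) + (vq + (mq - post_mean) ** 2) / post_var - 1.0)

# Right-hand side: ln p(x) + E_q[ ln q(z|x) - ln p(x, z) ], expectation by Monte Carlo
rng = np.random.default_rng(0)
z = mq + np.sqrt(vq) * rng.standard_normal(500_000)
log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, a * z, s2)
rhs = log_px + np.mean(log_normal(z, mq, vq) - log_joint)
print(kl_exact, rhs)
```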

The evidence lower bound (ELBO) is now defined as $L_{\theta ,\phi }(x):=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }(z|x)}}\right]=\ln p_{\theta }(x)-D_{KL}(q_{\phi }(\cdot |x)\parallel p_{\theta }(\cdot |x))$. Maximizing the ELBO, $\theta ^{*},\phi ^{*}={\underset {\theta ,\phi }{\operatorname {argmax} }}\,L_{\theta ,\phi }(x)$, is equivalent to simultaneously maximizing $\ln p_{\theta }(x)$ and minimizing $D_{KL}(q_{\phi }(z|x)\parallel p_{\theta }(z|x))$. That is, it maximizes the log-likelihood of the observed data while minimizing the divergence of the approximate posterior $q_{\phi }(\cdot |x)$ from the exact posterior $p_{\theta }(\cdot |x)$.
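In a toy conjugate model, assumed here as $z\sim N(0,1)$ and $x|z\sim N(az,s^{2})$ so that $\ln p_{\theta }(x)$ and the exact posterior are available in closed form, the ELBO can be estimated by Monte Carlo. The sketch below (names and numbers are illustrative) shows that the bound is tight when $q$ equals the exact posterior, and that any other $q$ gives a value below $\ln p_{\theta }(x)$, with the gap equal to the KL divergence to the exact posterior.

```python
import numpy as np

def log_normal(x, mean, var):
    """Log-density of a univariate Gaussian N(mean, var)."""
    return -0.5 * (np.log(2 * np.pi * var) + (x - mean) ** 2 / var)

def elbo_estimate(x, mq, vq, a, s2, n_samples=400_000, seed=0):
    """Monte Carlo ELBO E_q[ln p(x, z) - ln q(z|x)] for the toy model z ~ N(0,1), x|z ~ N(a z, s2)."""
    rng = np.random.default_rng(seed)
    z = mq + np.sqrt(vq) * rng.standard_normal(n_samples)   # z ~ q(z|x) = N(mq, vq)
    log_joint = log_normal(z, 0.0, 1.0) + log_normal(x, a * z, s2)
    return np.mean(log_joint - log_normal(z, mq, vq))

a, s2, x = 1.5, 0.5, 0.8
log_px = log_normal(x, 0.0, a**2 + s2)    # exact ln p(x)
post_var = 1.0 / (1.0 + a**2 / s2)        # exact posterior variance
post_mean = post_var * a * x / s2         # exact posterior mean

elbo_tight = elbo_estimate(x, post_mean, post_var, a, s2)  # q equal to the true posterior
elbo_loose = elbo_estimate(x, 0.2, 0.3, a, s2)             # a mismatched q
print(log_px, elbo_tight, elbo_loose)
```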

The ELBO in the form given above is not convenient to maximize directly; the following equivalent form is used instead: $L_{\theta ,\phi }(x)=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln p_{\theta }(x|z)\right]-D_{KL}(q_{\phi }(\cdot |x)\parallel p_{\theta }(\cdot ))$, where $\ln p_{\theta }(x|z)$ is implemented as $-{\frac {1}{2}}\|x-D_{\theta }(z)\|_{2}^{2}$, because up to an additive constant this is what $x\sim {\mathcal {N}}(D_{\theta }(z),I)$ yields. That is, the distribution of $x$ conditional on $z$ is modeled as a Gaussian centered on $D_{\theta }(z)$. The distributions $q_{\phi }(z|x)$ and $p_{\theta }(z)$ are usually also chosen to be Gaussian, $z|x\sim {\mathcal {N}}(E_{\phi }(x),\sigma _{\phi }(x)^{2}I)$ and $z\sim {\mathcal {N}}(0,I)$, in which case the closed-form KL divergence between Gaussians gives $L_{\theta ,\phi }(x)=-{\frac {1}{2}}\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\|x-D_{\theta }(z)\|_{2}^{2}\right]-{\frac {1}{2}}\left(N\sigma _{\phi }(x)^{2}+\|E_{\phi }(x)\|_{2}^{2}-2N\ln \sigma _{\phi }(x)\right)+Const$, where $N$ is the dimension of $z$.
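A sketch of the resulting training loss, assuming a diagonal covariance (the isotropic case in the formula above is the special case of equal log-variances) and a decoder passed in as an arbitrary function; the function names and the linear decoder are illustrative only. The KL term uses the closed form $\frac{1}{2}\sum_i(\sigma_i^{2}+\mu_i^{2}-1-\ln\sigma_i^{2})$, whose $-1$ terms are part of the constant absorbed into $Const$ above. The reconstruction expectation is approximated with a single sample $z=\mu+\sigma\epsilon$.

```python
import numpy as np

def kl_to_standard_normal(mu, log_var):
    """Closed-form KL( N(mu, diag(exp(log_var))) || N(0, I) ), summed over latent dimensions."""
    return 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)

def negative_elbo(x, mu, log_var, decoder, rng):
    """One-sample estimate of -ELBO up to an additive constant, assuming x | z ~ N(decoder(z), I)."""
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)  # one draw from q(z|x)
    recon = 0.5 * np.sum((x - decoder(z)) ** 2)                      # -ln p(x|z) without its constant
    return recon + kl_to_standard_normal(mu, log_var)

# Hypothetical linear decoder D_theta(z) = W z, just for the example
W = np.array([[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]])
rng = np.random.default_rng(0)
mu_x, log_var_x = np.array([0.2, -0.1]), np.array([-1.0, -0.5])
loss = negative_elbo(np.array([0.3, -0.1, 0.2]), mu_x, log_var_x, lambda z: W @ z, rng)
print(loss)
```

Minimizing this quantity is the same as maximizing the ELBO, which is how it is typically handed to a gradient-descent optimizer.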

Reparameterization

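The scheme of the reparameterization trick. The random variable $\epsilon$ is injected into the latent space $z$ as an external input, so that the gradient can be backpropagated without involving a stochastic variable in the update.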

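The typical method for efficiently searching for $\theta ^{*},\phi ^{*}={\underset {\theta ,\phi }{\operatorname {argmax} }}\,L_{\theta ,\phi }(x)$ is gradient-based optimization (gradient ascent on the ELBO, or equivalently gradient descent on its negative).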

For $\theta$ the gradient is straightforward: $\nabla _{\theta }\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }(z|x)}}\right]=\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\nabla _{\theta }\ln {\frac {p_{\theta }(x,z)}{q_{\phi }(z|x)}}\right]$. However, for $\nabla _{\phi }\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }(z|x)}}\right]$ the operator $\nabla _{\phi }$ cannot be moved inside the expectation, because $\phi$ appears in the probability distribution itself. The reparameterization trick (also known as stochastic backpropagation^[10]) bypasses this difficulty.^[8]^[11]^[12]

The most important example is when $z\sim q_{\phi }(\cdot |x)$ is normally distributed, as ${\mathcal {N}}(\mu _{\phi }(x),\Sigma _{\phi }(x))$.

The scheme of a VAE after the reparameterization trick

The reparameterization is done by letting $\epsilon \sim {\mathcal {N}}(0,I)$ act as a "standard random number generator" and constructing $z$ as $z=\mu _{\phi }(x)+L_{\phi }(x)\epsilon$. Here $L_{\phi }(x)$ is obtained from the Cholesky decomposition $\Sigma _{\phi }(x)=L_{\phi }(x)L_{\phi }(x)^{T}$. Because the distribution of $\epsilon$ does not depend on $\phi$, we then have $\nabla _{\phi }\mathbb {E} _{z\sim q_{\phi }(\cdot |x)}\left[\ln {\frac {p_{\theta }(x,z)}{q_{\phi }(z|x)}}\right]=\mathbb {E} _{\epsilon }\left[\nabla _{\phi }\ln {\frac {p_{\theta }(x,\mu _{\phi }(x)+L_{\phi }(x)\epsilon )}{q_{\phi }(\mu _{\phi }(x)+L_{\phi }(x)\epsilon |x)}}\right]$. This gives an unbiased estimator of the gradient, so stochastic gradient descent can be applied.
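Both ingredients of the trick can be checked numerically in a short NumPy sketch; the specific $\mu$, $\Sigma$ and test function are arbitrary assumptions. Sampling $z=\mu+L\epsilon$ with a Cholesky factor reproduces the intended mean and covariance, and differentiating through this deterministic map gives an unbiased gradient estimate. For the gradient part a one-dimensional $f(z)=z^{2}$ is used, whose expectation $m^{2}+s^{2}$ has the known derivatives $2m$ and $2s$.

```python
import numpy as np

rng = np.random.default_rng(0)

# Multivariate case: z = mu + L @ eps, with Sigma = L @ L.T from the Cholesky decomposition
mu = np.array([1.0, -0.5])
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])
L = np.linalg.cholesky(Sigma)            # lower-triangular factor
eps = rng.standard_normal((200_000, 2))  # eps ~ N(0, I); does not depend on mu or L
z = mu + eps @ L.T                       # row i equals mu + L @ eps[i]

# Pathwise gradient in one dimension for f(z) = z**2, where E[f(z)] = m**2 + s**2.
# With z = m + s * eps: df/dm = 2 z and df/ds = 2 z * eps, so the sample
# means below estimate the exact gradients 2 m and 2 s.
m, s = 0.7, 1.3
e = rng.standard_normal(200_000)
z1 = m + s * e
grad_m = np.mean(2 * z1)
grad_s = np.mean(2 * z1 * e)
print(np.cov(z, rowvar=False))  # approximately Sigma
print(grad_m, grad_s)           # approximately 1.4 and 2.6
```

In an autodiff framework the same effect is obtained by writing the sample as this deterministic function of $\epsilon$, so the framework can differentiate through it.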

Since $z$ has been reparameterized, $q_{\phi }(z|x)$ must be expressed in terms of $\epsilon$. Let $q_{0}$ be the probability density function of $\epsilon$; then $\ln q_{\phi }(z|x)=\ln q_{0}(\epsilon )-\ln |\det(\partial _{\epsilon }z)|$, where $\partial _{\epsilon }z$ is the Jacobian matrix of $z$ with respect to $\epsilon$. Since $z=\mu _{\phi }(x)+L_{\phi }(x)\epsilon$, this becomes $\ln q_{\phi }(z|x)=-{\frac {1}{2}}\|\epsilon \|^{2}-\ln |\det L_{\phi }(x)|-{\frac {n}{2}}\ln(2\pi )$, where $n$ is the dimension of $z$.
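A quick numerical check of this formula, assuming an arbitrary positive-definite $\Sigma$ chosen only for the test: evaluating the change-of-variables expression from $\epsilon$ and $L$ gives the same value as the standard Gaussian log-density evaluated at $z$. Because $L$ is triangular, $\ln|\det L|$ is simply the sum of the logs of its diagonal entries. The function names are illustrative.

```python
import numpy as np

def log_q_reparam(eps, L):
    """ln q(z|x) from eps by change of variables, for z = mu + L @ eps with L lower-triangular."""
    n = eps.shape[-1]
    log_abs_det_L = np.sum(np.log(np.abs(np.diag(L))))  # det of a triangular matrix = product of its diagonal
    return -0.5 * np.sum(eps**2, axis=-1) - log_abs_det_L - 0.5 * n * np.log(2 * np.pi)

def log_q_direct(z, mu, Sigma):
    """Reference value: the usual multivariate Gaussian log-density ln N(z | mu, Sigma)."""
    n = mu.shape[0]
    diff = z - mu
    _, logdet = np.linalg.slogdet(Sigma)
    return -0.5 * (diff @ np.linalg.solve(Sigma, diff) + logdet + n * np.log(2 * np.pi))

rng = np.random.default_rng(0)
mu = np.array([0.3, -1.0, 2.0])
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)   # an arbitrary positive-definite covariance
L = np.linalg.cholesky(Sigma)
eps = rng.standard_normal(3)
z = mu + L @ eps
print(log_q_reparam(eps, L), log_q_direct(z, mu, Sigma))  # the two values agree
```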

Variations

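Many applications and extensions of VAEs have been developed to adapt the architecture to other domains and to improve its performance.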

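$\beta$-VAE is an implementation with a weighted KL divergence term, used to automatically discover and interpret factorized latent representations. With this implementation, manifold disentanglement can be enforced for values of $\beta$ greater than one. The architecture can discover disentangled latent factors without supervision.^[13]^[14]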
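A sketch of the weighted objective, assuming the same Gaussian choices as in the ELBO section (unit-variance Gaussian decoder, diagonal-Gaussian $q(z|x)$, standard normal prior). The function name is illustrative, $\beta$ is a free hyperparameter (the values below are only examples), and setting $\beta=1$ recovers the ordinary VAE loss.

```python
import numpy as np

def beta_vae_loss(x, x_recon, mu, log_var, beta):
    """Reconstruction term plus beta times KL( N(mu, diag(exp(log_var))) || N(0, I) )."""
    recon = 0.5 * np.sum((x - x_recon) ** 2)
    kl = 0.5 * np.sum(np.exp(log_var) + mu**2 - 1.0 - log_var)
    return recon + beta * kl

x = np.array([0.5, -0.2, 1.0])
x_recon = np.array([0.4, 0.0, 0.9])
mu, log_var = np.array([1.0, 0.0]), np.array([0.0, 0.0])
for beta in (1.0, 4.0):  # beta = 1 is the standard VAE; larger beta weights the KL term more
    print(beta, beta_vae_loss(x, x_recon, mu, log_var, beta))
```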

The conditional VAE (CVAE) inserts label information into the latent space to force a deterministic constrained representation of the learned data.^[15]
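One common way to add label information, assumed here for illustration, is to concatenate a one-hot label to the encoder input and to the latent code fed to the decoder. The toy linear weights and function names below are hypothetical and are not taken from the cited work. At generation time a label is chosen and $z$ is drawn from the prior, so the label constrains what the decoder produces.

```python
import numpy as np

def one_hot(label, n_classes):
    """Encode an integer class label as a one-hot vector."""
    y = np.zeros(n_classes)
    y[label] = 1.0
    return y

def cvae_forward(x, label, params, n_classes, rng):
    """One stochastic pass of a toy conditional VAE; both networks also receive the label."""
    y = one_hot(label, n_classes)
    h = np.concatenate([x, y])                                       # encoder input: data plus label
    mu = params["W_mu"] @ h
    log_var = params["W_lv"] @ h
    z = mu + np.exp(0.5 * log_var) * rng.standard_normal(mu.shape)  # reparameterized sample
    x_recon = params["W_dec"] @ np.concatenate([z, y])               # decoder input: latent plus label
    return x_recon, mu, log_var

def cvae_generate(label, params, n_classes, z_dim, rng):
    """Generate a sample for a chosen label by decoding z ~ N(0, I) together with that label."""
    z = rng.standard_normal(z_dim)
    return params["W_dec"] @ np.concatenate([z, one_hot(label, n_classes)])

x_dim, z_dim, n_classes = 4, 2, 3
rng = np.random.default_rng(0)
params = {
    "W_mu": rng.normal(0.0, 0.1, (z_dim, x_dim + n_classes)),
    "W_lv": rng.normal(0.0, 0.1, (z_dim, x_dim + n_classes)),
    "W_dec": rng.normal(0.0, 0.1, (x_dim, z_dim + n_classes)),
}
x_recon, mu, log_var = cvae_forward(np.ones(x_dim), 1, params, n_classes, rng)
sample = cvae_generate(2, params, n_classes, z_dim, rng)
```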

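Some architectures directly address the quality of the generated samples,^[16]^[17] or implement more than one latent space to further improve representation learning.^[18]^[19]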

Some architectures combine VAEs with generative adversarial networks to obtain hybrid models.^[20]^[21]^[22]


References

[1] Pinheiro Cinelli, Lucas; et al. Variational Autoencoder. Variational Methods for Machine Learning with Applications to Deep Networks. Springer. 2021: 111–149. ISBN 978-3-030-70681-4. S2CID 240802776. doi:10.1007/978-3-030-70679-1_5.
[2] Dilokthanakul, Nat; Mediano, Pedro A. M.; Garnelo, Marta; Lee, Matthew C. H.; Salimbeni, Hugh; Arulkumaran, Kai; Shanahan, Murray. Deep Unsupervised Clustering with Gaussian Mixture Variational Autoencoders. 2017-01-13. arXiv:1611.02648 [cs.LG].
[3] Hsu, Wei-Ning; Zhang, Yu; Glass, James. Unsupervised domain adaptation for robust speech recognition via variational autoencoder-based data augmentation. 2017 IEEE Automatic Speech Recognition and Understanding Workshop (ASRU). December 2017: 16–23 [2023-02-24]. ISBN 978-1-5090-4788-8. S2CID 22681625. arXiv:1707.06265. doi:10.1109/ASRU.2017.8268911. (Archived from the original on 2021-08-28.)
[4] Ehsan Abbasnejad, M.; Dick, Anthony; van den Hengel, Anton. Infinite Variational Autoencoder for Semi-Supervised Learning. 2017: 5888–5897 [2023-02-24]. (Archived from the original on 2021-06-24.)
[5] Xu, Weidi; Sun, Haoze; Deng, Chao; Tan, Ying. Variational Autoencoder for Semi-Supervised Text Classification. Proceedings of the AAAI Conference on Artificial Intelligence. 2017-02-12, 31 (1) [2023-02-24]. S2CID 2060721. doi:10.1609/aaai.v31i1.10966. (Archived from the original on 2021-06-16.)
[6] Kameoka, Hirokazu; Li, Li; Inoue, Shota; Makino, Shoji. Supervised Determined Source Separation with Multichannel Variational Autoencoder. Neural Computation. 2019-09-01, 31 (9): 1891–1914 [2023-02-24]. PMID 31335290. S2CID 198168155. doi:10.1162/neco_a_01217. (Archived from the original on 2021-06-16.)
[7] Kingma, Diederik. Autoencoding Variational Bayes. 2013. arXiv:1312.6114 [stat.ML].
[8] Kingma, Diederik P.; Welling, Max. Auto-Encoding Variational Bayes. 2014-05-01. arXiv:1312.6114 [stat.ML].
[9] From Autoencoder to Beta-VAE. Lil'Log. 2018-08-12 [2023-02-24]. (Archived from the original on 2021-05-14.)
[10] Rezende, Danilo Jimenez; Mohamed, Shakir; Wierstra, Daan. Stochastic Backpropagation and Approximate Inference in Deep Generative Models. International Conference on Machine Learning (PMLR). 2014-06-18: 1278–1286 [2023-02-24]. arXiv:1401.4082. (Archived from the original on 2023-02-24.)
[11] Bengio, Yoshua; Courville, Aaron; Vincent, Pascal. Representation Learning: A Review and New Perspectives. IEEE Transactions on Pattern Analysis and Machine Intelligence. 2013, 35 (8): 1798–1828 [2023-02-24]. ISSN 1939-3539. PMID 23787338. S2CID 393948. arXiv:1206.5538. doi:10.1109/TPAMI.2013.50. (Archived from the original on 2021-06-27.)
[12] Kingma, Diederik P.; Rezende, Danilo J.; Mohamed, Shakir; Welling, Max. Semi-Supervised Learning with Deep Generative Models. 2014-10-31. arXiv:1406.5298 [cs.LG].
[13] Higgins, Irina; Matthey, Loic; Pal, Arka; Burgess, Christopher; Glorot, Xavier; Botvinick, Matthew; Mohamed, Shakir; Lerchner, Alexander. beta-VAE: Learning Basic Visual Concepts with a Constrained Variational Framework. 2016-11-04 [2023-02-24]. (Archived from the original on 2021-07-20.)
[14] Burgess, Christopher P.; Higgins, Irina; Pal, Arka; Matthey, Loic; Watters, Nick; Desjardins, Guillaume; Lerchner, Alexander. Understanding disentangling in β-VAE. 2018-04-10. arXiv:1804.03599 [stat.ML].
[15] Sohn, Kihyuk; Lee, Honglak; Yan, Xinchen. Learning Structured Output Representation using Deep Conditional Generative Models (PDF). 2015-01-01 [2023-02-24]. (Archived (PDF) from the original on 2021-07-09.)
[16] Dai, Bin; Wipf, David. Diagnosing and Enhancing VAE Models. 2019-10-30. arXiv:1903.05789 [cs.LG].
[17] Dorta, Garoe; Vicente, Sara; Agapito, Lourdes; Campbell, Neill D. F.; Simpson, Ivor. Training VAEs Under Structured Residuals. 2018-07-31. arXiv:1804.01050 [stat.ML].
[18] Tomczak, Jakub; Welling, Max. VAE with a VampPrior. International Conference on Artificial Intelligence and Statistics (PMLR). 2018-03-31: 1214–1223 [2023-02-24]. arXiv:1705.07120. (Archived from the original on 2021-06-24.)
[19] Razavi, Ali; Oord, Aaron van den; Vinyals, Oriol. Generating Diverse High-Fidelity Images with VQ-VAE-2. 2019-06-02. arXiv:1906.00446 [cs.LG].
[20] Larsen, Anders Boesen Lindbo; Sønderby, Søren Kaae; Larochelle, Hugo; Winther, Ole. Autoencoding beyond pixels using a learned similarity metric. International Conference on Machine Learning (PMLR). 2016-06-11: 1558–1566 [2023-02-24]. arXiv:1512.09300. (Archived from the original on 2021-05-17.)
[21] Bao, Jianmin; Chen, Dong; Wen, Fang; Li, Houqiang; Hua, Gang. CVAE-GAN: Fine-Grained Image Generation Through Asymmetric Training. 2017. arXiv:1703.10155 [cs.CV].
[22] Gao, Rui; Hou, Xingsong; Qin, Jie; Chen, Jiaxin; Liu, Li; Zhu, Fan; Zhang, Zhao; Shao, Ling. Zero-VAE-GAN: Generating Unseen Features for Generalized and Transductive Zero-Shot Learning. IEEE Transactions on Image Processing. 2020, 29: 3665–3680 [2023-02-24]. Bibcode:2020ITIP...29.3665G. ISSN 1941-0042. PMID 31940538. S2CID 210334032. doi:10.1109/TIP.2020.2964429. (Archived from the original on 2021-06-28.)
