Notes: Reading Attention Is All You Need

阿新 • Published: 2022-01-18

Notes: Attention Is All You Need

Authors: Ashish Vaswani et al., NIPS 2017.

1 Motivation

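This paper replaces RNNs/CNNs with self-attention to build the encoder and decoder, for two reasons:

Compared with an RNN, attention parallelizes much better, because all positions in a sequence can be processed at once.
Compared with an RNN, attention captures long-range dependencies better.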

2 Model

Overall, the model is still a seq2seq architecture, i.e. encoder -> decoder. Figure 1 shows the overall architecture.

2.1 Encoder

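The encoder is a stack of identical layers (N=6 in Figure 1). Each layer has two sub-layers: a Multi-Head Attention layer and a position-wise fully connected (FC) feed-forward layer.

For comparison, in the RNN-based encoder-decoder (Bahdanau, D. et al., ICLR 2015), the alignment model, i.e. the attention score or relevance matrix, is computed from \(s_{i-1}\) and \(h_j\): each is linearly transformed by a parameter matrix and the results are added. Here \(s_{i-1}\) is the hidden state of the decoder (an RNN), so it is produced by the decoder, and \(h_j\) is the hidden state of the encoder, so it is produced by the encoder. In this paper's self-attention, Q, K and V correspond to s, h and h in the RNN attention encoder-decoder, and they are obtained by applying linear maps to the word embeddings of the input sentence. Because this is self-attention, Q comes from the input sentence rather than from the decoder. Similarity is computed with the (scaled) dot product.

To make up for the position information that an RNN encodes implicitly, a positional embedding is added. This is an important research point in its own right. The paper builds the positional embedding from sin and cos functions (I did not fully understand the principle) and combines it with the input embedding by addition. In addition, every sub-layer uses a residual connection followed by layer normalization, i.e. normalization within the layer.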
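Since the sin/cos construction is the part these notes found hardest, here is a minimal dependency-free sketch of it, together with the "add, then layer-normalize" step around a sub-layer. It follows the formulas in the paper [1], PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) and PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)). The function names, the toy sizes and the all-zero embeddings are assumptions made for illustration, and the layer normalization here omits the learned gain and bias.

```python
import math

def sinusoidal_positional_encoding(max_len, d_model):
    """Build a max_len x d_model table of sinusoidal position encodings."""
    table = []
    for pos in range(max_len):
        row = []
        for j in range(d_model):
            # Dimensions 2i and 2i+1 share one frequency: 1 / 10000^(2i/d_model).
            angle = pos / (10000 ** ((j - j % 2) / d_model))
            row.append(math.sin(angle) if j % 2 == 0 else math.cos(angle))
        table.append(row)
    return table

def add_positions(embeddings, pe):
    """Combine by element-wise addition, as the notes describe."""
    return [[e + p for e, p in zip(e_row, p_row)]
            for e_row, p_row in zip(embeddings, pe)]

def layer_norm(vec, eps=1e-6):
    """Normalize one position's vector to zero mean and unit variance."""
    mean = sum(vec) / len(vec)
    var = sum((v - mean) ** 2 for v in vec) / len(vec)
    return [(v - mean) / math.sqrt(var + eps) for v in vec]

def add_and_norm(x, sublayer_out):
    """LayerNorm(x + Sublayer(x)), applied position by position."""
    return [layer_norm([a + b for a, b in zip(x_row, s_row)])
            for x_row, s_row in zip(x, sublayer_out)]

pe = sinusoidal_positional_encoding(max_len=4, d_model=6)
toy_embeddings = [[0.0] * 6 for _ in range(4)]  # hypothetical word embeddings
x = add_positions(toy_embeddings, pe)
normed = add_and_norm(x, x)  # x stands in for a real sub-layer output
```

The motivation given in the paper is that, for any fixed offset k, PE(pos+k) is a linear function of PE(pos), which the authors hypothesized would make it easy for the model to attend by relative position.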

Finally, the output of the encoder (a stack of six identical layers) serves as part of the decoder's input, namely K and V.

2.2 Decoder

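The decoder differs from the encoder in a few ways. First, each stacked layer has one extra sub-layer, the encoder-decoder attention layer. Second, its input is the output embedding, i.e. the decoder's previous outputs. As for the very first input, and for training in general, the input is actually the ground-truth target y with a mask applied; see the references for details.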
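A small sketch of what "the masked ground-truth y" can look like during training, assuming the usual teacher-forcing setup: the decoder input is the target shifted right by one position, and a causal mask stops position i from attending to later positions. The token ids, `bos_id` and both function names are hypothetical.

```python
def shift_right(target_ids, bos_id):
    """Decoder input for training: the target shifted right by one token."""
    return [bos_id] + target_ids[:-1]

def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]

target = [11, 12, 13, 2]  # hypothetical token ids; 2 plays end-of-sentence
decoder_input = shift_right(target, bos_id=1)
mask = causal_mask(len(decoder_input))
```

With this setup, the first decoder input is just the start token, and each later position sees only the ground-truth tokens before it.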

The output of the decoder's self-attention sub-layer is fed into the encoder-decoder attention as Q; similarity is again computed with the dot product.
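The Q/K/V wiring of the encoder-decoder attention can be sketched in plain Python as below. This is a simplified illustration, not the paper's full layer: it assumes a single head and leaves out the learned linear projections, and the toy vectors and names (`encoder_output`, `decoder_states`) are made up.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of scores."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(queries, keys, values):
    """Scaled dot-product attention over lists of vectors."""
    d_k = len(keys[0])
    outputs, all_weights = [], []
    for q in queries:
        # Similarity of this query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k)
                  for k in keys]
        weights = softmax(scores)
        # Output is the attention-weighted sum of the value vectors.
        out = [sum(w * v[c] for w, v in zip(weights, values))
               for c in range(len(values[0]))]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights

# Hypothetical toy data: 3 source positions, 2 target positions, dimension 2.
encoder_output = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
decoder_states = [[1.0, 0.0], [0.0, 2.0]]

# Q comes from the decoder; K and V both come from the encoder output.
context, weights = attention(decoder_states, encoder_output, encoder_output)
```

Calling `attention(x, x, x)` with a single sequence `x` gives the self-attention case, where Q, K and V all come from the same input.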

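2.4 Summary

These are only a few brief points. For details, see the articles and the video listed in the references, which explain everything thoroughly and clearly, so I will not take detailed notes here; they would only repeat what those authors already said. So far I have not put much effort into the code, and some points would probably be clearer after reading it.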

3 Attention

Read the references carefully; once you have worked through it yourself, Figure 2 below is very clear.
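For reference, the two panels of Figure 2 in the paper [1] correspond to the formulas below. The notation is the paper's: \(d_k\) is the key dimension, \(h\) is the number of heads, and the \(W\) matrices are learned projections.

\[
\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V
\]

\[
\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q},\, KW_i^{K},\, VW_i^{V})
\]

The \(1/\sqrt{d_k}\) factor keeps the dot products from growing with the dimension, which would otherwise push the softmax into regions with very small gradients.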

References

[1] Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Łukasz Kaiser, Illia Polosukhin. Attention Is All You Need. NIPS 2017.

[2] Dzmitry Bahdanau, KyungHyun Cho, Yoshua Bengio. Neural Machine Translation by Jointly Learning to Align and Translate. ICLR 2015.

[3] Hung-yi Lee (National Taiwan University). Machine Learning 2021 course: self-attention and transformer. https://www.bilibili.com/video/BV1Xp4y1b7ih?spm_id_from=333.1007.top_right_bar_window_custom_collection.content.click.

[4] 文哥的學習日記. 一步步解析Attention is All You Need! (A step-by-step walkthrough of Attention is All You Need). Jianshu, 2018. https://www.jianshu.com/p/b1030350aadb.

[5] 後青春期的工程師. 《attention is all you need》解讀 (An interpretation of "Attention is all you need"). Zhihu, 2019. https://zhuanlan.zhihu.com/p/34781297.

[6] soccer. Attention注意力機制與self-attention自注意力機制 (The attention mechanism and the self-attention mechanism). Zhihu, 2020. https://zhuanlan.zhihu.com/p/265108616.
