diff --git a/_post_refer_img/TextAnalytics/10-01.png b/_post_refer_img/TextAnalytics/10-01.png
index 93474778a..9e0140a4b
Binary files a/_post_refer_img/TextAnalytics/10-01.png and b/_post_refer_img/TextAnalytics/10-01.png differ
diff --git a/_post_refer_img/TextAnalytics/10-02.png b/_post_refer_img/TextAnalytics/10-02.png
new file mode 100644
index 000000000..1d217880c
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-02.png differ
diff --git a/_post_refer_img/TextAnalytics/10-03.png b/_post_refer_img/TextAnalytics/10-03.png
new file mode 100644
index 000000000..4540a8f05
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-03.png differ
diff --git a/_post_refer_img/TextAnalytics/10-04.png b/_post_refer_img/TextAnalytics/10-04.png
new file mode 100644
index 000000000..5958d2ab2
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-04.png differ
diff --git a/_post_refer_img/TextAnalytics/10-05.png b/_post_refer_img/TextAnalytics/10-05.png
new file mode 100644
index 000000000..c5aeddd5a
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-05.png differ
diff --git a/_posts/BayesianModeling/2024-07-17-MCMC.md b/_posts/BayesianModeling/2024-07-17-MCMC.md
index 3ff4cca1d..f9674b0e4 100644
--- a/_posts/BayesianModeling/2024-07-17-MCMC.md
+++ b/_posts/BayesianModeling/2024-07-17-MCMC.md
@@ -244,6 +244,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.statlect.com/fundamentals-of-statistics/Markov-Chain-Monte-Carlo
\ No newline at end of file
diff --git a/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md b/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md
index 84cc96f9f..bc18a1490 100644
--- a/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md
+++ b/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md
@@ -121,7 +121,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://multithreaded.stitchfix.com/blog/2020/08/05/bandits/
 - https://link.springer.com/article/10.1007/s10489-023-04955-0?fromPaywallRec=false
\ No newline at end of file
diff --git a/_posts/DeepLearning/2024-01-24-Optimizer.md b/_posts/DeepLearning/2024-01-24-Optimizer.md
index 778482284..934844619 100644
--- a/_posts/DeepLearning/2024-01-24-Optimizer.md
+++ b/_posts/DeepLearning/2024-01-24-Optimizer.md
@@ -229,6 +229,6 @@ $$
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/an-intuitive-explanation-of-gradient-descent-83adf68c9c33
\ No newline at end of file
diff --git a/_posts/DeepLearning/2024-01-26-RNN.md b/_posts/DeepLearning/2024-01-26-RNN.md
index 25173d0b4..bce6358af 100644
--- a/_posts/DeepLearning/2024-01-26-RNN.md
+++ b/_posts/DeepLearning/2024-01-26-RNN.md
@@ -113,6 +113,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr
\ No newline at end of file
diff --git a/_posts/ImageAnalytics/2024-08-16-VAE.md b/_posts/ImageAnalytics/2024-08-16-VAE.md
index ed2e9ce73..75a03695a 100644
--- a/_posts/ImageAnalytics/2024-08-16-VAE.md
+++ b/_posts/ImageAnalytics/2024-08-16-VAE.md
@@ -182,6 +182,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://velog.io/@jochedda/%EB%94%A5%EB%9F%AC%EB%8B%9D-Autoencoder-%EA%B0%9C%EB%85%90-%EB%B0%8F-%EC%A2%85%EB%A5%98
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-01-Data_Science.md b/_posts/MachineLearning/2024-01-01-Data_Science.md
index 5b9e4bc6f..175dead25 100644
--- a/_posts/MachineLearning/2024-01-01-Data_Science.md
+++ b/_posts/MachineLearning/2024-01-01-Data_Science.md
@@ -141,7 +141,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.spotfire.com/glossary/what-is-regression-analysis
 - https://www.engati.com/glossary/classification-algorithm
diff --git a/_posts/MachineLearning/2024-01-05-k_NN.md b/_posts/MachineLearning/2024-01-05-k_NN.md
index e748ef570..dc56769be 100644
--- a/_posts/MachineLearning/2024-01-05-k_NN.md
+++ b/_posts/MachineLearning/2024-01-05-k_NN.md
@@ -153,6 +153,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://076923.github.io/posts/Python-opencv-43/
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-06-SVM.md b/_posts/MachineLearning/2024-01-06-SVM.md
index 73135a350..717b7b308 100644
--- a/_posts/MachineLearning/2024-01-06-SVM.md
+++ b/_posts/MachineLearning/2024-01-06-SVM.md
@@ -424,7 +424,7 @@ f(\overrightarrow{q})
 -----
-## 이미지 출처
+## Reference
 - https://velog.io/@shlee0125
 - https://medium.com/@niousha.rf/support-vector-regressor-theory-and-coding-exercise-in-python-ca6a7dfda927
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-07-Clustering.md b/_posts/MachineLearning/2024-01-07-Clustering.md
index 3eeb7d45d..377613c62 100644
--- a/_posts/MachineLearning/2024-01-07-Clustering.md
+++ b/_posts/MachineLearning/2024-01-07-Clustering.md
@@ -143,7 +143,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.scaler.com/topics/supervised-and-unsupervised-learning/
 - https://towardsdatascience.com/a-brief-introduction-to-unsupervised-learning-20db46445283
diff --git a/_posts/MachineLearning/2024-01-08-k_Means.md b/_posts/MachineLearning/2024-01-08-k_Means.md
index 4edac0275..91d1b99f6 100644
--- a/_posts/MachineLearning/2024-01-08-k_Means.md
+++ b/_posts/MachineLearning/2024-01-08-k_Means.md
@@ -91,7 +91,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://ai-times.tistory.com/158
 - https://github.com/pilsung-kang/multivariate-data-analysis/blob/master/09%20Clustering/09-2_K-Means%20Clustering.pdf
diff --git a/_posts/MachineLearning/2024-01-09-DBSCAN.md b/_posts/MachineLearning/2024-01-09-DBSCAN.md
index 31b34c7ab..e353617d5 100644
--- a/_posts/MachineLearning/2024-01-09-DBSCAN.md
+++ b/_posts/MachineLearning/2024-01-09-DBSCAN.md
@@ -87,7 +87,7 @@ $$
 -----
-### 이미지 출처
+### Reference
 - https://ai.plainenglish.io/dbscan-density-based-clustering-aaebd76e2c8c
 - https://journals.sagepub.com/doi/10.1177/1748301817735665
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md b/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md
index 02b81c157..dbb89b916 100644
--- a/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md
+++ b/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md
@@ -103,7 +103,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/hierarchical-clustering-explained-e59b13846da8
 - https://harshsharma1091996.medium.com/hierarchical-clustering-996745fe656b
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-12-PCA.md b/_posts/MachineLearning/2024-01-12-PCA.md
index b2c6ad37f..8d29ff274 100644
--- a/_posts/MachineLearning/2024-01-12-PCA.md
+++ b/_posts/MachineLearning/2024-01-12-PCA.md
@@ -209,6 +209,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/
\ No newline at end of file
diff --git a/_posts/Microeconomics/2019-07-15-What_Micro.md b/_posts/Microeconomics/2019-07-15-What_Micro.md
index 5e98a3916..1e8961c50 100644
--- a/_posts/Microeconomics/2019-07-15-What_Micro.md
+++ b/_posts/Microeconomics/2019-07-15-What_Micro.md
@@ -202,7 +202,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://policonomics.com/supply-and-demand/
 - https://thismatter.com/economics/supply.htm
\ No newline at end of file
diff --git a/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md b/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md
index 0329169e0..72aee18f8 100644
--- a/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md
+++ b/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md
@@ -170,6 +170,6 @@ $$
 -----
-### 이미지 출처
+### Reference
 - https://enotesworld.com/price-budget-line-or-budget-constraint/
\ No newline at end of file
diff --git a/_posts/RecommenderSystem/2024-01-18-What_RecSys.md b/_posts/RecommenderSystem/2024-01-18-What_RecSys.md
index ac115b8ab..a4f090863 100644
--- a/_posts/RecommenderSystem/2024-01-18-What_RecSys.md
+++ b/_posts/RecommenderSystem/2024-01-18-What_RecSys.md
@@ -118,7 +118,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.idownloadblog.com/2016/04/26/youtube-new-homepage-design/
 - https://towardsdatascience.com/essentials-of-recommendation-engines-content-based-and-collaborative-filtering-31521c964922
\ No newline at end of file
diff --git a/_posts/RecommenderSystem/2024-02-01-CF.md b/_posts/RecommenderSystem/2024-02-01-CF.md
index 9359dff98..884903c1c 100644
--- a/_posts/RecommenderSystem/2024-02-01-CF.md
+++ b/_posts/RecommenderSystem/2024-02-01-CF.md
@@ -109,7 +109,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/essentials-of-recommendation-engines-content-based-and-collaborative-filtering-31521c964922
 - https://buomsoo-kim.github.io/recommender%20systems/2020/09/25/Recommender-systems-collab-filtering-12.md/
\ No newline at end of file
diff --git a/_posts/RecommenderSystem/2024-02-15-LFM.md b/_posts/RecommenderSystem/2024-02-15-LFM.md
index f27093964..a7dfbfb64 100644
--- a/_posts/RecommenderSystem/2024-02-15-LFM.md
+++ b/_posts/RecommenderSystem/2024-02-15-LFM.md
@@ -126,7 +126,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/essentials-of-recommendation-engines-content-based-and-collaborative-filtering-31521c964922
 - https://buomsoo-kim.github.io/recommender%20systems/2020/09/25/Recommender-systems-collab-filtering-12.md/
\ No newline at end of file
diff --git a/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md b/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md
index 357d0e6d0..c9cfba0ed 100644
--- a/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md
+++ b/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md
@@ -261,6 +261,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://medium.com/@luvvaggarwal2002/linear-regression-in-machine-learning-9e8af948d3eb
\ No newline at end of file
diff --git a/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md b/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md
index a3d562f98..c83e6a54f 100644
--- a/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md
+++ b/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md
@@ -185,6 +185,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.linkedin.com/pulse/understanding-linear-regression-basics-divyesh-sonar-snv4c/
\ No newline at end of file
diff --git a/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md b/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md
index 1f1f28533..f31614e59 100644
--- a/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md
+++ b/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md
@@ -166,7 +166,7 @@ $$\begin{aligned}
 -----
-### 이미지 출처
+### Reference
 - http://scott.fortmann-roe.com/docs/BiasVariance.html
 - https://github.com/lovit/python_ml_intro
diff --git a/_posts/Statistics/2024-07-01-Statistics.md b/_posts/Statistics/2024-07-01-Statistics.md
index 934d156be..b1166c75c 100644
--- a/_posts/Statistics/2024-07-01-Statistics.md
+++ b/_posts/Statistics/2024-07-01-Statistics.md
@@ -236,7 +236,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://thirdspacelearning.com/gcse-maths/statistics/frequency-table/
 - https://www.jaspersoft.com/articles/what-is-a-bar-chart
diff --git a/_posts/Statistics/2024-07-05-Statistical_Inference.md b/_posts/Statistics/2024-07-05-Statistical_Inference.md
index 59882305c..4abb1bc20 100644
--- a/_posts/Statistics/2024-07-05-Statistical_Inference.md
+++ b/_posts/Statistics/2024-07-05-Statistical_Inference.md
@@ -313,7 +313,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://u5man.medium.com/to-err-is-human-what-the-heck-is-type-i-and-type-ii-error-b2c78190a45c
 - https://wikidocs.net/163986
\ No newline at end of file
diff --git a/_posts/Statistics/2024-07-06-A_B_Test.md b/_posts/Statistics/2024-07-06-A_B_Test.md
index f4e20240a..f79548e4c 100644
--- a/_posts/Statistics/2024-07-06-A_B_Test.md
+++ b/_posts/Statistics/2024-07-06-A_B_Test.md
@@ -325,6 +325,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://varify.io/en/blog/ab-testing/
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-07-29-Regular Expression.md b/_posts/TextAnalytics/2024-07-29-Regular Expression.md
index 7e1eca9d1..e29968e3b 100644
--- a/_posts/TextAnalytics/2024-07-29-Regular Expression.md
+++ b/_posts/TextAnalytics/2024-07-29-Regular Expression.md
@@ -260,6 +260,6 @@ for result in results:
 -----
-### 이미지 출처
+### Reference
 - https://zephyrus1111.tistory.com/310
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-07-31-Word_Representation.md b/_posts/TextAnalytics/2024-07-31-Word_Representation.md
index 36a6a87e0..e0d436bd5 100644
--- a/_posts/TextAnalytics/2024-07-31-Word_Representation.md
+++ b/_posts/TextAnalytics/2024-07-31-Word_Representation.md
@@ -167,7 +167,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://velog.io/@growthmindset/%EC%9B%90-%ED%95%AB-%EC%9D%B8%EC%BD%94%EB%94%A9One-Hot-Encoding
 - https://wikidocs.net/22660
diff --git a/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md b/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md
index 0f47cc1a4..9901d62cf 100644
--- a/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md
+++ b/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md
@@ -176,7 +176,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://intoli.com/blog/pca-and-svd/
 - https://github.com/dvgodoy/dl-visuals/
diff --git a/_posts/TextAnalytics/2024-08-04-Topic_Model.md b/_posts/TextAnalytics/2024-08-04-Topic_Model.md
index aac8d9697..6e899a664 100644
--- a/_posts/TextAnalytics/2024-08-04-Topic_Model.md
+++ b/_posts/TextAnalytics/2024-08-04-Topic_Model.md
@@ -154,6 +154,6 @@
image:
 -----
-### 이미지 출처
+### Reference
 - https://intoli.com/blog/pca-and-svd/
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md b/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md
index e22f78ad7..a118ffd33 100644
--- a/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md
+++ b/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md
@@ -126,6 +126,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://yjjo.tistory.com/35
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-08-06-ATTN.md b/_posts/TextAnalytics/2024-08-06-ATTN.md
index 01b701328..172aaed35 100644
--- a/_posts/TextAnalytics/2024-08-06-ATTN.md
+++ b/_posts/TextAnalytics/2024-08-06-ATTN.md
@@ -180,7 +180,7 @@ $$\begin{aligned}
 -----
-### 이미지 출처
+### Reference
 - https://www.linkedin.com/pulse/what-self-attention-impact-large-language-models-llm-nikhil-goel-srpbc
 - https://newsletter.theaiedge.io/p/the-multi-head-attention-mechanism
diff --git a/_posts/TextAnalytics/2024-08-07-Transformer.md b/_posts/TextAnalytics/2024-08-07-Transformer.md
index 92c7429da..5dcf0344c 100644
--- a/_posts/TextAnalytics/2024-08-07-Transformer.md
+++ b/_posts/TextAnalytics/2024-08-07-Transformer.md
@@ -14,9 +14,146 @@ image:
 ## Attention is all you need
 -----
-- **트랜스포머(Transformer)** : 순환 신경망을 배제하고 어텐션 메커니즘을 전적으로 활용한 기계 번역 아키텍처
+- **Transformer** : a machine-translation architecture that discards the RNN-family layers, which consume sequential data one step at a time, and instead relies entirely on the attention mechanism so that sequential data can be processed in parallel
+
+- **TOTAL ARCHITECTURE** : follows the ENCODER-DECODER structure of SEQ2SEQ
+
+  ![02](/_post_refer_img/TextAnalytics/10-03.png){: width="100%"}
+
+- **CORE TECHS**
+
+  ![03](/_post_refer_img/TextAnalytics/10-02.png){: width="100%"}
+
+  - **Token Embedding** : produces a vector representing the information of each word in the input sentence
+  - **Positional Encoding** : produces a vector representing the positional information of the words in the input sentence
+  - **Multi-Head Self Attention @ Encoder** : in the encoder, learns the relationships among the words of the input sentence to produce context vectors that reflect the role each word plays in the sentence
+  - **Multi-Head Masked Self Attention @ Decoder** : in the decoder, learns the relationships among the words of the output sentence to produce a context vector for each word, masking so that each position can attend only to the preceding positions, which prevents information about later positions from leaking
+  - **Multi-Head Cross Attention @ Decoder** : learns the relationships between each word of the output sentence and the encoder outputs, producing context vectors that reflect which parts of the input sentence matter at each position when generating the output sentence
 
 ## Positional Encoding
 -----
-![01](/_post_refer_img/TextAnalytics/10-01.png){: width="100%"}
\ No newline at end of file
+### Condition
+
+- **Periodicity** : the vector must be able to express **relative positions between words**
+  - It is not the absolute position of a word but the relational patterns induced by relative positions between words that determine the meaning of a sentence
+  - The attention mechanism processes information by letting every word refer to every other word
+  - A periodic function is an efficient way to express the relative positions that determine these relational patterns
+
+- **Continuity** : the vector must take continuous values
+  - If the distance between two word positions is constant, **the distance between their vectors must also be constant**
+  - If two word positions are close to each other, **their vector values must also be similar**
+  - With discontinuous values, the model has difficulty learning relational patterns based on relative position
+
+- **Model Independence** : the vector must provide positional information **in a consistent way**, regardless of how the model is designed
+
+- **Sequence Scale Invariance** : the vector must keep **a constant information resolution** regardless of sequence length
+
+- **Balance with Word Embedding** : the element values of the vector must stay **within a range that keeps the word's semantic information and its positional information in balance**
+  - If the values are too large, the meaning of the word can be distorted
+  - If the values are too small, the positional information can be ignored
+
+### Positional Encoding
+
+- **FUNCTION**
+
+  ![01](/_post_refer_img/TextAnalytics/10-01.png){: width="100%"}
+
+  $$\begin{aligned}
+  PE(POS,2i)&=\sin{\frac{POS}{10000^{2i/d}}}\\
+  PE(POS,2i+1)&=\cos{\frac{POS}{10000^{2i/d}}}
+  \end{aligned}$$
+
+  - $POS$ : position of the word within the sentence
+  - $d$ : dimension of the word embedding vector
+  - $i=0,1,\cdots,\displaystyle\frac{d}{2}-1$ : dimension-pair index of the positional encoding vector
+
+- **PERIODICITY** : the positional encoding vector expresses relations between positions (i.e., shifts in position) as smooth (continuous) rotations
+
+  ![01](/_post_refer_img/TextAnalytics/10-05.png){: width="100%"}
+
+  $$\begin{aligned}
+  \begin{pmatrix}\sin{\frac{POS+K}{10000^{2i/d}}} \\ \cos{\frac{POS+K}{10000^{2i/d}}}\end{pmatrix}
+  = \begin{pmatrix}\cos{\frac{K}{10000^{2i/d}}} & \sin{\frac{K}{10000^{2i/d}}} \\ -\sin{\frac{K}{10000^{2i/d}}} & \cos{\frac{K}{10000^{2i/d}}}\end{pmatrix}
+  \cdot \begin{pmatrix}\sin{\frac{POS}{10000^{2i/d}}} \\ \cos{\frac{POS}{10000^{2i/d}}}\end{pmatrix}
+  \end{aligned}$$
+
+  - $$\displaystyle\begin{pmatrix}\cos{\frac{K}{10000^{2i/d}}} & \sin{\frac{K}{10000^{2i/d}}} \\ -\sin{\frac{K}{10000^{2i/d}}} & \cos{\frac{K}{10000^{2i/d}}}\end{pmatrix}$$ : $2 \times 2$ Rotation Matrix
+  - That is, for each sine/cosine dimension pair $(2i, 2i+1)$, shifting the position by $K$ amounts to a rotation of the pair in vector space by the fixed angle $$\displaystyle\frac{K}{10000^{2i/d}}$$
+
+## Single Layers
+-----
+
+![04](/_post_refer_img/TextAnalytics/10-04.png){: width="100%"}
+
+### Encoder Layer
+
+$$\begin{aligned}
+\mathbf{X}^{(0)}
+&=\text{Token-Embedding}\left(\text{Tokens}\right) + \text{Positional-Encoding}\left(\text{Tokens}\right)\\
+\mathbf{H}^{(k)}
+&=\text{Layer-Norm}\Big[\text{Multi-Head}\left(\mathbf{X}^{(k)}\right) + \mathbf{X}^{(k)}\Big]\\
+\mathbf{Y}^{(k)}
+&=\text{Layer-Norm}\Big[\text{FFN}\left(\mathbf{H}^{(k)}\right) + \mathbf{H}^{(k)}\Big]
+\end{aligned}$$
+
+- $\mathbf{X}$ is the Input of a Single Layer, $\mathbf{Y}$ is the Output of a Single Layer
+  - $$\mathbf{X}^{(k+1)}=\mathbf{Y}^{(k)}$$ : the Input of Encoder Layer $k+1$ is the Output of Encoder Layer $k$
+  - The Input of the Initial Layer $$\mathbf{X}^{(0)}$$ is the Sum of the Token Embedding & Positional Encoding Vectors
+  - The Output of the Final Layer $$\mathbf{Z}=\mathbf{Y}^{(K)}$$ is the Output of the Encoder Module
+
+- $$\text{Multi-Head}\left(\mathbf{X}^{(k)}\right)$$ : Multi-Head Self Attention @ Encoder
+
+- $$\text{FFN}\left(\mathbf{H}^{(k)}\right)$$ : `F`eed-`F`orward `N`etworks @ Encoder
+
+  $$\begin{aligned}
+  \text{FFN}\left(\mathbf{H}^{(k)}\right)
+  &=\mathbf{W}^{(k)}_{2} \cdot \left(\text{ReLU}\left[\mathbf{W}^{(k)}_{1} \cdot \mathbf{H}^{(k)} + \overrightarrow{\mathbf{b}}^{(k)}_{1}\right]\right) + \overrightarrow{\mathbf{b}}^{(k)}_{2}
+  \end{aligned}$$
+
+  - $\mathbf{W}^{(k)}_{1} \in \mathbb{R}^{4d \times d}$ : **Dimension Expansion** to four times the Dimension of the Embedding Vector
+  - $\mathbf{W}^{(k)}_{2} \in \mathbb{R}^{d \times 4d}$ : **Dimension Reduction** back to the Embedding Vector Dimension
+
+### Decoder Layer
+
+$$\begin{aligned}
+\mathcal{X}^{(0)}
+&=\text{Token-Embedding}\left(\text{Tokens}\right) + \text{Positional-Encoding}\left(\text{Tokens}\right)\\
+\mathcal{H}^{(k)}_{1}
+&=\text{Layer-Norm}\Big[\text{Multi-Head}\left(\mathcal{X}^{(k)};\mathcal{M}\right) + \mathcal{X}^{(k)}\Big]\\
+\mathcal{H}^{(k)}_{2}
+&=\text{Layer-Norm}\Big[\text{Multi-Head}\left(\mathcal{H}^{(k)}_{1},\mathbf{Z},\mathbf{Z}\right) + \mathcal{H}^{(k)}_{1}\Big]\\
+\mathcal{Y}^{(k)}
+&=\text{Layer-Norm}\Big[\text{FFN}\left(\mathcal{H}^{(k)}_{2}\right) + \mathcal{H}^{(k)}_{2}\Big]
+\end{aligned}$$
+
+- $\mathcal{X}$ is the Input of a Single Layer, $\mathcal{Y}$ is the Output of a Single Layer
+  - $$\mathcal{X}^{(k+1)}=\mathcal{Y}^{(k)}$$ : the Input of Decoder Layer $k+1$ is the Output of Decoder Layer $k$
+  - The Input of the Initial Layer $$\mathcal{X}^{(0)}$$ is the Sum of the Token Embedding & Positional Encoding Vectors
+  - The Output of the Final Layer $$\mathcal{Z}=\mathcal{Y}^{(K)}$$ is the Output of the Decoder Module
+
+- $$\text{Multi-Head}\left(\mathcal{X}^{(k)};\mathcal{M}\right)$$ : Multi-Head Masked Self Attention @ Decoder
+
+  - $\mathcal{M}$ : Causal Mask (Upper-Triangular Mask)
+
+- $$\text{Multi-Head}\left(\mathcal{H}^{(k)}_{1},\mathbf{Z},\mathbf{Z}\right)$$ : Multi-Head Cross Attention @ Decoder
+
+  - $$\mathbf{Z}$$ : Output of the Encoder Module
+
+- $$\text{FFN}\left(\mathcal{H}^{(k)}_{2}\right)$$ : `F`eed-`F`orward `N`etworks @ Decoder
+
+  $$\begin{aligned}
+  \text{FFN}\left(\mathcal{H}^{(k)}_{2}\right)
+  &=\mathbf{W}^{(k)}_{2} \cdot \left(\text{ReLU}\left[\mathbf{W}^{(k)}_{1} \cdot \mathcal{H}^{(k)}_{2} + \overrightarrow{\mathbf{b}}^{(k)}_{1}\right]\right) + \overrightarrow{\mathbf{b}}^{(k)}_{2}
+  \end{aligned}$$
+
+  - $\mathbf{W}^{(k)}_{1} \in \mathbb{R}^{4d \times d}$ : **Dimension Expansion** to four times the Dimension of the Embedding Vector
+  - $\mathbf{W}^{(k)}_{2} \in \mathbb{R}^{d \times 4d}$ : **Dimension Reduction** back to the Embedding Vector Dimension
+
+-----
+
+### Reference
+
+- https://zeuskwon-ds.tistory.com/88
+- https://bongholee.com/transformer-yoyag-jeongri-2/
+- https://wikidocs.net/162096
\ No newline at end of file
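
The sinusoidal positional-encoding function and its rotation property added in the Transformer post above can be checked numerically. A minimal NumPy sketch (illustrative only, not part of the diff; all function names are my own):

```python
import numpy as np

def positional_encoding(n_pos, d):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
    pos = np.arange(n_pos)[:, None]           # (n_pos, 1)
    idx = np.arange(d // 2)[None, :]          # (1, d/2) pair indices i
    angle = pos / (10000 ** (2 * idx / d))    # (n_pos, d/2)
    pe = np.empty((n_pos, d))
    pe[:, 0::2] = np.sin(angle)               # even dimensions
    pe[:, 1::2] = np.cos(angle)               # odd dimensions
    return pe

pe = positional_encoding(50, 16)

# Rotation property: the (sin, cos) pair at position POS+K equals a fixed
# rotation (by angle K / 10000^(2i/d)) applied to the pair at position POS.
POS, K, i, d = 7, 5, 3, 16
theta = K / (10000 ** (2 * i / d))
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
pair = lambda p: pe[p, 2 * i:2 * i + 2]       # (sin, cos) pair for index i
assert np.allclose(pair(POS + K), R @ pair(POS))
```

Because the rotation matrix depends only on the offset `K` (not on `POS`), relative position is captured by a position-independent linear map, which is exactly the periodicity condition argued in the post.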
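
The causal mask $\mathcal{M}$ used by the decoder's masked self-attention can likewise be illustrated with a small single-head sketch, assuming the standard scaled dot-product formulation $\text{softmax}(QK^{\top}/\sqrt{d_k})V$ (names are my own, not from the post):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    # softmax(QK^T / sqrt(d_k)) V, with masked positions forced to ~zero weight
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, -1e9, scores)      # block masked positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ V, w

T, d = 4, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))

# upper-triangular causal mask: position t may not attend to positions > t
causal = np.triu(np.ones((T, T), dtype=bool), k=1)
out, w = scaled_dot_product_attention(X, X, X, mask=causal)

# every attention weight above the diagonal is zero
assert np.allclose(np.triu(w, k=1), 0)
```

The zero upper triangle of `w` is the masking described in the post: no information about later positions leaks into earlier ones during training.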
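
The post-norm sub-layer pattern $\text{Layer-Norm}[\text{FFN}(\mathbf{H}) + \mathbf{H}]$ with the $d \to 4d \to d$ feed-forward expansion can be sketched in a few lines of NumPy (a toy illustration under the post's equations; weight scales and names are my own):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each token vector across its feature dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn(H, W1, b1, W2, b2):
    # W1: (4d, d) expansion, W2: (d, 4d) reduction, ReLU in between
    return np.maximum(H @ W1.T + b1, 0) @ W2.T + b2

d, T = 8, 5
rng = np.random.default_rng(1)
X = rng.normal(size=(T, d))                       # T token vectors of width d
W1, b1 = rng.normal(size=(4 * d, d)) * 0.1, np.zeros(4 * d)
W2, b2 = rng.normal(size=(d, 4 * d)) * 0.1, np.zeros(d)

# post-norm residual sub-layer as written in the post: LayerNorm(FFN(H) + H)
Y = layer_norm(ffn(X, W1, b1, W2, b2) + X)
assert Y.shape == (T, d)                          # width is preserved
```

The residual addition requires the FFN to map back to width `d`, which is why the two weight matrices must have the transposed shapes `(4d, d)` and `(d, 4d)`.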