diff --git a/_post_refer_img/TextAnalytics/10-01.png b/_post_refer_img/TextAnalytics/10-01.png
index 93474778a..9e0140a4b
Binary files a/_post_refer_img/TextAnalytics/10-01.png and b/_post_refer_img/TextAnalytics/10-01.png differ
diff --git a/_post_refer_img/TextAnalytics/10-02.png b/_post_refer_img/TextAnalytics/10-02.png
new file mode 100644
index 000000000..1d217880c
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-02.png differ
diff --git a/_post_refer_img/TextAnalytics/10-03.png b/_post_refer_img/TextAnalytics/10-03.png
new file mode 100644
index 000000000..4540a8f05
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-03.png differ
diff --git a/_post_refer_img/TextAnalytics/10-04.png b/_post_refer_img/TextAnalytics/10-04.png
new file mode 100644
index 000000000..5958d2ab2
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-04.png differ
diff --git a/_post_refer_img/TextAnalytics/10-05.png b/_post_refer_img/TextAnalytics/10-05.png
new file mode 100644
index 000000000..c5aeddd5a
Binary files /dev/null and b/_post_refer_img/TextAnalytics/10-05.png differ
diff --git a/_posts/BayesianModeling/2024-07-17-MCMC.md b/_posts/BayesianModeling/2024-07-17-MCMC.md
index 3ff4cca1d..f9674b0e4 100644
--- a/_posts/BayesianModeling/2024-07-17-MCMC.md
+++ b/_posts/BayesianModeling/2024-07-17-MCMC.md
@@ -244,6 +244,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.statlect.com/fundamentals-of-statistics/Markov-Chain-Monte-Carlo
\ No newline at end of file
diff --git a/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md b/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md
index 84cc96f9f..bc18a1490 100644
--- a/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md
+++ b/_posts/BayesianModeling/2024-07-20-Multi_Armed_Bandits.md
@@ -121,7 +121,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://multithreaded.stitchfix.com/blog/2020/08/05/bandits/
 - https://link.springer.com/article/10.1007/s10489-023-04955-0?fromPaywallRec=false
\ No newline at end of file
diff --git a/_posts/DeepLearning/2024-01-24-Optimizer.md b/_posts/DeepLearning/2024-01-24-Optimizer.md
index 778482284..934844619 100644
--- a/_posts/DeepLearning/2024-01-24-Optimizer.md
+++ b/_posts/DeepLearning/2024-01-24-Optimizer.md
@@ -229,6 +229,6 @@ $$
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/an-intuitive-explanation-of-gradient-descent-83adf68c9c33
\ No newline at end of file
diff --git a/_posts/DeepLearning/2024-01-26-RNN.md b/_posts/DeepLearning/2024-01-26-RNN.md
index 25173d0b4..bce6358af 100644
--- a/_posts/DeepLearning/2024-01-26-RNN.md
+++ b/_posts/DeepLearning/2024-01-26-RNN.md
@@ -113,6 +113,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://dgkim5360.tistory.com/entry/understanding-long-short-term-memory-lstm-kr
\ No newline at end of file
diff --git a/_posts/ImageAnalytics/2024-08-16-VAE.md b/_posts/ImageAnalytics/2024-08-16-VAE.md
index ed2e9ce73..75a03695a 100644
--- a/_posts/ImageAnalytics/2024-08-16-VAE.md
+++ b/_posts/ImageAnalytics/2024-08-16-VAE.md
@@ -182,6 +182,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://velog.io/@jochedda/%EB%94%A5%EB%9F%AC%EB%8B%9D-Autoencoder-%EA%B0%9C%EB%85%90-%EB%B0%8F-%EC%A2%85%EB%A5%98
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-01-Data_Science.md b/_posts/MachineLearning/2024-01-01-Data_Science.md
index 5b9e4bc6f..175dead25 100644
--- a/_posts/MachineLearning/2024-01-01-Data_Science.md
+++ b/_posts/MachineLearning/2024-01-01-Data_Science.md
@@ -141,7 +141,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.spotfire.com/glossary/what-is-regression-analysis
 - https://www.engati.com/glossary/classification-algorithm
diff --git a/_posts/MachineLearning/2024-01-05-k_NN.md b/_posts/MachineLearning/2024-01-05-k_NN.md
index e748ef570..dc56769be 100644
--- a/_posts/MachineLearning/2024-01-05-k_NN.md
+++ b/_posts/MachineLearning/2024-01-05-k_NN.md
@@ -153,6 +153,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://076923.github.io/posts/Python-opencv-43/
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-06-SVM.md b/_posts/MachineLearning/2024-01-06-SVM.md
index 73135a350..717b7b308 100644
--- a/_posts/MachineLearning/2024-01-06-SVM.md
+++ b/_posts/MachineLearning/2024-01-06-SVM.md
@@ -424,7 +424,7 @@ f(\overrightarrow{q})
 -----
-## 이미지 출처
+## Reference
 - https://velog.io/@shlee0125
 - https://medium.com/@niousha.rf/support-vector-regressor-theory-and-coding-exercise-in-python-ca6a7dfda927
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-07-Clustering.md b/_posts/MachineLearning/2024-01-07-Clustering.md
index 3eeb7d45d..377613c62 100644
--- a/_posts/MachineLearning/2024-01-07-Clustering.md
+++ b/_posts/MachineLearning/2024-01-07-Clustering.md
@@ -143,7 +143,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.scaler.com/topics/supervised-and-unsupervised-learning/
 - https://towardsdatascience.com/a-brief-introduction-to-unsupervised-learning-20db46445283
diff --git a/_posts/MachineLearning/2024-01-08-k_Means.md b/_posts/MachineLearning/2024-01-08-k_Means.md
index 4edac0275..91d1b99f6 100644
--- a/_posts/MachineLearning/2024-01-08-k_Means.md
+++ b/_posts/MachineLearning/2024-01-08-k_Means.md
@@ -91,7 +91,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://ai-times.tistory.com/158
 - https://github.com/pilsung-kang/multivariate-data-analysis/blob/master/09%20Clustering/09-2_K-Means%20Clustering.pdf
diff --git a/_posts/MachineLearning/2024-01-09-DBSCAN.md b/_posts/MachineLearning/2024-01-09-DBSCAN.md
index 31b34c7ab..e353617d5 100644
--- a/_posts/MachineLearning/2024-01-09-DBSCAN.md
+++ b/_posts/MachineLearning/2024-01-09-DBSCAN.md
@@ -87,7 +87,7 @@ $$
 -----
-### 이미지 출처
+### Reference
 - https://ai.plainenglish.io/dbscan-density-based-clustering-aaebd76e2c8c
 - https://journals.sagepub.com/doi/10.1177/1748301817735665
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md b/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md
index 02b81c157..dbb89b916 100644
--- a/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md
+++ b/_posts/MachineLearning/2024-01-10-Hierarchical_Clustering.md
@@ -103,7 +103,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/hierarchical-clustering-explained-e59b13846da8
 - https://harshsharma1091996.medium.com/hierarchical-clustering-996745fe656b
\ No newline at end of file
diff --git a/_posts/MachineLearning/2024-01-12-PCA.md b/_posts/MachineLearning/2024-01-12-PCA.md
index b2c6ad37f..8d29ff274 100644
--- a/_posts/MachineLearning/2024-01-12-PCA.md
+++ b/_posts/MachineLearning/2024-01-12-PCA.md
@@ -209,6 +209,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - http://alexhwilliams.info/itsneuronalblog/2016/03/27/pca/
\ No newline at end of file
diff --git a/_posts/Microeconomics/2019-07-15-What_Micro.md b/_posts/Microeconomics/2019-07-15-What_Micro.md
index 5e98a3916..1e8961c50 100644
--- a/_posts/Microeconomics/2019-07-15-What_Micro.md
+++ b/_posts/Microeconomics/2019-07-15-What_Micro.md
@@ -202,7 +202,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://policonomics.com/supply-and-demand/
 - https://thismatter.com/economics/supply.htm
\ No newline at end of file
diff --git a/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md b/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md
index 0329169e0..72aee18f8 100644
--- a/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md
+++ b/_posts/Microeconomics/2019-07-16-Consumer_Theory_1.md
@@ -170,6 +170,6 @@ $$
 -----
-### 이미지 출처
+### Reference
 - https://enotesworld.com/price-budget-line-or-budget-constraint/
\ No newline at end of file
diff --git a/_posts/RecommenderSystem/2024-01-18-What_RecSys.md b/_posts/RecommenderSystem/2024-01-18-What_RecSys.md
index ac115b8ab..a4f090863 100644
--- a/_posts/RecommenderSystem/2024-01-18-What_RecSys.md
+++ b/_posts/RecommenderSystem/2024-01-18-What_RecSys.md
@@ -118,7 +118,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.idownloadblog.com/2016/04/26/youtube-new-homepage-design/
 - https://towardsdatascience.com/essentials-of-recommendation-engines-content-based-and-collaborative-filtering-31521c964922
\ No newline at end of file
diff --git a/_posts/RecommenderSystem/2024-02-01-CF.md b/_posts/RecommenderSystem/2024-02-01-CF.md
index 9359dff98..884903c1c 100644
--- a/_posts/RecommenderSystem/2024-02-01-CF.md
+++ b/_posts/RecommenderSystem/2024-02-01-CF.md
@@ -109,7 +109,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/essentials-of-recommendation-engines-content-based-and-collaborative-filtering-31521c964922
 - https://buomsoo-kim.github.io/recommender%20systems/2020/09/25/Recommender-systems-collab-filtering-12.md/
\ No newline at end of file
diff --git a/_posts/RecommenderSystem/2024-02-15-LFM.md b/_posts/RecommenderSystem/2024-02-15-LFM.md
index f27093964..a7dfbfb64 100644
--- a/_posts/RecommenderSystem/2024-02-15-LFM.md
+++ b/_posts/RecommenderSystem/2024-02-15-LFM.md
@@ -126,7 +126,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://towardsdatascience.com/essentials-of-recommendation-engines-content-based-and-collaborative-filtering-31521c964922
 - https://buomsoo-kim.github.io/recommender%20systems/2020/09/25/Recommender-systems-collab-filtering-12.md/
\ No newline at end of file
diff --git a/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md b/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md
index 357d0e6d0..c9cfba0ed 100644
--- a/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md
+++ b/_posts/RegressionAnalysis/2024-07-09-Regression_Coefficient_Estimation.md
@@ -261,6 +261,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://medium.com/@luvvaggarwal2002/linear-regression-in-machine-learning-9e8af948d3eb
\ No newline at end of file
diff --git a/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md b/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md
index a3d562f98..c83e6a54f 100644
--- a/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md
+++ b/_posts/RegressionAnalysis/2024-07-10-Multiple_Linear_Regression_Analysis.md
@@ -185,6 +185,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://www.linkedin.com/pulse/understanding-linear-regression-basics-divyesh-sonar-snv4c/
\ No newline at end of file
diff --git a/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md b/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md
index 1f1f28533..f31614e59 100644
--- a/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md
+++ b/_posts/RegressionAnalysis/2024-07-13-Improvement_of_OLS.md
@@ -166,7 +166,7 @@ $$\begin{aligned}
 -----
-### 이미지 출처
+### Reference
 - http://scott.fortmann-roe.com/docs/BiasVariance.html
 - https://github.com/lovit/python_ml_intro
diff --git a/_posts/Statistics/2024-07-01-Statistics.md b/_posts/Statistics/2024-07-01-Statistics.md
index 934d156be..b1166c75c 100644
--- a/_posts/Statistics/2024-07-01-Statistics.md
+++ b/_posts/Statistics/2024-07-01-Statistics.md
@@ -236,7 +236,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://thirdspacelearning.com/gcse-maths/statistics/frequency-table/
 - https://www.jaspersoft.com/articles/what-is-a-bar-chart
diff --git a/_posts/Statistics/2024-07-05-Statistical_Inference.md b/_posts/Statistics/2024-07-05-Statistical_Inference.md
index 59882305c..4abb1bc20 100644
--- a/_posts/Statistics/2024-07-05-Statistical_Inference.md
+++ b/_posts/Statistics/2024-07-05-Statistical_Inference.md
@@ -313,7 +313,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://u5man.medium.com/to-err-is-human-what-the-heck-is-type-i-and-type-ii-error-b2c78190a45c
 - https://wikidocs.net/163986
\ No newline at end of file
diff --git a/_posts/Statistics/2024-07-06-A_B_Test.md b/_posts/Statistics/2024-07-06-A_B_Test.md
index f4e20240a..f79548e4c 100644
--- a/_posts/Statistics/2024-07-06-A_B_Test.md
+++ b/_posts/Statistics/2024-07-06-A_B_Test.md
@@ -325,6 +325,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://varify.io/en/blog/ab-testing/
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-07-29-Regular Expression.md b/_posts/TextAnalytics/2024-07-29-Regular Expression.md
index 7e1eca9d1..e29968e3b 100644
--- a/_posts/TextAnalytics/2024-07-29-Regular Expression.md
+++ b/_posts/TextAnalytics/2024-07-29-Regular Expression.md
@@ -260,6 +260,6 @@ for result in results:
 -----
-### 이미지 출처
+### Reference
 - https://zephyrus1111.tistory.com/310
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-07-31-Word_Representation.md b/_posts/TextAnalytics/2024-07-31-Word_Representation.md
index 36a6a87e0..e0d436bd5 100644
--- a/_posts/TextAnalytics/2024-07-31-Word_Representation.md
+++ b/_posts/TextAnalytics/2024-07-31-Word_Representation.md
@@ -167,7 +167,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://velog.io/@growthmindset/%EC%9B%90-%ED%95%AB-%EC%9D%B8%EC%BD%94%EB%94%A9One-Hot-Encoding
 - https://wikidocs.net/22660
diff --git a/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md b/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md
index 0f47cc1a4..9901d62cf 100644
--- a/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md
+++ b/_posts/TextAnalytics/2024-08-01-WORD2VEC_Improvements.md
@@ -176,7 +176,7 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://intoli.com/blog/pca-and-svd/
 - https://github.com/dvgodoy/dl-visuals/
diff --git a/_posts/TextAnalytics/2024-08-04-Topic_Model.md b/_posts/TextAnalytics/2024-08-04-Topic_Model.md
index aac8d9697..6e899a664 100644
--- a/_posts/TextAnalytics/2024-08-04-Topic_Model.md
+++ b/_posts/TextAnalytics/2024-08-04-Topic_Model.md
@@ -154,6 +154,6 @@
image:
 -----
-### 이미지 출처
+### Reference
 - https://intoli.com/blog/pca-and-svd/
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md b/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md
index e22f78ad7..a118ffd33 100644
--- a/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md
+++ b/_posts/TextAnalytics/2024-08-05-SEQ2SEQ.md
@@ -126,6 +126,6 @@ image:
 -----
-### 이미지 출처
+### Reference
 - https://yjjo.tistory.com/35
\ No newline at end of file
diff --git a/_posts/TextAnalytics/2024-08-06-ATTN.md b/_posts/TextAnalytics/2024-08-06-ATTN.md
index 01b701328..172aaed35 100644
--- a/_posts/TextAnalytics/2024-08-06-ATTN.md
+++ b/_posts/TextAnalytics/2024-08-06-ATTN.md
@@ -180,7 +180,7 @@ $$\begin{aligned}
 -----
-### 이미지 출처
+### Reference
 - https://www.linkedin.com/pulse/what-self-attention-impact-large-language-models-llm-nikhil-goel-srpbc
 - https://newsletter.theaiedge.io/p/the-multi-head-attention-mechanism
diff --git a/_posts/TextAnalytics/2024-08-07-Transformer.md b/_posts/TextAnalytics/2024-08-07-Transformer.md
index 92c7429da..5dcf0344c 100644
--- a/_posts/TextAnalytics/2024-08-07-Transformer.md
+++ b/_posts/TextAnalytics/2024-08-07-Transformer.md
@@ -14,9 +14,146 @@ image:
 ## Attention is all you need
 -----
-- **트랜스포머(Transformer)** : 순환 신경망을 배제하고 어텐션 메커니즘을 전적으로 활용한 기계 번역 아키텍처
+- **Transformer** : a machine-translation architecture that discards the RNN-family layers, which consume sequential data one step at a time, and instead relies entirely on the attention mechanism so that sequential data can be processed in parallel
+
+- **TOTAL ARCHITECTURE** : follows the ENCODER-DECODER structure of SEQ2SEQ
+
+  ![02](/_post_refer_img/TextAnalytics/10-03.png){: width="100%"}
+
+- **CORE TECHS**
+
+  ![03](/_post_refer_img/TextAnalytics/10-02.png){: width="100%"}
+
+  - **Token Embedding** : produces a vector representing the information of each word in the input sentence
+  - **Positional Encoding** : produces a vector representing the positional information of the words in the input sentence
+  - **Multi-Head Self Attention @ Encoder** : in the encoder, learns the relationships among the words of the input sentence to produce context vectors that reflect the role each word plays in the sentence
+  - **Multi-Head Masked Self Attention @ Decoder** : in the decoder, learns the relationships among the words of the output sentence to produce a context vector for each word, masking so that each position can attend only to the preceding positions, which prevents information about later positions from leaking
+  - **Multi-Head Cross Attention @ Decoder** : learns the relationships between each word of the output sentence and the encoder outputs, producing context vectors that reflect which parts of the input sentence matter at each position when generating the output sentence
 
 ## Positional Encoding
 -----
-![01](/_post_refer_img/TextAnalytics/10-01.png){: width="100%"}
\ No newline at end of file
+### Condition
+
+- **Periodicity** : the vector must be able to express **relative positions between words**
+  - It is not the absolute position of a word but the relational patterns induced by relative positions between words that determine the meaning of a sentence
+  - The attention mechanism processes information by letting every word refer to every other word
+  - A periodic function is an efficient way to express the relative positions that determine these relational patterns
+
+- **Continuity** : the vector must take continuous values
+  - If the distance between two word positions is constant, **the distance between their vectors must also be constant**
+  - If two word positions are close to each other, **their vector values must also be similar**
+  - With discontinuous values, the model has difficulty learning relational patterns based on relative position
+
+- **Model Independence** : the vector must provide positional information **in a consistent way**, regardless of how the model is designed
+
+- **Sequence Scale Invariance** : the vector must keep **a constant information resolution** regardless of sequence length
+
+- **Balance with Word Embedding** : the element values of the vector must stay **within a range that keeps the word's semantic information and its positional information in balance**
+  - If the values are too large, the meaning of the word can be distorted
+  - If the values are too small, the positional information can be ignored
+
+### Positional Encoding
+
+- **FUNCTION**
+
+  ![01](/_post_refer_img/TextAnalytics/10-01.png){: width="100%"}
+
+  $$\begin{aligned}
+  PE(POS,2i)&=\sin{\frac{POS}{10000^{2i/d}}}\\
+  PE(POS,2i+1)&=\cos{\frac{POS}{10000^{2i/d}}}
+  \end{aligned}$$
+
+  - $POS$ : position of the word within the sentence
+  - $d$ : dimension of the word embedding vector
+  - $i=0,1,\cdots,\displaystyle\frac{d}{2}-1$ : dimension-pair index of the positional encoding vector
+
+- **PERIODICITY** : the positional encoding vector expresses relations between positions (i.e., shifts in position) as smooth (continuous) rotations
+
+  ![01](/_post_refer_img/TextAnalytics/10-05.png){: width="100%"}
+
+  $$\begin{aligned}
+  \begin{pmatrix}\sin{\frac{POS+K}{10000^{2i/d}}} \\ \cos{\frac{POS+K}{10000^{2i/d}}}\end{pmatrix}
+  = \begin{pmatrix}\cos{\frac{K}{10000^{2i/d}}} & \sin{\frac{K}{10000^{2i/d}}} \\ -\sin{\frac{K}{10000^{2i/d}}} & \cos{\frac{K}{10000^{2i/d}}}\end{pmatrix}
+  \cdot \begin{pmatrix}\sin{\frac{POS}{10000^{2i/d}}} \\ \cos{\frac{POS}{10000^{2i/d}}}\end{pmatrix}
+  \end{aligned}$$
+
+  - $$\displaystyle\begin{pmatrix}\cos{\frac{K}{10000^{2i/d}}} & \sin{\frac{K}{10000^{2i/d}}} \\ -\sin{\frac{K}{10000^{2i/d}}} & \cos{\frac{K}{10000^{2i/d}}}\end{pmatrix}$$ : $2 \times 2$ Rotation Matrix
+  - That is, for each sine/cosine dimension pair $(2i, 2i+1)$, shifting the position by $K$ amounts to a rotation of the pair in vector space by the fixed angle $$\displaystyle\frac{K}{10000^{2i/d}}$$
+
+## Single Layers
+-----
+
+![04](/_post_refer_img/TextAnalytics/10-04.png){: width="100%"}
+
+### Encoder Layer
+
+$$\begin{aligned}
+\mathbf{X}^{(0)}
+&=\text{Token-Embedding}\left(\text{Tokens}\right) + \text{Positional-Encoding}\left(\text{Tokens}\right)\\
+\mathbf{H}^{(k)}
+&=\text{Layer-Norm}\Big[\text{Multi-Head}\left(\mathbf{X}^{(k)}\right) + \mathbf{X}^{(k)}\Big]\\
+\mathbf{Y}^{(k)}
+&=\text{Layer-Norm}\Big[\text{FFN}\left(\mathbf{H}^{(k)}\right) + \mathbf{H}^{(k)}\Big]
+\end{aligned}$$
+
+- $\mathbf{X}$ is the Input of a Single Layer, $\mathbf{Y}$ is the Output of a Single Layer
+  - $$\mathbf{X}^{(k+1)}=\mathbf{Y}^{(k)}$$ : the Input of Encoder Layer $k+1$ is the Output of Encoder Layer $k$
+  - The Input of the Initial Layer $$\mathbf{X}^{(0)}$$ is the Sum of the Token Embedding & Positional Encoding Vectors
+  - The Output of the Final Layer $$\mathbf{Z}=\mathbf{Y}^{(K)}$$ is the Output of the Encoder Module
+
+- $$\text{Multi-Head}\left(\mathbf{X}^{(k)}\right)$$ : Multi-Head Self Attention @ Encoder
+
+- $$\text{FFN}\left(\mathbf{H}^{(k)}\right)$$ : `F`eed-`F`orward `N`etworks @ Encoder
+
+  $$\begin{aligned}
+  \text{FFN}\left(\mathbf{H}^{(k)}\right)
+  &=\mathbf{W}^{(k)}_{2} \cdot \left(\text{ReLU}\left[\mathbf{W}^{(k)}_{1} \cdot \mathbf{H}^{(k)} + \overrightarrow{\mathbf{b}}^{(k)}_{1}\right]\right) + \overrightarrow{\mathbf{b}}^{(k)}_{2}
+  \end{aligned}$$
+
+  - $\mathbf{W}^{(k)}_{1} \in \mathbb{R}^{4d \times d}$ : **Dimension Expansion** to four times the Dimension of the Embedding Vector
+  - $\mathbf{W}^{(k)}_{2} \in \mathbb{R}^{d \times 4d}$ : **Dimension Reduction** back to the Embedding Vector Dimension
+
+### Decoder Layer
+
+$$\begin{aligned}
+\mathcal{X}^{(0)}
+&=\text{Token-Embedding}\left(\text{Tokens}\right) + \text{Positional-Encoding}\left(\text{Tokens}\right)\\
+\mathcal{H}^{(k)}_{1}
+&=\text{Layer-Norm}\Big[\text{Multi-Head}\left(\mathcal{X}^{(k)};\mathcal{M}\right) + \mathcal{X}^{(k)}\Big]\\
+\mathcal{H}^{(k)}_{2}
+&=\text{Layer-Norm}\Big[\text{Multi-Head}\left(\mathcal{H}^{(k)}_{1},\mathbf{Z},\mathbf{Z}\right) + \mathcal{H}^{(k)}_{1}\Big]\\
+\mathcal{Y}^{(k)}
+&=\text{Layer-Norm}\Big[\text{FFN}\left(\mathcal{H}^{(k)}_{2}\right) + \mathcal{H}^{(k)}_{2}\Big]
+\end{aligned}$$
+
+- $\mathcal{X}$ is the Input of a Single Layer, $\mathcal{Y}$ is the Output of a Single Layer
+  - $$\mathcal{X}^{(k+1)}=\mathcal{Y}^{(k)}$$ : the Input of Decoder Layer $k+1$ is the Output of Decoder Layer $k$
+  - The Input of the Initial Layer $$\mathcal{X}^{(0)}$$ is the Sum of the Token Embedding & Positional Encoding Vectors
+  - The Output of the Final Layer $$\mathcal{Z}=\mathcal{Y}^{(K)}$$ is the Output of the Decoder Module
+
+- $$\text{Multi-Head}\left(\mathcal{X}^{(k)};\mathcal{M}\right)$$ : Multi-Head Masked Self Attention @ Decoder
+
+  - $\mathcal{M}$ : Causal Mask (Upper-Triangular Mask)
+
+- $$\text{Multi-Head}\left(\mathcal{H}^{(k)}_{1},\mathbf{Z},\mathbf{Z}\right)$$ : Multi-Head Cross Attention @ Decoder
+
+  - $$\mathbf{Z}$$ : Output of the Encoder Module
+
+- $$\text{FFN}\left(\mathcal{H}^{(k)}_{2}\right)$$ : `F`eed-`F`orward `N`etworks @ Decoder
+
+  $$\begin{aligned}
+  \text{FFN}\left(\mathcal{H}^{(k)}_{2}\right)
+  &=\mathbf{W}^{(k)}_{2} \cdot \left(\text{ReLU}\left[\mathbf{W}^{(k)}_{1} \cdot \mathcal{H}^{(k)}_{2} + \overrightarrow{\mathbf{b}}^{(k)}_{1}\right]\right) + \overrightarrow{\mathbf{b}}^{(k)}_{2}
+  \end{aligned}$$
+
+  - $\mathbf{W}^{(k)}_{1} \in \mathbb{R}^{4d \times d}$ : **Dimension Expansion** to four times the Dimension of the Embedding Vector
+  - $\mathbf{W}^{(k)}_{2} \in \mathbb{R}^{d \times 4d}$ : **Dimension Reduction** back to the Embedding Vector Dimension
+
+-----
+
+### Reference
+
+- https://zeuskwon-ds.tistory.com/88
+- https://bongholee.com/transformer-yoyag-jeongri-2/
+- https://wikidocs.net/162096
\ No newline at end of file
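
The sinusoidal positional-encoding function and its rotation property added in the Transformer post above can be checked numerically. A minimal NumPy sketch (illustrative only, not part of the diff; all function names are my own):

```python
import numpy as np

def positional_encoding(n_pos, d):
    # PE(pos, 2i) = sin(pos / 10000^(2i/d)), PE(pos, 2i+1) = cos(pos / 10000^(2i/d))
    pos = np.arange(n_pos)[:, None]           # (n_pos, 1)
    idx = np.arange(d // 2)[None, :]          # (1, d/2) pair indices i
    angle = pos / (10000 ** (2 * idx / d))    # (n_pos, d/2)
    pe = np.empty((n_pos, d))
    pe[:, 0::2] = np.sin(angle)               # even dimensions
    pe[:, 1::2] = np.cos(angle)               # odd dimensions
    return pe

pe = positional_encoding(50, 16)

# Rotation property: the (sin, cos) pair at position POS+K equals a fixed
# rotation (by angle K / 10000^(2i/d)) applied to the pair at position POS.
POS, K, i, d = 7, 5, 3, 16
theta = K / (10000 ** (2 * i / d))
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])
pair = lambda p: pe[p, 2 * i:2 * i + 2]       # (sin, cos) pair for index i
assert np.allclose(pair(POS + K), R @ pair(POS))
```

Because the rotation matrix depends only on the offset `K` (not on `POS`), relative position is captured by a position-independent linear map, which is exactly the periodicity condition argued in the post.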
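
The causal mask $\mathcal{M}$ used by the decoder's masked self-attention can likewise be illustrated with a small single-head sketch, assuming the standard scaled dot-product formulation $\text{softmax}(QK^{\top}/\sqrt{d_k})V$ (names are my own, not from the post):

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    # softmax(QK^T / sqrt(d_k)) V, with masked positions forced to ~zero weight
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        scores = np.where(mask, -1e9, scores)      # block masked positions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w = w / w.sum(axis=-1, keepdims=True)          # row-wise softmax
    return w @ V, w

T, d = 4, 8
rng = np.random.default_rng(0)
X = rng.normal(size=(T, d))

# upper-triangular causal mask: position t may not attend to positions > t
causal = np.triu(np.ones((T, T), dtype=bool), k=1)
out, w = scaled_dot_product_attention(X, X, X, mask=causal)

# every attention weight above the diagonal is zero
assert np.allclose(np.triu(w, k=1), 0)
```

The zero upper triangle of `w` is the masking described in the post: no information about later positions leaks into earlier ones during training.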
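
The post-norm sub-layer pattern $\text{Layer-Norm}[\text{FFN}(\mathbf{H}) + \mathbf{H}]$ with the $d \to 4d \to d$ feed-forward expansion can be sketched in a few lines of NumPy (a toy illustration under the post's equations; weight scales and names are my own):

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # normalize each token vector across its feature dimension
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def ffn(H, W1, b1, W2, b2):
    # W1: (4d, d) expansion, W2: (d, 4d) reduction, ReLU in between
    return np.maximum(H @ W1.T + b1, 0) @ W2.T + b2

d, T = 8, 5
rng = np.random.default_rng(1)
X = rng.normal(size=(T, d))                       # T token vectors of width d
W1, b1 = rng.normal(size=(4 * d, d)) * 0.1, np.zeros(4 * d)
W2, b2 = rng.normal(size=(d, 4 * d)) * 0.1, np.zeros(d)

# post-norm residual sub-layer as written in the post: LayerNorm(FFN(H) + H)
Y = layer_norm(ffn(X, W1, b1, W2, b2) + X)
assert Y.shape == (T, d)                          # width is preserved
```

The residual addition requires the FFN to map back to width `d`, which is why the two weight matrices must have the transposed shapes `(4d, d)` and `(d, 4d)`.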