Update links for embeddings
idvorkin committed Feb 29, 2024
1 parent 44a2e24 commit 205c19f
Showing 1 changed file, `_td/nlp.md`, with 22 additions and 21 deletions.
title: Sentiment Analysis and NLP
permalink: /nlp
---

My explorations of NLP, mostly using my corpus of journal entries and other writing. My intent is twofold: 1) learning about NLP and sentiment analysis, and 2) finding latent meaning in my writing, ideally to help me better understand my own psychological processes. I've had much more success with the former than the latter.

My Journal Entries are private, but you can see my [code here](https://github.com/idvorkin/nlp) (though it is tightly coupled to my corpus and format).

To better understand all this, check out this course: <https://www.nlpdemystified.org/course/seq2seq-and-attention>

## Success Criteria

- Deeper understanding of what I care about.
- Played with different cloud stacks for NLP.
- Do simple word frequency analysis.
- Played with word embedding.
- Topic/mood over time visualization.

<!-- prettier-ignore-start -->
- [Focus on Verbs/Nouns](#focus-on-verbsnouns)
- [Sentiment Analysis From Cloud Vendors](#sentiment-analysis-from-cloud-vendors)
- [GPT3](#gpt3)
- [Call Google NLP from C#](#call-google-nlp-from-c)
- [Put Google NLP output into Pandas](#put-google-nlp-output-into-pandas)
- [Evaluate NLP Solutions from various vendors](#evaluate-nlp-solutions-from-various-vendors)
- [Google NLP](#google-nlp)
- [AWS Comprehend](#aws-comprehend)
- [Bag of words](#bag-of-words)
- [Lemmatization vs Stemming](#lemmatization-vs-stemming)
- [TF/IDF](#tfidf)
- [Word Embedding and Semantic Search (useful for RAG)](#word-embedding-and-semantic-search-useful-for-rag)
- [Bag of words analysis](#bag-of-words-analysis)

<!-- vim-markdown-toc -->
<!-- prettier-ignore-end -->

## Abstract

At my best, I do daily stream of consciousness journaling. These journal entries should have insights I can apply to my life. I'm too lazy to read the journal entries, but it's a great corpus to see how I can use NLP and ML tech on myself.

This gives me an excuse to use and play with NLP services.

## Corpus

I used a service called 750words to do daily stream of consciousness journaling. I have several years of data in this format, and I want to convert it to the format I'm using now. This will be a good corpus for analysis.

### Convert exported 750words data to per day files
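A minimal sketch of the conversion. The export format shown here (entries separated by a marker line, each with a `Date:` header) is an assumption for illustration; the real 750words export may differ, so the regex would need adjusting.

```python
import re
from pathlib import Path

# Hypothetical 750words export: a single text blob with one marker line per
# entry and a Date: header. Adjust the marker/regex to the real export format.
EXPORT = """------ ENTRY ------
Date: 2012-01-01
Morning pages about kettlebells.
------ ENTRY ------
Date: 2012-01-02
More stream of consciousness.
"""

def split_per_day(export_text, out_dir):
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    for chunk in export_text.split("------ ENTRY ------"):
        chunk = chunk.strip()
        if not chunk:
            continue  # skip the empty piece before the first marker
        date_match = re.match(r"Date:\s*(\S+)\n?", chunk)
        if not date_match:
            continue
        body = chunk[date_match.end():].strip()
        # One markdown file per day, named by the entry date.
        (out / f"{date_match.group(1)}.md").write_text(body + "\n")

split_per_day(EXPORT, "750words_by_day")
```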

### Remove custom stop words

My dataset has its own custom stop words.

TBD
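Until then, a minimal sketch of the filtering step. The stop-word list here is hypothetical; the real one would come from inspecting the corpus and merging in a standard English list.

```python
# Generic English stop words plus custom ones from the journal corpus
# (hypothetical examples).
CUSTOM_STOP_WORDS = {"the", "a", "to", "and", "i", "is", "today"}

def remove_stop_words(text):
    # Lowercase, strip trailing punctuation, drop anything in the stop list.
    tokens = [t.lower().strip(".,!?") for t in text.split()]
    return [t for t in tokens if t and t not in CUSTOM_STOP_WORDS]

tokens = remove_stop_words("Today I went to the gym and did kettlebells.")
```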

Relative frequency distribution of proper nouns gives a good understanding of what a corpus is about.
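A dependency-free sketch of that relative-frequency idea, using a crude heuristic for proper nouns (capitalized words that don't start a sentence). A real POS tagger such as spaCy would be more accurate; the names in the example are made up.

```python
import re
from collections import Counter

def proper_noun_frequencies(text):
    """Relative frequency of likely proper nouns.

    Heuristic: a capitalized, multi-letter word that is not sentence-initial
    is probably a proper noun. Noisy, but needs no tagger.
    """
    counts = Counter()
    for sentence in re.split(r"[.!?]+\s*", text):
        words = sentence.split()
        for w in words[1:]:  # skip the sentence-initial word
            w = w.strip(".,;:()\"'")
            if len(w) > 1 and w[:1].isupper():
                counts[w] += 1
    total = sum(counts.values()) or 1
    return {w: n / total for w, n in counts.items()}

freqs = proper_noun_frequencies(
    "Yesterday Zach and I biked. Amelia joined Zach and Ammon at the park."
)
```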

### Focus on Verbs/Nouns

Haven't had a lot of luck here; my writing isn't very verb-focused. Perhaps this needs some type of boosting from TF/IDF.

## Sentiment Analysis From Cloud Vendors

### GPT3

Once GPT-3 came along, the other cloud services became irrelevant.

{% include summarize-page.html src="/gpt" %}

### Call Google NLP from C#

See: <https://github.com/idvorkin/play-google-nlp>

### Put Google NLP output into Pandas
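A sketch of the flattening step. The nested response below mirrors the shape of a Google NLP sentiment response (per-sentence `text.content` and `sentiment.score`/`magnitude`), but treat the exact field names as an assumption; the point is turning nested JSON into a tidy DataFrame.

```python
import pandas as pd

# Assumed shape of a Google NLP analyzeSentiment response (illustrative data).
response = {
    "sentences": [
        {"text": {"content": "Great workout."},
         "sentiment": {"score": 0.8, "magnitude": 0.8}},
        {"text": {"content": "Slept badly."},
         "sentiment": {"score": -0.6, "magnitude": 0.6}},
    ]
}

# Flatten one row per sentence so the scores are easy to group and plot.
df = pd.DataFrame(
    [
        {
            "sentence": s["text"]["content"],
            "score": s["sentiment"]["score"],
            "magnitude": s["sentiment"]["magnitude"],
        }
        for s in response["sentences"]
    ]
)
```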

### Evaluate NLP Solutions from various vendors

Limited, only supports extraction of key topics, without the different spots of

### I can graph my mood over time

Sentiment analysis only works on sentences. For a first-order approximation, take the median sentence sentiment score for a document, e.g. `median([sentiment(s) for s in sentences])`. Then graph sentiment by day.
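The median-of-sentences idea as a runnable sketch. The tiny lexicon scorer is a stand-in for illustration only; in practice each sentence score would come from a cloud sentiment API or a model.

```python
import re
from statistics import median

# Stand-in sentence scorer: +1 per positive word, -1 per negative word.
# A real pipeline would call a sentiment API here instead.
POSITIVE = {"great", "happy", "strong"}
NEGATIVE = {"tired", "anxious", "bad"}

def sentence_sentiment(sentence):
    words = set(re.findall(r"[a-z]+", sentence.lower()))
    return len(words & POSITIVE) - len(words & NEGATIVE)

def document_sentiment(document):
    # Median over sentence scores: median([sentiment(s) for s in sentences]).
    sentences = [s for s in re.split(r"[.!?]+", document) if s.strip()]
    return median(sentence_sentiment(s) for s in sentences)

score = document_sentiment("Great lifting session. Felt strong. Slept bad though.")
```

Computing this per day file gives the series to graph over time.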

TBD:

I can score the sentiment of a document as the median(f

Multiply term frequency by inverse document frequency, this boosts words that only appear in this text as they are likely more important (or they're typos).
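The formula above, computed by hand over a toy corpus (in practice scikit-learn's `TfidfVectorizer` does this, with extra smoothing):

```python
import math
from collections import Counter

def tf_idf(docs):
    """TF x IDF over tokenized documents: tf = count/len(doc),
    idf = log(n_docs / docs_containing_term)."""
    n_docs = len(docs)
    df = Counter()  # document frequency per term
    for doc in docs:
        df.update(set(doc))
    scores = []
    for doc in docs:
        tf = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return scores

docs = [["gym", "kettlebell", "gym"], ["gym", "journal"], ["journal", "gratitude"]]
scores = tf_idf(docs)
```

Note the boost: in the first document "kettlebell" appears once and "gym" twice, but "kettlebell" scores higher because it occurs in only one document.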

### Word Embedding and Semantic Search (useful for RAG)

Put words into a semantic vector space using unsupervised learning, so words with similar meanings are closer in vector space distance. For example (king - queen) ~ (boy - girl) and toaster is far away from red.
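The "closer in vector space" part is just cosine similarity; a sketch with toy 3-d vectors standing in for real embeddings (real ones have hundreds of dimensions and would come from word2vec or an embedding API). The same cosine ranking over chunk embeddings is the retrieval step in RAG.

```python
import math

# Toy vectors for illustration; an embedding model would supply real ones.
EMBEDDINGS = {
    "king":    [0.9, 0.8, 0.1],
    "queen":   [0.9, 0.7, 0.2],
    "toaster": [0.1, 0.0, 0.9],
}

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def nearest(word):
    # Semantic search: rank every other word by cosine similarity.
    q = EMBEDDINGS[word]
    others = [w for w in EMBEDDINGS if w != word]
    return max(others, key=lambda w: cosine(q, EMBEDDINGS[w]))
```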

- A more technical description of [word embedding](https://jalammar.github.io/illustrated-word2vec/)
- A more accessible, layman's introduction to [word embeddings](https://investigate.ai/text-analysis/word-embeddings/)

### Bag of words analysis

Ignore properties of text other than word frequencies.
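A minimal sketch: each text becomes a count vector over a shared vocabulary, and word order disappears entirely.

```python
from collections import Counter

def bag_of_words(texts):
    """Represent each text purely by word counts, discarding order."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for doc in tokenized for w in doc})
    # One count vector per document, aligned to the sorted vocabulary.
    vectors = [[Counter(doc)[w] for w in vocab] for doc in tokenized]
    return vocab, vectors

vocab, vectors = bag_of_words(["gym then gym", "then journal"])
```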
