pre-processing of text #890 issue #990

ashishgit7 · 2018-12-15T07:23:26Z

No description provided.

ashishgit7 · 2018-12-15T07:34:39Z

#890 added pre-processing of text

ad71

There are some things in your PR which are not exactly in line with what aima-python aims to do

ad71 · 2018-12-20T16:06:10Z

nlp_apps.ipynb

-    "wordseq = words(federalist)\n",
-    "wordseq = wordseq[114:-3098]"
+    "wordseqs = words(federalist)\n",
+    "wordseqs = wordseqs[114:-3098]"


Was it necessary to change the name of the variable?

I haven't changed the name of the variable; I actually created a new variable.

Wasn't wordseq already present in the repository?
Anyway, it's fine if it makes things simpler.

ad71 · 2018-12-20T16:09:52Z

nlp_apps.ipynb

+   "outputs": [],
+   "source": [
+    "#removing stopwords\n",
+    "from nltk.corpus import stopwords\n",


We want to try to minimize the use of third-party libraries. The point of the nlp module is to have basic implementations of standard functions used in the domain. Importing from nltk is the opposite of what we want to do.

Alright, I'll try to write a new function to replace the nltk import in my next contribution, to minimize the use of third-party libraries.
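For illustration, stop-word removal without nltk could look roughly like the sketch below. It is not code from this PR or from aima-python: the name `remove_stopwords` and the tiny `STOPWORDS` set are hypothetical, and a real list would be much longer.

```python
# Hypothetical, deliberately tiny stop-word list (an assumption for this sketch).
STOPWORDS = frozenset({"a", "an", "and", "the", "of", "to", "in", "is", "it", "that"})


def remove_stopwords(wordseq, stopwords=STOPWORDS):
    """Return the words of wordseq that are not stop words, preserving order."""
    # Lowercase each word before the lookup so "The" and "the" are treated alike.
    return [w for w in wordseq if w.lower() not in stopwords]


sample = "the powers of the union and the states".split()
filtered = remove_stopwords(sample)
print(filtered)
```

A set makes each membership test constant time, which matters for a word sequence as long as the Federalist corpus.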

ad71 · 2018-12-20T16:13:40Z

nlp_apps.ipynb

+   "source": [
+    "#stemming and lemmatization\n",
+    "from nltk.stem.wordnet import WordNetLemmatizer\n",
+    "lmtzr = WordNetLemmatizer()\n",


Again, we shouldn't use lemmatizers from third parties. Instead, we could have a lemmatizer within the repository, however basic it may be. The point of this repository is to be able to explain the underlying concepts of these algorithms, not directly import from other modules.

@ad71 Can we use this file for lemmatization?

@sagar-sehgal We can, but make sure you read their license first. We might have to cite/acknowledge them. If the license allows, I think we can save a copy of the file in aima-data and carry on from there.

Okay, I'll try to do that. Thank you!
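As a rough idea of what a basic in-repository implementation might look like, here is a minimal suffix-stripping sketch. Note that this is crude stemming, not true lemmatization (it uses no dictionary), and the names `naive_stem`, `SUFFIXES`, and `min_stem` are hypothetical, not part of aima-python.

```python
# Suffixes to strip, checked in this order (an assumption for this sketch).
SUFFIXES = ("ing", "ed", "s")


def naive_stem(word, min_stem=3):
    """Strip the first matching suffix, keeping at least min_stem characters."""
    for suffix in SUFFIXES:
        # Only strip when enough of the word remains to be a plausible stem.
        if word.endswith(suffix) and len(word) - len(suffix) >= min_stem:
            return word[:-len(suffix)]
    return word


print([naive_stem(w) for w in ["governing", "elected", "states", "is"]])
```

This over-strips some words (for example, "class" becomes "clas"), which is the kind of limitation a dictionary-based lemmatizer avoids.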

ad71 · 2018-12-20T16:15:02Z

nlp_apps.ipynb

     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
-    "' '.join(wordseq[:100])"
+    "' '.join(wordseqs[:100])"


This was fine already

ad71 · 2018-12-20T16:15:17Z

nlp_apps.ipynb

   "metadata": {
    "collapsed": true
   },
   "outputs": [],
   "source": [
-    "wordseq = [w for w in wordseq if w != 'publius']"
+    "wordseqs = [w for w in wordseqs if w != 'publius']"


And so was this.

ad71 · 2018-12-20T16:15:41Z

nlp_apps.ipynb

@@ -531,7 +559,7 @@
       "(4, 16, 52)"
      ]
     },
-     "execution_count": 6,
+     "execution_count": 41,


A slightly picky complaint, but you can rerun the notebook so that the execution counts are sequential.
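Rerunning the notebook from a fresh kernel is the straightforward fix. As an alternative sketch, the counters can also be renumbered directly in the notebook's JSON, assuming the standard nbformat 4 layout (`cells`, `cell_type`, `execution_count`, `outputs`). The function name `renumber_execution_counts` is hypothetical, and this only rewrites the counters; it does not refresh any outputs.

```python
import json


def renumber_execution_counts(notebook):
    """Renumber executed code cells 1, 2, 3, ... in document order, in place."""
    count = 0
    for cell in notebook.get("cells", []):
        # Skip markdown cells and code cells that were never executed.
        if cell.get("cell_type") != "code" or cell.get("execution_count") is None:
            continue
        count += 1
        cell["execution_count"] = count
        for output in cell.get("outputs", []):
            # execute_result outputs carry their own copy of the counter.
            if output.get("output_type") == "execute_result":
                output["execution_count"] = count
    return notebook


# Small in-memory example standing in for a loaded .ipynb file.
example = {
    "cells": [
        {"cell_type": "markdown", "source": []},
        {"cell_type": "code", "execution_count": 41, "source": [],
         "outputs": [{"output_type": "execute_result", "execution_count": 41,
                      "data": {}, "metadata": {}}]},
        {"cell_type": "code", "execution_count": 7, "source": [], "outputs": []},
    ]
}
print(json.dumps(renumber_execution_counts(example), indent=1))
```

For a real file, the dictionary would come from `json.load` on the notebook and be written back with `json.dump`.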

aimacode#928 issue

794c425

ashishgit7 changed the title ~~pre-processing of text #928 issue~~ pre-processing of text #890 issue Dec 15, 2018

ad71 reviewed Dec 20, 2018
