Future Work on the NLP Apps Notebook #890
Comments
Hello @MrDupin! I have worked in NLP problems and I can give you some help here. :) |
That's great! You can work on them either on GSoC or on your own. Whenever you want to get started, don't forget to post a comment here explaining in short what you want to do. |
@MrDupin With reference to point 3, I would like to include support for both N-gram and Bag of Words models. I am mentioning this in my proposal. |
@ashwinnalwade Awesome! Good luck with your application! |
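For readers new to the two models mentioned above, here is a minimal sketch of each using only the standard library. The function names (`tokenize`, `bag_of_words`, `ngrams`) and the whitespace tokenizer are illustrative assumptions, not the repository's API.

```python
from collections import Counter


def tokenize(text):
    """Lowercase the text and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def bag_of_words(text):
    """Count how often each word occurs, ignoring word order."""
    return Counter(tokenize(text))


def ngrams(text, n=2):
    """Count every run of n consecutive words, which keeps some local word order."""
    words = tokenize(text)
    return Counter(tuple(words[i:i + n]) for i in range(len(words) - n + 1))


sentence = "the cat sat on the mat"
bow = bag_of_words(sentence)     # word counts, order discarded
bigrams = ngrams(sentence, n=2)  # counts of adjacent word pairs
```

A bag of words throws away order entirely, while n-grams keep a window of context at the cost of a much larger vocabulary.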
Hello, with reference to the 2nd point: we could do some preprocessing, such as ranking the words/sentences in a paper by how frequently each word/sentence occurs and by how unique each word is, using a ranking factor like the length of the word/sentence. |
Hello, I would like to work on the first point on implementing log addition if no one is doing it yet. |
Sure @dsaw you can get started, nobody is working on this yet. |
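The motivation for log addition can be sketched as follows, assuming the usual Naive Bayes setting where many small probabilities are multiplied together. The helper names `product_score` and `log_score` are hypothetical and only illustrate the idea; they are not taken from the notebook.

```python
import math


def product_score(probabilities):
    """Multiply the probabilities directly (prone to floating-point underflow)."""
    result = 1.0
    for p in probabilities:
        result *= p
    return result


def log_score(probabilities):
    """Add the logarithms instead; the ranking of classes is unchanged."""
    return sum(math.log(p) for p in probabilities)


# One hundred word probabilities of 1e-5 each, as might occur in a long document.
tiny = [1e-5] * 100
underflowed = product_score(tiny)  # the true value 1e-500 is not representable as a float
stable = log_score(tiny)           # a moderate negative number that is easy to compare
```

Because the logarithm is monotonic, picking the class with the largest sum of logs picks the same class as the largest product would, without the product collapsing to zero.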
@MrDupin I did some work related to language identification during my GSoC (the year before yours). This follows from an example in the book. If anyone is keen on taking this up - here is the notebook: https://github.com/reachtarunhere/aima-python/blob/lang-id/lang-id.ipynb If not, I would be happy to clean it up and get it merged myself. |
@reachtarunhere I only skimmed through it, but it looks good. Maybe the current GSoC students can take it up? |
@MrDupin I would have loved to take it up but I have no knowledge of NLP. I'll definitely come back to this once I get acquainted with the topic. |
@ad71 or anyone else taking this up - feel free to get in touch if you need any help/clarifications. In case I miss this here (lots of notifs) you can always drop an email :) |
* test case for zebra problem
* Revert "Merge remote-tracking branch 'upstream/master'" This reverts commit 5ceab1a, reversing changes made to 34997e4.
* discarded HEAD changes for merge
* added ensemble_learner jpeg
* updated travis and search
* Added logarithmic learner in nlp_apps (#890)
* Remove zebra tests
* revised Bayes learner explanation
* added missing SimpleReflexAgent from upstream
Hi @MrDupin, I have some experience in NLP and would like to work on this issue if it is still open. |
@llucifer97: You can definitely work on the issue! Simply fork the repository, make your changes to the notebook, and submit a Pull Request. If you need any help, feel free to ask (although I am swamped with work at the moment so I am not sure I will be able to provide much help). |
Hello @MrDupin. Would I be right in assuming that this isn't an issue that someone completely new to opensource and AI-related topics can handle? I am well versed with Python but have never dabbled with this sort of thing. Apologies if this is a very trivial question. |
@aditya-hari: Hmmm, I'm not sure... I don't think any past experience with open-source is necessary, but I believe some understanding of the AI concepts covered in this notebook is a must. If you want you can take it on, but it seems to me that it will be difficult to make much progress efficiently, mainly because not only do you have to execute the concepts, but also showcase them in a manner in which others can understand them. It is up to you, but I first suggest you read up on NLP and some basic AI material in order to be better equipped to tackle this challenge. |
It is encouraging to know that past experience isn't necessary. Could you kindly guide me to some resources which can help me get up to speed? Or point me to some other issues which would be easier to start off with? I would really appreciate any help. |
Hi @aditya-hari, sorry for taking so long to get back to you. The way this repository works no longer involves issues (there are exceptions though). The main work remaining is to simply add to the notebooks. For NLP, I am afraid there's not much I can say. Personally, I was just looking around the internet for information, I was never taught anything in a class. I suggest you snoop around and try to find a university course or something that is open to the public. Then you can follow its structure, complementing your studying with googling. |
Hi, @MrDupin |
@vaibhavshukla182: Hi! Since most of the algorithms in this project have been implemented, I suggest you turn to providing examples for the notebooks. Maybe some text analysis, or a tutorial on NLP techniques, or stuff of this nature. Remember that this project is educational, so the more examples we have the better. Just don't forget to include some instructions on how your examples work. Also, I believe you can implement algorithms outside the AIMA book as well. But only if they have short implementations and easy testing under |
Can I add some data preprocessing and data visualisation techniques? |
@vaibhavshukla182: This sounds great! For the nlp stuff though, don't use any external libraries. We want to keep all the code right in front of the student, with no middle-man. Feel free to use visualization libraries (preferably |
@MrDupin I too want to work on the visualisation techniques. I was thinking of implementing t-SNE for dimensionality reduction, though I know this will not give great results on a big dataset. I also want to add tf-idf and word2vec in addition to BOW and n-grams. I can also try different algorithms on the same dataset, like KNN, KNN on time-split (temporal) data, and so on. |
Also, I think I might have to add an additional dataset on which I can do some supervised classification and a polarity-based t-SNE visualisation, as well as KNN. Please see this for dataset recommendation: |
@cursed4ever I think for now TF.IDF would be a nice addition, if you want to get started with it. |
@hackerashish25 What external library are you planning on using? Quite a while ago I wrote an implementation for tf-idf, and it did not require any libraries, if I remember correctly. |
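As an illustration of how tf-idf can be written without external libraries, here is a minimal sketch. It is not the implementation mentioned above; the function name `tf_idf`, the whitespace tokenizer, and the unsmoothed `log(N / df)` weighting are assumptions chosen for brevity.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Return one {word: tf-idf weight} dictionary per document."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for words in tokenized for word in set(words))
    weights = []
    for words in tokenized:
        counts = Counter(words)
        total = len(words)
        weights.append({
            # Term frequency times inverse document frequency.
            word: (count / total) * math.log(n_docs / df[word])
            for word, count in counts.items()
        })
    return weights


docs = ["the cat sat", "the dog barked", "the cat purred"]
scores = tf_idf(docs)
```

A word that occurs in every document (here "the") gets weight zero, while a word confined to a single document gets the highest weight.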
Hello @MrDupin, I want to contribute to aima-python. It seems most of the algorithms have already been implemented. Can I propose and work on some AI optimization algorithms? ( If adding new algorithms outside AIMA book is valid) I was suggesting to implement algorithms like Stochastic Gradient Descent, Particle Swarm Optimization, etc. I would also like to work on other notebook completions if the above algorithms implementation is not necessary for the project. Thank You! |
@Ask149: I am not familiar with Particle Swarm, but SGD is something that can be implemented here, sure! |
Hey @MrDupin, |
Hello @MrDupin , Can I take on "sentiment analysis" for the apps notebook if no one is working on it? Also I would like to implement customisable vectorisers as utility functions that can come handy in many situations. |
Hello @MrDupin, I am interested in starting with sentiment analysis and would like to contribute to the NLP Apps notebook. Any pointers you could give me on project selection, as I want to do this for my GSoC project. |
Hello all (@JayantSravan, @ShaswatLenka, @aasthasood). Thank you for your interest in this project. For sentiment analysis, you can pick a project and work on it. A popular one is the movie reviews dataset. I would prefer if you picked something else (you can research "sentiment analysis datasets" on Google). Pick a relatively small dataset and work on it. Remember though that you cannot use any external libraries for the training, you have to do it yourselves to showcase the algorithm. |
Hi @MrDupin, can you repost what else is left to be done? I see that log, visualization, and preprocessing are already completed. Can I go with sentiment analysis? |
@rushic24: Sure, you can solve some sentiment analysis problems. Sounds good! |
Hi @MrDupin , has anyone taken up the sentiment analysis example? If not, I can contribute since I have prior work on IMDB dataset, which I can probably re-use. Also, I can take up the explanation of the 'Question Answering' section in nlp.ipynb? Thanks. |
@MrDupin I wrote a Twitter sentiment analysis a few weeks back. Can that be put here? |
@Kaustav97: Sure, you can do Sentiment Analysis and/or Question Answering. Just remember to only use as basic Python as possible to explain the algorithms. @rushic24: That sounds great as well. |
@MrDupin I was planning to add Text Classification, using Naive Bayes Classification into different classes like if "apple" in a statement refers to |
@sagar-sehgal: You can add Soundex, if the implementation is simple enough (I have not heard of the algorithm). Also, you can work on classification, even though it would be best if you used datasets already in aima-data. |
@MrDupin Soundex is a phonetic coding algorithm which assigns a code of a fixed length to every word based on how it is pronounced in English. It helps when searching through documents even if we have made a spelling mistake (caused by a difference in pronunciation), since the Soundex code would still remain the same. For example, "Robert" and "Rupert" both have the Soundex code "R163". For Classification, instead of using a dataset I was planning to define 2-3 sentences of each class in the notebook itself. Thank You! |
@sagar-sehgal: Soundex sounds good, although it all depends on the implementation. If it is short and there is a nice description to go with it, it's all fine. For the Classification task, if defining 2-3 sentences for each class in the notebook works, then it is great! |
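To give a sense of how short such an implementation can be, here is a minimal sketch of the standard American Soundex rules. It is not the code from the eventual pull request; the names `CODES` and `soundex` are illustrative.

```python
# Map each consonant to its Soundex digit; vowels, y, h and w have no digit.
CODES = {}
for letters, digit in (("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                       ("l", "4"), ("mn", "5"), ("r", "6")):
    for letter in letters:
        CODES[letter] = digit


def soundex(word, length=4):
    """Return the Soundex code of a word: first letter plus up to three digits."""
    letters = [ch for ch in word.lower() if ch.isalpha()]
    if not letters:
        return ""
    code = letters[0].upper()
    previous = CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "hw":
            continue  # h and w do not break a run of identical codes
        digit = CODES.get(ch, "")
        if digit and digit != previous:
            code += digit
        previous = digit  # a vowel resets this, so a repeated code after a vowel is kept
    return (code + "0" * length)[:length]  # pad with zeros, then truncate


robert = soundex("Robert")
rupert = soundex("Rupert")
```

Both names reduce to the same code because "b" and "p" share the digit 1, which is what makes the code useful for matching misspelled search terms.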
@MrDupin As I had mentioned earlier, I was working on Text Classification. Text Classification is a broad topic, and many other topics (Sentiment Analysis, Topic Labelling, etc.) come under it, so I planned to go with Topic Labelling on whether apple word in a sentence refers to Here, in this case, we would just be having the same application, but trained on different data. So, we could achieve similar results, even without any coding. We would just have to change the data in the dataset directory and then run the application. So, after all this, should I still push the PR of Text Classification? |
You can still push your work on Topic Labeling, as long as you explain what is going on. The solution may be pretty much the same, but the application is different, so having an example on it won't hurt. Reusing the same code to solve a different problem is even better, showcasing that there is an underlying structure to a lot of NLP problems. |
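The point about reusing the same code on different data can be sketched with a tiny Naive Bayes classifier trained on a handful of sentences defined inline. Everything here is an illustrative assumption: the function names `train` and `classify`, the add-one smoothing, and the example sentences and class labels, which are invented for this sketch and are not the ones from the pull request.

```python
import math
from collections import Counter, defaultdict


def train(examples):
    """Count words per label and examples per label from (sentence, label) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    for sentence, label in examples:
        label_counts[label] += 1
        word_counts[label].update(sentence.lower().split())
    return word_counts, label_counts


def classify(sentence, model):
    """Return the label with the highest log-probability under Naive Bayes."""
    word_counts, label_counts = model
    vocabulary = {word for counts in word_counts.values() for word in counts}
    total_examples = sum(label_counts.values())
    best_label, best_score = None, float("-inf")
    for label in label_counts:
        score = math.log(label_counts[label] / total_examples)  # log prior
        total_words = sum(word_counts[label].values())
        for word in sentence.lower().split():
            # Add-one smoothing so unseen words do not zero out the class.
            score += math.log((word_counts[label][word] + 1)
                              / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Same code, two different problems: only the training sentences change.
sentiment = train([("i loved this film", "positive"),
                   ("what a great story", "positive"),
                   ("i hated this film", "negative"),
                   ("what a boring story", "negative")])
topics = train([("the team won the match", "sports"),
                ("she scored a late goal", "sports"),
                ("simmer the sauce slowly", "cooking"),
                ("bake the bread for an hour", "cooking")])

sentiment_guess = classify("a great film", sentiment)
topic_guess = classify("the goal won the match", topics)
```

With only two sentences per class the results are fragile, so a notebook example along these lines would need to explain that the toy data is there to show the mechanics, not to measure accuracy.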
@MrDupin I wrongly referred to the concept as Topic Labelling. Classifying the word apple as referring to the |
The PR looks fine, thanks! I have made some comments. |
In the language recognition problem, very long words such as lebensversicherungsgesellschaftsangestellter could be segmented so that the language classifier performs better. This word, for example, is German, so word segmentation might help the language recognition, especially since one of the two classes is German, which already has very long words. |
Can we use a more suitable classifier to identify the author? |
@gamechanger98 Naive Bayes is a Machine Learning Classifier and it has been coded without using any other external library. Other classifiers do exist in NLP but they use Deep Learning Techniques and need external libraries. But here all the code has to be coded without using an external library. If you code up a better Machine Learning Classifier, then it can also be included. |
@memogamd21 Yes, that seems to be a nice idea. It can increase performance. You can code that up and if it increases the accuracy then it would be really nice and can be added. |
@sagar-sehgal Well, we can use the maximum matching algorithm, in which we compare the word we have with the longest words in our two corpora (the German one and the English one), and if it matches one of them, the language can be classified easily. I mean, we can make the long-words list a list of tuples (word, language it belongs to) and then detect the language from that :) |
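One way to read the segmentation idea above is greedy forward maximum matching: repeatedly take the longest known word that starts at the current position. The sketch below assumes a small hand-made vocabulary; the function name `max_match` and the four German word parts are hypothetical and chosen only so the example word from the thread splits cleanly.

```python
def max_match(text, vocabulary):
    """Greedily split text into the longest vocabulary words, left to right."""
    longest = max(map(len, vocabulary), default=1)
    pieces, start = [], 0
    while start < len(text):
        # Try the longest candidate first and shrink until a known word is found.
        for end in range(min(len(text), start + longest), start, -1):
            if text[start:end] in vocabulary:
                break
        else:
            end = start + 1  # nothing matched: emit a single character
        pieces.append(text[start:end])
        start = end
    return pieces


# Hypothetical vocabulary of word parts, for illustration only.
german_parts = {"lebens", "versicherungs", "gesellschafts", "angestellter"}
segments = max_match("lebensversicherungsgesellschaftsangestellter", german_parts)
```

The resulting pieces could then be fed to the language classifier in place of the single long word. Greedy matching can pick a wrong split when vocabulary words overlap, so whether this actually improves accuracy would have to be measured.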
The loop will catch that case immediately. * Minor Changes in Text (aimacode#955) * Minor text change (aimacode#957) To make it more accurate. * Minor change in text (aimacode#956) To make it more descriptive and accurate. * Added relu Activation (aimacode#960) * added relu activation * added default parameters * Changes in texts (aimacode#959) Added a few new sentences, modified the sentence structure at a few places, and corrected some grammatical errors. * Change PriorityQueue expansion (aimacode#962) `self.heap.append` simply appends to the end of the `self.heap` Since `self.heap` is just a python list. `self.append` calls the append method of the class instance, effectively putting the item in its proper place. * added GSoC 2018 contributors A thank you to contributors from the GSoC 2018 program! * Revamped the notebook (aimacode#963) * Revamped the notebook * A few changes reversed Changed a few things from my original PR after a review from ad71. * Text Changes + Colored Table (aimacode#964) Made a colored table to display dog movement instead. Corrected grammatical errors, improved the sentence structure and corrected any typos found. * Fixed typos (aimacode#970) Typos and minor other text errors removed * Fixed Typos (aimacode#971) Corrected typos + minor other text changes * Update intro.ipynb (aimacode#969) * Added activation functions (aimacode#968) * Updated label_queen_conflicts function (aimacode#967) Shortened it, finding conflicts separately and storing them in different variables has no use later in the notebook; so i believe this looks better. * Removed the Assignment of a variable to itself (aimacode#976) * activate pruning inside `forward_checking` (aimacode#975) Otherwise `csp.curr_domains` may not be available for the loop that follows. * Fixed aimacode#972 (aimacode#973) * usa missing AZ (Arizona) as neighbor to NM (new mexico) (aimacode#977) `usa` in csp.py incorrectly has NM not being a neighbor with AZ. 
* Added more examples (aimacode#979) * Fixed small errors (aimacode#987) * fixing broken links * solved a typo in nlp.ipynb (aimacode#993) * updating submodule (aimacode#994) * Fix typos for Wupus agent (aimacode#999) * Fix typos for Wupus agent * fix imports * Grammar and typo fixes in logic notebook (aimacode#1002) * Update utils.py * fixed typo * some typos in agents file (aimacode#1008) * some typos in utils.py (aimacode#1017) * Added some test cases to first function in utils.py (aimacode#1016) * added some test cases to first function in utils.py * added space after each comma * Updated the count function in utils.py (aimacode#1013) * improved the count function in utils.py * updated according to PEP style * Updated the sequence function and added test cases (aimacode#1012) * updated the sequence function and added test cases * updated according to the PEP style * update in multimap function in utils.py and added test for it (aimacode#1014) * update in multimap functi n in utils.py and added test for it * made changes according to PEP style * broke the long sentence to 2 shorter sentences * necessary change in learning.py file (aimacode#1011) * no need of if else * no need of if - else . As if hidden_layer_sizes is zero then it will not affect layer_sizes * removed comment * Update learning.py * Fixed a typo in README (pl_fc_entails) (aimacode#1022) * Shuffling data before k-fold loop in cross validation. (aimacode#1028) * Rename search-4e.ipynb to obsolete-search-4e.ipynb * Add files via upload * Add files via upload * Add files via upload * Add files via upload * Add files via upload * some optimizations in knowledge.py (aimacode#1034) * Add files via upload * Update in `NeuralNetLearner` function in `learnign.py` (aimacode#1019) * Update in NeuralNetLearner function * made the changes as suggested * Install dependencies from `requirements.txt` missing in `README.md` (aimacode#1039) * update in Readme.md * updated as per the suggestions in the review. 
* typo * Reworked PriorityQueue and Added Tests (aimacode#1025) * Reworked PriorityQueue spec Modified: - Priority Queue methods: queue[elem] now returns the first value of elem stored in queue elem in queue now correctly returns whether a copy of element is present regardless of the function value. Apparently the bug was introduced while trying to meet heapq spec del queue[elem] deletes the first instance of elem in queue correctly - Algorithms Same change in best_first_graph_search in romania_problem.py and search.py to make them compatible with the new spec - Tests Introduced 3 tests in test_utils.py to comprehensively test PriorityQueue's new spec * Reworked PriorityQueue spec Modified: - Priority Queue methods: queue[elem] now returns the first value of elem stored in queue elem in queue now correctly returns whether a copy of element is present regardless of the function value. Apparently the bug was introduced while trying to meet heapq spec del queue[elem] deletes the first instance of elem in queue correctly - Algorithms Same change in best_first_graph_search in romania_problem.py and search.py to make them compatible with the new spec - Tests Introduced 3 tests in test_utils.py to comprehensively test PriorityQueue's new spec * closing the old cell window (aimacode#996) * Add files via upload * changed queue to set in AC3 (aimacode#1051) * changed queue to set in AC3 Changed queue to set in AC3 (as in the pseudocode of the original algorithm) to reduce the number of consistency-check due to the redundancy of the same arcs in queue. For example, on the harder1 configuration of the Sudoku CSP the number consistency-check has been reduced from 40464 to 12562! 
* re-added test commented by mistake * Rework agents.ipynb (aimacode#1031) * Reworked Introduction and 1-D environment in agents.py Added: - Table of Contents and overview - A miniscule explanation of all required code from agents.py Modified: - Some grammar and sentences - Structure of the notebook in 1-D environments to make it more coherent Removed: - Outputs from notebook (Makes VCS tough and bugs tough to detect) * Reworked agents in a 2D environment Modified: - Removed global variable turn from 2D park model: Agent programs are not supposed to see anything except percepts - Bump percept is now generated when Dog is about to bump into a wall - Replaced all XYEnvironment with GraphicEnvironment - Gives better readability to both code and output (Previous way of just showing GraphicEnvironment in the end was redundant imo) - Restructured the 2D park and EnergeticBlindDog environment scenario to be more readable Removed: - Redundant Park2D without graphics (subclass of XYEnvironment) * Fixed issue aimacode#1030 Added: - ipython and ipythonblocks packages to requirements.txt * Has some typographic improvements in agents.ipynb * Added output to agents.ipynb * The function truncated_svd() should not return negative singular values. 
(aimacode#1059) * Added test cases for agents.py (aimacode#1057) * Added WumpusWorld testcases Added: - Testcases in test_agents.py Modified: - Duplicate walls are not added in corners now * Tests for VacuumEnvironment and WumpusEnvironment Added: - Test cases for Explorer actions in WumpusEnvironment - Test cases for VacuumEnvironment Modified: - VacuumAgent correctly disables bump percept when agent sucks dirt after bumping into a wall * Added spaces in tuples to comply with PEP8 * Updates and changes required in CONTRIBUTING.md (aimacode#1055) * Update GSOC link to correctly refer aimacode@GSOC2019 * Correct the underlying typo * Update CONTRIBUTING.md (aimacode#1063) Update CONTRIBUTING.md * Improvement in train_test_split function (aimacode#1067) Improvement in train_test_split function Shuffling has been removed * style fixes * added necessary and unique tests (aimacode#1071) * added necessary and unique tests * some required changes * required changes * Add files via upload * Update test_csp.py (issue aimacode#287) (aimacode#1073) Cover or use at least once in tests classes, methods, and non-debugging functions of csp.py (version 18.04.2019). Issue aimacode#287. * added text classification in nlp_apps (aimacode#1043) * added text classification in nlp_apps * updated as per the changes suggested. * Update learning.py * Bruyang (aimacode#1075) * add monte carlo tree search * add comments to mcts * change model based reflex agent * add games4e.py * recover games4e.ipynb * add tests to mcts * Added coverage report generation to Travis (aimacode#1058) Added: - .coveragerc file to configure report generation Modified: - .travis.yml to start generating reports during build * added class Tfidf (aimacode#1054) * Implementing Class Tfidf * added class of Tfidf * added scoring function BM25 * Revert "Implementing Class Tfidf"
I recently added a section on the Federalist Papers to the nlp_apps notebook. It is a simple workflow that runs from start to finish. There is still a lot of work to be done, and I am opening this up to community contributions, since I believe it is a great way to get started with the applications notebooks.
A few ways you can improve the section:
1. DONE - One big issue with the Naive Bayes Classifier in this problem is that multiplying the probabilities causes underflow (every product of probabilities ends up as 0.0). This happens because the examples are long texts. To avoid it, we currently use Python's `decimal` module. I believe the problem can be solved more elegantly by using the logarithms of the probabilities instead of the probabilities themselves: rather than multiplying the probabilities, we add their logarithms.
2. Do some pre-processing. So far I have only added a sample pre-processing step (removing one common word from each paper). I would like to see some other pre-processing tasks, plus analysis of the text. Which are the most common words for each author? Would it be worth removing the most popular words?
3. Right now we are using unigram word models. There are other options available too, and I would like to explore them in the notebooks. Maybe one author likes using two particular words together. Maybe another spells some words a bit differently. I would like to see different models used and explored in the notebook, to let readers know that they shouldn't rely on just one model all the time. We can even combine models.
4. At the end of the notebook I note that the dataset is lopsided: we have far more information on Hamilton than on the other two. Maybe it is worth adding some more writings from Jay and Madison to balance this out. I think it would be interesting to see whether external data improves the results. This could come after the current section, so that we can compare the results.
5. Finally, maybe we can take a step back and try to classify all the Federalist Papers, not just the disputed ones. Add a new section where we use external data to train our model and then try to classify the papers.
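To make the log-probability idea and the n-gram idea above a bit more concrete, here is a minimal sketch. It is not the notebook's implementation: every function name is hypothetical, the two "author" texts are invented toy data, and it assumes uniform priors over authors and simple add-one smoothing.

```python
import math
from collections import Counter


def ngrams(words, n=1):
    """Return the n-grams (as tuples) of a list of words."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


def train(texts_by_author, n=1):
    """Count n-grams per author. Input maps author -> list of words."""
    return {author: Counter(ngrams(words, n))
            for author, words in texts_by_author.items()}


def smoothed_prob(counts, vocab_size, gram):
    """P(gram | author) with add-one smoothing (an assumption of this sketch)."""
    return (counts[gram] + 1) / (sum(counts.values()) + vocab_size)


def plain_product(counts, vocab_size, words, n=1):
    """Multiply the probabilities directly; underflows on long texts."""
    product = 1.0
    for gram in ngrams(words, n):
        product *= smoothed_prob(counts, vocab_size, gram)
    return product


def log_score(counts, vocab_size, words, n=1):
    """Add the logarithms of the probabilities instead of multiplying them."""
    return sum(math.log(smoothed_prob(counts, vocab_size, gram))
               for gram in ngrams(words, n))


def classify(models, words, n=1):
    """Return (best author, log scores), assuming uniform priors."""
    vocab = set()
    for counts in models.values():
        vocab.update(counts)
    vocab_size = len(vocab) + 1  # one extra slot for unseen n-grams
    scores = {author: log_score(counts, vocab_size, words, n)
              for author, counts in models.items()}
    return max(scores, key=scores.get), scores


# Invented toy data, only to exercise the functions above.
toy_texts = {
    "Hamilton": "the union will secure the liberty of the people".split(),
    "Madison": "the powers of the federal government are few and defined".split(),
}
long_text = "the liberty of the people".split() * 400  # 2000 words

unigram_models = train(toy_texts, n=1)
bigram_models = train(toy_texts, n=2)

unigram_author, unigram_scores = classify(unigram_models, long_text, n=1)
bigram_author, bigram_scores = classify(bigram_models, long_text, n=2)

# The direct product collapses to 0.0, while the log score stays finite.
underflowed = plain_product(unigram_models["Hamilton"], 15, long_text, n=1)
```

Comparing log scores gives the same ranking as comparing the products would, because the logarithm is monotonic, so nothing is lost by never converting back to probabilities. Switching `n` from 1 to 2 is all it takes to try a bigram model with the same scoring code.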
This is a big undertaking, and it doesn't need to happen on this particular problem. If you have a different problem in mind, you can use the ideas above to tackle it instead! Sentiment Analysis is trending right now, so maybe that is a place to explore some of the above.
All in all, I think this is a good project to chip in on every once in a while, and I hope it will serve as an introduction to the repository. Or maybe it will sound interesting to GSoC students who might choose to tackle it.
In any case, feel free to post here with ideas, or if you want to start working on something.