Project based on https://github.com/vitorjordao/Gen-Crawler
This is a web crawler project written in Clojure.
There are two ways to build your crawler. The first, simple-collector, is focused on performance: it only downloads the HTML page and extracts the information from it. The second, ajax-collector, is meant for pages that load their content with AJAX.
(def crawled {:url <URL>
:find <FINDER>
:match <MATCHER>})
URL = URL of the page to crawl = "https://www.americanas.com.br/produto/122597474/10692-lego-classic-pecas-criativas?pfm_carac=lego&pfm_page=search&pfm_pos=grid&pfm_type=search_page"
FINDER = CSS selector = ".price__SalesPrice-ej7lo8-2"
MATCHER = Regex applied to the matched tag = "^.?().$"
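To illustrate how a MATCHER with a capture group can pull a value out of a tag's text, here is a minimal sketch in plain Clojure. It does not use Gen-Crawler itself: the sample text, the pattern, and the helper name extract-group are all hypothetical, and how the library applies the regex internally is an assumption.

```clojure
;; Hypothetical text of the element selected by FINDER.
(def tag-text "R$ 199,99")

;; Hypothetical pattern with one capture group around the numeric part.
(def price-pattern #"^R\$\s*(\d+,\d{2})$")

(defn extract-group
  "Returns the first capture group of pattern in text, or nil when
  the text does not match. The pattern must contain a capture group."
  [pattern text]
  (some-> (re-find pattern text) second))

(extract-group price-pattern tag-text) ;; => "199,99"
```

When the text does not match the pattern, extract-group returns nil, which makes a missing or changed page layout easy to detect.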
(def returned (simple-collector/run-item crawled))
(def returned (simple-collector/run-list crawled crawled))
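When several pages share the same FINDER and MATCHER, the maps can be generated from a list of URLs instead of written by hand. The sketch below is only an illustration: the URLs, the selector, the regex, and the helper name ->crawled are hypothetical, and calling run-list with apply assumes it accepts any number of maps, as the two-argument call above suggests.

```clojure
;; Hypothetical product URLs; replace them with real pages.
(def urls ["https://example.com/product/1"
           "https://example.com/product/2"])

(defn ->crawled
  "Builds a crawler map for url using a shared selector and regex."
  [url]
  {:url   url
   :find  ".price"    ; hypothetical CSS selector
   :match "^(.*)$"})  ; hypothetical regex

(def crawled-items (mapv ->crawled urls))

;; Assumption: run-list takes the maps as separate arguments.
;; (def returned (apply simple-collector/run-list crawled-items))
```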
(set-driver {:driver <DRIVER>
:url-driver <PATH>})
DRIVER = Driver property name for chromedriver = "webdriver.chrome.driver"
PATH = Path to the chromedriver executable = "./chromedriver"
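A wrong PATH is an easy mistake to make, so it can help to check the file before calling set-driver. This is a minimal sketch using only Java interop. The helper name driver-ready? and the driver-config var are hypothetical and not part of the project.

```clojure
(require '[clojure.java.io :as io])

(defn driver-ready?
  "True when path points to an existing, executable file."
  [path]
  (let [f (io/file path)]
    (and (.isFile f) (.canExecute f))))

;; Same shape as the map passed to set-driver above.
(def driver-config {:driver     "webdriver.chrome.driver"
                    :url-driver "./chromedriver"})

(when-not (driver-ready? (:url-driver driver-config))
  (println "chromedriver not found or not executable at"
           (:url-driver driver-config)))
```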
(def crawled {:url <URL>
:find <FINDER>
:match <MATCHER>})
URL = URL of the page to crawl = "https://www.americanas.com.br/produto/122597474/10692-lego-classic-pecas-criativas?pfm_carac=lego&pfm_page=search&pfm_pos=grid&pfm_type=search_page"
FINDER = CSS selector = ".price__SalesPrice-ej7lo8-2"
MATCHER = Regex applied to the matched tag = "^.?().$"
(def returned (ajax-collector/run-item crawled))
(def returned (ajax-collector/run-list crawled crawled))
Run the project's tests (they'll fail until you edit them):
$ lein test
Build the project's jar and install it into your local Maven repository:
$ lein install
Because I am not fluent in English, the documentation is likely to contain grammar and spelling mistakes, so any help with it is welcome.
Any documentation you add to the project is welcome too.
Any help with the code that improves the quality or the number of features is very welcome.