This is a crawler project written in Java.
There are two ways to build your crawler. The first is SimpleCollector, which is focused on performance: it downloads the HTML page and extracts the information from it. The second is AjaxCollector, which is designed to work with pages that load their content through AJAX.
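As a rough illustration of the SimpleCollector idea (download the HTML, then extract from it), here is a minimal sketch using only the JDK. It is not the project's code. The class SimpleFetchSketch and its methods are hypothetical. java.net.http.HttpClient (Java 11+) does the download. Because the JDK has no CSS-selector engine, extraction here is a naive href regex rather than a FINDER selector.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleFetchSketch {
    // Naive pattern for href="..." attributes; a real HTML parser is more robust.
    private static final Pattern HREF = Pattern.compile("href=\"([^\"]+)\"");

    // Downloads the raw HTML with a single GET request. No JavaScript is executed,
    // so content that the page loads later through AJAX is not in the result.
    static String download(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newBuilder()
                .followRedirects(HttpClient.Redirect.NORMAL)
                .build();
        HttpRequest request = HttpRequest.newBuilder(URI.create(url)).GET().build();
        return client.send(request, HttpResponse.BodyHandlers.ofString()).body();
    }

    // Collects every href value in order of first appearance, without duplicates.
    static Set<String> extractLinks(String html) {
        Set<String> links = new LinkedHashSet<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(1));
        }
        return links;
    }

    public static void main(String[] args) throws Exception {
        if (args.length == 0) {
            System.err.println("usage: java SimpleFetchSketch <url>");
            return;
        }
        extractLinks(download(args[0])).forEach(System.out::println);
    }
}
```

Because no JavaScript runs in this approach, anything a page fills in later through AJAX will be missing from the downloaded HTML. That is the case AjaxCollector is meant to handle.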
Using SimpleCollector:
import java.util.List;
import java.util.Set;
final SimpleCollector collector = new SimpleCollector(<URL>, <FINDER>, <MATCHER>);
URL: the page to crawl, for example "https://www.americanas.com.br/produto/122597474/10692-lego-classic-pecas-criativas?pfm_carac=lego&pfm_page=search&pfm_pos=grid&pfm_type=search_page"
FINDER: a CSS selector for the tag(s) to collect, for example ".price__SalesPrice-ej7lo8-2"
MATCHER: a regex applied to each tag found by FINDER, for example "^.?().$"
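collector.run(); // downloads the page and extracts the data
List<String> items = collector.getItems(); // values extracted with FINDER and MATCHER
Set<String> urls = collector.getURLs(); // URLs collected by the crawler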
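The MATCHER parameter is a regular expression applied to the text of each tag found by FINDER. The sketch below shows how such a regex can pull one value out of a tag's text. It assumes that the value kept is the first capture group and that the whole text must match. Both are assumptions for illustration, not documented SimpleCollector behavior. The class MatcherSketch, the method applyMatcher, and the PRICE pattern (a Brazilian-format price such as "1.299,90") are hypothetical.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherSketch {
    // Hypothetical pattern: skip any prefix lazily, capture a price such as
    // "99,90" or "1.299,90", then accept anything after it.
    static final Pattern PRICE = Pattern.compile("^.*?(\\d{1,3}(?:\\.\\d{3})*,\\d{2}).*$");

    // Requires the whole tag text to match and returns the first capture group.
    // Returns empty when there is no match or the pattern has no capture group.
    static Optional<String> applyMatcher(Pattern pattern, String tagText) {
        Matcher m = pattern.matcher(tagText);
        if (m.matches() && m.groupCount() >= 1) {
            return Optional.ofNullable(m.group(1));
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(applyMatcher(PRICE, "por R$ 1.299,90 no boleto").orElse("no match"));
        System.out.println(applyMatcher(PRICE, "sem estoque").orElse("no match"));
    }
}
```

Anchoring with ^ and $ and calling matches() means the regex must describe the entire tag text. The capture group then selects just the part worth keeping.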
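Using AjaxCollector: it runs Chrome through chromedriver, so set the driver path first:
System.setProperty("webdriver.chrome.driver", <PATH>);
PATH: the path to the chromedriver executable, for example "./chromedriver"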
final AjaxCollector collector = new AjaxCollector(<URL>, <FINDER>, <MATCHER>);
URL, FINDER and MATCHER have the same meaning and example values as for SimpleCollector.
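collector.run(); // loads the page through chromedriver and extracts the data
List<String> items = collector.getItems(); // values extracted with FINDER and MATCHER
Set<String> urls = collector.getURLs(); // URLs collected by the crawler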
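AjaxCollector depends on the webdriver.chrome.driver property pointing at a real chromedriver binary. A wrong path usually only shows up later as a driver startup error. The sketch below checks the path with the JDK before setting the property. The class ChromeDriverSetup and the method configureChromeDriver are hypothetical helpers, not part of this project.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class ChromeDriverSetup {
    // Property name that Selenium's ChromeDriver reads to locate the chromedriver binary.
    static final String PROPERTY = "webdriver.chrome.driver";

    // Fails fast if the path is not an executable file, otherwise registers it.
    static void configureChromeDriver(String path) {
        Path driver = Paths.get(path);
        if (!Files.isRegularFile(driver) || !Files.isExecutable(driver)) {
            throw new IllegalStateException(
                    "chromedriver not found or not executable: " + driver.toAbsolutePath());
        }
        System.setProperty(PROPERTY, driver.toString());
    }

    public static void main(String[] args) {
        configureChromeDriver(args.length > 0 ? args[0] : "./chromedriver");
        System.out.println(System.getProperty(PROPERTY));
    }
}
```

A helper like this would be called before `new AjaxCollector(...)`, in place of the bare System.setProperty call.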
Because I am not fluent in English, this documentation probably has grammar and spelling mistakes, so any help fixing them is welcome.
Contributions to the project documentation are also welcome.
Any code contribution that improves quality or adds features is very welcome.