Define how the crawler is supposed to crawl and how to store its data #4

Open
marco-c opened this issue Feb 15, 2018 · 5 comments

marco-c commented Feb 15, 2018

We should decide how to navigate the web and how to store the data.
The simplest option is:

  1. Follow any link / interact with any element on the page;
  2. Store in a file the steps taken;
  3. Parse coverage data and store coverage report (the name of the report could be WEBSITE_URL-INDEX_OF_STEP_IN_STEPS_FILE.json);
  4. Whenever we have to switch to a different website, restart Firefox to reset the coverage data to 0.
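A minimal sketch of what that loop could look like, assuming a Selenium-style driver and a hypothetical collect_coverage() helper that reads whatever the coverage build emitted (none of these names exist in crawler.py; this is only an illustration of the steps above):

```python
import json
from urllib.parse import quote

from selenium.webdriver.common.by import By


def crawl_site(driver, url, collect_coverage):
    """Visit one website, follow its links, record the steps taken, and
    store one coverage report per step (steps 1-3 above)."""
    steps = []
    driver.get(url)

    # Collect the targets up front, because navigating away would make the
    # element handles stale.
    hrefs = [a.get_attribute("href")
             for a in driver.find_elements(By.CSS_SELECTOR, "a[href]")]

    site = quote(url, safe="")
    for index, href in enumerate(hrefs):
        driver.get(href)                              # 1. follow the link
        steps.append({"index": index, "action": "visit", "target": href})

        report = collect_coverage()                   # assumed helper, not real code
        # 3. one report per step, named WEBSITE_URL-INDEX_OF_STEP_IN_STEPS_FILE.json
        with open(f"{site}-{index}.json", "w") as out:
            json.dump(report, out)

    # 2. store the steps taken in one file per website
    with open(f"{site}-steps.json", "w") as out:
        json.dump(steps, out, indent=2)

    # 4. the caller restarts Firefox (a fresh driver) before the next website,
    #    so the coverage counters start again from 0
```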

rhcu commented Mar 12, 2018

Hi Marco!
I wanted to check that I understood this correctly. The resource I read to understand code coverage tools is http://ncover.sourceforge.net/why.html.
As I understand it, we need to decide how to combine the web crawler with the code coverage tool and write the output to a JSON file.
I thought we could add a queue of the child elements the crawler finds. My thoughts can be summarized as follows (a rough sketch follows the list):

  • Load one website. Wait a few seconds for all of its elements to load. Inspect the parent page to find child elements (e.g. buttons, links) and create a queue of them.
  • Traverse the queue with the code coverage tool running, writing a report for each element to a file.
  • Restart the browser before moving on to the next website.
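A rough sketch of that queue idea, again assuming a Selenium-style driver; the selector and the collect_coverage() helper are placeholders, not anything that exists in the prototype:

```python
from collections import deque

from selenium.webdriver.common.by import By

SELECTOR = "a, button"  # the "child elements" to interact with


def crawl_with_queue(driver, url, collect_coverage, wait_seconds=5):
    """Load one parent page, queue up its child elements, then interact with
    each one and record a coverage report for it."""
    driver.implicitly_wait(wait_seconds)    # wait for elements to load
    driver.get(url)

    # Queue the element *indices* rather than the elements themselves, because
    # the handles go stale as soon as we navigate away from the parent page.
    queue = deque(range(len(driver.find_elements(By.CSS_SELECTOR, SELECTOR))))

    reports = []
    while queue:
        index = queue.popleft()
        driver.get(url)                     # reload the parent page
        elements = driver.find_elements(By.CSS_SELECTOR, SELECTOR)
        if index >= len(elements):
            continue                        # page changed under us; skip this entry
        elements[index].click()             # interact with the child element
        reports.append(collect_coverage())  # one report per element

    # The browser itself would be restarted before moving on to the next website.
    return reports
```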


marco-c commented Mar 12, 2018

We have a Firefox coverage build that emits coverage information.
We will use a crawler to navigate some websites using the Firefox coverage build. This build will automatically emit coverage information, that we will parse.

We already have a prototype version of the crawler (in crawler.py).

This issue is about defining how we want the crawler to work and how we want to store the data.
The data we need to store is: 1) the coverage information itself; 2) the steps taken on the website to generate that coverage information (as we will need to be able to replicate them in order to write tests).
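For illustration only, a small helper could tie the two together, assuming the WEBSITE_URL-INDEX.json naming from the first comment and a WEBSITE_URL-steps.json file stored next to the reports (both hypothetical until we settle on a format):

```python
import json
from pathlib import Path
from urllib.parse import quote


def reports_for_site(url, data_dir="."):
    """Pair each stored coverage report with the step that produced it."""
    prefix = quote(url, safe="")
    steps = json.loads(Path(data_dir, f"{prefix}-steps.json").read_text())
    return [(step, Path(data_dir, f"{prefix}-{index}.json"))
            for index, step in enumerate(steps)]
```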

@svensevenslow

@marco-c Could you elaborate on this a little more?

  1. Follow any link / interact with any element on the page;
  2. Store in a file the steps taken
    What type of file and what format would be used for storing the steps? Will we be writing a script to parse the file and automatically replicate the steps stored in it?


marco-c commented Mar 29, 2018

  1. The crawler should click on any link, button, or other element of the page. I'm not sure how to clarify this further.
  2. I'm not sure about the format of the file; there are several options. I don't want to force a particular format, as it's not really important. We will need the list of steps taken when we write tests for Firefox (we need to know what the crawler did so we can replicate it as a test); a rough sketch of a possible replay script follows.
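Purely as an illustration (again, the format is not fixed), replaying a JSON steps file could look like this; the action/target field names and the Selenium-style driver are just assumptions:

```python
import json

from selenium.webdriver.common.by import By


def replay_steps(driver, steps_path):
    """Re-run the recorded steps so the same coverage can be reproduced
    when turning a crawl into a Firefox test."""
    with open(steps_path) as f:
        steps = json.load(f)

    for step in steps:
        if step["action"] == "visit":
            driver.get(step["target"])              # target is a URL
        elif step["action"] == "click":
            # target is a CSS selector for the element to click
            driver.find_element(By.CSS_SELECTOR, step["target"]).click()
```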


marco-c commented Jun 20, 2018

After #51, #76 and #78 are done, we should revisit this.
At the moment, we are just running the crawler on all websites and then collecting the coverage at the end, which makes it difficult to reproduce the same steps again.
