Define how the crawler is supposed to crawl and how to store its data #4

Open
marco-c opened this issue Feb 15, 2018 · 5 comments

marco-c commented Feb 15, 2018

We should decide how to navigate the web and how to store the data.
The simplest option is:

  1. Follow any link / interact with any element on the page;
  2. Store in a file the steps taken;
  3. Parse coverage data and store coverage report (the name of the report could be WEBSITE_URL-INDEX_OF_STEP_IN_STEPS_FILE.json);
  4. Whenever we have to switch to a different website, restart Firefox to reset the coverage data to 0.
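A minimal sketch of what that loop could look like, assuming a Selenium-style driver and a hypothetical collect_coverage() helper that reads whatever the coverage build emitted (none of these names exist in crawler.py; this is only an illustration of the steps above):

```python
import json
from urllib.parse import quote

from selenium.webdriver.common.by import By


def crawl_site(driver, url, collect_coverage):
    """Visit one website, follow its links, record the steps taken, and
    store one coverage report per step (steps 1-3 above)."""
    steps = []
    driver.get(url)

    # Collect the targets up front, because navigating away would make the
    # element handles stale.
    hrefs = [a.get_attribute("href")
             for a in driver.find_elements(By.CSS_SELECTOR, "a[href]")]

    site = quote(url, safe="")
    for index, href in enumerate(hrefs):
        driver.get(href)                              # 1. follow the link
        steps.append({"index": index, "action": "visit", "target": href})

        report = collect_coverage()                   # assumed helper, not real code
        # 3. one report per step, named WEBSITE_URL-INDEX_OF_STEP_IN_STEPS_FILE.json
        with open(f"{site}-{index}.json", "w") as out:
            json.dump(report, out)

    # 2. store the steps taken in one file per website
    with open(f"{site}-steps.json", "w") as out:
        json.dump(steps, out, indent=2)

    # 4. the caller restarts Firefox (a fresh driver) before the next website,
    #    so the coverage counters start again from 0
```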

rhcu commented Mar 12, 2018

Hi Marco!
I wanted to check that I understood this correctly. The resource I read to understand code coverage tools is http://ncover.sourceforge.net/why.html.
As I understand it, we need to decide how to combine the web crawler with the code coverage tool and write the output to a JSON file.
I thought we could add a queue of the child elements the crawler finds. My thoughts can be summarized as follows (a rough sketch follows the list):

  • Load one website. Wait a few seconds for all of its elements to load. Inspect the parent page to find child elements (e.g. buttons, links) and create a queue of them.
  • Traverse the queue with the code coverage tool running, writing a report for each element to a file.
  • Restart the browser before moving on to the next website.
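A rough sketch of that queue idea, again assuming a Selenium-style driver; the selector and the collect_coverage() helper are placeholders, not anything that exists in the prototype:

```python
from collections import deque

from selenium.webdriver.common.by import By

SELECTOR = "a, button"  # the "child elements" to interact with


def crawl_with_queue(driver, url, collect_coverage, wait_seconds=5):
    """Load one parent page, queue up its child elements, then interact with
    each one and record a coverage report for it."""
    driver.implicitly_wait(wait_seconds)    # wait for elements to load
    driver.get(url)

    # Queue the element *indices* rather than the elements themselves, because
    # the handles go stale as soon as we navigate away from the parent page.
    queue = deque(range(len(driver.find_elements(By.CSS_SELECTOR, SELECTOR))))

    reports = []
    while queue:
        index = queue.popleft()
        driver.get(url)                     # reload the parent page
        elements = driver.find_elements(By.CSS_SELECTOR, SELECTOR)
        if index >= len(elements):
            continue                        # page changed under us; skip this entry
        elements[index].click()             # interact with the child element
        reports.append(collect_coverage())  # one report per element

    # The browser itself would be restarted before moving on to the next website.
    return reports
```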


marco-c commented Mar 12, 2018

We have a Firefox coverage build that emits coverage information.
We will use a crawler to navigate some websites using the Firefox coverage build. This build will automatically emit coverage information, that we will parse.

We already have a prototype version of the crawler (in crawler.py).

This issue is about defining how we want the crawler to work and how we want to store the data.
The data we need to store is: 1) the coverage information itself; 2) the steps taken on the website to generate that coverage information (as we will need to be able to replicate them in order to write tests).
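For illustration only, a small helper could tie the two together, assuming the WEBSITE_URL-INDEX.json naming from the first comment and a WEBSITE_URL-steps.json file stored next to the reports (both hypothetical until we settle on a format):

```python
import json
from pathlib import Path
from urllib.parse import quote


def reports_for_site(url, data_dir="."):
    """Pair each stored coverage report with the step that produced it."""
    prefix = quote(url, safe="")
    steps = json.loads(Path(data_dir, f"{prefix}-steps.json").read_text())
    return [(step, Path(data_dir, f"{prefix}-{index}.json"))
            for index, step in enumerate(steps)]
```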

@svensevenslow

@marco-c Could you elaborate on this a little more?

  1. Follow any link / interact with any element on the page;
  2. Store in a file the steps taken
    What type of file and what format would be used for storing the steps? Will we be writing a script to parse the file and automatically replicate the steps stored in it?


marco-c commented Mar 29, 2018

  1. The crawler should click on any link, button, or other element of the page. I'm not sure how to clarify this further.
  2. I'm not sure about the format of the file; there are several options. I don't want to force a particular format, as it's not really important. We will need the list of steps taken when we write tests for Firefox (we need to know what the crawler did so we can replicate it as a test); a rough sketch of a possible replay script follows.
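Purely as an illustration (again, the format is not fixed), replaying a JSON steps file could look like this; the action/target field names and the Selenium-style driver are just assumptions:

```python
import json

from selenium.webdriver.common.by import By


def replay_steps(driver, steps_path):
    """Re-run the recorded steps so the same coverage can be reproduced
    when turning a crawl into a Firefox test."""
    with open(steps_path) as f:
        steps = json.load(f)

    for step in steps:
        if step["action"] == "visit":
            driver.get(step["target"])              # target is a URL
        elif step["action"] == "click":
            # target is a CSS selector for the element to click
            driver.find_element(By.CSS_SELECTOR, step["target"]).click()
```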


marco-c commented Jun 20, 2018

After #51, #76 and #78 are done, we should revisit this.
At the moment, we are just running the crawler on all websites and then collecting the coverage at the end, which makes it difficult to reproduce the same steps again.
