Answering the written-questions.txt
Eduguimar committed Apr 20, 2021
1 parent 8b4208c commit 21ce55f
Showing 2 changed files with 21 additions and 2 deletions.
4 changes: 2 additions & 2 deletions webcrawler/src/main/config/sample_config.json
@@ -1,10 +1,10 @@
 {
-  "startPages": ["https://www.udacity.com/"],
+  "startPages": ["https://www.udacity.com/", "https://github.com/"],
   "ignoredUrls": ["https://blog.udacity.com/.*"],
   "ignoredWords": ["^.{1,3}$"],
   "parallelism": 4,
   "maxDepth": 2,
   "timeoutSeconds": 1,
   "popularWordCount": 5,
   "profileOutputPath": "profileData.txt"
-}
+}
19 changes: 19 additions & 0 deletions webcrawler/written-questions.txt
@@ -11,28 +11,47 @@ Q1. Run the web crawler using the configurations located at src/main/config/writ

Why did the parser take more time when run with ParallelWebCrawler?

The ParallelWebCrawler visited more URLs than the SequentialWebCrawler, so the parser had more content to read and its total recorded parse time is longer.


Q2. Your manager ran your crawler on her old personal computer, using the configurations from Q1, and she notices that
the sequential crawler actually outperforms the parallel crawler. She would like to know why.

(a) Suggest one reason why the sequential web crawler was able to read more web pages than the parallel crawler.
(Hint: Try setting "parallelism" to 1 in the JSON configs to simulate your manager's computer.)

The parallel web crawler needs several threads running at the same time to work well. On a computer that can run only one thread, it still pays the cost of creating and scheduling parallel tasks but gains nothing from them.
That is why it does not perform well on an old computer.
The sequential web crawler was written to do its work on a single thread, so it performs better on a computer that can run only one thread.

(b) Suggest one scenario in which the parallel web crawler will almost certainly perform better than the sequential
crawler. Why will it perform better?

On a multi-core computer, the parallel web crawler will outperform the sequential web crawler.
A multi-core computer can run several threads at the same time, so the parallel crawler can download and parse several pages at once.
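
The effect of the "parallelism" setting discussed in Q2 can be sketched as follows. This is only an illustration, assuming the parallel crawler submits its work to a ForkJoinPool sized by "parallelism"; CrawlTaskSketch and its fake link structure (every page links to three more pages) are hypothetical and not taken from the project.

```java
import java.util.List;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Hypothetical stand-in for a crawl: counts the pages reachable down to maxDepth.
class CrawlTaskSketch extends RecursiveTask<Integer> {
  private final int depth;
  private final int maxDepth;

  CrawlTaskSketch(int depth, int maxDepth) {
    this.depth = depth;
    this.maxDepth = maxDepth;
  }

  @Override
  protected Integer compute() {
    if (depth == maxDepth) {
      return 1; // deepest level: count this page and follow no links
    }
    // Assumption for the sketch: every page links to exactly three further pages.
    List<CrawlTaskSketch> children = List.of(
        new CrawlTaskSketch(depth + 1, maxDepth),
        new CrawlTaskSketch(depth + 1, maxDepth),
        new CrawlTaskSketch(depth + 1, maxDepth));
    int visited = 1;
    // invokeAll forks the subtasks; whether they actually run at the same
    // time depends on how many worker threads the pool has.
    for (CrawlTaskSketch child : invokeAll(children)) {
      visited += child.join();
    }
    return visited;
  }

  // Runs the same crawl on a pool with the given number of worker threads.
  static int crawl(int parallelism, int maxDepth) {
    ForkJoinPool pool = new ForkJoinPool(parallelism);
    try {
      return pool.invoke(new CrawlTaskSketch(0, maxDepth));
    } finally {
      pool.shutdown();
    }
  }
}

class CrawlTaskSketchDemo {
  public static void main(String[] args) {
    System.out.println(CrawlTaskSketch.crawl(1, 2)); // one worker thread
    System.out.println(CrawlTaskSketch.crawl(4, 2)); // four worker threads
  }
}
```

Both calls visit the same 13 pages. The pool size changes only how many tasks can run at once: with one worker the task bookkeeping is pure overhead, while with several workers on a multi-core machine the subtasks can overlap.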

Q3. Analyze your method profiler through the lens of Aspect Oriented Programming, by answering the following questions:

(a) What cross-cutting concern is being addressed by the com.udacity.webcrawler.profiler.Profiler class?

The cross-cutting concern is performance measurement: the Profiler class records how long method calls take, independently of what those methods do.

(b) What are the join points of the Profiler in the web crawler program?

The join points of the Profiler are the calls to methods annotated with @Profiled.
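
A minimal sketch of how annotated methods can act as join points, assuming a JDK dynamic proxy does the interception. All names here (ProfiledSketch, ParserSketch, FakeParserSketch, ProfilingHandlerSketch) are hypothetical stand-ins, not the project's classes, and the handler is not thread-safe.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;
import java.lang.reflect.Proxy;
import java.util.HashMap;
import java.util.Map;

// Hypothetical marker annotation, standing in for the project's @Profiled.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.METHOD)
@interface ProfiledSketch {}

interface ParserSketch {
  @ProfiledSketch
  String parse(String url); // join point: annotated

  String name();            // not a join point: no annotation
}

final class FakeParserSketch implements ParserSketch {
  @Override public String parse(String url) { return "parsed " + url; }
  @Override public String name() { return "fake"; }
}

// Intercepts every call made through the proxy and times only annotated methods.
final class ProfilingHandlerSketch implements InvocationHandler {
  private final Object delegate;
  final Map<String, Long> nanosByMethod = new HashMap<>();

  ProfilingHandlerSketch(Object delegate) {
    this.delegate = delegate;
  }

  @Override
  public Object invoke(Object proxy, Method method, Object[] args) throws Throwable {
    if (!method.isAnnotationPresent(ProfiledSketch.class)) {
      return call(method, args); // pass through without recording
    }
    long start = System.nanoTime();
    try {
      return call(method, args);
    } finally {
      // Record the time even if the wrapped method throws.
      nanosByMethod.merge(method.getName(), System.nanoTime() - start, Long::sum);
    }
  }

  private Object call(Method method, Object[] args) throws Throwable {
    try {
      return method.invoke(delegate, args);
    } catch (InvocationTargetException e) {
      throw e.getCause(); // rethrow what the real method threw
    }
  }

  ParserSketch wrap() {
    return (ParserSketch) Proxy.newProxyInstance(
        ParserSketch.class.getClassLoader(),
        new Class<?>[] {ParserSketch.class},
        this);
  }
}

class ProfilerSketchDemo {
  public static void main(String[] args) {
    ProfilingHandlerSketch handler = new ProfilingHandlerSketch(new FakeParserSketch());
    ParserSketch parser = handler.wrap();
    parser.parse("https://www.udacity.com/");
    parser.name();
    System.out.println(handler.nanosByMethod.keySet()); // only "parse" is recorded
  }
}
```

The timing logic lives in one handler rather than in every parser method, which is what makes it a cross-cutting concern applied at the annotated join points.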


Q4. Identify three (3) different design patterns used in this project, and explain which interfaces, classes, and/or
libraries use or implement those design patterns.

For each pattern, name one thing about the pattern that you LIKED, and one thing you DISLIKED. If you did not like
anything, you can name two things you disliked.

- Dependency Injection - It is used in the WebCrawlerMain and Profiler classes, with the Guice library injecting the dependencies.
This design pattern simplifies testing and makes classes more modular, but it increases the number of classes and/or interfaces in the project.

- Builder Pattern - The CrawlerConfiguration, CrawlResult, and ParserModule classes and the PageParser interface use the Builder pattern.
It simplifies creating instances of classes with complex constructors, but it significantly increases the amount of code.

- Proxy Pattern - It is used in the ProfilerImpl class.
It offers a good way to work with interfaces at runtime, but it makes the code harder to understand.
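
The trade-offs named for the first two patterns can be seen in a small sketch. It assumes nothing about the project's real classes: ConfigSketch, FetcherSketch and CrawlerSketch are hypothetical, only the field names come from sample_config.json, and the dependencies are wired by hand through the constructor instead of by Guice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, trimmed-down configuration built with the Builder pattern.
final class ConfigSketch {
  private final List<String> startPages;
  private final int parallelism;
  private final int maxDepth;

  private ConfigSketch(List<String> startPages, int parallelism, int maxDepth) {
    this.startPages = startPages;
    this.parallelism = parallelism;
    this.maxDepth = maxDepth;
  }

  List<String> getStartPages() { return startPages; }
  int getParallelism() { return parallelism; }
  int getMaxDepth() { return maxDepth; }

  static final class Builder {
    private final List<String> startPages = new ArrayList<>();
    private int parallelism = 1; // defaults chosen for the sketch
    private int maxDepth = 0;

    Builder addStartPages(String... pages) {
      startPages.addAll(List.of(pages));
      return this;
    }

    Builder setParallelism(int parallelism) {
      this.parallelism = parallelism;
      return this;
    }

    Builder setMaxDepth(int maxDepth) {
      this.maxDepth = maxDepth;
      return this;
    }

    ConfigSketch build() {
      // Copy the list so the built object cannot be changed through the builder.
      return new ConfigSketch(List.copyOf(startPages), parallelism, maxDepth);
    }
  }
}

// Hypothetical collaborator; a test can supply a fake implementation.
interface FetcherSketch {
  String fetch(String url);
}

// Constructor injection: the crawler receives its dependencies instead of creating them.
final class CrawlerSketch {
  private final ConfigSketch config;
  private final FetcherSketch fetcher;

  CrawlerSketch(ConfigSketch config, FetcherSketch fetcher) {
    this.config = config;
    this.fetcher = fetcher;
  }

  List<String> fetchStartPages() {
    List<String> contents = new ArrayList<>();
    for (String url : config.getStartPages()) {
      contents.add(fetcher.fetch(url));
    }
    return contents;
  }
}

class PatternsSketchDemo {
  public static void main(String[] args) {
    ConfigSketch config = new ConfigSketch.Builder()
        .addStartPages("https://www.udacity.com/", "https://github.com/")
        .setParallelism(4)
        .setMaxDepth(2)
        .build();
    // A fake fetcher replaces real network access.
    CrawlerSketch crawler = new CrawlerSketch(config, url -> "<" + url + ">");
    System.out.println(crawler.fetchStartPages());
  }
}
```

The builder keeps the call site readable at the cost of an extra nested class, and the injected FetcherSketch makes the crawler testable without a network at the cost of an extra interface, which matches the likes and dislikes listed above.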
