Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(crawler): add canceled context to Visit func #27

Conversation

DmitriyLewen
Copy link
Collaborator

Description

When we get error from Visit function we close c.urlCh.
But there are cases when another Visit function tries to write to c.urlCh before we return error.
In this case we get panic instead of error.
e.g. - https://github.com/aquasecurity/trivy-java-db/actions/runs/8041756834/job/21961450007#step:5:619

To avoid this case - we need to add context with cancel function for Visit function.

@DmitriyLewen DmitriyLewen self-assigned this Feb 26, 2024
@@ -63,6 +63,9 @@ func NewCrawler(opt Option) Crawler {

func (c *Crawler) Crawl(ctx context.Context) error {
log.Println("Crawl maven repository and save indexes")
ctx, ctxCancelFunc := context.WithCancel(ctx)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ctx, ctxCancelFunc := context.WithCancel(ctx)
ctx, cancel := context.WithCancel(ctx)

nit: I think cancel is used conventionally.
https://go.dev/doc/database/cancel-operations

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done - 8f9c98b

Comment on lines 172 to 176
go func() {
for _, child := range children {
c.urlCh <- url + child
}
}()
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What if just checking context here?

Suggested change
go func() {
for _, child := range children {
c.urlCh <- url + child
}
}()
go func() {
for _, child := range children {
select {
// Context can be canceled if we receive an error from another Visit function.
case <-ctx.Done():
return
default:
c.urlCh <- url + child
}
}
}()

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about this too.
I was worried that we need to stop func immediately after receiving the cancel function.
But after making HTTP requests context-sensitive (262f22d), we can move it into this loop.

Did this in c4a4202.

case <-ctx.Done():
return nil
default:
resp, err := c.http.Get(url)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is a good opportunity to make the HTTP request context-aware.

Suggested change
resp, err := c.http.Get(url)
req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
if err != nil {
return xerrors.Errorf("unable to new HTTp request: %w", err)
}
if err = client.Do(req); err != nil {

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point!
Created this in 262f22d.

@knqyf263 knqyf263 merged commit be4b443 into aquasecurity:main Feb 26, 2024
3 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants