Collect error in the index #1097

Merged
merged 28 commits into from
Jan 3, 2024
3d41e20
index: add error document to the index
TristanCacqueray Dec 20, 2023
0cd8e46
api: add error document type to the add doc endpoint
TristanCacqueray Dec 20, 2023
b389c62
crawler: emit crawler error when processing stream
TristanCacqueray Dec 20, 2023
f9357eb
api: add error indexing
TristanCacqueray Dec 20, 2023
1d808a0
index: add error created_at attribute
TristanCacqueray Dec 21, 2023
2475a4e
crawler: base64 encode json blob
TristanCacqueray Dec 21, 2023
5cac8e3
test: fix the macroscope failure test
TristanCacqueray Dec 21, 2023
303dae2
api: add crawler/errors endpoint to fetch errors
TristanCacqueray Dec 21, 2023
5e660b0
index: store the entity and timestamp in the errors_data structure
TristanCacqueray Dec 21, 2023
37656c9
test: verify the indexed error content
TristanCacqueray Dec 22, 2023
ee8a869
crawler: continue processing even when there are decoding errors
TristanCacqueray Dec 22, 2023
38119ec
chore: perform monocle-reformat-run
TristanCacqueray Dec 22, 2023
f179448
api: update dropTime to keep the current hour
TristanCacqueray Dec 22, 2023
4084bff
doc: add example to run a single test
TristanCacqueray Dec 22, 2023
9f040a7
web: add crawler api codegen
TristanCacqueray Dec 22, 2023
eec06d1
web: display crawler errors
TristanCacqueray Dec 22, 2023
4a71ad3
index: encode crawler error body by the api
TristanCacqueray Dec 23, 2023
0ea04b3
index: introduce new type for BinaryText
TristanCacqueray Dec 23, 2023
0f1df8a
index: bump version to apply new mapping
TristanCacqueray Dec 23, 2023
796d395
crawler: improve crawler error representation
TristanCacqueray Dec 24, 2023
9fb3c3b
crawler: introduce error variant for page-info
TristanCacqueray Dec 24, 2023
1731dee
crawler: preserve the original fetch error from morpheus client
TristanCacqueray Dec 24, 2023
32f140a
crawler: handle partial results
TristanCacqueray Dec 24, 2023
4d2bf8a
api: introduce CrawlerErrorList
TristanCacqueray Dec 27, 2023
23c4def
crawler: add stream error to stop the stream
TristanCacqueray Jan 3, 2024
9a60ca4
api: prevent error when submitting empty task data
TristanCacqueray Jan 3, 2024
2ba9fdd
doc: add profiling build instructions
TristanCacqueray Jan 3, 2024
8f03288
Rename PageInfoError to RateLimitInfoError
morucci Jan 3, 2024
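
Several commit titles above describe recording decoding errors (with the entity, a timestamp and a base64-encoded body) instead of aborting the crawl. The pattern can be sketched roughly as follows. This is an illustration in Python under stated assumptions, not the project's Haskell implementation: `process_stream`, `message` and `body` are hypothetical names, and only `entity` and `created_at` appear in this PR's examples.

```python
import base64
import json
from datetime import datetime, timezone


def process_stream(raw_items, entity):
    """Decode each raw JSON item; collect failures instead of raising.

    Hypothetical helper: the names and the error shape are assumptions.
    """
    documents, errors = [], []
    for raw in raw_items:
        try:
            documents.append(json.loads(raw))
        except json.JSONDecodeError as exc:
            # Record the failure and keep going with the next item.
            errors.append({
                "entity": entity,
                "created_at": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
                "message": str(exc),
                # Base64 keeps an arbitrary payload safe to store as text.
                "body": base64.b64encode(raw.encode("utf-8")).decode("ascii"),
            })
    return documents, errors
```

Returning the documents and the errors side by side lets the caller index both, so a single bad item no longer hides the rest of the stream.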
2 changes: 2 additions & 0 deletions CHANGELOG.md
@@ -8,6 +8,8 @@ All notable changes to this project will be documented in this file.
- [crawler] Proxy can be configured with `HTTP_PROXY` and `HTTPS_PROXY` environment. To proxy http requests between crawlers and the api, use the `API_PROXY` environment.
- [crawler] A new `groups` sub-field in all Author fields (`on_author` and `author`) for `Change` and `Events`.
Groups memberships are reflected from the config file to the database.
- [crawler] Processing errors are no longer fatal and they are now stored in the index.
- [web] A red bell is added to the UI when crawler errors exist for the given query, to indicate that data is missing.

### Changed

45 changes: 45 additions & 0 deletions CONTRIBUTING.md
@@ -52,6 +52,12 @@ nix develop --command monocle-repl
λ> run $ defaultApiConfig 8080 "http://localhost:19200" "etc/config.yaml"
```

… or by running the executable:

```ShellSession
CRAWLERS_API_KEY=secret MONOCLE_CONFIG=./etc/config.yaml nix develop --command cabal run -O0 monocle -- api
```

The Monocle UI should be accessible:

```ShellSession
@@ -145,6 +151,12 @@ Run the full test suite with:
nix develop --command monocle-ci-run
```

Run a single test:

```ShellSession
cabal test --test-options='-p "Change stream"'
```

## Start the web development server

Start the web dev server (hot-reload):
@@ -239,3 +251,36 @@ Test the containers:
podman run --network host -v prom-data:/var/lib/prometheus:Z -e API_TARGET=localhost:8080 --rm quay.io/change-metrics/monocle-prometheus:latest
podman run -it --rm --network host quay.io/change-metrics/monocle-grafana:latest
```

## Example query

Add a crawler error:

```ShellSession
curl -X POST -d '{"index": "monocle", "crawler": "demo", "apikey": "secret", "entity": {"project_name": "neutron"}, "errors": [{"created_at": "2023-12-22T10:11:12Z"}]}' -H "Content-type: application/json" localhost:8080/api/2/crawler/add
```

Get crawler errors:

```ShellSession
curl -X POST -d '{"index": "monocle"}' -H "Content-type: application/json" localhost:8080/api/2/crawler/errors
```
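
The two curl calls above can also be scripted. The following is a minimal sketch, assuming the API listens on `localhost:8080` as in the examples and that the endpoints answer with JSON (the response format is not shown in this document). The helper names `add_error_payload`, `get_errors_payload` and `post_json` are hypothetical; the payload fields are copied from the curl commands.

```python
import json
import urllib.request

# Assumption: local API address, as in the curl examples.
API = "http://localhost:8080"


def add_error_payload(index, crawler, apikey, project_name, created_at):
    """Body for /api/2/crawler/add, mirroring the fields of the curl example."""
    return {
        "index": index,
        "crawler": crawler,
        "apikey": apikey,
        "entity": {"project_name": project_name},
        "errors": [{"created_at": created_at}],
    }


def get_errors_payload(index):
    """Body for /api/2/crawler/errors."""
    return {"index": index}


def post_json(path, payload, base_url=API):
    """POST a JSON body and parse the reply (assumed to be JSON)."""
    req = urllib.request.Request(
        base_url + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Usage, with the API running locally:
# post_json("/api/2/crawler/add",
#           add_error_payload("monocle", "demo", "secret", "neutron",
#                             "2023-12-22T10:11:12Z"))
# post_json("/api/2/crawler/errors", get_errors_payload("monocle"))
```

Keeping the payload builders separate from the HTTP call makes them easy to check without a running service.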

## Debug by dumping every stacktrace

When the service fails with an obscure `NonEmpty.fromList: empty list`, run the following command to get the full stacktrace:

```ShellSession
cabal --ghc-options="-fprof-auto" --enable-executable-profiling --enable-profiling --enable-library-profiling -O0 run exe:monocle -- api +RTS -xc -RTS
```

Note that this also reports legitimate exceptions that are correctly caught, but the output should include something like:

```
*** Exception (reporting due to +RTS -xc): (THUNK_1_0), stack trace:
GHC.IsList.CAF
--> evaluated by: Monocle.Backend.Index.getChangesByURL,
called from Monocle.Backend.Index.taskDataAdd,
called ...
NonEmpty.fromList: empty list
```
2 changes: 1 addition & 1 deletion Makefile
@@ -35,7 +35,7 @@ codegen-haskell:

codegen-javascript:
rm -f web/src/messages/*
- sh -c 'for pb in $(MESSAGES); do ocaml-protoc $(PINCLUDE) -bs -ml_out web/src/messages/ schemas/$${pb}; done'
+ sh -c 'for pb in $(MESSAGES) $(CRAWLER); do ocaml-protoc $(PINCLUDE) -bs -ml_out web/src/messages/ schemas/$${pb}; done'
python3 ./codegen/rename_bs_module.py ./web/src/messages/

codegen-openapi: