Aggregate test summary files in CircleCI workflow runs #34989

Merged
merged 87 commits into from
Dec 16, 2024
Changes from 79 commits
Commits
87 commits
729f61f
fix
ydshieh Nov 19, 2024
573455d
fix
ydshieh Nov 19, 2024
d788ffa
fix
ydshieh Nov 19, 2024
337956e
fix
ydshieh Nov 19, 2024
8367992
fix
ydshieh Nov 19, 2024
97efbd0
fix
ydshieh Nov 19, 2024
17ac09a
fix
ydshieh Nov 19, 2024
e95bc11
fix
ydshieh Nov 19, 2024
c24625f
fix
ydshieh Nov 19, 2024
b8f2f76
fix
ydshieh Nov 19, 2024
01037c6
fix
ydshieh Nov 19, 2024
f29dc9b
fix
ydshieh Nov 19, 2024
6086f9d
fix
ydshieh Nov 19, 2024
6bf6dca
fix
ydshieh Nov 19, 2024
95dac99
fix
ydshieh Nov 20, 2024
6f41cd8
fix
ydshieh Nov 20, 2024
4329c26
fix
ydshieh Nov 20, 2024
31ff1a1
fix
ydshieh Nov 20, 2024
1a1a10d
fix
ydshieh Nov 20, 2024
16b7459
fix
ydshieh Nov 20, 2024
9833fac
fix
ydshieh Nov 22, 2024
48234c7
fix
ydshieh Nov 22, 2024
f0d2bba
fix
ydshieh Nov 22, 2024
ad3b42c
fix
ydshieh Nov 22, 2024
a012473
fix
ydshieh Nov 22, 2024
00d5976
fix
ydshieh Nov 22, 2024
aa65c71
fix
ydshieh Nov 22, 2024
49e9e3f
fix
ydshieh Nov 22, 2024
1841f56
fix
ydshieh Nov 22, 2024
67935a6
fix
ydshieh Nov 22, 2024
0ea4bb1
fix
ydshieh Nov 22, 2024
3fc2af7
fix
ydshieh Nov 22, 2024
9d143f5
fix
ydshieh Nov 22, 2024
3e2ff0c
fix
ydshieh Nov 22, 2024
55aa01c
fix
ydshieh Nov 22, 2024
b3d8212
try 1
ydshieh Nov 27, 2024
75da5b8
try 1
ydshieh Nov 27, 2024
428b937
try 1
ydshieh Nov 27, 2024
a355972
try 1
ydshieh Nov 27, 2024
6f94ad8
try 1
ydshieh Nov 27, 2024
339e0f3
try 1
ydshieh Nov 27, 2024
931bd56
try 1
ydshieh Nov 27, 2024
ea08752
try 1
ydshieh Nov 27, 2024
a89920b
try 1
ydshieh Nov 27, 2024
1bf1088
try 1
ydshieh Nov 27, 2024
dab6281
try 1
ydshieh Nov 27, 2024
5a9978a
try 1
ydshieh Nov 27, 2024
84cb015
try 1
ydshieh Nov 27, 2024
ef53bf4
try 1
ydshieh Nov 27, 2024
fbc3390
try 1
ydshieh Nov 27, 2024
2ac2158
try 1
ydshieh Nov 27, 2024
f0b3c5d
try 1
ydshieh Nov 27, 2024
cc6600a
try 1
ydshieh Nov 27, 2024
12852e4
try 1
ydshieh Nov 27, 2024
eb0125d
try 1
ydshieh Nov 27, 2024
3be7105
try 1
ydshieh Nov 27, 2024
c9ef06e
try 1
ydshieh Nov 27, 2024
64bf5e0
try 1
ydshieh Nov 27, 2024
c93e34c
try 1
ydshieh Nov 27, 2024
d471968
try 1
ydshieh Nov 27, 2024
eda984c
try 1
ydshieh Nov 27, 2024
90f3a0a
try 1
ydshieh Nov 27, 2024
553e163
try 1
ydshieh Nov 27, 2024
af2ddf2
try 1
ydshieh Nov 27, 2024
a1f7c63
try 1
ydshieh Nov 27, 2024
f285200
try 1
ydshieh Nov 27, 2024
9d45dbc
try 1
ydshieh Nov 27, 2024
6ea3002
try 1
ydshieh Nov 27, 2024
977d27e
try 1
ydshieh Nov 28, 2024
4a1d93a
try 1
ydshieh Nov 28, 2024
74be7c0
try 1
ydshieh Nov 28, 2024
f7ab430
try 1
ydshieh Nov 28, 2024
9200176
try 1
ydshieh Nov 28, 2024
358f86c
try 1
ydshieh Nov 28, 2024
dd7fee4
try 1
ydshieh Nov 28, 2024
99c89d6
try 1
ydshieh Nov 28, 2024
3818ab0
fix
ydshieh Nov 28, 2024
ce760fa
fix
ydshieh Nov 28, 2024
a4586f0
fix
ydshieh Nov 28, 2024
fa07279
update
ydshieh Dec 5, 2024
3ce5700
Merge branch 'main' into final_job
ydshieh Dec 5, 2024
d14de3d
fix
ydshieh Dec 5, 2024
aa16edc
Merge branch 'final_job' of https://github.com/huggingface/transforme…
ydshieh Dec 5, 2024
b9d4bfb
fix
ydshieh Dec 6, 2024
c174645
Merge branch 'main' into final_job
ydshieh Dec 6, 2024
ef7f6a8
Merge branch 'main' into final_job
ydshieh Dec 10, 2024
0e6d198
Merge branch 'main' into final_job
ydshieh Dec 16, 2024
4 changes: 2 additions & 2 deletions .circleci/config.yml
@@ -58,14 +58,14 @@ jobs:
                name: "Prepare pipeline parameters"
                command: |
                    python utils/process_test_artifacts.py

            # To avoid too long generated_config.yaml on the continuation orb, we pass the links to the artifacts as parameters.
            # Otherwise the list of tests was just too big. Explicit is good but for that it was a limitation.
            # We used:

            # https://circleci.com/docs/api/v2/index.html#operation/getJobArtifacts : to get the job artifacts
            # We could not pass a nested dict, which is why we create the test_file_... parameters for every single job

            - store_artifacts:
                path: test_preparation/transformed_artifacts.json
            - store_artifacts:
26 changes: 23 additions & 3 deletions .circleci/create_circleci_config.py
@@ -40,9 +40,22 @@ class EmptyJob:
     job_name = "empty"

     def to_dict(self):
+        steps = [{"run": 'ls -la'}]
+        if self.job_name == "collection_job":
+            steps.extend(
+                [
+                    "checkout",
+                    {"run": "pip install requests"},
+                    {"run": """while [[ $(curl --location --request GET "https://circleci.com/api/v2/workflow/$CIRCLE_WORKFLOW_ID/job" --header "Circle-Token: $CCI_TOKEN"| jq -r '.items[]|select(.name != "collection_job")|.status' | grep -c "running") -gt 0 ]]; do sleep 5; done"""},
+                    {"run": 'python utils/process_circleci_workflow_test_reports.py --workflow_id $CIRCLE_WORKFLOW_ID'},
Collaborator

In CircleCI, can't we just use something like with or requires? It would automatically wait for run_test to finish, and then fetch the results?
(It could even be a GitHub Action, no?)

Collaborator Author

ydshieh Dec 2, 2024

If Job 2 requires Job 1 and Job 1 fails, Job 2 won't be triggered (sad 😢). In CircleCI, when: always can't be applied to jobs in a workflow (only to steps within a job).

That is why this wait loop is implemented (although I don't like it).

Collaborator

ArthurZucker Dec 5, 2024

OK, then you can merge, but I would like this test not to block the merge.

Collaborator Author

You mean if this new job fails, it shouldn't block the merge?

I will see if I can configure this 🤔

Collaborator Author

I am just doing d14de3d to make sure the run steps always succeed.
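
For readers following this thread: the shell `while` loop in the diff amounts to polling the workflow's job list until no job other than `collection_job` reports a status containing "running". Below is a minimal Python sketch of that logic, assuming the response has the shape the `jq` filter relies on (an `items` array of objects with `name` and `status`). The names `count_running`, `wait_until_done` and `fetch_items` are hypothetical and not part of the PR; the fetcher is injected so the sketch runs without network access.

```python
import time
from typing import Callable, Iterable


def count_running(items: Iterable[dict], exclude: str = "collection_job") -> int:
    """Count jobs, other than the collector itself, that still look running.

    Mirrors `grep -c "running"`, which is a substring match, so any status
    that merely contains "running" is counted as well.
    """
    return sum(
        1
        for job in items
        if job.get("name") != exclude and "running" in str(job.get("status", ""))
    )


def wait_until_done(
    fetch_items: Callable[[], list],
    poll_seconds: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Poll until no other job is running; return how many times we fetched."""
    polls = 1
    while count_running(fetch_items()) > 0:
        sleep(poll_seconds)
        polls += 1
    return polls
```

In the real job, `fetch_items` would wrap the same `GET https://circleci.com/api/v2/workflow/$CIRCLE_WORKFLOW_ID/job` request shown in the diff and return its `items` list.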

+                    {"store_artifacts": {"path": "outputs"}},
+                    {"run": 'echo "All required jobs have now completed"'},
+                ]
+            )
+
         return {
             "docker": copy.deepcopy(DEFAULT_DOCKER_IMAGE),
-            "steps":["checkout"],
+            "steps": steps,
         }


@@ -352,6 +365,7 @@ def job_name(self):
 DOC_TESTS = [doc_test_job]
 ALL_TESTS = REGULAR_TESTS + EXAMPLES_TESTS + PIPELINE_TESTS + REPO_UTIL_TESTS + DOC_TESTS + [custom_tokenizers_job] + [exotic_models_job] # fmt: skip

+
 def create_circleci_config(folder=None):
     if folder is None:
         folder = os.getcwd()
@@ -361,7 +375,13 @@ def create_circleci_config(folder=None):

     if len(jobs) == 0:
         jobs = [EmptyJob()]
-    print("Full list of job name inputs", {j.job_name + "_test_list":{"type":"string", "default":''} for j in jobs})
+    else:
+        print("Full list of job name inputs", {j.job_name + "_test_list":{"type":"string", "default":''} for j in jobs})
+        # Add a job waiting all the test jobs and aggregate their test summary files at the end
+        collection_job = EmptyJob()
+        collection_job.job_name = "collection_job"
+        jobs = [collection_job] + jobs
+
     config = {
         "version": "2.1",
         "parameters": {
@@ -371,7 +391,7 @@ def create_circleci_config(folder=None):
             **{j.job_name + "_test_list":{"type":"string", "default":''} for j in jobs},
             **{j.job_name + "_parallelism":{"type":"integer", "default":1} for j in jobs},
         },
-        "jobs" : {j.job_name: j.to_dict() for j in jobs},
+        "jobs": {j.job_name: j.to_dict() for j in jobs},
         "workflows": {"version": 2, "run_tests": {"jobs": [j.job_name for j in jobs]}}
     }
     with open(os.path.join(folder, "generated_config.yml"), "w") as f:
83 changes: 83 additions & 0 deletions utils/process_circleci_workflow_test_reports.py
@@ -0,0 +1,83 @@
# Copyright 2024 The HuggingFace Team. All rights reserved.
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
import argparse
import json
import os
import requests


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--workflow_id', type=str, required=True)
    args = parser.parse_args()
    workflow_id = args.workflow_id

    r = requests.get(f"https://circleci.com/api/v2/workflow/{workflow_id}/job", headers={"Circle-Token": os.environ.get('CIRCLE_TOKEN', "")})
    jobs = r.json()["items"]

    os.makedirs("outputs", exist_ok=True)

    workflow_summary = {}
    # for each job, download artifacts
    for job in jobs:

        project_slug = job["project_slug"]
        if job["name"].startswith(("tests_", "examples_", "pipelines_")):

            url = f'https://circleci.com/api/v2/project/{project_slug}/{job["job_number"]}/artifacts'
            r = requests.get(url, headers={"Circle-Token": os.environ.get('CIRCLE_TOKEN', "")})
            job_artifacts = r.json()["items"]

            os.makedirs(job["name"], exist_ok=True)
            os.makedirs(f'outputs/{job["name"]}', exist_ok=True)

            job_test_summaries = {}
            for artifact in job_artifacts:
                if artifact["path"].startswith("reports/") and artifact["path"].endswith("/summary_short.txt"):
Collaborator

I think using the junit.xml will be more "foolproof" than regex parsing, no?

Collaborator Author

There is no regex in this script.

Regarding junit.xml: yes, it is probably more foolproof, but I think it would need more post-processing:

  • The XML structure is more complex than summary_short.txt.
  • It doesn't give something like tests/generation/test_utils.py::GenerationIntegrationTests::test_generated_length_assisted_generation that we could copy-paste and run directly.

summary_short.txt works well in a very simple way. If we find XML is necessary in the future, I'm happy to make the change.

Collaborator

Ok sounds good!
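
To make the trade-off discussed above concrete, here is a rough sketch of the extra post-processing a JUnit XML report would need in order to recover a copy-pasteable test ID. It assumes the report follows the usual pytest layout, where each `<testcase>` carries a dotted `classname` and a `name`, and a failure shows up as a `<failure>` or `<error>` child. The sample XML and the helper `junit_to_summary` are made up for illustration and are not part of the PR; rebuilding the file path from `classname` is a heuristic that assumes the last dotted component is the test class.

```python
import xml.etree.ElementTree as ET

# Made-up sample report, shaped like pytest's --junitxml output (assumption).
SAMPLE = """<testsuites><testsuite name="pytest">
  <testcase classname="tests.generation.test_utils.GenerationIntegrationTests" name="test_generated_length_assisted_generation">
    <failure message="AssertionError">trace</failure>
  </testcase>
  <testcase classname="tests.generation.test_utils.GenerationIntegrationTests" name="test_ok"/>
  <testcase classname="tests.generation.test_utils.GenerationIntegrationTests" name="test_skipped">
    <skipped message="not relevant"/>
  </testcase>
</testsuite></testsuites>"""


def junit_to_summary(xml_text: str) -> dict:
    """Map 'path.py::Class::test' IDs to 'passed' or 'failed'."""
    summary = {}
    for case in ET.fromstring(xml_text).iter("testcase"):
        if case.find("skipped") is not None:
            continue  # only passed/failed tests are recorded in this sketch
        # Heuristic: the last dotted component is the class, the rest is the module.
        module, _, cls = case.get("classname", "").rpartition(".")
        test_id = f"{module.replace('.', '/')}.py::{cls}::{case.get('name')}"
        failed = case.find("failure") is not None or case.find("error") is not None
        summary[test_id] = "failed" if failed else "passed"
    return summary


print(junit_to_summary(SAMPLE))
```

The path reconstruction is the fragile part: it would break for module-level test functions or nested classes, which is the kind of extra handling `summary_short.txt` avoids.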

                    node_index = artifact["node_index"]
                    url = artifact["url"]
                    r = requests.get(url, headers={"Circle-Token": os.environ.get('CIRCLE_TOKEN', "")})
                    test_summary = r.text
                    job_test_summaries[node_index] = test_summary

            summary = {}
            for node_index, node_test_summary in job_test_summaries.items():
                for line in node_test_summary.splitlines():
                    if line.startswith("PASSED "):
                        test = line[len("PASSED "):]
                        summary[test] = "passed"
                    elif line.startswith("FAILED "):
                        test = line[len("FAILED "):].split()[0]
                        summary[test] = "failed"
            # failed before passed
            summary = {k: v for k, v in sorted(summary.items(), key=lambda x: (x[1], x[0]))}
            workflow_summary[job["name"]] = summary

            # collected version
            with open(f'outputs/{job["name"]}/test_summary.json', "w") as fp:
                json.dump(summary, fp, indent=4)

    new_workflow_summary = {}
    for job_name, job_summary in workflow_summary.items():
        for test, status in job_summary.items():
            if test not in new_workflow_summary:
                new_workflow_summary[test] = {}
            new_workflow_summary[test][job_name] = status

    for test, result in new_workflow_summary.items():
        new_workflow_summary[test] = {k: v for k, v in sorted(result.items())}
    new_workflow_summary = {k: v for k, v in sorted(new_workflow_summary.items())}

    with open(f'outputs/test_summary.json', "w") as fp:
        json.dump(new_workflow_summary, fp, indent=4)
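
As a small worked example of what the script produces, the sketch below applies the same two steps (parse `summary_short.txt` lines, then pivot from a per-job view to a per-test view) to made-up data. The helper names `parse_summary` and `pivot`, the job names, and the sample lines are illustrative only and do not appear in the PR.

```python
def parse_summary(text: str) -> dict:
    """Turn summary_short.txt content into {test_id: status}, failures first."""
    summary = {}
    for line in text.splitlines():
        if line.startswith("PASSED "):
            summary[line[len("PASSED "):]] = "passed"
        elif line.startswith("FAILED "):
            # FAILED lines carry a trailing " - <reason>"; keep only the test ID.
            summary[line[len("FAILED "):].split()[0]] = "failed"
    # "failed" sorts before "passed", so failures come first.
    return dict(sorted(summary.items(), key=lambda kv: (kv[1], kv[0])))


def pivot(workflow_summary: dict) -> dict:
    """Regroup {job: {test: status}} into {test: {job: status}}, sorted by key."""
    per_test = {}
    for job_name, job_summary in workflow_summary.items():
        for test, status in job_summary.items():
            per_test.setdefault(test, {})[job_name] = status
    return {test: dict(sorted(jobs.items())) for test, jobs in sorted(per_test.items())}


# Made-up report contents for two hypothetical jobs.
torch = parse_summary(
    "PASSED tests/a.py::T::test_x\n"
    "FAILED tests/a.py::T::test_y - AssertionError: boom"
)
tf = parse_summary("PASSED tests/a.py::T::test_y")

print(list(torch))  # the failed test is listed before the passed one
print(pivot({"tests_torch": torch, "tests_tf": tf}))
```

The per-test view makes it easy to spot a test that fails in one job but passes in another, here `test_y`.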