
Some Issues with guiact Data #2

Open

numbmelon opened this issue Jul 22, 2024 · 2 comments

Comments

@numbmelon
Thank you for your great work, this has been particularly helpful for me!
However, I have encountered some issues while using the guiact dataset. These may affect both the training and evaluation processes.

For example, in smartphone_train_data.json, I noticed that the tasks starting with uid_episode_332227296314252139_step_* contain an unusually high number of actions (over 100). These actions include an excessive number of swipe and tap operations, which seems abnormal.
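(For reference, a minimal sketch of how such episodes could be detected; the assumption here is that each record in the JSON carries a `uid` field of the form `uid_episode_<id>_step_<n>` — the field name and helper are mine, not the repo's:)

```python
import re
from collections import Counter

def flag_long_episodes(records, max_steps=100):
    """Group step uids by episode id and return episodes whose
    step count exceeds max_steps (assumed record schema)."""
    counts = Counter()
    for rec in records:
        m = re.match(r"uid_episode_(\d+)_step_", rec["uid"])
        if m:
            counts[m.group(1)] += 1
    return {ep: n for ep, n in counts.items() if n > max_steps}
```

Running this over the training file would surface episodes like the one above without manual inspection.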

Also, when evaluating the web-single test set, I checked the error log and found entries like the following:

{
    "uid": "uid_image_4b80efeb-599c-489e-b885-f44206f6fada_qa_01",
    "question": "Sort reviews by most relevant",
    "pred": [
        ...
    ],
    "label": [
        {
            "element": "<box>884, 749, 915, 766</box>",
            "name": "click",
            "element_id": "45"
        },
        {
            "element": "<box>940, 749, 1029, 766</box>",
            "text": "Most relevant",
            "name": "select",
            "element_id": "46"
        }
    ]
}

The bounding boxes marked as answers look like this:
[screenshot attachment: 3_with_bboxes]
As you can see, this does not seem reasonable. Perhaps a single select operation on element 46 could complete the task?

Could these issues be addressed in future releases, perhaps through manual review or by using scripts to filter out problematic data? Your attention to this matter would be greatly appreciated.
Thank you again for your hard work and for providing such a valuable dataset.

@yiye3
Collaborator

yiye3 commented Jul 23, 2024

Thank you for your attention and suggestions.
We have already done some checking of our guiact data:

  • web-multi is annotated by crowdsourcing, and we filter out unreasonable actions with design rules after annotation.
  • web-single is annotated by GPT4-V, and the original data contains many errors. We therefore checked the web-single data by crowdsourcing, improving the accuracy from 55% to 92%.
  • The smartphone data is processed from AITW-general, and we did not expect such long action sequences in the AITW data.

Thank you again for your questions. We plan to revise our data in the next two months:

  1. Re-check the evaluation dataset manually.
  2. Filter the training dataset with design rules.
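(The design rules in step 2 are not spelled out in the thread; one plausible rule, sketched here as an assumption rather than the maintainers' actual implementation, would reject episodes whose action sequences contain long runs of identical consecutive actions — the pattern reported above:)

```python
from itertools import groupby

def passes_repeat_rule(action_names, max_run=10):
    """Hypothetical design rule: an episode fails if any action
    (e.g. "swipe") repeats more than max_run times in a row,
    which suggests a recording loop rather than a real task."""
    return all(
        sum(1 for _ in run) <= max_run
        for _, run in groupby(action_names)
    )
```

An episode like the 100+-swipe trajectory reported in this issue would fail such a check, while ordinary trajectories would pass.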

When we finish the check, we will update our data and notify you in this issue and in the "updates" section of this repository.

@korbinian-hoermann

Hi @yiye3! Are there any updates on this? (I could not find any info in the "updates" section.)

Further, I wanted to ask why steps within a trajectory are sometimes missing (e.g., uid_record_05421 from the GUIAct web-multi test set consists of steps 1 and 9 only).
