CAT vocabulary lookup not working #534

Open
vmarchman opened this issue Aug 20, 2024 · 11 comments

Comments

@vmarchman
Contributor

@HenryMehta I'm testing some CAT lookups. The Thetas that I'm getting seem really low, but they are there. However, all of the vocabulary look-ups are 0.

image

@vmarchman
Contributor Author

Old CATs run previously seem to be working just fine, as do ones for children under 30 months. The lookup appears to fail only for children 31-36 months.

@HenryMehta
Collaborator

@vmarchman The Percentile for both sexes is 0. We have no values when the percentile is less than 5% which is why the estimated vocabulary by both sexes shows 0.

There is also an error in the Est Vocab by sex. Going through the code, I found that although it gets the different Percentiles, it then bases the Est Vocab for both sexes and by sex on the Percentile for both sexes.

I have corrected this and I'm just running tests (takes 1-2 hours) to ensure it hasn't broken anything else. I'll then deploy to test and prod.

Please note, this means all the CAT Est Vocabs by sex have been based on the percentile for both sexes. This will have been right when the percentiles were the same but wrong when they differed.
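For readers following along, here is a minimal sketch of the corrected behaviour as described above: each norm group's estimated vocabulary is looked up with that group's own percentile. The table layout, the function name `est_vocab_by_group`, and all numbers are hypothetical and are not taken from the project's code.

```python
# Hypothetical tables: percentile -> estimated vocabulary, one per norm group.
# All values are invented for illustration only.
VOCAB_TABLES = {
    "both": {5: 100, 10: 150},
    "boys": {5: 90, 10: 140},
    "girls": {5: 110, 10: 160},
}


def est_vocab_by_group(percentiles):
    """Look up each group's estimated vocabulary using that group's own percentile.

    The bug described above amounts to using percentiles["both"] for every group.
    """
    return {
        group: VOCAB_TABLES[group][pct]
        for group, pct in percentiles.items()
    }


result = est_vocab_by_group({"both": 5, "boys": 10, "girls": 5})
print(result)
```

With these invented tables, the boys' estimate comes from the 10th-percentile row of the boys table rather than from the 5th-percentile row selected by the "both" percentile.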

I'll let you know when deployed

@HenryMehta
Collaborator

@vmarchman done

@vmarchman
Contributor Author

@HenryMehta Let's see if we can just spit out the lowest value in the table in these cases. We are currently using tables with only 5 percentile increments. Let's switch to tables with 1 percentile increments. Then, you can just spit out the lowest value in the table (1st percentile).

You also then don't have to do the interpolation step as you would need to do for the 5 percentile tables.
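To illustrate the difference between the two table types, here is a rough sketch. It assumes the interpolation used with the 5-percentile tables is linear, which the thread does not state, and it assumes a table is a plain mapping from percentile to vocabulary size. The function names and values are hypothetical.

```python
def vocab_from_5pct_table(table, percentile):
    """Estimate vocabulary from a 5-percentile table (assumed linear interpolation).

    Percentiles outside the table are clamped to the lowest or highest row.
    """
    points = sorted(table)
    if percentile <= points[0]:
        return table[points[0]]
    if percentile >= points[-1]:
        return table[points[-1]]
    for lo, hi in zip(points, points[1:]):
        if lo <= percentile <= hi:
            frac = (percentile - lo) / (hi - lo)
            return table[lo] + frac * (table[hi] - table[lo])


def vocab_from_1pct_table(table, percentile):
    """Direct lookup in a 1-percentile table; anything below 1 uses the 1st-percentile row."""
    return table[max(1, int(percentile))]


# Invented example tables.
table5 = {5: 100, 10: 200}
table1 = {p: p * 10 for p in range(1, 100)}

print(vocab_from_5pct_table(table5, 7))
print(vocab_from_1pct_table(table1, 0.4))
```

With a row for every percentile, the second function needs no interpolation, and the lowest row doubles as the fallback for very low percentiles.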

HenryMehta reopened this Aug 21, 2024
vmarchman reopened this Jan 27, 2025

@vmarchman
Contributor Author

vmarchman commented Jan 27, 2025

Hi @HenryMehta I think we never implemented the 1 percentile tables for estimating vocabulary size based on the theta percentile. If the Theta percentile is <1 (it should never be 0), then the estimated vocabulary for that child should be the 1 percentile value. Here are the 1 percentile tables for English WG and WS. Let me know if you have any questions.

WGprod_both_1.csv
WGprod_boys_1.csv
WGprod_girls_1.csv
WSprod_both_36months_1.csv
WSprod_boys_36_1.csv
WSprod_girls_36months_1.csv

@HenryMehta
Collaborator

@vmarchman

I have done WG and it is available to test. I have assumed the percentages given are for words understood. I have not amended the other benchmark categories (Words Produced, Total Gestures, Phrases, Later Gestures, Early Gestures).

@HenryMehta
Collaborator

@vmarchman
I have started working on the WS, but the boys file only goes to age 30 months and the girls file only has 5% increments.

Also, I think you need to understand that this will not stop the 0 issue. For a WS at age 36 months (raw score rather than sex-specific), a child with fewer than 162 words indicated will show as 0%. In order to show 1% in this case, the value in that cell would need to be 0.
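A small sketch of why a score below the lowest cell comes out as 0%, assuming the raw-score percentile is the highest percentile whose cell value the score reaches. That lookup rule, the function name, and the table layout are assumptions. Only the 162 cutoff comes from the comment above; the 180 is invented.

```python
def raw_score_percentile(score, column):
    """Return the highest percentile whose cutoff the score reaches, or 0 if none.

    column: percentile -> minimum raw score for one age (hypothetical layout).
    """
    reached = [pct for pct, cutoff in column.items() if score >= cutoff]
    return max(reached, default=0)


# 36-month column as described: the 1st-percentile cell is 162 words.
column_36m = {1: 162, 2: 180}
print(raw_score_percentile(150, column_36m))

# If the 1st-percentile cell were 0 instead, the same score would reach it.
column_36m_zero = {1: 0, 2: 180}
print(raw_score_percentile(150, column_36m_zero))
```

Under this assumed rule, a score of 150 matches no row in the first table and falls through to 0, while it matches the 1st-percentile row once that cell is 0.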

@vmarchman
Contributor Author

Hi @HenryMehta These are only for computing the estimated vocabulary for the English CAT, not the English WG or WS.

I will double check the numbers and files and get back to you.

@HenryMehta
Collaborator

@vmarchman Blast, that means I've edited the wrong files. Fortunately I kept the originals.

@vmarchman
Contributor Author

Sorry about that @HenryMehta !!

You were right that some of the files weren't right. Here they all are again; all should be in 1 percentile increments and go to 36 months for WS. The WG is only prod, from 8 to 18 months. These should be used to compute the estimated vocabulary scores based on the CAT Theta percentiles. Because they are at the 1-percentile level, you won't need to interpolate any scores (like we had to do with the 5 percentile-level tables).

WGprod_boys_1.csv
WGprod_girls_1.csv
WGprod_both_1.csv
WSprod_boys_36_1.csv
WSprod_both_36months_1.csv
WSprod_girls_36months_1.csv

I don't understand your comment re the 0 issue: Let me try to say what I want to happen for the CAT scoring outputs, and then maybe you can help me understand better.

There are two things happening: (1) spitting out a percentile value for the Theta based on the CAT look-up tables. If the actual theta is below the lowest value in the CAT theta tables, the percentile value that is output should be "< 1". There should never be a 0th percentile score output in any situation.

(2) providing an estimated vocabulary score for that "< 1" percentile based on the 1-percentile level vocabulary tables. If a particular child's theta percentile is "< 1", then the system should output the number from the 1 percentile vocabulary tables that represents the lowest value in the table, i.e., the 1st percentile value, for their estimated vocabulary score.
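The two steps above could be sketched roughly as follows. The layout of the theta table (an ascending list of cutoff/percentile pairs), the function names, and all numbers are assumptions made for illustration; they do not describe the actual scoring code.

```python
import bisect


def theta_percentile(theta, thresholds):
    """Return the percentile of the highest cutoff that theta reaches.

    thresholds: ascending list of (theta_cutoff, percentile) pairs (hypothetical).
    Returns None when theta is below the lowest cutoff in the table.
    """
    cutoffs = [cutoff for cutoff, _ in thresholds]
    idx = bisect.bisect_right(cutoffs, theta)
    return None if idx == 0 else thresholds[idx - 1][1]


def score_cat(theta, thresholds, vocab_table):
    """Return (percentile label, estimated vocabulary) for one child.

    vocab_table: percentile -> vocabulary size from a 1-percentile table.
    """
    pct = theta_percentile(theta, thresholds)
    if pct is None:
        # Step (1): never report 0; step (2): use the lowest row of the table.
        return "< 1", vocab_table[min(vocab_table)]
    return str(pct), vocab_table[pct]


# Invented example data.
thresholds = [(-2.0, 5), (-1.0, 10)]
vocab_table = {1: 12, 5: 40, 10: 75}

print(score_cat(-3.0, thresholds, vocab_table))
print(score_cat(-1.5, thresholds, vocab_table))
```

Returning the label as a string keeps "< 1" distinct from any numeric percentile, so a 0 can never be output by this path.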

@HenryMehta
Collaborator

@vmarchman OK, I've got the raw scores in the right place, but I do not have estimated thetas in single percent intervals. I only have 5% intervals, so that remains what you'll get.
