Check outliers on the full dataset #47

pro100olga · 2020-03-28T21:17:43Z

@dchaplinsky pls share the algorithm of defining outliers you used

dchaplinsky · 2020-03-29T23:37:07Z

f167721#diff-0a7976e2cc1844ff00834f1bede8d856

dchaplinsky · 2020-05-15T12:04:19Z

@pro100olga can you look into it?

pro100olga · 2020-05-15T20:29:52Z

Checked on the 88K dataset as of 23/04/20.

Excluding people from the outliers list

Change the logic

Currently the exclusion is based on id:

    # True for declarations whose id is NOT in this list (these people are excluded from the outliers)
    excl = (~df['id'].isin([
        'nacp_08a63d8b-2db4-4ef0-8b8b-396e0cd9f495',
        'nacp_7762d918-fe93-4285-8703-7fbe18312634',
        'nacp_50a32d11-ebfa-4466-9bde-2f049cb00574']))

This should be changed to use user_declarant_id instead.
Namely, exclude declarations where user_declarant_id is in [54382, 728990].
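A minimal sketch of the same mask keyed on user_declarant_id instead of the declaration id, assuming the column holds integers (if it is stored as strings, the list must hold strings too). The names EXCLUDED_DECLARANTS and not_excluded are hypothetical, not taken from the repository:

```python
import pandas as pd

# Hypothetical constant: declarants whose declarations should never be flagged as outliers
EXCLUDED_DECLARANTS = [54382, 728990]


def not_excluded(df: pd.DataFrame) -> pd.Series:
    """Boolean mask: True for rows whose user_declarant_id is not in the exclusion list."""
    return ~df["user_declarant_id"].isin(EXCLUDED_DECLARANTS)


# Usage on a toy frame: the first and last rows belong to excluded declarants
toy = pd.DataFrame({"id": ["a", "b", "c"], "user_declarant_id": [54382, 1, 728990]})
excl = not_excluded(toy)
```

Keying on the person rather than the declaration means every declaration a given declarant files, including ones added in later dumps, is excluded without updating the list.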

Exclude more people:

Also exclude from the outliers the declarations with the following user_declarant_id values:
90684, 552845, 675626, 1084920, 1108047
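A sketch of applying the full exclusion list to an already computed set of outliers, assuming the outlier step yields a DataFrame of flagged declarations that includes a user_declarant_id column. The function name drop_excluded_outliers and the constant names are hypothetical:

```python
import pandas as pd

# Declarants from the original exclusion, expressed as user_declarant_id
BASE_EXCLUDED = [54382, 728990]
# Additional declarants to keep out of the outliers list
EXTRA_EXCLUDED = [90684, 552845, 675626, 1084920, 1108047]
EXCLUDED_DECLARANTS = set(BASE_EXCLUDED + EXTRA_EXCLUDED)


def drop_excluded_outliers(outliers: pd.DataFrame) -> pd.DataFrame:
    """Return the flagged rows minus those filed by excluded declarants."""
    mask = ~outliers["user_declarant_id"].isin(EXCLUDED_DECLARANTS)
    return outliers[mask]


# Usage on a toy frame: only the declaration by declarant 42 survives
flagged = pd.DataFrame({"id": ["x", "y", "z"], "user_declarant_id": [90684, 42, 54382]})
kept = drop_excluded_outliers(flagged)
```

Keeping the two lists separate makes it easy to see which exclusions came from the original logic and which were added after the check on the full dataset.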
