Streamline cops filter #2

Open
kbmorales opened this issue Aug 6, 2020 · 5 comments · Fixed by #3
Labels: enhancement (New feature or request), help wanted (Extra attention is needed)

Comments

@kbmorales
Collaborator

The cops filter currently uses string detection, but it could use something more concrete, like officer ID.
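
To make the contrast concrete, here is a rough sketch of the two approaches. The field names (`name`, `connection`, `officer_id`), the sample rows, and the regex are all made up for illustration; they are not the project's actual schema or filter.

```python
import re

# Hypothetical related-person rows; field names and values are assumptions.
RELATED_PERSONS = [
    {"name": "OFFICER SMITH, J", "connection": "POLICE OFFICER", "officer_id": "A123"},
    {"name": "Smith, John Ofc.", "connection": "WITNESS/POLICE", "officer_id": "A123"},
    {"name": "JONES, PAT", "connection": "WITNESS", "officer_id": "B456"},
    {"name": "DOE, JANE", "connection": "DEFENDANT", "officer_id": None},
]

# Illustrative pattern only: whole-word police-related keywords.
COP_PATTERN = re.compile(r"\b(OFFICER|OFC|POLICE)\b", re.IGNORECASE)


def filter_by_string(rows):
    """Keep rows whose name or connection text looks police-related."""
    return [
        r for r in rows
        if COP_PATTERN.search(r["name"]) or COP_PATTERN.search(r["connection"])
    ]


def filter_by_id(rows):
    """Keep rows that carry a non-null officer ID."""
    return [r for r in rows if r["officer_id"]]


by_string = filter_by_string(RELATED_PERSONS)
by_id = filter_by_id(RELATED_PERSONS)
```

In this toy data the ID filter picks up the `JONES, PAT` row that the string filter misses, since nothing in its text says "police". That is the kind of case an ID-based filter would help with, assuming IDs are actually populated.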

kbmorales added the enhancement and help wanted labels on Aug 6, 2020
kbmorales linked a pull request on Aug 12, 2020 that will close this issue
@kbmorales
Collaborator Author

@camille-s has done some work on this!

@kbmorales
Collaborator Author

Note: a common misspelling of Hankard is Hanford.

@camille-s

camille-s commented Aug 13, 2020

Second pass to improve upon #3:

  • More string cleanup
  • Pull from dsk8 related persons, not just dscr: probably roughly the same cops in both, but might as well be more exhaustive. Ooof, officer IDs in dsk8 are all null.
  • Incorporate manual running list of corrections (e.g. Hankard)
  • Concatenate names that look like maiden name --> married name: same ID & first name, 2 last names used
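
A minimal sketch of how the cleanup, manual corrections, and last-name concatenation steps above might fit together. This is not the script from #3: the function names, the rank prefixes being stripped, the hyphen used to join last names, and the sample records are all assumptions. Only the Hanford to Hankard correction comes from this thread.

```python
import re
from collections import defaultdict

# Running list of manual corrections (misspelling -> correct last name).
# Only this entry comes from the thread; extend by hand as needed.
CORRECTIONS = {"HANFORD": "HANKARD"}

# Rank prefixes to strip; this list is an assumption for illustration.
RANKS = re.compile(r"\b(OFFICER|OFC|DET|SGT)\b")


def clean_name(raw):
    """Uppercase, drop punctuation and rank prefixes, return (last, first)."""
    text = re.sub(r"[^A-Z, ]", "", raw.upper())
    text = RANKS.sub("", text)
    last, _, first = text.partition(",")
    last = " ".join(last.split())
    first = " ".join(first.split())
    return CORRECTIONS.get(last, last), first


def merge_last_names(records):
    """Group by (officer ID, first name) and join the distinct last names.

    Last names are joined in order of first appearance; this sketch does
    not try to work out which one is the maiden name.
    """
    groups = defaultdict(list)
    for officer_id, raw in records:
        last, first = clean_name(raw)
        if last not in groups[(officer_id, first)]:
            groups[(officer_id, first)].append(last)
    return {key: "-".join(lasts) for key, lasts in groups.items()}


# Invented sample records: (officer ID, raw name string).
SAMPLE = [
    ("A1", "Smith, Jane"),
    ("A1", "JONES, JANE"),
    ("A1", "Ofc. Smith, Jane"),
    ("B2", "Doe, Lee"),
]
merged = merge_last_names(SAMPLE)
```

Grouping on ID plus first name keeps two different officers who happen to share an ID typo from being merged unless their first names also agree, which seems like the safer default.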

@camille-s

camille-s commented Aug 13, 2020

Starting on a second script now to clean up names from dsk8. There are no officer IDs in dsk8 (wtf), so I'm going to clean their names, then fuzzy-match them to the cleaned-up names this script generates. I'm assuming all or most officers who have been on a case in circuit court (dsk8) have also been in district court (dscr). This will let us do some of the same analysis of charges, nolles, networks, etc. on circuit court cases that we're doing on district court cases, using IDs instead of messy strings.
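
Roughly what the fuzzy-match step could look like, as a sketch only. It uses `difflib.get_close_matches` from the Python standard library as a stand-in for whatever matcher the real script ends up using; the lookup table, the names in it, and the 0.85 cutoff are all invented for illustration.

```python
from difflib import get_close_matches

# Hypothetical lookup built from dscr: cleaned name -> officer ID.
DSCR_OFFICERS = {
    "SMITH, JANE": "A1",
    "DOE, LEE": "B2",
    "NGUYEN, ALEX": "C3",
}


def match_officer_id(dsk8_name, lookup=DSCR_OFFICERS, cutoff=0.85):
    """Return the ID of the closest dscr name, or None if nothing is close.

    The cutoff is a similarity ratio between 0 and 1; 0.85 is a guess
    and would need tuning against real data.
    """
    matches = get_close_matches(dsk8_name, list(lookup), n=1, cutoff=cutoff)
    return lookup[matches[0]] if matches else None


exact = match_officer_id("SMITH, JANE")
typo = match_officer_id("SMTIH, JANE")
unknown = match_officer_id("BROWN, CHRIS")
```

Returning `None` for weak matches, rather than always taking the nearest name, leaves unmatched circuit court officers visible so they can be added to the manual corrections list instead of being silently assigned the wrong ID.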
