Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Az state cleaner #45

Merged
merged 81 commits into from
Dec 5, 2023
Merged

Az state cleaner #45

merged 81 commits into from
Dec 5, 2023

Conversation

nrposner
Copy link
Contributor

Current additions to state cleaner, mostly on the cleaning and schema-compliance parts. I expect name standardization to be a cinch once those bits are done, outstanding problems are locating state information for individuals and organizations tables and determining whether incoming tables belong to individuals or organizations. The former will just require some finagling, the latter will benefit from the Monday meeting and clarifying points about the StateCleaner flow.

utils/clean.py Outdated
"""
# initial_dataframes = self.preprocess(filepaths_list)

# should run create tables, which runs through the functions above
# to preprocess, clean, standardizes and create the following tables
# (invividuals_table, organizations_table, transactions_table)


class AzCleaner(StateCleaner):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lets enforce a standard of full state names, so ArizonaCleaner

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also for simplicity, lets place each state cleaner its own file, named after the state, so arizona.py

utils/clean.py Outdated
"""

# cleans transactions dates
self[0]["TransactionDate"] = self[0]["TransactionDate"].apply(convert_date)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

self is not index-able. It is an instance of the AzCleaner class. If you would like to access attributes of the instance on which this method is called, you would do self.attribute_name. If you would like to pass in an additional argument, add it to the method definition.

utils/clean.py Outdated
"""This class is based on the StateCleaner abstract class,
and cleans Arizona data"""

def preprocess(self, filepaths_list: list[str]) -> list[pd.DataFrame]:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

look at the docstring for this method in the base class. what happens if I pass in an individuals file, then a transactions file, then an org file? I would still be following your spec, but it would fail to meet return enforced by the base class spec. Please be more rigorous in your spec here

Comment on lines 17 to 18
def name_clean(df: pd.DataFrame) -> pd.DataFrame:
"""Replace empty 'candidate' value with 'committee_name' value
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This name and docstring are both unclear to me. Remember, cleaner is going to be general for any state. So I would expect cleaner_utils to be general for any state. Can I use this for any state?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is only for Az, will update and improve docstring

return df_working


def az_transactions_convert(df: pd.DataFrame) -> pd.DataFrame:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job prepending with az since it is in the util file. But since this is so closely related to the state cleaner itself, this should just be a method of the arizona state cleaner (and then you wouldn't need the az since it is a method of the Az class)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will transfer it

return pd.DataFrame(data=d)


def remove_nonstandard(col):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what is col and what does this return?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meant to be a column of the dataset, used on troublesome columns which tend to have a lot of nonstandard characters, and returns the cleaned column. Will update to clarify

utils/clean.py Outdated

self[3] = name_clean(self[3])

az_transactions = az_transactions_convert(self[0])
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we don't need the az prefix in here -- this is in the AzCleaner class

nrposner and others added 27 commits December 3, 2023 14:21
1 merge conflict in notebooks/README.md. Accepted all changes from
MN_abstract_class
Acctepting michigan-statecleaner changes in notebooks. Accepted both changes
in constants.py
import all state cleaners
@nrposner nrposner merged commit 74bc643 into dev Dec 5, 2023
@trevorspreadbury trevorspreadbury deleted the az_state_cleaner branch December 5, 2023 15:21
trevorspreadbury added a commit that referenced this pull request Mar 22, 2024
updated readme to contain defualt file names and have main access tho…
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants