-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Az state cleaner #45
Az state cleaner #45
Conversation
Merge branch 'dev' into w4_MN_CompleteData_EDA need to access dev
utils/clean.py
Outdated
""" | ||
# initial_dataframes = self.preprocess(filepaths_list) | ||
|
||
# should run create tables, which runs through the functions above | ||
# to preprocess, clean, standardizes and create the following tables | ||
# (invividuals_table, organizations_table, transactions_table) | ||
|
||
|
||
class AzCleaner(StateCleaner): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lets enforce a standard of full state names, so ArizonaCleaner
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also for simplicity, lets place each state cleaner its own file, named after the state, so arizona.py
utils/clean.py
Outdated
""" | ||
|
||
# cleans transactions dates | ||
self[0]["TransactionDate"] = self[0]["TransactionDate"].apply(convert_date) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
self is not index-able. It is an instance of the AzCleaner class. If you would like to access attributes of the instance on which this method is called, you would do self.attribute_name
. If you would like to pass in an additional argument, add it to the method definition.
utils/clean.py
Outdated
"""This class is based on the StateCleaner abstract class, | ||
and cleans Arizona data""" | ||
|
||
def preprocess(self, filepaths_list: list[str]) -> list[pd.DataFrame]: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
look at the docstring for this method in the base class. what happens if I pass in an individuals file, then a transactions file, then an org file? I would still be following your spec, but it would fail to meet return enforced by the base class spec. Please be more rigorous in your spec here
utils/cleaner_utils.py
Outdated
def name_clean(df: pd.DataFrame) -> pd.DataFrame: | ||
"""Replace empty 'candidate' value with 'committee_name' value |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This name and docstring are both unclear to me. Remember, cleaner is going to be general for any state. So I would expect cleaner_utils to be general for any state. Can I use this for any state?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is only for Az, will update and improve docstring
return df_working | ||
|
||
|
||
def az_transactions_convert(df: pd.DataFrame) -> pd.DataFrame: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good job prepending with az
since it is in the util file. But since this is so closely related to the state cleaner itself, this should just be a method of the arizona state cleaner (and then you wouldn't need the az
since it is a method of the Az class)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will transfer it
utils/cleaner_utils.py
Outdated
return pd.DataFrame(data=d) | ||
|
||
|
||
def remove_nonstandard(col): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is col and what does this return?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
meant to be a column of the dataset, used on troublesome columns which tend to have a lot of nonstandard characters, and returns the cleaned column. Will update to clarify
utils/clean.py
Outdated
|
||
self[3] = name_clean(self[3]) | ||
|
||
az_transactions = az_transactions_convert(self[0]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't need the az prefix in here -- this is in the AzCleaner class
…-fall-clinic-climate-cabinet into az_state_cleaner merging after getting rid of conflicts with dev cleaner
…into MN_abstract_class
1 merge conflict in notebooks/README.md. Accepted all changes from MN_abstract_class
Acctepting michigan-statecleaner changes in notebooks. Accepted both changes in constants.py
import all state cleaners
Mn abstract class
updated readme to contain defualt file names and have main access tho…
Current additions to state cleaner, mostly on the cleaning and schema-compliance parts. I expect name standardization to be a cinch once those bits are done, outstanding problems are locating state information for individuals and organizations tables and determining whether incoming tables belong to individuals or organizations. The former will just require some finagling, the latter will benefit from the Monday meeting and clarifying points about the StateCleaner flow.