Ease the adding of new data #93
@konstantinstadler I wrote a wrapper around the `CountryConverter.convert` method:

```python
import country_converter as coco
import pandas as pd
from numpy import nan


def convert_id(
    series: pd.Series,
    from_type: str = "regex",
    to_type: str = "ISO3",
    not_found: str | None = None,
    *,
    additional_mapping: dict | None = None,
) -> pd.Series:
    """Take a pandas Series with country IDs and convert them to the desired type.

    Args:
        series: the pandas Series to convert.
        from_type: the classification type in which the series is encoded.
            Available types come from the country_converter package
            (https://github.com/konstantinstadler/country_converter#classification-schemes),
            for example ISO3, ISO2, name_short, DACcode.
        to_type: the target classification type. Same options as from_type.
        not_found: what to do if a value is not found. Can be a string or None.
            If None, the original value is passed through.
        additional_mapping: optionally, a dictionary with additional mappings.
            The keys are the values to be converted and the values are the
            converted values. The keys must have the same datatype as the
            original values; the values must have the datatype of the target type.
    """
    # If from and to are the same, return without changing anything
    if from_type == to_type:
        return series

    # Create the converter object
    cc = coco.CountryConverter()

    # Convert only the unique values. This significantly improves the
    # performance of country_converter on very long datasets.
    s_unique = series.unique()

    # Create a correspondence dictionary; unmatched values become NaN
    mapping = pd.Series(
        cc.convert(names=s_unique, src=from_type, to=to_type, not_found=nan),
        index=s_unique,
    ).to_dict()

    # If additional_mapping is passed, merge it in (its entries take precedence)
    if additional_mapping is not None:
        mapping = mapping | additional_mapping

    # Map values; fall back to the original value or to not_found
    return series.map(mapping).fillna(series if not_found is None else not_found)
```

For the main
@konstantinstadler Just bumping this in case you think it would be helpful to implement.
Amazing! (Thanks for the reminder; the first three months of 2023 were a bit crazy.) To keep it simple, we could have a separate function that takes defined "from" and "to" arguments, as in your proposal. But this could potentially also be added to `pandas_convert`. What do you think?
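One way the "separate function with defined from and to" idea could look is sketched below. This is a hypothetical design, not part of country_converter: `add_mapping`, `convert_defined` and the module-level registry are illustrative names. Users register a plain dictionary for a `(from_type, to_type)` pair, and conversion then uses only that dictionary, with no regex step.

```python
from __future__ import annotations

# Hypothetical registry of user-supplied mappings, keyed by (from_type, to_type).
_USER_MAPPINGS: dict[tuple[str, str], dict] = {}


def add_mapping(from_type: str, to_type: str, mapping: dict) -> None:
    """Register (or extend) a direct from_type -> to_type dictionary."""
    _USER_MAPPINGS.setdefault((from_type, to_type), {}).update(mapping)


def convert_defined(names, from_type: str, to_type: str, not_found=None) -> list:
    """Convert names using only the registered mapping for (from_type, to_type).

    Unmatched values pass through unchanged if not_found is None;
    otherwise they are replaced by not_found.
    """
    mapping = _USER_MAPPINGS.get((from_type, to_type))
    if mapping is None:
        raise KeyError(f"no mapping registered for {from_type} -> {to_type}")
    return [
        mapping.get(name, name if not_found is None else not_found)
        for name in names
    ]
```

For example, after `add_mapping("ISO2", "ISO3", {"DE": "DEU", "FR": "FRA"})`, calling `convert_defined(["DE", "XX"], "ISO2", "ISO3")` gives `["DEU", "XX"]`. Keeping the dictionary route separate avoids mixing exact-key lookups with the regex matching used for names.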
Currently, new data (country mappings) can only be passed as a DataFrame with the minimum fields (name_short, name_official, regex).
See for example
https://gist.github.com/konstantinstadler/a8c1a651aeda5c67c4910325b8a9b466
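To illustrate why this route is heavy for a simple code-to-code mapping, here is a pure-Python sketch: every new entry must carry a short name, an official name and a regex, even if all you want is an ISO2 to ISO3 pair. The `lookup` function is a simplified stand-in (case-insensitive `re.search` over each entry's regex) for country_converter's matching, not its actual code, and the `Atlantis` entry with its `ATL` code is made up for illustration.

```python
from __future__ import annotations

import re

# Minimum fields a new entry needs, per the issue text.
REQUIRED_FIELDS = ("name_short", "name_official", "regex")

# Hypothetical entry: the extra "ISO3" column is the classification to map to.
new_entries = [
    {
        "name_short": "Atlantis",
        "name_official": "Republic of Atlantis",
        "regex": r"atlantis",
        "ISO3": "ATL",  # made-up code, for illustration only
    },
]


def missing_fields(entry: dict) -> list[str]:
    """Return the required fields that an entry lacks."""
    return [field for field in REQUIRED_FIELDS if field not in entry]


def lookup(name: str, entries: list[dict], to: str) -> str | None:
    """Return the `to` value of the first entry whose regex matches `name`.

    Simplified stand-in for regex-based matching; returns None if nothing matches.
    """
    for entry in entries:
        if re.search(entry["regex"], name, flags=re.IGNORECASE):
            return entry.get(to)
    return None
```

A bare `{"ISO2": "AT", "ISO3": "ATL"}` pair fails `missing_fields` on all three required fields, which is exactly the friction this issue is about.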
Add functionality to, for example, pass just an ISO2-to-ISO3 mapping directly to the convert function (as a dict).
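A minimal sketch of how a dict passed to a convert function could be layered on top of the existing conversion, assuming the library's own lookup is available as a callable. `convert_with_extra` and `base_convert` are hypothetical names: `base_convert` stands in for country_converter's conversion and returns `None` for unmatched values. As in a dedupe-then-map approach, the base converter is called once on the distinct values, and user entries override its results.

```python
from __future__ import annotations

from collections.abc import Callable, Hashable, Iterable


def convert_with_extra(
    names: Iterable[Hashable],
    base_convert: Callable[[list], list],
    additional_mapping: dict | None = None,
    not_found=None,
) -> list:
    """Convert names, letting a user dictionary override the base converter.

    base_convert receives the distinct input values and must return one
    result per value, or None where it found no match.
    """
    names = list(names)
    unique = list(dict.fromkeys(names))  # order-preserving de-duplication
    mapping = {
        name: result
        for name, result in zip(unique, base_convert(unique))
        if result is not None
    }
    if additional_mapping:
        mapping = {**mapping, **additional_mapping}  # user entries take precedence

    def resolve(name):
        # Use the mapping if possible, otherwise pass through or use not_found
        if name in mapping:
            return mapping[name]
        return name if not_found is None else not_found

    return [resolve(name) for name in names]
```

Letting the user dictionary win over the base result means it can both fill gaps (codes the library does not know) and correct entries, without touching the underlying classification table.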