-
Notifications
You must be signed in to change notification settings - Fork 27
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix slow DataFrame
creation in CogniteResource.to_pandas
#1389
Conversation
Codecov Report
@@ Coverage Diff @@
## master #1389 +/- ##
==========================================
- Coverage 90.75% 90.68% -0.07%
==========================================
Files 117 117
Lines 14250 14247 -3
==========================================
- Hits 12932 12920 -12
- Misses 1318 1327 +9
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great start! 😄
cognite/client/data_classes/_base.py
Outdated
df.loc[name] = [value] | ||
|
||
data = [{"value": value} for name, value in dumped.items()] | ||
df = pd.DataFrame(data, index=dumped.keys()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a way you can feed the dumped dict
directly?
tests/tests_unit/test_base.py
Outdated
|
||
result_df = obj.to_pandas() | ||
|
||
expected_df = pd.DataFrame({"value": [1]}, index=["id"]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the point of the test was to make sure the ordering of rows stays consistent after your change, you gotta need more rows 😉
@haakonvt made altercations based on your comments, does this seem better? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!
cognite/client/data_classes/_base.py
Outdated
for name, value in dumped.items(): | ||
df.loc[name] = [value] | ||
|
||
df = pd.Series(dumped).to_frame(name="value") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Very nice! 🚀 Just a minor change:
return pd.Series(dumped).to_frame(name="value")
tests/tests_unit/test_base.py
Outdated
|
||
obj = AssetList([Asset(external_id=f"ext-{i}", name=f"name-{i}") for i in range(5)]) | ||
|
||
result_df = obj.to_pandas() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does not actually call your code, it will just call dump on each asset. I suggest you just create one test asset with quite a few of the accepted parameter:
1. external_id
2. name
3. parent_id
4. parent_external_id
5. description
6. data_set_id
7. metadata
8. source
9. labels
10. geo_location
11. id
12. created_time
13. last_updated_time
14. root_id
15. aggregates
Apart from that, the test looks great!
DataFrame
instantiation in CogniteResource.to_pandas
DataFrame
instantiation in CogniteResource.to_pandas
DataFrame
creation in CogniteResource.to_pandas
@haakonvt I hope i understood the feedback on the test correctly |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great! 🚀
7101d6f
to
ea10820
Compare
…ry df= line before return
ea10820
to
d5efe27
Compare
Description
Optimize DataFrame creation by initializing with a list of dictionaries
Checklist:
If a new method has been added it should be referenced in cognite.rst in order to generate docs based on its docstring.