Fix normed dtype #557

Open · 4 commits · base: main

rettigl
Member

@rettigl rettigl commented Jan 26, 2025

Sets the dtype of the normalized data to that of the unnormalized data.
Currently, the normalized data takes the dtype of the normalization histogram instead.
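
A minimal sketch of the dtype promotion behind this change, using plain NumPy arrays as hypothetical stand-ins for the binned data and the normalization histogram. The array names and values are illustrative and not taken from the sed code, and casting with `astype` is only one possible way to restore the dtype:

```python
import numpy as np

# Hypothetical stand-ins: binned data stored as float32,
# normalization histogram computed as float64.
binned = np.array([10.0, 20.0, 30.0], dtype=np.float32)
norm_hist = np.array([2.0, 4.0, 5.0], dtype=np.float64)

# NumPy type promotion: float32 / float64 yields float64,
# so the result silently picks up the dtype of the histogram.
normalized = binned / norm_hist

# Cast back so the normalized data keeps the dtype of the unnormalized data.
normalized_fixed = normalized.astype(binned.dtype)

print(normalized.dtype, normalized_fixed.dtype)
```

Casting after the division keeps the intermediate computation at the higher precision and changes only the stored result.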

@rettigl rettigl changed the base branch from main to v1_feature_branch January 26, 2025 20:08
@coveralls
Collaborator

coveralls commented Jan 26, 2025

Pull Request Test Coverage Report for Build 13205728880

Details

  • 7 of 9 (77.78%) changed or added relevant lines in 3 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.003%) to 92.177%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
| --- | --- | --- | --- |
| src/sed/core/processor.py | 5 | 7 | 71.43% |

| Totals | Coverage Status |
| --- | --- |
| Change from base Build 13167417292 | 0.003% |
| Covered Lines | 7706 |
| Relevant Lines | 8360 |

💛 - Coveralls

@rettigl rettigl requested a review from zain-sohail February 3, 2025 21:12
@rettigl rettigl changed the base branch from v1_feature_branch to main February 5, 2025 21:57
    )
else:
    self._normalization_histogram = normalization_histogram_from_timed_dataframe(
        self._timed_dataframe,
        axis,
        self._binned.coords[axis].values,
        self._config["dataframe"]["timed_dataframe_unit_time"],
        hist_mode=self.config["binning"]["hist_mode"],
Member

This seems repeated. It can probably be moved out of the loop.

Member Author

I have now restructured this to repeat less code.
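
One general way to avoid this kind of repetition is to resolve whatever differs between the branches first and then make a single call. The sketch below is hypothetical: `histogram_stub`, `get_normalization_histogram`, and their parameters are illustrative names, not the actual sed functions or the structure chosen in this PR.

```python
def histogram_stub(frame, axis, bin_centers, time_unit):
    """Hypothetical stand-in for a histogram function; just echoes its inputs."""
    return {
        "rows": len(frame),
        "axis": axis,
        "bins": len(bin_centers),
        "time_unit": time_unit,
    }


def get_normalization_histogram(timed_frame, axis, bin_centers, time_unit, partitions=None):
    # Resolve the only argument that differs between the branches first ...
    if partitions is None:
        frame = timed_frame
    else:
        frame = [timed_frame[i] for i in partitions]
    # ... then call the histogram function once instead of once per branch.
    return histogram_stub(frame, axis, bin_centers, time_unit)


full = get_normalization_histogram([1, 2, 3, 4], "delay", [0.0, 1.0], 0.001)
subset = get_normalization_histogram([1, 2, 3, 4], "delay", [0.0, 1.0], 0.001, partitions=[0, 2])
print(full["rows"], subset["rows"])
```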


    Returns:
        xr.DataArray: Calculated normalization histogram.
    """
    bins = df[axis].map_partitions(
Member

Is this removed because of the updated dask version?

Member Author

I am now using our optimized binning for the timed dataframe. It is somewhat faster, performs the sequential binning using the num_cores parameter, and shows the progress bar. The previous solution used pandas cut to define the bins, which requires bin edges rather than the bin centers that our function takes.
I once checked that the two produce very similar results (there was a very tiny difference, I think because the bin edges are assigned differently to either the left or the right bin).
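
To illustrate the two points above, here is a small pure-Python sketch. It converts uniformly spaced bin centers to bin edges and shows how a value lying exactly on an edge lands in different bins depending on which side of the interval is closed. `centers_to_edges` and `bin_index` are hypothetical helpers written for this illustration. The right-closed case mirrors the default of pandas cut (`right=True`); which convention the optimized binning follows is not stated here, so the left-closed case is only an assumption for comparison.

```python
from bisect import bisect_left, bisect_right


def centers_to_edges(centers):
    """Bin edges for uniformly spaced bin centers (assumes at least two centers)."""
    step = centers[1] - centers[0]
    return [centers[0] - step / 2 + i * step for i in range(len(centers) + 1)]


def bin_index(value, edges, right_closed):
    """Index of the bin containing value, or None if it falls outside all bins.

    right_closed=True uses (a, b] intervals, right_closed=False uses [a, b).
    """
    if right_closed:
        idx = bisect_left(edges, value) - 1
    else:
        idx = bisect_right(edges, value) - 1
    return idx if 0 <= idx < len(edges) - 1 else None


centers = [0.0, 1.0, 2.0]
edges = centers_to_edges(centers)  # [-0.5, 0.5, 1.5, 2.5]

# A value exactly on an inner edge changes bins with the convention ...
on_edge_right_closed = bin_index(0.5, edges, right_closed=True)   # bin 0
on_edge_left_closed = bin_index(0.5, edges, right_closed=False)   # bin 1

# ... while a value inside a bin is unaffected.
inside_right_closed = bin_index(1.0, edges, right_closed=True)    # bin 1
inside_left_closed = bin_index(1.0, edges, right_closed=False)    # bin 1
```

Only events that fall exactly on an edge are affected, which is consistent with the difference between the two methods being very small.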
