Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor libcudf indexalator to typed normalator #14043

Merged
merged 13 commits into from
Sep 22, 2023

Conversation

davidwendt
Copy link
Contributor

@davidwendt davidwendt commented Sep 6, 2023

Description

Creates generic normalizing-iterator for integer types for use by the indexalator and the future offsets normalizing iterator.
Mostly code has been moved around or renamed so the normalizing-iterator part can take type template parameter to identify which integer type to normalize to. For the indexalator, this type is cudf::size_type and for the offsets iterator this type would be int64.

Checklist

  • I am familiar with the Contributing Guidelines.
  • New or existing tests cover these changes.
  • The documentation is up to date with these changes.

@davidwendt davidwendt added 2 - In Progress Currently a work in progress libcudf Affects libcudf (C++/CUDA) code. improvement Improvement / enhancement to an existing function non-breaking Non-breaking change labels Sep 6, 2023
@davidwendt davidwendt self-assigned this Sep 6, 2023
@@ -32,193 +31,6 @@
namespace cudf {
namespace detail {

/**
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code moved/refactored to normalizing_iterator.cuh

@@ -0,0 +1,362 @@
/*
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This code moved/refactored from indexalator.cuh

@davidwendt davidwendt added 3 - Ready for Review Ready for review by team and removed 2 - In Progress Currently a work in progress labels Sep 7, 2023
@davidwendt davidwendt marked this pull request as ready for review September 7, 2023 16:05
@davidwendt davidwendt requested a review from a team as a code owner September 7, 2023 16:05
@davidwendt davidwendt changed the title Refactor libcudf indexalator to typed normalator Refactor libcudf indexalator to typed normalator Sep 8, 2023
cpp/include/cudf/detail/indexalator.cuh Outdated Show resolved Hide resolved
cpp/include/cudf/detail/normalizing_iterator.cuh Outdated Show resolved Hide resolved
struct input_normalator : base_normalator<input_normalator<Integer>, Integer> {
friend struct base_normalator<input_normalator<Integer>, Integer>; // for CRTP

using reference = Integer const; // this keeps STL and thrust happy
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can the reference member type be defined on the base_normalator instead?

I'm not sure if I understand this comment. What keeps STL/Thrust happy, and what is the alternative?

Copy link
Contributor Author

@davidwendt davidwendt Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just that it has to be here (as like a tag) even though we don't actually use it anywhere in the class.
It is not needed (and may not work) for the output_normalizer so I think it is more correct to put it here.

cpp/include/cudf/detail/normalizing_iterator.cuh Outdated Show resolved Hide resolved
@davidwendt davidwendt requested a review from bdice September 18, 2023 15:01
Copy link
Contributor

@bdice bdice left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for helping me think through my previous review comments. This looks good to me.

* @brief Indirection operator returns this iterator instance in order
* to capture the `operator=(size_type)` calls.
*/
__device__ inline output_normalator const& operator*() const { return *this; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm fairly certain I haven't understood this part. By returning *this, then for any output_normalator p,
p == *p == **p?

Copy link
Contributor

@mythrocks mythrocks Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would have thought we'd have to support semantics like:

output_normalator p = ...;
*p     = int16_t{5};
*(p+1) = int32_t{10};
p[2]   = int64_t{15};

I haven't understood how current implementation for operator*() and operator[]() work.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot return a reference to the underlying array element which is what you would normally see.
From what I remember, this is a way to capture that request *itr = value and redirect it to the operator=(Integer) function below. It's a subtle thing for sure.

Copy link
Contributor Author

@davidwendt davidwendt Sep 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The output iterator expects a specific type but can store it as the underlying type.
The operator[]() works normally in that it handles p[2] = value.
I've tried to explain the operator*() and operator= relationship in a few separate comments now.
These two handle the *p = value and *(p+1) = value from your example.
Though only one type Integer would usually be accepted on the right-hand side (discounting automatic casting).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, maybe I see it now?

*p = 5; is evaluated as:

  1. output_normalator::operator *(), which returns p.
  2. p = 5 calls output_normalator::operator =(Integer).

Do I have this right?

Were I to nitpick, I would have preferred it if *p returned an integer_reference object, and integer_reference::operator=() did the assignment. As it stands, doesn't the user run the risk of assigning integers to the normalator directly, instead of to *normalator?

That said, yours has the benefit of brevity.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reference idea sounds brilliant to me! I think that’s much clearer in intent.

Copy link
Contributor Author

@davidwendt davidwendt Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there is more to it than what you have described.

*p = 5; is evaluated as:

  1. output_normalator::operator*(), which returns reference to p.
  2. &p = 5 calls output_normalator::operator =(Integer).

I'm not seeing what the integer_reference object would be defined to be.
The operator must resolve to a member function on the output_normalizer object so we can intercept the assignment and place it correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this discussion is getting beyond the intention of the PR.
This PR did not introduce any of iterator mechanisms discussed here but is only moving (and renaming) the code around so it can be reused for future iterators.
I'd be happy to open an issue or separate PR to explore alternate implementations for this.

Copy link
Contributor

@bdice bdice Sep 20, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, let's do this in a later PR. I'd propose the following. If you agree, opening an issue for this would be good.

  • output_normalator::reference is a member type defined as a class integer_reference
  • integer_reference requirements:
    • holds some state to know what it's referencing (perhaps the output normalator "begin" iterator and the index which it references?)
    • defines integer_reference::operator=(Integer) which normalizes and writes to the referenced location
    • possibly defines other operators that modify the value in-place (as-if by reference)
  • output_normalator::operator* is changed to return an integer_reference -- this aligns with various STL container iterator designs, where the dereference operator returns a value of the member type reference
  • output_normalator::operator=(Integer) is no longer defined -- because assignment of a value to a (non-dereferenced) iterator is an unclear design that we can avoid by using integer_reference

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

let's do this in a later PR

Yes, sir. That works.

Copy link
Contributor

@mythrocks mythrocks left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Barring a minor suggestion regarding how to deal with output_normalator assignments, this looks good to ship. Thanks, David!

@mythrocks
Copy link
Contributor

Sorry if I muddied the waters here with my last comment. Let's not hold this PR up for the integer_reference suggestion. We can tackle that later.

@davidwendt
Copy link
Contributor Author

/merge

@rapids-bot rapids-bot bot merged commit c7dd6b4 into rapidsai:branch-23.10 Sep 22, 2023
54 checks passed
@davidwendt davidwendt deleted the normalizing-iterator branch September 22, 2023 18:15
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
3 - Ready for Review Ready for review by team improvement Improvement / enhancement to an existing function libcudf Affects libcudf (C++/CUDA) code. non-breaking Non-breaking change
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants