Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support interfaces for Union logical type in DuckDB::LogicalType #893

Merged
merged 4 commits into from
Feb 28, 2025

Conversation

otegami
Copy link
Contributor

@otegami otegami commented Feb 24, 2025

refs: GH-690

In this PR, we support interfaces for Union logical type in DuckDB::Column#logical_type about the following.

This is one of the steps for supporting duckdb_logical_column_type C API.

Summary by CodeRabbit

  • New Features
    • Introduced enhanced union-type support, allowing users to retrieve union member counts, names, and types via new interface methods.
    • Expanded SQL schema capabilities with the addition of a union column, enabling more flexible, robust data modeling.
  • Tests
    • Added unit tests for verifying union member count, names, and types to ensure expected behavior of the new union type functionality.

Copy link

coderabbitai bot commented Feb 24, 2025

Walkthrough

This pull request introduces the DuckDB::LogicalType class with three new public methods: member_count, member_name_at, and member_type_at. The implementations are provided in the C extension (ext/duckdb/logical_type.c) and include error handling for cases where union member information is unavailable. The CHANGELOG.md has been updated to document these additions, and the unit tests in test/duckdb_test/logical_type_test.rb now include a new union type column alongside tests verifying the functionality of the new methods.

Changes

File Changes Summary
CHANGELOG.md Added entry for the new DuckDB::LogicalType class and its methods: member_count, member_name_at, member_type_at; previous entry regarding dropping duckdb v1.0.0 remains unchanged.
ext/duckdb/…/logical_type.c Implemented three methods (duckdb_logical_type_member_count, duckdb_logical_type_member_name_at, duckdb_logical_type_member_type_at) for union type handling with error checks and Ruby method registration.
test/duckdb_test/…/logical_type_test.rb Extended the SQL table schema to include a union_col with a union type and added three test methods (test_union_member_count, test_union_member_name_at, test_union_member_type_at) to validate functionality.

Possibly related PRs

Suggested reviewers

  • suketa

Poem

Hopping through lines with a curious gleam,
I crafted union types like a wonderful dream.
Each new method a hop in the code’s vast land,
Logical leaps built strong by my gentle hand.
Carrots and code, a fun blend under moonlight,
My rabbit heart celebrates change with pure delight! 🐇

✨ Finishing Touches
  • 📝 Generate Docstrings (Beta)

Thank you for using CodeRabbit. We offer it for free to the OSS community and would appreciate your support in helping us grow. If you find it useful, would you consider giving us a shout-out on your favorite social media?

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Generate unit testing code for this file.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query. Examples:
    • @coderabbitai generate unit testing code for this file.
    • @coderabbitai modularize this function.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read src/utils.ts and generate unit testing code.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.
    • @coderabbitai help me debug CodeRabbit configuration file.

Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments.

CodeRabbit Commands (Invoked using PR comments)

  • @coderabbitai pause to pause the reviews on a PR.
  • @coderabbitai resume to resume the paused reviews.
  • @coderabbitai review to trigger an incremental review. This is useful when automatic reviews are disabled for the repository.
  • @coderabbitai full review to do a full review from scratch and review all the files again.
  • @coderabbitai summary to regenerate the summary of the PR.
  • @coderabbitai generate docstrings to generate docstrings for this PR. (Beta)
  • @coderabbitai resolve resolve all the CodeRabbit review comments.
  • @coderabbitai configuration to show the current CodeRabbit configuration for the repository.
  • @coderabbitai help to get help.

Other keywords and placeholders

  • Add @coderabbitai ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

CodeRabbit Configuration File (.coderabbit.yaml)

  • You can programmatically configure CodeRabbit by adding a .coderabbit.yaml file to the root of your repository.
  • Please see the configuration documentation for more information.
  • If your editor has YAML language server enabled, you can add the path at the top of this file to enable auto-completion and validation: # yaml-language-server: $schema=https://coderabbit.ai/integrations/schema.v2.json

Documentation and Community

  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

assert_equal(2, union_column.logical_type.member_count)
end

def test_union_member_name_at
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am wondering if it might be helpful to provide an iterator-like method for union members. This could make it more convenient for users to iterate through member names. If you think this would be a good addition, I'm planning to address it in a follow-up PR.

Here's an example of how it could look:

diff --git a/lib/duckdb/logical_type.rb b/lib/duckdb/logical_type.rb
index fb38b67..bc77b36 100644
--- a/lib/duckdb/logical_type.rb
+++ b/lib/duckdb/logical_type.rb
@@ -18,5 +18,31 @@ def type
       type_id = _type
       DuckDB::Converter::IntToSym.type_to_sym(type_id)
     end
+
+    # Iterates over each union member name associated with a Union column.
+    #
+    # When a block is provided, this method yields each union member name in order.
+    # It also returns the total number of members yielded.
+    #
+    #   union_logical_type.each_member_name do |name|
+    #     puts "Union member: #{name}"
+    #   end
+    #
+    # If no block is given, an Enumerator is returned, which can be used to
+    # retrieve all member names.
+    #
+    #   names = union_logical_type.each_member_name.to_a
+    #   # => ["member1", "member2"]
+    #
+    # @return [Enumerator, Integer]
+    #   Returns an Enumerator if no block is provided.
+    #   Returns an Integer (the member count) when a block is provided.
+    def each_member_name
+      return to_enum(__method__) {member_count} unless block_given?
+
+      member_count.times do |member_idx|
+        yield member_name_at(member_idx)
+      end
+    end
   end
 end

Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you create another PR?

@otegami otegami marked this pull request as ready for review February 24, 2025 03:07
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (4)
test/duckdb_test/logical_type_test.rb (1)

72-72: Consider adding test cases for both union member types.

The current test only inserts an INTEGER value. Consider adding test cases that insert VARCHAR values as well to ensure both union member types work correctly.

-        1::INTEGER,
+        1::INTEGER,  -- Test case for 'num' member
+        'test'::VARCHAR,  -- Test case for 'str' member
ext/duckdb/logical_type.c (3)

164-175: Fix documentation typo in call-seq example.

The call-seq example incorrectly shows value_type instead of member_count.

- *    member_col.logical_type.value_type -> Integer
+ *    member_col.logical_type.member_count -> Integer

184-199: Improve error message for member name retrieval failure.

The current error message could be more descriptive to help with debugging.

-        rb_raise(eDuckDBError, "fail to get name of %llu member", (unsigned long long)idx);
+        rb_raise(eDuckDBError, "Failed to retrieve name for union member at index %llu", (unsigned long long)idx);

209-224: Maintain consistent error message style.

The error message style should be consistent with the suggested improvement for member_name_at.

-                "Failed to get the union member type at index %llu",
+                "Failed to retrieve type for union member at index %llu",
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 76b1011 and 7450bf8.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • ext/duckdb/logical_type.c (3 hunks)
  • test/duckdb_test/logical_type_test.rb (4 hunks)
🔇 Additional comments (6)
test/duckdb_test/logical_type_test.rb (3)

39-39: LGTM!

The union column is correctly defined with appropriate member types and follows consistent naming conventions.


105-105: LGTM!

The union type is correctly added to the expected types array, maintaining consistency with the table schema.


178-200: Consider adding an iterator-like method for union members.

The tests effectively verify the basic functionality of union type methods. However, as suggested in past review comments, adding an iterator-like method would improve the API's usability.

Example implementation:

def test_union_member_iterator
  union_column = @columns.find { |column| column.type == :union }
  union_logical_type = union_column.logical_type
  
  # Test the iterator without a block
  enum = union_logical_type.each_member_name
  assert_instance_of(Enumerator, enum)
  assert_equal(["num", "str"], enum.to_a)
  
  # Test the iterator with a block
  names = []
  count = union_logical_type.each_member_name { |name| names << name }
  assert_equal(2, count)
  assert_equal(["num", "str"], names)
end
ext/duckdb/logical_type.c (2)

15-17: LGTM!

The function declarations follow the established pattern and maintain consistent naming conventions.


252-254: LGTM!

The method registration follows the established pattern and correctly exposes the new functionality.

CHANGELOG.md (1)

6-8: LGTM!

The changelog entries correctly document the new functionality and follow the established format. The "under construction" note appropriately indicates that more methods may be added in the future.

@otegami otegami force-pushed the support-union-in-logical-type branch from 7450bf8 to f7e6043 Compare February 24, 2025 03:18
Copy link

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
test/duckdb_test/logical_type_test.rb (1)

71-72: Consider adding test cases for both union member types.

Currently, the test only inserts an INTEGER value. Consider adding test cases that cover both INTEGER and VARCHAR members to ensure comprehensive testing of the union type functionality.

-        1::INTEGER,
+        1::INTEGER,  -- Test case for INTEGER member
+        'test'::VARCHAR,  -- Test case for VARCHAR member
ext/duckdb/logical_type.c (1)

164-170: Improve documentation for member_count method.

The documentation should clarify that this method is specifically for union types and mention its behavior for non-union types.

 /*
  *  call-seq:
  *    member_col.logical_type.member_count -> Integer
  *
- *  Returns the member count of union type, otherwise 0.
+ *  Returns the number of members in a union type. For non-union types,
+ *  returns 0. This method is specifically designed for working with
+ *  union types and helps determine how many members are available
+ *  for access using member_name_at and member_type_at methods.
  *
  */
📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 7450bf8 and f7e6043.

📒 Files selected for processing (3)
  • CHANGELOG.md (1 hunks)
  • ext/duckdb/logical_type.c (3 hunks)
  • test/duckdb_test/logical_type_test.rb (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • CHANGELOG.md
⏰ Context from checks skipped due to timeout of 90000ms (19)
  • GitHub Check: test (head, 1.1.1)
  • GitHub Check: test (head, 1.1.3)
  • GitHub Check: test (head, 1.2.0)
  • GitHub Check: test (mswin, 1.1.1)
  • GitHub Check: test (mingw, 1.1.1)
  • GitHub Check: test (mingw, 1.1.3)
  • GitHub Check: test (mingw, 1.2.0)
  • GitHub Check: test (3.4.1, 1.1.1)
  • GitHub Check: test (3.4.1, 1.1.3)
  • GitHub Check: test (3.4.1, 1.2.0)
  • GitHub Check: test (3.3.6, 1.1.1)
  • GitHub Check: test (3.3.6, 1.1.3)
  • GitHub Check: test (3.3.6, 1.2.0)
  • GitHub Check: test (3.2.6, 1.1.1)
  • GitHub Check: test (3.2.6, 1.1.3)
  • GitHub Check: test (3.2.6, 1.2.0)
  • GitHub Check: test (3.1.6, 1.1.1)
  • GitHub Check: test (3.1.6, 1.1.3)
  • GitHub Check: test (3.1.6, 1.2.0)
🔇 Additional comments (1)
test/duckdb_test/logical_type_test.rb (1)

178-200: Consider adding an iterator for union members.

As suggested in the previous review, an iterator-like method would make it more convenient for users to iterate through member names. The current implementation requires manual iteration using indices.

The implementation could be added to lib/duckdb/logical_type.rb as previously proposed:

def each_member_name
  return to_enum(__method__) {member_count} unless block_given?
  
  member_count.times do |member_idx|
    yield member_name_at(member_idx)
  end
end

Comment on lines +184 to +199
static VALUE duckdb_logical_type_member_name_at(VALUE self, VALUE midx) {
rubyDuckDBLogicalType *ctx;
VALUE mname;
const char *member_name;
idx_t idx = NUM2ULL(midx);

TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);

member_name = duckdb_union_type_member_name(ctx->logical_type, idx);
if (member_name == NULL) {
rb_raise(eDuckDBError, "fail to get name of %llu member", (unsigned long long)idx);
}
mname = rb_str_new_cstr(member_name);
duckdb_free((void *)member_name);
return mname;
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add index validation to prevent out-of-bounds access.

The function should validate that the index is within bounds before accessing the member name.

 static VALUE duckdb_logical_type_member_name_at(VALUE self, VALUE midx) {
     rubyDuckDBLogicalType *ctx;
     VALUE mname;
     const char *member_name;
     idx_t idx = NUM2ULL(midx);
 
     TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);
 
+    idx_t count = duckdb_union_type_member_count(ctx->logical_type);
+    if (idx >= count) {
+        rb_raise(eDuckDBError, "index %llu out of bounds (member count: %llu)",
+                (unsigned long long)idx, (unsigned long long)count);
+    }
+
     member_name = duckdb_union_type_member_name(ctx->logical_type, idx);
     if (member_name == NULL) {
         rb_raise(eDuckDBError, "fail to get name of %llu member", (unsigned long long)idx);
     }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static VALUE duckdb_logical_type_member_name_at(VALUE self, VALUE midx) {
rubyDuckDBLogicalType *ctx;
VALUE mname;
const char *member_name;
idx_t idx = NUM2ULL(midx);
TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);
member_name = duckdb_union_type_member_name(ctx->logical_type, idx);
if (member_name == NULL) {
rb_raise(eDuckDBError, "fail to get name of %llu member", (unsigned long long)idx);
}
mname = rb_str_new_cstr(member_name);
duckdb_free((void *)member_name);
return mname;
}
static VALUE duckdb_logical_type_member_name_at(VALUE self, VALUE midx) {
rubyDuckDBLogicalType *ctx;
VALUE mname;
const char *member_name;
idx_t idx = NUM2ULL(midx);
TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);
idx_t count = duckdb_union_type_member_count(ctx->logical_type);
if (idx >= count) {
rb_raise(eDuckDBError, "index %llu out of bounds (member count: %llu)",
(unsigned long long)idx, (unsigned long long)count);
}
member_name = duckdb_union_type_member_name(ctx->logical_type, idx);
if (member_name == NULL) {
rb_raise(eDuckDBError, "fail to get name of %llu member", (unsigned long long)idx);
}
mname = rb_str_new_cstr(member_name);
duckdb_free((void *)member_name);
return mname;
}

Comment on lines +209 to +224
static VALUE duckdb_logical_type_member_type_at(VALUE self, VALUE midx) {
rubyDuckDBLogicalType *ctx;
duckdb_logical_type union_member_type;
idx_t idx = NUM2ULL(midx);

TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);

union_member_type = duckdb_union_type_member_type(ctx->logical_type, idx);
if (union_member_type == NULL) {
rb_raise(eDuckDBError,
"Failed to get the union member type at index %llu",
(unsigned long long)idx);
}

return rbduckdb_create_logical_type(union_member_type);
}
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🛠️ Refactor suggestion

Add index validation to prevent out-of-bounds access.

Similar to member_name_at, this function should validate that the index is within bounds before accessing the member type.

 static VALUE duckdb_logical_type_member_type_at(VALUE self, VALUE midx) {
     rubyDuckDBLogicalType *ctx;
     duckdb_logical_type union_member_type;
     idx_t idx = NUM2ULL(midx);
 
     TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);
 
+    idx_t count = duckdb_union_type_member_count(ctx->logical_type);
+    if (idx >= count) {
+        rb_raise(eDuckDBError, "index %llu out of bounds (member count: %llu)",
+                (unsigned long long)idx, (unsigned long long)count);
+    }
+
     union_member_type = duckdb_union_type_member_type(ctx->logical_type, idx);
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
static VALUE duckdb_logical_type_member_type_at(VALUE self, VALUE midx) {
rubyDuckDBLogicalType *ctx;
duckdb_logical_type union_member_type;
idx_t idx = NUM2ULL(midx);
TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);
union_member_type = duckdb_union_type_member_type(ctx->logical_type, idx);
if (union_member_type == NULL) {
rb_raise(eDuckDBError,
"Failed to get the union member type at index %llu",
(unsigned long long)idx);
}
return rbduckdb_create_logical_type(union_member_type);
}
static VALUE duckdb_logical_type_member_type_at(VALUE self, VALUE midx) {
rubyDuckDBLogicalType *ctx;
duckdb_logical_type union_member_type;
idx_t idx = NUM2ULL(midx);
TypedData_Get_Struct(self, rubyDuckDBLogicalType, &logical_type_data_type, ctx);
idx_t count = duckdb_union_type_member_count(ctx->logical_type);
if (idx >= count) {
rb_raise(eDuckDBError, "index %llu out of bounds (member count: %llu)",
(unsigned long long)idx, (unsigned long long)count);
}
union_member_type = duckdb_union_type_member_type(ctx->logical_type, idx);
if (union_member_type == NULL) {
rb_raise(eDuckDBError,
"Failed to get the union member type at index %llu",
(unsigned long long)idx);
}
return rbduckdb_create_logical_type(union_member_type);
}

Copy link
Owner

@suketa suketa left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you. LGTM

@suketa suketa merged commit cd96002 into suketa:main Feb 28, 2025
58 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants