[DO NOT MERGE] cudf-polars chunked parquet reader #16789

Commits on Jul 29, 2024

  1. 7742b8b
  2. Require polars >= 1.3

    wence- committed Jul 29, 2024
    ef0b49f
  3. Adapt to IR changes

    wence- committed Jul 29, 2024
    e9fd96d
  4. 9d69621
  5. 918a40e
  6. f8f2d0d
  7. Update overview docs

    wence- committed Jul 29, 2024
    bcedb6b
  8. Support right join

    wence- committed Jul 29, 2024
    6f2d406
  9. f3bbd3f
  10. 1d4c30c

Commits on Jul 30, 2024

  1. abcf22b

Commits on Aug 2, 2024

  1. Merge pull request rapidsai#16347 from wence-/wence/fea/polars-engine-config
    
    Use new polars engine config object in cudf-polars callback
    wence- authored Aug 2, 2024
    62a5dbd

Commits on Aug 5, 2024

  1. Adapt to IR changes in polars 1.4 (rapidsai#16494)

    ## Description
    
    Adapts to IR changes in polars 1.4 and handles nrows/skiprows a little
    more correctly.
    
    ## Checklist
    - [ ] I am familiar with the [Contributing
    Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
    - [ ] New or existing tests cover these changes.
    - [ ] The documentation is up to date with these changes.
    
    ---------
    
    Co-authored-by: Lawrence Mitchell <[email protected]>
    lithomas1 and wence- authored Aug 5, 2024
    7d0c7ad

Commits on Aug 6, 2024

  1. Implement polars string Replace and ReplaceMany (rapidsai#16039)

    Add support for ``pl.col.str.replace`` and ``pl.col.str.replace_many``
    
    Authors:
      - Thomas Li (https://github.com/lithomas1)
    
    Approvers: None
    
    URL: rapidsai#16039
    lithomas1 authored Aug 6, 2024
    5de29b3

Commits on Aug 19, 2024

  1. 7f6b00f

Commits on Aug 20, 2024

  1. 822e7d0
  2. Implement scan-based whole-frame aggregations for cudf-polars (rapidsai#16509)
    
    contributes to rapidsai#16478
    
    This implements "cum_min", "cum_max", "cum_prod", "cum_sum"
    
    "cum_count" is not implemented for now, since there's no exact libcudf match (I imagine the non-grouped case is also not used that much but haven't checked).
    I suppose we could implement it by creating a column of 1s and copying the null mask over, and doing a cum_sum on that.
    Let me know if you want to try that.
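
The workaround suggested above can be sketched in plain Python. This is only an illustration of the idea, not the cudf-polars or libcudf implementation: `None` stands in for a null, the helper name `cum_count` is hypothetical, and a null row is assumed to contribute zero to the running count.

```python
from itertools import accumulate


def cum_count(values):
    """Running count of non-null entries (None stands in for a null)."""
    # Step 1: a column of 1s that carries the input's null mask.
    # Assumption: a null row contributes 0 to the running total.
    ones = [0 if v is None else 1 for v in values]
    # Step 2: an inclusive cumulative sum over that column.
    return list(accumulate(ones))


print(cum_count([10, None, 30, 40, None]))  # [1, 1, 2, 3, 3]
```

How a real scan treats the null rows themselves (propagating a null or carrying the previous count forward) depends on the null policy chosen, which this sketch does not model.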
    
    Authors:
      - Thomas Li (https://github.com/lithomas1)
    
    Approvers:
      - https://github.com/brandon-b-miller
    
    URL: rapidsai#16509
    lithomas1 authored Aug 20, 2024
    152111b

Commits on Aug 21, 2024

  1. Merge pull request rapidsai#16599 from wence/fix/remove-placeholder-column
    
    Use a key column rather than a placeholder for count agg
    wence- authored Aug 21, 2024
    13a1493

Commits on Aug 22, 2024

  1. 7cf3289

Commits on Aug 26, 2024

  1. f6c938f
  2. use std::ptrdiff_t

    davidwendt authored and vyasr committed Aug 26, 2024
    4ded370

Commits on Aug 27, 2024

  1. Correctly export empty column names in DataFrame.to_polars (rapidsai#16596)
    
    polars.from_arrow renames empty column names (see
    pola-rs/polars#11632). This causes problems
    when round-tripping specially crafted dataframes. Avoid the problem by
    constructing the table with fake names and then renaming.
    wence- authored Aug 27, 2024
    edabb67
  2. Forward-merge 24.08

    wence- committed Aug 27, 2024
    a4c35e9
  3. Add more cudf-polars unaryops (rapidsai#16579)

    Add support for additional unaryops through `cudf-polars`. 
    
    Closes rapidsai#16566
    
    ---------
    
    Co-authored-by: Lawrence Mitchell <[email protected]>
    brandon-b-miller and wence- authored Aug 27, 2024
    0a95b2c
  4. Merge pull request rapidsai#16667 from wence-/wence/merge-2408

    Forward-merge 24.08
    wence- authored Aug 27, 2024
    cc892fc
  5. Add pylibcudf/cudf-polars string strip (rapidsai#16504)

    Add support for string `strip` in `pylibcudf` and `cudf-polars`.
    
    ---------
    
    Co-authored-by: Lawrence Mitchell <[email protected]>
    brandon-b-miller and wence- authored Aug 27, 2024
    41a3a95

Commits on Aug 28, 2024

  1. cudf-polars/pylibcudf string -> date parsing (rapidsai#16306)

    This PR adds datetime/timestamp parsing from string columns in pylibcudf
    and cudf-polars.
    
    Closes rapidsai#16174
    brandon-b-miller authored Aug 28, 2024
    0bf68d4

Commits on Aug 29, 2024

  1. Support quantile in cudf_polars (rapidsai#16093)

    Support `pl.Expr.quantile` in cudf-polars.
    
    ---------
    
    Co-authored-by: Vyas Ramasubramani <[email protected]>
    lithomas1 and vyasr authored Aug 29, 2024
    40d33cb

Commits on Aug 30, 2024

  1. Implement handlers for first/last in groupby (rapidsai#16688)

    Since the full-frame `Agg` handler for first and last doesn't construct
    a request (because we can do it without a `from_scalar` call), we didn't
    handle these in a groupby context. Fortunately it is easy to add.
    wence- authored Aug 30, 2024
    95da2c5
  2. Ensure IR validation always checks for empty columns

    We were previously not calling the superclass __post_init__ in custom
    validations of IR nodes. This meant that we would sometimes fail to
    raise when the schema contained an EMPTY column.
    
    Since we can't really compute with these types, we just fall back.
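
The failure mode described above is easy to reproduce with ordinary dataclasses. The following is a minimal sketch under stated assumptions: the class names `IR` and `Scan`, their fields, and the string `"EMPTY"` dtype marker are hypothetical stand-ins and do not reflect the actual cudf-polars node definitions.

```python
from dataclasses import dataclass


@dataclass
class IR:
    # Hypothetical schema: column name -> dtype name.
    schema: dict

    def __post_init__(self):
        # Base validation: refuse schemas containing a dtype we cannot compute with.
        if any(dtype == "EMPTY" for dtype in self.schema.values()):
            raise NotImplementedError("schema contains an EMPTY column")


@dataclass
class Scan(IR):
    path: str

    def __post_init__(self):
        # Without this call the base-class check above is silently skipped.
        super().__post_init__()
        # Node-specific validation follows the shared check.
        if not self.path:
            raise NotImplementedError("scan without a path")
```

With the `super().__post_init__()` line in place, constructing `Scan(schema={"a": "EMPTY"}, path="data.parquet")` raises `NotImplementedError`; dropping that line would let the node be built.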
    wence- committed Aug 30, 2024
    434afab
  3. Need to check for nulls in nested dtypes

    To do this we need to inspect the polars datatypes, since by the time
    we've converted to the pylibcudf one, the nested element types have
    been lost.
    
    We don't do this eagerly during dtype conversion because we still want
    to allow scalar literals with null dtype that will then be cast to a
    non-null dtype.
    wence- committed Aug 30, 2024
    385ae98
  4. 1cf1146
  5. Move creation of regex program to initialisation

    This way if we don't support any features of the pattern, we correctly
    fall back to CPU.
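
A rough sketch of why compiling at construction time matters. Python's `re` module is used here purely as a stand-in for the GPU regex program, and the names `StringContains` and `translate` are hypothetical; the point is only that an error raised while building the node can be caught during translation, before any execution starts.

```python
import re
from dataclasses import dataclass, field


@dataclass
class StringContains:
    pattern: str
    program: re.Pattern = field(init=False)

    def __post_init__(self):
        # Compile eagerly so an unsupported pattern is rejected when the
        # node is built, not later when it is evaluated.
        try:
            self.program = re.compile(self.pattern)
        except re.error as exc:
            raise NotImplementedError(f"unsupported pattern: {self.pattern!r}") from exc


def translate(pattern):
    """Return which engine would run the expression (illustrative only)."""
    try:
        return ("gpu", StringContains(pattern))
    except NotImplementedError:
        # Translation failed up front, so the whole query can fall back.
        return ("cpu", None)
```

If compilation were deferred to evaluation, the error would surface only after translation had already committed to the GPU path.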
    wence- committed Aug 30, 2024
    de445a3
  6. Merge pull request rapidsai#16703 from wence-/wence/fea/polars-reject-invalid-regex
    
    Move creation of regex program to initialisation
    wence- authored Aug 30, 2024
    f39713e
  7. ad364c6

Commits on Sep 2, 2024

  1. Merge pull request rapidsai#16702 from wence-/wence/fea/polars-no-empty-columns
    
    Disallow producing dataframes with Empty columns
    wence- authored Sep 2, 2024
    d158b22

Commits on Sep 3, 2024

  1. Partially reject dynamic groupby (rapidsai#16720)

    We are not yet exposing the actual information that the groupby is dynamic, but this catches a bunch of cases.
    
    Authors:
      - Lawrence Mitchell (https://github.com/wence-)
    
    Approvers:
      - Vyas Ramasubramani (https://github.com/vyasr)
    
    URL: rapidsai#16720
    wence- authored Sep 3, 2024
    b550645

Commits on Sep 4, 2024

  1. Implement Kleene logic handling for Any/All and bitwise Or/And (rapidsai#16476)
    
    We previously didn't support this case correctly, but it's not too bad.
    
    This would be much easier if we could do it in libcudf, hence:
    rapidsai#16475
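
For readers unfamiliar with Kleene (three-valued) logic, here is a small plain-Python sketch of the semantics, with `None` standing in for a null. The function names are hypothetical, and this shows only the truth tables, not how the commit implements them on the GPU.

```python
from functools import reduce


def kleene_or(a, b):
    # A definite True decides the result even if the other side is unknown.
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False


def kleene_and(a, b):
    # A definite False decides the result even if the other side is unknown.
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def kleene_any(values):
    return reduce(kleene_or, values, False)


def kleene_all(values):
    return reduce(kleene_and, values, True)


print(kleene_any([False, None]))  # None
print(kleene_all([None, False]))  # False
```

The contrast with null-ignoring reductions is the first case: with no definite `True` present, an unknown value leaves the answer unknown instead of `False`.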
    wence- authored Sep 4, 2024
    eb2a23e
  2. Some fixes for unary functions (rapidsai#16719)

    Correctly handle `pow` and `log` by translating to binary expressions
    when we observe the node.
    
    Upgrade our minimum supported polars version (so that we see all these
    function names from the rust IR).
    
    Also tighten check for which groupby-aggs are supported when the
    expression contains a unary function.
    wence- authored Sep 4, 2024
    ebc3bbe
  3. Implement unpivot in cudf-polars (rapidsai#16689)

    Add support for unpivoting a DataFrame. We raise for cases where the
    concatenation of the value columns produces a cast that is not supported
    by standard fixed-width unary casting.
    wence- authored Sep 4, 2024
    5d262df
  4. Small scan-handler fixes (rapidsai#16721)

    Reject two more edge cases that we do not support.
    
    We could easily support the case where the parquet read just needs to
    read the metadata, but it is low priority, so have not done so here.
    wence- authored Sep 4, 2024
    c76e90b

Commits on Sep 5, 2024

  1. Implement cudf-polars datetime extraction methods (rapidsai#16500)

    ---------
    
    Co-authored-by: brandon-b-miller <[email protected]>
    Co-authored-by: Bradley Dice <[email protected]>
    Co-authored-by: Lawrence Mitchell <[email protected]>
    4 people authored Sep 5, 2024
    ccb8061

Commits on Sep 6, 2024

  1. Polars 1.7 will change a minor thing in the IR, adapt to that (rapidsai#16755)
    
    This field renaming was due to a recent refactor in (as-yet-unreleased)
    polars 1.7.
    wence- authored Sep 6, 2024
    feb2e63
  2. Run polars test suite (defaulting to GPU) in CI (rapidsai#16710)

    ## Description
    
    We implement a small pytest plugin that defaults the polars engine to
    GPU (by monkeypatching `LazyFrame.collect`, yet another reason to have a
    global default somehow).
    
    As well as this, we collate all the known failures and classify them.
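
The monkeypatching approach described above might look roughly like the following. To keep the sketch self-contained, `LazyFrame` here is a hypothetical stand-in class and not the polars one, and `default_engine_to_gpu` is an invented helper name; a real plugin would presumably apply the patch from a pytest hook such as `pytest_configure`.

```python
from functools import wraps


class LazyFrame:
    """Hypothetical stand-in for a lazy frame with an `engine` keyword."""

    def collect(self, *, engine="cpu"):
        return f"collected on {engine}"


def default_engine_to_gpu(cls):
    """Patch `cls.collect` so that `engine` defaults to "gpu"."""
    original = cls.collect

    @wraps(original)
    def collect(self, *args, **kwargs):
        # Only fill in the default; an explicit engine choice is respected.
        kwargs.setdefault("engine", "gpu")
        return original(self, *args, **kwargs)

    cls.collect = collect


default_engine_to_gpu(LazyFrame)
print(LazyFrame().collect())              # collected on gpu
print(LazyFrame().collect(engine="cpu"))  # collected on cpu
```

Using `setdefault` means tests that request a specific engine keep their behaviour, while every other call is redirected.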
    
    
    ## Checklist
    - [x] I am familiar with the [Contributing
    Guidelines](https://github.com/rapidsai/cudf/blob/HEAD/CONTRIBUTING.md).
    - [x] New or existing tests cover these changes.
    - [x] The documentation is up to date with these changes.
    wence- authored Sep 6, 2024
    6d2e455

Commits on Sep 10, 2024

  1. 24f9516

Commits on Sep 16, 2024

  1. 4bbbdc2