GH-39301: [Archery][CI][Integration] Add nanoarrow to archery + integration setup #39302

Merged (26 commits) on May 10, 2024

@paleolimbot (Member) commented Dec 19, 2023

Rationale for this change

Support for integration testing was added to nanoarrow; however, the infrastructure for running these tests currently lives in the Arrow monorepo.

What changes are included in this PR?

  • Added the relevant code to Archery so that these tests can be run
  • Added the relevant scripts/environment variables to CI so that these tests run in the integration CI job

Are these changes tested?

Yes, via the "Integration" CI job.

Are there any user-facing changes?

No.

This PR still needs #41264 for the integration tests to pass.

Thanks for opening a pull request!

If this is not a minor PR, could you open an issue for this pull request on GitHub? https://github.com/apache/arrow/issues/new/choose

Opening GitHub issues ahead of time contributes to the Openness of the Apache Arrow project.

Then could you also rename the pull request title in the following format?

GH-${GITHUB_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

or

MINOR: [${COMPONENT}] ${SUMMARY}

In the case of PARQUET issues on JIRA, the title also supports:

PARQUET-${JIRA_ISSUE_ID}: [${COMPONENT}] ${SUMMARY}

See also:

@github-actions bot added the "awaiting committer review" label on Dec 19, 2023
@paleolimbot changed the title from "add nanoarrow to archery + integration setup" to "GH-39301: [Archery][CI][Integration] Add nanoarrow to archery + integration setup" on Dec 19, 2023

⚠️ GitHub issue #39301 has been automatically assigned in GitHub to PR creator.

paleolimbot added a commit to apache/arrow-nanoarrow that referenced this pull request Jan 9, 2024
This PR implements dictionary support in the integration test utility
and fixes a few problems identified with integration testing to ensure
that it actually works end to end (via apache/arrow#39302). The
changes are:

- Batches that contain dictionaries can now be read, written, and
validated using integration testing JSON
- Fixed an issue in the integration test library (reading any batch
other than the first previously segfaulted)
- Improved const correctness of nanoarrow.hpp (because dictionaries
required a `std::unordered_map<>` with a `UniqueSchema` and a few const
overloads were missing)
- Fixed the nullability of the top-level batch to match Arrow C++ output
- Fixed the null count of exported arrays (previously they were all
exported as having zero nulls)

It can now be tested with `archery` (after checking out
apache/arrow#39302):

```bash
export ARROW_CPP_EXE_PATH=/Users/deweydunnington/.r-arrow-dev-build/build/debug
export ARROW_NANOARROW_PATH=/path/to/arrow-nanoarrow/build
archery integration --with-cpp=true --with-nanoarrow=true --run-c-data
```

The current failures are limited to the remaining unimplemented types
(datetime types and decimal).
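For scripting the same run (for example, from a CI helper), the environment variables and flags from the shell commands above can be assembled in Python. This is a minimal sketch, not part of archery: `build_archery_invocation` and `run_integration` are hypothetical names, and the sketch assumes the `archery` command is on `PATH` and that both paths point at existing local builds.

```python
import os
import subprocess


def build_archery_invocation(cpp_exe_path, nanoarrow_path, base_env=None):
    """Return (argv, env) for an archery C Data integration run.

    Hypothetical helper: the flags and environment variable names are copied
    from the shell commands in this PR; the paths are caller-supplied.
    """
    # Start from the current (or supplied) environment so PATH and friends survive
    env = dict(os.environ if base_env is None else base_env)
    env["ARROW_CPP_EXE_PATH"] = cpp_exe_path
    env["ARROW_NANOARROW_PATH"] = nanoarrow_path
    argv = [
        "archery",
        "integration",
        "--with-cpp=true",
        "--with-nanoarrow=true",
        "--run-c-data",
    ]
    return argv, env


def run_integration(cpp_exe_path, nanoarrow_path):
    """Run archery and return its exit code (assumes `archery` is on PATH)."""
    argv, env = build_archery_invocation(cpp_exe_path, nanoarrow_path)
    # check=False: hand back the exit code instead of raising CalledProcessError
    return subprocess.run(argv, env=env, check=False).returncode
```

Copying the existing environment rather than starting from an empty one matters here: when `env` is passed, `subprocess` resolves `archery` using that mapping's `PATH`, so dropping it would make the executable unfindable.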

And for future me, or anybody who needs to launch a debugger on a
segfaulting integration test in VS Code, it can be done with this
`launch.json` configuration (an entry in the `configurations` array):

```json
{
    "type": "lldb",
    "request": "launch",
    "name": "Debug Integration Tests",
    "program": "${workspaceFolder}/.venv/bin/python",
    "args": ["-m", "archery.cli", "integration", "--with-cpp=true",  "--with-nanoarrow=true", "--run-c-data"],
    "cwd": "${workspaceFolder}",
    "env": {
        "ARROW_CPP_EXE_PATH": "/Users/deweydunnington/.r-arrow-dev-build/build/debug",
        "ARROW_NANOARROW_PATH": "${workspaceFolder}/out/build/user-local"
    }
}
```
@paleolimbot force-pushed the archery-nanoarrow-integration branch from 3e059f7 to 0165301 on January 10, 2024 at 19:07
paleolimbot added a commit to apache/arrow-nanoarrow that referenced this pull request Jan 12, 2024
This PR adds support for date, time, timestamp, duration, and interval
types to the integration tester. With a checkout of
https://github.com/apache/arrow/pull/39302, it also appears to pass the
archery tests against C++:

```bash
archery integration --with-cpp=true --with-nanoarrow=true --run-c-data
```
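The log below is long; when scanning it, it can help to reduce it to the cases that raised an error. The sketch that follows is not part of archery: `summarize_failures` is a hypothetical helper, and it assumes only the line formats visible in this log (the `C Data Interface: ...` section banners, the `Testing C ArrowSchema/ArrowArray from file '...'` lines, and the final `RuntimeError: ...` line of each traceback).

```python
import re

# Line formats copied from the archery output in this PR; other lines are ignored.
SECTION_RE = re.compile(r"^C Data Interface: (.+)$")
CASE_RE = re.compile(r"^Testing C (ArrowSchema|ArrowArray) from file '([^']+)'$")
ERROR_RE = re.compile(r"^RuntimeError: (.+)$")


def summarize_failures(log_text):
    """Return (section, kind, file, message) tuples for every failed case."""
    failures = []
    section = kind = name = None
    for raw in log_text.splitlines():
        line = raw.strip()
        m = SECTION_RE.match(line)
        if m:
            # e.g. "C++ exporting, nanoarrow importing"
            section = m.group(1)
            continue
        m = CASE_RE.match(line)
        if m:
            # Remember which case is running so a later error can be attributed
            kind, name = m.group(1), m.group(2)
            continue
        m = ERROR_RE.match(line)
        if m and name is not None:
            failures.append((section, kind, name, m.group(1)))
    return failures
```

Matching only the final `RuntimeError:` line keeps the summary independent of the traceback's file paths and line numbers; the `raise RuntimeError(...)` source line inside each traceback starts with `raise`, so it is not counted twice.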

<details>

```
##########################################################
C Data Interface: C++ exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
======================================================================
======================================================================
Testing C ArrowSchema from file 'list_view'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'list_view'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: C++ exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 456, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 456, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'list_view'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 456, in do_run
    importer.import_schema_and_compare_to_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 138, in import_schema_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'listview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because producer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 503, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: -> Column 'f0' storage type decimal128 DATA buffer not supported
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 503, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: -> Column 'f0' storage type decimal256 DATA buffer not supported
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 503, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 503, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'list_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 503, in do_run
    importer.import_batch_and_compare_to_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 144, in import_batch_and_compare_to_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'listview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because producer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, C++ importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 455, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 455, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'list_view'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 455, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'listview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowSchema
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: -> Column 'f0' storage type decimal128 DATA buffer not supported
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: -> Column 'f0' storage type decimal256 DATA buffer not supported
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowArray from file 'binary_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowArray from file 'list_view'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'listview'
======================================================================
======================================================================
Testing C ArrowArray from file 'extension'
-- Skipping test because consumer C++ does not support C ArrowArray
======================================================================
##########################################################
C Data Interface: nanoarrow exporting, nanoarrow importing
##########################################################
======================================================================
Testing C ArrowSchema from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_zerolength'
======================================================================
======================================================================
Testing C ArrowSchema from file 'primitive_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null'
======================================================================
======================================================================
Testing C ArrowSchema from file 'null_trivial'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal'
======================================================================
======================================================================
Testing C ArrowSchema from file 'decimal256'
======================================================================
======================================================================
Testing C ArrowSchema from file 'datetime'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duration'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval'
======================================================================
======================================================================
Testing C ArrowSchema from file 'interval_mdn'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map'
======================================================================
======================================================================
Testing C ArrowSchema from file 'map_non_canonical'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'recursive_nested'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_large_offsets'
======================================================================
======================================================================
Testing C ArrowSchema from file 'union'
======================================================================
======================================================================
Testing C ArrowSchema from file 'custom_metadata'
======================================================================
======================================================================
Testing C ArrowSchema from file 'duplicate_fieldnames'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'dictionary_unsigned'
======================================================================
======================================================================
Testing C ArrowSchema from file 'nested_dictionary'
======================================================================
======================================================================
Testing C ArrowSchema from file 'run_end_encoded'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 455, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
======================================================================
======================================================================
Testing C ArrowSchema from file 'binary_view'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 455, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'list_view'
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 473, in _run_c_schema_test_case
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 455, in do_run
    exporter.export_schema_from_json(json_path, c_schema_ptr)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 117, in export_schema_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: Unsupported Type name: 'listview'
======================================================================
======================================================================
Testing C ArrowSchema from file 'extension'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_no_batches'
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_zerolength'
... with record batch #0
... with record batch #1
... with record batch #2
======================================================================
======================================================================
Testing C ArrowArray from file 'primitive_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'null_trivial'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: -> Column 'f0' storage type decimal128 DATA buffer not supported
======================================================================
======================================================================
Testing C ArrowArray from file 'decimal256'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data Integration call failed: -> Column 'f0' storage type decimal256 DATA buffer not supported
======================================================================
======================================================================
Testing C ArrowArray from file 'datetime'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'duration'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'interval_mdn'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'map_non_canonical'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'recursive_nested'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_large_offsets'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'union'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'custom_metadata'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'duplicate_fieldnames'
... with record batch #0
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'dictionary_unsigned'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'nested_dictionary'
... with record batch #0
... with record batch #1
======================================================================
======================================================================
Testing C ArrowArray from file 'run_end_encoded'
... with record batch #0
Traceback (most recent call last):
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 523, in _run_c_array_test_cases
    do_run()
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/runner.py", line 500, in do_run
    exporter.export_batch_from_json(json_path,
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 123, in export_batch_from_json
    self._check_nanoarrow_error(na_error)
  File "/Users/dewey/Desktop/rscratch/arrow/dev/archery/archery/integration/tester_nanoarrow.py", line 109, in _check_nanoarrow_error
    raise RuntimeError(f"nanoarrow C Data Integration call failed: {error}")
RuntimeError: nanoarrow C Data…
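In the tracebacks above, `export_batch_from_json` hands the result of a C call to `_check_nanoarrow_error`, which raises a `RuntimeError` carrying the message. The sketch below assumes the C entry point follows a "NULL on success, error string on failure" convention. It models the returned `char*` as `None` or raw bytes rather than a real cffi/ctypes pointer, and `check_c_error` is a hypothetical name, not Archery's actual helper.

```python
from typing import Optional


def check_c_error(c_error: Optional[bytes], prefix: str = "nanoarrow") -> None:
    """Raise if a C Data Integration call reported an error.

    Hypothetical helper: None stands in for a NULL char* (success), and
    bytes stand in for a non-NULL error message returned by the C side.
    """
    if c_error is None:
        return
    # Decode defensively; the C side may not guarantee valid UTF-8.
    message = c_error.decode("utf-8", errors="replace")
    raise RuntimeError(f"{prefix} C Data Integration call failed: {message}")


try:
    check_c_error(b"-> Column 'f0' storage type decimal128 DATA buffer not supported")
except RuntimeError as exc:
    print(exc)
```

This reproduces the shape of the `decimal` failure message shown in the log above.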
@paleolimbot force-pushed the archery-nanoarrow-integration branch 2 times, most recently from 58f74a4 to fa0d189 on February 7, 2024 16:30
@paleolimbot
Member Author

A few outstanding issues:

I'm not sure how to skip test cases based on type in Archery. nanoarrow doesn't support the newer types (run-end encoded, binary view, list view) yet, so those cases should be skipped whenever nanoarrow is involved. There must be an existing example of this, but I can't seem to find it!

FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'

FAILED TEST: binary_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'binaryview'

FAILED TEST: list_view C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'listview'

FAILED TEST: run_end_encoded C++ producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Unsupported Type name: 'runendencoded'
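One way to express "skip based on type" is to attach a per-case list of testers that opt out, and to skip a producer/consumer pair if either side is on that list. The sketch below is a hypothetical model: `IntegrationCase`, `skip_tester`, and `should_run` are illustrative names, and Archery's data generator (`archery/integration/datagen.py`) may already provide an equivalent mechanism that should be preferred.

```python
from dataclasses import dataclass, field
from typing import Set


@dataclass
class IntegrationCase:
    """Hypothetical stand-in for one generated integration test file."""
    name: str
    skipped_testers: Set[str] = field(default_factory=set)

    def skip_tester(self, tester: str) -> "IntegrationCase":
        # Return self so skips can be chained while building the case list.
        self.skipped_testers.add(tester)
        return self


def should_run(case: IntegrationCase, producer: str, consumer: str) -> bool:
    # Skip the round trip if either the producer or the consumer opted out.
    return not ({producer, consumer} & case.skipped_testers)


cases = [
    IntegrationCase("primitive"),
    IntegrationCase("run_end_encoded").skip_tester("nanoarrow"),
    IntegrationCase("binary_view").skip_tester("nanoarrow"),
    IntegrationCase("list_view").skip_tester("nanoarrow"),
]

runnable = [c.name for c in cases if should_run(c, "C++", "nanoarrow")]
print(runnable)  # ['primitive']
```

Keying the skip on the case (i.e., the file/type) rather than on each failing pair keeps the list short: one entry per unsupported type covers every producer/consumer combination.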

C# appears to export record batches as nullable structs, whereas nanoarrow expects them to be non-nullable. In nanoarrow I should probably just ignore the difference for a top-level "batch".

FAILED TEST: recursive_nested C# producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Found 1 differences:
Path: 
- .flags: 2
+ .flags: 0
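Per the Arrow C Data Interface specification, `ArrowSchema.flags` is a bitfield where `ARROW_FLAG_NULLABLE` is 2, so the `.flags: 2` vs `.flags: 0` difference at the empty (root) path above is exactly that one bit. A minimal sketch of the proposed relaxation, assuming the comparison knows whether it is looking at the top-level schema; `flags_equal` and its parameters are hypothetical names, not nanoarrow's API.

```python
# Flag values from the Arrow C Data Interface specification.
ARROW_FLAG_DICTIONARY_ORDERED = 1
ARROW_FLAG_NULLABLE = 2
ARROW_FLAG_MAP_KEYS_SORTED = 4


def flags_equal(expected: int, actual: int, *, top_level: bool,
                ignore_top_level_nullable: bool = True) -> bool:
    """Compare ArrowSchema.flags, optionally ignoring NULLABLE on the root.

    A record batch exported as a struct has no meaningful nullability, so
    a producer that marks the root struct nullable should not count as a
    difference. Child fields are still compared exactly.
    """
    if top_level and ignore_top_level_nullable:
        mask = ~ARROW_FLAG_NULLABLE  # clear only the NULLABLE bit
        return (expected & mask) == (actual & mask)
    return expected == actual


print(flags_equal(2, 0, top_level=True))   # True
print(flags_equal(2, 0, top_level=False))  # False
```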

Java's metadata probably goes through a hash map, because the key order is not always preserved. We can relax the comparison to treat the metadata key/value pairs as an unordered collection:

 FAILED TEST: custom_metadata Java producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Found 2 differences:
Path: .children[1]
- {"name": "lots_of_meta", "nullable": true, "type": {"name": "int", "bitWidth": 8, "isSigned": true}, "children": [], "metadata": [{"key": "..", "value": "{}"}, {"key": "a", "value": "{}"}, {"key": "b", "value": "{}"}, {"key": "c", "value": "{}"}, {"key": "d", "value": "{}"}, {"key": "w", "value": "{}"}, {"key": "x", "value": "{}"}, {"key": "y", "value": "{}"}, {"key": "z", "value": "{}"}]}
+ {"name": "lots_of_meta", "nullable": true, "type": {"name": "int", "bitWidth": 8, "isSigned": true}, "children": [], "metadata": [{"key": "a", "value": "{}"}, {"key": "b", "value": "{}"}, {"key": "c", "value": "{}"}, {"key": "d", "value": "{}"}, {"key": "..", "value": "{}"}, {"key": "w", "value": "{}"}, {"key": "x", "value": "{}"}, {"key": "y", "value": "{}"}, {"key": "z", "value": "{}"}]}
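A minimal sketch of an order-insensitive comparison, using the `lots_of_meta` key orders from the two diff lines above. It counts `(key, value)` pairs with a `Counter` rather than a `set`, so duplicated keys still have to match in number. `metadata_equal` and `check_order` are hypothetical names; the option nanoarrow actually added may be spelled differently.

```python
from collections import Counter

# Metadata for 'lots_of_meta' as reported in the two diff lines above.
EXPECTED = [{"key": k, "value": "{}"}
            for k in ["..", "a", "b", "c", "d", "w", "x", "y", "z"]]
ACTUAL = [{"key": k, "value": "{}"}
          for k in ["a", "b", "c", "d", "..", "w", "x", "y", "z"]]


def _as_pairs(metadata):
    # Counter (not set) so duplicate keys must appear the same number of times.
    return Counter((item["key"], item["value"]) for item in metadata)


def metadata_equal(expected, actual, check_order=True):
    if check_order:
        return expected == actual
    return _as_pairs(expected) == _as_pairs(actual)


print(metadata_equal(EXPECTED, ACTUAL))                     # False
print(metadata_equal(EXPECTED, ACTUAL, check_order=False))  # True
```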

It looks like Go exports zero-length metadata (i.e., b"\x00\x00\x00\x00") instead of NULL metadata. We can relax that check in the comparison.

 FAILED TEST: primitive_no_batches Go producing,  nanoarrow consuming
<class 'RuntimeError'>: nanoarrow C Data Integration call failed: Found 31 differences:
Path: .children[0]
- {"name": "bool_nullable", "nullable": true, "type": {"name": "bool"}, "children": [], "metadata": []}
+ {"name": "bool_nullable", "nullable": true, "type": {"name": "bool"}, "children": []}
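The four zero bytes are the C Data Interface encoding of metadata with zero key/value pairs: the specification stores an int32 pair count, then for each pair an int32 key length, the key bytes, an int32 value length and the value bytes, all int32s in native byte order. A minimal sketch of decoding that format and treating NULL (modelled here as `None`) as equivalent to zero pairs; the function names are hypothetical.

```python
import struct


def decode_c_metadata(buf):
    """Decode ArrowSchema.metadata bytes into a list of (key, value) pairs.

    "=i" is a native-byte-order int32, matching the C Data Interface
    layout. A NULL pointer (None here) decodes to an empty list.
    """
    if buf is None:
        return []
    (n_pairs,) = struct.unpack_from("=i", buf, 0)
    offset = 4
    pairs = []
    for _ in range(n_pairs):
        (key_len,) = struct.unpack_from("=i", buf, offset)
        key = bytes(buf[offset + 4:offset + 4 + key_len])
        offset += 4 + key_len
        (value_len,) = struct.unpack_from("=i", buf, offset)
        value = bytes(buf[offset + 4:offset + 4 + value_len])
        offset += 4 + value_len
        pairs.append((key, value))
    return pairs


def metadata_equivalent(a, b):
    # NULL metadata and zero-pair metadata both decode to [], so they compare equal.
    return decode_c_metadata(a) == decode_c_metadata(b)


print(metadata_equivalent(None, b"\x00\x00\x00\x00"))  # True
```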

It looks like C# has some issues with the arrays produced by nanoarrow (or at least by the JSON reader):

 FAILED TEST: nested_dictionary nanoarrow producing,  C# consuming
<class 'Xunit.Sdk.TrueException'>: Validity buffers do not match.
   at Xunit.Assert.True(Nullable`1 condition, String userMessage) in /_/src/xunit.assert/Asserts/BooleanAsserts.cs:line 146
   at Apache.Arrow.Tests.ArrowReaderVerifier.ArrayComparer.CompareValidityBuffer(Int32 nullCount, Int32 arrayLength, ArrowBuffer expectedValidityBuffer, ArrowBuffer actualValidityBuffer) in /arrow/csharp/test/Apache.Arrow.Tests/ArrowReaderVerifier.cs:line 435

...and C# reports a memory leak. I would have assumed that a memory leak would have been consistent between languages so I'm puzzled by this one.

 FAILED TEST: extension nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 694, after = 734 (should have been equal)

@paleolimbot
Member Author

Ok, we're now down to:

################# FAILURES #################
FAILED TEST: primitive_zerolength nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 0, after = 16 (should have been equal)

FAILED TEST: recursive_nested nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 16, after = 668 (should have been equal)

FAILED TEST: union nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 668, after = 676 (should have been equal)

FAILED TEST: custom_metadata nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 676, after = 691 (should have been equal)

FAILED TEST: extension nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 691, after = 731 (should have been equal)

(i.e., only a few specific files report leaks when C# is responsible for releasing, but they are the same files every time)
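In the five failures above, each test's `before` equals the previous test's `after`, which suggests the counter is cumulative across the run; the per-file leak is then `after - before`, not the raw `after` value. A small script over the quoted log lines (units as reported by the C# harness):

```python
import re

# The five failures quoted above, copied from the log.
REPORT = """\
FAILED TEST: primitive_zerolength nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 0, after = 16 (should have been equal)
FAILED TEST: recursive_nested nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 16, after = 668 (should have been equal)
FAILED TEST: union nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 668, after = 676 (should have been equal)
FAILED TEST: custom_metadata nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 676, after = 691 (should have been equal)
FAILED TEST: extension nanoarrow producing,  C# consuming
<class 'RuntimeError'>: Memory was not released correctly after roundtrip: before = 691, after = 731 (should have been equal)
"""

# Test name on one line, before/after counts on the next.
PATTERN = re.compile(r"FAILED TEST: (\S+) .*?\n.*?before = (\d+), after = (\d+)")


def leak_deltas(report):
    rows = [(n, int(b), int(a)) for n, b, a in PATTERN.findall(report)]
    # If each "before" equals the previous "after", the counter is cumulative.
    cumulative = all(prev[2] == cur[1] for prev, cur in zip(rows, rows[1:]))
    return [(name, after - before) for name, before, after in rows], cumulative


deltas, cumulative = leak_deltas(REPORT)
print(cumulative)  # True
for name, delta in deltas:
    print(f"{name}: {delta}")
```

This gives per-file deltas of 16, 652, 8, 15 and 40 (summing to 731). The 40 for `extension` matches the delta in the earlier report (694 to 734), consistent with the same file leaking the same amount each time.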

@paleolimbot force-pushed the archery-nanoarrow-integration branch from c98efcd to a0e70ec on April 3, 2024 12:33
paleolimbot added a commit to apache/arrow-nanoarrow that referenced this pull request Apr 15, 2024
These changes are the changes required such that
apache/arrow#39302 results in passing
integration tests for nanoarrow. The changes are mostly related to
comparison:

- We needed an option to allow metadata to be compared on a key/value
basis without considering order (for Java, which seems to reorder
metadata on read)
- We needed the ability to treat NULL metadata and zero-size metadata as
equivalent (for Go, which always exports zero-length metadata)
- We needed an option to ignore flags for top-level batches (for C#,
which exports nullable structs)
- We needed to ensure that the last few bits of the validity buffer were
zeroed (for C#, although this is now fixed in C# on Arrow main)
- We needed to ensure that no buffers were NULL (for C#, which leaks the
top-level array if it encounters one, at least in the integration tests;
this should really be fixed in C#).
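The last two workarounds in the list above are byte-level buffer fixes. The sketch below models them in Python only to show the logic; nanoarrow's real changes live in its C code. It assumes an offset of 0 and uses Arrow's least-significant-bit-first validity bitmap layout. `zero_trailing_validity_bits` and `ensure_non_null_buffers` are hypothetical names, and modelling a NULL buffer as `None` replaced by an empty allocation is an assumption.

```python
def zero_trailing_validity_bits(validity: bytearray, length: int) -> None:
    """Clear every bit past `length` in an LSB-ordered validity bitmap (offset 0)."""
    n_bytes = (length + 7) // 8
    remainder = length % 8
    if remainder:
        # Keep the low `remainder` bits of the last used byte; clear the rest.
        validity[n_bytes - 1] &= (1 << remainder) - 1
    # Any whole bytes past the last used byte are padding; clear them too.
    for i in range(n_bytes, len(validity)):
        validity[i] = 0


def ensure_non_null_buffers(buffers):
    # Replace NULL buffers (None here) with an empty, non-NULL allocation.
    return [bytearray() if buf is None else buf for buf in buffers]


bitmap = bytearray([0xFF, 0xFF])
zero_trailing_validity_bits(bitmap, 3)
print(bitmap.hex())  # 0700
```

A consumer that compares validity buffers byte-for-byte (as the C# verifier appeared to) would otherwise see garbage in the padding bits as a mismatch, even though the logical values agree.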
@paleolimbot force-pushed the archery-nanoarrow-integration branch 2 times, most recently from b169bed to a8ee10a on April 17, 2024 16:24
@paleolimbot marked this pull request as ready for review April 17, 2024 18:05
@paleolimbot requested a review from pitrou April 17, 2024 18:06
@paleolimbot force-pushed the archery-nanoarrow-integration branch from 4d60419 to b9adfec on May 8, 2024 17:31
github-actions bot added the awaiting change review label and removed the awaiting changes label on May 8, 2024
Contributor

@felipecrv left a comment

LGTM, but I'm not to be trusted reviewing shell script and Python.

github-actions bot added the awaiting merge label and removed the awaiting change review label on May 8, 2024
Review thread on docker-compose.yml (resolved)
github-actions bot added the awaiting changes label and removed the awaiting merge label on May 8, 2024

_NANOARROW_PATH = os.environ.get(
    "ARROW_NANOARROW_PATH",
    os.path.join(ARROW_ROOT_DEFAULT, "nanoarrow/cdata"),
)
Member

Unless nanoarrow is cloned inside the arrow repo, ARROW_ROOT_DEFAULT will never be the correct base path here, since it already points at the arrow folder itself.

Member Author

If you wanted to run this interactively, that is basically what you would have to do (I've tried it and it does work). This is the approach used by the Rust tester, which I copied here. In docker-compose.yml the path is overridden so that the build happens elsewhere.
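The resolution logic in the `_NANOARROW_PATH` snippet boils down to "environment variable if set, otherwise a path under the arrow checkout". The sketch below factors that into a function so both cases can be exercised. `resolve_nanoarrow_path` is a hypothetical name, and the `/src/arrow` and `/build/nanoarrow` paths are made-up examples, not the values used in CI or docker-compose.yml.

```python
import os
from typing import Mapping, Optional


def resolve_nanoarrow_path(arrow_root_default: str,
                           environ: Optional[Mapping[str, str]] = None) -> str:
    """Return ARROW_NANOARROW_PATH if set, else <arrow root>/nanoarrow/cdata."""
    environ = os.environ if environ is None else environ
    # The fallback only works if nanoarrow was cloned inside the arrow checkout.
    return environ.get(
        "ARROW_NANOARROW_PATH",
        os.path.join(arrow_root_default, "nanoarrow/cdata"),
    )


# Interactive use: nanoarrow cloned inside the arrow repo, no override.
print(resolve_nanoarrow_path("/src/arrow", {}))
# CI-style use: an environment override pointing at a separate build.
print(resolve_nanoarrow_path("/src/arrow", {"ARROW_NANOARROW_PATH": "/build/nanoarrow"}))
```

Passing `environ` explicitly is only for illustration; the snippet under review reads `os.environ` directly.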

github-actions bot added the awaiting change review label and removed the awaiting changes label on May 9, 2024
github-actions bot added the awaiting changes label and removed the awaiting change review label on May 10, 2024
@paleolimbot merged commit 899422e into apache:main on May 10, 2024
54 of 60 checks passed
@paleolimbot removed the awaiting changes label on May 10, 2024
@paleolimbot deleted the archery-nanoarrow-integration branch on May 10, 2024 19:54

After merging your PR, Conbench analyzed the 0 benchmarking runs that have been run so far on merge-commit 899422e.

None of the specified runs were found on the Conbench server.

The full Conbench report has more details.

@pitrou
Member

pitrou commented May 13, 2024

Thanks for this @paleolimbot !

CurtHagenlocher pushed a commit to CurtHagenlocher/arrow that referenced this pull request May 13, 2024
… integration setup (apache#39302)

### Rationale for this change

The ability to add integration testing was added in nanoarrow; however, the infrastructure for running these tests currently lives in the arrow monorepo.

### What changes are included in this PR?

- Added the relevant code to Archery such that these tests can be run
- Added the relevant scripts/environment variables to CI such that these tests run in the integration CI job

### Are these changes tested?

Yes, via the "Integration" CI job.

### Are there any user-facing changes?

No.

This PR still needs apache#41264 for the integration tests to pass.

* Closes: apache#39301
* GitHub Issue: apache#39301

Lead-authored-by: Dewey Dunnington <[email protected]>
Co-authored-by: Dewey Dunnington <[email protected]>
Signed-off-by: Dewey Dunnington <[email protected]>
vibhatha pushed a commit to vibhatha/arrow that referenced this pull request May 25, 2024
JerAguilon pushed a commit to JerAguilon/arrow that referenced this pull request May 29, 2024
Development

Successfully merging this pull request may close these issues.

[Archery][CI][Integration] Add nanoarrow integration tests to archery/CI
5 participants