[FFI] - RangeError: byte length of BigInt64Array should be a multiple of 8 #129
@Vectorrent I'm unable to reproduce this. With this test case:

```ts
// issue129.test.ts
import { readFileSync } from "fs";
import { readParquet, wasmMemory } from "parquet-wasm";
import { describe, it, expect } from "vitest";
import * as arrow from "apache-arrow";
import * as wasm from "rust-arrow-ffi";
import { parseTable } from "../src";

wasm.setPanicHook();

describe("issue 129", (t) => {
  const buffer = readFileSync("0320.parquet");
  const ffiTable = readParquet(buffer).intoFFI();
  const memory = wasmMemory();
  const table = parseTable(
    memory.buffer,
    ffiTable.arrayAddrs(),
    ffiTable.schemaAddr()
  );
  ffiTable.free();
  console.log(table.schema);

  it("Should pass", () => {
    expect(true).toBeTruthy();
  });
});
```

it logs this schema:

```
Schema {
  fields: [
    Field {
      name: 'content',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'url',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'timestamp',
      type: [Timestamp_ [Timestamp]],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'dump',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'segment',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'image_urls',
      type: [List],
      nullable: true,
      metadata: Map(0) {}
    }
  ],
  metadata: Map(1) {
    'huggingface' => '{"info": {"features": {"content": {"dtype": "string", "_type": "Value"}, "url": {"dtype": "string", "_type": "Value"}, "timestamp": {"dtype": "timestamp[s]", "_type": "Value"}, "dump": {"dtype": "string", "_type": "Value"}, "segment": {"dtype": "string", "_type": "Value"}, "image_urls": {"feature": {"feature": {"dtype": "string", "_type": "Value"}, "_type": "Sequence"}, "_type": "Sequence"}}}}'
  },
  dictionaries: Map(0) {},
  metadataVersion: 4
}
```
Strange. I tried your code (i.e. loading from disk), and it fails too. I upgraded to Node v22 and apache-arrow v17.0.0, with no luck. Not sure what else to try; maybe it's an engine thing? I'm running on Linux. Anyway, it's not a huge priority, since I have a workaround. Just thought it was worth reporting.
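For reference, the exact workaround isn't shown in the thread, but the next reply confirms that reading via IPC works; a minimal sketch of that IPC-based path, assuming parquet-wasm's `intoIPCStream()` and apache-arrow's `tableFromIPC`:

```ts
import { readFileSync } from "fs";
import { readParquet } from "parquet-wasm";
import { tableFromIPC } from "apache-arrow";

// Serialize the wasm-side table to an Arrow IPC stream and let
// apache-arrow parse that, instead of reading wasm memory through
// FFI pointers. This copies the data once, but sidesteps the
// BigInt64Array error entirely.
const buffer = readFileSync("0320.parquet");
const table = tableFromIPC(readParquet(buffer).intoIPCStream());
console.log(table.schema);
```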
Are you able to slice that data (i.e. take the first 5 rows) and save it as a Parquet file that also fails for you? Then we could check that data into Git and add it as a test case to this repo. It's good that reading from IPC works, but I do want to make sure that arrow-js-ffi is stable!
I sliced 5 rows with PyArrow, saved them to disk, then tried FFI again with the new file. No dice, it still fails. Here's the sliced file: https://mega.nz/file/CRsFDJrC#3lRSoohQ1kohnqzX0O0TmVtjrsfgKRgj0KMLzxf2nU8
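As a side note, the slicing above was done in PyArrow; a hypothetical JS-only equivalent is sketched below, assuming parquet-wasm's `writeParquet` and `Table.fromIPCStream` round-trip (the file names are placeholders):

```ts
import { readFileSync, writeFileSync } from "fs";
import { readParquet, writeParquet, Table } from "parquet-wasm";
import { tableFromIPC, tableToIPC } from "apache-arrow";

// Read the full file via IPC, keep only the first 5 rows, then
// serialize that slice back out as Parquet bytes.
const full = tableFromIPC(readParquet(readFileSync("0320.parquet")).intoIPCStream());
const firstFive = full.slice(0, 5);
const bytes = writeParquet(Table.fromIPCStream(tableToIPC(firstFive, "stream")));
writeFileSync("0320-sliced.parquet", bytes);
```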
Ok, cool, thanks for making that file. For reference, I find it much easier to zip a Parquet file and share it via GitHub in the issue itself.
Oops, didn't realize zip files were supported here. See attached.
I tried to load a new Parquet table, using the same method I always use, but that method failed with the following error: `RangeError: byte length of BigInt64Array should be a multiple of 8`
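For context, this RangeError comes from the JavaScript engine itself: BigInt64Array elements are 8 bytes each, so a view cannot be constructed over a byte range whose length is not a multiple of 8. In an FFI setting, that presumably points at a mis-sized int64 buffer (such as the timestamp column's data) being read out of WASM memory. A minimal demonstration of the engine behavior:

```ts
// A 12-byte buffer cannot back a view of 8-byte elements:
const misaligned = new ArrayBuffer(12);
new BigInt64Array(misaligned);
// RangeError: byte length of BigInt64Array should be a multiple of 8
```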
This error is thrown when trying to load the table with FFI, but does not happen when we use the original (IPC-based) implementation.
Since I already found a workaround, this bug isn't a huge priority for me. But I thought you guys might want to know about it.
Here is some reproducible code:
Versions: