[FFI] - RangeError: byte length of BigInt64Array should be a multiple of 8 #129

Open · Vectorrent opened this issue Sep 11, 2024 · 6 comments

@Vectorrent
I tried to load a new Parquet table using the same method I always use, but it failed with the following error:

(venv) [crow@crow-pc ode]$ node misc/parquetFailing.js 
file:///home/crow/repos/ode/node_modules/arrow-js-ffi/dist/arrow-js-ffi.es.mjs:300
            ? new dataType.ArrayType(copyBuffer(dataView.buffer, dataPtr, length * byteWidth))
              ^

RangeError: byte length of BigInt64Array should be a multiple of 8
    at new BigInt64Array (<anonymous>)
    at parseDataContent (file:///home/crow/repos/ode/node_modules/arrow-js-ffi/dist/arrow-js-ffi.es.mjs:300:15)
    at parseData (file:///home/crow/repos/ode/node_modules/arrow-js-ffi/dist/arrow-js-ffi.es.mjs:175:16)
    at parseData (file:///home/crow/repos/ode/node_modules/arrow-js-ffi/dist/arrow-js-ffi.es.mjs:139:23)
    at parseTable (file:///home/crow/repos/ode/node_modules/arrow-js-ffi/dist/arrow-js-ffi.es.mjs:935:28)
    at file:///home/crow/repos/ode/misc/parquetFailing.js:25:19
    at process.processTicksAndRejections (node:internal/process/task_queues:95:5)

Node.js v18.20.4

This error is thrown when loading the table with FFI, but it does not happen when the same table is loaded through the IPC stream (the original, non-FFI path).
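
For context, the RangeError itself is trivial to reproduce in isolation: a BigInt64Array can only be constructed over a byte range whose length is a multiple of 8, which is presumably the constraint being violated inside copyBuffer when the length * byteWidth math in the stack trace goes wrong. A minimal sketch (illustration only, not arrow-js-ffi's actual code):

// Illustration only: BigInt64Array requires its backing byte length
// to be a multiple of 8
const backing = new ArrayBuffer(20)
new BigInt64Array(backing.slice(0, 16)) // ok: 16 bytes -> 2 elements
new BigInt64Array(backing.slice(0, 20)) // RangeError: byte length of BigInt64Array should be a multiple of 8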

Since I already found a workaround, this bug isn't a huge priority for me. But I thought you guys might want to know about it.

Here is some reproducible code:

import * as arrow from 'apache-arrow'
import { parseTable } from 'arrow-js-ffi'
import { wasmMemory, readParquet } from 'parquet-wasm'

const url =
    'https://huggingface.co/api/datasets/tiiuae/falcon-refinedweb/parquet/default/train/320.parquet'

// This one will succeed
;(async () => {
    const resp = await fetch(url)
    const buffer = new Uint8Array(await resp.arrayBuffer())
    const arrowWasmTable = readParquet(buffer)
    const table = arrow.tableFromIPC(arrowWasmTable.intoIPCStream())
    // note: intoIPCStream() consumes the Wasm-side table, and the plain
    // apache-arrow Table returned here has no free() method

    console.log('successfully loaded table via parquet-wasm')
})()

// This one will fail
;(async () => {
    const resp = await fetch(url)
    const buffer = new Uint8Array(await resp.arrayBuffer())
    const ffiTable = readParquet(buffer).intoFFI()

    const table = parseTable(
        wasmMemory().buffer,
        ffiTable.arrayAddrs(),
        ffiTable.schemaAddr()
    )
    // free the Wasm-side FFI table; the parsed apache-arrow Table has no free()
    ffiTable.free()

    console.log('successfully loaded table via FFI')
})()

Versions:

  • parquet-wasm v0.6.1
  • arrow-js-ffi v0.4.2
  • node v18.20.4
kylebarron transferred this issue from kylebarron/parquet-wasm on Sep 11, 2024
@kylebarron (Owner) commented Sep 11, 2024

@Vectorrent I'm unable to reproduce this:

  • Node v20.9.0
  • arrow-js-ffi latest main (effectively the same as the latest release)
  • parquet-wasm 0.6.1

With this test case:

// issue129.test.ts
import { readFileSync } from "fs";
import { readParquet, wasmMemory } from "parquet-wasm";
import { describe, it, expect } from "vitest";
import * as arrow from "apache-arrow";
import * as wasm from "rust-arrow-ffi";
import { parseTable } from "../src";

wasm.setPanicHook();

describe("issue 129", (t) => {
  const buffer = readFileSync("0320.parquet");

  const ffiTable = readParquet(buffer).intoFFI();
  const memory = wasmMemory();

  const table = parseTable(
    memory.buffer,
    ffiTable.arrayAddrs(),
    ffiTable.schemaAddr()
  );
  ffiTable.free();

  console.log(table.schema);

  it("Should pass", () => {
    expect(true).toBeTruthy();
  });
});
This logs the following schema:

Schema {
  fields: [
    Field {
      name: 'content',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'url',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'timestamp',
      type: [Timestamp_ [Timestamp]],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'dump',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'segment',
      type: [Utf8],
      nullable: true,
      metadata: Map(0) {}
    },
    Field {
      name: 'image_urls',
      type: [List],
      nullable: true,
      metadata: Map(0) {}
    }
  ],
  metadata: Map(1) {
    'huggingface' => '{"info": {"features": {"content": {"dtype": "string", "_type": "Value"}, "url": {"dtype": "string", "_type": "Value"}, "timestamp": {"dtype": "timestamp[s]", "_type": "Value"}, "dump": {"dtype": "string", "_type": "Value"}, "segment": {"dtype": "string", "_type": "Value"}, "image_urls": {"feature": {"feature": {"dtype": "string", "_type": "Value"}, "_type": "Sequence"}, "_type": "Sequence"}}}}'
  },
  dictionaries: Map(0) {},
  metadataVersion: 4
}

@Vectorrent (Author)

Strange. I tried your code (i.e. loading from disk), and that fails too. I upgraded to Node v22 and apache-arrow v17.0.0, with no luck. Not sure what else to try; maybe it's an engine thing? I'm running on Linux.
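
For what it's worth, the only 64-bit column in the schema above is timestamp (timestamp[s] is backed by int64), so that is presumably the buffer that ends up in BigInt64Array. A quick sanity check of that column over the working IPC path (a hypothetical sketch; the column name 'timestamp' is taken from the schema printed above):

// Hypothetical check: read via the IPC path, which works, and inspect the
// int64-backed 'timestamp' column, the likely source of the BigInt64Array
const table = arrow.tableFromIPC(readParquet(buffer).intoIPCStream())
console.log(table.getChild('timestamp')?.get(0))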

Anyway, not a huge priority, since I do have a workaround. Just thought it was worth reporting.

@kylebarron (Owner)

Are you able to slice that data (i.e. take the first 5 rows) and save it as a Parquet file that also fails for you? Then we could check that data into Git and add it as a test case to this repo.

It's good that reading from IPC works, but I do want to make sure that arrow-js-ffi is stable!
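
If it helps, something along these lines might produce the slice directly in JS (an untested sketch; it assumes parquet-wasm 0.6 exposes writeParquet and Table.fromIPCStream, alongside apache-arrow's tableToIPC):

// Untested sketch: take the first 5 rows and write them back to Parquet.
// Assumes parquet-wasm 0.6's writeParquet and Table.fromIPCStream APIs.
import { readFileSync, writeFileSync } from 'fs'
import * as arrow from 'apache-arrow'
import { readParquet, writeParquet, Table } from 'parquet-wasm'

const buffer = readFileSync('0320.parquet')
const table = arrow.tableFromIPC(readParquet(buffer).intoIPCStream())
const head = table.slice(0, 5) // first 5 rows
const parquetBytes = writeParquet(Table.fromIPCStream(arrow.tableToIPC(head, 'stream')))
writeFileSync('0320.sliced.parquet', parquetBytes)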

@Vectorrent (Author)

I sliced 5 rows with PyArrow, saved them to disk, then tried FFI again with the new file. No dice, it still fails.

Here's the sliced file: https://mega.nz/file/CRsFDJrC#3lRSoohQ1kohnqzX0O0TmVtjrsfgKRgj0KMLzxf2nU8

@kylebarron (Owner)

Ok, cool, thanks for making that file.

For reference, I find it much easier to zip a Parquet file and attach it to the issue itself on GitHub.

@Vectorrent (Author)

Oops, didn't realize zip files were supported here. See attached: 0320.output.parquet.zip
