failed to exctract row groups from Parquet file #33

Open
guillermovil opened this issue Nov 5, 2024 · 0 comments

Hello!
Here are the steps I followed to install this extension in a Docker container running Ubuntu and PostgreSQL 14:

docker run -d --name postgres-container -e TZ=UTC -p 30432:5432 -e POSTGRES_PASSWORD=postgres -v ./data:/var/lib/postgresql/data ubuntu/postgres:14-22.04_beta
docker exec -it postgres-container /bin/bash
apt-get update
apt-get install git
apt-get install build-essential ninja-build cmake
apt-get install libcurl4-openssl-dev libssl-dev uuid-dev zlib1g-dev libpulse-dev

git clone https://github.com/aws/aws-sdk-cpp.git
cd aws-sdk-cpp/
git checkout 1.9.263
git submodule update --init --recursive
mkdir build
cd build
cmake -DBUILD_ONLY="s3;core;config;sts;cognito-identity;transfer;identity-management" -DCMAKE_CXX_FLAGS=-Wno-error=deprecated-declarations ..
make 
sudo make install


git clone https://github.com/apache/arrow.git
cd arrow
git checkout apache-arrow-7.0.1
cd cpp
mkdir build
cd build
cmake -DARROW_PARQUET=ON -DARROW_S3=ON -DARROW_WITH_SNAPPY=ON ..
make
sudo make install

git clone https://github.com/pgspider/parquet_s3_fdw.git
cd parquet_s3_fdw
git checkout v1.0.0
sudo apt-get install postgresql-server-dev-14
make USE_PGXS=1 install

apt-get update && apt-get install -y python3-pip
pip install awscli
aws configure

(here I configured the credentials for my bucket)

psql

DROP DATABASE testparquet;
CREATE DATABASE testparquet;
\c testparquet
CREATE EXTENSION parquet_s3_fdw;
CREATE SERVER parquet_s3_srv FOREIGN DATA WRAPPER parquet_s3_fdw OPTIONS (region 'eu-west-3' , endpoint 'https://mybucket.s3.eu-west-3.amazonaws.com');
CREATE USER MAPPING FOR postgres SERVER parquet_s3_srv OPTIONS (user '***', password '***');
CREATE FOREIGN TABLE users (
    id           int options (key 'true'),
    first_name   text,
    last_name    text
)
SERVER parquet_s3_srv
OPTIONS (
    dirname 's3://mybucketname/'
);
select * from users;
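The foreign table above uses `dirname 's3://mybucketname/'`, i.e. the whole bucket prefix. If the extension reads every object under that prefix (not confirmed here), any non-Parquet object there could trigger the error. A minimal Python sketch of one workaround, assuming the `filename` option of parquet_s3_fdw accepts a space-separated list of `s3://` paths (check this against the version you built): filter an object listing down to `.parquet` keys and render the table with `filename` instead. The helpers `parquet_filename_option` and `foreign_table_sql` and the sample keys are hypothetical.

```python
def parquet_filename_option(bucket: str, keys: list[str]) -> str:
    """Join the .parquet keys of `bucket` into one space-separated string.

    Hypothetical helper: it filters by file extension only and never
    inspects object contents.
    """
    paths = [
        f"s3://{bucket}/{key}"
        for key in sorted(keys)
        if key.lower().endswith(".parquet")
    ]
    return " ".join(paths)


def foreign_table_sql(table: str, server: str, bucket: str, keys: list[str]) -> str:
    """Render the report's users table with `filename` instead of `dirname`."""
    # Double any single quotes so the value stays a valid SQL string literal.
    files = parquet_filename_option(bucket, keys).replace("'", "''")
    return (
        f"CREATE FOREIGN TABLE {table} (\n"
        "    id           int options (key 'true'),\n"
        "    first_name   text,\n"
        "    last_name    text\n"
        ")\n"
        f"SERVER {server}\n"
        "OPTIONS (\n"
        f"    filename '{files}'\n"
        ");"
    )


if __name__ == "__main__":
    # Example keys; in practice they would come from something like `aws s3 ls`.
    sample_keys = ["users/part-0.parquet", "users/_SUCCESS", "notes.txt"]
    print(foreign_table_sql("users", "parquet_s3_srv", "mybucketname", sample_keys))
```

Because the option value is space-separated, keys that themselves contain spaces would not survive this approach; the sketch does not try to handle them.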

After all that, running the SELECT fails with:

ERROR: parquet_s3_fdw: failed to exctract row groups from Parquet file: failed to open Parquet file Parquet magic bytes not found in footer. Either the file is corrupted or this is not a parquet file.
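The message says the last four bytes of what was read were not the Parquet magic. The Parquet format frames every file with the 4-byte magic `PAR1` at the start and, at the end, a 4-byte little-endian footer length followed by `PAR1` again. A small Python sketch of that framing check, for testing objects downloaded from the bucket by hand; the function names and the script name `check_parquet.py` are hypothetical, and passing the check does not prove the file is fully readable. Plausible but unconfirmed causes of a failure are an object under `s3://mybucketname/` that is not Parquet, or a response body that is not the intended object at all (the server `endpoint` above contains a bucket host name, which may be worth double-checking).

```python
import os
import struct
import sys

PARQUET_MAGIC = b"PAR1"


def check_parquet_envelope(head: bytes, tail: bytes, size: int) -> str:
    """Classify a file from its first 4 bytes, last 8 bytes and total size.

    Per the Parquet format, a file starts with "PAR1" and ends with a
    4-byte little-endian footer length followed by "PAR1". The Thrift
    footer itself is not parsed here.
    """
    if size < 12:
        return "too short to be a Parquet file"
    if tail[4:] != PARQUET_MAGIC:
        return "magic bytes not found in footer"
    if head != PARQUET_MAGIC:
        return "magic bytes not found in header"
    (footer_len,) = struct.unpack("<I", tail[:4])
    # Leading magic (4) + footer length field (4) + trailing magic (4) = 12.
    if footer_len > size - 12:
        return "footer length larger than file"
    return "ok"


def check_bytes(data: bytes) -> str:
    """Same check on an in-memory buffer."""
    return check_parquet_envelope(data[:4], data[-8:], len(data))


def check_file(path: str) -> str:
    """Same check on a local file, reading only its head and tail."""
    size = os.path.getsize(path)
    with open(path, "rb") as f:
        head = f.read(4)
        tail = b""
        if size >= 8:
            f.seek(-8, os.SEEK_END)
            tail = f.read(8)
    return check_parquet_envelope(head, tail, size)


if __name__ == "__main__":
    for p in sys.argv[1:]:
        print(f"{p}: {check_file(p)}")
```

For example, after `aws s3 cp s3://mybucketname/<key> ./sample.parquet`, running `python3 check_parquet.py ./sample.parquet` prints `./sample.parquet: ok` for a correctly framed file, or the reason it fails otherwise.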
