PySpark StructType schema generator from PostgreSQL table schema

This Python program generates a PySpark StructType schema from a PostgreSQL table. It connects to a PostgreSQL database, reads the column definitions of the specified table, and maps each PostgreSQL data type to the corresponding PySpark data type.
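The read-and-map step described above can be sketched as follows. This is a minimal illustration and not the code in generate_schema.py: the names `PG_TO_SPARK`, `COLUMNS_QUERY`, `read_columns`, and `map_columns` are hypothetical, the type mapping is only a plausible subset, and falling back to `StringType` for unknown types is an assumption. The cursor is expected to be a DB-API cursor such as one obtained from a psycopg2 connection.

```python
# Hypothetical subset of a PostgreSQL -> PySpark type mapping. Keys are the
# values found in information_schema.columns.data_type; values are the names
# of classes in pyspark.sql.types.
PG_TO_SPARK = {
    "smallint": "ShortType",
    "integer": "IntegerType",
    "bigint": "LongType",
    "real": "FloatType",
    "double precision": "DoubleType",
    "numeric": "DecimalType",
    "boolean": "BooleanType",
    "character varying": "StringType",
    "character": "StringType",
    "text": "StringType",
    "date": "DateType",
    "timestamp without time zone": "TimestampType",
    "timestamp with time zone": "TimestampType",
    "bytea": "BinaryType",
}

# Parameterized query; %s is the placeholder style psycopg2 uses.
# Assumption: the table name is unique across schemas, so no schema filter.
COLUMNS_QUERY = """
    SELECT column_name, data_type, is_nullable
    FROM information_schema.columns
    WHERE table_name = %s
    ORDER BY ordinal_position
"""


def read_columns(cursor, table_name):
    """Return (column_name, data_type, is_nullable) rows for one table."""
    cursor.execute(COLUMNS_QUERY, (table_name,))
    return cursor.fetchall()


def map_columns(rows):
    """Turn catalog rows into (name, spark_type_name, nullable) triples."""
    return [
        # Assumption: unmapped PostgreSQL types fall back to StringType.
        (name, PG_TO_SPARK.get(pg_type, "StringType"), nullable == "YES")
        for name, pg_type, nullable in rows
    ]


# Sample rows shaped like the query result, so the mapping runs without a database.
sample_rows = [
    ("id", "integer", "NO"),
    ("name", "character varying", "YES"),
    ("age", "integer", "YES"),
]
print(map_columns(sample_rows))
```

Each resulting triple carries the same information as one `StructField`: the column name, the Spark type, and whether the column is nullable.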

Prerequisites

  • Python 3.x
  • PySpark
  • psycopg2
  • A PostgreSQL database with a table to generate the schema from

Usage

  1. Clone the repository: git clone https://github.com/username/repo.git
  2. Navigate to the directory: cd repo
  3. Edit the config.ini file to specify the PostgreSQL database connection parameters and the name of the table to generate the schema from
  4. Run the program: python generate_schema.py

Configuring the program

The program reads its settings from the config.ini file, which contains the following parameters:

  • host: the hostname or IP address of the PostgreSQL server
  • port: the port number of the PostgreSQL server
  • database: the name of the PostgreSQL database
  • user: the username to connect to the PostgreSQL database
  • password: the password to connect to the PostgreSQL database
  • table_name: the name of the table to generate the schema from
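As an illustration of the parameters listed above, the sketch below parses a sample configuration with the standard-library `configparser` module. The section name `[postgresql]`, the sample values, and the helper `load_settings` are assumptions made for this example; check the config.ini shipped with the repository for the actual layout.

```python
import configparser

# Sample file contents. The [postgresql] section name and every value here
# are placeholders, not taken from the repository.
SAMPLE_CONFIG = """
[postgresql]
host = localhost
port = 5432
database = mydb
user = myuser
password = secret
table_name = people
"""


def load_settings(text, section="postgresql"):
    """Parse INI text and return the chosen section as a plain dict."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    settings = dict(parser[section])
    # configparser returns strings, so convert the port explicitly.
    settings["port"] = int(settings["port"])
    return settings


settings = load_settings(SAMPLE_CONFIG)
print(settings["host"], settings["port"], settings["table_name"])
```

To read the file from disk instead of a string, `parser.read("config.ini")` can replace the `read_string` call.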

Example output

The program generates output similar to the following (the exact representation depends on the PySpark version):

StructType(List(StructField(id,IntegerType,true),StructField(name,StringType,true),StructField(age,IntegerType,true)))
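Each `StructField` in the output above is a triple of column name, Spark type, and nullable flag. The sketch below shows one way to read that text back into triples and render it as a DDL-style schema string, a form that PySpark's `DataFrameReader.schema` also accepts. The names `FIELD_RE`, `DDL_NAMES`, `parse_fields`, and `to_ddl` are hypothetical, and the regular expression only handles flat schemas with simple types like the ones in this example.

```python
import re

EXAMPLE = (
    "StructType(List(StructField(id,IntegerType,true),"
    "StructField(name,StringType,true),"
    "StructField(age,IntegerType,true)))"
)

# Matches one flat StructField(name,Type,flag). Nested or parameterized
# types are out of scope for this sketch.
FIELD_RE = re.compile(r"StructField\(([^,()]+),([^,()]+),(true|false)\)")

# DDL keywords for the two types that appear in the example; extend as needed.
DDL_NAMES = {"IntegerType": "INT", "StringType": "STRING"}


def parse_fields(text):
    """Extract (name, type_name, nullable) triples from the printed schema."""
    return [(name, type_name, flag == "true")
            for name, type_name, flag in FIELD_RE.findall(text)]


def to_ddl(fields):
    """Render the triples as a comma-separated 'name TYPE' string."""
    return ", ".join(f"{name} {DDL_NAMES[type_name]}"
                     for name, type_name, _ in fields)


fields = parse_fields(EXAMPLE)
print(fields)
print(to_ddl(fields))  # id INT, name STRING, age INT
```

The nullable flag is parsed but left out of the DDL string here, since every field in the example is nullable.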

Contributing

Contributions are welcome! Please submit a pull request if you'd like to contribute.

License

This program is licensed under the MIT license. See the LICENSE.md file for details.