
Hudi issues check

Repository of Python scripts for checking the reproducibility of Hudi issues using PySpark.

Setup

PyCharm is used for running the scripts. Spark runs locally with one thread.

  1. Download the corresponding Apache Spark version prebuilt with Hadoop and extract it to a folder, referred to below as SPARK_HOME.

  2. Prepare a Python virtual environment:

    • python3 -m venv [env_name]
    • source [env_name]/bin/activate
    • pip install -r requirements.txt
  3. Set SPARK_HOME, SPARK_HOST_URL, and SPARK_WAREHOUSE_PATH in the .env file. Examples are presented in .env-example. A sketch of how these settings can be consumed is shown after this list.

  4. For PyCharm, add pyspark.zip and py4j-*-src.zip from {SPARK_HOME}/python/lib/ into Project >> Project Structure >> Content root.

  5. Build the intended version of the Hudi project and copy hudi-spark*.*-bundle_2.1*-*.jar to {SPARK_HOME}/jars/.

  6. Start the Spark cluster locally: {SPARK_HOME}/sbin/start-all.sh.
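
For orientation, the .env settings from step 3 can be consumed along the following lines. This is only a sketch: the use of python-dotenv, the app name, and the Hudi-related Spark configs are assumptions, not code taken from this repository.

```python
import os

from dotenv import load_dotenv  # assumes python-dotenv is available, e.g. via requirements.txt
from pyspark.sql import SparkSession

load_dotenv()  # reads SPARK_HOST_URL and SPARK_WAREHOUSE_PATH from the .env file

spark = (
    SparkSession.builder
    .appName("hudi-issue-check")                        # illustrative name
    .master(os.environ["SPARK_HOST_URL"])               # e.g. spark://localhost:7077
    .config("spark.sql.warehouse.dir", os.environ["SPARK_WAREHOUSE_PATH"])
    .config("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    .config("spark.sql.extensions",
            "org.apache.spark.sql.hudi.HoodieSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.hudi.catalog.HoodieCatalog")
    .getOrCreate()
)
```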

Checking

Run the script that corresponds to an issue from the configured PyCharm.
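
A typical reproduction follows a pattern like the one below: write a small DataFrame as a Hudi table and read it back to observe the behavior described in an issue. This is a generic sketch, not one of the repository's scripts; the table name, path, and field names are illustrative, and spark is the session built in the setup sketch above.

```python
import os

from pyspark.sql import Row

table_name = "issue_repro"                                         # illustrative name
table_path = f"{os.environ['SPARK_WAREHOUSE_PATH']}/{table_name}"  # illustrative path

df = spark.createDataFrame([
    Row(id=1, name="a", ts=100),
    Row(id=2, name="b", ts=200),
])

hudi_options = {
    "hoodie.table.name": table_name,
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.precombine.field": "ts",
    "hoodie.datasource.write.operation": "upsert",
}

# Write the data as a Hudi table, then read it back to check the reported behavior.
df.write.format("hudi").options(**hudi_options).mode("overwrite").save(table_path)

spark.read.format("hudi").load(table_path).show()
```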

To switch versions:

  1. Change SPARK_HOME in the .env file.
  2. For PyCharm, add the related pyspark.zip and py4j-*-src.zip to the Content root.
  3. Build the new version of the Hudi project and copy the new hudi-spark*.*-bundle_2.1*-*.jar to the corresponding Spark home directory.
  4. Stop the previously started Spark cluster and start the proper version of Spark.
