Data loading utilities for Redshift
Input: CSV file(s), S3 bucket name, table name, Redshift server, database, credentials
Steps:
- generate create table statement (with csvsql): head -n 500000 filename | csvsql -i postgresql -e iso-8859-1
- change encoding to UTF-8 (with iconv): cat $f | iconv -c -f ISO-8859-1 -t utf8 > ${f}_utf8.txt
- optional: split csv into several parts to load faster (with split)
- copy to S3 (with s3cmd): s3cmd put ${f}_utf8.txt s3://bucketname/
- connect to Redshift instance (with psql)
- run create table statement
- run copy statement
Sketches for each step are given in the Code section below.
Code
- convert to UTF-8 and copy to S3 (optionally, files can be split into smaller files and loaded in parallel; see the split sketch below)
Shell script:
# process each CSV passed as a command-line argument
for f in "$@"
do
    # -c drops characters that cannot be converted, then re-encode to UTF-8
    cat "$f" | iconv -c -f ISO-8859-1 -t utf8 > "${f}_utf8.txt"
    echo "processed $f to ${f}_utf8.txt"
    # upload the converted file to the target bucket
    s3cmd put "${f}_utf8.txt" s3://bucketname/
done
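- generate the create table statement (a sketch: the --tables flag and the output file create_table.sql are illustrative choices, not part of the original commands; csvsql infers column types from the sampled rows)
Shell script:
# sample the first 500000 rows so csvsql does not scan the whole file;
# create_table.sql is a placeholder name for the generated DDL
head -n 500000 filename | csvsql -i postgresql -e iso-8859-1 --tables tablename > create_table.sql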
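- optionally split a converted file before uploading (a sketch: the 1000000-line chunk size and the _part_ suffix are arbitrary, and filename_utf8.txt stands for a file produced by the conversion script above)
Shell script:
# drop the header line so no chunk carries one, then write chunks
# filename_utf8_part_aa, filename_utf8_part_ab, ...
tail -n +2 filename_utf8.txt | split -l 1000000 - filename_utf8_part_
# upload all parts; they share a common S3 prefix for the copy step
s3cmd put filename_utf8_part_* s3://bucketname/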
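- create the table and run the copy (a sketch: host, user, database, and the AWS credentials are placeholders; 5439 is Redshift's default port)
Shell script:
# run the generated DDL against the cluster
psql -h mycluster.example.redshift.amazonaws.com -p 5439 -U username -d db -f create_table.sql
# load the converted file into the table, skipping the CSV header row
psql -h mycluster.example.redshift.amazonaws.com -p 5439 -U username -d db \
  -c "COPY tablename FROM 's3://bucketname/filename_utf8.txt' CREDENTIALS 'aws_access_key_id=<key>;aws_secret_access_key=<secret>' DELIMITER ',' IGNOREHEADER 1;"
For split parts, point FROM at s3://bucketname/filename_utf8_part_ and drop IGNOREHEADER (the header was removed before splitting); Redshift loads all objects matching the prefix in parallel.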