change toc and structure of book
Hagellach37 committed Aug 26, 2024
1 parent 4abb049 commit 62b03b5
Showing 15 changed files with 790 additions and 1,308 deletions.
2 changes: 1 addition & 1 deletion _config.yml
@@ -5,7 +5,7 @@ title: SOTM 2024 ohsome-data-insights Workshop
author: B.Herfort (HeiGIT)
logo: HeiGIT_Logo_compact.svg

# Force re-execution of notebooks on each build.
# Force re-execution of book on each build.
# See https://jupyterbook.org/content/execute.html
execute:
#execute_notebooks: auto
22 changes: 13 additions & 9 deletions _toc.yml
@@ -6,20 +6,24 @@ root: intro
parts:
- caption: Getting Started
chapters:
- file: notebooks/00_Getting_Started.ipynb
- file: book/00_motivation.md
- file: book/00_data_structure.md
- file: book/00_partitioning_and_sorting.md
- file: book/00_MinIO_Object_Store.ipynb
- file: book/00_Iceberg_Catalog.ipynb
- caption: Data Extraction
chapters:
- file: notebooks/01a_Data_Extraction_DuckDB_PyIceberg.ipynb
- file: notebooks/01b_Data_Extraction_DuckDB_only.ipynb
- file: book/01a_Data_Extraction_DuckDB_PyIceberg.ipynb
- file: book/01b_Data_Extraction_DuckDB_only.ipynb
- caption: Simple Data Analysis
chapters:
- file: notebooks/02a_buildings_currentness_DuckDB_PyIceberg.ipynb
- file: notebooks/02b_buildings_currentness_DuckDB_only.ipynb
- file: notebooks/attribute_completeness.ipynb
- file: book/02a_buildings_currentness_DuckDB_PyIceberg.ipynb
- file: book/02b_buildings_currentness_DuckDB_only.ipynb
- file: book/02_attribute_completeness.ipynb
- caption: Data Integration
chapters:
- file: notebooks/05_mapillary_data_analysis.ipynb
- file: book/03_mapillary_data_analysis.ipynb
- caption: Advanced Data Analysis
chapters:
- file: notebooks/04b_Country_User_Activity_DuckDB_only.ipynb
- file: notebooks/03b_highways_timeline_DuckDB_only.ipynb
- file: book/04_Country_User_Activity_DuckDB_only.ipynb
- file: book/04_highways_timeline_DuckDB_only.ipynb
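Since this hunk moves every chapter from `notebooks/` to `book/` and renumbers several files, a quick check that each `file:` entry still resolves may be useful before building. The following is a minimal sketch, not part of the commit: the helper names `toc_files` and `missing_files` are hypothetical, and it assumes each entry includes its file extension, as the entries in this `_toc.yml` do.

```python
import re
from pathlib import Path

# Matches lines such as "  - file: book/00_motivation.md" (extension assumed present).
FILE_ENTRY = re.compile(r"^\s*-\s*file:\s*(\S+)\s*$", re.MULTILINE)


def toc_files(toc_text: str) -> list[str]:
    """Return every path listed under a `- file:` entry, in document order."""
    return FILE_ENTRY.findall(toc_text)


def missing_files(toc_text: str, root: Path) -> list[str]:
    """Return the listed paths that do not exist as files below `root`."""
    return [path for path in toc_files(toc_text) if not (root / path).is_file()]


# Small excerpt in the same shape as the new _toc.yml.
sample_toc = """\
parts:
- caption: Data Integration
  chapters:
  - file: book/03_mapillary_data_analysis.ipynb
"""

if __name__ == "__main__":
    print(toc_files(sample_toc))
```

Run from the repository root, `missing_files(Path("_toc.yml").read_text(), Path("."))` would list any entries left dangling by a rename.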
41 changes: 41 additions & 0 deletions book/00_Iceberg_Catalog.ipynb
@@ -0,0 +1,41 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "0aabe61e-c6fe-49e5-babf-3108f7aef591",
"metadata": {},
"source": [
"# PyIceberg: Connect to Iceberg Catalog"
]
},
{
"cell_type": "code",
"execution_count": null,
"id": "b79af493-a3cd-4e08-9b3b-4a5fa7e19329",
"metadata": {},
"outputs": [],
"source": []
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3 (ipykernel)",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.12.3"
}
},
"nbformat": 4,
"nbformat_minor": 5
}
Original file line number Diff line number Diff line change
@@ -1,5 +1,13 @@
{
"cells": [
{
"cell_type": "markdown",
"id": "775ea76a-be8d-4abd-b13c-5fb96d54e8da",
"metadata": {},
"source": [
"# DuckDB: Connect to MinIO Object Store"
]
},
{
"cell_type": "code",
"execution_count": 1,
@@ -35,9 +43,7 @@
"cell_type": "markdown",
"id": "e1589972-7ef6-4a21-9bca-9b5f5303faa3",
"metadata": {},
"source": [
"## Connect to MinIO Object Storage"
]
"source": []
},
{
"cell_type": "code",
11 changes: 11 additions & 0 deletions book/00_data_structure.md
@@ -0,0 +1,11 @@
# Data Structure

## General OSM Attributes


## OSM Changeset Attributes


## Geographic Attributes


2 changes: 2 additions & 0 deletions book/00_motivation.md
@@ -0,0 +1,2 @@
# Why you should be excited about this workshop

17 changes: 17 additions & 0 deletions book/00_partitioning_and_sorting.md
@@ -0,0 +1,17 @@
# Partitioning and Sorting

## Geo-sorted ohsome contributions

### Partitions


### Sorting



## Time-sorted ohsome contributions

### Partitions


### Sorting
Original file line number Diff line number Diff line change
@@ -29,7 +29,7 @@
},
{
"cell_type": "code",
"execution_count": 1,
"execution_count": 2,
"id": "554c5bc9-8962-44ed-96f8-fd34b3efe564",
"metadata": {},
"outputs": [],
@@ -50,7 +50,7 @@
},
{
"cell_type": "code",
"execution_count": 2,
"execution_count": 3,
"id": "f2e24802-8629-4648-afe5-d4c5f21403df",
"metadata": {},
"outputs": [],
@@ -78,7 +78,17 @@
},
{
"cell_type": "code",
"execution_count": 13,
"execution_count": null,
"id": "5016e60e-6286-40c0-a530-0ca9c0c9229f",
"metadata": {},
"outputs": [],
"source": [
"!pip install \"pyiceberg[s3fs,duckdb,sql-sqlite,pyarrow]\""
]
},
{
"cell_type": "code",
"execution_count": 5,
"id": "825cd3dc-18d1-48fb-ad47-2945bb3a8c53",
"metadata": {},
"outputs": [],
@@ -113,15 +123,14 @@
},
{
"cell_type": "code",
"execution_count": 14,
"execution_count": 8,
"id": "872da17b-f490-4a1a-8aec-09ad847db0ea",
"metadata": {},
"outputs": [],
"source": [
"# Set iceberg table\n",
"namespace = 'geo_sort'\n",
"tablename = 'contributions_germany'\n",
"#tablename = 'contributions'\n",
"tablename = 'contributions'\n",
"icebergtable = catalog.load_table((namespace, tablename))\n",
"\n",
"# Define status filter\n",
@@ -135,9 +144,10 @@
" 'berlin': (13.088345, 52.338271, 13.761161, 52.675509)\n",
"}\n",
"\n",
"selected_region = 'heidelberg'\n",
"selected_region = 'nairobi'\n",
"xmin, ymin, xmax, ymax = bboxes[selected_region]\n",
"area_of_interest_file =f\"../data/{selected_region}.geojson\"\n",
"area_of_interest_file = f\"https://raw.githubusercontent.com/GIScience/sotm-2024-ohsome-data-insights-workshop/main/data/{selected_region}.geojson\"\n",
"\n",
"# Define geometry type filter\n",
"geometry_type = 'Polygon'\n",
@@ -178,15 +188,15 @@
},
{
"cell_type": "code",
"execution_count": 16,
"execution_count": 7,
"id": "5ab182ee-d069-46cb-a88d-e1a76f7ab1f6",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"download took 9.305 sec.\n"
"download took 198.388 sec.\n"
]
}
],
@@ -198,9 +208,8 @@
" row_filter=(\n",
" f\"status = '{status}' \"\n",
" f\"and geometry_type = '{geometry_type}' \"\n",
" # ToDO: add bbox query here once available in iceberg table\n",
" #f\"and (bbox.xmax >= {xmin} and bbox.xmin <= {xmax}) \"\n",
" #f\"and (bbox.ymax >= {ymin} and bbox.ymin <= {ymax}) \"\n",
" f\"and (xmax >= {xmin} and xmin <= {xmax}) \"\n",
" f\"and (ymax >= {ymin} and ymin <= {ymax}) \"\n",
" ),\n",
" selected_fields=(\n",
" \"user_id\",\n",
@@ -209,10 +218,7 @@
" \"valid_from\",\n",
" \"tags\",\n",
" \"geometry\",\n",
" \"bbox\"\n",
" ),\n",
" # ToDO: add bbox query here once available in iceberg table\n",
" limit=1_000_000\n",
").to_duckdb('raw_osm_data',connection=con)\n",
"\n",
"download_time = round(time.time() - start_time, 3)\n",
@@ -232,15 +238,19 @@
},
{
"cell_type": "code",
"execution_count": 21,
"execution_count": 9,
"id": "d2125907-3028-4e76-ae26-39ac7adf0f94",
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"processing took 0.016 sec.\n"
"ename": "IOException",
"evalue": "IO Error: GDAL Error (4): Failed to open file https://raw.githubusercontent.com/GIScience/sotm-2024-ohsome-data-insights-workshop/main/data/nairobi.geojson: {\"exception_type\":\"IO\",\"exception_message\":\"Cannot open file \\\"https://raw.githubusercontent.com/GIScience/sotm-2024-ohsome-data-insights-workshop/main/data/nairobi.geojson\\\": No such file or directory\",\"errno\":\"2\"}",
"output_type": "error",
"traceback": [
"\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
"\u001b[0;31mIOException\u001b[0m Traceback (most recent call last)",
"Cell \u001b[0;32mIn[9], line 20\u001b[0m\n\u001b[1;32m 2\u001b[0m start_time \u001b[38;5;241m=\u001b[39m time\u001b[38;5;241m.\u001b[39mtime()\n\u001b[1;32m 4\u001b[0m query \u001b[38;5;241m=\u001b[39m \u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\"\"\u001b[39m\n\u001b[1;32m 5\u001b[0m \u001b[38;5;124mDROP TABLE IF EXISTS osm_data;\u001b[39m\n\u001b[1;32m 6\u001b[0m \u001b[38;5;124mCREATE TABLE osm_data AS\u001b[39m\n\u001b[0;32m (...)\u001b[0m\n\u001b[1;32m 18\u001b[0m \u001b[38;5;124m;\u001b[39m\n\u001b[1;32m 19\u001b[0m \u001b[38;5;124m\"\"\"\u001b[39m\n\u001b[0;32m---> 20\u001b[0m \u001b[43mcon\u001b[49m\u001b[38;5;241;43m.\u001b[39;49m\u001b[43msql\u001b[49m\u001b[43m(\u001b[49m\u001b[43mquery\u001b[49m\u001b[43m)\u001b[49m\n\u001b[1;32m 22\u001b[0m processing_time \u001b[38;5;241m=\u001b[39m \u001b[38;5;28mround\u001b[39m(time\u001b[38;5;241m.\u001b[39mtime() \u001b[38;5;241m-\u001b[39m start_time, \u001b[38;5;241m3\u001b[39m)\n\u001b[1;32m 23\u001b[0m \u001b[38;5;28mprint\u001b[39m(\u001b[38;5;124mf\u001b[39m\u001b[38;5;124m\"\u001b[39m\u001b[38;5;124mprocessing took \u001b[39m\u001b[38;5;132;01m{\u001b[39;00mprocessing_time\u001b[38;5;132;01m}\u001b[39;00m\u001b[38;5;124m sec.\u001b[39m\u001b[38;5;124m\"\u001b[39m)\n",
"\u001b[0;31mIOException\u001b[0m: IO Error: GDAL Error (4): Failed to open file https://raw.githubusercontent.com/GIScience/sotm-2024-ohsome-data-insights-workshop/main/data/nairobi.geojson: {\"exception_type\":\"IO\",\"exception_message\":\"Cannot open file \\\"https://raw.githubusercontent.com/GIScience/sotm-2024-ohsome-data-insights-workshop/main/data/nairobi.geojson\\\": No such file or directory\",\"errno\":\"2\"}"
]
}
],
@@ -260,8 +270,6 @@
" and tags['building'][1] is not null\n",
" and tags['building'][1] != 'no'\n",
" -- spatial filtering part\n",
" and (a.bbox.xmax >= {xmin} AND a.bbox.xmin <= {xmax})\n",
" and (a.bbox.ymax >= {ymin} AND a.bbox.ymin <= {ymax})\n",
" and ST_Intersects(st_GeomFromText(a.geometry), aoi.geom)\n",
")\n",
";\n",
Expand Down
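The hunks above move the bounding-box test out of the DuckDB query and into the PyIceberg `row_filter`, so candidate rows are narrowed before download. The sketch below restates that overlap test in plain Python to make the logic explicit. It is illustrative only: `BBox`, `bbox_overlaps` and `bbox_row_filter` are hypothetical names, the two sample footprints are made up, and it assumes the table exposes flat `xmin`, `ymin`, `xmax`, `ymax` columns as the new filter strings suggest.

```python
from typing import NamedTuple


class BBox(NamedTuple):
    xmin: float
    ymin: float
    xmax: float
    ymax: float


def bbox_overlaps(feature: BBox, query: BBox) -> bool:
    """Two boxes overlap when their ranges overlap on both the x and y axis."""
    return (
        feature.xmax >= query.xmin and feature.xmin <= query.xmax
        and feature.ymax >= query.ymin and feature.ymin <= query.ymax
    )


def bbox_row_filter(query: BBox) -> str:
    """Build the same filter fragment as the notebook (flat bbox columns assumed)."""
    return (
        f"(xmax >= {query.xmin} and xmin <= {query.xmax}) "
        f"and (ymax >= {query.ymin} and ymin <= {query.ymax})"
    )


# Berlin bounding box as listed in the notebook's `bboxes` dict.
berlin = BBox(13.088345, 52.338271, 13.761161, 52.675509)
# Hypothetical feature footprints, for illustration only.
inside = BBox(13.40, 52.50, 13.41, 52.51)
outside = BBox(8.67, 49.40, 8.68, 49.41)

if __name__ == "__main__":
    print(bbox_overlaps(inside, berlin), bbox_overlaps(outside, berlin))
    print(bbox_row_filter(berlin))
```

The bounding-box test is only a coarse prefilter; the `ST_Intersects` call kept in the DuckDB query still performs the exact geometric check against the area of interest.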
Original file line number Diff line number Diff line change
@@ -126,14 +126,14 @@
},
{
"cell_type": "code",
"execution_count": 4,
"execution_count": 8,
"id": "872da17b-f490-4a1a-8aec-09ad847db0ea",
"metadata": {},
"outputs": [],
"source": [
"# Set s3 path for parquet input data\n",
"parquet_data_path = \"s3a://heigit-ohsome-sotm24/data/geo_sort_ext/contributions_germany/**\"\n",
"#parquet_data_path = \"s3a://heigit-ohsome-sotm24/data/geo_sort_ext/contributions/**\"\n",
"#parquet_data_path = \"s3a://heigit-ohsome-sotm24/data/geo_sort_ext/contributions_germany/**\"\n",
"parquet_data_path = \"s3a://heigit-ohsome-sotm24/data/geo_sort_ext/contributions/**\"\n",
"\n",
"# Define status filter\n",
"status = 'latest'\n",
@@ -146,7 +146,7 @@
" 'berlin': (13.088345, 52.338271, 13.761161, 52.675509)\n",
"}\n",
"\n",
"selected_region = 'heidelberg'\n",
"selected_region = 'nairobi'\n",
"xmin, ymin, xmax, ymax = bboxes[selected_region]\n",
"area_of_interest_file =f\"../data/{selected_region}.geojson\"\n",
"\n",
@@ -189,14 +189,14 @@
},
{
"cell_type": "code",
"execution_count": 5,
"execution_count": 9,
"id": "d2125907-3028-4e76-ae26-39ac7adf0f94",
"metadata": {},
"outputs": [
{
"data": {
"application/vnd.jupyter.widget-view+json": {
"model_id": "e6953f8db31349a98f3de95b19c9362e",
"model_id": "6f78b1ecc1374eabaf3e9128fc96d1ba",
"version_major": 2,
"version_minor": 0
},
@@ -211,7 +211,7 @@
"name": "stdout",
"output_type": "stream",
"text": [
"download took 6.382 sec.\n"
"download took 34.878 sec.\n"
]
}
],