<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-88382509-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Introduction to Geopandas — GeoPython - AutoGIS 1 documentation</title>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="index" title="Index"
href="genindex.html"/>
<link rel="search" title="Search" href="search.html"/>
<link rel="top" title="GeoPython - AutoGIS 1 documentation" href="index.html"/>
<link rel="next" title="Exercise 2" href="Exercise-2.html"/>
<link rel="prev" title="Introduction to Pandas" href="Lesson2-pandas-intro.html"/>
<script src="_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="index.html" class="icon icon-home"> GeoPython - AutoGIS
<img src="_static/logo_hy_geo_135.png" class="logo" />
</a>
<div class="version">
2016 Autumn
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Course information</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="course-info.html">General info</a></li>
<li class="toctree-l1"><a class="reference internal" href="Installing_Anacondas_GIS.html">Installing Python + GIS</a></li>
<li class="toctree-l1"><a class="reference internal" href="License-terms.html">License and terms of usage</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 1</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson1-Intro-Python-GIS.html">Introduction to Python GIS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson1-Geometric-Objects.html">Geometric Objects - Spatial Data Model</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-1.html">Exercise 1</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 2</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="Lesson2-overview.html">Lesson 2 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-download-data.html">Download datasets</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-overview-pandas-geopandas.html">Pandas and Geopandas -modules</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-pandas-intro.html">Introduction to Pandas</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Introduction to Geopandas</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#reading-a-shapefile">Reading a Shapefile</a></li>
<li class="toctree-l2"><a class="reference internal" href="#coordinate-reference-system-crs">Coordinate reference system (CRS)</a></li>
<li class="toctree-l2"><a class="reference internal" href="#writing-a-shapefile">Writing a Shapefile</a></li>
<li class="toctree-l2"><a class="reference internal" href="#geometries-in-geopandas">Geometries in Geopandas</a></li>
<li class="toctree-l2"><a class="reference internal" href="#creating-geometries-into-a-geodataframe">Creating geometries into a GeoDataFrame</a></li>
<li class="toctree-l2"><a class="reference internal" href="#pro-tips-optional-but-recommended">Pro -tips (optional but recommended)</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#grouping-data">Grouping data</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-2.html">Exercise 2</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 3</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-overview.html">Lesson 3 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-geocoding.html">Geocoding</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-table-join.html">Table join</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-projections.html">Re-projecting data</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-point-in-polygon.html">Point in Polygon &amp; Intersect</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-spatial-join.html">Spatial join</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-3.html">Exercise 3</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 4</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-overview.html">Lesson 4 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-download-data.html">Download datasets</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-geometric-operations.html">Geometric operations</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-reclassify.html">Data reclassification</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-nearest-neighbour.html">Nearest Neighbour Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-4.html">Exercise 4</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 5</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-overview.html">Lesson 5 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-download-data.html">Download datasets</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-static-maps.html">Static maps</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-interactive-map-bokeh.html">Interactive maps with Bokeh</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-share-on-github.html">Sharing interactive plots on GitHub</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-interactive-map-folium.html">Interactive maps on Leaflet</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-World-3D.html">Inspiration: World 3D</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-5.html">Exercise 5</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 6</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-overview.html">Lesson 6 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-arcpy.html">Python in ArcGIS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-toolbox.html">ArcGIS Toolbox</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-arcpy-script.html">Writing arcpy scripts</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-run-the-tool.html">Running the Python script from ArcGIS</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 7</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-overview.html">Lesson 7 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-download.html">Download data</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-read-raster.html">Reading raster files with GDAL</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-read-raster-array.html">Reading raster as a numerical array</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-gdal-utilities.html">GDAL command line tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-7.html">Exercise 7</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 8</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson8-network-analysis.html">Network analysis in Python</a></li>
</ul>
<p class="caption"><span class="caption-text">Final Assignment</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Final-assignment.html">Final assignment</a></li>
</ul>
<p class="caption"><span class="caption-text">Map Challenge 2016</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="map-challenge.html">Map Challenge 2016</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">GeoPython - AutoGIS</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="navigation" aria-label="breadcrumbs navigation">
<ul class="wy-breadcrumbs">
<li><a href="index.html">Docs</a> »</li>
<li>Introduction to Geopandas</li>
<li class="wy-breadcrumbs-aside">
<a href="https://github.com/Automating-GIS-processes/2016/blob/master/source/Lesson2-geopandas-basics.rst" class="fa fa-github"> Edit on GitHub</a>
</li>
</ul>
<hr/>
</div>
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="introduction-to-geopandas">
<h1>Introduction to Geopandas<a class="headerlink" href="#introduction-to-geopandas" title="Permalink to this headline">¶</a></h1>
<div class="section" id="reading-a-shapefile">
<h2>Reading a Shapefile<a class="headerlink" href="#reading-a-shapefile" title="Permalink to this headline">¶</a></h2>
<p>Spatial data can be read easily with geopandas using the <code class="docutils literal"><span class="pre">gpd.read_file()</span></code>
function:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Import necessary modules</span>
<span class="gp">In [1]: </span><span class="kn">import</span> <span class="nn">geopandas</span> <span class="kn">as</span> <span class="nn">gpd</span>
<span class="go"># Set filepath (adjust the path to match the location of your own data)</span>
<span class="gp">In [2]: </span><span class="n">fp</span> <span class="o">=</span> <span class="s2">"/home/geo/Data/DAMSELFISH_distributions.shp"</span>
<span class="go"># Read file using gpd.read_file()</span>
<span class="gp">In [3]: </span><span class="n">data</span> <span class="o">=</span> <span class="n">gpd</span><span class="o">.</span><span class="n">read_file</span><span class="p">(</span><span class="n">fp</span><span class="p">)</span>
</pre></div>
</div>
<ul class="simple">
<li>Let’s check the data type of our ‘data’ variable:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [4]: </span><span class="nb">type</span><span class="p">(</span><span class="n">data</span><span class="p">)</span>
<span class="gh">Out[4]: </span><span class="go">geopandas.geodataframe.GeoDataFrame</span>
</pre></div>
</div>
<p>Okay, so from the above we can see that our <code class="docutils literal"><span class="pre">data</span></code> variable is a
<strong>GeoDataFrame</strong>. A GeoDataFrame extends the functionality of
<strong>pandas.DataFrame</strong> so that it is possible to use and handle
spatial data within pandas (hence the name geopandas). A GeoDataFrame has
some special features and functions that are useful in GIS.</p>
<ul class="simple">
<li>Let’s take a look at our data using the
<code class="docutils literal"><span class="pre">head()</span></code> function, which prints the first 5 rows by default:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [5]: </span><span class="n">data</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[5]: </span><span class="go"></span>
<span class="go"> id_no binomial origin compiler year \</span>
<span class="go">0 183963.0 Stegastes leucorus 1 IUCN 2010 </span>
<span class="go">1 183963.0 Stegastes leucorus 1 IUCN 2010 </span>
<span class="go">2 183963.0 Stegastes leucorus 1 IUCN 2010 </span>
<span class="go">3 183793.0 Chromis intercrusma 1 IUCN 2010 </span>
<span class="go">4 183793.0 Chromis intercrusma 1 IUCN 2010 </span>
<span class="go"> citation source dist_comm island \</span>
<span class="go">0 International Union for Conservation of Nature... None None None </span>
<span class="go">1 International Union for Conservation of Nature... None None None </span>
<span class="go">2 International Union for Conservation of Nature... None None None </span>
<span class="go">3 International Union for Conservation of Nature... None None None </span>
<span class="go">4 International Union for Conservation of Nature... None None None </span>
<span class="go"> subspecies ... kingdom_na \</span>
<span class="go">0 None ... ANIMALIA </span>
<span class="go">1 None ... ANIMALIA </span>
<span class="go">2 None ... ANIMALIA </span>
<span class="go">3 None ... ANIMALIA </span>
<span class="go">4 None ... ANIMALIA </span>
<span class="go"> phylum_nam class_name order_name family_nam genus_name \</span>
<span class="go">0 CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE Stegastes </span>
<span class="go">1 CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE Stegastes </span>
<span class="go">2 CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE Stegastes </span>
<span class="go">3 CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE Chromis </span>
<span class="go">4 CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE Chromis </span>
<span class="go"> species_na category testCol \</span>
<span class="go">0 leucorus VU 1 </span>
<span class="go">1 leucorus VU 1 </span>
<span class="go">2 leucorus VU 1 </span>
<span class="go">3 intercrusma LC 1 </span>
<span class="go">4 intercrusma LC 1 </span>
<span class="go"> geometry </span>
<span class="go">0 POLYGON ((-115.6437454219999 29.71392059300007... </span>
<span class="go">1 POLYGON ((-105.589950704 21.89339825500002, -1... </span>
<span class="go">2 POLYGON ((-111.159618439 19.01535626700007, -1... </span>
<span class="go">3 POLYGON ((-80.86500229899997 -0.77894492099994... </span>
<span class="go">4 POLYGON ((-67.33922225599997 -55.6761029239999... </span>
<span class="go">[5 rows x 25 columns]</span>
</pre></div>
</div>
<ul class="simple">
<li>Let’s also take a look at what our data looks like on a map. If you just
want to explore your data on a map, you can use the <code class="docutils literal"><span class="pre">.plot()</span></code> function
in geopandas, which creates a simple map out of the data (using
matplotlib as a backend):</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [6]: </span><span class="n">data</span><span class="o">.</span><span class="n">plot</span><span class="p">();</span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/damselfish.png"><img alt="_images/damselfish.png" src="_images/damselfish.png" style="width: 5in;" /></a>
</div>
<div class="section" id="coordinate-reference-system-crs">
<h2>Coordinate reference system (CRS)<a class="headerlink" href="#coordinate-reference-system-crs" title="Permalink to this headline">¶</a></h2>
<p>A GeoDataFrame that is read from a Shapefile should (although it does not
<em>always</em>) contain information about the coordinate system in which the
data is projected.</p>
<ul class="simple">
<li>We can see the current coordinate reference system from <code class="docutils literal"><span class="pre">.crs</span></code>
attribute:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [7]: </span><span class="n">data</span><span class="o">.</span><span class="n">crs</span>
<span class="gh">Out[7]: </span><span class="go">{'init': 'epsg:4326'}</span>
</pre></div>
</div>
<p>Okay, so from this we can see that the CRS of the data is
<strong>epsg:4326</strong>. The EPSG code (<em>“European Petroleum Survey Group”</em>) is
a number that identifies the coordinate system of the dataset. “<a class="reference external" href="http://www.epsg.org/">EPSG
Geodetic Parameter Dataset</a> is a collection of
definitions of coordinate reference systems and coordinate
transformations which may be global, regional, national or local in
application”. The EPSG code 4326 that we have here refers to the WGS84
coordinate system (i.e. the coordinates are latitudes and longitudes in decimal degrees).
You can easily look up different EPSG codes from <a class="reference external" href="http://spatialreference.org/ref/epsg/">this
website</a>.</p>
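<p>As a small illustration, the sketch below pulls the numeric EPSG code out of a
CRS dictionary of the form printed above. The helper name <code class="docutils literal"><span class="pre">epsg_from_init</span></code> is
hypothetical (it is not part of geopandas), and the sketch assumes that <code class="docutils literal"><span class="pre">.crs</span></code>
is a plain dictionary with an <code class="docutils literal"><span class="pre">init</span></code> key. Newer geopandas versions return a
different kind of CRS object, in which case this sketch does not apply.</p>
<div class="code python highlight-default"><div class="highlight"><pre>
```python
def epsg_from_init(crs):
    """Return the integer EPSG code from a dict such as {'init': 'epsg:4326'}.

    Hypothetical helper for illustration only; not a geopandas function.
    """
    authority, number = crs["init"].split(":")
    if authority.lower() != "epsg":
        raise ValueError("Not an EPSG definition: {0}".format(crs["init"]))
    return int(number)


# The dictionary below mirrors the output of data.crs shown above
crs = {"init": "epsg:4326"}
code = epsg_from_init(crs)
print("EPSG code: {0}".format(code))
```
</pre></div>
</div>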
</div>
<div class="section" id="writing-a-shapefile">
<h2>Writing a Shapefile<a class="headerlink" href="#writing-a-shapefile" title="Permalink to this headline">¶</a></h2>
<p>Writing data into a new Shapefile is another frequently needed operation.</p>
<ul class="simple">
<li>Let’s select the first 50 rows of the input data using index slicing
and then write the selection into a new Shapefile with the
<code class="docutils literal"><span class="pre">.to_file()</span></code> method of the GeoDataFrame:</li>
</ul>
<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Create an output path for the data</span>
<span class="n">out</span> <span class="o">=</span> <span class="s2">r"/home/geo/Data/DAMSELFISH_distributions_SELECTION.shp"</span>
<span class="c1"># Select first 50 rows</span>
<span class="n">selection</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">50</span><span class="p">]</span>
<span class="c1"># Write those rows into a new Shapefile (the default output file format is Shapefile)</span>
<span class="n">selection</span><span class="o">.</span><span class="n">to_file</span><span class="p">(</span><span class="n">out</span><span class="p">)</span>
</pre></div>
</div>
<p><strong>Task:</strong> Open the new Shapefile in QGIS, which has been installed on
our computer instance, and see what the data looks like.</p>
</div>
<div class="section" id="geometries-in-geopandas">
<h2>Geometries in Geopandas<a class="headerlink" href="#geometries-in-geopandas" title="Permalink to this headline">¶</a></h2>
<p>Geopandas takes advantage of Shapely’s geometric objects. Geometries are
stored in a column called <em>geometry</em>, which is the default column name for
storing geometric information in geopandas.</p>
<ul class="simple">
<li>Let’s print the first 5 rows of the column ‘geometry’:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># It is possible to use only specific columns by specifying the column name within square brackets []</span>
<span class="gp">In [8]: </span><span class="n">data</span><span class="p">[</span><span class="s1">'geometry'</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[8]: </span><span class="go"></span>
<span class="go">0 POLYGON ((-115.6437454219999 29.71392059300007...</span>
<span class="go">1 POLYGON ((-105.589950704 21.89339825500002, -1...</span>
<span class="go">2 POLYGON ((-111.159618439 19.01535626700007, -1...</span>
<span class="go">3 POLYGON ((-80.86500229899997 -0.77894492099994...</span>
<span class="go">4 POLYGON ((-67.33922225599997 -55.6761029239999...</span>
<span class="go">Name: geometry, dtype: object</span>
</pre></div>
</div>
<p>Since spatial data is stored as Shapely objects, <strong>it is possible to use
all of the functionality of the Shapely module</strong> that we practiced
earlier.</p>
<ul class="simple">
<li>Let’s print the areas of the first 5 polygons:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Make a selection that contains only the first five rows</span>
<span class="gp">In [9]: </span><span class="n">selection</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="mi">0</span><span class="p">:</span><span class="mi">5</span><span class="p">]</span>
</pre></div>
</div>
<ul class="simple">
<li>We can iterate over the selected rows using the
<code class="docutils literal"><span class="pre">.iterrows()</span></code> function in (geo)pandas:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [10]: </span><span class="k">for</span> <span class="n">index</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">selection</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
<span class="gp"> ....: </span> <span class="n">poly_area</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s1">'geometry'</span><span class="p">]</span><span class="o">.</span><span class="n">area</span>
<span class="gp"> ....: </span> <span class="k">print</span><span class="p">(</span><span class="s2">"Polygon area at index {0} is: {1:.3f}"</span><span class="o">.</span><span class="n">format</span><span class="p">(</span><span class="n">index</span><span class="p">,</span> <span class="n">poly_area</span><span class="p">))</span>
<span class="gp"> ....: </span>
<span class="go">Polygon area at index 0 is: 19.396</span>
<span class="go">Polygon area at index 1 is: 6.146</span>
<span class="go">Polygon area at index 2 is: 2.697</span>
<span class="go">Polygon area at index 3 is: 87.461</span>
<span class="go">Polygon area at index 4 is: 0.001</span>
</pre></div>
</div>
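<p>Note that Shapely treats the coordinates as planar, so with data in EPSG:4326
the areas above are expressed in square decimal degrees rather than in square
metres or kilometres. The sketch below illustrates this with the shoelace
formula, a standard way of computing the planar area of a simple polygon. The
function name <code class="docutils literal"><span class="pre">planar_area</span></code> and the example rectangle are hypothetical and
only serve as an illustration; this is not how Shapely is implemented internally.</p>
<div class="code python highlight-default"><div class="highlight"><pre>
```python
def planar_area(coords):
    """Planar area of a simple polygon from its (x, y) vertices (shoelace formula)."""
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        # Wrap around so that the last vertex connects back to the first one
        x2, y2 = coords[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A rectangle that is 2 degrees wide and 1 degree high, in (lon, lat) order
rectangle = [(24.0, 60.0), (26.0, 60.0), (26.0, 61.0), (24.0, 61.0)]
area_deg2 = planar_area(rectangle)
print("Area in square degrees: {0:.3f}".format(area_deg2))
```
</pre></div>
</div>
<p>The rectangle yields an area of 2.000, i.e. simply width times height in degrees.</p>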
<ul class="simple">
<li>Let’s create a new column in our GeoDataFrame where we calculate
and store the areas of the individual polygons:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Empty column for area</span>
<span class="gp">In [11]: </span><span class="n">data</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">None</span>
</pre></div>
</div>
<ul class="simple">
<li>Let’s iterate over the rows and calculate the areas</li>
</ul>
<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Iterate over the rows one at a time</span>
<span class="k">for</span> <span class="n">index</span><span class="p">,</span> <span class="n">row</span> <span class="ow">in</span> <span class="n">data</span><span class="o">.</span><span class="n">iterrows</span><span class="p">():</span>
    <span class="c1"># Update the value in the 'area' column with the area of the geometry at this index</span>
    <span class="n">data</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">index</span><span class="p">,</span> <span class="s1">'area'</span><span class="p">]</span> <span class="o">=</span> <span class="n">row</span><span class="p">[</span><span class="s1">'geometry'</span><span class="p">]</span><span class="o">.</span><span class="n">area</span>
</pre></div>
</div>
<ul class="simple">
<li>Let’s see the first 2 rows of our ‘area’ column</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [12]: </span><span class="n">data</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="gh">Out[12]: </span><span class="go"></span>
<span class="go">0 19.3963</span>
<span class="go">1 6.1459</span>
<span class="go">Name: area, dtype: object</span>
</pre></div>
</div>
<ul class="simple">
<li>Let’s check the maximum and the mean of those areas using
familiar functions from our previous lessons:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Maximum area</span>
<span class="gp">In [13]: </span><span class="n">max_area</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span><span class="o">.</span><span class="n">max</span><span class="p">()</span>
<span class="go"># Mean area</span>
<span class="gp">In [14]: </span><span class="n">mean_area</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
<span class="gp">In [15]: </span><span class="k">print</span><span class="p">(</span><span class="s2">"Max area: </span><span class="si">%s</span><span class="se">\n</span><span class="s2">Mean area: </span><span class="si">%s</span><span class="s2">"</span> <span class="o">%</span> <span class="p">(</span><span class="nb">round</span><span class="p">(</span><span class="n">max_area</span><span class="p">,</span> <span class="mi">2</span><span class="p">),</span> <span class="nb">round</span><span class="p">(</span><span class="n">mean_area</span><span class="p">,</span> <span class="mi">2</span><span class="p">)))</span>
<span class="go">Max area: 1493.2</span>
<span class="go">Mean area: 19.96</span>
</pre></div>
</div>
</div>
<div class="section" id="creating-geometries-into-a-geodataframe">
<h2>Creating geometries into a GeoDataFrame<a class="headerlink" href="#creating-geometries-into-a-geodataframe" title="Permalink to this headline">¶</a></h2>
<p>Since geopandas takes advantage of Shapely’s geometric objects, it is
possible to create a Shapefile from scratch by passing Shapely’s
geometric objects into a GeoDataFrame. This is useful as it makes it
easy to convert e.g. a text file that contains coordinates into a
Shapefile.</p>
<ul class="simple">
<li>Let’s create an empty <code class="docutils literal"><span class="pre">GeoDataFrame</span></code>.</li>
</ul>
<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Import necessary modules first</span>
<span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="nn">pd</span>
<span class="kn">import</span> <span class="nn">geopandas</span> <span class="k">as</span> <span class="nn">gpd</span>
<span class="kn">from</span> <span class="nn">shapely.geometry</span> <span class="k">import</span> <span class="n">Point</span><span class="p">,</span> <span class="n">Polygon</span>
<span class="kn">import</span> <span class="nn">fiona</span>
<span class="c1"># Create an empty geopandas GeoDataFrame</span>
<span class="n">newdata</span> <span class="o">=</span> <span class="n">gpd</span><span class="o">.</span><span class="n">GeoDataFrame</span><span class="p">()</span>
</pre></div>
</div>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Let's see what's inside</span>
<span class="gp">In [16]: </span><span class="n">newdata</span>
<span class="gh">Out[16]: </span><span class="go"></span>
<span class="go">Empty GeoDataFrame</span>
<span class="go">Columns: []</span>
<span class="go">Index: []</span>
</pre></div>
</div>
<p>The GeoDataFrame is empty since we haven’t placed any data inside.</p>
<ul class="simple">
<li>Let’s create a new column called <code class="docutils literal"><span class="pre">geometry</span></code> that will contain our
Shapely objects:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Create a new column called 'geometry' in the GeoDataFrame</span>
<span class="gp">In [17]: </span><span class="n">newdata</span><span class="p">[</span><span class="s1">'geometry'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">None</span>
<span class="go"># Let's see what's inside</span>
<span class="gp">In [18]: </span><span class="n">newdata</span>
<span class="gh">Out[18]: </span><span class="go"></span>
<span class="go">Empty GeoDataFrame</span>
<span class="go">Columns: [geometry]</span>
<span class="go">Index: []</span>
</pre></div>
</div>
<p>Now we have a geometry column in our GeoDataFrame but we don’t have any
data yet.</p>
<ul class="simple">
<li>Let’s create a Shapely Polygon representing the Helsinki Senate Square
that we can insert into our GeoDataFrame:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Coordinates of the Helsinki Senate square in Decimal Degrees</span>
<span class="gp">In [19]: </span><span class="n">coordinates</span> <span class="o">=</span> <span class="p">[(</span><span class="mf">24.950899</span><span class="p">,</span> <span class="mf">60.169158</span><span class="p">),</span> <span class="p">(</span><span class="mf">24.953492</span><span class="p">,</span> <span class="mf">60.169158</span><span class="p">),</span> <span class="p">(</span><span class="mf">24.953510</span><span class="p">,</span> <span class="mf">60.170104</span><span class="p">),</span> <span class="p">(</span><span class="mf">24.950958</span><span class="p">,</span> <span class="mf">60.169990</span><span class="p">)]</span>
<span class="go"># Create a Shapely polygon from the coordinate-tuple list</span>
<span class="gp">In [20]: </span><span class="n">poly</span> <span class="o">=</span> <span class="n">Polygon</span><span class="p">(</span><span class="n">coordinates</span><span class="p">)</span>
<span class="go"># Let's see what we have</span>
<span class="gp">In [21]: </span><span class="n">poly</span>
<span class="gh">Out[21]: </span><span class="go"><shapely.geometry.polygon.Polygon at 0xf05a160></span>
</pre></div>
</div>
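<p>As a quick sanity check on the polygon above, the sketch below inspects a few Shapely attributes. It assumes Shapely is installed and reuses the Senate Square coordinates; the variable name <code class="docutils literal"><span class="pre">ring</span></code> is only illustrative.</p>

```python
from shapely.geometry import Polygon

# Coordinates of the Helsinki Senate Square in decimal degrees (lon, lat)
coordinates = [(24.950899, 60.169158), (24.953492, 60.169158),
               (24.953510, 60.170104), (24.950958, 60.169990)]
poly = Polygon(coordinates)

# Shapely closes the ring for us: the first coordinate is repeated at the end
ring = list(poly.exterior.coords)
print(len(ring))        # 5 coordinate pairs for 4 corners
print(poly.geom_type)   # 'Polygon'
print(poly.is_valid)    # True for a simple, non-self-intersecting ring
print(poly.bounds)      # (minx, miny, maxx, maxy)
```

<p>Checking <code class="docutils literal"><span class="pre">is_valid</span></code> before inserting a geometry is a cheap way to catch coordinates that were typed in the wrong order.</p>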
<p>Okay, so now we have an appropriate Polygon object.</p>
<ul class="simple">
<li>Let’s insert the polygon into our ‘geometry’ column in our
GeoDataFrame:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Insert the polygon into the 'geometry' column at index 0</span>
<span class="gp">In [22]: </span><span class="n">newdata</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'geometry'</span><span class="p">]</span> <span class="o">=</span> <span class="n">poly</span>
<span class="go"># Let's see what we have now</span>
<span class="gp">In [23]: </span><span class="n">newdata</span>
<span class="gh">Out[23]: </span><span class="go"></span>
<span class="go"> geometry</span>
<span class="go">0 POLYGON ((24.950899 60.169158, 24.953492 60.16...</span>
</pre></div>
</div>
<p>Now we have a GeoDataFrame with a Polygon that we can export to a
Shapefile.</p>
<ul class="simple">
<li>Let’s add another column to our GeoDataFrame called <code class="docutils literal"><span class="pre">Location</span></code> with
text <em>Senaatintori</em>.</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Add a new column and insert data</span>
<span class="gp">In [24]: </span><span class="n">newdata</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="mi">0</span><span class="p">,</span> <span class="s1">'Location'</span><span class="p">]</span> <span class="o">=</span> <span class="s1">'Senaatintori'</span>
<span class="go"># Let's check the data</span>
<span class="gp">In [25]: </span><span class="n">newdata</span>
<span class="gh">Out[25]: </span><span class="go"></span>
<span class="go"> geometry Location</span>
<span class="go">0 POLYGON ((24.950899 60.169158, 24.953492 60.16... Senaatintori</span>
</pre></div>
</div>
<p>Okay, now we have additional information that helps us
recognize what the feature represents.</p>
<p>Before exporting the data it is useful to <strong>define the spatial
reference system for the GeoDataFrame.</strong></p>
<p>As was shown earlier, a GeoDataFrame has an attribute called <em>.crs</em> that
shows the coordinate system of the data. It is empty (None) in our
case since we are creating the data from scratch:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [26]: </span><span class="k">print</span><span class="p">(</span><span class="n">newdata</span><span class="o">.</span><span class="n">crs</span><span class="p">)</span>
<span class="go">None</span>
</pre></div>
</div>
<ul class="simple">
<li>Let’s add a crs for our GeoDataFrame. A Python module called
<strong>fiona</strong> has a nice function called <code class="docutils literal"><span class="pre">from_epsg()</span></code> for passing
a coordinate system to the GeoDataFrame. Next we will use it to
set the coordinate system to WGS84 (EPSG code: 4326):</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Import specific function 'from_epsg' from fiona module</span>
<span class="gp">In [27]: </span><span class="kn">from</span> <span class="nn">fiona.crs</span> <span class="kn">import</span> <span class="n">from_epsg</span>
<span class="go"># Set the GeoDataFrame's coordinate system to WGS84</span>
<span class="gp">In [28]: </span><span class="n">newdata</span><span class="o">.</span><span class="n">crs</span> <span class="o">=</span> <span class="n">from_epsg</span><span class="p">(</span><span class="mi">4326</span><span class="p">)</span>
<span class="go"># Let's see what the crs definition looks like</span>
<span class="gp">In [29]: </span><span class="n">newdata</span><span class="o">.</span><span class="n">crs</span>
<span class="gh">Out[29]: </span><span class="go">{'init': 'epsg:4326', 'no_defs': True}</span>
</pre></div>
</div>
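<p>The dictionary form shown above reflects the GeoPandas and Fiona versions used when this lesson was written. As an assumption about your environment: in more recent GeoPandas releases (0.7 and later, which rely on pyproj), the coordinate system can also be given directly as an EPSG string, without importing <code class="docutils literal"><span class="pre">fiona</span></code>. A minimal sketch:</p>

```python
import geopandas as gpd
from shapely.geometry import Polygon

# Same Senate Square polygon as in the lesson
coordinates = [(24.950899, 60.169158), (24.953492, 60.169158),
               (24.953510, 60.170104), (24.950958, 60.169990)]
poly = Polygon(coordinates)

# Build the GeoDataFrame in one step and declare the CRS as an EPSG string
newdata = gpd.GeoDataFrame({'Location': ['Senaatintori']},
                           geometry=[poly], crs="EPSG:4326")

# In these versions .crs is a pyproj CRS object rather than a dictionary
print(newdata.crs.to_epsg())
```

<p>Passing the geometry and attributes to the constructor also avoids filling an empty GeoDataFrame cell by cell, which some newer versions handle less gracefully.</p>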
<ul class="simple">
<li>Finally, we can export the data using the GeoDataFrame’s <code class="docutils literal"><span class="pre">.to_file()</span></code>
function. It works similarly to the output functions in numpy or pandas, but here
we only need to provide the output path for the Shapefile. Easy, isn’t
it?</li>
</ul>
<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Determine the output path for the Shapefile</span>
<span class="n">outfp</span> <span class="o">=</span> <span class="s2">r"/home/geo/Data/Senaatintori.shp"</span>
<span class="c1"># Write the data into that Shapefile</span>
<span class="n">newdata</span><span class="o">.</span><span class="n">to_file</span><span class="p">(</span><span class="n">outfp</span><span class="p">)</span>
</pre></div>
</div>
<p>Now we have successfully created a Shapefile from scratch using only
Python programming. A similar approach can be used, for example, to read
coordinates from a text file (e.g. points) and create Shapefiles from
those automatically.</p>
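<p>Reading coordinates from a text file could start from something like the following sketch. The file contents, the column names <code class="docutils literal"><span class="pre">lon</span></code> and <code class="docutils literal"><span class="pre">lat</span></code>, and the helper <code class="docutils literal"><span class="pre">read_coordinates()</span></code> are hypothetical; an in-memory string stands in for a real file so that the example is self-contained.</p>

```python
import csv
import io

def read_coordinates(textfile):
    """Return a list of (lon, lat) tuples from a CSV-like file object."""
    reader = csv.DictReader(textfile)
    return [(float(row['lon']), float(row['lat'])) for row in reader]

# Hypothetical file contents; with a real file use open(path, newline='')
sample = io.StringIO("lon,lat\n24.950899,60.169158\n24.953492,60.169158\n")
coords = read_coordinates(sample)
print(coords)
```

<p>Each tuple can then be passed to <code class="docutils literal"><span class="pre">Point()</span></code> (or the whole list to <code class="docutils literal"><span class="pre">Polygon()</span></code>) and inserted into a GeoDataFrame as before.</p>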
<p><strong>Task:</strong> check the output Shapefile in QGIS and make sure that the
attribute table seems correct.</p>
</div>
<div class="section" id="pro-tips-optional-but-recommended">
<h2>Pro tips (optional but recommended)<a class="headerlink" href="#pro-tips-optional-but-recommended" title="Permalink to this headline">¶</a></h2>
<div class="section" id="grouping-data">
<h3>Grouping data<a class="headerlink" href="#grouping-data" title="Permalink to this headline">¶</a></h3>
<p>One really useful function that can be used in Pandas/Geopandas is <a class="reference external" href="http://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.groupby.html">.groupby()</a>.
This function groups data based on the values in the selected column(s).</p>
<ul class="simple">
<li>Let’s group the individual fish in <code class="docutils literal"><span class="pre">DAMSELFISH_distribution.shp</span></code> by species and export each species to its own Shapefile.<ul>
<li><em>Note: If your <code class="docutils literal"><span class="pre">data</span></code> variable doesn’t contain the Damselfish data anymore, read the Shapefile into memory again using the <code class="docutils literal"><span class="pre">gpd.read_file()</span></code> function.</em></li>
</ul>
</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Group the data by column 'binomial'</span>
<span class="gp">In [30]: </span><span class="n">grouped</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">groupby</span><span class="p">(</span><span class="s1">'binomial'</span><span class="p">)</span>
<span class="go"># Let's see what we got</span>
<span class="gp">In [31]: </span><span class="n">grouped</span>
<span class="gh">Out[31]: </span><span class="go"><pandas.core.groupby.DataFrameGroupBy object at 0x000000000BCF4828></span>
</pre></div>
</div>
<ul class="simple">
<li>The <code class="docutils literal"><span class="pre">groupby</span></code> function gives us an object called <code class="docutils literal"><span class="pre">DataFrameGroupBy</span></code>, which behaves much like the key-value pairs of a dictionary: we can iterate over it and get each group’s key together with the rows that belong to it.</li>
</ul>
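<p>To make the dictionary analogy concrete, here is a small sketch with a plain pandas DataFrame. The table is made up for illustration (only the <code class="docutils literal"><span class="pre">binomial</span></code> column name comes from the lesson); a grouped GeoDataFrame is iterated in the same way.</p>

```python
import pandas as pd

# Made-up table: three rows belonging to two species
df = pd.DataFrame({
    'binomial': ['Species B', 'Species A', 'Species B'],
    'area': [1.0, 2.0, 3.0],
})

grouped = df.groupby('binomial')

# Each iteration yields the group key and the rows belonging to that key
sizes = {}
for key, rows in grouped:
    sizes[key] = len(rows)
    print(key, list(rows.index))

print(sizes)
```

<p>Note that the keys come out in sorted order by default and that each group keeps the index labels of the original table.</p>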
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Iterate over the group object</span>
<span class="gp">In [32]: </span><span class="k">for</span> <span class="n">key</span><span class="p">,</span> <span class="n">values</span> <span class="ow">in</span> <span class="n">grouped</span><span class="p">:</span>
<span class="gp"> ....: </span> <span class="n">individual_fish</span> <span class="o">=</span> <span class="n">values</span>
<span class="gp"> ....: </span>
<span class="go"># Let's see the LAST item that we iterated over</span>
<span class="gp">In [33]: </span><span class="n">individual_fish</span>
<span class="gh">Out[33]: </span><span class="go"></span>
<span class="go"> id_no binomial origin compiler year \</span>
<span class="go">27 154915.0 Teixeirichthys jordani 1 None 2012 </span>
<span class="go">28 154915.0 Teixeirichthys jordani 1 None 2012 </span>
<span class="go">29 154915.0 Teixeirichthys jordani 1 None 2012 </span>
<span class="go">30 154915.0 Teixeirichthys jordani 1 None 2012 </span>
<span class="go">31 154915.0 Teixeirichthys jordani 1 None 2012 </span>
<span class="go">32 154915.0 Teixeirichthys jordani 1 None 2012 </span>
<span class="go">33 154915.0 Teixeirichthys jordani 1 None 2012 </span>
<span class="go"> citation source dist_comm island \</span>
<span class="go">27 Red List Index (Sampled Approach), Zoological ... None None None </span>
<span class="go">28 Red List Index (Sampled Approach), Zoological ... None None None </span>
<span class="go">29 Red List Index (Sampled Approach), Zoological ... None None None </span>
<span class="go">30 Red List Index (Sampled Approach), Zoological ... None None None </span>
<span class="go">31 Red List Index (Sampled Approach), Zoological ... None None None </span>
<span class="go">32 Red List Index (Sampled Approach), Zoological ... None None None </span>
<span class="go">33 Red List Index (Sampled Approach), Zoological ... None None None </span>
<span class="go"> subspecies ... phylum_nam class_name order_name family_nam \</span>
<span class="go">27 None ... CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE </span>
<span class="go">28 None ... CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE </span>
<span class="go">29 None ... CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE </span>
<span class="go">30 None ... CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE </span>
<span class="go">31 None ... CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE </span>
<span class="go">32 None ... CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE </span>
<span class="go">33 None ... CHORDATA ACTINOPTERYGII PERCIFORMES POMACENTRIDAE </span>
<span class="go"> genus_name species_na category testCol \</span>
<span class="go">27 Teixeirichthys jordani LC 1 </span>
<span class="go">28 Teixeirichthys jordani LC 1 </span>
<span class="go">29 Teixeirichthys jordani LC 1 </span>
<span class="go">30 Teixeirichthys jordani LC 1 </span>
<span class="go">31 Teixeirichthys jordani LC 1 </span>
<span class="go">32 Teixeirichthys jordani LC 1 </span>
<span class="go">33 Teixeirichthys jordani LC 1 </span>
<span class="go"> geometry area </span>
<span class="go">27 POLYGON ((121.6300326400001 33.04248618400004,... 38.6712 </span>
<span class="go">28 POLYGON ((32.56219482400007 29.97488975500005,... 37.4457 </span>
<span class="go">29 POLYGON ((130.9052090560001 34.02498196400006,... 16.9395 </span>
<span class="go">30 POLYGON ((56.32233070000007 -3.707270205999976... 10.127 </span>
<span class="go">31 POLYGON ((40.64476131800006 -10.85502363999996... 7.7603 </span>
<span class="go">32 POLYGON ((48.11258402900006 -9.335103113999935... 3.43424 </span>
<span class="go">33 POLYGON ((51.75403543100003 -9.21679305899994,... 2.40862 </span>
<span class="go">[7 rows x 26 columns]</span>
</pre></div>
</div>
<p>From here we can see that the <code class="docutils literal"><span class="pre">individual_fish</span></code> variable now contains all the rows that belong to a fish called <code class="docutils literal"><span class="pre">Teixeirichthys</span> <span class="pre">jordani</span></code>. Notice that the index numbers refer to the row numbers in the
original <code class="docutils literal"><span class="pre">data</span></code> GeoDataFrame.</p>
<ul class="simple">
<li>Let’s check the datatype of the grouped object and what the <code class="docutils literal"><span class="pre">key</span></code> variable contains:</li>
</ul>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [34]: </span><span class="nb">type</span><span class="p">(</span><span class="n">individual_fish</span><span class="p">)</span>
<span class="gh">Out[34]: </span><span class="go">geopandas.geodataframe.GeoDataFrame</span>
<span class="gp">In [35]: </span><span class="k">print</span><span class="p">(</span><span class="n">key</span><span class="p">)</span>
<span class="go">