<!DOCTYPE html>
<!--[if IE 8]><html class="no-js lt-ie9" lang="en" > <![endif]-->
<!--[if gt IE 8]><!--> <html class="no-js" lang="en" > <!--<![endif]-->
<head>
<meta charset="utf-8">
<script type="text/javascript">
var _gaq = _gaq || [];
_gaq.push(['_setAccount', 'UA-88382509-1']);
_gaq.push(['_trackPageview']);
(function() {
var ga = document.createElement('script'); ga.type = 'text/javascript'; ga.async = true;
ga.src = ('https:' == document.location.protocol ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
var s = document.getElementsByTagName('script')[0]; s.parentNode.insertBefore(ga, s);
})();
</script>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Data reclassification — GeoPython - AutoGIS 1 documentation</title>
<link rel="stylesheet" href="_static/css/theme.css" type="text/css" />
<link rel="index" title="Index"
href="genindex.html"/>
<link rel="search" title="Search" href="search.html"/>
<link rel="top" title="GeoPython - AutoGIS 1 documentation" href="index.html"/>
<link rel="next" title="Nearest Neighbour Analysis" href="Lesson4-nearest-neighbour.html"/>
<link rel="prev" title="Geometric operations" href="Lesson4-geometric-operations.html"/>
<script src="_static/js/modernizr.min.js"></script>
</head>
<body class="wy-body-for-nav" role="document">
<div class="wy-grid-for-nav">
<nav data-toggle="wy-nav-shift" class="wy-nav-side">
<div class="wy-side-scroll">
<div class="wy-side-nav-search">
<a href="index.html" class="icon icon-home"> GeoPython - AutoGIS
<img src="_static/logo_hy_geo_135.png" class="logo" />
</a>
<div class="version">
2016 Autumn
</div>
<div role="search">
<form id="rtd-search-form" class="wy-form" action="search.html" method="get">
<input type="text" name="q" placeholder="Search docs" />
<input type="hidden" name="check_keywords" value="yes" />
<input type="hidden" name="area" value="default" />
</form>
</div>
</div>
<div class="wy-menu wy-menu-vertical" data-spy="affix" role="navigation" aria-label="main navigation">
<p class="caption"><span class="caption-text">Course information</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="course-info.html">General info</a></li>
<li class="toctree-l1"><a class="reference internal" href="Installing_Anacondas_GIS.html">Installing Python + GIS</a></li>
<li class="toctree-l1"><a class="reference internal" href="License-terms.html">License and terms of usage</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 1</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson1-Intro-Python-GIS.html">Introduction to Python GIS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson1-Geometric-Objects.html">Geometric Objects - Spatial Data Model</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-1.html">Exercise 1</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 2</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-overview.html">Lesson 2 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-download-data.html">Download datasets</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-overview-pandas-geopandas.html">Pandas and Geopandas -modules</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-pandas-intro.html">Introduction to Pandas</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson2-geopandas-basics.html">Introduction to Geopandas</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-2.html">Exercise 2</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 3</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-overview.html">Lesson 3 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-geocoding.html">Geocoding</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-table-join.html">Table join</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-projections.html">Re-projecting data</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-point-in-polygon.html">Point in Polygon & Intersect</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson3-spatial-join.html">Spatial join</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-3.html">Exercise 3</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 4</span></p>
<ul class="current">
<li class="toctree-l1"><a class="reference internal" href="Lesson4-overview.html">Lesson 4 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-download-data.html">Download datasets</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-geometric-operations.html">Geometric operations</a></li>
<li class="toctree-l1 current"><a class="current reference internal" href="#">Data reclassification</a><ul>
<li class="toctree-l2"><a class="reference internal" href="#data-preparation">Data preparation</a></li>
<li class="toctree-l2"><a class="reference internal" href="#calculations-in-dataframes">Calculations in DataFrames</a></li>
<li class="toctree-l2"><a class="reference internal" href="#classifying-data">Classifying data</a><ul>
<li class="toctree-l3"><a class="reference internal" href="#creating-a-custom-classifier">Creating a custom classifier</a></li>
<li class="toctree-l3"><a class="reference internal" href="#multicriteria-data-classification">Multicriteria data classification</a></li>
<li class="toctree-l3"><a class="reference internal" href="#classification-based-on-common-classifiers">Classification based on common classifiers</a></li>
</ul>
</li>
</ul>
</li>
<li class="toctree-l1"><a class="reference internal" href="Lesson4-nearest-neighbour.html">Nearest Neighbour Analysis</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-4.html">Exercise 4</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 5</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-overview.html">Lesson 5 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-download-data.html">Download datasets</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-static-maps.html">Static maps</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-interactive-map-bokeh.html">Interactive maps with Bokeh</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-share-on-github.html">Sharing interactive plots on GitHub</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-interactive-map-folium.html">Interactive maps on Leaflet</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson5-World-3D.html">Inspiration: World 3D</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-5.html">Exercise 5</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 6</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-overview.html">Lesson 6 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-arcpy.html">Python in ArcGIS</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-toolbox.html">ArcGIS Toolbox</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-arcpy-script.html">Writing arcpy scripts</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson6-run-the-tool.html">Running the Python script from ArcGIS</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 7</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-overview.html">Lesson 7 Overview</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-download.html">Download data</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-read-raster.html">Reading raster files with GDAL</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-read-raster-array.html">Reading raster as a numerical array</a></li>
<li class="toctree-l1"><a class="reference internal" href="Lesson7-gdal-utilities.html">GDAL command line tools</a></li>
<li class="toctree-l1"><a class="reference internal" href="Exercise-7.html">Exercise 7</a></li>
</ul>
<p class="caption"><span class="caption-text">Lesson 8</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Lesson8-network-analysis.html">Network analysis in Python</a></li>
</ul>
<p class="caption"><span class="caption-text">Final Assignment</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="Final-assignment.html">Final assignment</a></li>
</ul>
<p class="caption"><span class="caption-text">Map Challenge 2016</span></p>
<ul>
<li class="toctree-l1"><a class="reference internal" href="map-challenge.html">Map Challenge 2016</a></li>
</ul>
</div>
</div>
</nav>
<section data-toggle="wy-nav-shift" class="wy-nav-content-wrap">
<nav class="wy-nav-top" role="navigation" aria-label="top navigation">
<i data-toggle="wy-nav-top" class="fa fa-bars"></i>
<a href="index.html">GeoPython - AutoGIS</a>
</nav>
<div class="wy-nav-content">
<div class="rst-content">
<div role="main" class="document" itemscope="itemscope" itemtype="http://schema.org/Article">
<div itemprop="articleBody">
<div class="section" id="data-reclassification">
<h1>Data reclassification<a class="headerlink" href="#data-reclassification" title="Permalink to this headline">¶</a></h1>
<p>Reclassifying data based on specific criteria is a common task in GIS analysis.
The purpose of this lesson is to see how values can be reclassified based on almost any criteria, such as:</p>
<div class="code highlight-default"><div class="highlight"><pre><span></span><span class="mf">1.</span> <span class="k">if</span> <span class="n">available</span> <span class="n">space</span> <span class="ow">in</span> <span class="n">a</span> <span class="n">pub</span> <span class="ow">is</span> <span class="n">less</span> <span class="n">than</span> <span class="n">the</span> <span class="n">space</span> <span class="ow">in</span> <span class="n">my</span> <span class="n">wardrobe</span>
<span class="n">AND</span>
<span class="mf">2.</span> <span class="n">the</span> <span class="n">temperature</span> <span class="n">outside</span> <span class="ow">is</span> <span class="n">warmer</span> <span class="n">than</span> <span class="n">my</span> <span class="n">beer</span>
<span class="o">------------------------------------------------------</span>
<span class="n">IF</span> <span class="n">TRUE</span><span class="p">:</span> <span class="o">==></span> <span class="n">I</span> <span class="n">go</span> <span class="ow">and</span> <span class="n">drink</span> <span class="n">my</span> <span class="n">beer</span> <span class="n">outside</span>
<span class="n">IF</span> <span class="n">NOT</span> <span class="n">TRUE</span><span class="p">:</span> <span class="o">==></span> <span class="n">I</span> <span class="n">go</span> <span class="ow">and</span> <span class="n">enjoy</span> <span class="n">my</span> <span class="n">beer</span> <span class="n">inside</span> <span class="n">at</span> <span class="n">a</span> <span class="n">table</span>
</pre></div>
</div>
<p>Even though the above would make an interesting case study, we will use slightly more traditional cases to learn classification.
We will use the Corine land cover layer from 2012 and the Helsinki Travel Time Matrix data, and classify some of their features either with our own
self-made classifier or with ready-made classifiers that are commonly used, e.g., in visualizations.</p>
<p>The goals of this part of the lesson are to:</p>
<ol class="arabic">
<li><p class="first">classify the lakes into big and small lakes where</p>
<blockquote>
<div><ul class="simple">
<li>a big lake is a lake that is larger than the average size of all lakes in our study region</li>
<li>a small lake is a lake that is smaller than the average size of all lakes in our study region</li>
</ul>
</div></blockquote>
</li>
<li><p class="first">use travel times and distances to find out</p>
<ul class="simple">
<li>good locations to buy an apartment with good public transportation accessibility to the city center</li>
<li>but a bit further away from the city center, where prices are lower (or at least we assume so).</li>
</ul>
</li>
<li><p class="first">use ready-made classifiers from the pysal module to classify travel times into multiple classes.</p>
</li>
</ol>
<div class="section" id="data-preparation">
<h2>Data preparation<a class="headerlink" href="#data-preparation" title="Permalink to this headline">¶</a></h2>
<p>Before doing any classification, we need to prepare our data a little bit. Make sure you have <a class="reference external" href="Lesson4-download-data.html">downloaded and extracted the data</a> before continuing.</p>
<p>Let’s read the data in, select only the English columns and plot the data to see what it looks like on a map.</p>
<div class="code highlight-default"><div class="highlight"><pre><span></span><span class="kn">import</span> <span class="nn">geopandas</span> <span class="k">as</span> <span class="nn">gpd</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="nn">plt</span>
<span class="c1"># File path</span>
<span class="n">fp</span> <span class="o">=</span> <span class="s2">"/home/data/Corine2012_Uusimaa.shp"</span>
<span class="n">data</span> <span class="o">=</span> <span class="n">gpd</span><span class="o">.</span><span class="n">read_file</span><span class="p">(</span><span class="n">fp</span><span class="p">)</span>
</pre></div>
</div>
<p>Let’s see what we have.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [1]: </span><span class="n">data</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="gh">Out[1]: </span><span class="go"></span>
<span class="go"> Level1 Level1Eng Level1Suo Level2 Level2Eng \</span>
<span class="go">0 1 Artificial surfaces Rakennetut alueet 11 Urban fabric </span>
<span class="go">1 1 Artificial surfaces Rakennetut alueet 11 Urban fabric </span>
<span class="go"> Level2Suo Level3 Level3Eng \</span>
<span class="go">0 Asuinalueet 112 Discontinuous urban fabric </span>
<span class="go">1 Asuinalueet 112 Discontinuous urban fabric </span>
<span class="go"> Level3Suo Luokka3 \</span>
<span class="go">0 Väljästi rakennetut asuinalueet 112 </span>
<span class="go">1 Väljästi rakennetut asuinalueet 112 </span>
<span class="go"> geometry </span>
<span class="go">0 POLYGON ((279500 6640640, 279507.469 6640635.3... </span>
<span class="go">1 POLYGON ((313620 6655820, 313639.8910000001 66... </span>
</pre></div>
</div>
<p>Let’s select only the English columns.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Select only English columns</span>
<span class="gp">In [2]: </span><span class="n">selected_cols</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'Level1'</span><span class="p">,</span> <span class="s1">'Level1Eng'</span><span class="p">,</span> <span class="s1">'Level2'</span><span class="p">,</span> <span class="s1">'Level2Eng'</span><span class="p">,</span> <span class="s1">'Level3'</span><span class="p">,</span> <span class="s1">'Level3Eng'</span><span class="p">,</span> <span class="s1">'Luokka3'</span><span class="p">,</span> <span class="s1">'geometry'</span><span class="p">]</span>
<span class="go"># Select data</span>
<span class="gp">In [3]: </span><span class="n">data</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="n">selected_cols</span><span class="p">]</span>
<span class="go"># What are the columns now?</span>
<span class="gp">In [4]: </span><span class="n">data</span><span class="o">.</span><span class="n">columns</span>
<span class="gh">Out[4]: </span><span class="go"></span>
<span class="go">Index(['Level1', 'Level1Eng', 'Level2', 'Level2Eng', 'Level3', 'Level3Eng',</span>
<span class="go"> 'Luokka3', 'geometry'],</span>
<span class="go"> dtype='object')</span>
</pre></div>
</div>
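<p>Typing the column names by hand works well for a handful of columns. As an alternative, the minimal sketch below keeps every column whose name does not end with <code>Suo</code>, assuming (as in the output above) that all Finnish columns follow that naming pattern. The DataFrame here is empty and only mimics the Corine column names; it is not the real data.</p>

```python
import pandas as pd

# Empty table that only mimics the Corine column names (no real data)
df = pd.DataFrame(columns=[
    'Level1', 'Level1Eng', 'Level1Suo',
    'Level2', 'Level2Eng', 'Level2Suo',
    'Level3', 'Level3Eng', 'Level3Suo',
    'Luokka3', 'geometry',
])

# Keep every column except the Finnish ones (names ending with 'Suo')
english_cols = [col for col in df.columns if not col.endswith('Suo')]
df = df[english_cols]

print(list(df.columns))
```

<p>The resulting list is the same as <code>selected_cols</code> above, but it would also pick up any additional English columns without editing the list by hand.</p>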
<p>Let’s plot the data and use the column ‘Level3’ to color the polygons.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [5]: </span><span class="n">data</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s1">'Level3'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mf">0.05</span><span class="p">)</span>
<span class="gh">Out[5]: </span><span class="go"><matplotlib.axes._subplots.AxesSubplot at 0x126b0ac8></span>
<span class="go"># Use tight layout and remove empty whitespace around our map</span>
<span class="gp">In [6]: </span><span class="n">plt</span><span class="o">.</span><span class="n">tight_layout</span><span class="p">()</span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/corine-level3.png"><img alt="_images/corine-level3.png" src="_images/corine-level3.png" style="width: 7in;" /></a>
<p>Let’s see what kind of values we have in ‘Level3Eng’ column.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [7]: </span><span class="nb">list</span><span class="p">(</span><span class="n">data</span><span class="p">[</span><span class="s1">'Level3Eng'</span><span class="p">]</span><span class="o">.</span><span class="n">unique</span><span class="p">())</span>
<span class="gh">Out[7]: </span><span class="go"></span>
<span class="go">['Discontinuous urban fabric',</span>
<span class="go"> 'Transitional woodland/shrub',</span>
<span class="go"> 'Non-irrigated arable land',</span>
<span class="go"> 'Fruit trees and berry plantations',</span>
<span class="go"> 'Pastures',</span>
<span class="go"> 'Land principally occupied by agriculture, with significant areas of natural vegetation',</span>
<span class="go"> 'Bare rock',</span>
<span class="go"> 'Inland marshes',</span>
<span class="go"> 'Peatbogs',</span>
<span class="go"> 'Salt marshes',</span>
<span class="go"> 'Water courses',</span>
<span class="go"> 'Water bodies',</span>
<span class="go"> 'Sea and ocean',</span>
<span class="go"> 'Industrial or commercial units',</span>
<span class="go"> 'Road and rail networks and associated land',</span>
<span class="go"> 'Port areas',</span>
<span class="go"> 'Airports',</span>
<span class="go"> 'Mineral extraction sites',</span>
<span class="go"> 'Broad-leaved forest',</span>
<span class="go"> 'Dump sites',</span>
<span class="go"> 'Coniferous forest',</span>
<span class="go"> 'Construction sites',</span>
<span class="go"> 'Green urban areas',</span>
<span class="go"> 'Sport and leisure facilities',</span>
<span class="go"> 'Mixed forest']</span>
</pre></div>
</div>
<p>We have plenty of different land cover types in our data. Let’s select only the lakes. Selecting specific rows from a DataFrame
based on some value(s) is easy in Pandas / Geopandas with the label-based indexer <code class="docutils literal"><span class="pre">.loc[]</span></code> (the older <code class="docutils literal"><span class="pre">.ix[]</span></code> indexer has been removed from recent pandas versions); read more <a class="reference external" href="http://pandas.pydata.org/pandas-docs/stable/indexing.html#different-choices-for-indexing">here</a>.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Select lakes (i.e. 'Water bodies' in the data) and make a proper copy of the selection</span>
<span class="gp">In [8]: </span><span class="n">lakes</span> <span class="o">=</span> <span class="n">data</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">data</span><span class="p">[</span><span class="s1">'Level3Eng'</span><span class="p">]</span> <span class="o">==</span> <span class="s1">'Water bodies'</span><span class="p">]</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
<span class="gp">In [9]: </span><span class="n">lakes</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="gh">Out[9]: </span><span class="go"></span>
<span class="go"> Level1 Level1Eng Level2 Level2Eng Level3 Level3Eng \</span>
<span class="go">1388 5 Water bodies 51 Inland waters 512 Water bodies </span>
<span class="go">1389 5 Water bodies 51 Inland waters 512 Water bodies </span>
<span class="go"> Luokka3 geometry </span>
<span class="go">1388 512 POLYGON ((298388.189 6642944.189999999, 298364... </span>
<span class="go">1389 512 POLYGON ((286629.2579999999 6643429.219000001,... </span>
</pre></div>
</div>
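<p>The same boolean-mask pattern can be tried on a small made-up table. The sketch below uses <code>.loc[]</code> with a mask and calls <code>.copy()</code>, so that adding a column to the subset later does not trigger pandas’ <code>SettingWithCopyWarning</code> and does not touch the original table. The column values are invented for illustration.</p>

```python
import pandas as pd

# Made-up land cover table with a Corine-style 'Level3Eng' column
data = pd.DataFrame({
    'Level3Eng': ['Water bodies', 'Pastures', 'Water bodies', 'Mixed forest'],
    'area': [268310.7, 51000.0, 917661.9, 120000.0],
})

# Boolean mask: True for rows whose land cover class is 'Water bodies'
mask = data['Level3Eng'] == 'Water bodies'

# Select the matching rows and make an independent copy of them
lakes = data.loc[mask].copy()

# Adding a column to the copy leaves the original 'data' untouched
lakes['is_lake'] = True

print(lakes.index.tolist())
print('is_lake' in data.columns)
```

<p>Note that the selected rows keep their original index labels, just like rows 1388 and 1389 in the lakes output above.</p>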
</div>
<div class="section" id="calculations-in-dataframes">
<h2>Calculations in DataFrames<a class="headerlink" href="#calculations-in-dataframes" title="Permalink to this headline">¶</a></h2>
<p>Now we have our lakes dataset ready. The aim was to classify the lakes into small and big lakes based on <strong>the average size of all lakes</strong> in our
study area. Thus, we first need to calculate the average size of our lakes.</p>
<p>Let’s check the coordinate system.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Check coordinate system information</span>
<span class="gp">In [10]: </span><span class="n">data</span><span class="o">.</span><span class="n">crs</span>
<span class="gh">Out[10]: </span><span class="go">{'ellps': 'GRS80', 'no_defs': True, 'proj': 'utm', 'units': 'm', 'zone': 35}</span>
</pre></div>
</div>
<p>We can see that the units are meters and that the data use a <a class="reference external" href="https://en.wikipedia.org/wiki/Universal_Transverse_Mercator_coordinate_system">UTM projection</a> (zone 35).</p>
<p>First, let’s calculate the area of our lakes.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Calculate the area of lakes</span>
<span class="gp">In [11]: </span><span class="n">lakes</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span> <span class="o">=</span> <span class="n">lakes</span><span class="o">.</span><span class="n">area</span>
<span class="go"># What do we have?</span>
<span class="gp">In [12]: </span><span class="n">lakes</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
<span class="gh">Out[12]: </span><span class="go"></span>
<span class="go">1388 268310.708164</span>
<span class="go">1389 917661.921348</span>
<span class="go">Name: area, dtype: float64</span>
</pre></div>
</div>
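<p><code>GeoSeries.area</code> returns the planar area of each geometry in the units of the layer’s coordinate system, which is why we get square meters here. As a rough illustration of such a planar calculation (not GeoPandas’ actual code path, which relies on the Shapely geometry library), the shoelace formula below computes the area of a simple polygon from its vertex coordinates. The rectangle’s coordinates are made up.</p>

```python
def shoelace_area(coords):
    """Planar area of a simple polygon given as a list of (x, y) vertices.

    The ring may be open or closed (first vertex repeated at the end);
    the result is in the squared units of the coordinates.
    """
    # Drop the closing vertex if the ring is explicitly closed
    if coords[0] == coords[-1]:
        coords = coords[:-1]
    total = 0.0
    n = len(coords)
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


# A made-up 1000 m x 500 m rectangle in projected (metre) coordinates
rectangle = [(279500, 6640640), (280500, 6640640),
             (280500, 6641140), (279500, 6641140), (279500, 6640640)]

area_m2 = shoelace_area(rectangle)
print(area_m2)              # square meters
print(area_m2 / 1_000_000)  # square kilometers
```

<p>Because the coordinates are in meters, the result is in square meters; the same polygon in a geographic (degree-based) coordinate system would give a meaningless value in "square degrees".</p>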
<p>Notice that the values are in square meters. Let’s convert them into square kilometers so that they are easier to read. Calculations like this are easy
to do in Pandas / Geopandas:</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [13]: </span><span class="n">lakes</span><span class="p">[</span><span class="s1">'area_km2'</span><span class="p">]</span> <span class="o">=</span> <span class="n">lakes</span><span class="p">[</span><span class="s1">'area'</span><span class="p">]</span> <span class="o">/</span> <span class="mi">1000000</span>
<span class="go"># What is the mean size of our lakes?</span>
<span class="gp">In [14]: </span><span class="n">l_mean_size</span> <span class="o">=</span> <span class="n">lakes</span><span class="p">[</span><span class="s1">'area_km2'</span><span class="p">]</span><span class="o">.</span><span class="n">mean</span><span class="p">()</span>
<span class="gp">In [15]: </span><span class="n">l_mean_size</span>
<span class="gh">Out[15]: </span><span class="go">1.5828513727796711</span>
</pre></div>
</div>
<p>So the average size of our lakes is approximately 1.58 square kilometers.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>Other mathematical operations between two or more columns, such as sums and differences, are just as easy, e.g.:</p>
<div class="code python last highlight-default"><div class="highlight"><pre><span></span><span class="c1"># Sum two columns</span>
<span class="n">data</span><span class="p">[</span><span class="s1">'sum_of_columns'</span><span class="p">]</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">]</span> <span class="o">+</span> <span class="n">data</span><span class="p">[</span><span class="s1">'col_2'</span><span class="p">]</span>
<span class="c1"># Calculate the difference of three columns</span>
<span class="n">data</span><span class="p">[</span><span class="s1">'difference'</span><span class="p">]</span> <span class="o">=</span> <span class="n">data</span><span class="p">[</span><span class="s1">'some_column'</span><span class="p">]</span> <span class="o">-</span> <span class="n">data</span><span class="p">[</span><span class="s1">'col_1'</span><span class="p">]</span> <span class="o">-</span> <span class="n">data</span><span class="p">[</span><span class="s1">'col_2'</span><span class="p">]</span>
</pre></div>
</div>
</div>
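<p>Here is a runnable sketch of this kind of column arithmetic on a made-up table; the column names <code>col_1</code>, <code>col_2</code> and <code>some_column</code> are only placeholders. Pandas applies each operator element-wise, row by row, and the result is stored as a new column.</p>

```python
import pandas as pd

# Placeholder columns with made-up numbers
data = pd.DataFrame({
    'some_column': [10, 20, 30],
    'col_1': [1, 2, 3],
    'col_2': [4, 5, 6],
})

# Element-wise sum of two columns
data['sum_of_columns'] = data['col_1'] + data['col_2']

# Element-wise difference of three columns
data['difference'] = data['some_column'] - data['col_1'] - data['col_2']

# Other operators work the same way, e.g. a ratio of two columns
data['ratio'] = data['col_1'] / data['col_2']

print(data)
```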
</div>
<div class="section" id="classifying-data">
<h2>Classifying data<a class="headerlink" href="#classifying-data" title="Permalink to this headline">¶</a></h2>
<div class="section" id="creating-a-custom-classifier">
<h3>Creating a custom classifier<a class="headerlink" href="#creating-a-custom-classifier" title="Permalink to this headline">¶</a></h3>
<p>Let’s create a function that classifies the geometries into two classes based on a given <code class="docutils literal"><span class="pre">threshold</span></code> parameter.
If the area of a polygon is lower than the threshold value (here, the average lake size), the output column gets the value 0;
otherwise it gets the value 1. This kind of classification is often called a <a class="reference external" href="https://en.wikipedia.org/wiki/Binary_classification">binary classification</a>.</p>
<p>First we need to create a function for our classification task. This function takes a single row of the GeoDataFrame as input,
plus a few other parameters: the name of the source column, the name of the output column and the threshold value.</p>
<div class="code highlight-default"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">binaryClassifier</span><span class="p">(</span><span class="n">row</span><span class="p">,</span> <span class="n">source_col</span><span class="p">,</span> <span class="n">output_col</span><span class="p">,</span> <span class="n">threshold</span><span class="p">):</span>
    <span class="c1"># If the value in the source column is lower than the threshold value</span>
    <span class="k">if</span> <span class="n">row</span><span class="p">[</span><span class="n">source_col</span><span class="p">]</span> <span class="o"><</span> <span class="n">threshold</span><span class="p">:</span>
        <span class="c1"># Update the output column with value 0</span>
        <span class="n">row</span><span class="p">[</span><span class="n">output_col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="c1"># Otherwise (value equal to or higher than the threshold) update with value 1</span>
    <span class="k">else</span><span class="p">:</span>
        <span class="n">row</span><span class="p">[</span><span class="n">output_col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
    <span class="c1"># Return the updated row</span>
    <span class="k">return</span> <span class="n">row</span>
</pre></div>
</div>
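<p>Because the function only reads and writes items of <code>row</code>, it can be tried out on plain Python dictionaries before it is used on the GeoDataFrame. The sketch below repeats the function and feeds it hypothetical rows; the 1.58 threshold is just the rounded mean from above. Note the boundary case: a value exactly equal to the threshold is not <em>lower</em> than it, so it ends up in class 1.</p>

```python
def binaryClassifier(row, source_col, output_col, threshold):
    # Values lower than the threshold go to class 0
    if row[source_col] < threshold:
        row[output_col] = 0
    # Values equal to or higher than the threshold go to class 1
    else:
        row[output_col] = 1
    return row


threshold = 1.58  # rounded mean lake size in km2 from above

# Hypothetical rows: a small lake, a big lake and one exactly at the threshold
rows = [{'area_km2': 0.27}, {'area_km2': 4.6}, {'area_km2': 1.58}]

# The function updates each dictionary in place (and also returns it)
for row in rows:
    binaryClassifier(row, source_col='area_km2', output_col='small_big', threshold=threshold)

print([row['small_big'] for row in rows])
```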
<p>Let’s create an empty column for our classification results.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [16]: </span><span class="n">lakes</span><span class="p">[</span><span class="s1">'small_big'</span><span class="p">]</span> <span class="o">=</span> <span class="bp">None</span>
</pre></div>
</div>
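<p>We can run our custom function on every row with the Pandas / Geopandas method <code class="docutils literal"><span class="pre">.apply()</span></code>. The argument <code class="docutils literal"><span class="pre">axis=1</span></code> tells it to pass rows (rather than columns) to the function, and the other keyword arguments are forwarded to <code class="docutils literal"><span class="pre">binaryClassifier</span></code>.
Let’s apply our function and do the classification.</p>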
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [17]: </span><span class="n">lakes</span> <span class="o">=</span> <span class="n">lakes</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">binaryClassifier</span><span class="p">,</span> <span class="n">source_col</span><span class="o">=</span><span class="s1">'area_km2'</span><span class="p">,</span> <span class="n">output_col</span><span class="o">=</span><span class="s1">'small_big'</span><span class="p">,</span> <span class="n">threshold</span><span class="o">=</span><span class="n">l_mean_size</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
</div>
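<p>To see what <code class="docutils literal"><span class="pre">.apply()</span></code> with <code class="docutils literal"><span class="pre">axis=1</span></code> does without the lake data, here is a minimal sketch that runs it on a small pandas DataFrame. The <code class="docutils literal"><span class="pre">area_km2</span></code> values and the threshold of 1.0 are made up for this example. The sketch repeats <code class="docutils literal"><span class="pre">binaryClassifier</span></code> so that the snippet runs on its own.</p>

```python
import pandas as pd

def binaryClassifier(row, source_col, output_col, threshold):
    # Values below the threshold get class 0, all other values get class 1
    if row[source_col] < threshold:
        row[output_col] = 0
    else:
        row[output_col] = 1
    return row

# Hypothetical lake areas in square kilometers
lakes = pd.DataFrame({"area_km2": [0.2, 3.5, 1.0, 0.7]})
lakes["small_big"] = None

# axis=1 passes one row (a Series) at a time; the keyword arguments are forwarded to the function
lakes = lakes.apply(binaryClassifier, source_col="area_km2",
                    output_col="small_big", threshold=1.0, axis=1)
print(lakes)
```

<p>A value exactly equal to the threshold (here 1.0) ends up in class 1, because the function only tests for strictly lower values.</p>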
<p>Let’s plot these lakes and see what they look like.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [18]: </span><span class="n">lakes</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s1">'small_big'</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mf">0.05</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s2">"seismic"</span><span class="p">)</span>
<span class="gh">Out[18]: </span><span class="go"><matplotlib.axes._subplots.AxesSubplot at 0x12a8e048></span>
<span class="gp">In [19]: </span><span class="n">plt</span><span class="o">.</span><span class="n">tight_layout</span><span class="p">()</span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/small-big-lakes.png"><img alt="_images/small-big-lakes.png" src="_images/small-big-lakes.png" style="width: 6in;" /></a>
<p>Okay, it looks like the lakes are classified correctly.</p>
<div class="admonition note">
<p class="first admonition-title">Note</p>
<p>There is also a way of doing this without a function, but the previous example may make it easier to understand how the function works.
A more complicated set of criteria should definitely be implemented as a function, as it is much more human-readable.</p>
<p>Let’s give a value 0 for small lakes and value 1 for big lakes by using an alternative technique:</p>
<div class="code last highlight-default"><div class="highlight"><pre><span></span><span class="n">lakes</span><span class="p">[</span><span class="s1">'small_big_alt'</span><span class="p">]</span> <span class="o">=</span> <span class="kc">None</span>
<span class="n">lakes</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">lakes</span><span class="p">[</span><span class="s1">'area_km2'</span><span class="p">]</span> <span class="o"><</span> <span class="n">l_mean_size</span><span class="p">,</span> <span class="s1">'small_big_alt'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="n">lakes</span><span class="o">.</span><span class="n">loc</span><span class="p">[</span><span class="n">lakes</span><span class="p">[</span><span class="s1">'area_km2'</span><span class="p">]</span> <span class="o">>=</span> <span class="n">l_mean_size</span><span class="p">,</span> <span class="s1">'small_big_alt'</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
</pre></div>
</div>
</div>
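<p>A third option is to convert the boolean comparison directly into integers: <code class="docutils literal"><span class="pre">True</span></code> becomes 1 and <code class="docutils literal"><span class="pre">False</span></code> becomes 0. The sketch below uses hypothetical data in place of <code class="docutils literal"><span class="pre">lakes</span></code> and <code class="docutils literal"><span class="pre">l_mean_size</span></code>. It checks that this option gives the same classes as the <code class="docutils literal"><span class="pre">.loc</span></code> technique in the note above.</p>

```python
import pandas as pd

# Hypothetical lake areas and their mean (stand-ins for lakes['area_km2'] and l_mean_size)
lakes = pd.DataFrame({"area_km2": [0.2, 3.5, 1.0, 0.7]})
l_mean_size = lakes["area_km2"].mean()

# Technique from the note: start with None and fill both classes with .loc
lakes["small_big_alt"] = None
lakes.loc[lakes["area_km2"] < l_mean_size, "small_big_alt"] = 0
lakes.loc[lakes["area_km2"] >= l_mean_size, "small_big_alt"] = 1

# Vectorized alternative: a boolean Series cast to int (True -> 1, False -> 0)
lakes["small_big_vec"] = (lakes["area_km2"] >= l_mean_size).astype(int)

print(lakes)
```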
</div>
<div class="section" id="multicriteria-data-classification">
<h3>Multicriteria data classification<a class="headerlink" href="#multicriteria-data-classification" title="Permalink to this headline">¶</a></h3>
<p>It is also possible to build classifiers with multiple criteria in Pandas/GeoPandas by extending the example that we started earlier.
Now we will modify our binaryClassifier function a bit so that it classifies the data based on two columns.</p>
<p>Let’s call it customClassifier2 as it takes into account two criteria:</p>
<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="k">def</span> <span class="nf">customClassifier2</span><span class="p">(</span><span class="n">row</span><span class="p">,</span> <span class="n">src_col1</span><span class="p">,</span> <span class="n">src_col2</span><span class="p">,</span> <span class="n">threshold1</span><span class="p">,</span> <span class="n">threshold2</span><span class="p">,</span> <span class="n">output_col</span><span class="p">):</span>
<span class="c1"># 1. If the value in src_col1 is LOWER than the threshold1 value</span>
<span class="c1"># 2. AND the value in src_col2 is HIGHER than the threshold2 value, give value 1, otherwise give 0</span>
<span class="k">if</span> <span class="n">row</span><span class="p">[</span><span class="n">src_col1</span><span class="p">]</span> <span class="o"><</span> <span class="n">threshold1</span> <span class="ow">and</span> <span class="n">row</span><span class="p">[</span><span class="n">src_col2</span><span class="p">]</span> <span class="o">></span> <span class="n">threshold2</span><span class="p">:</span>
<span class="c1"># Update the output column with value 0</span>
<span class="n">row</span><span class="p">[</span><span class="n">output_col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">1</span>
<span class="c1"># If area of input geometry is higher than the threshold value update with value 1</span>
<span class="k">else</span><span class="p">:</span>
<span class="n">row</span><span class="p">[</span><span class="n">output_col</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span>
<span class="c1"># Return the updated row</span>
<span class="k">return</span> <span class="n">row</span>
</pre></div>
</div>
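<p>For comparison, the same two-criteria rule can be written without a row-wise function. Python's <code class="docutils literal"><span class="pre">and</span></code> does not work element by element on whole columns; it raises an error. Pandas therefore uses the <code class="docutils literal"><span class="pre">&amp;</span></code> operator instead. Each comparison needs its own parentheses, because <code class="docutils literal"><span class="pre">&amp;</span></code> binds more tightly than the comparison operators. The sketch below is illustrative only: the column names <code class="docutils literal"><span class="pre">time</span></code> and <code class="docutils literal"><span class="pre">dist</span></code> and their values are hypothetical.</p>

```python
import pandas as pd

# Hypothetical travel times (minutes) and walking distances (meters)
df = pd.DataFrame({"time": [15, 25, 18, 20], "dist": [5000, 6000, 3000, 4500]})

threshold1, threshold2 = 20, 4000

# Element-wise AND of two boolean Series; parentheses are required around each comparison
mask = (df["time"] < threshold1) & (df["dist"] > threshold2)

# Convert True/False to 1/0, matching the values that customClassifier2 produces
df["suitable"] = mask.astype(int)
print(df)
```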
<p>Now that our classifier is ready, let’s apply it to our data.</p>
<p>First, we need to read our Travel Time data from Helsinki into memory from the GeoJSON file that <a class="reference external" href="Lesson4-geometric-operations.html">we prepared earlier</a> with overlay analysis.</p>
<div class="code python highlight-default"><div class="highlight"><pre><span></span><span class="n">fp</span> <span class="o">=</span> <span class="s2">r"/home/geo/TravelTimes_to_5975375_RailwayStation_Helsinki.geojson"</span>
<span class="c1"># Read the GeoJSON file similarly as Shapefile</span>
<span class="n">acc</span> <span class="o">=</span> <span class="n">gpd</span><span class="o">.</span><span class="n">read_file</span><span class="p">(</span><span class="n">fp</span><span class="p">)</span>
<span class="c1"># Let's see what we have</span>
<span class="n">acc</span><span class="o">.</span><span class="n">head</span><span class="p">(</span><span class="mi">2</span><span class="p">)</span>
</pre></div>
</div>
<p>The data contains plenty of different variables (see <a class="reference external" href="http://blogs.helsinki.fi/accessibility/helsinki-region-travel-time-matrix-2015/">the description of all attributes here</a>),
but we are interested in two columns. The first is <code class="docutils literal"><span class="pre">pt_r_tt</span></code>, which tells the travel time in minutes by public transport
to the city center from different parts of the city. The second is <code class="docutils literal"><span class="pre">walk_d</span></code>, which tells the network distance in meters along the roads
to the city center from different parts of the city (almost equal to the Euclidean distance).</p>
<p><strong>NoData values are represented with the value -1</strong>, so we need to remove them first.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [20]: </span><span class="n">acc</span> <span class="o">=</span> <span class="n">acc</span><span class="o">.</span><span class="n">ix</span><span class="p">[</span><span class="n">acc</span><span class="p">[</span><span class="s1">'pt_r_tt'</span><span class="p">]</span> <span class="o">>=</span><span class="mi">0</span><span class="p">]</span>
</pre></div>
</div>
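<p>The effect of this filter can be checked on a tiny table with made-up values. Rows where <code class="docutils literal"><span class="pre">pt_r_tt</span></code> is -1 are dropped. The remaining rows keep their original index labels, which matters later when results are joined back to the data by index.</p>

```python
import pandas as pd

# Hypothetical travel times where -1 marks NoData
acc = pd.DataFrame({"pt_r_tt": [35, -1, 12, -1, 48]})

# Keep only rows with valid (non-negative) travel times
acc = acc.loc[acc["pt_r_tt"] >= 0]

print(acc)
print(acc.index.tolist())  # [0, 2, 4]
```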
<p>Let’s plot the data and see what it looks like.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [21]: </span><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="kn">as</span> <span class="nn">plt</span>
<span class="go"># Plot using 9 classes and classify the values using "Fisher Jenks" classification</span>
<span class="gp">In [22]: </span><span class="n">acc</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s2">"pt_r_tt"</span><span class="p">,</span> <span class="n">scheme</span><span class="o">=</span><span class="s2">"Fisher_Jenks"</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">9</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s2">"RdYlBu"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">0</span><span class="p">);</span>
<span class="go"># Use tight layour</span>
<span class="gp">In [23]: </span><span class="n">plt</span><span class="o">.</span><span class="n">tight_layout</span><span class="p">()</span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/pt_time.png"><img alt="_images/pt_time.png" src="_images/pt_time.png" style="width: 7in;" /></a>
<p>From this figure we can see that the travel times are shortest in the south, where
the city center is located. There are also some other areas with “good” accessibility
(shown in red).</p>
<p>Let’s also plot the walking distances.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [24]: </span><span class="n">acc</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s2">"walk_d"</span><span class="p">,</span> <span class="n">scheme</span><span class="o">=</span><span class="s2">"Fisher_Jenks"</span><span class="p">,</span> <span class="n">k</span><span class="o">=</span><span class="mi">9</span><span class="p">,</span> <span class="n">cmap</span><span class="o">=</span><span class="s2">"RdYlBu"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">0</span><span class="p">);</span>
<span class="go"># Use tight layour</span>
<span class="gp">In [25]: </span><span class="n">plt</span><span class="o">.</span><span class="n">tight_layout</span><span class="p">();</span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/walk_distances.png"><img alt="_images/walk_distances.png" src="_images/walk_distances.png" style="width: 7in;" /></a>
<p>Here we can see that the walking distances along the road network more or less resemble
Euclidean distances.</p>
<p>Finally, let’s do our classification based on two criteria
and find the grid cells where the <strong>travel time is less than 20 minutes</strong> but which are further away
<strong>than 4 km (4000 meters) from the city center</strong>.</p>
<p>Let’s create an empty column for our classification results called “Suitable_area”.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [26]: </span><span class="n">acc</span><span class="p">[</span><span class="s2">"Suitable_area"</span><span class="p">]</span> <span class="o">=</span> <span class="bp">None</span>
</pre></div>
</div>
<p>Now we are ready to apply our custom classifier to our data with our own criteria.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [27]: </span><span class="n">acc</span> <span class="o">=</span> <span class="n">acc</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">customClassifier2</span><span class="p">,</span> <span class="n">src_col1</span><span class="o">=</span><span class="s1">'pt_r_tt'</span><span class="p">,</span> <span class="n">src_col2</span><span class="o">=</span><span class="s1">'walk_d'</span><span class="p">,</span> <span class="n">threshold1</span><span class="o">=</span><span class="mi">20</span><span class="p">,</span> <span class="n">threshold2</span><span class="o">=</span><span class="mi">4000</span><span class="p">,</span> <span class="n">output_col</span><span class="o">=</span><span class="s2">"Suitable_area"</span><span class="p">,</span> <span class="n">axis</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span>
</pre></div>
</div>
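<p><code class="docutils literal"><span class="pre">customClassifier2</span></code> uses strict comparisons. As a result, a cell with a travel time of exactly 20 minutes is not classified as suitable, and neither is a cell with a walking distance of exactly 4000 meters. The sketch below applies the function to a few hypothetical grid cells to show what happens at the thresholds.</p>

```python
import pandas as pd

def customClassifier2(row, src_col1, src_col2, threshold1, threshold2, output_col):
    # 1 if src_col1 < threshold1 AND src_col2 > threshold2, otherwise 0
    if row[src_col1] < threshold1 and row[src_col2] > threshold2:
        row[output_col] = 1
    else:
        row[output_col] = 0
    return row

# Hypothetical grid cells: travel time (minutes) and walking distance (meters)
acc = pd.DataFrame({
    "pt_r_tt": [19, 20, 15, 15],
    "walk_d": [4500, 4500, 4000, 4001],
})
acc["Suitable_area"] = None

acc = acc.apply(customClassifier2, src_col1="pt_r_tt", src_col2="walk_d",
                threshold1=20, threshold2=4000, output_col="Suitable_area", axis=1)
print(acc)
```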
<p>Let’s see what we got.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [28]: </span><span class="n">acc</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[28]: </span><span class="go"></span>
<span class="go"> car_m_d car_m_t car_r_d car_r_t from_id pt_m_d pt_m_t pt_m_tt \</span>
<span class="go">0 15981 36 15988 41 6002702 14698 65 73 </span>
<span class="go">1 16190 34 16197 39 6002701 14661 64 73 </span>
<span class="go">2 15727 33 15733 37 6001132 14256 59 69 </span>
<span class="go">3 15975 33 15982 37 6001131 14512 62 73 </span>
<span class="go">4 16136 35 16143 40 6001138 14730 65 73 </span>
<span class="go"> pt_r_d pt_r_t ... to_id walk_d walk_t GML_ID NAMEFIN \</span>
<span class="go">0 14698 61 ... 5975375 14456 207 27517366 Helsinki </span>
<span class="go">1 14661 60 ... 5975375 14419 206 27517366 Helsinki </span>
<span class="go">2 14256 55 ... 5975375 14014 200 27517366 Helsinki </span>
<span class="go">3 14512 58 ... 5975375 14270 204 27517366 Helsinki </span>
<span class="go">4 14730 61 ... 5975375 14212 203 27517366 Helsinki </span>
<span class="go"> NAMESWE NATCODE area \</span>
<span class="go">0 Helsingfors 091 62499.999976 </span>
<span class="go">1 Helsingfors 091 62499.999977 </span>
<span class="go">2 Helsingfors 091 62499.999977 </span>
<span class="go">3 Helsingfors 091 62499.999976 </span>
<span class="go">4 Helsingfors 091 62499.999977 </span>
<span class="go"> geometry Suitable_area </span>
<span class="go">0 POLYGON ((391000.0001349226 6667750.00004299, ... 0 </span>
<span class="go">1 POLYGON ((390750.0001349644 6668000.000042951,... 0 </span>
<span class="go">2 POLYGON ((391000.0001349143 6668000.000042943,... 0 </span>
<span class="go">3 POLYGON ((390750.0001349644 6668000.000042951,... 0 </span>
<span class="go">4 POLYGON ((392500.0001346234 6668000.000042901,... 0 </span>
<span class="go">[5 rows x 21 columns]</span>
</pre></div>
</div>
<p>Okay, now we have new values in the <code class="docutils literal"><span class="pre">Suitable_area</span></code> column.</p>
<p>How many polygons are suitable for us? Let’s find out by using a Pandas function called <code class="docutils literal"><span class="pre">value_counts()</span></code> that returns the count of
each distinct value in our column.</p>
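<p>Here is how <code class="docutils literal"><span class="pre">value_counts()</span></code> behaves on a small Series with made-up values. It returns a Series whose index holds the distinct values and whose values are their counts, sorted from most to least frequent. Passing <code class="docutils literal"><span class="pre">normalize=True</span></code> gives the shares instead of the counts.</p>

```python
import pandas as pd

# Hypothetical classification results
suitable = pd.Series([0, 0, 1, 0, 1, 0], name="Suitable_area")

counts = suitable.value_counts()
shares = suitable.value_counts(normalize=True)

print(counts)
print(shares)
```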
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [29]: </span><span class="n">acc</span><span class="p">[</span><span class="s1">'Suitable_area'</span><span class="p">]</span><span class="o">.</span><span class="n">value_counts</span><span class="p">()</span>
<span class="gh">Out[29]: </span><span class="go"></span>
<span class="go">0 3808</span>
<span class="go">1 9</span>
<span class="go">Name: Suitable_area, dtype: int64</span>
</pre></div>
</div>
<p>So there seem to be nine suitable locations where we could look for an apartment to buy.
Let’s see where they are located.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Plot</span>
<span class="gp">In [30]: </span><span class="n">acc</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s2">"Suitable_area"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">0</span><span class="p">);</span>
<span class="go"># Use tight layour</span>
<span class="gp">In [31]: </span><span class="n">plt</span><span class="o">.</span><span class="n">tight_layout</span><span class="p">();</span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/suitable_areas.png"><img alt="_images/suitable_areas.png" src="_images/suitable_areas.png" style="width: 7in;" /></a>
<p>We can see that the places that meet our criteria are located east of the
city center. These locations lie along the metro line, which gives them short travel times
to the city center, since the metro is a fast travel mode.</p>
<div class="admonition-todo admonition" id="index-0">
<p class="first admonition-title">Todo</p>
<p><strong>Task:</strong></p>
<p class="last">Try to change your classification criteria and see how your results change! What places would be
suitable for you to buy an apartment in the Helsinki region? You can also change the travel mode and see how
it changes the results.</p>
</div>
</div>
<div class="section" id="classification-based-on-common-classifiers">
<h3>Classification based on common classifiers<a class="headerlink" href="#classification-based-on-common-classifiers" title="Permalink to this headline">¶</a></h3>
<p><a class="reference external" href="http://pysal.readthedocs.io/en/latest/">Pysal</a> -module is an extensive Python library including various functions and tools to
do spatial data analysis. It also includes all of the most common data classifiers that are used commonly e.g. when visualizing data.
Available map classifiers in pysal -module are (<a class="reference external" href="http://pysal.readthedocs.io/en/latest/library/esda/mapclassify.html">see here for more details</a>):</p>
<blockquote>
<div><ul class="simple">
<li>Box_Plot</li>
<li>Equal_Interval</li>
<li>Fisher_Jenks</li>
<li>Fisher_Jenks_Sampled</li>
<li>HeadTail_Breaks</li>
<li>Jenks_Caspall</li>
<li>Jenks_Caspall_Forced</li>
<li>Jenks_Caspall_Sampled</li>
<li>Max_P_Classifier</li>
<li>Maximum_Breaks</li>
<li>Natural_Breaks</li>
<li>Quantiles</li>
<li>Percentiles</li>
<li>Std_Mean</li>
<li>User_Defined</li>
</ul>
</div></blockquote>
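<p>To give an idea of what such classifiers compute, here is a simplified pure-Python sketch of two of them: equal intervals and quantiles. This is an illustration under our own simplifying assumptions, not pysal's implementation. Both functions return the upper bound of each class, and the quantile version picks values from the sorted data without interpolation. The function names <code class="docutils literal"><span class="pre">equal_interval_breaks</span></code>, <code class="docutils literal"><span class="pre">quantile_breaks</span></code> and <code class="docutils literal"><span class="pre">classify</span></code> are hypothetical.</p>

```python
from bisect import bisect_left


def equal_interval_breaks(values, k):
    """Upper bounds of k classes of equal width between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    # The last bound is set to the maximum exactly to avoid floating-point drift
    return [lo + width * i for i in range(1, k)] + [hi]


def quantile_breaks(values, k):
    """Upper bounds of k classes holding roughly equal numbers of values."""
    data = sorted(values)
    n = len(data)
    # Value at position ceil(n * i / k) - 1 of the sorted data (no interpolation)
    return [data[(n * i + k - 1) // k - 1] for i in range(1, k + 1)]


def classify(value, breaks):
    """Return the 0-based class of value: the first class whose upper bound is >= value."""
    return min(bisect_left(breaks, value), len(breaks) - 1)


# Skewed, made-up data: equal intervals and quantiles split it very differently
skewed = [1, 1, 2, 2, 3, 3, 4, 20, 50, 100]
print(equal_interval_breaks(skewed, 2))  # [50.5, 100]
print(quantile_breaks(skewed, 2))        # [3, 100]
```

<p>With skewed data like this, equal intervals put almost every value into the first class, while quantiles split the values into two equally sized groups.</p>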
<p>Let’s apply one of those classifiers to our data and classify the travel times by public transport into 9 classes.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="gp">In [32]: </span><span class="kn">import</span> <span class="nn">pysal</span> <span class="kn">as</span> <span class="nn">ps</span>
<span class="go"># Define the number of classes</span>
<span class="gp">In [33]: </span><span class="n">n_classes</span> <span class="o">=</span> <span class="mi">9</span>
</pre></div>
</div>
<p>The classifier needs to be initialized first with the <code class="docutils literal"><span class="pre">make()</span></code> function, which takes the number of desired classes as an input parameter and returns a function that we can apply to our data.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Create a Natural Breaks classifier</span>
<span class="gp">In [34]: </span><span class="n">classifier</span> <span class="o">=</span> <span class="n">ps</span><span class="o">.</span><span class="n">Natural_Breaks</span><span class="o">.</span><span class="n">make</span><span class="p">(</span><span class="n">k</span><span class="o">=</span><span class="n">n_classes</span><span class="p">)</span>
</pre></div>
</div>
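<p>The idea behind <code class="docutils literal"><span class="pre">make()</span></code> is a factory. It stores the settings (here the number of classes) and returns a function that turns a column of values into class numbers. The hypothetical <code class="docutils literal"><span class="pre">make_equal_interval</span></code> below imitates that pattern with a simple equal-interval rule; it is not pysal's code. It also shows why the classification step works: calling <code class="docutils literal"><span class="pre">.apply()</span></code> on a DataFrame without <code class="docutils literal"><span class="pre">axis=1</span></code> passes each whole column to the function.</p>

```python
import pandas as pd


def make_equal_interval(k):
    """Hypothetical factory: return a function that assigns 0-based equal-interval classes."""
    def classifier(column):
        lo, hi = column.min(), column.max()
        width = (hi - lo) / k  # assumes the column is not constant (width > 0)
        # Class number for each value; the maximum value is clipped into the last class
        return ((column - lo) // width).clip(upper=k - 1).astype(int)
    return classifier


# Hypothetical travel times (minutes) with a non-consecutive index, as after filtering
acc = pd.DataFrame({"pt_r_tt": [10, 25, 40, 55, 70]}, index=[0, 2, 3, 7, 8])

classifier = make_equal_interval(k=3)
# Without axis=1, DataFrame.apply calls the function once per column (a whole Series)
classifications = acc[["pt_r_tt"]].apply(classifier)
print(classifications)
```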
<p>Now we can apply that classifier to our data in much the same way as in our previous examples.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Classify the data</span>
<span class="gp">In [35]: </span><span class="n">classifications</span> <span class="o">=</span> <span class="n">acc</span><span class="p">[[</span><span class="s1">'pt_r_tt'</span><span class="p">]]</span><span class="o">.</span><span class="n">apply</span><span class="p">(</span><span class="n">classifier</span><span class="p">)</span>
<span class="go"># Let's see what we have</span>
<span class="gp">In [36]: </span><span class="n">classifications</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[36]: </span><span class="go"></span>
<span class="go"> pt_r_tt</span>
<span class="go">0 7</span>
<span class="go">1 7</span>
<span class="go">2 6</span>
<span class="go">3 6</span>
<span class="go">4 7</span>
</pre></div>
</div>
<p>Okay, so we have a DataFrame in which our input column has been classified into 9 different classes (class numbers 0-8) based on <a class="reference external" href="http://wiki-1-1930356585.us-east-1.elb.amazonaws.com/wiki/index.php/Jenks_Natural_Breaks_Classification">Natural Breaks classification</a>.</p>
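<p>Natural Breaks looks for class boundaries that keep the values within each class as similar as possible. One exact way to find such boundaries is Fisher's dynamic-programming method. It minimizes the total within-class sum of squared deviations over the sorted data, and it is the idea behind the Fisher_Jenks classifier in the list above. The straightforward, unoptimized sketch below illustrates that method. pysal's Natural_Breaks may use a different, heuristic procedure, so its boundaries can differ from this sketch. The function name <code class="docutils literal"><span class="pre">fisher_jenks_breaks</span></code> is hypothetical, and it returns the upper bound of each class.</p>

```python
def fisher_jenks_breaks(values, k):
    """Upper bounds of k classes minimizing the within-class sum of squared deviations."""
    data = sorted(values)
    n = len(data)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and the number of values")

    # Prefix sums give the squared deviation of any slice data[i:j] in constant time
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, x in enumerate(data):
        s1[i + 1] = s1[i] + x
        s2[i + 1] = s2[i] + x * x

    def ssd(i, j):
        # Sum of squared deviations from the mean for data[i:j], with j > i
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    # cost[c][j]: smallest total deviation when the first j values form c classes
    cost = [[inf] * (n + 1) for _ in range(k + 1)]
    split = [[0] * (n + 1) for _ in range(k + 1)]
    cost[0][0] = 0.0
    for c in range(1, k + 1):
        for j in range(c, n + 1):
            # The last class covers data[i:j]; try every valid start i
            for i in range(c - 1, j):
                candidate = cost[c - 1][i] + ssd(i, j)
                if candidate < cost[c][j]:
                    cost[c][j] = candidate
                    split[c][j] = i

    # Walk back through the stored split points to recover the class upper bounds
    breaks = []
    j = n
    for c in range(k, 0, -1):
        breaks.append(data[j - 1])
        j = split[c][j]
    return breaks[::-1]


print(fisher_jenks_breaks([1, 2, 3, 10, 11, 12, 20, 21, 22], 3))  # [3, 12, 22]
```

<p>The three nested loops make this sketch cubic in the number of values. That is fine for illustration but slow for thousands of grid cells, which is one reason to rely on a library such as pysal in practice.</p>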
<p>Now we want to join these classes to our original data, but first let’s rename the column so that we can recognize it later on.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Rename the column so that we know that it was classified with natural breaks</span>
<span class="gp">In [37]: </span><span class="n">classifications</span><span class="o">.</span><span class="n">columns</span> <span class="o">=</span> <span class="p">[</span><span class="s1">'nb_pt_r_tt'</span><span class="p">]</span>
<span class="go"># Join with our original data (here index is the key</span>
<span class="gp">In [38]: </span><span class="n">acc</span> <span class="o">=</span> <span class="n">acc</span><span class="o">.</span><span class="n">join</span><span class="p">(</span><span class="n">classifications</span><span class="p">)</span>
<span class="go"># Let's see how our data looks like</span>
<span class="gp">In [39]: </span><span class="n">acc</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
<span class="gh">Out[39]: </span><span class="go"></span>
<span class="go"> car_m_d car_m_t car_r_d car_r_t from_id pt_m_d pt_m_t pt_m_tt \</span>
<span class="go">0 15981 36 15988 41 6002702 14698 65 73 </span>
<span class="go">1 16190 34 16197 39 6002701 14661 64 73 </span>
<span class="go">2 15727 33 15733 37 6001132 14256 59 69 </span>
<span class="go">3 15975 33 15982 37 6001131 14512 62 73 </span>
<span class="go">4 16136 35 16143 40 6001138 14730 65 73 </span>
<span class="go"> pt_r_d pt_r_t ... walk_d walk_t GML_ID NAMEFIN NAMESWE \</span>
<span class="go">0 14698 61 ... 14456 207 27517366 Helsinki Helsingfors </span>
<span class="go">1 14661 60 ... 14419 206 27517366 Helsinki Helsingfors </span>
<span class="go">2 14256 55 ... 14014 200 27517366 Helsinki Helsingfors </span>
<span class="go">3 14512 58 ... 14270 204 27517366 Helsinki Helsingfors </span>
<span class="go">4 14730 61 ... 14212 203 27517366 Helsinki Helsingfors </span>
<span class="go"> NATCODE area geometry \</span>
<span class="go">0 091 62499.999976 POLYGON ((391000.0001349226 6667750.00004299, ... </span>
<span class="go">1 091 62499.999977 POLYGON ((390750.0001349644 6668000.000042951,... </span>
<span class="go">2 091 62499.999977 POLYGON ((391000.0001349143 6668000.000042943,... </span>
<span class="go">3 091 62499.999976 POLYGON ((390750.0001349644 6668000.000042951,... </span>
<span class="go">4 091 62499.999977 POLYGON ((392500.0001346234 6668000.000042901,... </span>
<span class="go"> Suitable_area nb_pt_r_tt </span>
<span class="go">0 0 7 </span>
<span class="go">1 0 7 </span>
<span class="go">2 0 6 </span>
<span class="go">3 0 6 </span>
<span class="go">4 0 7 </span>
<span class="go">[5 rows x 22 columns]</span>
</pre></div>
</div>
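<p>The join works because <code class="docutils literal"><span class="pre">classifications</span></code> was computed from <code class="docutils literal"><span class="pre">acc</span></code> itself, so it carries exactly the same index labels. This holds even if the NoData filtering left gaps in the index. A small example with made-up values shows that <code class="docutils literal"><span class="pre">DataFrame.join()</span></code> matches rows by index label, not by position.</p>

```python
import pandas as pd

# Hypothetical data whose index has gaps, as after removing NoData rows
acc = pd.DataFrame({"pt_r_tt": [35, 12, 48]}, index=[0, 2, 4])

# Class numbers computed separately but keeping the same index labels
classifications = pd.DataFrame({"nb_pt_r_tt": [1, 0, 2]}, index=[0, 2, 4])

# join() is a left join on the index by default
acc = acc.join(classifications)
print(acc)

# If the index labels did not match, join() would produce missing values instead
mismatched = pd.DataFrame({"pt_r_tt": [35, 12, 48]}).join(classifications)
print(mismatched)
```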
<p>Great, now we have those values in our accessibility GeoDataFrame. Let’s visualize the results and see how they look.</p>
<div class="highlight-ipython"><div class="highlight"><pre><span></span><span class="go"># Plot</span>
<span class="gp">In [40]: </span><span class="n">acc</span><span class="o">.</span><span class="n">plot</span><span class="p">(</span><span class="n">column</span><span class="o">=</span><span class="s2">"nb_pt_r_tt"</span><span class="p">,</span> <span class="n">linewidth</span><span class="o">=</span><span class="mi">0</span><span class="p">,</span> <span class="n">legend</span><span class="o">=</span><span class="bp">True</span><span class="p">);</span>
<span class="go"># Use tight layour</span>
<span class="gp">In [41]: </span><span class="n">plt</span><span class="o">.</span><span class="n">tight_layout</span><span class="p">()</span>
</pre></div>
</div>
<a class="reference internal image-reference" href="_images/natural_breaks_pt_accessibility.png"><img alt="_images/natural_breaks_pt_accessibility.png" src="_images/natural_breaks_pt_accessibility.png" style="width: 7in;" /></a>
<p>And here we go, now we have a map where we have used one of the common classifiers to classify our data into 9 classes.</p>
</div>
</div>
</div>
</div>
</div>
<footer>
<div class="rst-footer-buttons" role="navigation" aria-label="footer navigation">
<a href="Lesson4-nearest-neighbour.html" class="btn btn-neutral float-right" title="Nearest Neighbour Analysis" accesskey="n">Next <span class="fa fa-arrow-circle-right"></span></a>
<a href="Lesson4-geometric-operations.html" class="btn btn-neutral" title="Geometric operations" accesskey="p"><span class="fa fa-arrow-circle-left"></span> Previous</a>
</div>
<hr/>
<div role="contentinfo">
<p>
© Copyright 2016, Henrikki Tenkanen.
Last updated on Feb 20, 2017.
</p>
</div>
Built with <a href="http://sphinx-doc.org/">Sphinx</a> using a <a href="https://github.com/snide/sphinx_rtd_theme">theme</a> provided by <a href="https://readthedocs.org">Read the Docs</a>.
</footer>
</div>
</div>
</section>
</div>
<script type="text/javascript">
var DOCUMENTATION_OPTIONS = {
URL_ROOT:'./',
VERSION:'1',
COLLAPSE_INDEX:false,
FILE_SUFFIX:'.html',
HAS_SOURCE: true
};
</script>
<script type="text/javascript" src="_static/jquery.js"></script>
<script type="text/javascript" src="_static/underscore.js"></script>
<script type="text/javascript" src="_static/doctools.js"></script>
<script type="text/javascript" src="https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML"></script>
<script type="text/javascript" src="_static/js/theme.js"></script>
<script type="text/javascript">
jQuery(function () {
SphinxRtdTheme.StickyNav.enable();
});
</script>
<li><a href="http://project.invalid/">Project Homepage</a> »</li>
<div class="footer">
<img src="../img/GPLv3_Logo.svg">
</div>
</body>
</html>