
Geocoding:

Geocoding is the process of taking a text input, such as an address or the name of a place, and returning its latitude and longitude, i.e. the location of that place on the Earth’s surface.
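To make the input and output concrete, here is a toy sketch of forward geocoding. The GAZETTEER dictionary and the forward_geocode function are hypothetical, and the coordinates are simply the ArcGIS results for three parks from the geocoded table in this tutorial. Real geocoding services do fuzzy matching against very large address databases; this only shows the shape of the operation (text in, coordinates out).

```python
# Hypothetical gazetteer: normalised place name -> (latitude, longitude).
# Coordinates are the ArcGIS results from this tutorial's geocoded table.
GAZETTEER = {
    "anshi national park": (15.00966, 74.35637),
    "bandipur national park": (11.72309, 76.59760),
    "valmiki national park": (27.35111, 83.96161),
}


def normalise(place_name):
    """Lower-case the name and collapse repeated whitespace."""
    return " ".join(place_name.lower().split())


def forward_geocode(place_name):
    """Return (lat, lon) for a known place name, or None if it is unknown."""
    return GAZETTEER.get(normalise(place_name))


print(forward_geocode("Anshi  National Park"))  # (15.00966, 74.35637)
print(forward_geocode("Atlantis"))              # None
```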

Reverse Geocoding:

Reverse geocoding is the opposite process: it takes geographic coordinates (latitude and longitude) as input and returns a description of that location, such as a place name or an address.
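A simple way to picture reverse geocoding is a nearest-neighbour lookup: given a coordinate, return the closest known place. The sketch below assumes a tiny hypothetical list of places (again using coordinates from this tutorial’s results) and measures great-circle distance with the haversine formula. Production reverse geocoders use spatial indexes and far richer data, so treat this as an illustration only.

```python
import math

# Hypothetical reference points: (name, latitude, longitude).
PLACES = [
    ("Anshi National Park", 15.00966, 74.35637),
    ("Bandipur National Park", 11.72309, 76.59760),
    ("Valmiki National Park", 27.35111, 83.96161),
]


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def reverse_geocode(lat, lon):
    """Return (name, distance_km) of the nearest known place."""
    name, plat, plon = min(PLACES, key=lambda p: haversine_km(lat, lon, p[1], p[2]))
    return name, haversine_km(lat, lon, plat, plon)


print(reverse_geocode(12.0, 76.5))  # nearest place: Bandipur National Park
```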

Why Do We Need Geocoding?

Compared to traditional address systems, geocoding has some advantages:

1) High Precision & Accuracy:

Traditional addresses can be imprecise, and they can contain spelling mistakes due to inevitable human error.

Geographic coordinates, on the other hand, are precise and unambiguous, and therefore cause far less confusion.

2) Universally Consistent:

Traditional address systems depend on language and on common names, which can be misleading. For example, India has more than a thousand roads named Mahatma Gandhi Road, and such frequently recurring street names easily cause mix-ups.

Geocoding avoids this problem by using a universally accepted coordinate system, so the data stays consistent, straightforward and clear.

Let’s Begin!

In this tutorial, we will use a table from Wikipedia.

Voilà, a little surprise! 🙂

We will geocode the national parks of India, using their names as input.

To start with, we import the libraries that will help us achieve our two objectives:

1) Geocoding the park names using the GeoPandas library

2) Plotting and visualization using Folium

We covered the GeoPandas library in previous articles.

lxml is a feature-rich and easy-to-use library for processing XML and HTML in Python; pandas uses it as the default parser for read_html().

import pandas as pd
import geopandas as gpd
from geopandas.tools import geocode  # requires the geopy package
import lxml  # not used directly; pandas.read_html uses it to parse HTML

We use GeoPandas to access the geocode tool. pandas provides the read_html() function, which reads every HTML table found at the URL passed to it.

After reading the page with pandas, check the type of the variable: it is a list, and on closer inspection each element of that list is a DataFrame, one per table on the page.

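# read_html returns a list with one DataFrame per <table> on the page
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_national_parks_of_India')
# type(data)  -> list

# The list of parks was the second table on the page at the time of writing
data = data[1]
data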

    Name                        State           Established  Area (in km2)  Notability                                       Rivers and lakes inside the national park  Unnamed: 6
0   Anshi National Park         Karnataka       1987         417.34         Great hornbill, tiger, leopard, black panther,…  Kali River (Karnataka)                     NaN
1   Balphakram National Park    Meghalaya       1986         220.00         Wild water buffalo, red panda, elephant and ei…  NaN                                        NaN
2   Bandhavgarh National Park   Madhya Pradesh  1982         446.00         1336 species of endemic plants                   NaN                                        NaN
3   Bandipur National Park      Karnataka       1974         874.20         Chital, Bengal tiger, gray langurs, Indian gia…  Kabini River, Moyar River                  NaN
4   Bannerghatta National Park  Karnataka       1986         104.30         Tiger, sloth bear, peacock, elephant, sambar d…  NaN                                        NaN
...
100 Valmiki National Park       Bihar           1976         898.45         NaN                                              NaN                                        NaN
101 Vansda National Park        Gujarat         1979         23.99          NaN                                              NaN                                        NaN
102 Van Vihar National Park     Madhya Pradesh  1983         4.48           NaN                                              Indravati (kutri) National Park            Chhattisgarh
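Hard-coding data[1] relies on the layout of the Wikipedia page at the time of writing: if another table is added above the list, the index shifts. A hedged alternative, sketched below with a hypothetical pick_table helper and two stand-in DataFrames, is to take the first table that has the columns we expect.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame in `tables` that has all `required_columns`."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(table.columns):
            return table
    raise ValueError(f"no table has columns {sorted(required)}")


# Stand-ins for what pd.read_html might return: an unrelated table, then the park list.
tables = [
    pd.DataFrame({"Key": ["a"], "Value": [1]}),
    pd.DataFrame({"Name": ["Anshi National Park"], "State": ["Karnataka"]}),
]
parks = pick_table(tables, ["Name", "State"])
print(parks)
```

In practice this would be called as pick_table(pd.read_html(url), ['Name', 'State']). read_html() also accepts a match argument that keeps only tables containing a given string or regular expression, which narrows the list in a similar way.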

We only need some of these columns, so we drop the ones we will not use further and keep the result as a new dataset:

# Keep Name, State, Established and Area; drop the descriptive columns
data_cleaned = data.drop(columns=['Notability', 'Rivers and lakes inside the national park', 'Unnamed: 6'])
# data_cleaned
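The 'Unnamed: 6' column is an artefact of how this particular table was parsed, so it may disappear if the page changes, and drop() raises a KeyError for missing labels by default. The sketch below, built on two rows copied from the table above (with shortened Notability text), shows two ways around that: drop(..., errors='ignore'), or selecting the columns to keep.

```python
import pandas as pd

# Two rows copied from the scraped table (Notability text shortened).
data = pd.DataFrame({
    "Name": ["Anshi National Park", "Balphakram National Park"],
    "State": ["Karnataka", "Meghalaya"],
    "Established": [1987, 1986],
    "Area (in km2)": [417.34, 220.00],
    "Notability": ["Great hornbill, tiger", "Wild water buffalo, red panda"],
    "Rivers and lakes inside the national park": ["Kali River (Karnataka)", None],
})

# Option 1: drop unwanted columns; errors="ignore" skips labels that don't
# exist (here "Unnamed: 6"), so the call survives layout changes.
dropped = data.drop(
    columns=["Notability", "Rivers and lakes inside the national park", "Unnamed: 6"],
    errors="ignore",
)

# Option 2: select only the columns we need (a whitelist).
keep = ["Name", "State", "Established", "Area (in km2)"]
selected = data[keep]

print(dropped.equals(selected))  # True
```

Selecting a whitelist is usually the more robust choice when the source table may gain new columns, since only the named columns are ever kept.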

Here comes the most important step:

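We will use a for loop that iterates through the rows of our DataFrame. For each park, geocode() looks up the park’s name with the ArcGIS provider and returns a GeoDataFrame, which we store in a variable named ‘info’. We then copy the x (longitude) and y (latitude) of the returned point into two new columns, lon and lat.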

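for index, row in data_cleaned.iterrows():
    print(row['Name'])  # show progress
    # Look up the park name; extra keyword arguments (e.g. timeout=3)
    # are passed on to the underlying geopy geocoder
    info = geocode(str(row['Name']), provider='arcgis')
    # data_cleaned.loc[index, 'Address'] = info['address'].iloc[0]  # optional matched address
    point = info['geometry'].iloc[0]  # shapely Point: x = longitude, y = latitude
    data_cleaned.loc[index, 'lon'] = point.x
    data_cleaned.loc[index, 'lat'] = point.y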
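The loop above assumes every lookup succeeds and queries the service again for any repeated name. A more defensive version, assuming the geocoder can be wrapped in a function that returns a point with .x (longitude) and .y (latitude), is the hypothetical add_coordinates helper below: it caches results per name and writes NaN when a lookup fails. An offline stand-in geocoder (FakePoint and fake_results, both hypothetical) is used so the example runs without network access.

```python
import math
from collections import namedtuple

import pandas as pd

# Hypothetical stand-in for a shapely Point: only .x (lon) and .y (lat) are read.
FakePoint = namedtuple("FakePoint", ["x", "y"])


def add_coordinates(df, geocode_point, name_col="Name"):
    """Return a copy of df with 'lon' and 'lat' columns from geocode_point(name).

    geocode_point(name) must return an object with .x (longitude) and .y
    (latitude). If it raises, or returns something without .x/.y (e.g. None),
    that row gets NaN instead of stopping the whole loop.
    """
    out = df.copy()
    cache = {}  # name -> (lon, lat); repeated names are looked up only once
    lons, lats = [], []
    for name in out[name_col].astype(str):
        if name not in cache:
            try:
                point = geocode_point(name)
                cache[name] = (point.x, point.y)
            except Exception:  # broad on purpose: any failure becomes NaN
                cache[name] = (math.nan, math.nan)
        lon, lat = cache[name]
        lons.append(lon)
        lats.append(lat)
    out["lon"] = lons
    out["lat"] = lats
    return out


# Offline stand-in geocoder: unknown names raise KeyError.
fake_results = {"Anshi National Park": FakePoint(74.35637, 15.00966)}
parks = pd.DataFrame({"Name": ["Anshi National Park", "Nowhere Park"]})
print(add_coordinates(parks, lambda name: fake_results[name]))
```

With GeoPandas, the call might look like add_coordinates(data_cleaned, lambda name: geocode(name, provider='arcgis')['geometry'].iloc[0]).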

Let’s check our DataFrame: it now has two extra columns, lon and lat, holding the longitude and latitude returned for each park.

This means that we have geocoded the names of the national parks in India and identified their locations.
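Geocoding by name can silently return a place with the same name somewhere else. A cheap sanity check is to list the rows that have missing coordinates or fall outside a deliberately loose bounding box for India. The limits used below (6 to 38 degrees north, 68 to 98 degrees east) are an approximation chosen for this sketch, and the flag_outside_box helper and the example rows are hypothetical.

```python
import pandas as pd

# Assumed, deliberately loose bounding box for India (degrees). It only
# catches gross errors, e.g. a park name matched to a place on another continent.
LAT_RANGE = (6.0, 38.0)
LON_RANGE = (68.0, 98.0)


def flag_outside_box(df, lat_range=LAT_RANGE, lon_range=LON_RANGE):
    """Return the rows whose coordinates are missing or outside the box."""
    inside = df["lat"].between(*lat_range) & df["lon"].between(*lon_range)
    return df[~inside]  # NaN comparisons are False, so missing values are flagged


parks = pd.DataFrame({
    "Name": ["Anshi National Park", "Mismatched Park", "Missing Park"],
    "lat": [15.00966, 51.5, float("nan")],
    "lon": [74.35637, -0.12, float("nan")],
})
print(flag_outside_box(parks)["Name"].tolist())  # ['Mismatched Park', 'Missing Park']
```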

We will save the DataFrame in CSV format for further reference.

data_cleaned
    Name                             State           Established  Area (in km2)  lon       lat
0   Anshi National Park              Karnataka       1987         417.34         74.35637  15.00966
1   Balphakram National Park         Meghalaya       1986         220.00         90.64375  25.21499
2   Bandhavgarh National Park        Madhya Pradesh  1982         446.00         81.02458  23.72319
3   Bandipur National Park           Karnataka       1974         874.20         76.59760  11.72309
4   Bannerghatta National Park       Karnataka       1986         104.30         77.57742  12.80102
...
98  Tadoba National Park             Maharashtra     1955         625.00         79.46504  20.23460
99  Valley of Flowers National Park  Uttarakhand     1982         87.50          79.58708  30.72513
100 Valmiki National Park            Bihar           1976         898.45         83.96161  27.35111
101 Vansda National Park             Gujarat         1979         23.99          73.48605  20.76368
102 Van Vihar National Park          Madhya Pradesh  1983         4.48           77.36366  23.22284
data_cleaned.to_csv('national_parks_india.csv')
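By default, to_csv() also writes the DataFrame index as an unnamed first column, which comes back as 'Unnamed: 0' when the file is read again. The short sketch below, which uses an in-memory string instead of a file, shows the two usual remedies: index=False when writing, or index_col=0 when reading.

```python
import io

import pandas as pd

df = pd.DataFrame({"Name": ["Anshi National Park"], "lon": [74.35637], "lat": [15.00966]})

# Default to_csv() writes the row index too, which reappears as 'Unnamed: 0'...
with_index = pd.read_csv(io.StringIO(df.to_csv()))
print(list(with_index.columns))  # ['Unnamed: 0', 'Name', 'lon', 'lat']

# ...so either skip the index when writing,
no_index = pd.read_csv(io.StringIO(df.to_csv(index=False)))
print(list(no_index.columns))    # ['Name', 'lon', 'lat']

# ...or tell read_csv to use the first column as the index.
as_index = pd.read_csv(io.StringIO(df.to_csv()), index_col=0)
print(list(as_index.columns))    # ['Name', 'lon', 'lat']
```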

Plotting a Heat Map Using the Folium Library in Python

The primary purpose of a heat map is to visualize the density of locations or events within a dataset and to direct viewers towards the areas of the visualization that matter most.

Here, as the heat map shows, there are more national parks in the north and east of India than in the west and north-west.
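As a rough analogy for what a heat map summarises, the sketch below counts how many points fall into each cell of a regular latitude/longitude grid, using six parks from the geocoded table. Folium’s HeatMap plugin (built on the Leaflet.heat JavaScript library) draws smoothed intensity blobs rather than raw counts, so this illustrates the idea of density, not how the plugin renders it. The grid_density function and the 5-degree cell size are assumptions for this example.

```python
from collections import Counter

# (lat, lon) pairs copied from the geocoded table in this tutorial.
points = [
    (15.00966, 74.35637),  # Anshi
    (25.21499, 90.64375),  # Balphakram
    (23.72319, 81.02458),  # Bandhavgarh
    (11.72309, 76.59760),  # Bandipur
    (12.80102, 77.57742),  # Bannerghatta
    (23.22284, 77.36366),  # Van Vihar
]


def grid_density(points, cell_deg=5.0):
    """Count points per square cell of a regular lat/lon grid."""
    counts = Counter()
    for lat, lon in points:
        # Floor division gives the index of the cell containing the point.
        cell = (int(lat // cell_deg), int(lon // cell_deg))
        counts[cell] += 1
    return counts


for cell, n in grid_density(points).most_common():
    print(cell, n)  # the busiest cell (Bandipur + Bannerghatta) comes first
```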

To plot a heat map, we will use Folium along with its HeatMap plugin. Be careful: HeatMap expects a list of [latitude, longitude] pairs, with latitude first.

We will zip the lat and lon columns of our DataFrame into a list of (lat, lon) tuples:

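import folium
from folium.plugins import HeatMap

# HeatMap expects [latitude, longitude] pairs, so lat comes first in zip()
locations = list(zip(data_cleaned['lat'], data_cleaned['lon']))
map_ = folium.Map([28.59, 78.96], zoom_start=6)  # base map centred at (28.59, 78.96)
HeatMap(locations).add_to(map_)
map_  # renders the map inline in a Jupyter notebook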






map_.save('heatmap_national_parks.html')
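Two details matter when feeding HeatMap: the pairs must be in [latitude, longitude] order, and rows whose geocoding failed (NaN coordinates) should be removed first, since recent Folium versions reject NaN locations with a ValueError. The hypothetical to_heatmap_points helper below handles both. Its range check only catches some swapped columns (a latitude outside -90 to 90), not all of them.

```python
import math


def to_heatmap_points(lons, lats):
    """Pair longitudes and latitudes into [lat, lon] lists for folium's HeatMap.

    Pairs with a missing (None or NaN) coordinate are skipped. A latitude
    outside [-90, 90] raises ValueError; this catches some, but not all,
    swapped lon/lat columns.
    """
    points = []
    for lon, lat in zip(lons, lats):
        if lon is None or lat is None or math.isnan(lon) or math.isnan(lat):
            continue  # geocoding failed for this row
        if not -90 <= lat <= 90:
            raise ValueError(f"latitude {lat} is out of range; are lon/lat swapped?")
        points.append([lat, lon])  # HeatMap wants latitude first
    return points


print(to_heatmap_points([74.35637, math.nan], [15.00966, 20.0]))
# [[15.00966, 74.35637]]
```

It could stand in for the zip step, e.g. HeatMap(to_heatmap_points(data_cleaned['lon'], data_cleaned['lat'])).add_to(map_).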

Plotting Marker Clusters Using the Folium Library in Python

Marker clustering is a technique that creates a cluster at a particular marker and adds the markers that fall within its bounds. It repeats this process until every marker has been allocated to the closest grid-based cluster for the map’s current zoom level. Unlike a heat map, marker clusters let us attach a popup to each marker, which is shown whenever the user clicks on that marker.

Here, we will show each national park’s name as its popup.

# One popup (the park name) per marker, in the same order as `locations`
popup = data_cleaned['Name'].tolist()

from folium.plugins import MarkerCluster

m = folium.Map(location=[28.59, 78.96], zoom_start=5)
marker_cluster = MarkerCluster(
    locations=locations,  # the (lat, lon) pairs built for the heat map
    popups=popup,
    overlay=True,
    control=True,
    # icon_create_function=...  # optional JavaScript for custom cluster icons
)
marker_cluster.add_to(m)
m
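To see why clusters merge and split as you zoom, here is a simplified grid-based clustering sketch. It assumes the 256-pixel tiles Leaflet uses and square cells measured in degrees, ignoring Mercator stretching; it is not the exact algorithm of the Leaflet.markercluster library behind Folium’s MarkerCluster, only an illustration of grid-based grouping. The grid_clusters function and the cell_px value are hypothetical.

```python
from collections import defaultdict

TILE_SIZE = 256  # pixels per Web Mercator tile at zoom 0 (Leaflet default)


def grid_clusters(points, zoom, cell_px=80):
    """Group (lat, lon, label) points into grid cells about cell_px pixels wide.

    Simplification: cells are square in degrees, ignoring Mercator stretching,
    so cell size only approximates a fixed on-screen pixel distance.
    """
    deg_per_px = 360.0 / (TILE_SIZE * 2 ** zoom)  # longitude degrees per pixel
    cell_deg = cell_px * deg_per_px
    clusters = defaultdict(list)
    for lat, lon, label in points:
        key = (int(lat // cell_deg), int(lon // cell_deg))
        clusters[key].append(label)
    return list(clusters.values())


parks = [
    (11.72309, 76.59760, "Bandipur National Park"),
    (12.80102, 77.57742, "Bannerghatta National Park"),
    (15.00966, 74.35637, "Anshi National Park"),
]
for zoom in (3, 8):
    print(zoom, grid_clusters(parks, zoom))
```

With these three parks, zoom level 3 groups Bandipur and Bannerghatta into one cell while Anshi stays separate, and at zoom level 8 every park sits in its own cell. Fixed grid boundaries can also split points that are close together, which is one reason real clustering libraries use more elaborate schemes.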



That’s it: we have geocoded the national parks of India with GeoPandas and visualized them with Folium. Hope you enjoyed it!

Please follow my LinkedIn page for interesting updates.
