Geocoding:
Geocoding is the process of taking a text input, such as an address or the name of a place, and returning its latitude-longitude coordinates, i.e. its location on the Earth's surface.
Reverse Geocoding:
Reverse geocoding is the opposite process: it takes geographic coordinates (latitude-longitude) as input and returns a description of that location, such as a place name or an address.
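To make the two definitions concrete, here is a minimal sketch using a toy in-memory lookup. It is not a real geocoding service: the names GAZETTEER, toy_geocode, haversine_km and toy_reverse_geocode are made up for this illustration, and the coordinates are the ones obtained for three parks later in this tutorial.

```python
import math

# Toy gazetteer: place name -> (latitude, longitude) in decimal degrees.
GAZETTEER = {
    "Bandipur National Park": (11.72309, 76.59760),
    "Bannerghatta National Park": (12.80102, 77.57742),
    "Valmiki National Park": (27.35111, 83.96161),
}


def toy_geocode(name):
    """Forward geocoding: text in, (lat, lon) out; None if the name is unknown."""
    return GAZETTEER.get(name)


def haversine_km(a, b):
    """Great-circle distance in km between two (lat, lon) points."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def toy_reverse_geocode(point):
    """Reverse geocoding: (lat, lon) in, name of the nearest known place out."""
    return min(GAZETTEER, key=lambda name: haversine_km(GAZETTEER[name], point))


print(toy_geocode("Bandipur National Park"))
print(toy_reverse_geocode((12.9, 77.6)))
```

A real geocoder works against a far larger database and handles misspellings and ambiguous names, but the input and output shapes are the same.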
Why Do We Need Geocoding?
Compared to traditional address systems, geocoding has some advantages:
1) High Precision & Accuracy:
Traditional addresses can sometimes be imprecise, and they may contain spelling mistakes because of inevitable human error.
Geographic coordinates, on the other hand, identify a location precisely and therefore do not create confusion.
2) Universally Consistent:
Traditional address systems are bound to languages and common names, which can be misleading. For example, in India there are more than a thousand roads with names like Mahatma Gandhi Road. Such frequently recurring street names can easily lead to mix-ups.
Coordinates follow a universally accepted system, so the resulting data is consistent, straightforward and clear.
Let's Begin!
In this tutorial, we will use a table from Wikipedia.
Voilà, a little surprise! 🙂
We will geocode the names of the National Parks of India.
To start with, we need to import the required libraries which will help us to achieve our objectives:
1) Geocoding the addresses using Geopandas Library
2) Plotting and Visualization using Folium
The Geopandas library was introduced in previous articles.
lxml is a feature-rich and easy-to-use library for processing XML and HTML in Python; pandas relies on it to parse HTML tables.
import pandas as pd
import geopandas as gpd
from geopandas.tools import geocode
import lxml
We will use geopandas to access the geocode tool. The pandas library provides the read_html method, which reads the tables from the HTML page at the URL passed as its argument.
After reading the HTML with pandas, check the type of the returned variable. It is a list, and on closer inspection it is a list of data frames, one for each table found on the page.
We will take the list item at index 1 (i.e. the second table), which contains the national parks table we need.
data = pd.read_html('https://en.wikipedia.org/wiki/List_of_national_parks_of_India')
# type(data)
data = data[1]
data
Name | State | Established | Area (in km2) | Notability | Rivers and lakes inside the national park | Unnamed: 6 | |
---|---|---|---|---|---|---|---|
0 | Anshi National Park | Karnataka | 1987 | 417.34 | Great hornbill, tiger, leopard, black panther,… | Kali River (Karnataka) | NaN |
1 | Balphakram National Park | Meghalaya | 1986 | 220.00 | Wild water buffalo, red panda, elephant and ei… | NaN | NaN |
2 | Bandhavgarh National Park | Madhya Pradesh | 1982 | 446.00 | 1336 species of endemic plants | NaN | NaN |
3 | Bandipur National Park | Karnataka | 1974 | 874.20 | Chital, Bengal tiger, gray langurs, Indian gia… | Kabini River, Moyar River | NaN |
4 | Bannerghatta National Park | Karnataka | 1986 | 104.30 | Tiger, sloth bear, peacock, elephant, sambar d… | NaN | NaN |
… | … | … | … | … | … | … | … |
100 | Valmiki National Park | Bihar | 1976 | 898.45 | NaN | NaN | NaN |
101 | Vansda National Park | Gujarat | 1979 | 23.99 | NaN | NaN | NaN |
102 | Van Vihar National Park | Madhya Pradesh | 1983 | 4.48 | NaN | Indravati (kutri) National Park | Chhattisgarh |
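Because read_html returns a list of data frames, picking a table by its position can break if the page layout changes. As a hedged alternative, the sketch below selects the table by the column names we expect. The helper pick_table is hypothetical (it is not part of pandas), and the list of tables is a hand-built stand-in so the example runs without network access.

```python
import pandas as pd


def pick_table(tables, required_columns):
    """Return the first DataFrame whose columns include all required names."""
    required = set(required_columns)
    for table in tables:
        if required.issubset(map(str, table.columns)):
            return table
    raise ValueError(f"no table with columns {sorted(required)}")


# Stand-in for the list returned by pd.read_html(); the first table holds
# placeholder values only.
tables = [
    pd.DataFrame({"A": [1], "B": [2]}),
    pd.DataFrame({"Name": ["Anshi National Park"], "State": ["Karnataka"]}),
]

parks = pick_table(tables, ["Name", "State"])
print(parks)
```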
To keep only the columns we need, we drop the ones that will not be used further and build a cleaned dataset, as shown in the code below:
data_cleaned=data.drop(columns=['Notability', 'Rivers and lakes inside the national park', 'Unnamed: 6'])
#data_cleaned
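An alternative to dropping columns is to list the columns we want to keep. The sketch below shows this on a small stand-in data frame (two rows copied from the table above, with the unused columns left empty); it is only one possible way to do the cleaning step.

```python
import pandas as pd

# Small stand-in for the scraped table.
data = pd.DataFrame({
    "Name": ["Anshi National Park", "Balphakram National Park"],
    "State": ["Karnataka", "Meghalaya"],
    "Established": [1987, 1986],
    "Area (in km2)": [417.34, 220.00],
    "Notability": [None, None],
    "Unnamed: 6": [None, None],
})

keep = ["Name", "State", "Established", "Area (in km2)"]
# .copy() makes data_cleaned independent of data, which avoids a possible
# SettingWithCopyWarning when new columns are added later.
data_cleaned = data[keep].copy()
print(data_cleaned.columns.tolist())
```

Listing the columns to keep has the side benefit that any unexpected extra column on the page is ignored automatically.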
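Here comes the most important step:
We will use a for loop that iterates through the rows of our data frame, geocodes each park name and stores the result in a variable named 'info'. The longitude and latitude are then read from the geometry of the first result and written to the new 'lon' and 'lat' columns.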
for index, row in data_cleaned.iterrows():
    print(row['Name'])
    # Geocode the park name; a timeout can also be passed, e.g. timeout=3
    info = geocode(str(row['Name']), provider='arcgis')
    # data_cleaned.loc[int(index), 'Address'] = info['address'].loc[0]
    # The geometry is a point: x is the longitude, y is the latitude
    data_cleaned.loc[int(index), 'lon'] = info['geometry'].loc[0].x
    data_cleaned.loc[int(index), 'lat'] = info['geometry'].loc[0].y
Let's check our data frame: it now has two extra columns containing the longitude and latitude of each park.
This means that we have geocoded the names of the National Parks of India and identified their locations.
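A geocoding loop like the one above talks to an online service, so a single timeout or unmatched name can stop it halfway. The sketch below shows one way to keep going when a lookup fails. The names geocode_all, geocode_fn and fake_geocoder are hypothetical: geocode_fn stands for any function that takes a query string and returns (lat, lon), and the fake geocoder is used only so that the example runs offline.

```python
import time


def geocode_all(names, geocode_fn, pause_s=0.0):
    """Geocode each name; return {name: (lat, lon)}, with None for failures."""
    results = {}
    for name in names:
        try:
            results[name] = geocode_fn(name)
        except Exception as exc:  # e.g. a timeout or an unknown place
            print(f"could not geocode {name!r}: {exc}")
            results[name] = None
        time.sleep(pause_s)  # optional pause between requests
    return results


# Offline fake geocoder; the coordinates come from the results table below.
KNOWN = {"Tadoba National Park": (20.23460, 79.46504)}


def fake_geocoder(query):
    return KNOWN[query]  # raises KeyError for names it does not know


coords = geocode_all(["Tadoba National Park", "Not A Real Park"], fake_geocoder)
print(coords)
```

In this tutorial's setting, geocode_fn could wrap the geopandas call used above and return the y and x values of the first result's geometry.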
We will also save the data frame as a CSV file for later use.
data_cleaned
Name | State | Established | Area (in km2) | lon | lat | |
---|---|---|---|---|---|---|
0 | Anshi National Park | Karnataka | 1987 | 417.34 | 74.35637 | 15.00966 |
1 | Balphakram National Park | Meghalaya | 1986 | 220.00 | 90.64375 | 25.21499 |
2 | Bandhavgarh National Park | Madhya Pradesh | 1982 | 446.00 | 81.02458 | 23.72319 |
3 | Bandipur National Park | Karnataka | 1974 | 874.20 | 76.59760 | 11.72309 |
4 | Bannerghatta National Park | Karnataka | 1986 | 104.30 | 77.57742 | 12.80102 |
… | … | … | … | … | … | … |
98 | Tadoba National Park | Maharashtra | 1955 | 625.00 | 79.46504 | 20.23460 |
99 | Valley of Flowers National Park | Uttarakhand | 1982 | 87.50 | 79.58708 | 30.72513 |
100 | Valmiki National Park | Bihar | 1976 | 898.45 | 83.96161 | 27.35111 |
101 | Vansda National Park | Gujarat | 1979 | 23.99 | 73.48605 | 20.76368 |
102 | Van Vihar National Park | Madhya Pradesh | 1983 | 4.48 | 77.36366 | 23.22284 |
data_cleaned.to_csv('national_parks_india.csv')
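By default, to_csv also writes the row index as an extra first column. If that column is not wanted when the file is read back, index=False can be passed. A minimal sketch, using an in-memory buffer instead of a file on disk and two rows from the table above:

```python
import io

import pandas as pd

df = pd.DataFrame({
    "Name": ["Vansda National Park", "Van Vihar National Park"],
    "lon": [73.48605, 77.36366],
    "lat": [20.76368, 23.22284],
})

buffer = io.StringIO()          # in-memory stand-in for a file on disk
df.to_csv(buffer, index=False)  # index=False leaves out the 0, 1, 2, ... column
buffer.seek(0)

restored = pd.read_csv(buffer)
print(restored.columns.tolist())
```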
Plotting a Heat Map Using the Folium Library in Python
The primary purpose of a heat map is to visualize the density of locations or events within a dataset and to direct viewers towards the areas that matter most.
Here, the heat map shows that there are more national parks in the north and east of India than in the western and north-western zones.
To plot a heat map, we will use Folium along with its HeatMap plugin. Be careful: HeatMap expects the points as a list of (lat, lon) tuples, with latitude first.
We will zip the lat and lon values from our data frame to create that list of tuples.
import os
import folium
from folium.plugins import HeatMap
locations = list(zip(data_cleaned['lat'], data_cleaned['lon']))
map_= folium.Map([28.59, 78.96], zoom_start=6)
HeatMap(locations).add_to(map_)
map_
[Interactive heat map: https://geospatialawarenesshub.com/wp-content/uploads/2020/03/heatmap_national_parks.html]
map_.save('heatmap_national_parks.html')
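If geocoding failed for some rows, their lat or lon may be missing (NaN), and such points cannot be placed on a map. The sketch below builds the list of (lat, lon) tuples while skipping incomplete pairs. The helper build_locations is a hypothetical name, and the NaN in the sample data is inserted on purpose to show the filtering.

```python
import math


def build_locations(lats, lons):
    """Pair latitudes with longitudes, skipping pairs with a missing value."""
    return [
        (lat, lon)
        for lat, lon in zip(lats, lons)
        if not (math.isnan(lat) or math.isnan(lon))
    ]


lats = [15.00966, float("nan"), 23.72319]
lons = [74.35637, 90.64375, 81.02458]

locations = build_locations(lats, lons)
print(locations)
```

Note the order inside each tuple: latitude first, then longitude, which is the order used for the map centre and the heat map above.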
Plotting Marker Clusters Using the Folium Library in Python
Marker clustering is a technique that groups nearby markers into a single cluster marker. The process repeats until every marker is assigned to its closest grid-based cluster for the map's current zoom level, so clusters split apart as the user zooms in. Unlike heat maps, marker clusters let us attach a popup to each marker, which is shown whenever the user clicks on it.
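To give a feel for grid-based grouping, here is a deliberately simplified sketch: each point is assigned to the grid cell it falls in, and a larger cell size plays the role of a lower zoom level. This is only an illustration of the idea and not the algorithm the MarkerCluster plugin actually runs; grid_cluster and cell_deg are names invented for this example.

```python
import math
from collections import defaultdict


def grid_cluster(points, cell_deg):
    """Group (lat, lon) points by the grid cell of size cell_deg they fall in."""
    clusters = defaultdict(list)
    for lat, lon in points:
        cell = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        clusters[cell].append((lat, lon))
    return dict(clusters)


points = [
    (11.72309, 76.59760),  # Bandipur National Park
    (12.80102, 77.57742),  # Bannerghatta National Park
    (15.00966, 74.35637),  # Anshi National Park
    (27.35111, 83.96161),  # Valmiki National Park
]

zoomed_out = grid_cluster(points, cell_deg=10.0)  # coarse grid, few clusters
zoomed_in = grid_cluster(points, cell_deg=1.0)    # fine grid, more clusters
print(len(zoomed_out), len(zoomed_in))
```

With the coarse grid, the three parks in Karnataka share one cell, while the fine grid puts every park in its own cell, which mirrors how clusters break up when zooming in.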
Here, we will show the National Park’s name as a popup
popup= [names for names in data_cleaned['Name']]
#popup
from folium.plugins import MarkerCluster
m = folium.Map(location=[28.59, 78.96], zoom_start=5)
marker_cluster = MarkerCluster(
    locations=locations,
    popups=popup,
    overlay=True,
    control=True,
    # icon_create_function=icon_create_function
)
marker_cluster.add_to(m)
m
[Interactive marker cluster map: https://geospatialawarenesshub.com/wp-content/uploads/2020/03/MarkerCluster_national_park.html]
With that, we are done with geocoding using Geopandas. Hope you enjoyed it!