Over the past few decades, remote sensing has advanced considerably, driven by progress in both sensor technology and computational methods. Several types of remote sensing are available, including multispectral, hyperspectral, microwave, and LIDAR remote sensing.

Hyperspectral imagery (HSI) is distinguished by its high spectral resolution, which has made it increasingly popular in recent years. A hyperspectral image consists of hundreds to thousands of spectral bands. The sensors usually sample the reflective portion of the electromagnetic spectrum, from the visible to the shortwave infrared region.

Spectral resolution can be described by the number of spectral channels and the range of the electromagnetic spectrum measured by the sensor. A sensor that covers a wide range of the spectrum but acquires only a small number of channels has low spectral resolution. A sensor that operates over a smaller range but captures a large number of spectral bands has high spectral resolution, which gives it the potential to differentiate objects that have a similar spectral response. Different objects on the earth’s surface interact differently with electromagnetic radiation: each object shows distinct patterns of reflection or emission in different regions of the electromagnetic spectrum [1]. Hence HSI is widely used in studies of individual plant monitoring, crop classification, weed extraction, yield prediction, etc.
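As a rough illustration of the two cases described above, the average spacing between channels can be estimated by dividing the covered range by the number of channels. The two sensors below are hypothetical; their ranges and channel counts are made up only to show the arithmetic, and the helper name is our own.

```python
def mean_sampling_interval_nm(range_start_nm, range_end_nm, n_channels):
    """Average spacing between channels, assuming evenly spaced, contiguous channels."""
    return (range_end_nm - range_start_nm) / n_channels

# Hypothetical sensors (illustrative values, not taken from any real instrument)
wide_few = mean_sampling_interval_nm(400, 2500, 7)        # wide range, few channels
narrow_many = mean_sampling_interval_nm(400, 1000, 200)   # smaller range, many channels

print(f"wide range, 7 channels:      {wide_few:.0f} nm per channel")
print(f"smaller range, 200 channels: {narrow_many:.0f} nm per channel")
```

The second sensor samples the spectrum about a hundred times more finely, even though it covers a smaller range.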

Hyperspectral sensors can be mounted on satellite, airborne, and drone (UAV) platforms.

A 3D hyperspectral data cube contains n1*n2*d values: n1*n2 pixels, where n1 and n2 are the width and height of each spectral channel, and d is the number of spectral channels (Figure 1).
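A minimal sketch of such a cube as a NumPy array may help. It uses random synthetic data and assumes the two spatial axes come first and the spectral axis last, which matches the shape of the Indian Pines array printed later in this post.

```python
import numpy as np

# Synthetic cube for illustration: n1 x n2 pixels, d spectral channels
n1, n2, d = 4, 5, 10
rng = np.random.default_rng(0)
cube = rng.random((n1, n2, d))

band_image = cube[:, :, 3]       # one spectral channel as a 2D image, shape (n1, n2)
pixel_spectrum = cube[2, 1, :]   # all d channel values of a single pixel, shape (d,)

print(cube.shape, band_image.shape, pixel_spectrum.shape)
print(cube.size == n1 * n2 * d)
```

Slicing along the last axis gives an image for one band, while fixing the two spatial indices gives the spectrum of one pixel.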

Figure 1: A generic scheme of HSI mapping of soil, vegetation and water (Source: [1])

Hyperspectral imagery can be used in applications such as precision agriculture, water management, food quality analysis, defense, medical diagnosis, mineralogical mapping of the earth's surface, and archaeology.

That should give you a brief idea of what hyperspectral remote sensing is. Now let’s see how to download some standard hyperspectral datasets. You can follow this website to download a standard dataset along with its ground truth. To download AVIRIS (airborne) data you can use this link, and you can download EO-1 (Hyperion) images here.

For our analysis, we use the standard hyperspectral images available from the first link mentioned above. We will now see how to read and pre-process the downloaded image using Python.

Import required libraries

import numpy as np
import scipy.io as sio
import sklearn
from sklearn.decomposition import PCA
from sklearn import preprocessing
import matplotlib.pyplot as plt

Import the dataset

The downloaded image is in .mat file format. In Python, we can use the loadmat function from the scipy.io library to read files with the .mat extension.

# A forward slash avoids the invalid '\I' escape and also works on Windows
indian_pines = sio.loadmat('Dataset/Indian_pines_corrected.mat')
print(indian_pines)

Output:

The variable indian_pines is a dictionary, out of which we only need the data associated with the key ‘indian_pines_corrected’. indian_pines_key holds the 4 keys of the dictionary, and indian_pines_IN takes the data associated with the key ‘indian_pines_corrected’.

# Collect the dictionary keys; the data key is the fourth entry
indian_pines_key = list(indian_pines.keys())
indian_pines_IN = indian_pines[indian_pines_key[3]]
print(indian_pines_IN)

Output:

# To check the shape of the dataset

print(indian_pines_IN.shape)

Output:

(145, 145, 200)
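Picking the data by its position in the key list works for this file, but it depends on the order of the dictionary entries. A minimal sketch of an alternative, looking the array up by name, is shown here on a small synthetic file. The file name demo.mat and the variable name demo_cube are placeholders and are not part of the Indian Pines download.

```python
import os
import tempfile

import numpy as np
import scipy.io as sio

# Write a small synthetic .mat file so the sketch runs without the real dataset
tmp_dir = tempfile.mkdtemp()
mat_path = os.path.join(tmp_dir, "demo.mat")
sio.savemat(mat_path, {"demo_cube": np.zeros((3, 4, 5))})

mat = sio.loadmat(mat_path)

# Metadata entries added by loadmat start with a double underscore
data_keys = [k for k in mat.keys() if not k.startswith("__")]
print(data_keys)

# Look the array up by name instead of by position
cube = mat["demo_cube"]
print(cube.shape)
```

For the file used in this post, the equivalent lookup would be indian_pines['indian_pines_corrected'], using the key named earlier.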

To plot one band of the dataset:

fig = plt.figure(figsize = (10,10))
plt.imshow(indian_pines_IN[:,:,2], interpolation='nearest')
plt.show()

Dimensionality Reduction using PCA

Now we will look at how to perform Dimensionality Reduction (DR) on hyperspectral images and why it is needed. Because hyperspectral imagery has a large number of spectral bands, processing and storing these images takes considerable time and space. DR techniques reduce the number of spectral channels, which can in turn improve performance. How do we decide which bands to remove and which are more important? This is what DR methods take care of. Some of the DR methods are PCA, NMF, and SVD.

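We will look at one of the most widely used DR techniques, Principal Component Analysis (PCA). It is a linear dimensionality reduction technique that works on the correlation and variance of the bands. Rather than selecting individual bands, it projects the data onto new axes (the principal components), each a linear combination of the original bands, ordered by the amount of variance they capture. The components are computed from the eigenvalues and eigenvectors of the covariance matrix of the data.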
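The eigenvalue and eigenvector computation can be sketched directly in NumPy. This is only an illustration of the idea on synthetic data, under the assumption that PCA is done by eigen-decomposition of the covariance matrix; it is not how scikit-learn implements PCA internally, and all variable names are our own.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic data: 500 samples, 6 correlated "bands" built from 2 latent factors
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.05 * rng.normal(size=(500, 6))

# 1. Center each band
Xc = X - X.mean(axis=0)

# 2. Covariance matrix between bands (6 x 6)
cov = np.cov(Xc, rowvar=False)

# 3. Eigen-decomposition; eigh handles symmetric matrices and returns ascending eigenvalues
eigvals, eigvecs = np.linalg.eigh(cov)
order = np.argsort(eigvals)[::-1]
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

# 4. Project the centered data onto the top-k eigenvectors
k = 2
scores = Xc @ eigvecs[:, :k]

# Share of the total variance carried by each component
explained_ratio = eigvals / eigvals.sum()
print(scores.shape)
print(np.round(explained_ratio, 3))
```

Because the six synthetic bands are driven by only two underlying factors, the first two components carry nearly all of the variance.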

Scale the data

Standardize the dataset, since PCA is sensitive to the scale of the data. Before that, reshape the data into 2D form, because PCA and StandardScaler accept 2D data.

# Flatten the two spatial axes into one; keep the spectral axis
indian_pine_data = indian_pines_IN.reshape(-1, indian_pines_IN.shape[2])

# New shape of the data is
print(indian_pine_data.shape)

Output:

(21025, 200)
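To see what this reshape does to the pixel order, here is a small sketch on a synthetic cube standing in for the (145, 145, 200) image. The sizes are arbitrary and chosen only to keep the example readable.

```python
import numpy as np

# Small synthetic cube standing in for the real image
rows, cols, bands = 3, 4, 5
cube = np.arange(rows * cols * bands).reshape(rows, cols, bands)

# Flatten the two spatial axes; each row of `flat` is one pixel's spectrum
flat = cube.reshape(-1, bands)

# With NumPy's default row-major order, pixel (r, c) lands at row r * cols + c
r, c = 2, 1
print(np.array_equal(flat[r * cols + c], cube[r, c, :]))

# Reshaping back restores the original cube exactly
restored = flat.reshape(rows, cols, bands)
print(np.array_equal(restored, cube))
```

Both checks print True, which is why the reduced data can later be reshaped back into an image without scrambling the pixels.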

To check the dataset-

print(indian_pine_data)

Standardize data

from sklearn.preprocessing import StandardScaler
indian_pine_data = StandardScaler().fit_transform(indian_pine_data)

To check shape of the data-

print(indian_pine_data.shape)

Output:

(21025, 200)

print(indian_pine_data)

Looking at the printed values, you can see how the data has been rescaled: each band now has zero mean and unit variance.
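The scaling step can be reproduced by hand to make it less of a black box. The sketch below assumes the default StandardScaler behaviour, which to our understanding is to subtract each column's mean and divide by its population standard deviation. The data is synthetic.

```python
import numpy as np

rng = np.random.default_rng(1)
# Synthetic pixels-by-bands matrix with very different scales per band
X = rng.normal(loc=[1000.0, 5.0, -20.0], scale=[300.0, 0.5, 4.0], size=(1000, 3))

mean = X.mean(axis=0)
std = X.std(axis=0)            # population standard deviation (ddof=0)
X_scaled = (X - mean) / std

# After scaling, every band has mean close to 0 and standard deviation close to 1
print(np.allclose(X_scaled.mean(axis=0), 0.0))
print(np.allclose(X_scaled.std(axis=0), 1.0))
```

Without this step, the band with the largest numeric range would dominate the variance that PCA tries to capture.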

Here we are going to reduce the dimensions from 200 to 20.

from sklearn.decomposition import PCA
pca_decompostn = PCA(n_components=20)
indian_pine_data_pca = pca_decompostn.fit_transform(indian_pine_data)

You can find the fraction of the total variance explained by each principal component using the explained_variance_ratio_ attribute.

print(pca_decompostn.explained_variance_ratio_)

Output:

From the above result we can say that almost 68% of the variance is captured by the first principal component, 19% by the second principal component, and so on.
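One common use of these ratios is to check how many components are needed to reach a chosen share of the variance. The helper below is a hypothetical sketch of that check. Only the first two ratios follow the figures quoted above; the remaining values are made up for the example.

```python
import numpy as np

def components_for_variance(ratios, target):
    """Smallest number of leading components whose cumulative ratio reaches `target`."""
    cumulative = np.cumsum(ratios)
    # Small tolerance guards against floating-point round-off in the running sum
    reached = np.flatnonzero(cumulative >= target - 1e-12)
    if reached.size == 0:
        raise ValueError("target is not reached by the given components")
    return int(reached[0]) + 1

# Illustrative ratios: 0.68 and 0.19 come from the text, the rest are invented
ratios = np.array([0.68, 0.19, 0.05, 0.03, 0.01])

print(components_for_variance(ratios, 0.85))
print(components_for_variance(ratios, 0.95))
```

With these illustrative numbers, two components reach 85% and four components reach 95%. On real data you would pass pca_decompostn.explained_variance_ratio_ instead.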

print(indian_pine_data_pca.shape)

Output:

(21025, 20)

To reshape the array back into image form:

indian_pine_data_pca_new = indian_pine_data_pca.reshape(145,145,20)
print(indian_pine_data_pca_new.shape)

Output:

(145, 145, 20)

To display one principal component as an image (index 1 is the second component):

fig = plt.figure(figsize = (10,10))
plt.imshow(indian_pine_data_pca_new[:,:,1], interpolation='nearest')
plt.show()
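The steps above (reshape, standardize, PCA, reshape back) can be gathered into one function. This is a sketch rather than a definitive implementation: the function name reduce_cube is our own, and it is demonstrated on a random synthetic cube instead of the Indian Pines image.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

def reduce_cube(cube, n_components):
    """Standardize each band, apply PCA, and return a (rows, cols, n_components) cube."""
    rows, cols, bands = cube.shape
    flat = cube.reshape(-1, bands).astype(float)
    flat = StandardScaler().fit_transform(flat)
    pca = PCA(n_components=n_components)
    reduced = pca.fit_transform(flat)
    return reduced.reshape(rows, cols, n_components), pca

# Synthetic cube standing in for the real image
rng = np.random.default_rng(0)
demo_cube = rng.random((10, 12, 30))

reduced_cube, pca = reduce_cube(demo_cube, n_components=5)
print(reduced_cube.shape)
print(pca.explained_variance_ratio_.shape)
```

Taking the spatial size from cube.shape avoids hard-coding 145 and 145, so the same function works for images of other sizes.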

In this tutorial, we discussed what hyperspectral imagery is, how to download and read the data using Python, and how to apply PCA to hyperspectral imagery.

References:

  1. Govender, Megandhren, K. Chetty, and Hartley Bulcock. “A review of hyperspectral remote sensing and its application in vegetation and water resource studies.” Water SA 33.2 (2007).
  2. https://scikit-learn.org/stable/modules/generated/sklearn.decomposition.PCA.html
  3. http://lesun.weebly.com/hyperspectral-data-set.html
  4. https://docs.scipy.org/doc/scipy/reference/generated/scipy.io.loadmat.html
  5. https://itres.com/

 

For any queries related to this blog, you can reach me at Anagha P.
