The Effects of Dimension Reduction in Hyperspectral Data Classification

Document Type:Thesis


Name________________________________Signature_____________________Date________

Dedication

I dedicate this project to my family, parents, teachers, and friends, who have supported me thus far financially, morally, and emotionally in the pursuit of my postgraduate degree. Above all, I give thanks and glory to the Almighty God for the care and the gift of good life He has bestowed on me throughout this noble course.

Abstract

Dimension reduction and hyperspectral imaging systems have recently gained significant attention from researchers and institutions. Through active data acquisition with sensors, middle-infrared wavelengths can be captured across the available spectral channels from a defined area of the Earth's surface. Hyperspectral sensors provide detailed spectral information, which is valuable for increasing accuracy and for discriminating between materials of interest.

Table of Contents

Introduction
Hyperspectral Remote Sensing
SVM Based Dimensionality Reduction
PCA Based Dimensionality Reduction
ISOMAP Based Dimensionality Reduction
Data Classification and Decision Making
Support Vector Machine and Decision Making
ISOMAP Dimensionality Reduction and Decision Making
PCA Dimensionality Reduction and Decision Making
Methodology and Experiments
Separability Analysis
Data and Dataset Validation
Software
Conclusion and Future Work
Bibliography

Hyperspectral Remote Sensing

Also known as imaging spectroscopy, hyperspectral remote sensing measures the spectra of electromagnetic radiation received from a target point or source. Hyperspectral remote sensing combines elements of spectroscopy and imaging into a single large data set: images are collected in the x-y plane and stacked along a z-direction, so that each pixel represents a spectral signature of the imaged material (Burgers et al.

A hyperspectral image is typically a contiguous set of co-registered bands. By contrast, multispectral images consist of discrete spectral bands. In most cases, however, hyperspectral images have narrow bands, ranging from 5 micrometres to 20 micrometres depending on the thermal range of the infrared regions.

In essence, the SVM makes it possible to train and test on datasets without sacrificing the prediction accuracy required during text classification. This is especially important when the dimension of the input space must be significantly reduced.

PCA Based Dimensionality Reduction

Principal Component Analysis (PCA) is one of the most classical methods used to reduce the dimensionality of high-dimensional observations. Ideally, the pre-processing steps for hyperspectral data include simplification, dimensionality reduction, cleaning and supervised training.

In Principal Component Analysis, maximum data variability is achieved by modelling the covariance structure of the data, as opposed to other reduction techniques such as Independent Component Analysis (ICA), where the models are based on correlation data structures. Now, letting $H(\cdot)$ be some function that measures the degree to which the projection $a^{T}X$ is interesting for a data matrix $X$, the projection index for $a$ is the real-valued function $I: \mathbb{R}^{K} \to \mathbb{R}$ defined by $I(a) = H(a^{T}X)$.

Identification of Research

According to a significant body of previous research, dimensionality reduction is associated with a proportionate increase in classification accuracy. However, clear guidelines for selecting dimensionality reduction procedures using SVM, PCA and the projection index have not been fully defined. Since most datasets do not follow normal distributions, it is prudent to treat such high-dimensional data with both linear and nonlinear techniques, both of which are available in SVM procedures.
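The projection index I(a) = H(a^T X) introduced in this section can be illustrated with a minimal sketch. It assumes that H is the variance of the projected data (the choice that corresponds to PCA); the toy data, the row-per-sample layout and the function names are hypothetical and are not taken from the thesis framework.

```python
from statistics import pvariance

def project(a, X):
    """Return the 1-D projection a^T x for every sample x (one row of X per sample)."""
    return [sum(ai * xi for ai, xi in zip(a, x)) for x in X]

def projection_index(a, X, H=pvariance):
    """I(a) = H(a^T X). H defaults to the variance, which is an assumption of this sketch."""
    return H(project(a, X))

# Hypothetical toy data: 4 samples, K = 2 features, spread mostly along the first axis.
X = [[-2.0, 0.1], [-1.0, -0.1], [1.0, 0.1], [2.0, -0.1]]

index_first_axis = projection_index([1.0, 0.0], X)
index_second_axis = projection_index([0.0, 1.0], X)
print(index_first_axis, index_second_axis)
```

Under this choice of H, the direction along the first axis scores higher, so it would be preferred as the more "interesting" projection. Other choices of H would rank directions differently.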

The number of training samples (i.e. training pixels) required to maintain minimum statistical confidence and functionality in hyperspectral data for classification purposes grows exponentially, making it very difficult to address this issue adequately. For example, to classify 10 land cover classes using hundreds or thousands of hyperspectral narrowbands (HNBs), large training samples are needed for each class to establish the statistical integrity of the classification, whereas broadband data such as Landsat can be classified with significantly fewer training samples per class. Likewise, the greater dimension of hyperspectral data allows a greater number of classes to be distinguished. Naturally, it is a great advantage to have a large number of HNBs when classifying complex land cover classes. For similarity search, the curse of dimensionality means that the number of objects in the data set that need to be accessed grows exponentially with the underlying dimensionality.
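One simplified way to see the exponential growth described above is to count the cells of a grid that splits every band into the same number of bins. The sketch below assumes that at least one training sample is wanted per cell; the bin count and function names are illustrative assumptions, not figures from the thesis.

```python
def cells_needed(bins_per_band: int, n_bands: int) -> int:
    """Number of grid cells when each band is split into bins_per_band intervals."""
    return bins_per_band ** n_bands

def samples_needed(bins_per_band: int, n_bands: int, samples_per_cell: int = 1) -> int:
    """Training samples required to place samples_per_cell samples in every cell."""
    return samples_per_cell * cells_needed(bins_per_band, n_bands)

# Hypothetical setting: 10 bins per band, increasing number of bands.
growth = {n_bands: samples_needed(10, n_bands) for n_bands in (1, 2, 3, 6)}
for n_bands, count in growth.items():
    print(n_bands, "bands ->", count, "samples")
```

Even at six bands the count reaches one million, which suggests why dimension reduction is applied before classifying data with hundreds of bands.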

The curse of dimensionality is an obstacle to solving dynamic optimization problems by backward induction. In addition, it complicates machine learning problems when a state of nature must be learned from a finite number of data samples in a high-dimensional feature space. Finally, the curse of dimensionality seriously affects query performance for similarity search over multidimensional indexes.

Dimension Reduction and Extraction of Features

The ecological gradient, or ordination axis, is an informative feature in the ecological ordination problem. Feature extraction is, therefore, also an important computational procedure.

SVM Data Classification and Dimensionality Reduction

PCA Data Classification and Dimensionality Reduction

ISOMAP Data Classification and Dimensionality Reduction

Data Classification and Decision Making

The results of the experimental study confirm the efficiency of the proposed approaches for Big Data classification.

ISOMAP Dimensionality Reduction and Decision Making

When performing visualization and classification, people often confront the problem of dimensionality reduction. Isomap is one of the most promising nonlinear dimensionality reduction techniques. However, when Isomap is applied to real-world data, it shows some limitations, such as being sensitive to noise. In this paper, an improved version of Isomap, namely S-Isomap, is proposed.

PCA is an unsupervised linear dimensionality reduction technique that seeks the subspace of the data with the greatest variance and then projects the input data onto it. PCA may not give good classification performance, since it does not take class differences in the data into account. On the other hand, LDA is a powerful traditional statistical technique for supervised dimensionality reduction and has been applied successfully in many applications that deal with high-dimensional data.

LDA differs from PCA in that it reduces dimensionality while preserving as much of the class-discriminatory information as possible. It explicitly attempts to model the difference between the classes of data.

SVMDRC covers the development of the framework and its application to the hyperspectral image, while SVMC handles SVM classification of the hyperspectral image. Both components of the research method are essential for understanding the concepts. No dimensionality reduction is carried out in the case of SVMC, whereas in SVMDRC dimensionality reduction is performed together with classification. Once the training samples have been provided, the SVM classifier determines the position of the hyperplane and generates the support vectors that separate the classes of the image.

The research method is performed in several steps. After the initial step, the regions of interest comprising the pure endmembers are identified and then converted to TIFF format to facilitate further processing of the image. Dimensionality reduction and image classification are performed in a single process using the unified framework developed. When the hyperspectral image is supplied to the framework and the region of interest is to be classified, the support vector machine (SVM) performs both the dimensionality reduction and the classification.

Support Vector Machine (SVM)

The machine solves regression and classification problems according to statistical learning theory and structural risk minimization. SVM uses nonlinear transformations to map the data into a higher-dimensional space in which the classes can be separated linearly.

Consider separable data: a training set of l samples with class labels $y_i \in \{+1, -1\}$ and inputs $x_i \in \mathbb{R}^{2}$. Class A is represented by +1 and class B by -1. The SVM introduces a hyperplane that places the points labelled -1 and +1 on opposite sides. The hyperplane separates the classes so that the closest vectors of the two classes achieve the maximum separation, according to the equation $w \cdot x + b = 0$. With slack variables $\xi_i \ge 0$, the constraint is given by $y_i (w \cdot x_i + b) - 1 + \xi_i \ge 0$. The optimal hyperplane is obtained by solving

$$\min_{w,\, b,\, \xi_i} \left( \frac{1}{2}\lVert w \rVert^{2} + C \sum_{i=1}^{l} \xi_i \right)$$

where C is a constant that penalises large values of $\xi_i$ and thereby determines the appropriate hyperplane of the classifier. Polynomial kernels and the radial basis function (RBF) kernel are the kernel types used for the nonlinear SVM.
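To make the role of the slack variables and the constant C concrete, the following minimal sketch evaluates the soft-margin objective for a fixed hyperplane. It does not train an SVM; the toy samples, the chosen w and b, and the value of C are hypothetical, and the function names are assumptions of this example.

```python
def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(ui * vi for ui, vi in zip(u, v))

def slacks(w, b, X, y):
    """Smallest xi_i >= 0 satisfying y_i (w . x_i + b) - 1 + xi_i >= 0 for each sample."""
    return [max(0.0, 1.0 - yi * (dot(w, xi) + b)) for xi, yi in zip(X, y)]

def objective(w, b, X, y, C):
    """Soft-margin objective: 1/2 ||w||^2 + C * sum of slack variables."""
    return 0.5 * dot(w, w) + C * sum(slacks(w, b, X, y))

# Hypothetical toy data in R^2; the last sample lies on the wrong side of the hyperplane.
X = [[2.0, 0.0], [3.0, 1.0], [-2.0, 0.0], [0.5, 0.0]]
y = [1, 1, -1, -1]
w, b, C = [1.0, 0.0], 0.0, 10.0

xi = slacks(w, b, X, y)
value = objective(w, b, X, y, C)
print(xi, value)
```

Only the misplaced sample receives a non-zero slack, and a larger C makes that violation more expensive relative to the margin term.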

ISOMAP

ISOMAP preserves the distances between data samples using classical multi-dimensional scaling (MDS), after refining the distances according to the distribution of neighbouring data points. ISOMAP adds procedures that estimate the geodesic length between data points from the shortest paths through a graph. These procedures help to determine the intrinsic geometry of the multidimensional manifold. ISOMAP begins by building a neighbourhood graph, then computes the shortest paths through the graph, and finally uses these paths to approximate the geodesic distances.

In PCA, the maximum variance is obtained by choosing the vector w according to $\max_{w} \operatorname{var}(w^{T}x) = \operatorname{cov}(w^{T}x, w^{T}x) = w^{T}Cw$. Since $C = E(xx^{T})$ for mean-centred x, and w is constrained to unit length, the problem reduces to the eigenvalue equation $Cw = \theta w$. The matrix C is symmetric and positive semidefinite.
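The eigenvalue relation Cw = θw can be checked numerically on a small example. The sketch below is one simple way to do so, using power iteration on the covariance matrix of hypothetical two-dimensional data; power iteration is a choice made for this illustration and is not claimed to be the procedure used in the thesis.

```python
import math

def covariance(X):
    """Covariance matrix C = E(x x^T) of the mean-centred rows of X."""
    n, k = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(k)]
    centred = [[row[j] - means[j] for j in range(k)] for row in X]
    return [[sum(r[a] * r[b] for r in centred) / n for b in range(k)] for a in range(k)]

def mat_vec(C, w):
    """Matrix-vector product C w."""
    return [sum(cij * wj for cij, wj in zip(row, w)) for row in C]

def leading_eigenpair(C, iterations=200):
    """Power iteration: apply C repeatedly and renormalise to approach the top eigenvector."""
    w = [1.0] + [0.0] * (len(C) - 1)
    for _ in range(iterations):
        v = mat_vec(C, w)
        norm = math.sqrt(sum(vi * vi for vi in v))
        w = [vi / norm for vi in v]
    theta = sum(wi * ci for wi, ci in zip(w, mat_vec(C, w)))  # theta = w^T C w
    return w, theta

# Hypothetical data whose main spread lies along the diagonal direction (1, 1).
X = [[2.0, 2.0], [1.0, 1.0], [-1.0, -1.0], [-2.0, -2.0], [1.0, -1.0], [-1.0, 1.0]]
C = covariance(X)
w, theta = leading_eigenpair(C)
print(w, theta)
```

The returned θ is the variance of the data projected onto w, so the pair (w, θ) describes the first principal component of this toy data set.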

The projected variable with the maximal variance is determined by the eigenvector w and the eigenvalue θ. Moreover, PCA can perform dimensionality reduction if the input variables are linearly related. The components are ordered by decreasing variability, so the first component has the maximum variability and carries the largest amount of information. Unfortunately, no standard is available for choosing the number of components, so it is hard to identify the maximum number of bands needed to obtain the dimensionally reduced image. In the traditional method, the bands were selected by identifying those that meet thresholds set on the percentage variability of the image. Ideally, an intrinsic dimensionality between 2 and 5 can be achieved, which could give significant results when a low number of bands is used. Therefore, the results obtained from the hyperspectral image may not be satisfactory.
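The traditional threshold rule described above can be sketched as follows. The example assumes that the eigenvalues of the covariance matrix are already available and that the threshold is expressed as a fraction of the total variance; the eigenvalues shown are hypothetical and the function name is an assumption of this example.

```python
def components_for_threshold(eigenvalues, threshold):
    """Smallest number of leading components whose cumulative variance share reaches threshold."""
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for count, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return count
    return len(ordered)

# Hypothetical eigenvalues of a 5-band covariance matrix.
eigenvalues = [6.0, 2.5, 1.0, 0.3, 0.2]
k_90 = components_for_threshold(eigenvalues, 0.90)
k_99 = components_for_threshold(eigenvalues, 0.99)
print(k_90, k_99)
```

The number of retained components depends strongly on the chosen threshold, which reflects the lack of a standard noted in the text.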

The Jeffries-Matusita (JM) distance measures the separability of two classes from their class probability density functions:

$$J_{ij} = 2\left(1 - e^{-B}\right)$$

where B is the Bhattacharyya distance between classes i and j. The exponential term gives a decreasing weight to increasing separability between the spectral classes. The JM distance ranges from 0 to 2: a value of 2 indicates classes that are 100% separable, while a value of 0 indicates classes that cannot be separated.

SVM

ISOMAP

PCA

Software

Because the software is open source, it is distributed under the GNU licence. The language requires the necessary packages to be imported. Many packages are available, so one is free to choose those best suited to the job. Five packages were used in developing the framework: kernlab, rgdal, sp, raster and rioja.

Each of these packages has a distinct role in the development of the framework.
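The Jeffries-Matusita distance used in the separability analysis, J = 2(1 - e^(-B)), can be illustrated with a minimal sketch. It assumes that each class is modelled by a univariate Gaussian in a single band, for which the Bhattacharyya distance B has a closed form; the class means and variances are hypothetical, and the sketch is written in Python rather than with the packages listed above.

```python
import math

def bhattacharyya(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two univariate Gaussian class models (an assumption)."""
    term_means = 0.25 * (mean1 - mean2) ** 2 / (var1 + var2)
    term_vars = 0.5 * math.log((var1 + var2) / (2.0 * math.sqrt(var1 * var2)))
    return term_means + term_vars

def jeffries_matusita(mean1, var1, mean2, var2):
    """JM distance J = 2 (1 - exp(-B)), which lies between 0 and 2."""
    return 2.0 * (1.0 - math.exp(-bhattacharyya(mean1, var1, mean2, var2)))

# Hypothetical reflectance statistics for two pairs of classes in one band.
jm_identical = jeffries_matusita(0.3, 0.01, 0.3, 0.01)
jm_far_apart = jeffries_matusita(0.1, 0.01, 0.9, 0.01)
print(jm_identical, jm_far_apart)
```

Identical class models give a distance of 0, while well-separated classes approach the upper limit of 2, which matches the interpretation of the two extremes given in the separability analysis.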



Uploaded by:Elizabeth Davison