Comparison between neural network and machine learning algorithms

Document Type: Research Paper

Subject Area: Computer Science


Model predictors were 10 neuropsychological tests currently used in the diagnosis of dementia. Statistical distributions of classification parameters obtained from a 5-fold cross-validation were compared using Friedman's nonparametric test. [1] Machine learning is an area of computer science that applies algorithms to a set of data samples to discover patterns of interest. Supervised learning is the type of machine learning used to discover known patterns in unknown data, whereas unsupervised learning is used to discover unknown patterns in known data. It is possible to identify people with cognitive complaints who are at risk of progression to dementia, that is to say, who have Mild Cognitive Impairment (MCI).
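As a concrete illustration of this evaluation design, the sketch below runs a 5-fold cross-validation for a few classifiers and compares the per-fold accuracies with Friedman's test. The dataset, classifier choices and parameters are placeholders on synthetic data, not the study's actual predictors or models.

```python
# Minimal sketch: per-fold accuracies of several classifiers compared with
# Friedman's nonparametric test, as in the evaluation design described above.
import numpy as np
from scipy.stats import friedmanchisquare
from sklearn.datasets import make_classification
from sklearn.model_selection import StratifiedKFold
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=0)
models = {
    "LogReg": LogisticRegression(max_iter=1000),
    "SVM": SVC(),
    "RF": RandomForestClassifier(random_state=0),
}
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {name: [] for name in models}
for train, test in cv.split(X, y):
    for name, model in models.items():
        model.fit(X[train], y[train])
        scores[name].append(model.score(X[test], y[test]))

# Friedman's test: do the classifiers' per-fold accuracies differ?
stat, p = friedmanchisquare(*scores.values())
print(f"Friedman chi-square = {stat:.3f}, p = {p:.3f}")
```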


Since the establishment of MCI requires the demonstration of cognitive decline greater than expected for an individual's age and education level, neuropsychological testing is a key element in the diagnostic procedures. [2] Recently, it has become possible to identify biomarkers of Alzheimer's disease in patients with MCI using Magnetic Resonance Imaging (MRI) volumetric studies, neurochemical analysis of the cerebrospinal fluid, and Positron Emission Tomography (PET) scans. These studies, however, are expensive, technically challenging, in some cases invasive, and not widely available. Longitudinal studies assessing the predictive value of neuropsychological tests for progression of MCI patients to dementia have shown an area under the receiver operating characteristic curve of 61-94% (higher for tests assessing verbal episodic memory), but with lower accuracy and sensitivity values.
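The AUC figures cited here come from using a single test score to rank patients by risk of conversion. A minimal sketch of that computation; the memory score, its direction, and its effect size are invented for illustration:

```python
# Sketch: AUC of one neuropsychological test score as a predictor of
# conversion to dementia. Scores and outcomes are synthetic placeholders.
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
converted = rng.integers(0, 2, size=200)            # 1 = progressed to dementia
# Hypothetical verbal episodic memory score, lower on average in converters.
memory_score = rng.normal(loc=10 - 2 * converted, scale=3)

# The AUC treats the score as a ranking; the sign is flipped so that lower
# memory scores indicate higher conversion risk.
auc = roc_auc_score(converted, -memory_score)
print(f"AUC = {auc:.2f}")
```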


[1] However, this superiority is not apparent with all data sets, especially with real data. Results regarding the superiority in classification accuracy of newer classification methods over traditional, less computationally demanding methods, as well as the stability of the findings, are still controversial. Most comparisons between methods are based only on total classification accuracy and/or error rates, and they involve human intervention for training and optimization of the data-mining classifiers versus out-of-the-box results for the traditional classifiers. Furthermore, in medical contexts, sensitivity (the ability to predict the condition when the condition is present) and specificity (the ability to predict the absence of the condition when the condition is not present), as well as the classifier's discriminant power (as estimated from the area under the Receiver Operating Characteristic (ROC) curve), are key features that must be considered when comparing classifiers and diagnostic methods.
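Both quantities are straightforward to compute from a confusion matrix. A short sketch with placeholder labels and predictions:

```python
# Sensitivity and specificity as defined above, from a binary confusion matrix.
import numpy as np
from sklearn.metrics import confusion_matrix

y_true = np.array([1, 1, 1, 0, 0, 0, 0, 1])   # illustrative ground truth
y_pred = np.array([1, 0, 1, 0, 0, 1, 0, 1])   # illustrative predictions

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # predicts the condition when it is present
specificity = tn / (tn + fp)   # predicts absence when it is absent
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```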


The second phase had an average age of 76 years, with a five-year age difference between the dementia and MCI patients. In measuring the patients' cognitive ability, their level of education affected the results. All participants gave formal consent for the study to be carried out. [3]

Classifiers

Discriminant Analysis. The oldest classifier still in use was devised almost 100 years ago by Sir R. Fisher. If the estimated probability of success exceeds 0.5 (or another user pre-defined threshold value), the subject is classified into the success group; otherwise, it is classified into the failure group.
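A minimal sketch of this thresholding rule, assuming scikit-learn's linear discriminant analysis on synthetic data; the 0.5 cutoff plays the role of the user pre-defined threshold:

```python
# Sketch: discriminant analysis with an explicit posterior-probability cutoff.
from sklearn.datasets import make_classification
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

X, y = make_classification(n_samples=200, n_features=10, random_state=1)
lda = LinearDiscriminantAnalysis().fit(X, y)

threshold = 0.5                           # user pre-defined cutoff
posterior = lda.predict_proba(X)[:, 1]    # estimated P(success | x)
predicted = (posterior >= threshold).astype(int)   # success vs. failure group
```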


Neural Networks. Neural Network (NN) methods have been used extensively in classification problems, and this is one of the most active research and application areas in the Neural Networks field. Inspired by biological neuron cells, a NN is a multi-stage, multi-unit classifier with input, hidden (processing), and output layers, as illustrated by Figure 1.

Figure 1: Pictorial representation of a neural network (multilayer perceptron) with input layer (dendrites), hidden layer (nucleus) and output layer (axon).

For a polytomous criterion $y_k$ with $k$ classes, the NN can be described by the general model

$$\hat{y}_k = f\left(\sum_{j=1}^{h} o_{kj} \cdot g\left(\sum_{i=1}^{p} w_{ji} x_i + x_{0j}\right) + o_{0k}\right) = f_k(x, w, o, x_0, o_{0k}, \theta)$$

where $x$ is the vector of $p$ predictors, $w$ is the vector of input weights, $o$ is the vector of hidden weights for the hidden layer, and $x_0$ and $o_{0k}$ are bias (memory) constants. The functions $g(\cdot)$ and $f(\cdot)$ are the hidden- and output-layer activation functions.
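The model above can be rendered almost line-for-line in code. The sketch below uses random, untrained weights purely to show the data flow, with tanh as $g(\cdot)$ and softmax as $f(\cdot)$; those activation choices are assumptions, not specified by the text:

```python
# Direct numpy rendering of the NN model: hidden activations
# g(sum_i w_ji x_i + x0_j), combined by output weights o_kj plus bias o0_k,
# passed through f (softmax for a polytomous criterion).
import numpy as np

p, h, k = 10, 5, 3                 # predictors, hidden units, classes
rng = np.random.default_rng(0)
x = rng.normal(size=p)             # one observation
w, x0 = rng.normal(size=(h, p)), rng.normal(size=h)   # input weights + bias
o, o0 = rng.normal(size=(k, h)), rng.normal(size=k)   # hidden weights + bias

g = np.tanh                        # hidden activation g(.)
def f(z):                          # output activation f(.): softmax
    e = np.exp(z - z.max())
    return e / e.sum()

hidden = g(w @ x + x0)             # g(sum_i w_ji x_i + x0_j) per hidden unit
y_hat = f(o @ hidden + o0)         # yhat_k = f(sum_j o_kj hidden_j + o0_k)
print(y_hat)                       # class probabilities summing to 1
```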


Support Vector Machines. To find the optimum plane furthest from both the {-1} and {+1} groups, one strategy is to maximize the distance, or margin of separation, from the supporting planes, respectively $w'\phi(x) + b \geq +1$ for the {+1} group and $w'\phi(x) + b \leq -1$ for the {-1} group. These support planes are pushed apart until they bump into a small number of observations, or training patterns, that respect the above constraints and are thus called support vectors; Figure 2 illustrates this concept. The classification goal can be achieved by maximizing the distance or margin of separation $r = 2/\lVert w \rVert$ between the two planes $w'\phi(x) + b = +1$ and $w'\phi(x) + b = -1$, which is equivalent to minimizing the cost function $\lVert w \rVert^2 / 2$ subject to the separation constraints.

Figure 2: Schematic representation of the optimum hyperplane (H0) found by a Support Vector Machine.

The use of kernel functions has the advantage of operating in the original input variables, where the solution of the classification problem is a weighted sum of kernels evaluated at the support vectors.
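A hedged sketch of a kernelized SVM fit, assuming scikit-learn; the RBF kernel and regularization value are illustrative choices. It exposes the support vectors in which the solution is expressed:

```python
# Sketch: margin-based classification with a kernel, on synthetic data.
from sklearn.datasets import make_classification
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=2)
svm = SVC(kernel="rbf", C=1.0).fit(X, y)

# The solution is a weighted sum of kernels evaluated at the support vectors.
print("support vectors per class:", svm.n_support_)
print("decision values for 3 points:", svm.decision_function(X[:3]))
```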


Classification Trees. Classification Trees (CT) are non-parametric classifiers that construct hierarchical decision trees by repeatedly splitting the sample into two child nodes, starting from a root node that contains the whole sample, according to an "if-then" rule applied to a set of predictors at each step (node). Thus, CT can select the predictors, and their interactions, that are most important in determining the outcome for a criterion variable. The development of a CT is supported by three major elements: 1. choosing a sample-splitting rule that defines the tree branches connecting the classification nodes. In QUEST, the homogeneity of the groups at each branch of the tree is evaluated, for continuous predictors, by the ratio of the between-group to within-group variance of the predictor, which defines the F statistic

$$F_X = \frac{\sum_{c=1}^{C} n_c(t)\,\left(\bar{x}_c(t) - \bar{x}(t)\right)^2 / (C-1)}{\sum_{i=1}^{n} \left(x_i - \bar{x}_c(t)\right)^2 / \left(n(t) - C\right)} \sim F\left(C-1;\; n(t)-C\right)$$

where $\bar{x}_c(t)$ is the average of predictor $X$ in group $c$ at node $t$, and $\bar{x}(t)$ is the average of predictor $X$ at node $t$ over all groups.
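This F statistic can be computed directly for one continuous predictor at a node. The group sizes and means below are synthetic stand-ins:

```python
# Numeric rendering of the QUEST F statistic above: between-group variance
# over within-group variance for a continuous predictor at one node.
import numpy as np
from scipy.stats import f as f_dist

rng = np.random.default_rng(3)
groups = [rng.normal(loc=m, size=30) for m in (0.0, 0.5, 1.0)]  # C = 3 groups

n_t = sum(len(g) for g in groups)              # n(t), cases at the node
C = len(groups)
grand_mean = np.concatenate(groups).mean()     # xbar(t)

between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups) / (C - 1)
within = sum(((g - g.mean()) ** 2).sum() for g in groups) / (n_t - C)

F = between / within
p_value = f_dist.sf(F, C - 1, n_t - C)         # tail of F(C-1; n(t)-C)
print(f"F = {F:.2f}, p = {p_value:.4f}")
```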


For categorical predictors, a chi-square-like statistic, similar to the one defined for CHAID, is used.

Random Forests. Random Forests (RF) were proposed by Leo Breiman. This "ensemble learning" classification method constructs a series of CART trees using random bootstrap samples of the original data sample. Each of these trees is built using a further random subset of the total predictors, from which the split that maximizes the classification criterion is chosen at each node.
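A minimal sketch of this construction, assuming scikit-learn; bootstrap resampling and the per-node random predictor subset correspond to the two sources of randomness just described:

```python
# Sketch: random forest as an ensemble of CART trees on bootstrap samples,
# each split chosen from a random subset of predictors. Data are synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=4)

# bootstrap=True resamples the data per tree; max_features limits the
# predictors considered at each node, as in Breiman's construction.
rf = RandomForestClassifier(
    n_estimators=500, max_features="sqrt", bootstrap=True,
    oob_score=True, random_state=4,
).fit(X, y)
print(f"out-of-bag accuracy: {rf.oob_score_:.3f}")
```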


The final sample was composed of 400 patients (see Table 1 for sample demographics) who gave voluntary consent to participate in this study. The local ethics committee approved the study.

Table 1: Sample demographics. The two groups in the criterion were "MCI" (Mild Cognitive Impairment) patients and "Dementia" patients; the class to predict was "Dementia". P-values for group comparisons were obtained from Student's t-test (†) or the χ² test (‡).

With regard to the MMSE-KC data, only the location and date items ranked highest, with the rest at the bottom.

Classifiers. Data mining is one of the most significant applications of machine learning. With it, one can analyze data and discern the relationships between variables; mistakes at this stage hinder the whole process of solving the problem. With machine learning algorithms, every instance in the dataset can be represented by a consistent set of features: binary, continuous, or categorical. Among the methods of machine learning are the Support Vector Machine (SVM), naive Bayes, random forest, bagging, logistic regression, and the multilayer perceptron (MLP) (Chen & Herskovits, 2010). The figure reveals that the MLP records the lowest RMSE in comparison with the other algorithms (Chen & Herskovits, 2010).
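A hedged sketch of such a comparison, looping over the classifiers named above; the synthetic dataset and default hyperparameters are placeholders for the study's data and tuning:

```python
# Sketch: 5-fold accuracy for the classifiers the text names.
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB
from sklearn.ensemble import RandomForestClassifier, BaggingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=400, n_features=10, random_state=5)
classifiers = {
    "SVM": SVC(),
    "Naive Bayes": GaussianNB(),
    "Random Forest": RandomForestClassifier(random_state=5),
    "Bagging": BaggingClassifier(random_state=5),
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "MLP": MLPClassifier(max_iter=2000, random_state=5),
}
for name, clf in classifiers.items():
    acc = cross_val_score(clf, X, y, cv=5).mean()
    print(f"{name}: mean 5-fold accuracy = {acc:.3f}")
```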


Based on accuracy, the MLP records the highest value, at about 97.2%, with bagging and random forest recording 94.4% and 96.3% respectively. The figure further shows that the MLP records the lowest classification error. In sum, the MLP shows the highest accuracy (97.2%) in the first phase, followed by bagging and random forest; the lowest accuracy is that of naive Bayes, at about 81%.

No statistically significant differences were found in the total accuracy of 8 of the 10 evaluated classifiers (medians between 0.63 and 0.73), but RF (Me = 0.74) and SVM (Me = 0.76) obtained statistically significantly higher classification accuracy. Only six of the ten classifiers tested showed a median sensitivity larger than 0.5 (and only five had a 1st-quartile sensitivity larger than 0.5). Considering that conversion to dementia is the key prediction in this biomedical application, and thus higher classifier sensitivity is required, classifiers such as Logistic Regression, Neural Networks, Support Vector Machines and CHAID trees are inappropriate for this type of binary classification task.
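The median and 1st-quartile summaries used here can be reproduced from per-fold scores. A sketch, with synthetic imbalanced data standing in for the MCI/Dementia sample:

```python
# Sketch: median and 1st quartile of per-fold sensitivity
# (recall for the positive, "Dementia"-like class).
import numpy as np
from sklearn.datasets import make_classification
from sklearn.model_selection import cross_val_score
from sklearn.metrics import make_scorer, recall_score
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=400, n_features=10,
                           weights=[0.7, 0.3], random_state=6)
sens_scorer = make_scorer(recall_score, pos_label=1)   # sensitivity
sens = cross_val_score(RandomForestClassifier(random_state=6), X, y,
                       cv=5, scoring=sens_scorer)
print(f"median sensitivity = {np.median(sens):.2f}, "
      f"1st quartile = {np.percentile(sens, 25):.2f}")
```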


Similar findings were observed in studies comparing different classifiers in other biomedical conditions. Total accuracy is misleading because some classifiers are good only at predicting membership of the larger group (high specificity) but quite insufficient at predicting membership of the smaller group (low sensitivity). The present sample size was apparently not limiting for achieving acceptable accuracy, specificity and sensitivity with both Random Forests and LDA, as reported elsewhere. Furthermore, there are studies with relatively small samples in which data mining techniques such as SVM and Neural Networks have been used with high classification accuracy. Equivalent or even superior performance has been reported for Linear Discriminant Analysis and Random Forests when compared with Neural Networks, Classification Trees and Support Vector Machines.


However, controversy still prevails regarding the effects on classifiers' performance of different combinations of predictors, data assumptions, sample sizes and parameter tuning. Different applications with different data sets (both real and simulated) have failed to produce a classifier that ranks best in all applications, as shown in the studies by Michie et al. It is noteworthy, however, that Fisher's Linear Discriminant Analysis, a classifier devised almost a century ago, stands up against computer-intensive classifiers as a simple, efficient and time-proven classifier.

References
1. Prince, M. The Global Impact of Dementia: An Analysis of Prevalence, Incidence, Cost and Trends; Alzheimer's Disease International (ADI): London, UK.
2. Trambaiolli, L.
Neuropsychological Assessment; Oxford University Press: Oxford, MS, USA.
5. Kim, K.; Park, J.

