
Variable Selection for Sickle Cell Anemia

Sivanarayana Gaddam, Haofei Fang, and Shanthi Potla

Abstract—Variable selection is the process of deriving a new subset of variables from the original features in order to increase classifier efficiency and allow higher classification accuracy. In general, because of the high dimensionality, deterministic algorithms cannot be applied unless the original pattern vectors are mapped to new vectors of lower dimensionality. However, many feature extraction techniques have their own disadvantages. Popular techniques such as Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA) perform feature extraction independently of the classifier, which may degrade the performance of the classifier. Here, we followed a wrapper model for feature extraction, in which feature selection, feature extraction, and classifier training are performed simultaneously using a genetic algorithm (GA) and artificial neural networks (ANN). We tested our algorithm on data from sickle cell patients and found that 10 of the 23 parameters are sufficient to train ANNs for classification. Furthermore, we compared the performance of the GA with the PCA feature extraction technique and with selection based on the contribution of each variable.

Index Terms—Genetic Algorithm, Artificial Neural Network, Principal Component Analysis, Linear Discriminant Analysis, Feature Selection, Feature Extraction, Wrapper Model.

The purpose of feature selection is to design a compact classifier with high classification accuracy. The selection process should remove useless and redundant features [1]. Over the years, the problem has been investigated thoroughly, and several variable selection algorithms [3] and comparative studies [4] have been published. Finding an optimal feature subset is usually intractable, and many problems related to feature selection have been shown to be NP-hard [2]. Thus, an efficient search strategy is required. Recently, genetic algorithms have drawn attention due to their capability of finding approximate solutions.

GAs are a particular class of evolutionary algorithms that use techniques inspired by evolutionary biology, such as inheritance, mutation, crossover, and selection [5]. GAs are parallel, iterative optimizers and have been successfully applied to many optimization problems. Typically, given an n-dimensional space, the GA's task is to identify an m-dimensional space (m < n) that maximizes the classification accuracy [1,2,6]. The GA maintains a pool of competing feature matrices, and each member of the pool is sent to the classifier for fitness evaluation. The fitness information will be used as a …

A GA typically requires a solution to be encoded in the form of a chromosome, together with a fitness function for evaluation. Siedlecki and Sklansky [7] proposed a simple, direct GA approach to feature selection. They used a binary chromosome of length n (the number of dimensions), in which each gene (bit) is associated with one feature: if the ith bit is 0, the ith feature is discarded; otherwise, it is considered for classification. Each chromosome is evaluated on a set of test data using k-nearest neighbor (k-NN) classification. This technique was further extended to allow linear feature extraction [6] and provided a basis for …

Here, we followed the simple genetic algorithm approach, but we used artificial neural networks as the wrapper method for fitness evaluation. We did not use k-NN classification because it is sensitive to redundant features and is not suitable for difficult classification tasks; moreover, k-NN is outperformed by ANNs on many difficult classification tasks [8]. We tested our approach on sickle cell anemia data and compared our results with PCA feature extraction combined …

We used data from sickle cell anemia patients to test our algorithms in this project. Among the different treatments, hydroxyurea, the first approved drug for the causative treatment of sickle cell anemia, was shown in a 1995 study (Charache et al.) to decrease the number and severity of attacks [12]. Patients respond differently to hydroxyurea [9]; therefore, clinicians need to classify patients into two classes, responding and non-responding, in order to choose a more effective treatment. In the original dataset, some patients did not have data for all 23 parameters, and the missing values were represented with zero. To alleviate the effect of zero-valued data on back-propagation neural networks, we shifted those values by one, because ANNs associate a special meaning with the number zero. The shifting was applied to all β-globin haplotypes (Bantu, Benin, Cameroon, Senegal) and to nucleated red blood cell (NRBC) counts. The ANN algorithm looks for differences between input values; therefore, the differences between 1, 2, and 3 are tantamount to the differences …

There are three design considerations when implementing a GA to solve a particular problem. First, a solution must be encoded in a GA chromosome. Second, an objective function must be identified to evaluate the fitness of a chromosome. Finally, the GA run parameters must be specified, including the genetic operators and their probabilities.

Chromosome
For the GA feature extractor, the definition of the chromosome is fairly straightforward: a binary chromosome (Fig. 1) of length n (the number of dimensions), in which a gene value of "0" indicates that the corresponding feature is discarded. That is, if the ith gene contains 0, the ith feature is discarded; otherwise, it is included in the data set for classification.

Fig.1. A binary chromosome of length n (number of dimensions).

The performance of a chromosome is determined by the classification accuracy it yields, and the fitness function F(Ch) should be maximized. A chromosome Ch is promoted to the next generation if

$$F(\mathrm{Ch}) > \min\left(\overline{F}_{\mathrm{prev}},\ \theta\right)$$

where $\overline{F}_{\mathrm{prev}}$ is the mean fitness of the previous population and θ is a threshold that acts as a control parameter of our genetic algorithm. In this project, we chose θ = 70% experimentally, but it could be adapted automatically based on the …

We implemented the standard crossover and mutation operators, with a high crossover probability (0.7) and a low mutation probability (0.01). The crossover operator (Fig. 2) interchanges information (genes) between chromosomes: it combines two or more parents to produce new children, and one of these children may collect all the good features of its parents. Mutation (Fig. 3) is a genetic operator used to maintain genetic diversity from one generation of a population of chromosomes to the next. Its purpose is to let the algorithm avoid local minima by preventing the chromosomes in the population from becoming too similar to each other.

Before crossover:
C1: 111000 100000 000011 001000 000011
C2: 000100 010000 000011 000100 001100
After crossover:
C1: 111000 100000 000011 000100 001100
C2: 000100 010000 000011 001000 000011
Fig.2. The crossover operator exchanges information between chromosomes.

Before mutation:
C1: 000100 010000 000011 000100 001100
After mutation:
C1: 000100 010000 100011 000100 001100
Fig.3. The mutation operator helps the search escape from local minima.

The genetic algorithm in this project was implemented as shown in Fig. 4. We terminated the algorithm based on the number of generations.

Fig.4 (only this fragment survives): While (Termination Condition is not satisfied) do …

An artificial neural network is a system based on the operation of biological neural networks; in other words, it is an emulation of a biological neural system [14]. The most common type of artificial neural network consists of three layers [15]. Each perceptron in the layers has its own weights and activation function. The weights are updated according to the error between the calculated output and the desired output, and the effect of the error is propagated back from the output layer to the hidden layer(s); this kind of network is therefore also called a back-propagation network. The activation function can vary with the needs of the application; here, we use the sigmoid function for both the hidden layer and the output layer.

PCA is one of the popular techniques for dimensionality reduction. In general, principal component analysis can be carried out in four steps:
Step 1: For PCA to work correctly, subtract the mean from each of the data dimensions. This produces a data set whose mean is zero. Mean subtraction is included in PCA in order to minimize the mean square error of the approximation.
Step 2: Calculate the covariance matrix of the mean-adjusted data.
Step 3: Calculate the eigenvalues and eigenvectors of the covariance matrix. These are important because they carry the useful information about the data; in fact, the eigenvector with the highest eigenvalue is the principal component of the data set.
Step 4: Derive the new data set. Once we have chosen the components (eigenvectors) we wish to keep and formed a feature vector from them, the new data set is obtained by multiplying the transpose of the feature vector by the mean-adjusted original data.

A simple variable selection method [10] was applied to the data obtained from the PCA analysis. This method takes every feature into account using the following formula: … We then sort the resulting vector to obtain a ranking of the variables, and a threshold can be used to select the variables. In this project, we used a 95% threshold to select …

[Fig.5. Exp1: Genetic Algorithm terminated at 50 generations. Plot not recoverable; x-axis: Generations.]

[Figure, caption partially lost: … Algorithm terminated at 100 generations. Plot not recoverable; x-axis: Generations.]

The results show that there is no intersection between the feature subset selected by the Karhunen–Loève expansion (KLE) and the feature subset selected using exhaustive search [9]. Our approach involves the classifier in finding the best subset of features, whereas KLE works independently of the classifier, which may degrade the classifier's performance. The proposed GA is suitable for large-scale selection problems and has a high probability of finding better solutions. Despite these advantages, our approach is … methodologies. Another common disadvantage of GAs is premature convergence; we addressed this problem by carefully designing the algorithm so that the diversity of the population is maintained. In this research project, we encoded the solution in a binary chromosome whose length equals the number of dimensions. This may not be a good way of representing solutions for microarray data or other high-dimensional problem domains. Despite these minor disadvantages, the GA is a good technique for variable selection.

Table 1. Selected variables and accuracy of each method. [Table contents not recoverable.]

Feature selection using the wrapper model produced better results than the other two approaches. For the sickle cell anemia data, the integrated GA and ANN approach selected 10 relevant variables and obtained a classification accuracy of 87.5%.

The GA can be better than the recursive elimination used in [9] on two points: (1) the GA's execution time is controllable, since we can terminate the generations whenever we want, and (2) the GA's result can be improved by repeating trials and by varying the parameter values. Compared with this algorithm, the GA seems preferable for all large-scale problems in which n > 100. For example, for n = 100, the recursive elimination algorithm must evaluate 2^100 - 1 combinations to obtain the optimal solution, whereas in our GA setting only 5000 evaluations are needed for n > 100 to find an approximate solution.

[1] Il-Seok Oh, Jin-Seon Lee, and Byung-Ro Moon, "Hybrid Genetic Algorithms for Feature Selection," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 26, no. 11, November 2004.
[2] Huan Liu and Lei Yu, "Toward Integrating Feature Selection Algorithms for Classification and Clustering," IEEE Transactions on Knowledge and Data Engineering, vol. 17, no. 4, April 2005.
[3] Anil Jain and Douglas Zongker, "Feature Selection: Evaluation, Application, and Small Sample Performance," IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 19, no. 2, February 1997.
[4] F. J. Ferri, P. Pudil, M. Hatef, and J. Kittler, "Comparative Study of Techniques for Large-Scale Feature Selection."
[5] Joachim Stender, "Introduction to Genetic Algorithms," Brainware.
[6] Michael L. Raymer, William F. Punch, Erik D. Goodman, Leslie A. Kuhn, and Anil K. Jain, "Dimensionality Reduction Using Genetic Algorithms," IEEE Transactions on Evolutionary Computation, vol. 4, no. …
[7] W. Siedlecki and J. Sklansky, "A note on genetic algorithms for large-scale feature selection," Pattern Recognition Letters, vol. 10, pp. 335–347, 1989.
[8] Pádraig Cunningham and Sarah Jane Delany, "k-Nearest Neighbour …"
[9] Homayoun Valafar, Faramarz Valafar, Alan Darvill, and Peter Albersheim (Complex Carbohydrate Research Center and Department of Biochemistry and Molecular Biology, University of Georgia, 220 Riverbend Road, Athens, GA 30602), and Abdullah Kutlar, Kristy F. Woods, and John Hardin (Department of Medicine, Medical College of Georgia, Augusta, GA 30912), "Predicting the effectiveness of hydroxyurea in individual sickle cell anemia patients," Journal of Artificial Intelligence in Medicine, vol. 18, no. 2, pp. 133–148, February 2000.
[10] Faramarz Valafar, San Diego State University, "Lecture Notes: Methods in Bioinformatics and Medical Informatics."
[11] Abhilash Alexander Miranda, Yann-Aël Le Borgne, and Gianluca Bontempi, "New Routes from Minimal Approximation Error to Principal Components."
[12] Wikipedia, the free encyclopedia. Available: http://en.wikipedia.org/wiki/Sickle-cell_disease
[13] Artificial Neural Networks – A neural network tutorial. Available: http://www.learnartificialneuralnetworks.com/
http://www.doc.ic.ac.uk/~nd/surprise_96/journal/vol4/cs11/report.html#Feed-forward%20networks
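The data section says that zero-coded values were shifted by one for the β-globin haplotypes and NRBC. The following minimal Python sketch shows that preprocessing step. It assumes that every value in the named columns is incremented by one, which is only one reading of the text. The dictionary keys, the function name `shift_by_one`, and the `other_parameter` field are hypothetical.

```python
from typing import Dict, Iterable

# Hypothetical column names for the variables the text says were shifted.
SHIFTED_COLUMNS = ("Bantu", "Benin", "Cameroon", "Senegal", "NRBC")


def shift_by_one(record: Dict[str, float],
                 columns: Iterable[str] = SHIFTED_COLUMNS) -> Dict[str, float]:
    """Return a copy of `record` with every listed column incremented by one.

    ASSUMPTION: the whole column is shifted (not only the zero entries), so that
    no shifted input is left at the value 0.
    """
    shifted = dict(record)  # leave the caller's record untouched
    for name in columns:
        if name in shifted:
            shifted[name] = shifted[name] + 1
    return shifted


if __name__ == "__main__":
    patient = {"Bantu": 0, "Benin": 2, "Cameroon": 0, "Senegal": 1,
               "NRBC": 0, "other_parameter": 12.5}  # hypothetical record
    print(shift_by_one(patient))
```

Columns that are not listed, such as the hypothetical `other_parameter`, pass through unchanged.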
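The binary chromosome encoding described under "Chromosome" (Fig. 1) amounts to a feature mask. Gene i equal to 1 keeps feature i, and gene i equal to 0 discards it. The sketch below assumes that chromosomes are stored as strings of '0'/'1' characters. `decode_chromosome` and `apply_mask` are illustrative names, not the authors' code.

```python
from typing import List, Sequence


def decode_chromosome(chromosome: str) -> List[int]:
    """Indices of the features kept by a binary chromosome.

    Gene i == '1' keeps feature i; gene i == '0' discards it.
    Spaces (used only for readability, as in Figs. 2 and 3) are ignored.
    """
    genes = chromosome.replace(" ", "")
    if set(genes) - {"0", "1"}:
        raise ValueError("chromosome must contain only '0' and '1'")
    return [i for i, gene in enumerate(genes) if gene == "1"]


def apply_mask(row: Sequence[float], chromosome: str) -> List[float]:
    """Keep only the values of `row` whose corresponding genes are 1."""
    if len(chromosome.replace(" ", "")) != len(row):
        raise ValueError("chromosome length must equal the number of features")
    return [row[i] for i in decode_chromosome(chromosome)]


if __name__ == "__main__":
    print(decode_chromosome("110 001"))
    print(apply_mask([1.0, 2.0, 3.0], "101"))
```

In a wrapper setting, `apply_mask` would be applied to every patient record before the classifier is trained on the reduced inputs.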
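The promotion rule F(Ch) > min(mean of previous population, θ) can be written as a small filter. The sketch assumes that "mean(prev population)" means the mean fitness of the previous population. It uses the paper's θ = 0.70 as the default. The function name `promote` is hypothetical.

```python
from statistics import mean
from typing import List, Sequence


def promote(population: Sequence[str], fitness: Sequence[float],
            prev_fitness: Sequence[float], theta: float = 0.70) -> List[str]:
    """Return the chromosomes allowed into the next generation.

    A chromosome Ch is promoted when F(Ch) > min(mean(prev_fitness), theta).
    ASSUMPTION: 'mean(prev population)' is read as the mean fitness of the
    previous population; theta = 0.70 is the paper's experimentally chosen value.
    """
    cutoff = min(mean(prev_fitness), theta)
    return [ch for ch, f in zip(population, fitness) if f > cutoff]
```

Taking the minimum means the cutoff can never exceed θ. Early generations are therefore judged against their predecessors' average, and once the population is strong, any chromosome above 70% accuracy survives.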
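Fig. 3 shows a single bit flip at position 13 (index 12, counting from zero). The sketch below assumes that the mutation probability 0.01 is applied independently to each gene, which is a common choice that the paper does not spell out. `bit_flip_mutation` and `flip_gene` are illustrative names.

```python
import random
from typing import Optional


def bit_flip_mutation(chromosome: str, p_m: float = 0.01,
                      rng: Optional[random.Random] = None) -> str:
    """Flip each gene independently with probability p_m (ASSUMED per-gene rate)."""
    r = rng or random.Random()
    return "".join(
        ("1" if gene == "0" else "0") if r.random() < p_m else gene
        for gene in chromosome
    )


def flip_gene(chromosome: str, index: int) -> str:
    """Deterministically flip the gene at `index` (used to reproduce Fig. 3)."""
    flipped = "1" if chromosome[index] == "0" else "0"
    return chromosome[:index] + flipped + chromosome[index + 1:]


if __name__ == "__main__":
    before = "000100010000000011000100001100"  # Fig. 3, spaces removed
    print(flip_gene(before, 12))                # 000100010000100011000100001100
```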
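Only the loop header of Fig. 4 survives. The sketch below is one plausible way to combine the pieces the text does describe: termination after a fixed number of generations, the promotion rule, crossover with probability 0.7, and mutation with probability 0.01. It is not the authors' Fig. 4. The population size, the random initialization, the single-point crossover, and the fallback to the best chromosome when nothing is promoted are all assumptions. The `fitness` callback stands in for the ANN accuracy.

```python
import random
from statistics import mean
from typing import Callable, List, Optional


def run_ga(fitness: Callable[[str], float], n_genes: int, pop_size: int = 20,
           generations: int = 50, p_c: float = 0.7, p_m: float = 0.01,
           theta: float = 0.70, seed: int = 0) -> Optional[str]:
    """Generational GA over binary feature masks, stopped after `generations`.

    `fitness` is a hypothetical callback returning a score in [0, 1]; in the
    paper this role is played by the ANN classification accuracy.
    """
    rng = random.Random(seed)
    pop = ["".join(rng.choice("01") for _ in range(n_genes)) for _ in range(pop_size)]
    best, best_f = None, float("-inf")
    prev_mean = 0.0  # no previous population before the first generation
    for _ in range(generations):  # termination condition: generation count
        scores = [fitness(ch) for ch in pop]
        for ch, f in zip(pop, scores):
            if f > best_f:
                best, best_f = ch, f
        # Promotion rule: F(Ch) > min(mean fitness of previous population, theta).
        cutoff = min(prev_mean, theta)
        parents = [ch for ch, f in zip(pop, scores) if f > cutoff] or [best]
        prev_mean = mean(scores)
        children: List[str] = []
        while len(children) < pop_size:
            a, b = rng.choice(parents), rng.choice(parents)
            if rng.random() < p_c:  # single-point crossover (assumed variant)
                cut = rng.randint(1, n_genes - 1)
                a, b = a[:cut] + b[cut:], b[:cut] + a[cut:]
            for child in (a, b):  # per-gene bit-flip mutation
                children.append("".join(
                    ("1" if g == "0" else "0") if rng.random() < p_m else g
                    for g in child))
        pop = children[:pop_size]
    return best


if __name__ == "__main__":
    # Hypothetical stand-in fitness: agreement with an arbitrary 23-bit mask.
    target = "1" * 10 + "0" * 13

    def toy_fitness(ch: str) -> float:
        return sum(a == b for a, b in zip(ch, target)) / len(target)

    print(run_ga(toy_fitness, n_genes=len(target), generations=50))
```

With a real wrapper fitness, each call trains and scores an ANN. This is why the sketch evaluates every chromosome only once per generation.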
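The ANN paragraph describes a three-layer network with sigmoid activations in the hidden and output layers, trained by propagating the output error back to the hidden layer. The following is a minimal NumPy sketch of such a back-propagation network, assuming full-batch gradient descent on a squared-error loss. The class name, learning rate, and weight initialization are illustrative choices, not the authors' settings.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


class ThreeLayerNet:
    """Input -> sigmoid hidden layer -> sigmoid output, trained by back-propagation."""

    def __init__(self, n_in, n_hidden, n_out=1, lr=0.5, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, n_out))
        self.b2 = np.zeros(n_out)
        self.lr = lr

    def forward(self, X):
        self.h = sigmoid(X @ self.W1 + self.b1)       # hidden activations
        self.y = sigmoid(self.h @ self.W2 + self.b2)  # network output
        return self.y

    def train_step(self, X, t):
        """One batch gradient step on 0.5 * mean(sum((y - t)^2)); returns that loss."""
        y = self.forward(X)
        # Output-layer error term; the sigmoid derivative is y * (1 - y).
        delta_out = (y - t) * y * (1.0 - y)
        # Propagate the error back to the hidden layer (uses W2 before its update).
        delta_hid = (delta_out @ self.W2.T) * self.h * (1.0 - self.h)
        n = X.shape[0]
        self.W2 -= self.lr * self.h.T @ delta_out / n
        self.b2 -= self.lr * delta_out.mean(axis=0)
        self.W1 -= self.lr * X.T @ delta_hid / n
        self.b1 -= self.lr * delta_hid.mean(axis=0)
        return float(0.5 * np.mean(np.sum((y - t) ** 2, axis=1)))


if __name__ == "__main__":
    X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
    t = np.array([[0], [1], [1], [0]], dtype=float)
    net = ThreeLayerNet(n_in=2, n_hidden=4, lr=2.0)
    for epoch in range(5000):
        loss = net.train_step(X, t)
    print(loss, net.forward(X).round(2).ravel())
```

For the two-class responder/non-responder task, a single sigmoid output thresholded at 0.5 would serve as the class decision.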
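The four PCA steps can be followed almost line by line in NumPy. This minimal sketch assumes that rows are samples and columns are variables. It uses `numpy.linalg.eigh` because the covariance matrix is symmetric. The function name `pca` and its return values are illustrative.

```python
import numpy as np


def pca(data: np.ndarray, n_components: int):
    """PCA following Steps 1-4; rows are samples, columns are variables."""
    # Step 1: subtract the mean of each dimension so the data has zero mean.
    adjusted = data - data.mean(axis=0)
    # Step 2: covariance matrix of the mean-adjusted data (variables in columns).
    cov = np.cov(adjusted, rowvar=False)
    # Step 3: eigenvalues/eigenvectors; eigh returns eigenvalues in ascending
    # order, so reorder them from largest to smallest.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Step 4: the feature vector holds the chosen eigenvectors as columns; the new
    # data is (feature_vector^T x adjusted^T), transposed back to rows = samples.
    feature_vector = eigvecs[:, :n_components]
    new_data = (feature_vector.T @ adjusted.T).T
    return new_data, eigvals, feature_vector


if __name__ == "__main__":
    data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0],
                     [2.3, 2.7], [2.0, 1.6], [1.0, 1.1], [1.5, 1.6], [1.1, 0.9]])
    projected, eigvals, fv = pca(data, n_components=1)
    print(eigvals)
    print(projected.ravel())
```

Keeping all components and multiplying back by the feature vector's transpose recovers the mean-adjusted data exactly. Dropping components gives the least-squares approximation mentioned in Step 1.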
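The contribution-based selection from [10] is only partly preserved: its formula is missing, but the ranking and 95% threshold survive. The sketch below therefore substitutes a hypothetical score, the eigenvalue-weighted sum of absolute loadings for each variable. It also reads the 95% threshold as a cumulative share of the total score. Both choices are assumptions, not the method of [10]. `rank_by_contribution` is an illustrative name.

```python
import numpy as np


def rank_by_contribution(eigvals, eigvecs, threshold=0.95):
    """Rank variables by an ASSUMED contribution score and keep the top ones.

    ASSUMPTION: score_j = sum_k eigvals[k] * |eigvecs[j, k]| (the formula of [10]
    is not preserved). Variables are taken in descending score order until their
    cumulative share of the total score reaches `threshold`.
    eigvecs: (n_variables, n_components); eigvals: (n_components,).
    """
    scores = np.abs(np.asarray(eigvecs, dtype=float)) @ np.asarray(eigvals, dtype=float)
    order = np.argsort(scores)[::-1]              # highest contribution first
    share = np.cumsum(scores[order]) / scores.sum()
    n_keep = min(int(np.searchsorted(share, threshold)) + 1, len(order))
    return order[:n_keep].tolist(), scores
```

The eigenvalues and eigenvectors returned by a PCA step can be passed in directly. Replacing the score line is all that is needed once the original formula is known.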
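The figure 2^100 - 1 in the conclusion is the number of non-empty subsets of 100 features, which is what a full enumeration has to visit. This short sketch computes that count so it can be set against the 5000 GA evaluations quoted in the text. The function name is illustrative.

```python
def exhaustive_subset_count(n: int) -> int:
    """Number of non-empty subsets of n features (what a full enumeration visits)."""
    return 2 ** n - 1


if __name__ == "__main__":
    ga_evaluations = 5000  # figure quoted in the conclusion for n > 100
    for n in (23, 100):
        total = exhaustive_subset_count(n)
        print(f"n={n}: {total} subsets, {total / ga_evaluations:.3g} x the GA budget")
```

For the 23 parameters in this data set, the count is already 8,388,607 candidate subsets. Each one would require training and scoring a classifier in a wrapper setting.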

Source: http://duty9347.info/projects/Variable%20Selection.pdf


SCIENCE AND PRACTICE | Guidelines for the treatment of overweight/obesity as of 2006. STATUS ARTICLE

Chief Physician Ole Lander Svendsen, Chief Physician Søren Toubro, Physician Jens Meldgaard Bruun, Physician Jens Peder Linnet & …

… an increase in physical activity and exercise difficult to achieve for obese persons, and the increase in itself often produces only a modest weight loss. If one combines increased physi…

kockavilaga.fw.hu

CONNECTIONS I. - H1N1 and other things used to manipulate our health
Added by: gyury, Monday, August 10, 2009, 09:07. Last updated: Tuesday, September 28, 2010, 06:53.
It is worth checking back, because I keep expanding it with daily news! SEPTEMBER 2010 update: The slow death, or: pay so that you can die sooner!