Feature selection

Our work on biological problems and text mining has exposed us extensively to the problem of high dimensionality combined with small sample size. We have studied feature selection for this type of problem in depth and have developed a number of algorithms that address some of the limitations of existing methods. For example, we have proposed a variant of the well-known SVM-RFE algorithm that is based not on a linear kernel but on a kernel of feature ratios; as a result, feature selection with that kernel is less sensitive to the redundant and noisy information typically found in biological problems. More recently, we have proposed a variant of the standard SVM algorithm that optimizes the actual error bound, which depends on both the radius and the margin and not just on the margin. This variant incorporates an explicit feature weighting mechanism which, together with an l1 norm constraint on the feature weights, produces relatively sparse feature sets, making it a feature selection method. In addition, we have started exploring metric learning and have developed a number of approaches for learning the neighborhood in metric learning problems and for local metric learning. The latter achieves excellent predictive performance and scales very well to problems with tens of thousands of instances and thousands of attributes, unlike existing metric learning methods, which scale poorly. Our work on metric learning and SVMs has also uncovered some interesting and previously unknown connections between large margin metric learning and support vector machines, connections that have the potential to produce new learning methods that build on the strengths of both. Finally, in a broader context, we have also examined the stability of feature selection.
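To make the notion of feature selection stability concrete, the following minimal sketch scores how consistently a selector picks the same features across resampled runs. It uses the average pairwise Jaccard similarity of the selected subsets, which is one common choice and is an assumption here, not necessarily the measure used in our publications. The function names `jaccard` and `selection_stability` and the feature names are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 means identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        # Two empty selections are treated as identical by convention.
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(subsets):
    """Average pairwise Jaccard similarity over all pairs of subsets."""
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature subsets selected on three resampled training sets.
runs = [
    {"f1", "f2", "f3"},
    {"f1", "f2", "f4"},
    {"f1", "f2", "f3"},
]
stability = selection_stability(runs)
print(round(stability, 3))  # 0.667
```

A score close to 1 indicates that the selected subset barely changes when the training sample changes; a score close to 0 indicates that the selection is driven largely by the particular sample, a typical risk when the number of features far exceeds the number of instances.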

Relevant publications:
Huyen Do and Alexandros Kalousis. Convex formulations of radius-margin based Support Vector Machines. ICML 2013, (pdf), (appendix).
Huyen Do, Alexandros Kalousis, Jun Wang and Adam Woznica. A metric learning perspective of SVM: on the relation of SVM and LMNN. AISTATS 2012, (pdf).
Huyen Do, Alexandros Kalousis and Melanie Hilario. Feature weighting using margin and radius based error bound optimization in SVMs. ECML 2009, (pdf).
Julien Prados, Alexandros Kalousis and Melanie Hilario. Feature Selection with the logRatio Kernel. SDM 2008: 177-187, (pdf).
Melanie Hilario and Alexandros Kalousis. Approaches to dimensionality reduction in proteomic biomarker studies. Briefings in Bioinformatics 9(2): 102-118 (2008).