Feature selection
As a result of our work on biological problems and text mining, we have had
considerable exposure to the problem of high dimensionality and small sample
size. We have extensively studied feature selection for this type of problem
and have developed a number of algorithms that address some of the
limitations of existing algorithms. For example, we have proposed a variant
of the well-known SVM-RFE algorithm that is based not on a linear kernel but on
a kernel of feature ratios; as a result, feature selection with that kernel is
less sensitive to the redundant and noisy information typically found in biological
problems. More recently, we have proposed a variant of the standard SVM
algorithm that optimizes the actual error bound, which is based on both the radius
and the margin and not just the margin. This new variant incorporates an explicit
feature weighting mechanism which, together with an l1 norm constraint on the
feature weights, produces relatively sparse feature sets, making it a
feature selection method. In addition, we have started exploring metric learning
methods and developed a number of different approaches for learning the
neighborhood in metric learning problems and for local metric learning. The
latter achieves excellent predictive performance and scales very well to
problems with tens of thousands of instances and thousands of attributes,
unlike existing metric learning methods, which scale poorly. As a
result of our work on metric learning and SVMs, we have been able to uncover some
interesting and previously unknown connections between large margin metric learning
and support vector machines, connections which have
the potential to produce new learning methods that build on the strengths of
both. Finally, in a broader context, we have also examined the problem of stability of feature selection.
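To make the notion of stability of feature selection concrete, the following is a minimal sketch of one way it could be quantified: run a feature selection method on several resamples of the same data and measure how much the selected feature subsets agree. The choice of mean pairwise Jaccard similarity, the function names and the example feature indices are all assumptions made for illustration; they are not taken from the publications listed below.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature subsets (1.0 if both are empty)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def selection_stability(subsets):
    """Mean pairwise Jaccard similarity over a list of feature subsets.

    Illustrative stability score (an assumption, not a published definition):
    1.0 means every run selected the same features, 0.0 means no overlap.
    """
    pairs = list(combinations(subsets, 2))
    if not pairs:
        raise ValueError("need at least two feature subsets")
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical feature indices selected on three resamples of the same data.
runs = [{0, 3, 7, 9}, {0, 3, 7, 8}, {0, 3, 5, 9}]
stability = selection_stability(runs)
print(f"stability = {stability:.3f}")
```

A score close to 1 indicates that the selected features barely change when the training sample is perturbed, which is particularly hard to achieve in the high-dimensional, small-sample setting described above.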
Relevant publications:
Huyen Do and Alexandros Kalousis. Convex formulations of radius-margin based Support Vector Machines. ICML 2013 (pdf) (appendix).
Huyen Do, Alexandros Kalousis, Jun Wang and Adam Woznica. A metric learning perspective of SVM: on the relation of SVM and LMNN. AISTATS 2012 (pdf).
Huyen Do, Alexandros Kalousis and Melanie Hilario. Feature weighting using margin and radius based error bound optimization in SVMs. ECML 2009 (pdf).
Julien Prados, Alexandros Kalousis and Melanie Hilario. Feature Selection with the logRatio Kernel. SDM 2008: 177-187 (pdf).
Melanie Hilario and Alexandros Kalousis. Approaches to dimensionality reduction in proteomic biomarker studies. Briefings in Bioinformatics 9(2): 102-118 (2008).