Learning with complex representations

Model stability, as we define it, is the sensitivity of the models produced by feature selection and learning algorithms to variations in the training data, with a focus on high-dimensional spaces. We have developed a framework for quantifying and estimating stability; at its core are similarity measures defined over the results of the data mining process. The motivation was to give domain experts a quantifiable measure of the model stability of different feature selection and learning algorithms, since domain experts tend to have less confidence in algorithms whose models change radically when the training set changes. To our knowledge this is the first time that such a framework has been proposed. We are currently looking for ways to use the results of the stability analysis to improve the learning and feature selection processes, in terms of both stability and predictive performance. We have applied stability analysis to a number of different problems, such as microarray and mass spectrometry classification and text mining.

Along the same line of research we also examine the definition of distances and similarity measures between classification models, focusing for the moment on decision trees. A more ambitious goal is to define meta-mining operators over the results of the data mining process, i.e. over the learned models, whether these are feature models or classification models. One can then go on to perform standard data analysis, this time over instances that are themselves learned models.
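
To make the idea of stability estimation concrete, here is a minimal sketch of one way it could be computed for feature selection: run a selector on several random subsamples of the training data and average the pairwise similarity of the selected feature sets. The choices here are assumptions made for illustration, not a description of the framework itself: Jaccard similarity stands in for the similarity measure, and `select_top_k` is a hypothetical toy selector that ranks features by the absolute difference of class means on binary labels.

```python
import random
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two feature sets (assumed similarity measure)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def select_top_k(rows, labels, k):
    """Hypothetical selector: keep the k features whose class means differ most."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # one class missing: feature cannot be scored
        else:
            scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def stability(rows, labels, k, n_resamples=10, fraction=0.8, seed=0):
    """Average pairwise Jaccard similarity of feature sets over subsamples."""
    if n_resamples < 2:
        raise ValueError("need at least two resamples to compare")
    rng = random.Random(seed)
    n = len(rows)
    m = max(2, int(fraction * n))
    subsets = []
    for _ in range(n_resamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        subsets.append(
            select_top_k([rows[i] for i in idx], [labels[i] for i in idx], k)
        )
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A score near 1 means the selector returns nearly the same features whichever subsample it sees; a score near 0 means the selected features change almost completely between subsamples. Other similarity measures, for example ones defined over feature rankings or weights rather than subsets, could be swapped in for `jaccard` without changing the rest of the procedure.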

Relevant publications:
Adam Woznica, Phong Nguyen, Alexandros Kalousis: Model mining for robust feature selection. KDD 2012: 913-921.
Irene Ntoutsi, Alexandros Kalousis, Yannis Theodoridis: A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees. SDM 2008: 810-821.
Alexandros Kalousis, Julien Prados, Melanie Hilario: Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl. Inf. Syst. 12(1): 95-116 (2007).
Alexandros Kalousis, Julien Prados, Melanie Hilario: Stability of Feature Selection Algorithms. ICDM 2005: 218-225.