Model stability, meta-mining
Model stability, as we define
it, studies the sensitivity of the models that feature selection and learning algorithms
produce to variations of the training data used for learning, focusing on high
dimensional spaces. We have developed a framework for the quantification and
estimation of stability; at its core are similarity measures defined over
the results of the data mining process. The motivation behind that work was to
provide a quantifiable measure of the model stability of different feature
selection and learning algorithms to the domain experts. Domain experts tend to
have less confidence in algorithms whose models change radically with changes
in the training sets. To our knowledge this is the first time that such a
framework has been proposed. Currently we are looking for ways to use
the results of the stability analysis to improve the learning and feature
selection processes, both in terms of stability and predictive
performance. We have applied stability analysis to a number of different problems,
such as microarray and mass spectrometry classification and text mining. Along the
same line of research we also examine the definition of distances and similarity
measures among classification models, focusing for the moment on decision trees.
A more ambitious goal is to define meta-mining operators over the results of
the data mining process, i.e. over the learned models, whether these are feature
models or classification models. One can then go on to perform standard data
analysis, but this time over instances which are in fact learned models.
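
As a rough illustration of the stability idea described above, the following is a minimal sketch, not the framework from the publications listed here. It assumes that training-set variation is simulated by random subsampling, that the result of feature selection is a set of feature indices, and that the similarity measure is the Jaccard (Tanimoto) index averaged over all pairs of subsamples. The names `jaccard`, `select_top_k` and `stability`, and the toy selector itself, are hypothetical.

```python
import random
from itertools import combinations
from statistics import mean


def jaccard(a, b):
    """Jaccard (Tanimoto) similarity between two feature subsets."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention: two empty selections are identical
    return len(a & b) / len(a | b)


def select_top_k(X, y, k):
    """Hypothetical toy selector for binary labels (0/1).

    Scores each feature by the absolute difference of its class means
    and returns the indices of the k highest-scoring features.
    """
    n_features = len(X[0])
    scores = []
    for j in range(n_features):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append(abs(mean(pos) - mean(neg)) if pos and neg else 0.0)
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return set(ranked[:k])


def stability(X, y, selector, k, n_subsamples=10, fraction=0.9, seed=0):
    """Average pairwise similarity of the feature sets selected on subsamples.

    Assumption: subsampling without replacement stands in for
    'variations of the training data'.
    """
    if n_subsamples < 2:
        raise ValueError("need at least two subsamples to compare")
    rng = random.Random(seed)
    n = len(X)
    size = max(2, int(fraction * n))
    selections = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), size)
        selections.append(selector([X[i] for i in idx], [y[i] for i in idx], k))
    pairs = list(combinations(selections, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value close to 1 means the selector returns nearly the same features whatever subsample it sees, while a value close to 0 means the selected features change almost entirely between subsamples. Other similarity measures (for example rank correlations over feature rankings) could be substituted for `jaccard` without changing the rest of the sketch.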
Irene Ntoutsi, Alexandros Kalousis, Yannis Theodoridis: A general framework for estimating similarity of datasets and decision trees: exploring semantic similarity of decision trees. SDM 2008: 810-821.
Alexandros Kalousis, Julien Prados, Melanie Hilario: Stability of feature selection algorithms: a study on high-dimensional spaces. Knowl. Inf. Syst. 12(1): 95-116 (2007).
Alexandros Kalousis, Julien Prados, Melanie Hilario: Stability of Feature Selection Algorithms. ICDM 2005: 218-225.