Main Research Themes



Current research activities focus on machine learning and data mining, natural language processing and text mining, and information retrieval and knowledge management.

Machine Learning and Data Mining

AI Lab members have a long-standing interest in assessing the relative merits and areas of expertise of different machine learning paradigms. The need to automate algorithm and model selection led to investigations into meta-learning conducted in the European METAL Project. Among the major issues explored by the AI Lab in this project are:

Complementary to the problem of model selection is that of model combination. AI Lab members have contributed actively to research in this area, either through the design of modular and interpretable neural networks or through the combination of local experts based on multilayer perceptrons. In addition, intensive research efforts have been devoted to the integration of symbolic (rule- and case-based) and numerical (neural network) methods, applied to medical diagnosis as well as process parameterization in the steelmaking and automobile industries (European MIX Project).

As an alternative to building a broader space of models, one can strive to build a broader feature space by applying kernels to the original features: this is the underlying idea of support vector machines (SVMs). Issues currently investigated in the AI Lab are the choice of kernels for SVMs and the development of novel kernels allowing for the integration of prior domain knowledge.

Another research direction is upgrading learning approaches. There is a need to go beyond feature vectors toward more expressive representations, as well as more powerful algorithms for multi-relational learning. The AI Lab has thus recently launched a new FNRS project aimed at overcoming the limitations of propositional learners in two ways: first, by developing first-order learners within the inductive logic programming paradigm; second, by extending propositional learners to allow them to process complex variables such as those representing sets, intervals, or probability distributions.
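The kernel idea mentioned above in connection with SVMs can be illustrated with a minimal sketch. It is not taken from the AI Lab's work: the degree-2 polynomial kernel, the function names, and the sample vectors are all assumptions chosen for illustration. The sketch checks that evaluating the kernel on the original features gives the same value as an inner product in an explicitly expanded ("broader") feature space.

```python
import math
from itertools import product


def dot(u, v):
    """Plain inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))


def poly2_kernel(x, z):
    """Homogeneous degree-2 polynomial kernel: k(x, z) = (x . z)^2.

    Works directly on the original features (illustrative choice of kernel).
    """
    return dot(x, z) ** 2


def poly2_features(x):
    """Explicit feature map for poly2_kernel: all ordered products x_i * x_j."""
    return [a * b for a, b in product(x, repeat=2)]


# Hypothetical sample vectors with three original features each.
x = [1.0, 2.0, 3.0]
z = [0.5, -1.0, 2.0]

implicit = poly2_kernel(x, z)                        # computed in 3 dimensions
explicit = dot(poly2_features(x), poly2_features(z)) # computed in 9 dimensions

print(implicit, explicit)
assert math.isclose(implicit, explicit)
```

For these vectors both computations give 20.25. The kernel never builds the nine-dimensional representation, which is why a learner such as an SVM can work in a much larger feature space at the cost of evaluating the kernel on the original features only.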

Natural Language Processing and Text Mining

Knowledge discovery from unstructured data such as text requires the combination of techniques from machine learning, information retrieval, and natural language processing (NLP). The AI Lab has a variety of activities in this area:

Scientific and Engineering Applications

Development of a generic text mining tool for the biological domain. The BioMinT tool searches the online scientific literature and automatically extracts specifically targeted information concerning genes and proteins. This information is used to fill templates in order to (1) accelerate, through partial automation, the annotation and update of genomics and proteomics databases; and (2) generate readable and structured reports in response to queries from biological researchers and practitioners. This work is part of a European project involving the AI Lab, the Swiss Institute of Bioinformatics, and four other European partners.
Application of data mining techniques to proteomics research issues such as the detection of biomarkers for diagnosis from protein mass spectra, the characterization of complex protein families, and the prediction of post-translational modifications and their impact on protein function.
The two types of biomedical text repositories used are clinical records produced at the Hospital and biomedical digital libraries (MedLine). Current research projects include named-entity recognition in patient records and text mining for prediction of adverse drug reactions.
Examples of application tasks in this domain include detecting nosocomial infections, predicting the length of hospitalization of an incoming patient, and identifying co-occurring diagnoses.
CERN's new accelerator, the Large Hadron Collider (LHC), is scheduled to enter operations in 2008 and remain in service for around 25 years thereafter. Construction of the LHC entails a major knowledge management problem: that of capturing and storing all knowledge concerning the LHC and bringing it to builders, suppliers, maintainers and users as the need arises. The AI Lab works closely with CERN to develop an information retrieval system based on the combination of machine learning methods with the elaboration and use of a specialized ontology that integrates prior knowledge about the LHC.


Last update : 12/15/2003 by Melanie Hilario