A Unified Framework for Relational Distance Based Learning Founded on Relational Algebra


National Framework

Funding by the Swiss National Science Foundation.

Abstract

We propose a novel unifying framework for relational distance based learning where learning examples are stored in a relational database. The framework is tailored to and fully supports relational algebra representations. We exploit the notion of foreign keys to perform the leap from a flat attribute-value representation to a structured representation that underlines relational learning. We define a new attribute type which builds on the notion of foreign keys and models sets of tuples. We exploit this enhanced representation to perform distance based learning over relational datasets.

We define relational distances whose building blocks are distances between tuples of relations and distances between sets. Unlike all existing distance-based relational systems our framework is not limited to a single relational distance measure, instead it offers a variety among which the user can choose or even combine the ones that best match the requirements of the problem at hand. The framework gives the possibility to use different types of set distances for different types of sets of tuples. For example the distance of sets defined on tuples from relation X could be computed using set distance A while for another relation Y we could use the set distance B. Moreover to what amounts to model selection the most appropriate relational distance measure can be automatically selected via means of an inner cross validation. We study the formal properties of the relational distance and show how these are related to the properties of its constituent parts. We evaluate our framework in the context of the classification task. We perform a series of experiments on a number of well known classification datasets used in relational learning, analyze the relative performance of the different relational distance measures induced, and try to highlight their similarities and differences. The experiments show that our framework competes favorably and in some cases is better than state of the art systems used in relational learning. We believe that the fact that the framework is completely defined on the basis of relational algebra and relational algebra operators has the potential to make it, and thus relational learning, much more accessible to the large community of relational databases.

On a parallel dimension we defined kernel functions over relational schemata, which are instances of R-Convolution kernels, and use them also as basis for relational distance-based learning. Here also the building blocks of the finally constructed kernel are kernels over tuples and kernels over sets. The resulting kernels on the complex structures can be considered as being defined over typed and unordered trees where elementary kernels are used to compute the graded similarity between nodes.

One limitation of the relational algebra representation is that it does not naturally handle lists of objects. To overcome it we extend its representation power by including the ability to define list of objects and proceed defining distances on lists of complex objects.

Description of work

Publications

WOZ05a - Adam Woznica, Alexandros Kalousis and Melanie Hilario. Kernels over relational algebra structures. In Proceedings of the ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2005). Springer. pdf.

WOZ05b - Adam Woznica, Alexandros Kalousis and Melanie Hilario. Distance-based learning over extended relational algebra structures. In Proceedings of the 15th International Conference on Inductive Logic Programming (ILP-2005) (Late Breaking Papers). pdf.

KAL05 - Alexandros Kalousis, Adam Woznica, and Melanie Hilario. A unifying framework for relational distance-based learning founded on relational algebra. (Under preparation).

Local contact: Alexandros.Kalousis@cui.unige.ch

Last update : 04/08/2005 by Alexandros Kalousis