A Unified Framework for Relational Distance Based Learning Founded on Relational Algebra
National Framework
Funding by the Swiss National Science Foundation.
Abstract
We propose a novel unifying framework for relational distance based learning where learning examples are stored in
a relational database. The framework is tailored to and fully supports relational algebra representations.
We exploit the notion of foreign keys to perform the leap from a flat attribute-value
representation to a structured representation that underlines relational learning.
We define a new attribute type which builds on the notion of foreign keys and models
sets of tuples. We exploit this enhanced representation to perform distance based learning over relational datasets.
We define relational distances whose building blocks are distances between tuples of relations
and distances between sets. Unlike all existing distance-based relational systems our
framework is not limited to a single relational distance measure, instead it offers a variety
among which the user can choose or even combine the ones that best match the requirements of the problem at hand.
The framework gives the possibility to use different types of set distances for different types of
sets of tuples. For example the distance of sets defined on tuples from relation X could be computed
using set distance A while for another relation Y we could use the set distance B.
Moreover to what amounts to model selection the most appropriate relational distance measure can be
automatically selected via means of an inner cross validation. We study the formal properties
of the relational distance and show how these are related to the properties
of its constituent parts. We evaluate our framework in the context of the classification task.
We perform a series of experiments on a number of well known classification datasets
used in relational learning, analyze the relative performance of the different relational
distance measures induced, and try to highlight their similarities and differences. The experiments
show that our framework competes favorably and in some cases is better than state of the art systems
used in relational learning. We believe that the fact that the framework is completely defined
on the basis of relational algebra and relational algebra operators has the potential to make it, and
thus relational learning, much more accessible to the large community of relational databases.
On a parallel dimension we defined kernel functions over relational schemata, which are instances of R-Convolution
kernels, and use them also as basis for relational distance-based learning. Here also the building blocks of the finally
constructed kernel are kernels over tuples and kernels over sets. The resulting kernels on the complex structures can
be considered as being defined over typed and unordered trees where elementary kernels are used to compute the graded
similarity between nodes.
One limitation of the relational algebra representation is that it does not naturally handle lists of objects.
To overcome it we extend its representation power by including the ability to define list of objects and proceed
defining distances on lists of complex objects.
Description of work
- Build the general structure allowing to input and manipulate relational datasets
- Implement the relational distance learner, KAL05, the defined set distances include:
- Single Linkage
- Complete Linkage
- Average Linkage
- Sum of minimum distances
- RIBL distance
- Hausdorff
- A modified version of Tanimoto distance
- Surjections
- Linkings
- Fair Surjections
- Matchings
- Implement the kernel-based relational distance based learner, WOZ05a
- Extend the representational language so that it can handle lists of complex objects
- Define a distance on lists of complex objects based on edit distance WOZ05b
Publications
WOZ05a - Adam Woznica, Alexandros Kalousis and Melanie Hilario. Kernels over relational algebra structures. In Proceedings
of the ninth Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD-2005). Springer.
pdf.
WOZ05b - Adam Woznica, Alexandros Kalousis and Melanie Hilario. Distance-based learning over
extended relational algebra structures. In Proceedings
of the 15th International Conference on Inductive Logic Programming (ILP-2005) (Late Breaking Papers).
pdf.
KAL05 - Alexandros Kalousis, Adam Woznica, and Melanie Hilario. A unifying framework for relational
distance-based learning founded on relational algebra. (Under preparation).
Local contact:
Alexandros.Kalousis@cui.unige.ch