Obtaining insights in the tremendous amount of data in which the Big Data era has brought us, requires to develop specific tools, that are not only summaries of data through classical charts and tables, but that allow full navigation and browsing of a dataset. The proper modeling of databases can enable such navigation and we propose in this Thesis a methodology to achieve the browsing of an information space, through its different facets. To achieve the modeling of such an information space, co-occurrences of data instances are built referring to a common reference type. Historically, the co-occurrences were seen as pairwise relationships and developed as such. The move to hypergraphs enables the possibility to take into account the multi-adicity of the relationships, and to have a representation through the incident graph that simplifies deeply its 2-section.
Nonetheless, representing large hypergraphs calls for a coarsening of the information by having insights on important vertices and hyperedges. One classical way to achieve it is to use a diffusion process over the network. Achieving it using an incident matrix is feasible but brings us to a pitfall, as it brings us back to a pairwise relationship. Making proper diffusion requires a tensor approach. This is well known for uniform hypergraphs, where all the hyperedges have same cardinality, but still very challenging for general hypergraphs. After redefining the concept of adjacency in general hypergraphs, we propose a first e-adjacency tensor, that involves a Hypergraph Uniformisation Process and a Polynomial Homogenization Process. This is achieved by uniformisation of the original hypergraph by decomposing it into layers—each of them containing a uniform hypergraph—and filling each layer with additional special vertices and merging them together. This process requires to have as many additional vertices as the number of layers.
In order to reduce the number of special vertices, we need to have the possibility of repeating a vertex when filling, which is not possible with hyperedges as they are sets. We need multisets. It, therefore, requires a new mathematical structure, that we have introduced and called hyper-bag-graph—hb-graph for short—, which is a family of multisets of a given universe.
Co-occurrences can also have repetitions or individual weighting of their vertices inside a given co-occurrence and hb-graphs fit to handle it. Hence, we introduce a hb-graph framework for co-occurrence networks. We then work on diffusion on such structures, using, in a first step, a matrix approach. Aggregating the ranking of vertices and hb-edges of this diffusion on each of the facet of the information space is achieved by using a multi-diffusion scheme. Since different facets might have different focus of interest, we introduce a biased diffusion that enables a tuning on the point of emphasis on the feature we are interested in.
Finally, coming back to e-adjacency tensor, we propose three e-adjacency tensors of hb-graphs, that are based on different ways of filling the hb-edges. The m-uniformisation that is achieved is evaluated and compared to the ones achieved by hb-edge splitting, concluding that any m-uniformisation process has an influence on the exchange-based diffusion that we propose. Hence, we conclude that diffusion using the tensor approach must be done in an informed manner to account for this diffusion change. We finally discuss different possible of achieving it and present a new Laplacian that can help to achieve it.