Thesis defense: Behrooz Razeghi
Mr. Behrooz Razeghi will defend, in English, in view of obtaining the degree of Doctor of Science (mention computer science), his thesis entitled:
Bottlenecks CLUB: Unifying Information-Theoretic Trade-offs Among Complexity, Leakage, and Utility
Date: Wednesday, 23 November 2022, at 2:00 p.m.
Location: Battelle Building B (HEG), room B 2.15
- Prof Svyatoslav (Slava) Voloshynovskiy, thesis supervisor
- Prof François Fleuret, Department of Computer Science, University of Geneva, Switzerland
- Prof Teddy Furon, Institut national de recherche en informatique et en automatique, INRIA Rennes
- Prof Bernhard C. Geiger, Know-Center GmbH, Graz, Austria
- Prof Deniz Gündüz, Department of Electrical and Electronic Engineering, Imperial College London, UK
- Prof Sébastien Marcel, Idiap Research Institute; University of Lausanne, Switzerland
Bottleneck problems are an important class of optimization problems that have recently gained increasing attention in machine learning and information theory. They are widely used in generative models, fair machine learning algorithms, and the design of privacy-assuring mechanisms, and they appear as information-theoretic performance bounds in various multi-user communication problems. In this dissertation, we propose a general family of optimization problems, termed the complexity-leakage-utility bottleneck (CLUB) model, which (i) provides a unified theoretical framework that generalizes most state-of-the-art information-theoretic privacy models, (ii) establishes a new interpretation of popular generative and discriminative models, (iii) offers new insights into generative compression models, and (iv) can be used in fair generative models.
We first formulate the CLUB model as a complexity-constrained privacy-utility optimization problem. We then connect it with the closely related bottleneck problems, namely the information bottleneck (IB), privacy funnel (PF), deterministic IB (DIB), conditional entropy bottleneck (CEB), and conditional PF (CPF). We show that the CLUB model generalizes all of these problems, as well as most other information-theoretic privacy models. Next, we construct the deep variational CLUB (DVCLUB) models by employing neural networks to parameterize variational approximations of the associated information quantities. Building upon these quantities, we present unified objectives for the supervised and unsupervised DVCLUB models. Leveraging the DVCLUB model in an unsupervised setup, we connect it with state-of-the-art generative models such as variational auto-encoders (VAEs) and generative adversarial networks (GANs), as well as the Wasserstein GAN (WGAN), Wasserstein auto-encoder (WAE), and adversarial auto-encoder (AAE) models, through the optimal transport (OT) problem. We then show that the DVCLUB model can also be used in fair representation learning problems, where the goal is to mitigate undesired bias during the training phase of a machine learning model. We conduct extensive quantitative experiments on the Colored-MNIST and CelebA datasets, with a public implementation available, to evaluate and analyze the CLUB model.
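To illustrate the kind of trade-off such a model encodes (a schematic sketch, not the dissertation's exact formulation), one can write a complexity-leakage-utility objective as a Lagrangian that rewards utility while penalizing both complexity and leakage:

```latex
\max_{p(z \mid x)} \;
  \underbrace{I(U;Z)}_{\text{utility}}
  \;-\; \beta \, \underbrace{I(X;Z)}_{\text{complexity}}
  \;-\; \gamma \, \underbrace{I(S;Z)}_{\text{leakage}}
```

Setting $\gamma = 0$ recovers the familiar IB Lagrangian, while emphasizing the leakage term $I(S;Z)$ relative to the others moves toward privacy-funnel-type objectives; the multipliers $\beta, \gamma$ here are illustrative.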
Focusing on finite alphabets and using a local information geometry analysis, we develop the notion of perfect obfuscation based on the χ2-divergence and the Kullback–Leibler (KL) divergence in the Euclidean information space. Under this analysis, we establish the necessary and sufficient condition for obtaining a representation Z of the original data X that maximizes the mutual information between the utility attribute U and the released representation Z while revealing no information about the sensitive attribute S. We decompose the statistical dependence among the random variables U, S, X, and Z by decomposing the corresponding mutual informations I(X; Z), I(U; Z), and I(S; Z) into orthogonal modes.
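The local analysis rests on a standard second-order identity (stated here for context, independently of the dissertation's specific derivations): for distributions q close to p on a finite alphabet, the KL divergence is well approximated by half the χ2-divergence,

```latex
D_{\mathrm{KL}}(q \,\|\, p)
  \;=\; \tfrac{1}{2} \sum_{x} \frac{\bigl(q(x)-p(x)\bigr)^2}{p(x)}
        \;+\; o\!\bigl(\|q-p\|^2\bigr)
  \;=\; \tfrac{1}{2}\,\chi^2(q \,\|\, p) \;+\; o\!\bigl(\|q-p\|^2\bigr)
```

so that, locally, divergences behave like squared Euclidean distances between information vectors; this is what makes an orthogonal-mode decomposition of mutual information quantities possible.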
We also propose a new computationally efficient privacy-assuring mechanism, Sparse Coding with Ambiguation (SCA), and address various characterizations and applications of the model. SCA generalizes randomization techniques by integrating sparse lossy coding with ambiguization. The idea of ambiguization is to add (pseudo-)random noise to the orthogonal complement, i.e., the non-informative components, of the sparse code.
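A minimal sketch of this idea, assuming a linear embedding, top-k sign-based sparse coding, and ±1 ambiguization noise (the function name, parameters, and coding choices here are illustrative, not the dissertation's actual implementation):

```python
import numpy as np

def sca(x, W, k, noise_scale=1.0, rng=None):
    """Sparse Coding with Ambiguation (illustrative sketch).

    Embed x linearly, keep the signs of the k largest-magnitude
    components (sparse lossy coding), then fill the non-informative
    complement of the support with random +/-1 noise (ambiguization).
    """
    rng = np.random.default_rng() if rng is None else rng
    z = W @ x                                 # linear embedding
    support = np.argsort(np.abs(z))[-k:]      # informative top-k components
    code = np.zeros_like(z)
    code[support] = np.sign(z[support])       # lossy sign code on the support
    noise = noise_scale * rng.choice([-1.0, 1.0], size=z.shape)
    noise[support] = 0.0                      # noise only on the complement
    return code + noise
```

With `noise_scale=1.0`, every entry of the released vector has magnitude one, so an attacker without the support cannot distinguish informative components from ambiguization noise; this indistinguishability is the point of ambiguization.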
Using the SCA mechanism, we introduce two practical identification frameworks. We then propose a practical framework for privacy-aware image sharing in large-scale setups. Images are encoded such that, on the one hand, the representations can be stored in the public domain without paying the huge cost of privacy protection: being ambiguated, they leak no discernible content from the images unless the attacker mounts a combinatorially expensive guessing attack. On the other hand, authorized users are provided with very compact keys that can easily be kept secure and that can be used to disambiguate and faithfully reconstruct the corresponding access-granted images. We achieve this with a convolutional autoencoder of our own design, in which feature maps are passed independently through sparsifying transformations, providing multiple compact codes, each responsible for reconstructing different attributes of the image. The framework is tested on a large-scale database of images, with a public implementation available.