Interpretable Algorithms for Regression: Theory and Applications
Updated: March 5
V. Margot, PhD Thesis, Sorbonne Université, October 2nd, 2020.
Abstract: This thesis was motivated by the desire to design an interpretable algorithm for regression analysis. First, we focused on the most common interpretable algorithms, i.e., rule-based algorithms. Unfortunately, the theoretical conditions on these algorithms cause a loss of interpretability as the dimension increases. Starting from the principle that the fewer the rules, the better the interpretability, we have introduced a new family of algorithms based on a small number of so-called significant rules. This principle has been translated into a measure of interpretability that allows comparison between rule-generating algorithms.
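To make the notion of a "rule" concrete, here is a minimal illustrative sketch (not code from the thesis): a regression rule as a conjunction of interval conditions on the features, with a constant prediction on its activation set. The `Rule` class and its fields are hypothetical names chosen for this example.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    """A regression rule: a conjunction of interval conditions on the
    features, with a constant prediction where the rule activates.
    (Illustrative sketch only; not the thesis's implementation.)"""
    conditions: dict   # feature index -> (low, high) interval
    prediction: float

    def activates(self, x):
        # The rule fires only if every interval condition holds.
        return all(lo <= x[i] <= hi for i, (lo, hi) in self.conditions.items())

# 'IF x0 in [0, 0.5] AND x1 in [0.3, 1] THEN predict 1.2'
r = Rule({0: (0.0, 0.5), 1: (0.3, 1.0)}, prediction=1.2)
print(r.activates([0.4, 0.6]))  # True
print(r.activates([0.6, 0.6]))  # False
```

A rule-based model is then a small collection of such rules, and counting them gives a first, crude proxy for interpretability.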
We have then introduced a new method to generate an interpretable estimator of the regression function, based on data-dependent coverings. The goal is to extract from the data a covering of the explanatory variable space instead of a partition. Each element of the covering is labeled as significant or insignificant: significant elements are used to describe the model, and insignificant elements serve to complete the covering. A partition is then derived from the covering to define an estimator, which predicts the empirical conditional expectation on each cell of the partition. These estimators therefore have the same form as those produced by data-dependent partitioning algorithms. We have proven the consistency of such estimators without the cell-shrinking condition that appears in the literature, thus reducing the number of elements required in the covering.
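The prediction step can be sketched as follows: a piecewise-constant estimator whose value on each cell of a partition is the empirical conditional expectation, i.e., the mean of the responses of the training points falling in that cell. This toy example uses a fixed uniform partition of [0, 1] (the thesis derives the partition from a data-dependent covering instead); all function and variable names are hypothetical.

```python
import numpy as np

def fit_partition_estimator(X, y, n_cells=8):
    """Fit a piecewise-constant estimator on a uniform partition of [0, 1].
    Each cell predicts the empirical mean of y over the training points it
    contains (illustrative sketch, not the thesis's data-dependent covering)."""
    edges = np.linspace(0.0, 1.0, n_cells + 1)
    # Assign each training point to a cell; clip so x == 1.0 lands in the last cell.
    idx = np.clip(np.searchsorted(edges, X, side="right") - 1, 0, n_cells - 1)
    # Empirical conditional expectation per cell; fall back to the global mean
    # for empty cells.
    means = np.array([y[idx == c].mean() if np.any(idx == c) else y.mean()
                      for c in range(n_cells)])

    def predict(x):
        j = np.clip(np.searchsorted(edges, x, side="right") - 1, 0, n_cells - 1)
        return means[j]

    return predict

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, 200)
y = np.sin(2 * np.pi * X) + rng.normal(0, 0.1, 200)
predict = fit_partition_estimator(X, y)
print(predict(np.array([0.1, 0.5, 0.9])))
```

Since the estimator is constant on each cell, two query points falling in the same cell receive the same prediction; consistency then hinges on how the cells are built, which is where the covering-based construction departs from the classical partitioning analysis.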
From this theory, we have developed two algorithms. The first, the Covering Algorithm (CA), makes Random Forests (RF), commonly regarded as an uninterpretable black box, interpretable: it extracts from the rules generated by RF a covering of significant and insignificant rules. The second, the Rule Induction Covering Estimator (RICE), generates significant and insignificant rules directly rather than extracting them from RF, and selects a sparse set of them to form a covering. The significant rules are used to interpret the model, and the covering defines an estimator of the regression function which, under certain conditions, is consistent. Finally, an open-source version of the code is available on GitHub.
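The significant/insignificant labeling step common to both algorithms can be illustrated with a deliberately simplified criterion: call a candidate rule significant when its empirical mean deviates from the global mean by more than a few standard errors. This is a stand-in chosen for the sketch, not the thesis's actual significance test, and all names are hypothetical. On data with a jump at x = 0.7, intervals that straddle the jump tend to average out and come out insignificant, while intervals on either side are significant.

```python
import numpy as np

def is_significant(y_in, y_all, z=1.96):
    # Simplified criterion: the rule's empirical mean must deviate from the
    # global mean by more than z standard errors. (Stand-in for the
    # thesis's significance test.)
    n = len(y_in)
    if n == 0:
        return False
    se = y_all.std() / np.sqrt(n)
    return abs(y_in.mean() - y_all.mean()) > z * se

rng = np.random.default_rng(1)
X = rng.uniform(0, 1, 500)
# Regression function with a jump at x = 0.7, plus noise.
y = np.where(X > 0.7, 2.0, 0.0) + rng.normal(0, 0.3, 500)

# Candidate rules: overlapping intervals on the single explanatory variable.
labels = []
for lo in np.arange(0.0, 0.8, 0.1):
    hi = lo + 0.3
    mask = (X >= lo) & (X < hi)
    tag = "significant" if is_significant(y[mask], y) else "insignificant"
    labels.append((round(float(lo), 1), round(float(hi), 1), tag))

for rule in labels:
    print(rule)
```

A sparse subset of the significant intervals would then describe the model, with insignificant intervals added only as needed to complete the covering of [0, 1].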