Isabelle Guyon, Alexander Statnikov, Berna Bakir Batu (Editors), Cause Effect Pairs in Machine Learning, The Springer Series on Challenges in Machine Learning, Springer International Publishing (2019)



Document Information

This book presents groundbreaking advances in the domain of causal structure learning. The problem of distinguishing cause from effect ("Does altitude cause a change in atmospheric pressure, or vice versa?") is here cast as a binary classification problem, to be tackled by machine learning algorithms. Based on the results of the ChaLearn Cause-Effect Pairs Challenge, this book reveals that the joint distribution of two variables can be scrutinized by machine learning algorithms to reveal the possible existence of a "causal mechanism," in the sense that the values of one variable may have been generated from the values of the other. The book provides both tutorial material on the state-of-the-art on cause-effect pairs and exposes the reader to more advanced material, with a collection of selected papers. Supplemental material includes videos, slides, and code, which can be found on the workshop website. Discovering causal relationships from observational data will become increasingly important in data science with the increasing amount of available data, as a means of detecting potential triggers in epidemiology, social sciences, economics, biology, medicine, and other sciences.

The Springer Series on Challenges in Machine Learning

Isabelle Guyon, Alexander Statnikov, Berna Bakir Batu (Editors)
Cause Effect Pairs in Machine Learning

Series editors: Hugo Jair Escalante, Astrofisica Optica y Electronica, INAOE, Puebla, Mexico; Isabelle Guyon, ChaLearn, Berkeley, CA, USA; Sergio Escalera, University of Barcelona, Barcelona, Spain.

The books in this innovative series collect papers written in the context of successful competitions in machine learning. They also include analyses of the challenges, tutorial material, dataset descriptions, and pointers to data and software. Together with the websites of the challenge competitions, they offer a complete teaching toolkit and a valuable resource for engineers and scientists.

More information about this series at http://www.springer.com/series/15602

Editors: Isabelle Guyon, Team TAU - CNRS, INRIA, Université Paris Sud, Université Paris Saclay, Orsay, France; Alexander Statnikov, SoFi, San Francisco, CA, USA, and ChaLearn, Berkeley, CA, USA; Berna Bakir Batu, University of Paris-Sud, Paris-Saclay, Paris, France.

ISSN 2520-131X, ISSN 2520-1328 (electronic), The Springer Series on Challenges in Machine Learning. ISBN 978-3-030-21809-6, ISBN 978-3-030-21810-2 (eBook). https://doi.org/10.1007/978-3-030-21810-2

© Springer Nature Switzerland AG 2019. This work is subject to copyright. All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The use of general descriptive names, registered names, trademarks, service marks, etc. in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant protective laws and regulations and therefore free for general use. The publisher, the authors, and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication. Neither the publisher nor the authors or the editors give a warranty, express or implied, with respect to the material contained herein or for any errors or omissions that may have been made. The publisher remains neutral with regard to jurisdictional claims in published maps and institutional affiliations. This Springer imprint is published by the registered company Springer Nature Switzerland AG. The registered company address is: Gewerbestrasse 11, 6330 Cham, Switzerland.

Foreword

The problem of distinguishing cause from effect caught my attention thanks to the ChaLearn Cause-Effect Pairs Challenge organized by Isabelle Guyon and her collaborators in 2013. The seminal contribution of this competition was casting the cause-effect problem ("Does altitude cause a change in atmospheric pressure, or vice versa?") as a binary classification problem, to be tackled by machine learning algorithms. By having access to enough pairs of variables labeled with their causal relation, participants designed distributional features and algorithms able to reveal "causal footprints" from observational data. This was a striking realization: Had we discovered some sort of "lost causal signal"
lurking in data so far ignored in machine learning practice? Although limited in scope, the cause-effect problem sparked significant interest in the machine learning community. The use of machine learning techniques to discover causality synergized these two research areas, which historically struggled to get along, and while the cause-effect problem exemplified "machine learning helping causality," we are now facing the pressing need for having "causality help machine learning." Indeed, current machine learning models are untrustworthy when dealing with data obtained under test conditions (or interventions) that differ from those seen during training. Examples of these problematic situations include domain adaptation, learning under multiple environments, reinforcement learning, and adversarial learning. Fortunately, the long sought-after partnership between machine learning and causality continues to forge slowly but steadily, as can be seen from the bar graph below illustrating the frequency of submissions related to causality at the NeurIPS conference (a premier machine learning conference).

[Bar graph: number of NeurIPS titles containing "caus" per year, 1987–2018.]

This book is a great reference for those interested in the cause-effect problem. Chapter 1 by Dominik Janzing is an excellent motivation that borrows ideas and intuitions matured over a decade of expertise. Chapter 2 by Isabelle Guyon delves into the conundrum of evaluating causal hypotheses from observational data. Chapters 3 and 4, led by Olivier Goudet and Diviyan Kalainathan, are fantastic surveys on cause-effect methods, divided into generative and discriminative models, respectively. The first part of this book closes with two important extensions of the cause-effect problem: Nicolas Doremus et al. discuss time series in Chap. 5, while Frederick Eberhardt explores the multivariable case in Chap. 6. The second part of the book, Selected Readings, discusses the results of the cause-effect pairs competitions (Chap. 7), as well as a selection of algorithms to address this problem (Chaps. 8–14). I believe that the robustness and invariance properties of causation will be key to remove the elephant from the room (the "identically and independently distributed" assumption) and move towards a new generation of causal machine learning algorithms. This quest begins in the following pages.

Paris, France
April 2019
David Lopez-Paz

Preface

Discovering causal relationships from observational data will become increasingly important in data science with the increasing amount of available data, as a means of detecting potential triggers in epidemiology, social sciences, economics, biology, medicine, and other sciences. Although causal hypotheses made from observations need further evaluation by experiments, they are still very important to reduce costs and burden by guiding large-scale experimental designs. In 2013, we conducted a challenge on the problem of cause-effect pairs, which pushed the state of the art considerably, revealing that the joint distribution of two variables can be scrutinized by machine learning algorithms to reveal the possible existence of a "causal mechanism," in the sense that the values of one variable may have been generated from the values of the other. This milestone event has stimulated a lot of research in this area for the past few years. The ambition of this book is to provide both tutorial
material on the state-of-the-art on cause-effect pairs and expose the reader to more advanced material, with a collection of selected papers, some of which are reprinted from the JMLR special topic on "large-scale experimental design and the inference of causal mechanisms." Supplemental material includes videos, slides, and code that can be found on the workshop website.

In the first part of this book, six tutorial chapters are provided. In Chap. 1, an introduction to the cause-effect problem is given for the simplest but nontrivial case, where the causal relationships are predicted from the observations of only two variables. In this chapter, the reader gains a better understanding of the causal discovery problem as well as an intuition about its complexity. Common methods and recent achievements are explored, and some misconceptions are pointed out. In Chap. 2, the benchmarking problem of causal inference from observational data is discussed, and a methodology is provided. The focus of this chapter is on methods that produce a coefficient, called the causation coefficient, which is used to decide the direction of the causal relationship. In this way, the cause-effect problem becomes a usual classification problem, which can be evaluated by classification accuracy metrics. A new notion of "identifiability," which defines a particular data generation process by bounding type I and type II errors, is also proposed as a validation metric. In Chap. 3, the reader dives into algorithms that solve the cause-effect pair problem by modeling the data generating process. Such methods allow gaining not only a clue about the causal direction but also information about the mechanism itself, making causal discovery less of a black-box decision process. In Chap. 4, discriminative algorithms are explored. A contrario, such algorithms do not attempt to reverse engineer the data generating process; they merely classify the empirical joint distribution of two variables X and Y (a scatter plot) as an "X causes Y" case, a "Y causes X" case, or neither. While throughout Chaps. 1–4 the emphasis is on cross-sectional data (without explicit reference to time), in Chap. 5 the authors investigate causal discovery methods for time series. One interesting contribution compared to the older approaches of Granger causality is the introduction of instantaneous causal relationships. Finally, in Chap. 6, the authors present research going beyond the treatment of two variables, including triplets and more. This puts in perspective the effort of the rest of the book, which focuses on two variables only, and reminds the reader of the limitations of analyses limited to two variables, particularly when it comes to the treatment of the problem of confounding.

In the second part of the book, we compile articles related to the 2013 ChaLearn Cause-Effect Pairs challenges (see http://www.causality.inf.ethz.ch/cause-effect.php), including articles that were part of the proceedings of the NIPS 2013 workshop on causality and the JMLR special topic on large-scale experimental design and the inference of causal mechanisms. The cause-effect pairs challenge, described in Chap. 7, provided a new point of view on the problem of causal modeling by reformulating it as a classification problem. Its purpose was attributing causes to effects by defining a causation coefficient between variables such that large positive and negative values indicate a causal relation in one or the other direction, whereas values close to zero indicate no causal relationship.
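To make this reformulation concrete, here is a minimal sketch of such a pipeline, assuming NumPy and scikit-learn; the `featurize` function and its toy features are hypothetical stand-ins for the much richer feature sets engineered by challenge participants:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def featurize(x, y):
    """Toy distributional features of the joint empirical distribution of (x, y).
    Hypothetical placeholder: actual entries used far richer feature sets."""
    sx, sy = x.std() + 1e-12, y.std() + 1e-12
    return np.array([
        np.corrcoef(x, y)[0, 1],              # linear dependence
        np.mean(((x - x.mean()) / sx) ** 3),  # skewness of X
        np.mean(((y - y.mean()) / sy) ** 3),  # skewness of Y
        np.mean(((x - x.mean()) / sx) ** 4),  # kurtosis of X
        np.mean(((y - y.mean()) / sy) ** 4),  # kurtosis of Y
    ])

def train_direction_classifier(pairs, labels):
    """pairs: list of (x, y) sample arrays; labels: +1 for X->Y, -1 for Y->X, 0 for neither."""
    features = np.stack([featurize(x, y) for x, y in pairs])
    return GradientBoostingClassifier().fit(features, labels)

def causation_coefficient(clf, x, y):
    """Large positive: X causes Y; large negative: Y causes X; near zero: neither."""
    proba = clf.predict_proba(featurize(x, y).reshape(1, -1))[0]
    classes = list(clf.classes_)
    return proba[classes.index(1)] - proba[classes.index(-1)]
```

Because the coefficient is a difference of estimated class probabilities, it can be evaluated with the usual classification metrics, exactly as described above.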
The participants were provided with hundreds of pairs from different domains, such as ecology, medicine, climatology, engineering, etc., as well as artificial data, for all of which the ground truth is known (causally related, dependent but not causally related, or independent pairs). Because of the problem setting, methods based on conditional independence tests were not applicable. Inspired by the starting kit provided by Ben Hamner at Kaggle, the majority of the participants engineered features of the joint empirical distribution of pairs of variables and then applied standard classification methods, such as gradient boosting.

Starting with Chap. 8, the approaches used by the top participants of the challenges and their results are given in the second part as selected readings. In Chap. 8, the authors perform an extensive comparison of methods on data of the challenge, including a method that they propose, based on Gaussianity measures, which fares well. The winner of the challenge, the team ProtoML (Chap. 10), proposes a feature extraction method which takes an extensive number of algorithms and functions as input parameters to build many models and extracts features by computing their goodness of fit in many different ways. The method achieves 0.84 accuracy for artificial data and 0.70 accuracy for real data. If the features are extracted without human intervention, the method is prone to creating redundant features. It also increases computational time, since about 9000 features are calculated from the input parameters. There is a trade-off between computational time/complexity and automated feature extraction. The second-ranked participant, jarfo (Chap. 12), concentrates on conditional distributions of pairs of random variables, without enforcing a strict independence between the cause and the conditional distribution of the effect. He defines a Conditional Distribution Score (CDS) measuring variability, based on the assumption that, for a given mechanism, there should be a similarity among the conditional distributions of the effect, regardless of the distribution of the cause. Other features of jarfo are based on information-theoretic measures (e.g., entropy, mutual information) and variability measures (e.g., standard deviation, skewness, kurtosis).
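The intuition behind such a score can be illustrated with a simplified sketch (not jarfo's exact definition): bin the candidate cause, summarize the conditional distribution of the candidate effect within each bin by its quantiles, and measure how much those summaries vary across bins; lower variability suggests a more plausible mechanism in that direction.

```python
import numpy as np

def cds(x, y, n_bins=8, n_quantiles=9):
    """Variability of the conditional distributions of y given bins of x.

    Illustrative simplification: each conditional is standardized and
    summarized by its quantiles; the score is the average spread of those
    quantiles across bins.
    """
    edges = np.quantile(x, np.linspace(0, 1, n_bins + 1))
    qs = np.linspace(0.1, 0.9, n_quantiles)
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (x >= lo) & (x <= hi)
        if mask.sum() >= 5:  # need enough samples to estimate quantiles
            y_bin = (y[mask] - y[mask].mean()) / (y[mask].std() + 1e-9)
            rows.append(np.quantile(y_bin, qs))
    if len(rows) < 2:
        return np.nan
    return np.std(np.stack(rows), axis=0).mean()

# A direction feature for the classifier could then be cds(x, y) - cds(y, x).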
The algorithm achieves 0.83 and 0.69 accuracy for artificial and real data, respectively. It has results comparable with the algorithm proposed by the winner in terms of predictive performance, with a better run time. It also performs better on novel data, based on post-challenge analyses we report in Chap. 7. The team HiDLoN, in third place in the challenge (Chap. 11), defines a causation coefficient as the difference in (estimated) probability of either causal direction. They consider two binary classifiers using information-theoretic features, each classifying one causal direction versus all other relations. In this way, a score representing a causation coefficient can be defined by taking the difference of the probabilities that each sample belongs to a given class. Using one classifier for each causal direction makes it possible to evaluate feature importance for each case. Another participant, mouse, in fifth place, evaluates how features are ranked based on the variable types, using different subsets of training data (Chap. 13). He defines 13 groups of features, resulting in 211 features in total, and determines their importance for estimating the causal relation. Polynomial regression and information-theoretic features are the most important features for all cases; in particular, polynomial regression is the best feature to predict causal direction when both variables are numerical, whereas information-theoretic features dominate when the cause is categorical and the effect is numerical. Similarly, the method proposed by Bontempi (Chap. 9) defines features based on statistical dependency, such as quantiles of marginal and conditional distributions, and learns a mapping from features to causal directions. In addition to predicting the causal structure of pairs of variables, he also extends his solution to n-variate distributions. In this case, features are defined as a set of descriptors of the dependency between the variables, which are the elements of the Markov blankets of the two variables of interest. Finally, the last chapter (Chap. 14) provides a complementary perspective, opening up to the treatment of more than two variables with a more conventional Markov blanket approach. A sketch of the polynomial-regression feature mentioned above follows this preface.

Berkeley, CA, USA, Isabelle Guyon
San Francisco, CA, USA, Alexander Statnikov
Paris, France, Berna Bakir Batu
January 2019
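The polynomial-regression feature mentioned in the preface can be approximated by fitting low-degree polynomials in both directions and comparing residuals; the direction with the better fit is a weak vote for that causal orientation. This is a hedged sketch, not the participant's actual implementation:

```python
import numpy as np

def poly_fit_error(x, y, degree=3):
    """Mean squared residual of a polynomial regression y ~ poly(x)."""
    coeffs = np.polyfit(x, y, degree)
    return np.mean((y - np.polyval(coeffs, x)) ** 2)

def poly_direction_feature(x, y, degree=3):
    """Positive when y is better explained as a function of x than vice versa."""
    x = (x - x.mean()) / x.std()  # standardize so residuals are comparable
    y = (y - y.mean()) / y.std()
    return poly_fit_error(y, x, degree) - poly_fit_error(x, y, degree)
```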


Table of Contents

  • Foreword

  • Preface

  • Acknowledgments

  • Contents

  • Contributors

  • Part I Fundamentals

    • 1 The Cause-Effect Problem: Motivation, Ideas, and Popular Misconceptions

      • 1.1 The Cause-Effect Problem: Notation and Introduction

      • 1.2 Why Looking at This "Toy Problem"?

        • 1.2.1 Easier to Handle Than Detection of Confounders

        • 1.2.2 Falsifiable Causal Statements Are Needed

        • 1.2.3 Binary Classification Problems Admit Simple Benchmarking

        • 1.2.4 Relations to Recent Foundational Questions in Theoretical Physics

        • 1.2.5 Impact for General Machine Learning Tasks

        • 1.2.6 Solving a So-Called 'Unsolvable' Problem

      • 1.3 Current Approaches

        • 1.3.1 Complexity of Marginal and Conditional

        • 1.3.2 Independent Mechanisms

        • 1.3.3 Relations Between Independence and Complexity

        • 1.3.4 Supervised Learning

      • 1.4 Human Intuition About the Cause-Effect Problem

        • 1.4.1 Examples Where Our Intuition Seems Right

        • 1.4.2 Be Aware of Too Simple Approaches: Some Misconceptions

        • 1.4.3 Abusing the Second Law: Superficial Entropy Arguments

        • 1.4.4 Comparing Only Conditionals
