Tabular explanations example
============================

Learning and explaining German Credit Dataset
---------------------------------------------

.. code:: ipython3

    import pandas as pd
    import numpy as np
    from sklearn import preprocessing
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression
    from xailib.data_loaders.dataframe_loader import prepare_dataframe

    from xailib.explainers.lime_explainer import LimeXAITabularExplainer
    from xailib.explainers.lore_explainer import LoreTabularExplainer
    from xailib.explainers.shap_explainer_tab import ShapXAITabularExplainer

    from xailib.models.sklearn_classifier_wrapper import sklearn_classifier_wrapper

Loading and preparation of data
-------------------------------

We start by reading the dataset to analyze from a CSV file. The table is
loaded into a ``DataFrame`` from the ``pandas`` library. Among the
attributes of the table, we select the ``class_field`` column, which
contains the observed class for each row.

.. code:: ipython3

    source_file = 'datasets/german_credit.csv'
    class_field = 'default'
    # Load and transform dataset
    df = pd.read_csv(source_file, skipinitialspace=True, na_values='?', keep_default_na=True)

After the data is loaded in memory, we need to extract metadata so that
the content of the table can be handled automatically. The function
``prepare_dataframe`` scans the table and extracts the following
information:

* ``df``: a transformed version of the original dataframe, where
  discrete attributes are converted into numerical attributes by
  one-hot encoding;
* ``feature_names``: a list containing the names of the features after
  the transformation;
* ``class_values``: the list of all the possible values of the
  ``class_field`` column;
* ``numeric_columns``: a list of the original features that contain
  numeric (i.e. continuous) values;
* ``rdf``: the original dataframe, before the transformation;
* ``real_feature_names``: the list of the features of the dataframe
  before the transformation;
* ``features_map``: a dictionary mapping each transformed feature to
  the original feature it was derived from.

.. code:: ipython3

    df, feature_names, class_values, numeric_columns, rdf, real_feature_names, features_map = prepare_dataframe(df, class_field)

Learning a Random Forest classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We train a random forest (RF) classifier using the ``sklearn`` library.
We start by splitting the dataset into train and test subsets.

.. code:: ipython3

    test_size = 0.3
    random_state = 42
    X_train, X_test, Y_train, Y_test = train_test_split(df[feature_names], df[class_field],
                                                        test_size=test_size,
                                                        random_state=random_state,
                                                        stratify=df[class_field])

Then we train the model on the training set. Once the model has been
learned, we wrap it in a wrapper class so that ``XAI lib`` can access
it.

.. code:: ipython3

    bb = RandomForestClassifier(n_estimators=20, random_state=random_state)
    bb.fit(X_train.values, Y_train.values)
    bbox = sklearn_classifier_wrapper(bb)

Select an instance to be classified by the model and print the
predicted class.

.. code:: ipython3

    inst = X_train.iloc[147].values
    print('Instance ',inst)
    # Note: the label is read at position 8, while the instance is taken from position 147
    print('True class ',Y_train.iloc[8])
    print('Predicted class ',bb.predict(inst.reshape(1, -1)))

.. parsed-literal::

    Instance  [ 15 975 2 3 25 2 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1
     0 0 0 1 0 0 0 0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0
     1 0 0 1 0 0 1 0 0 1 0 0 1]
    True class  0
    Predicted class  [0]

Explaining the prediction
-------------------------

We use the explainers of ``XAI lib`` to provide an explanation for the
classified instance ``inst``. Every explainer of ``XAI lib`` takes as
input the black box to be explained (with the corresponding feature
names, where required) and is initialized through a configuration
object passed to ``fit``.

SHAP explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    explainer = ShapXAITabularExplainer(bbox, feature_names)
    config = {'explainer' : 'tree', 'X_train' : X_train.iloc[0:100].values}
    explainer.fit(config)

.. code:: ipython3

    exp = explainer.explain(inst)
    # print(exp.exp)

.. code:: ipython3

    exp.plot_features_importance()

LORE explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    explainer = LoreTabularExplainer(bbox)
    config = {'neigh_type':'rndgen', 'size':1000, 'ocr':0.1, 'ngen':10}
    explainer.fit(df, class_field, config)
    exp = explainer.explain(inst)
    print(exp)

.. code:: ipython3

    exp.plotRules()

.. code:: ipython3

    exp.plotCounterfactualRules()

LIME explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    limeExplainer = LimeXAITabularExplainer(bbox)
    config = {'feature_selection': 'lasso_path'}
    limeExplainer.fit(df, class_field, config)
    lime_exp = limeExplainer.explain(inst)
    print(lime_exp.exp.as_list())

.. parsed-literal::

    [('account_check_status=no checking account', -0.03792512128083548),
     ('duration_in_month', 0.03701527256562679),
     ('account_check_status=< 0 DM', 0.03144299031649348),
     ('savings=... < 100 DM', 0.020051934530021572),
     ('age', -0.019751080001761446),
     ('credit_history=critical account/ other credits existing (not at this bank)', -0.018970043296280513),
     ('other_installment_plans=none', -0.018869997928840695),
     ('other_installment_plans=bank', 0.017658677626390982),
     ('housing=own', -0.014948467979451343),
     ('credit_history=delay in paying off in the past', 0.012221985897781883)]

.. code:: ipython3

    # limeExplainer.plot_lime_values(lime_exp.as_list(), 5, 10)
    lime_exp.plot_features_importance()

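
The list returned by ``lime_exp.exp.as_list()`` is easier to read once
it is ranked by magnitude and split by sign. The following is a minimal
sketch in plain Python: ``rank_weights`` and ``split_by_sign`` are
hypothetical helpers (they are not part of ``XAI lib`` or ``lime``), and
the weights are a subset copied from the output printed above.

```python
# Hypothetical helpers, not part of XAI lib or lime.
def rank_weights(pairs, top_k=None):
    """Sort (feature, weight) pairs by absolute weight, largest first."""
    ranked = sorted(pairs, key=lambda fw: abs(fw[1]), reverse=True)
    return ranked if top_k is None else ranked[:top_k]


def split_by_sign(pairs):
    """Separate pairs into positive-weight and negative-weight lists."""
    positive = [(f, w) for f, w in pairs if w > 0]
    negative = [(f, w) for f, w in pairs if w < 0]
    return positive, negative


# A few entries copied from the LIME output printed above.
lime_weights = [
    ('account_check_status=no checking account', -0.03792512128083548),
    ('duration_in_month', 0.03701527256562679),
    ('account_check_status=< 0 DM', 0.03144299031649348),
    ('age', -0.019751080001761446),
    ('housing=own', -0.014948467979451343),
]

top3 = rank_weights(lime_weights, top_k=3)
positive, negative = split_by_sign(lime_weights)

# Print the three most influential features with a signed weight.
for feature, weight in top3:
    print(f'{feature:45s} {weight:+.4f}')
```

Which class a positive weight favours depends on the label the
explanation was generated for, so the sign should be read together with
the predicted class.
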

Learning a different model
--------------------------

Learning a Logistic Regressor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We train a logistic regression model using the ``sklearn`` library. We
first transform the dataset with a ``StandardScaler`` to normalize all
the attributes.

.. code:: ipython3

    scaler = preprocessing.StandardScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)
    bb = LogisticRegression(C=1, penalty='l2')
    bb.fit(X_scaled, Y_train.values)
    # pass the model to the wrapper to use it in the XAI lib
    bbox = sklearn_classifier_wrapper(bb)

.. code:: ipython3

    # select a record to explain
    inst = X_scaled[182]
    print('Instance ',inst)
    print('Predicted class ',bb.predict(inst.reshape(1, -1)))

.. parsed-literal::

    Instance  [ 2.27797454  3.35504085  0.94540357  1.07634233  0.04854891 -0.72456474
     -0.43411405  1.65027399 -0.61477862 -0.25898489 -0.80681063  4.17385345
     -0.6435382  -0.32533856 -1.03489416 -0.20412415 -0.22941573 -0.33068147
      1.75885396 -0.34899122 -0.60155441 -0.15294382 -0.09298136 -0.46852129
     -0.12038585 -0.08481889 -0.23623492 -1.21387736 -0.36174054 -0.24943031
      2.15526362 -0.59715086 -0.45485883 -0.73610476 -0.43875307  4.23307441
     -0.65242771 -0.23958675 -0.32533856  0.90192655  4.72581563 -0.2259448
     -3.15238005 -0.54212562 -0.70181003 -0.63024248  2.30354212 -0.40586384
      0.49329429 -0.23958675  2.88675135 -1.59227935 -0.46170508  2.46388049
     -1.33747696 -0.13206764 -0.5        -1.21387736  1.21387736 -0.20412415
      0.20412415]
    Predicted class  [1]

Explaining the prediction
-------------------------

We use the same explainers as for the previous model. In this case, a
few adjustments to their initialization are necessary. For example,
SHAP needs a specific configuration for the linear model we are using.

SHAP explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    explainer = ShapXAITabularExplainer(bbox, feature_names)
    config = {'explainer' : 'linear', 'X_train' : X_scaled[0:100], 'feature_pert' : 'interventional'}
    explainer.fit(config)

.. code:: ipython3

    exp = explainer.explain(inst)
    print(exp)

.. parsed-literal::

    Because all the following conditions happen:
    age <= 20.726173400878906
    credit amount > -439.6443485021591
    purpose=retraining <= 0.11524588242173195
    duration in month > -1.9407005310058594
    purpose=furniture/equipment <= 0.18370826542377472
    foreign worker=no <= 0.7168410122394562
    purpose=domestic appliances <= 1.015466570854187
    savings=.. >= 1000 DM <= 0.7176859378814697
    purpose=(vacation - does not exist?) <= 0.4622504562139511
    credit history=critical account/ other credits existing (not at this bank) <= 0.9085964262485504

.. code:: ipython3

    exp.plotCounterfactualRules()
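
A rule such as the one printed above is a conjunction of threshold
conditions, so it covers an instance only when every condition holds.
The sketch below illustrates that check in plain Python.
``parse_condition`` and ``rule_covers`` are hypothetical helpers (they
are not part of ``XAI lib`` or LORE), the three conditions are copied
from the output above, and the instance values are made up for
illustration.

```python
import operator

# Supported comparison symbols.
OPS = {'<=': operator.le, '>=': operator.ge, '<': operator.lt, '>': operator.gt}


def parse_condition(text):
    """Split 'feature <= 1.5' into (feature, comparison, threshold).

    The string is split from the right, because a feature name may itself
    contain spaces or comparison symbols.
    """
    left, threshold = text.rsplit(' ', 1)
    feature, symbol = left.rsplit(' ', 1)
    return feature, OPS[symbol], float(threshold)


def rule_covers(conditions, instance):
    """Return True only if the instance satisfies every condition."""
    for text in conditions:
        feature, compare, threshold = parse_condition(text)
        if not compare(instance[feature], threshold):
            return False
    return True


# Three conditions copied from the rule printed above.
conditions = [
    'age <= 20.726173400878906',
    'duration in month > -1.9407005310058594',
    'savings=.. >= 1000 DM <= 0.7176859378814697',
]

# Made-up feature values, for illustration only.
instance = {'age': 0.05, 'duration in month': 2.28, 'savings=.. >= 1000 DM': -0.23}

covered = rule_covers(conditions, instance)
print(covered)
```

Changing a single value so that one condition fails (for example, an
``age`` above the threshold) is enough for the rule to stop covering
the instance.
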