Tabular explanations example

Learning and explaining the German Credit dataset

import pandas as pd
import numpy as np

from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression

from xailib.data_loaders.dataframe_loader import prepare_dataframe

from xailib.explainers.lime_explainer import LimeXAITabularExplainer
from xailib.explainers.lore_explainer import LoreTabularExplainer
from xailib.explainers.shap_explainer_tab import ShapXAITabularExplainer

from xailib.models.sklearn_classifier_wrapper import sklearn_classifier_wrapper

Loading and preparation of data

We start by reading the dataset to analyze from a CSV file. The table is loaded into a DataFrame from the pandas library.

Among the attributes of the table, we select the class_field column, which contains the observed class for each row.

source_file = 'datasets/german_credit.csv'
class_field = 'default'
# Load and transform dataset
df = pd.read_csv(source_file, skipinitialspace=True, na_values='?', keep_default_na=True)
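To make the effect of the read_csv options concrete, here is a minimal sketch on an in-memory CSV. The three-row sample and its column names are invented for illustration and are not taken from german_credit.csv.

```python
import io

import pandas as pd

# Hypothetical sample: note the space after each comma and the '?' placeholder.
csv_text = "duration, purpose, default\n12, car, 0\n24, ?, 1\n6, radio, 0\n"

sample = pd.read_csv(
    io.StringIO(csv_text),
    skipinitialspace=True,  # drop the blank that follows each delimiter
    na_values='?',          # treat '?' as a missing value
    keep_default_na=True,   # keep pandas' built-in missing-value markers too
)

print(sample.columns.tolist())
print(sample['purpose'].isna().sum())
```

With these options the column names come out without leading blanks, and the '?' cell is read as a missing value rather than as a string.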

After the data is loaded in memory, we need to extract metadata to automatically handle the content within the table.

The prepare_dataframe method scans the table and extracts the following information:

* df: a transformed version of the original dataframe, where discrete attributes are converted into numerical attributes by one-hot encoding;
* feature_names: a list containing the names of the features after the transformation;
* class_values: the list of all the possible values for the class_field column;
* numeric_columns: a list of the original features that contain numeric (i.e. continuous) values;
* rdf: the original dataframe, before the transformation;
* real_feature_names: the list of the features of the dataframe before the transformation;
* features_map: a dictionary mapping each feature to the original one before the transformation.

df, feature_names, class_values, numeric_columns, rdf, real_feature_names, features_map = prepare_dataframe(df, class_field)
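The kind of transformation described above can be sketched with plain pandas. This is not the implementation of prepare_dataframe: the toy table, the origin dictionary, and the choice of '=' as separator are assumptions made for illustration, and the structure of the features_map returned by xailib may differ.

```python
import pandas as pd

# Toy table (hypothetical values) with one numeric and one discrete attribute.
raw = pd.DataFrame({
    'age': [25, 40, 31],
    'housing': ['own', 'rent', 'own'],
    'default': [0, 1, 0],
})
class_field = 'default'

feature_cols = [c for c in raw.columns if c != class_field]
numeric_columns = [c for c in feature_cols
                   if pd.api.types.is_numeric_dtype(raw[c])]
discrete_columns = [c for c in feature_cols if c not in numeric_columns]

# One-hot encode the discrete attributes; numeric ones pass through unchanged.
encoded = pd.get_dummies(raw[feature_cols], columns=discrete_columns,
                         prefix_sep='=')
feature_names = encoded.columns.tolist()

# Map each encoded column back to the original column it came from.
origin = {name: name.split('=', 1)[0] for name in feature_names}

print(feature_names)
print(origin)
```

Each discrete value becomes its own 0/1 column, which is why the instance printed later in this example is mostly a vector of zeros and ones.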

Learning a Random Forest classifier

We train a Random Forest (RF) classifier with the sklearn library. We start by splitting the dataset into training and test subsets.

test_size = 0.3
random_state = 42
X_train, X_test, Y_train, Y_test = train_test_split(df[feature_names], df[class_field],
                                                        test_size=test_size,
                                                        random_state=random_state,
                                                        stratify=df[class_field])
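The stratify argument keeps the class proportions the same in both subsets. A small sketch on synthetic labels (70% class 0, 30% class 1, invented for illustration) shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic, imbalanced labels: 700 zeros and 300 ones.
y = np.array([0] * 700 + [1] * 300)
X = np.arange(len(y)).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

# The fraction of class 1 is preserved in both subsets.
print(len(X_tr), len(X_te))
print(y_tr.mean(), y_te.mean())
```

Without stratification the two fractions would only match approximately, which matters more for small or strongly imbalanced datasets.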

Then we train the model on the training set. Once the model has been learned, we wrap it in a wrapper class that gives XAI lib uniform access to the model.

bb = RandomForestClassifier(n_estimators=20, random_state=random_state)
bb.fit(X_train.values, Y_train.values)
bbox = sklearn_classifier_wrapper(bb)
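The idea behind such a wrapper can be sketched as follows. SimpleClassifierWrapper is a hypothetical stand-in written for this illustration. It is not the code of sklearn_classifier_wrapper, whose actual interface should be checked in the XAI lib sources.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier


class SimpleClassifierWrapper:
    """Hypothetical stand-in, not xailib's sklearn_classifier_wrapper."""

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        # Accept both a single record (1-D) and a batch (2-D).
        return self.model.predict(np.atleast_2d(X))

    def predict_proba(self, X):
        return self.model.predict_proba(np.atleast_2d(X))


# Synthetic data, used only to have a fitted model to wrap.
X, y = make_classification(n_samples=100, n_features=4, random_state=0)
model = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
wrapped = SimpleClassifierWrapper(model)

print(wrapped.predict(X[0]))
print(wrapped.predict_proba(X[:3]))
```

Explainers can then be written against the two wrapper methods instead of against a specific machine learning library.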

Select an instance to be classified by the model and print the predicted class.

inst = X_train.iloc[147].values
print('Instance ',inst)
print('True class ',Y_train.iloc[147])
print('Predicted class ',bb.predict(inst.reshape(1, -1)))

Instance  [ 15 975   2   3  25   2   1   0   1   0   0   0   1   0   0   0   0   0
   0   0   0   0   0   1   0   0   0   1   0   0   0   0   0   1   0   0
   0   1   0   0   0   0   1   1   0   0   0   0   1   0   0   1   0   0
   1   0   0   1   0   0   1]
True class  0
Predicted class  [0]
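A single record has to be reshaped into a one-row batch before it is passed to an sklearn model, and the true label must be read at the same position as the record. The sketch below illustrates both points on synthetic data. The dataset and the row_index variable are assumptions for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for the training set.
X, y = make_classification(n_samples=200, n_features=6, random_state=42)
model = RandomForestClassifier(n_estimators=20, random_state=42).fit(X, y)

row_index = 147               # use the same position for record and label
inst = X[row_index]           # shape (6,)
batch = inst.reshape(1, -1)   # shape (1, 6): one row, all features

pred = model.predict(batch)[0]
proba = model.predict_proba(batch)[0]

print('True class', y[row_index])
print('Predicted class', pred)
print('Class probabilities', proba)
```

Printing the class probabilities next to the predicted label also shows how confident the model is about the instance being explained.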

Explaining the prediction

We use the explainers of XAI lib to provide an explanation for the classified instance inst. Every explainer of XAI lib takes as input the black box to be explained, together with the corresponding feature names, and a configuration object to initialize the explainer.

SHAP explainer

explainer = ShapXAITabularExplainer(bbox, feature_names)
config = {'explainer' : 'tree', 'X_train' : X_train.iloc[0:100].values}
explainer.fit(config)

exp = explainer.explain(inst)
# print(exp.exp)

exp.plot_features_importance()

LORE explainer

explainer = LoreTabularExplainer(bbox)
config = {'neigh_type':'rndgen', 'size':1000, 'ocr':0.1, 'ngen':10}
explainer.fit(df, class_field, config)
exp = explainer.explain(inst)
print(exp)

exp.plotRules()

exp.plotCounterfactualRules()
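The general idea of a rule-based local explanation can be sketched with sklearn alone: sample a neighborhood around the instance, label it with the black box, fit a shallow decision tree, and read the rule off the path followed by the instance. This is a simplified illustration, not the LORE algorithm as implemented in XAI lib. It uses Gaussian noise instead of LORE's neighborhood generators, synthetic data, and hypothetical feature_i names.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic data and black box, used only for illustration.
X, y = make_classification(n_samples=500, n_features=5, random_state=0)
black_box = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
x = X[0]

# 1. Sample a neighborhood around x (plain Gaussian noise, an assumption).
rng = np.random.default_rng(0)
Z = x + rng.normal(scale=X.std(axis=0), size=(1000, X.shape[1]))
Z = np.vstack([x, Z])

# 2. Label the neighborhood with the black box.
labels = black_box.predict(Z)

# 3. Fit a shallow, interpretable surrogate on the labeled neighborhood.
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(Z, labels)

# 4. Read the rule from the root-to-leaf path followed by x.
node_ids = tree.decision_path(x.reshape(1, -1)).indices
premises = []
for node in node_ids:
    feat = tree.tree_.feature[node]
    if feat < 0:  # leaf nodes carry no split condition
        continue
    thr = tree.tree_.threshold[node]
    op = '<=' if x[feat] <= thr else '>'
    premises.append(f'feature_{feat} {op} {thr:.3f}')

outcome = tree.predict(x.reshape(1, -1))[0]
print(' AND '.join(premises), '->', outcome)
```

Counterfactual rules can be obtained in a similar spirit from the paths of the same tree that lead to a different class.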

LIME explainer

limeExplainer = LimeXAITabularExplainer(bbox)
config = {'feature_selection': 'lasso_path'}
limeExplainer.fit(df, class_field, config)
lime_exp = limeExplainer.explain(inst)
print(lime_exp.exp.as_list())

[('account_check_status=no checking account', -0.03792512128083548), ('duration_in_month', 0.03701527256562679), ('account_check_status=< 0 DM', 0.03144299031649348), ('savings=... < 100 DM', 0.020051934530021572), ('age', -0.019751080001761446), ('credit_history=critical account/ other credits existing (not at this bank)', -0.018970043296280513), ('other_installment_plans=none', -0.018869997928840695), ('other_installment_plans=bank', 0.017658677626390982), ('housing=own', -0.014948467979451343), ('credit_history=delay in paying off in the past', 0.012221985897781883)]

# limeExplainer.plot_lime_values(lime_exp.as_list(), 5, 10)
lime_exp.plot_features_importance()
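The weights printed above come from a local linear surrogate. The principle can be sketched as follows: perturb the instance, query the black box, weight the samples by proximity, and fit a weighted linear model. This is a simplified illustration on synthetic data, not the code of the lime package or of LimeXAITabularExplainer. The kernel width and the Ridge surrogate are choices made for the sketch.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import Ridge

# Synthetic data and black box, used only for illustration.
X, y = make_classification(n_samples=500, n_features=6, random_state=0)
black_box = RandomForestClassifier(n_estimators=20, random_state=0).fit(X, y)
x = X[0]

# 1. Perturb the instance with feature-wise Gaussian noise.
rng = np.random.default_rng(0)
scale = X.std(axis=0)
Z = x + rng.normal(size=(1000, X.shape[1])) * scale

# 2. Query the black box for the probability of class 1.
target = black_box.predict_proba(Z)[:, 1]

# 3. Weight each sample by its proximity to x (exponential kernel).
dist = np.linalg.norm((Z - x) / scale, axis=1)
kernel_width = 0.75 * np.sqrt(X.shape[1])  # illustrative choice
weights = np.exp(-(dist ** 2) / kernel_width ** 2)

# 4. Fit the weighted linear surrogate; its coefficients are the explanation.
surrogate = Ridge(alpha=1.0).fit(Z, target, sample_weight=weights)
ranking = np.argsort(-np.abs(surrogate.coef_))
for i in ranking:
    print(f'feature_{i}: {surrogate.coef_[i]:+.4f}')
```

A positive coefficient means that, near the instance, increasing the feature pushes the black box towards class 1, and a negative one pushes it away.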

Learning a different model

Learning a Logistic Regressor

We train a Logistic Regression model with the sklearn library. We first transform the dataset with a StandardScaler, which standardizes every attribute to zero mean and unit variance.

scaler = preprocessing.StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)

bb = LogisticRegression(C=1, penalty='l2')
bb.fit(X_scaled, Y_train.values)
# pass the model to the wrapper to use it in the XAI lib
bbox = sklearn_classifier_wrapper(bb)

# select a record to explain
inst = X_scaled[182]
print('Instance ',inst)
print('Predicted class ',bb.predict(inst.reshape(1, -1)))

Instance  [ 2.27797454  3.35504085  0.94540357  1.07634233  0.04854891 -0.72456474
 -0.43411405  1.65027399 -0.61477862 -0.25898489 -0.80681063  4.17385345
 -0.6435382  -0.32533856 -1.03489416 -0.20412415 -0.22941573 -0.33068147
  1.75885396 -0.34899122 -0.60155441 -0.15294382 -0.09298136 -0.46852129
 -0.12038585 -0.08481889 -0.23623492 -1.21387736 -0.36174054 -0.24943031
  2.15526362 -0.59715086 -0.45485883 -0.73610476 -0.43875307  4.23307441
 -0.65242771 -0.23958675 -0.32533856  0.90192655  4.72581563 -0.2259448
 -3.15238005 -0.54212562 -0.70181003 -0.63024248  2.30354212 -0.40586384
  0.49329429 -0.23958675  2.88675135 -1.59227935 -0.46170508  2.46388049
 -1.33747696 -0.13206764 -0.5        -1.21387736  1.21387736 -0.20412415
  0.20412415]
Predicted class  [1]
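Because the model is trained on standardized values, every record passed to it needs the same transformation, computed with the scaler fitted on the training data. The sketch below, on synthetic data, shows the manual route and an equivalent sklearn Pipeline that bundles scaler and model. Whether a Pipeline can be passed to sklearn_classifier_wrapper is not verified here and should be treated as an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data on a deliberately large, unscaled range.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
X = X * 100 + 50

# Manual route: fit the scaler once, reuse it for every record.
scaler = StandardScaler().fit(X)
model = LogisticRegression(C=1).fit(scaler.transform(X), y)

raw = X[182].reshape(1, -1)
manual = model.predict_proba(scaler.transform(raw))

# Pipeline route: scaling and prediction happen in one call on raw values.
pipe = make_pipeline(StandardScaler(), LogisticRegression(C=1)).fit(X, y)
piped = pipe.predict_proba(raw)

print(manual, piped)
```

The pipeline accepts records in the original feature space, which can be convenient when other components generate or perturb records in that space.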

Explaining the prediction

We use the same explainers as for the previous model. In this case, a few adjustments are necessary when initializing the explainers. For example, SHAP needs a specific configuration for the linear model we are using.

SHAP explainer

explainer = ShapXAITabularExplainer(bbox, feature_names)
config = {'explainer' : 'linear', 'X_train' : X_scaled[0:100], 'feature_pert' : 'interventional'}
explainer.fit(config)

exp = explainer.explain(inst)
print(exp)

<xailib.explainers.shap_explainer_tab.ShapXAITabularExplanation object at 0x12a72dac8>

exp.plot_features_importance()
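For a linear model with interventional feature perturbation, the SHAP value of feature i in log-odds space has a closed form: the coefficient times the distance of the feature from its background mean, phi_i = w_i * (x_i - E[x_i]). The sketch below checks this on synthetic data with numpy and sklearn only. It does not call the shap package or ShapXAITabularExplainer, and the dataset is an assumption for illustration.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic data standing in for the scaled training set.
X, y = make_classification(n_samples=300, n_features=5, random_state=0)
model = LogisticRegression(C=1).fit(X, y)

background = X[:100]   # plays the role of the 'X_train' background sample
x = X[182]

# Closed-form attributions in log-odds (margin) space.
w = model.coef_[0]
phi = w * (x - background.mean(axis=0))

# Additivity: base value plus attributions equals the model output for x.
base_value = model.decision_function(background).mean()
output = model.decision_function(x.reshape(1, -1))[0]

print('attributions:', phi)
print('base + sum(phi):', base_value + phi.sum(), 'model output:', output)
```

The attributions sum to the difference between the model output for the instance and the average output over the background sample, which is the additivity property that SHAP plots rely on.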

LORE explainer

explainer = LoreTabularExplainer(bbox)
config = {'neigh_type':'geneticp', 'size':1000, 'ocr':0.1, 'ngen':10}
explainer.fit(df, class_field, config)
exp = explainer.explain(inst)
print(exp)

<xailib.explainers.lore_explainer.LoreTabularExplanation object at 0x12bc41a90>

exp.plotRules()

Why the predicted value for class default is 1 ?

Because all the following conditions happen:

age <= 20.726173400878906
credit amount > -439.6443485021591
purpose=retraining <= 0.11524588242173195
duration in month > -1.9407005310058594
purpose=furniture/equipment <= 0.18370826542377472
foreign worker=no <= 0.7168410122394562
purpose=domestic appliances <= 1.015466570854187
savings=.. >= 1000 DM <= 0.7176859378814697
purpose=(vacation - does not exist?) <= 0.4622504562139511
credit history=critical account/ other credits existing (not at this bank) <= 0.9085964262485504

exp.plotCounterfactualRules()

The predicted value for class default is 1.

It would have been:

0 if the following condition holds

age <= 20.726173400878906
credit amount <= -439.6443485021591

0 if the following condition holds

age > 20.726173400878906
credit amount <= 26.468921303749084
duration in month <= 5.795059680938721
installment as income perc <= 4.603440999984741

LIME explainer

limeExplainer = LimeXAITabularExplainer(bbox)
config = {'feature_selection': 'lasso_path'}
limeExplainer.fit(df, class_field, config)
lime_exp = limeExplainer.explain(inst)
print(lime_exp.exp.as_list())

[('other_debtors=co-applicant', -1.3046177878918616e-09), ('credit_history=all credits at this bank paid back duly', -1.0114574629252053e-09), ('present_emp_since=unemployed', -8.87554096296626e-10), ('other_debtors=none', 7.43754044231906e-10), ('housing=for free', -4.4157786564097103e-10), ('property=unknown / no property', -3.275710719845092e-10), ('credit_amount', 3.271233788564153e-10), ('job=management/ self-employed/ highly qualified employee/ officer', -3.164190703926506e-10), ('housing=own', 2.8902027822084106e-10), ('savings=unknown/ no savings account', -2.604277452741881e-10), ('job=skilled employee / official', 2.3808188198617575e-10), ('foreign_worker=yes', 2.365347360238489e-10), ('telephone=none', 2.2048259721367863e-10), ('age', 2.171945479826713e-10), ('savings=... < 100 DM', 2.1116662177987812e-10), ('credits_this_bank', 1.9999632029038067e-10), ('credit_history=existing credits paid back duly till now', 1.9243622007776865e-10), ('people_under_maintenance', 1.902008911572941e-10), ('purpose=car (new)', -1.7104663723358493e-10), ('account_check_status=no checking account', 1.6584313433238958e-10), ('account_check_status=0 <= ... < 200 DM', -1.639544710042764e-10), ('credit_history=critical account/ other credits existing (not at this bank)', 1.317487567892989e-10), ('job=unskilled - resident', 1.307761159896724e-10), ('other_installment_plans=stores', 1.2347569776391545e-10), ('foreign_worker=no', 1.1825353902253505e-10), ('present_emp_since=1 <= ... < 4 years', 1.1478921168922655e-10), ('property=if not A121/A122 : car or other, not in attribute 6', 1.1222769011436428e-10), ('personal_status_sex=female : divorced/separated/married', 1.1002871894681165e-10), ('savings=100 <= ... < 500 DM', 1.0982251402773794e-10), ('purpose=domestic appliances', 1.0567984890752028e-10), ('present_res_since', 9.869484730455045e-11), ('account_check_status=>= 200 DM / salary assignments for at least 1 year', 9.721716212812873e-11), ('present_emp_since=.. 
>= 7 years', 9.327030468700815e-11), ('installment_as_income_perc', 9.192261925231111e-11), ('property=real estate', 9.180043418264463e-11), ('purpose=(vacation - does not exist?)', 8.974505020571898e-11), ('account_check_status=< 0 DM', 8.848004118893571e-11), ('purpose=retraining', 8.80910843922895e-11), ('purpose=education', 8.803520453193465e-11), ('purpose=business', 8.330599059469541e-11), ('housing=rent', 7.975475868460632e-11), ('property=if not A121 : building society savings agreement/ life insurance', 7.826524390749874e-11), ('other_debtors=guarantor', 7.385760952840171e-11), ('present_emp_since=... < 1 year ', 7.338094381227495e-11), ('duration_in_month', 6.689756440260244e-11), ('purpose=car (used)', 6.582965568284186e-11), ('job=unemployed/ unskilled - non-resident', 6.473736018584135e-11), ('present_emp_since=4 <= ... < 7 years', 6.230002403518189e-11), ('purpose=furniture/equipment', 5.974714318917145e-11), ('purpose=radio/television', 5.909852887925919e-11), ('credit_history=delay in paying off in the past', 5.620862803354922e-11), ('savings=500 <= ... < 1000 DM ', 5.582941358078461e-11), ('other_installment_plans=none', 5.501318386790144e-11), ('personal_status_sex=male : married/widowed', 5.500125372750834e-11), ('telephone=yes, registered under the customers name ', -5.495252929908006e-11), ('purpose=repairs', 5.2177896575440796e-11), ('savings=.. >= 1000 DM ', 4.0557757647139625e-11), ('personal_status_sex=male : divorced/separated', 3.627184253632623e-11), ('personal_status_sex=male : single', -2.9862189862658355e-11), ('credit_history=no credits taken/ all credits paid back duly', 2.8131802175589855e-11), ('other_installment_plans=bank', 1.9548368945624186e-11)]

lime_exp.plot_features_importance()