Tabular explanations example
Learning and explaining the German Credit dataset
import pandas as pd
import numpy as np
from sklearn import preprocessing
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from xailib.data_loaders.dataframe_loader import prepare_dataframe
from xailib.explainers.lime_explainer import LimeXAITabularExplainer
from xailib.explainers.lore_explainer import LoreTabularExplainer
from xailib.explainers.shap_explainer_tab import ShapXAITabularExplainer
from xailib.models.sklearn_classifier_wrapper import sklearn_classifier_wrapper
Loading and preparation of data
We start by reading the dataset to analyze from a CSV file. The table is
loaded as a DataFrame from the pandas library.
Among all the attributes of the table, class_field names the column that
contains the observed class for the corresponding row.
source_file = 'datasets/german_credit.csv'
class_field = 'default'
# Load and transform dataset
df = pd.read_csv(source_file, skipinitialspace=True, na_values='?', keep_default_na=True)
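The parsing options used above can be tried on a small in-memory CSV. This is only a sketch: the three rows below are made up for illustration and are not taken from german_credit.csv.

```python
import io
import pandas as pd

# Hypothetical sample with spaces after the commas and '?' for missing values.
raw = "default, duration_in_month, purpose\n0, 6, car\n1, ?, ?\n0, 24, education\n"

sample = pd.read_csv(io.StringIO(raw), skipinitialspace=True, na_values='?', keep_default_na=True)

# skipinitialspace strips the blank after each delimiter (headers included);
# na_values='?' turns the question marks into NaN.
print(sample.columns.tolist())
print(sample.isna().sum().to_dict())
```

Without skipinitialspace, the cells would read ' ?' with a leading blank and would not match the '?' marker.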
After the data is loaded in memory, we need to extract metadata to handle the content of the table automatically.
The method prepare_dataframe scans the table and extracts the following information:
* df: a transformed version of the original dataframe, where discrete attributes are converted into numerical attributes using a one-hot encoding strategy;
* feature_names: a list containing the names of the features after the transformation;
* class_values: the list of all the possible values for the class_field column;
* numeric_columns: a list of the original features that contain numeric (i.e. continuous) values;
* rdf: the original dataframe, before the transformation;
* real_feature_names: the list of the features of the dataframe before the transformation;
* features_map: a dictionary pointing each feature to the original one before the transformation.
df, feature_names, class_values, numeric_columns, rdf, real_feature_names, features_map = prepare_dataframe(df, class_field)
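To make the one-hot transformation concrete, here is a minimal sketch built with pandas.get_dummies on a toy table. It is not how prepare_dataframe is implemented. The toy data and the name encoded_to_original are assumptions for illustration, and the real features_map may be structured differently.

```python
import pandas as pd

# Hypothetical toy table, not the German Credit data.
toy = pd.DataFrame({
    'age': [25, 40, 33],
    'housing': ['own', 'rent', 'own'],
    'default': [0, 1, 0],
})
class_field = 'default'

# Features before the transformation, split into numeric and discrete ones.
real_feature_names = [c for c in toy.columns if c != class_field]
numeric_columns = [c for c in real_feature_names if pd.api.types.is_numeric_dtype(toy[c])]
discrete_columns = [c for c in real_feature_names if c not in numeric_columns]

# One-hot encode the discrete attributes: 'housing' becomes 'housing=own' and 'housing=rent'.
encoded = pd.get_dummies(toy, columns=discrete_columns, prefix_sep='=', dtype=int)
feature_names = [c for c in encoded.columns if c != class_field]

# Map every encoded feature back to the attribute it came from (illustrative structure).
encoded_to_original = {name: name.split('=')[0] for name in feature_names}

print(feature_names)
print(encoded_to_original)
```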
Learning a Random Forest classifier
We train a Random Forest (RF) classifier using the sklearn library. We start by
splitting the dataset into train and test subsets.
test_size = 0.3
random_state = 42
X_train, X_test, Y_train, Y_test = train_test_split(df[feature_names], df[class_field],
                                                    test_size=test_size,
                                                    random_state=random_state,
                                                    stratify=df[class_field])
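The stratify argument keeps the class proportions the same in both subsets. A small sketch on synthetic labels (made-up data with a 70/30 class imbalance) shows the effect:

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic labels: 70 negatives and 30 positives (illustrative only).
y = np.array([0] * 70 + [1] * 30)
X = np.arange(100).reshape(-1, 1)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=42, stratify=y)

# Fraction of positives in each subset; both match the 30% of the full set.
print(len(y_tr), y_tr.mean())
print(len(y_te), y_te.mean())
```

Without stratify, the share of positives in the test subset would depend on the random shuffle.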
Then we train the model on the training set. Once the model has been
learned, we use a wrapper class that gives XAI lib access to the model.
bb = RandomForestClassifier(n_estimators=20, random_state=random_state)
bb.fit(X_train.values, Y_train.values)
bbox = sklearn_classifier_wrapper(bb)
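The idea of such a wrapper can be sketched as a thin class that exposes a uniform predict and predict_proba interface. The class below (MinimalWrapper) is a hypothetical stand-in written for illustration. It is not the code of sklearn_classifier_wrapper, whose internals are not shown here.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier


class MinimalWrapper:
    """Hypothetical stand-in for a black-box wrapper; not the XAI lib implementation."""

    def __init__(self, model):
        self.model = model

    def predict(self, X):
        # Delegate to the wrapped model, accepting any array-like input.
        return self.model.predict(np.asarray(X))

    def predict_proba(self, X):
        return self.model.predict_proba(np.asarray(X))


# Tiny made-up training set, just to exercise the wrapper.
X_toy = np.array([[0.0, 1.0], [1.0, 0.0], [0.2, 0.9], [0.9, 0.1]])
y_toy = np.array([0, 1, 0, 1])

wrapped = MinimalWrapper(RandomForestClassifier(n_estimators=5, random_state=0).fit(X_toy, y_toy))
labels = wrapped.predict(X_toy)
probabilities = wrapped.predict_proba(X_toy)
print(labels.shape, probabilities.shape)
```

An explainer written against this interface does not need to know which library produced the model.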
Select an instance to be classified by the model and print its true and predicted classes.
inst = X_train.iloc[147].values
print('Instance ',inst)
print('True class ',Y_train.iloc[147])
print('Predicted class ',bb.predict(inst.reshape(1, -1)))
Instance [ 15 975 2 3 25 2 1 0 1 0 0 0 1 0 0 0 0 0
0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 1 0 0
0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0
1 0 0 1 0 0 1]
True class 0
Predicted class [0]
Explaining the prediction
We use the explainers of XAI lib to provide an explanation for the
classified instance inst. Every explainer of XAI lib takes as input the
black box to be explained, with the corresponding feature names,
and a configuration object to initialize the explainer.
SHAP explainer
explainer = ShapXAITabularExplainer(bbox, feature_names)
config = {'explainer' : 'tree', 'X_train' : X_train.iloc[0:100].values}
explainer.fit(config)
exp = explainer.explain(inst)
# print(exp.exp)
exp.plot_features_importance()
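SHAP attributes a prediction to the input features using Shapley values. The sketch below computes exact Shapley values by brute force for a made-up three-feature function. It only illustrates the quantity being estimated: the function toy_model and the zero baseline are assumptions, and the tree explainer uses a far more efficient algorithm than this enumeration.

```python
from itertools import combinations
from math import factorial

import numpy as np


def shapley_values(f, x, baseline):
    """Exact Shapley values of f at x; absent features take their baseline value."""
    n = len(x)
    phi = np.zeros(n)
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                idx = np.array(subset, dtype=int)
                z = baseline.copy()
                z[idx] = x[idx]          # features in the coalition take their real value
                without_i = f(z)
                z[i] = x[i]              # add feature i to the coalition
                phi[i] += weight * (f(z) - without_i)
    return phi


def toy_model(z):
    # Hypothetical scoring function with an interaction between features 1 and 2.
    return 2.0 * z[0] + z[1] * z[2]


x = np.array([1.0, 2.0, 3.0])
baseline = np.zeros(3)
phi = shapley_values(toy_model, x, baseline)
print(phi, phi.sum(), toy_model(x) - toy_model(baseline))
```

The contributions add up to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is shared equally between the two features involved.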
LORE explainer
explainer = LoreTabularExplainer(bbox)
config = {'neigh_type':'rndgen', 'size':1000, 'ocr':0.1, 'ngen':10}
explainer.fit(df, class_field, config)
exp = explainer.explain(inst)
print(exp)
exp.plotRules()
exp.plotCounterfactualRules()
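The general idea behind a local rule-based explanation can be sketched in three steps: generate a synthetic neighborhood around the instance, label it with the black box, and read a rule from a decision tree fitted on that neighborhood. The code below is a simplified illustration under stated assumptions. It uses Gaussian noise instead of the random or genetic generators selected through neigh_type, and the black box, the feature names, and the instance are all made up.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)


def black_box(X):
    # Hypothetical black box: class 1 when the first feature exceeds 5.
    return (X[:, 0] > 5).astype(int)


instance = np.array([7.0, 1.0])
names = ['duration', 'amount']  # hypothetical feature names

# 1. Synthetic neighborhood around the instance, labeled by the black box.
neighborhood = instance + rng.normal(scale=3.0, size=(1000, 2))
labels = black_box(neighborhood)

# 2. Interpretable surrogate fitted on the neighborhood.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0).fit(neighborhood, labels)

# 3. The rule is the list of tests on the root-to-leaf path followed by the instance.
tree = surrogate.tree_
path = surrogate.decision_path(instance.reshape(1, -1)).indices
rule = []
for node in path:
    if tree.children_left[node] == -1:  # leaf node: no test to record
        continue
    feature, threshold = tree.feature[node], tree.threshold[node]
    op = '<=' if instance[feature] <= threshold else '>'
    rule.append(f'{names[feature]} {op} {threshold:.2f}')

print(rule, '->', surrogate.predict(instance.reshape(1, -1))[0])
```

Counterfactual rules can be read from the same tree by following paths that end in a leaf with a different class.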
LIME explainer
limeExplainer = LimeXAITabularExplainer(bbox)
config = {'feature_selection': 'lasso_path'}
limeExplainer.fit(df, class_field, config)
lime_exp = limeExplainer.explain(inst)
print(lime_exp.exp.as_list())
[('account_check_status=no checking account', -0.03792512128083548), ('duration_in_month', 0.03701527256562679), ('account_check_status=< 0 DM', 0.03144299031649348), ('savings=... < 100 DM', 0.020051934530021572), ('age', -0.019751080001761446), ('credit_history=critical account/ other credits existing (not at this bank)', -0.018970043296280513), ('other_installment_plans=none', -0.018869997928840695), ('other_installment_plans=bank', 0.017658677626390982), ('housing=own', -0.014948467979451343), ('credit_history=delay in paying off in the past', 0.012221985897781883)]
# limeExplainer.plot_lime_values(lime_exp.as_list(), 5, 10)
lime_exp.plot_features_importance()
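The weights printed above come from a simple model fitted locally around the instance. A minimal sketch of that idea follows: perturb the instance, weight the samples by proximity, and fit a weighted linear model. The black box, the kernel width, and the use of Ridge are assumptions for illustration, and the sketch omits the feature selection step (lasso_path) configured above.

```python
import numpy as np
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)


def black_box_proba(X):
    # Hypothetical black box: probability of class 1 from a logistic score.
    return 1.0 / (1.0 + np.exp(-(2.0 * X[:, 0] - 1.0 * X[:, 1])))


instance = np.array([0.0, 0.0])

# Perturb the instance and weight each sample by its proximity to it.
samples = instance + rng.normal(scale=1.0, size=(2000, 2))
distances = np.linalg.norm(samples - instance, axis=1)
kernel_width = 0.75  # assumed value
weights = np.exp(-(distances ** 2) / kernel_width ** 2)

# Weighted linear surrogate; its coefficients are the local feature importances.
surrogate = Ridge(alpha=1.0).fit(samples, black_box_proba(samples), sample_weight=weights)
local_importance = dict(zip(['x0', 'x1'], surrogate.coef_))
print(local_importance)
```

The sign of each coefficient tells whether increasing the feature pushes the local prediction towards or away from class 1.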
Learning a different model
Learning a Logistic Regressor
We train a Logistic Regression model using the sklearn library. We first
transform the dataset with a StandardScaler to normalize all the
attributes.
scaler = preprocessing.StandardScaler().fit(X_train)
X_scaled = scaler.transform(X_train)
bb = LogisticRegression(C=1, penalty='l2')
bb.fit(X_scaled, Y_train.values)
# pass the model to the wrapper to use it in the XAI lib
bbox = sklearn_classifier_wrapper(bb)
# select a record to explain
inst = X_scaled[182]
print('Instance ',inst)
print('Predicted class ',bb.predict(inst.reshape(1, -1)))
Instance [ 2.27797454 3.35504085 0.94540357 1.07634233 0.04854891 -0.72456474
-0.43411405 1.65027399 -0.61477862 -0.25898489 -0.80681063 4.17385345
-0.6435382 -0.32533856 -1.03489416 -0.20412415 -0.22941573 -0.33068147
1.75885396 -0.34899122 -0.60155441 -0.15294382 -0.09298136 -0.46852129
-0.12038585 -0.08481889 -0.23623492 -1.21387736 -0.36174054 -0.24943031
2.15526362 -0.59715086 -0.45485883 -0.73610476 -0.43875307 4.23307441
-0.65242771 -0.23958675 -0.32533856 0.90192655 4.72581563 -0.2259448
-3.15238005 -0.54212562 -0.70181003 -0.63024248 2.30354212 -0.40586384
0.49329429 -0.23958675 2.88675135 -1.59227935 -0.46170508 2.46388049
-1.33747696 -0.13206764 -0.5 -1.21387736 1.21387736 -0.20412415
0.20412415]
Predicted class [1]
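The scaling step can be illustrated on a tiny made-up array. StandardScaler learns the mean and standard deviation of each column from the data it is fitted on and applies those same statistics to anything transformed later, so any instance passed to the model must go through the same fitted scaler.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Made-up training data with two columns on very different scales.
train = np.array([[1.0, 100.0], [2.0, 200.0], [3.0, 300.0]])
new_rows = np.array([[2.0, 400.0]])

scaler = StandardScaler().fit(train)
train_scaled = scaler.transform(train)

# New rows are scaled with the statistics learned from the training data.
new_scaled = scaler.transform(new_rows)

print(train_scaled.mean(axis=0), train_scaled.std(axis=0))
print(new_scaled)
```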
Explaining the prediction
We use the same explainers as for the previous model. In this case, a few adjustments are necessary when initializing the explainers. For example, SHAP needs a specific configuration for the linear model we are using.
SHAP explainer
explainer = ShapXAITabularExplainer(bbox, feature_names)
config = {'explainer' : 'linear', 'X_train' : X_scaled[0:100], 'feature_pert' : 'interventional'}
explainer.fit(config)
exp = explainer.explain(inst)
print(exp)
<xailib.explainers.shap_explainer_tab.ShapXAITabularExplanation object at 0x12a72dac8>
exp.plot_features_importance()
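For a linear model with features treated as independent, the Shapley value of feature i has a closed form: the coefficient times the distance of the feature from its background mean. The sketch below checks this on synthetic data. The data is made up, the attributions are in log-odds space, and the code computes the formula directly instead of calling the SHAP library.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Synthetic data (illustrative only).
X = rng.normal(size=(200, 3))
y = (X[:, 0] - 2.0 * X[:, 1] > 0).astype(int)

model = LogisticRegression(C=1).fit(X, y)
background = X[:100]   # plays the role of the 100 rows passed as X_train
x = X[150]             # instance to explain

# phi_i = w_i * (x_i - mean of feature i over the background)
phi = model.coef_[0] * (x - background.mean(axis=0))

# Average model output (log-odds) over the background.
expected = model.decision_function(background).mean()

print(phi)
print(expected + phi.sum(), model.decision_function(x.reshape(1, -1))[0])
```

The two numbers on the last line agree: the expected output plus the attributions reconstructs the model's log-odds for the instance.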
LORE explainer
explainer = LoreTabularExplainer(bbox)
config = {'neigh_type':'geneticp', 'size':1000, 'ocr':0.1, 'ngen':10}
explainer.fit(df, class_field, config)
exp = explainer.explain(inst)
print(exp)
<xailib.explainers.lore_explainer.LoreTabularExplanation object at 0x12bc41a90>
exp.plotRules()
Why the predicted value for class default is 1 ?
Because all the following conditions happen:
age <= 20.726173400878906
credit amount > -439.6443485021591
purpose=retraining <= 0.11524588242173195
duration in month > -1.9407005310058594
purpose=furniture/equipment <= 0.18370826542377472
foreign worker=no <= 0.7168410122394562
purpose=domestic appliances <= 1.015466570854187
savings=.. >= 1000 DM <= 0.7176859378814697
purpose=(vacation - does not exist?) <= 0.4622504562139511
credit history=critical account/ other credits existing (not at this bank) <= 0.9085964262485504
exp.plotCounterfactualRules()
The predicted value for class default is 1.
It would have been:
0 if the following condition holds
age <= 20.726173400878906
credit amount <= -439.6443485021591
0 if the following condition holds
age > 20.726173400878906
credit amount <= 26.468921303749084
duration in month <= 5.795059680938721
installment as income perc <= 4.603440999984741
LIME explainer
limeExplainer = LimeXAITabularExplainer(bbox)
config = {'feature_selection': 'lasso_path'}
limeExplainer.fit(df, class_field, config)
lime_exp = limeExplainer.explain(inst)
print(lime_exp.exp.as_list())
[('other_debtors=co-applicant', -1.3046177878918616e-09), ('credit_history=all credits at this bank paid back duly', -1.0114574629252053e-09), ('present_emp_since=unemployed', -8.87554096296626e-10), ('other_debtors=none', 7.43754044231906e-10), ('housing=for free', -4.4157786564097103e-10), ('property=unknown / no property', -3.275710719845092e-10), ('credit_amount', 3.271233788564153e-10), ('job=management/ self-employed/ highly qualified employee/ officer', -3.164190703926506e-10), ('housing=own', 2.8902027822084106e-10), ('savings=unknown/ no savings account', -2.604277452741881e-10), ('job=skilled employee / official', 2.3808188198617575e-10), ('foreign_worker=yes', 2.365347360238489e-10), ('telephone=none', 2.2048259721367863e-10), ('age', 2.171945479826713e-10), ('savings=... < 100 DM', 2.1116662177987812e-10), ('credits_this_bank', 1.9999632029038067e-10), ('credit_history=existing credits paid back duly till now', 1.9243622007776865e-10), ('people_under_maintenance', 1.902008911572941e-10), ('purpose=car (new)', -1.7104663723358493e-10), ('account_check_status=no checking account', 1.6584313433238958e-10), ('account_check_status=0 <= ... < 200 DM', -1.639544710042764e-10), ('credit_history=critical account/ other credits existing (not at this bank)', 1.317487567892989e-10), ('job=unskilled - resident', 1.307761159896724e-10), ('other_installment_plans=stores', 1.2347569776391545e-10), ('foreign_worker=no', 1.1825353902253505e-10), ('present_emp_since=1 <= ... < 4 years', 1.1478921168922655e-10), ('property=if not A121/A122 : car or other, not in attribute 6', 1.1222769011436428e-10), ('personal_status_sex=female : divorced/separated/married', 1.1002871894681165e-10), ('savings=100 <= ... < 500 DM', 1.0982251402773794e-10), ('purpose=domestic appliances', 1.0567984890752028e-10), ('present_res_since', 9.869484730455045e-11), ('account_check_status=>= 200 DM / salary assignments for at least 1 year', 9.721716212812873e-11), ('present_emp_since=.. 
>= 7 years', 9.327030468700815e-11), ('installment_as_income_perc', 9.192261925231111e-11), ('property=real estate', 9.180043418264463e-11), ('purpose=(vacation - does not exist?)', 8.974505020571898e-11), ('account_check_status=< 0 DM', 8.848004118893571e-11), ('purpose=retraining', 8.80910843922895e-11), ('purpose=education', 8.803520453193465e-11), ('purpose=business', 8.330599059469541e-11), ('housing=rent', 7.975475868460632e-11), ('property=if not A121 : building society savings agreement/ life insurance', 7.826524390749874e-11), ('other_debtors=guarantor', 7.385760952840171e-11), ('present_emp_since=... < 1 year ', 7.338094381227495e-11), ('duration_in_month', 6.689756440260244e-11), ('purpose=car (used)', 6.582965568284186e-11), ('job=unemployed/ unskilled - non-resident', 6.473736018584135e-11), ('present_emp_since=4 <= ... < 7 years', 6.230002403518189e-11), ('purpose=furniture/equipment', 5.974714318917145e-11), ('purpose=radio/television', 5.909852887925919e-11), ('credit_history=delay in paying off in the past', 5.620862803354922e-11), ('savings=500 <= ... < 1000 DM ', 5.582941358078461e-11), ('other_installment_plans=none', 5.501318386790144e-11), ('personal_status_sex=male : married/widowed', 5.500125372750834e-11), ('telephone=yes, registered under the customers name ', -5.495252929908006e-11), ('purpose=repairs', 5.2177896575440796e-11), ('savings=.. >= 1000 DM ', 4.0557757647139625e-11), ('personal_status_sex=male : divorced/separated', 3.627184253632623e-11), ('personal_status_sex=male : single', -2.9862189862658355e-11), ('credit_history=no credits taken/ all credits paid back duly', 2.8131802175589855e-11), ('other_installment_plans=bank', 1.9548368945624186e-11)]
lime_exp.plot_features_importance()