Tabular explanations example
============================

Learning and explaining German Credit Dataset
---------------------------------------------

.. code:: ipython3

    import pandas as pd
    import numpy as np
    from sklearn import preprocessing
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression

    from xailib.data_loaders.dataframe_loader import prepare_dataframe

    from xailib.explainers.lime_explainer import LimeXAITabularExplainer
    from xailib.explainers.lore_explainer import LoreTabularExplainer
    from xailib.explainers.shap_explainer_tab import ShapXAITabularExplainer

    from xailib.models.sklearn_classifier_wrapper import sklearn_classifier_wrapper

Loading and preparation of data
-------------------------------

We start by reading the dataset to analyze from a CSV file. The table is
loaded into a ``DataFrame`` from the ``pandas`` library. Among the
attributes of the table, ``class_field`` names the column that contains
the observed class of each row.

.. code:: ipython3

    source_file = 'datasets/german_credit.csv'
    class_field = 'default'
    # Load and transform dataset
    df = pd.read_csv(source_file, skipinitialspace=True, na_values='?', keep_default_na=True)

After the data is loaded in memory, we need to extract metadata to handle
the content of the table automatically. The method ``prepare_dataframe``
scans the table and extracts the following information:

* ``df``: a transformed version of the original dataframe, where discrete
  attributes are converted into numerical attributes with a one-hot
  encoding strategy;
* ``feature_names``: a list containing the names of the features after the
  transformation;
* ``class_values``: the list of all the possible values of the
  ``class_field`` column;
* ``numeric_columns``: a list of the original features that contain
  numeric (i.e. continuous) values;
* ``rdf``: the original dataframe, before the transformation;
* ``real_feature_names``: the list of the features of the dataframe before
  the transformation;
* ``features_map``: a dictionary mapping each feature to the original one
  before the transformation.

.. code:: ipython3

    df, feature_names, class_values, numeric_columns, rdf, real_feature_names, features_map = prepare_dataframe(df, class_field)

Learning a Random Forest classifier
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We train a Random Forest (RF) classifier with the ``sklearn`` library. We
start by splitting the dataset into training and test subsets.

.. code:: ipython3

    test_size = 0.3
    random_state = 42
    X_train, X_test, Y_train, Y_test = train_test_split(df[feature_names], df[class_field],
                                                        test_size=test_size,
                                                        random_state=random_state,
                                                        stratify=df[class_field])

Then we train the model on the training set. Once the model has been
learned, we pass it to a wrapper class so that ``XAI lib`` can access it.

.. code:: ipython3

    bb = RandomForestClassifier(n_estimators=20, random_state=random_state)
    bb.fit(X_train.values, Y_train.values)
    bbox = sklearn_classifier_wrapper(bb)

We select an instance to be classified by the model and print its true
class and the predicted class.

.. code:: ipython3

    inst = X_train.iloc[147].values
    print('Instance ',inst)
    # use the same row index as the selected instance
    print('True class ',Y_train.iloc[147])
    print('Predicted class ',bb.predict(inst.reshape(1, -1)))

.. parsed-literal::

    Instance  [ 15 975 2 3 25 2 1 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0
     0 0 1 0 0 0 1 0 0 0 0 1 1 0 0 0 0 1 0 0 1 0 0 1 0 0 1 0 0 1]
    True class  0
    Predicted class  [0]

Explaining the prediction
-------------------------

We use the explainers of ``XAI lib`` to provide an explanation for the
classified instance ``inst``. Every explainer of ``XAI lib`` takes as
input the black box to be explained, together with the corresponding
feature names, and a configuration object that initializes the explainer.

SHAP explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    explainer = ShapXAITabularExplainer(bbox, feature_names)
    config = {'explainer' : 'tree', 'X_train' : X_train.iloc[0:100].values}
    explainer.fit(config)

.. code:: ipython3

    exp = explainer.explain(inst)
    # print(exp.exp)

.. code:: ipython3

    exp.plot_features_importance()

LORE explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    explainer = LoreTabularExplainer(bbox)
    config = {'neigh_type':'rndgen', 'size':1000, 'ocr':0.1, 'ngen':10}
    explainer.fit(df, class_field, config)
    exp = explainer.explain(inst)
    print(exp)

.. code:: ipython3

    exp.plotRules()

.. code:: ipython3

    exp.plotCounterfactualRules()

LIME explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    limeExplainer = LimeXAITabularExplainer(bbox)
    config = {'feature_selection': 'lasso_path'}
    limeExplainer.fit(df, class_field, config)
    lime_exp = limeExplainer.explain(inst)
    print(lime_exp.exp.as_list())

.. parsed-literal::

    [('account_check_status=no checking account', -0.03792512128083548),
     ('duration_in_month', 0.03701527256562679),
     ('account_check_status=< 0 DM', 0.03144299031649348),
     ('savings=... < 100 DM', 0.020051934530021572),
     ('age', -0.019751080001761446),
     ('credit_history=critical account/ other credits existing (not at this bank)', -0.018970043296280513),
     ('other_installment_plans=none', -0.018869997928840695),
     ('other_installment_plans=bank', 0.017658677626390982),
     ('housing=own', -0.014948467979451343),
     ('credit_history=delay in paying off in the past', 0.012221985897781883)]

.. code:: ipython3

    # limeExplainer.plot_lime_values(lime_exp.as_list(), 5, 10)
    lime_exp.plot_features_importance()

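
The list returned by ``as_list()`` pairs each feature with a signed weight.
A minimal sketch of one way to read such a list, ranking the pairs by
magnitude and separating them by sign, is given here. The helper
``split_by_sign`` is a hypothetical name introduced for illustration and is
not part of ``XAI lib`` or LIME; the pairs are the first five entries of the
output printed above.

.. code:: ipython3

    # (feature, weight) pairs copied from the first five entries of the LIME output
    lime_weights = [
        ('account_check_status=no checking account', -0.03792512128083548),
        ('duration_in_month', 0.03701527256562679),
        ('account_check_status=< 0 DM', 0.03144299031649348),
        ('savings=... < 100 DM', 0.020051934530021572),
        ('age', -0.019751080001761446),
    ]

    def split_by_sign(pairs):
        """Hypothetical helper: return (positive, negative) lists,
        each sorted by decreasing absolute weight."""
        ranked = sorted(pairs, key=lambda p: abs(p[1]), reverse=True)
        positive = [p for p in ranked if p[1] > 0]
        negative = [p for p in ranked if p[1] < 0]
        return positive, negative

    positive, negative = split_by_sign(lime_weights)
    for name, weight in positive:
        print(f'+ {name}: {weight:.4f}')
    for name, weight in negative:
        print(f'- {name}: {weight:.4f}')

Weights with opposite signs push the local surrogate model in opposite
directions, so looking at the two groups separately is often easier than
scanning one mixed list.
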
Learning a different model
--------------------------

Learning a Logistic Regressor
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

We train a logistic regression model with the ``sklearn`` library. We
first transform the dataset with a ``StandardScaler`` to standardize all
the attributes.

.. code:: ipython3

    scaler = preprocessing.StandardScaler().fit(X_train)
    X_scaled = scaler.transform(X_train)
    bb = LogisticRegression(C=1, penalty='l2')
    bb.fit(X_scaled, Y_train.values)
    # pass the model to the wrapper to use it in the XAI lib
    bbox = sklearn_classifier_wrapper(bb)

.. code:: ipython3

    # select a record to explain
    inst = X_scaled[182]
    print('Instance ',inst)
    print('Predicted class ',bb.predict(inst.reshape(1, -1)))

.. parsed-literal::

    Instance  [ 2.27797454  3.35504085  0.94540357  1.07634233  0.04854891 -0.72456474
     -0.43411405  1.65027399 -0.61477862 -0.25898489 -0.80681063  4.17385345
     -0.6435382  -0.32533856 -1.03489416 -0.20412415 -0.22941573 -0.33068147
      1.75885396 -0.34899122 -0.60155441 -0.15294382 -0.09298136 -0.46852129
     -0.12038585 -0.08481889 -0.23623492 -1.21387736 -0.36174054 -0.24943031
      2.15526362 -0.59715086 -0.45485883 -0.73610476 -0.43875307  4.23307441
     -0.65242771 -0.23958675 -0.32533856  0.90192655  4.72581563 -0.2259448
     -3.15238005 -0.54212562 -0.70181003 -0.63024248  2.30354212 -0.40586384
      0.49329429 -0.23958675  2.88675135 -1.59227935 -0.46170508  2.46388049
     -1.33747696 -0.13206764 -0.5        -1.21387736  1.21387736 -0.20412415
      0.20412415]
    Predicted class  [1]

Explaining the prediction
-------------------------

We use the same explainers as for the previous model. In this case, a few
adjustments are necessary when initializing the explainers. For example,
SHAP needs a specific configuration for the linear model we are using.

SHAP explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    explainer = ShapXAITabularExplainer(bbox, feature_names)
    config = {'explainer' : 'linear', 'X_train' : X_scaled[0:100], 'feature_pert' : 'interventional'}
    explainer.fit(config)

.. code:: ipython3

    exp = explainer.explain(inst)
    print(exp)

.. code:: ipython3

    exp.plot_features_importance()

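
For a linear model, and assuming the features are perturbed independently
of each other (the ``interventional`` setting), the SHAP value of a feature
in the model's margin (log-odds) space reduces to its coefficient times the
deviation of its value from the background mean. The sketch below
illustrates this identity with made-up coefficients, a made-up background
set and a made-up instance; none of the numbers come from the model fitted
above, and the code does not call ``XAI lib`` or ``shap``.

.. code:: ipython3

    # Hypothetical linear model and background data (illustrative values only)
    coef = [0.8, -0.5, 0.3]
    intercept = -0.2
    background = [
        [0.0, 1.0, 2.0],
        [1.0, 0.0, 1.0],
        [2.0, 2.0, 0.0],
    ]
    x = [2.0, 0.5, 1.5]

    def margin(row):
        """Linear score of a row (the log-odds for a logistic regression)."""
        return intercept + sum(w * v for w, v in zip(coef, row))

    n = len(background)
    # per-feature mean over the background set
    means = [sum(col) / n for col in zip(*background)]

    # attribution of each feature: coefficient times deviation from the mean
    phi = [w * (v - m) for w, v, m in zip(coef, x, means)]

    # additivity check: attributions sum to f(x) minus the average output
    expected = sum(margin(row) for row in background) / n
    print(phi)
    print(sum(phi), margin(x) - expected)

The two numbers on the last line agree up to floating-point error, which is
the additivity property that makes the per-feature bars of a SHAP plot sum
to the gap between the prediction and the baseline.
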
LORE explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    explainer = LoreTabularExplainer(bbox)
    config = {'neigh_type':'geneticp', 'size':1000, 'ocr':0.1, 'ngen':10}
    explainer.fit(df, class_field, config)
    exp = explainer.explain(inst)
    print(exp)

.. code:: ipython3

    exp.plotRules()


.. parsed-literal::

    Why the predicted value for class default is 1 ?

    Because all the following conditions happen:

    age <= 20.726173400878906
    credit amount > -439.6443485021591
    purpose=retraining <= 0.11524588242173195
    duration in month > -1.9407005310058594
    purpose=furniture/equipment <= 0.18370826542377472
    foreign worker=no <= 0.7168410122394562
    purpose=domestic appliances <= 1.015466570854187
    savings=.. >= 1000 DM <= 0.7176859378814697
    purpose=(vacation - does not exist?) <= 0.4622504562139511
    credit history=critical account/ other credits existing (not at this bank) <= 0.9085964262485504

.. code:: ipython3

    exp.plotCounterfactualRules()

.. parsed-literal::

    The predicted value for class default is 1.

    It would have been:

    0 if the following condition holds

        age <= 20.726173400878906
        credit amount <= -439.6443485021591

    0 if the following condition holds

        age > 20.726173400878906
        credit amount <= 26.468921303749084
        duration in month <= 5.795059680938721
        installment as income perc <= 4.603440999984741

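
Each rule printed above is a conjunction of threshold conditions, and a
counterfactual rule states which conditions would have to hold for the
other class to be predicted. A minimal sketch of how such rules can be
checked against a record follows. The tuple encoding
``(feature, comparison, threshold)``, the helpers ``satisfies`` and
``violated``, and the record itself are hypothetical and introduced only
for illustration; they are not the internal format of ``XAI lib``. Only the
first two premises of the factual rule are used.

.. code:: ipython3

    import operator

    # Hypothetical rule encoding: (feature, comparison, threshold)
    OPS = {'<=': operator.le, '>': operator.gt}

    factual = [('age', '<=', 20.726173400878906),
               ('credit amount', '>', -439.6443485021591)]
    counterfactual = [('age', '<=', 20.726173400878906),
                      ('credit amount', '<=', -439.6443485021591)]

    def satisfies(record, premises):
        """True when the record meets every premise of the rule."""
        return all(OPS[op](record[feat], thr) for feat, op, thr in premises)

    def violated(record, premises):
        """Premises the record does not meet, i.e. what would have to change."""
        return [(feat, op, thr) for feat, op, thr in premises
                if not OPS[op](record[feat], thr)]

    # Made-up record, not the instance explained above
    record = {'age': 0.05, 'credit amount': 3.36}
    print(satisfies(record, factual))
    print(violated(record, counterfactual))

Listing the violated premises of a counterfactual rule gives a compact
answer to the question of which feature values would need to change.
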
LIME explainer
~~~~~~~~~~~~~~

.. code:: ipython3

    limeExplainer = LimeXAITabularExplainer(bbox)
    config = {'feature_selection': 'lasso_path'}
    limeExplainer.fit(df, class_field, config)
    lime_exp = limeExplainer.explain(inst)
    print(lime_exp.exp.as_list())

.. parsed-literal::

    [('other_debtors=co-applicant', -1.3046177878918616e-09),
     ('credit_history=all credits at this bank paid back duly', -1.0114574629252053e-09),
     ('present_emp_since=unemployed', -8.87554096296626e-10),
     ('other_debtors=none', 7.43754044231906e-10),
     ('housing=for free', -4.4157786564097103e-10),
     ('property=unknown / no property', -3.275710719845092e-10),
     ('credit_amount', 3.271233788564153e-10),
     ('job=management/ self-employed/ highly qualified employee/ officer', -3.164190703926506e-10),
     ('housing=own', 2.8902027822084106e-10),
     ('savings=unknown/ no savings account', -2.604277452741881e-10),
     ('job=skilled employee / official', 2.3808188198617575e-10),
     ('foreign_worker=yes', 2.365347360238489e-10),
     ('telephone=none', 2.2048259721367863e-10),
     ('age', 2.171945479826713e-10),
     ('savings=... < 100 DM', 2.1116662177987812e-10),
     ('credits_this_bank', 1.9999632029038067e-10),
     ('credit_history=existing credits paid back duly till now', 1.9243622007776865e-10),
     ('people_under_maintenance', 1.902008911572941e-10),
     ('purpose=car (new)', -1.7104663723358493e-10),
     ('account_check_status=no checking account', 1.6584313433238958e-10),
     ('account_check_status=0 <= ... < 200 DM', -1.639544710042764e-10),
     ('credit_history=critical account/ other credits existing (not at this bank)', 1.317487567892989e-10),
     ('job=unskilled - resident', 1.307761159896724e-10),
     ('other_installment_plans=stores', 1.2347569776391545e-10),
     ('foreign_worker=no', 1.1825353902253505e-10),
     ('present_emp_since=1 <= ... < 4 years', 1.1478921168922655e-10),
     ('property=if not A121/A122 : car or other, not in attribute 6', 1.1222769011436428e-10),
     ('personal_status_sex=female : divorced/separated/married', 1.1002871894681165e-10),
     ('savings=100 <= ... < 500 DM', 1.0982251402773794e-10),
     ('purpose=domestic appliances', 1.0567984890752028e-10),
     ('present_res_since', 9.869484730455045e-11),
     ('account_check_status=>= 200 DM / salary assignments for at least 1 year', 9.721716212812873e-11),
     ('present_emp_since=.. >= 7 years', 9.327030468700815e-11),
     ('installment_as_income_perc', 9.192261925231111e-11),
     ('property=real estate', 9.180043418264463e-11),
     ('purpose=(vacation - does not exist?)', 8.974505020571898e-11),
     ('account_check_status=< 0 DM', 8.848004118893571e-11),
     ('purpose=retraining', 8.80910843922895e-11),
     ('purpose=education', 8.803520453193465e-11),
     ('purpose=business', 8.330599059469541e-11),
     ('housing=rent', 7.975475868460632e-11),
     ('property=if not A121 : building society savings agreement/ life insurance', 7.826524390749874e-11),
     ('other_debtors=guarantor', 7.385760952840171e-11),
     ('present_emp_since=... < 1 year ', 7.338094381227495e-11),
     ('duration_in_month', 6.689756440260244e-11),
     ('purpose=car (used)', 6.582965568284186e-11),
     ('job=unemployed/ unskilled - non-resident', 6.473736018584135e-11),
     ('present_emp_since=4 <= ... < 7 years', 6.230002403518189e-11),
     ('purpose=furniture/equipment', 5.974714318917145e-11),
     ('purpose=radio/television', 5.909852887925919e-11),
     ('credit_history=delay in paying off in the past', 5.620862803354922e-11),
     ('savings=500 <= ... < 1000 DM ', 5.582941358078461e-11),
     ('other_installment_plans=none', 5.501318386790144e-11),
     ('personal_status_sex=male : married/widowed', 5.500125372750834e-11),
     ('telephone=yes, registered under the customers name ', -5.495252929908006e-11),
     ('purpose=repairs', 5.2177896575440796e-11),
     ('savings=.. >= 1000 DM ', 4.0557757647139625e-11),
     ('personal_status_sex=male : divorced/separated', 3.627184253632623e-11),
     ('personal_status_sex=male : single', -2.9862189862658355e-11),
     ('credit_history=no credits taken/ all credits paid back duly', 2.8131802175589855e-11),
     ('other_installment_plans=bank', 1.9548368945624186e-11)]

.. code:: ipython3

    lime_exp.plot_features_importance()