lore_sa.surrogate.Surrogate
- class lore_sa.surrogate.Surrogate(kind=None, preprocessing=None)[source]
Abstract base class for interpretable surrogate models.
A surrogate model is an interpretable machine learning model (like a decision tree) that approximates the behavior of a complex black box model in a local region around a specific instance. LORE uses surrogates to extract interpretable rules that explain black box predictions.
The surrogate model is trained on a synthetic neighborhood of instances generated around the instance to explain, with labels provided by the black box model. This creates a local approximation that is both accurate and interpretable.
Key responsibilities:
1. Train an interpretable model on the neighborhood
2. Extract factual rules explaining the prediction
3. Generate counterfactual rules showing alternative scenarios
4. Measure fidelity (how well it approximates the black box)
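For intuition, here is a minimal, self-contained sketch of this neighborhood-plus-black-box-labels setup using plain scikit-learn; all names (black_box, Z, Yb, local_model) are illustrative stand-ins, not the library's own generation code:

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

# Hypothetical black box trained on some data (illustrative only)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
black_box = RandomForestClassifier(random_state=0).fit(X, y)

instance = X[0]
# Synthetic neighborhood sampled around the instance to explain
Z = instance + rng.normal(scale=0.3, size=(500, 4))
# Labels come from the black box, not from ground truth
Yb = black_box.predict(Z)

# Interpretable local approximation trained on (Z, Yb)
local_model = DecisionTreeClassifier(max_depth=3).fit(Z, Yb)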
- kind
Type of surrogate model (e.g., ‘decision_tree’, ‘supertree’)
- Type:
str
- preprocessing
Preprocessing method to apply before training
- fidelity
Score indicating how well the surrogate approximates the black box (computed during training)
- Type:
float
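How fidelity is computed is left to the concrete subclass; a common definition (an assumption here, not mandated by this class) is the agreement rate between the surrogate and the black box on the neighborhood. A sketch with an illustrative function name:

import numpy as np

def compute_fidelity(surrogate_model, Z, Yb):
    # Fraction of neighborhood points on which the surrogate
    # reproduces the black box label Yb
    return float(np.mean(surrogate_model.predict(Z) == Yb))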
Example
>>> from lore_sa.surrogate import DecisionTreeSurrogate
>>>
>>> surrogate = DecisionTreeSurrogate()
>>> surrogate.train(neighborhood_X, neighborhood_y)
>>> rule = surrogate.get_rule(instance, encoder)
>>> counterfactuals = surrogate.get_counterfactual_rules(instance,
...                                                      neighborhood_X,
...                                                      neighborhood_y,
...                                                      encoder)
See also
DecisionTreeSurrogate: Concrete implementation using scikit-learn decision trees
- __init__(kind=None, preprocessing=None)[source]
Initialize the surrogate model.
- Parameters:
kind (str, optional) – Type of surrogate model (e.g., ‘decision_tree’, ‘supertree’)
preprocessing (optional) – Preprocessing method to apply to the data before training
Methods
__init__([kind, preprocessing])
    Initialize the surrogate model.
get_counterfactual_rules(x, ...[, encoder, ...])
    Generate counterfactual rules showing alternative scenarios.
get_rule(x[, encdec])
    Extract the decision rule for a specific instance.
train(Z, Yb, weights)
    Train the surrogate model on neighborhood data.
- abstract get_counterfactual_rules(x: numpy.array, neighborhood_train_X: numpy.array, neighborhood_train_Y: numpy.array, encoder: Optional[EncDec] = None, filter_crules=None, constraints: Optional[dict] = None, unadmittible_features: Optional[list] = None)[source]
Generate counterfactual rules showing alternative scenarios.
Counterfactual rules describe what changes to the instance would result in a different prediction. They answer “what if” questions like: “What if the age was lower? Would the prediction change?”
This method finds paths in the surrogate model that lead to different classes and extracts the minimal changes (deltas) needed to reach those predictions.
- Parameters:
x (np.array) – Instance to explain, in encoded space, shape (n_encoded_features,)
neighborhood_train_X (np.array) – Neighborhood instances in encoded space, shape (n_samples, n_encoded_features)
neighborhood_train_Y (np.array) – Labels for neighborhood instances from the black box, shape (n_samples,)
encoder (EncDec, optional) – Encoder/decoder for converting rules to original space
filter_crules (optional) – Function to filter counterfactual rules
constraints (dict, optional) – Constraints on which features can be changed
unadmittible_features (list, optional) – List of features that cannot be changed (e.g., immutable features like age, gender)
- Returns:
- (counterfactual_rules, deltas) where:
counterfactual_rules (list): List of Rule objects for different classes
deltas (list): List of lists of Expression objects showing minimal changes needed for each counterfactual
- Return type:
tuple
Example
>>> crules, deltas = surrogate.get_counterfactual_rules(
...     encoded_instance, neighborhood_X, neighborhood_y, encoder
... )
>>> print(f"Counterfactual: {crules[0]}")
>>> print(f"Changes needed: {deltas[0]}")
# Changes needed: [age >= 40, income > 60000]
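For intuition on how a delta relates to rule conditions, here is a plain-Python sketch; the condition encoding and all names are illustrative (the library itself works with Rule and Expression objects):

# Conditions as (feature, operator, threshold) triples
factual = [('age', '>', 30), ('income', '<=', 50000)]         # leads to class 0
counterfactual = [('age', '>=', 40), ('income', '>', 60000)]  # leads to class 1

instance = {'age': 35, 'income': 45000}

def satisfied(cond, x):
    feature, op, thr = cond
    v = x[feature]
    return {'<=': v <= thr, '<': v < thr, '>=': v >= thr, '>': v > thr}[op]

# The delta is the set of counterfactual conditions the instance
# currently violates, i.e. the minimal changes needed to flip the class
delta = [c for c in counterfactual if not satisfied(c, instance)]
print(delta)  # [('age', '>=', 40), ('income', '>', 60000)]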
- abstract get_rule(x: numpy.array, encdec: Optional[EncDec] = None)[source]
Extract the decision rule for a specific instance.
This method traverses the trained surrogate model to extract the decision rule that applies to the given instance. The rule describes the conditions (premises) that lead to the predicted class (consequence).
- Parameters:
x (np.array) – Instance to explain, in encoded space, shape (n_encoded_features,)
encdec (EncDec, optional) – Encoder/decoder to convert the rule back to original feature space for interpretability
- Returns:
- Rule object containing premises (conditions) and consequence (prediction) that explain why the surrogate (and by extension, the black box) predicts a specific class for this instance
- Return type:
Rule
Example
>>> rule = surrogate.get_rule(encoded_instance, encoder)
>>> print(rule)
# Output: IF age > 30 AND income <= 50000 THEN class = 0
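Under the hood, a decision-tree surrogate can read such a rule off the tree's decision path. A self-contained sketch using plain scikit-learn, illustrating the traversal idea rather than the library's actual implementation:

import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

x = X[0].reshape(1, -1)
path = tree.decision_path(x).indices  # node ids visited from root to leaf
t = tree.tree_

premises = []
for node in path:
    if t.children_left[node] == t.children_right[node]:
        continue  # leaf node: no split condition
    f, thr = t.feature[node], t.threshold[node]
    op = '<=' if x[0, f] <= thr else '>'
    premises.append(f"feature_{f} {op} {thr:.2f}")

print("IF", " AND ".join(premises), "THEN class =", tree.predict(x)[0])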
- abstract train(Z, Yb, weights)[source]
Train the surrogate model on neighborhood data.
This method trains the interpretable surrogate model on a synthetic neighborhood of instances, where the labels are provided by the black box model. The goal is to create a local approximation that captures the black box’s decision boundaries.
- Parameters:
Z (np.array) – Training data in encoded space, shape (n_samples, n_encoded_features). These are the synthetic instances generated around the instance to explain.
Yb (np.array) – Target labels from the black box model, shape (n_samples,). These are the predictions made by the black box on the neighborhood.
weights (np.array, optional) – Sample weights for training, shape (n_samples,). Can be used to emphasize certain instances in the neighborhood.
Note
The fidelity of the surrogate is typically computed during training to assess how well it approximates the black box model in the local neighborhood.
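A minimal sketch of what a concrete train implementation might do with these arguments, using a scikit-learn tree; the data here is a synthetic stand-in, and this is not the library's DecisionTreeSurrogate:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Illustrative stand-ins: Z is the encoded synthetic neighborhood,
# Yb the black box predictions on it
rng = np.random.default_rng(0)
Z = rng.normal(size=(500, 4))
Yb = (Z[:, 0] > 0).astype(int)
weights = np.ones(len(Z))  # could up-weight points near the instance

surrogate_model = DecisionTreeClassifier(max_depth=4, random_state=0)
surrogate_model.fit(Z, Yb, sample_weight=weights)

# Fidelity, per the Note above: agreement with the black box
# labels on the local neighborhood
fidelity = surrogate_model.score(Z, Yb)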