lore_sa.neighgen.GeneticGenerator
- class lore_sa.neighgen.GeneticGenerator(bbox=None, dataset=None, encoder=None, ocr=0.1, alpha1=0.5, alpha2=0.5, metric=<function neuclidean>, ngen=30, mutpb=0.2, cxpb=0.5, tournsize=3, halloffame_ratio=0.1, random_seed=None)[source]
The genetic generator creates neighbor instances by generating random values starting from an input instance and pruning the generation with a fitness function based on proximity to the instance to explain.
- __init__(bbox=None, dataset=None, encoder=None, ocr=0.1, alpha1=0.5, alpha2=0.5, metric=<function neuclidean>, ngen=30, mutpb=0.2, cxpb=0.5, tournsize=3, halloffame_ratio=0.1, random_seed=None)[source]
- Parameters:
bbox – the Black Box model to explain
dataset – the dataset with the descriptor of the original dataset
encoder – an encoder to transform the data from/to the black box model
ocr – One Class Ratio: the ratio of the number of instances of the minority class
alpha1 – the weight of the similarity of the features to the given instance. The sum of alpha1 and alpha2 must be 1
alpha2 – the weight of the similarity of the target class to the given instance. The sum of alpha1 and alpha2 must be 1
metric – the distance metric to use to compute the distance between instances
ngen – the number of generations to run
mutpb – probability of mutation of a specific feature
cxpb – probability of mating two individuals
tournsize –
halloffame_ratio –
random_seed – initial seed for the random number generator
Methods
__init__([bbox, dataset, encoder, ocr, ...])
add_halloffame(population, halloffame)
balance_neigh(z, Z, num_samples)
check_generated([filter_function, check_fuction]) – It contains the logic to check the requirements for generated data
clone(x)
eaSimple(toolbox, cxpb, mutpb, ngen[, ...]) – This algorithm reproduces the simplest evolutionary algorithm as presented in chapter 7 of [Back2000].
fit(toolbox, population_size)
fitness_equal(z, z1)
fitness_notequal(z, z1)
generate(z, num_instances, descriptor, encoder) – The generation is based on the strategy of generating a number of instances for the same class as the input instance and a number of instances for a different class.
generate_synthetic_instance([from_z, mutpb]) – Generate a single synthetic instance.
mate(ind1, ind2) – Executes a two-point crossover on the input sequence individuals.
mutate(toolbox, x)
population_fitness_equal(z) – This fitness function evaluates the feature_similarity and the target_similarity of a population against a given instance z.
population_fitness_notequal(z)
random_init()
record_init(x) – This function is used to generate a random instance to start the evolutionary algorithm.
setup_toolbox(x, evaluate, population_size)
setup_toolbox_noteq(x, x1, evaluate, ...)
- abstract check_generated(filter_function=None, check_fuction=None)
It contains the logic to check the requirements for generated data
- eaSimple(toolbox, cxpb, mutpb, ngen, stats=None, halloffame=None, verbose=True)[source]
This algorithm reproduces the simplest evolutionary algorithm as presented in chapter 7 of [Back2000].
- Parameters:
population – A list of individuals.
toolbox – A Toolbox that contains the evolution operators.
cxpb – The probability of mating two individuals.
mutpb – The probability of mutating an individual.
ngen – The number of generations.
stats – A Statistics object that is updated in place, optional.
halloffame – A HallOfFame object that will contain the best individuals, optional.
verbose – Whether or not to log the statistics.
- Returns:
The final population
- Returns:
A deap.tools.Logbook with the statistics of the evolution
This implementation is an adaptation of the original algorithm implemented in the DEAP library.
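The select, mate, mutate cycle described above can be illustrated with a small standard-library sketch. It is not the DEAP code nor this class's adaptation of it: the function `ea_simple_sketch`, its callable arguments, the tournament selection step, and the toy bit-counting fitness are all assumptions made for the example. Only the roles of `cxpb`, `mutpb`, and `ngen` follow the parameter descriptions above.

```python
import random


def ea_simple_sketch(population, evaluate, mate, mutate, cxpb, mutpb, ngen,
                     tournsize=3, rng=None):
    """Toy generational loop (hypothetical helper, not the library method)."""
    rng = rng or random.Random()
    best_per_gen = []
    for _ in range(ngen):
        # Selection (assumed here to be a tournament): keep a copy of the
        # fittest of `tournsize` randomly drawn individuals.
        offspring = [
            list(max(rng.sample(population, tournsize), key=evaluate))
            for _ in range(len(population))
        ]
        # Mate consecutive pairs with probability cxpb.
        for i in range(1, len(offspring), 2):
            if rng.random() < cxpb:
                offspring[i - 1], offspring[i] = mate(
                    offspring[i - 1], offspring[i], rng)
        # Mutate each individual with probability mutpb.
        for i in range(len(offspring)):
            if rng.random() < mutpb:
                offspring[i] = mutate(offspring[i], rng)
        population = offspring
        best_per_gen.append(max(evaluate(ind) for ind in population))
    return population, best_per_gen


def one_point(a, b, rng):
    # Simple one-point crossover used only for this toy run.
    cut = rng.randint(1, len(a) - 1)
    return a[:cut] + b[cut:], b[:cut] + a[cut:]


def flip_one(ind, rng):
    # Flip a single randomly chosen bit.
    out = list(ind)
    j = rng.randrange(len(out))
    out[j] = 1 - out[j]
    return out


rng = random.Random(0)
start = [[rng.randint(0, 1) for _ in range(10)] for _ in range(20)]
final_pop, history = ea_simple_sketch(
    start, sum, one_point, flip_one, cxpb=0.5, mutpb=0.2, ngen=30, rng=rng)
```

Here `history` plays a role loosely comparable to the logbook: one summary value per generation. The values passed for `cxpb`, `mutpb`, and `ngen` are the constructor defaults shown in the class signature.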
- generate(z, num_instances, descriptor, encoder)[source]
The generation is based on the strategy of generating a number of instances for the same class as the input instance and a number of instances for a different class. The instances of each subgroup are generated by a genetic algorithm with two fitness functions: one for the same class and one for the different class.
- Parameters:
z – the input instance, from which the generation starts
num_instances – how many elements to generate
descriptor – the descriptor of the dataset. This provides the metadata of each feature to guide the generation
encoder – the encoder to transform the data from/to the black box model
- Returns:
a new set of instances generated from the input instance. The first element is the input instance
- generate_synthetic_instance(from_z=None, mutpb=1.0)
Generate a single synthetic instance.
This method creates one synthetic instance by randomly sampling or mutating feature values. For categorical features, it randomly selects from valid values. For numerical features, it samples from the feature’s range.
- Parameters:
from_z (np.array, optional) – Starting instance in encoded space to mutate. If None, generates a completely random instance. If provided, features are mutated with probability mutpb.
mutpb (float, optional) – Mutation probability for each feature (0 to 1). Only used when from_z is provided. Default is 1.0 (mutate all features).
- Returns:
A single synthetic instance in encoded space, shape (n_encoded_features,)
- Return type:
np.array
Note
The method respects feature types and valid ranges from the dataset descriptor. For categorical features, it ensures the one-hot encoding constraint (exactly one category is active).
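The behaviour described for `from_z` and `mutpb` can be sketched as follows. This is a minimal illustration, not the library's code: the `LAYOUT` structure is a hypothetical stand-in for the dataset descriptor, the function name is invented, and uniform sampling over the numeric range is an assumption.

```python
import random

# Hypothetical layout of the encoded space (not the library's descriptor
# format): ("numeric", low, high) takes one slot, ("categorical", k) takes a
# one-hot block of k slots.
LAYOUT = [("numeric", 18.0, 90.0), ("categorical", 3), ("numeric", 0.0, 1.0)]


def synthetic_instance_sketch(layout, from_z=None, mutpb=1.0, rng=None):
    rng = rng or random.Random()
    out = []
    pos = 0
    for spec in layout:
        width = 1 if spec[0] == "numeric" else spec[1]
        current = None if from_z is None else list(from_z[pos:pos + width])
        # Resample when there is no starting instance, or with probability
        # mutpb when mutating an existing one.
        if current is None or rng.random() < mutpb:
            if spec[0] == "numeric":
                # Assumption: uniform draw within the feature's range.
                current = [rng.uniform(spec[1], spec[2])]
            else:
                # Keep the one-hot constraint: exactly one active category.
                active = rng.randrange(width)
                current = [1.0 if j == active else 0.0 for j in range(width)]
        out.extend(current)
        pos += width
    return out


rng = random.Random(42)
z_random = synthetic_instance_sketch(LAYOUT, rng=rng)
# With mutpb=0.0 no feature is resampled, so the copy equals the input.
z_same = synthetic_instance_sketch(LAYOUT, from_z=z_random, mutpb=0.0, rng=rng)
```

Resampling a categorical feature as a whole block, rather than slot by slot, is what keeps exactly one category active.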
- mate(ind1, ind2)
Executes a two-point crossover on the input sequence individuals. The two individuals are modified in place and both keep their original length. This implementation uses the original implementation of the DEAP library. It adds a special case for the one-hot encoding, where the crossover is done taking into account the intervals of values imposed by the one-hot encoding.
- Parameters:
ind1 – The first individual participating in the crossover.
ind2 – The second individual participating in the crossover.
- Returns:
A tuple of two individuals.
This function uses the randint() function from the Python base random module.
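One way to keep a two-point crossover compatible with one-hot encoding is to allow cut points only at the boundaries between encoded features, so a one-hot block is never split. The sketch below illustrates that idea under stated assumptions: the function name and the `boundaries` argument are hypothetical, and the library's actual special case may be implemented differently.

```python
import random


def two_point_blockwise(ind1, ind2, boundaries, rng=None):
    """Swap, in place, the segment between two cut points taken from `boundaries`."""
    rng = rng or random.Random()
    # Two distinct boundaries, ordered so that a < b.
    a, b = sorted(rng.sample(boundaries, 2))
    # The right-hand side is built first, so the swap is safe.
    ind1[a:b], ind2[a:b] = ind2[a:b], ind1[a:b]
    return ind1, ind2


# Encoded vector: one numeric slot, a one-hot block of three, one numeric slot.
p1 = [30.0, 1, 0, 0, 0.2]
p2 = [55.0, 0, 0, 1, 0.9]
boundaries = [0, 1, 4, 5]  # positions where a cut does not split the one-hot block
c1, c2 = two_point_blockwise(list(p1), list(p2), boundaries, random.Random(1))
```

Both children keep the original length, and each still has exactly one active category in positions 1 to 3.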
- population_fitness_equal(z)[source]
This fitness function evaluates the feature_similarity and the target_similarity of a population against a given instance z. The two similarities are computed using optimized functions of the numpy and scipy libraries, which improves the performance of the algorithm.
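A single-instance sketch of how the two similarities might be combined with the `alpha1` and `alpha2` weights is given below. Only the weighting, and the requirement that the weights sum to 1, come from the parameter descriptions. The similarity definitions (an inverse Euclidean distance and an exact label match), the function name, and the `predict` callable are assumptions for illustration and are not taken from the library.

```python
import math


def fitness_equal_sketch(z, z1, predict, alpha1=0.5, alpha2=0.5):
    """Weighted sum of feature and target similarity (hypothetical helper)."""
    if not math.isclose(alpha1 + alpha2, 1.0):
        raise ValueError("alpha1 and alpha2 must sum to 1")
    # Assumption: map Euclidean distance to a similarity in (0, 1].
    feature_similarity = 1.0 / (1.0 + math.dist(z, z1))
    # Assumption: 1 when both instances get the same label, else 0.
    target_similarity = 1.0 if predict(z) == predict(z1) else 0.0
    return alpha1 * feature_similarity + alpha2 * target_similarity


def toy_predict(v):
    # Stand-in for the black box: class 1 when the features sum above 1.
    return int(sum(v) > 1.0)


z = [0.2, 0.3]
score_same = fitness_equal_sketch(z, [0.2, 0.3], toy_predict)
score_far = fitness_equal_sketch(z, [2.0, 2.0], toy_predict)
```

An identical instance with the same label gets the maximum score, while a distant instance with a different label scores lower on both terms.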
- record_init(x)
This function is used to generate a random instance to start the evolutionary algorithm. In this case, the input instance x is repeated for the whole initial population.
- Returns:
a (not so) random instance