lore_sa.encoder_decoder.ColumnTransformerEnc

class lore_sa.encoder_decoder.ColumnTransformerEnc(descriptor: dict)[source]

Provides an interface to one-hot encoding (https://en.wikipedia.org/wiki/One-hot) functions. It relies on the OneHotEncoder class from sklearn.
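
As a rough illustration of the underlying mechanism, the sketch below uses sklearn's OneHotEncoder directly on a toy categorical column. It does not go through ColumnTransformerEnc or its descriptor; the `colors` data and variable names are made up for the example.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy categorical column; the values are illustrative only.
colors = np.array([["red"], ["blue"], ["green"], ["red"]])

# The default output is a SciPy sparse matrix, so convert it to a dense array.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(colors).toarray()

# sklearn sorts the categories, so the columns are blue, green, red.
categories = list(encoder.categories_[0])

# inverse_transform maps each one-hot row back to its original label.
decoded = encoder.inverse_transform(encoded)

print(categories)
print(encoded)
print(decoded.ravel())
```

Each input value becomes a row with a single 1 in the column of its category, and the inverse transform restores the original labels.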

__init__(descriptor: dict)[source]

Initialize the encoder/decoder.

Parameters:

descriptor (dict) – Dictionary containing feature information, including the ‘numeric’, ‘categorical’, and ‘ordinal’ feature descriptors

Methods

__init__(descriptor)

Initialize the encoder/decoder.

decode(Z)

Decode the array starting from the original descriptor

decode_target_class(Z)

Decode the target class

encode(X)

It applies the encoder to the input features

encode_target_class(X)

Encode the target class

get_encoded_features()

Get a mapping of encoded feature indices to feature names.

get_encoded_intervals()

Get index intervals for each original feature in the encoded space.

decode(Z: numpy.array)[source]

Decode the array starting from the original descriptor

Parameters:

Z ([Numpy array]) – Array to decode

Return [Numpy array]:

Decoded array

decode_target_class(Z: numpy.array)[source]

Decode the target class

Parameters:

Z ([Numpy array]) – Array containing the target class values to be decoded

encode(X: numpy.array)[source]

It applies the encoder to the input features

Parameters:

X ([Numpy array]) – Array to encode

Return [Numpy array]:

Encoded array

encode_target_class(X: numpy.array)[source]

Encode the target class

get_encoded_features()[source]

Get a mapping of encoded feature indices to feature names.

Returns:

Dictionary mapping encoded feature indices to descriptive names.

For one-hot encoded features, names include the category value (e.g., ‘color=red’, ‘color=blue’).

Return type:

dict

Example

>>> features = encoder.get_encoded_features()
>>> # {0: 'age', 1: 'color=red', 2: 'color=blue', 3: 'color=green'}
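
The naming scheme described above can be sketched in plain Python. The helper `build_encoded_feature_names` and its `(name, categories)` input format are hypothetical and are not part of lore_sa; the sketch also assumes that encoded columns follow the order in which the features are listed.

```python
def build_encoded_feature_names(features):
    """Map encoded column indices to names.

    `features` is a list of (name, categories) pairs, where `categories`
    is None for a numeric feature. Hypothetical helper, not lore_sa API.
    """
    names = {}
    idx = 0
    for name, categories in features:
        if categories is None:
            # A numeric feature occupies a single column and keeps its name.
            names[idx] = name
            idx += 1
        else:
            # A categorical feature gets one "name=value" column per category.
            for value in categories:
                names[idx] = f"{name}={value}"
                idx += 1
    return names


features = [("age", None), ("color", ["red", "blue", "green"])]
mapping = build_encoded_feature_names(features)
print(mapping)
```

With this input, the resulting mapping has the same shape as the dictionary in the example above.
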
get_encoded_intervals()[source]

Get index intervals for each original feature in the encoded space.

This method returns a list of (start, end) tuples indicating the range of encoded indices that correspond to each original feature. This is useful when an original categorical feature is one-hot encoded into multiple columns.

Returns:

List of (start_idx, end_idx) tuples, one for each original feature.

For numerical features, the interval covers a single encoded column. For one-hot encoded categorical features, the interval spans multiple indices.

Return type:

list

Example

>>> intervals = encoder.get_encoded_intervals()
>>> # [(0, 1), (1, 4), (4, 5)]  # age (1 col), color (3 cols), income (1 col)
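
A minimal sketch of how such intervals can be derived and used, assuming the half-open (start, end) convention that the example above suggests. The helper `build_intervals`, the column widths, and the sample row are hypothetical and only serve the illustration.

```python
def build_intervals(widths):
    """Turn per-feature column counts into half-open (start, end) intervals.

    Hypothetical helper, not lore_sa API.
    """
    intervals = []
    start = 0
    for width in widths:
        intervals.append((start, start + width))
        start += width
    return intervals


# age (1 column), color (3 one-hot columns), income (1 column)
intervals = build_intervals([1, 3, 1])

# An illustrative encoded row: age, one-hot color, income.
row = [42.0, 0.0, 1.0, 0.0, 55000.0]

# Slice the encoded row back into one block per original feature.
blocks = [row[start:end] for start, end in intervals]

print(intervals)
print(blocks)
```

Slicing with each interval groups the encoded columns by original feature, which is convenient when all columns of a one-hot encoded feature need to be handled together.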