lore_sa.dataset.TabularDataset

class lore_sa.dataset.TabularDataset(data: DataFrame, class_name: str = None)

It provides an interface for handling tabular datasets, including essential information about the structure and semantics of the dataset.

df

The DataFrame containing the whole dataset.

Type

pandas.DataFrame

descriptor

It contains the essential information regarding each feature. Format:

    {
        'numeric': {
            'feature name': {
                'index': <index of feature column>,
                'min': <min value>,
                'max': <max value>,
                'mean': <mean value>,
                'std': <standard deviation>,
                'median': <median value>,
                'q1': <first quartile of the distribution>,
                'q3': <third quartile of the distribution>,
            },
            ...
        },
        'categorical': {
            'feature name': {
                'index': <index of feature column>,
                'distinct_values': <distinct categorical values>,
                'value_counts': {'distinct value': <elements count>,
                                 ...},
            },
            ...
        },
    }

Type

dict
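To make the descriptor layout concrete, the following is a minimal sketch that builds a dictionary of the same shape using only the Python standard library. It is not the library's implementation: the helper name `build_descriptor`, the rule used to decide whether a column is numeric, the choice of sample standard deviation, and the quartile interpolation method are all assumptions made for illustration.

```python
import statistics
from collections import Counter


def build_descriptor(columns):
    """Build a descriptor dict shaped like the documented format.

    `columns` maps each feature name to its list of values. A column is
    treated as numeric when every value is an int or float (an assumption
    of this sketch, not necessarily the library's rule).
    """
    descriptor = {"numeric": {}, "categorical": {}}
    for index, (name, values) in enumerate(columns.items()):
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        if is_numeric:
            # Inclusive quartiles interpolate linearly between data points.
            q1, median, q3 = statistics.quantiles(
                values, n=4, method="inclusive"
            )
            descriptor["numeric"][name] = {
                "index": index,
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
                "std": statistics.stdev(values),  # sample standard deviation
                "median": median,
                "q1": q1,
                "q3": q3,
            }
        else:
            counts = Counter(values)
            descriptor["categorical"][name] = {
                "index": index,
                "distinct_values": list(counts),
                "value_counts": dict(counts),
            }
    return descriptor


# Hypothetical two-column dataset used only for illustration.
columns = {
    "age": [20, 30, 40, 50, 60],
    "colour": ["red", "blue", "red", "green", "red"],
}
descriptor = build_descriptor(columns)
print(descriptor["numeric"]["age"]["median"])
print(descriptor["categorical"]["colour"]["value_counts"])
```

The `index` entry records the position of the column in the dataset, so numeric and categorical features share a single index space even though they are stored under separate keys.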

__init__(data: DataFrame, class_name: str = None)

Methods

__init__(data[, class_name])

from_csv(filename[, class_name])

Read a comma-separated values (CSV) file into a Dataset object.

from_dict(data[, class_name])

Build a Dataset object from a dict of Series, arrays, or dicts.

get_categorical_columns()

get_class_values()

Return the list of values of the target column.

get_feature_name(index)

Get the feature name by index.

get_feature_names()

get_features_names()

get_number_of_features()

get_numeric_columns()

set_class_name(class_name)

Set the class name.

set_descriptor(descriptor)

set_target_label(descriptor)

Set the target column in the dataset descriptor.

update_descriptor()

Create the dataset descriptor dictionary.

classmethod from_csv(filename: str, class_name: str = None)

Read a comma-separated values (CSV) file into a Dataset object.

Parameters

filename (str): the CSV file to read

class_name (str, optional): name of the target column

classmethod from_dict(data: dict, class_name: str = None)

Build a Dataset object from a dict of Series, arrays, or dicts.

Parameters

data (dict): the data to load

class_name (str, optional): name of the target column

get_class_values()

Return the list of values of the target column.

get_feature_name(index)

Get the feature name by index.

Parameters

index

Returns

the name of the corresponding feature
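As an illustration of index-based lookup, the sketch below searches a descriptor of the documented shape for the feature whose `index` entry matches. The function name `feature_name_from_index` and the sample descriptor are hypothetical, and the library's own `get_feature_name` may be implemented differently.

```python
def feature_name_from_index(descriptor, index):
    """Return the name of the feature whose 'index' entry equals `index`."""
    # Numeric and categorical features are stored under separate keys,
    # so both groups have to be searched.
    for feature_type in ("numeric", "categorical"):
        for name, info in descriptor.get(feature_type, {}).items():
            if info["index"] == index:
                return name
    raise IndexError(f"no feature with index {index}")


# Hypothetical descriptor, reduced to the 'index' entries for brevity.
sample_descriptor = {
    "numeric": {"age": {"index": 0}},
    "categorical": {"colour": {"index": 1}},
}
print(feature_name_from_index(sample_descriptor, 1))
```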

set_class_name(class_name: str)

Set the class name. Only the column name string is expected.

Parameters

class_name (str)

set_target_label(descriptor)

Set the target column in the dataset descriptor.

Parameters

descriptor

Returns

update_descriptor()

Create the dataset descriptor dictionary.