lore_sa.dataset.TabularDataset

class lore_sa.dataset.TabularDataset(data: pandas.DataFrame, class_name: Optional[str] = None, categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]

Provides an interface for handling tabular datasets, including essential information on the structure and semantics of the dataset.

df

DataFrame containing the whole dataset.

Type:

pandas.DataFrame

descriptor

Contains the essential information regarding each feature. Format:

>>>   {'numeric': {'feature name' :
>>>                   {
>>>                       'index' : <index of feature column>,
>>>                       'min' : <min value>,
>>>                       'max' : <max value>,
>>>                       'mean': <mean value>,
>>>                       'std': <standard deviation>,
>>>                       'median': <median value>,
>>>                       'q1': <first quartile of the distribution>,
>>>                       'q3': <third quartile of the distribution>,
>>>                   },
>>>               ...,
>>>               ...,
>>>               },
>>>   'categorical': {'feature name':
>>>                       {
>>>                           'index' : <index of feature column>,
>>>                           'distinct_values' : <distinct categorical values>,
>>>                           'value_counts' : {'distinct value' : <elements count>,
>>>                                           ... }
>>>                       }
>>>                   },
>>>   'ordinal': {'feature name':
>>>                       {
>>>                           'index' : <index of feature column>,
>>>                           'distinct_values' : <distinct categorical values>,
>>>                           'value_counts' : {'distinct value' : <elements count>,
>>>                                           ... }
>>>                       }
>>>                   },
>>>                   ...
>>>                   ...
>>>                   ...
>>>   }
Type:

dict
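
To make the descriptor layout concrete, here is a minimal sketch that builds a dictionary of the same shape from plain Python lists, using only the standard library. It is not the library's implementation: `build_descriptor` is a hypothetical helper, and the choice of inclusive quartiles and sample standard deviation is an assumption that may differ from what `TabularDataset` actually computes.

```python
from collections import Counter
import statistics


def build_descriptor(columns, categorical=(), ordinal=()):
    """Hypothetical helper: build a descriptor-shaped dict from {name: values}."""
    descriptor = {"numeric": {}, "categorical": {}, "ordinal": {}}
    for index, (name, values) in enumerate(columns.items()):
        if name in categorical or name in ordinal:
            kind = "categorical" if name in categorical else "ordinal"
            counts = Counter(values)
            descriptor[kind][name] = {
                "index": index,
                "distinct_values": list(counts),  # first-seen order
                "value_counts": dict(counts),
            }
        else:
            # Assumption: inclusive quartiles and sample standard deviation.
            q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
            descriptor["numeric"][name] = {
                "index": index,
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
                "std": statistics.stdev(values),
                "median": statistics.median(values),
                "q1": q1,
                "q3": q3,
            }
    return descriptor


# Two toy columns: one numeric, one declared categorical.
example = build_descriptor(
    {"age": [1, 2, 3, 4, 5], "color": ["red", "blue", "red", "red", "blue"]},
    categorical=["color"],
)
print(example["numeric"]["age"]["median"])
print(example["categorical"]["color"]["value_counts"])
```

Any column not listed as categorical or ordinal is treated as numeric in this sketch; the real class may apply different rules.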

__init__(data: pandas.DataFrame, class_name: Optional[str] = None, categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]

Methods

__init__(data[, class_name, ...])

from_csv(filename[, class_name, dropna])

Read a comma-separated values (CSV) file into a Dataset object.

from_dict(data[, class_name])

Build a Dataset object from a dict of Series, arrays, or dicts.

get_categorical_columns()

get_class_values()

Provides the class_name.

get_feature_name(index)

Get the feature name by index; returns the name of the corresponding feature.

get_feature_names()

get_features_names()

get_number_of_features()

get_numeric_columns()

set_class_name(class_name)

Set the class name.

set_descriptor(descriptor)

set_target_label(descriptor)

Set the target column in the dataset descriptor.

update_descriptor([categorial_columns, ...])

Creates the dataset descriptor dictionary.

classmethod from_csv(filename: str, class_name: Optional[str] = None, dropna: bool = True)[source]

Read a comma-separated values (CSV) file into a Dataset object.

Parameters:

filename (str)

class_name (optional)

classmethod from_dict(data: dict, class_name: Optional[str] = None)[source]

Build a Dataset object from a dict of Series, arrays, or dicts.

Parameters:

data (dict)

class_name (optional)

get_class_values()[source]

Provides the class_name.

get_feature_name(index)[source]

Get the feature name by index.

Parameters:

index

Returns:

the name of the corresponding feature
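
One way such a lookup could work is to scan the descriptor for the feature whose 'index' entry matches. The sketch below assumes the descriptor layout documented above; `feature_name_by_index` is a hypothetical stand-in and not necessarily how `get_feature_name` is implemented.

```python
def feature_name_by_index(descriptor, index):
    """Hypothetical helper: return the feature whose 'index' matches, else None."""
    for features in descriptor.values():  # 'numeric', 'categorical', 'ordinal'
        for name, info in features.items():
            if info["index"] == index:
                return name
    return None


# A hand-written descriptor fragment in the documented shape.
sample_descriptor = {
    "numeric": {"age": {"index": 0, "min": 18, "max": 90}},
    "categorical": {"color": {"index": 1, "distinct_values": ["red", "blue"]}},
    "ordinal": {},
}

name_at_one = feature_name_by_index(sample_descriptor, 1)
print(name_at_one)
```

Returning `None` for an unknown index is a design choice of this sketch; the real method may raise an error instead.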

set_class_name(class_name: str)[source]

Set the class name, given only as the column name string.

Parameters:

class_name (str)

set_target_label(descriptor)[source]

Set the target column in the dataset descriptor.

Parameters:

descriptor

Returns:

update_descriptor(categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]

Creates the dataset descriptor dictionary.