lore_sa.dataset.TabularDataset

class lore_sa.dataset.TabularDataset(data: pandas.DataFrame, class_name: Optional[str] = None, categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]

Provides an interface for handling datasets, including essential information on the structure and semantics of the dataset.

df

DataFrame containing the whole dataset.

Type:: pandas.DataFrame

descriptor

Contains the essential information regarding each feature. Format:

>>>   {'numeric': {'feature name' :
>>>                   {
>>>                       'index' : <index of feature column>,
>>>                       'min' : <min value>,
>>>                       'max' : <max value>,
>>>                       'mean': <mean value>,
>>>                       'std': <standard deviation>,
>>>                       'median': <median value>,
>>>                       'q1': <first quartile of the distribution>,
>>>                       'q3': <third quartile of the distribution>,
>>>                   },
>>>               ...,
>>>               ...,
>>>               },
>>>   'categorical': {'feature name':
>>>                       {
>>>                           'index' : <index of feature column>,
>>>                           'distinct_values' : <distinct categorical values>,
>>>                           'value_counts' : {'distinct value' : <elements count>,
>>>                                           ... }
>>>                       }
>>>                   },
>>>   'ordinal': {'feature name':
>>>                       {
>>>                           'index' : <index of feature column>,
>>>                           'distinct_values' : <distinct categorical values>,
>>>                           'value_counts' : {'distinct value' : <elements count>,
>>>                                           ... }
>>>                       }
>>>                   },
>>>                   ...
>>>                   ...
>>>                   ...
>>>   }

Type:: dict
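The descriptor layout above can be illustrated with a small sketch that does not depend on the library. The helper `build_descriptor` is hypothetical and is not part of lore_sa. It assumes rows are given as tuples, and it uses the sample standard deviation and inclusive quartiles from Python's `statistics` module, which may differ from how the library computes these values.

```python
import statistics
from collections import Counter


def build_descriptor(rows, columns, categorical=(), ordinal=()):
    """Build a dict shaped like the descriptor format (illustrative only)."""
    descriptor = {"numeric": {}, "categorical": {}, "ordinal": {}}
    for index, name in enumerate(columns):
        values = [row[index] for row in rows]
        if name in categorical or name in ordinal:
            kind = "categorical" if name in categorical else "ordinal"
            counts = Counter(values)
            descriptor[kind][name] = {
                "index": index,
                "distinct_values": sorted(counts),
                "value_counts": dict(counts),
            }
        else:
            # Assumption: inclusive quartiles and sample standard deviation.
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
            descriptor["numeric"][name] = {
                "index": index,
                "min": min(values),
                "max": max(values),
                "mean": statistics.mean(values),
                "std": statistics.stdev(values),
                "median": median,
                "q1": q1,
                "q3": q3,
            }
    return descriptor


# Made-up example data, not taken from any real dataset.
columns = ["age", "colour", "size"]
rows = [(20, "red", "S"), (30, "blue", "M"), (40, "red", "L"), (50, "red", "M")]
descriptor = build_descriptor(rows, columns, categorical=["colour"], ordinal=["size"])
```

Every column not listed as categorical or ordinal is treated as numeric here. That is a simplification for the sketch, not a statement about how `update_descriptor` infers column types.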

__init__(data: pandas.DataFrame, class_name: Optional[str] = None, categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]

Methods

`__init__`(data[, class_name, ...])
`from_csv`(filename[, class_name, dropna])	Read a comma-separated values (CSV) file into a Dataset object.
`from_dict`(data[, class_name])	Build a Dataset object from a dict of Series, arrays, or dicts.
`get_categorical_columns`()
`get_class_values`()	Provides the class_name.
`get_feature_name`(index)	Get the feature name by index.
`get_feature_names`()
`get_features_names`()
`get_number_of_features`()
`get_numeric_columns`()
`set_class_name`(class_name)	Set the class name.
`set_descriptor`(descriptor)
`set_target_label`(descriptor)	Set the target column in the dataset descriptor.
`update_descriptor`([categorial_columns, ...])	Creates the dataset descriptor dictionary.

classmethod from_csv(filename: str, class_name: Optional[str] = None, dropna: bool = True)[source]

Read a comma-separated values (CSV) file into a Dataset object.

Parameters::
    filename (str) –
    class_name – optional

classmethod from_dict(data: dict, class_name: Optional[str] = None)[source]

Build a Dataset object from a dict of Series, arrays, or dicts.

Parameters::
    data (dict) –
    class_name – optional

get_class_values()[source]

Provides the class_name.

get_feature_name(index)[source]

Get the feature name by index.

Parameters:: index –
Returns: the name of the corresponding feature
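A lookup by index can be sketched against the descriptor format documented for this class, since every feature entry stores its column `index`. The function `feature_name_by_index` below is a hypothetical stand-in written for illustration. It is not the library's implementation of `get_feature_name`, and the choice to raise `IndexError` for an unknown index is an assumption.

```python
def feature_name_by_index(descriptor, index):
    """Return the name of the feature stored at the given column index."""
    # Scan every feature group (numeric, categorical, ordinal).
    for features in descriptor.values():
        for name, info in features.items():
            if info["index"] == index:
                return name
    raise IndexError(f"no feature with index {index}")


# Minimal made-up descriptor holding only the 'index' entries.
descriptor = {
    "numeric": {"age": {"index": 0}},
    "categorical": {"colour": {"index": 1}},
    "ordinal": {"size": {"index": 2}},
}
name = feature_name_by_index(descriptor, 1)
```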

set_class_name(class_name: str)[source]

Set the class name. Pass only the column name, as a string.

Parameters:: class_name (str) –

set_target_label(descriptor)[source]

Set the target column in the dataset descriptor.

Parameters:: descriptor –

update_descriptor(categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]

Creates the dataset descriptor dictionary.