lore_sa.dataset.TabularDataset
- class lore_sa.dataset.TabularDataset(data: pandas.DataFrame, class_name: Optional[str] = None, categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]
Provides an interface for handling datasets, including essential information about the structure and semantics of the dataset.
- df
dataframe containing the whole dataset
- Type:
pandas.DataFrame
- descriptor
Contains the essential information regarding each feature. Format:
    {
        'numeric': {
            'feature name': {
                'index': <index of feature column>,
                'min': <min value>,
                'max': <max value>,
                'mean': <mean value>,
                'std': <standard deviation>,
                'median': <median value>,
                'q1': <first quartile of the distribution>,
                'q3': <third quartile of the distribution>,
            },
            ...
        },
        'categorical': {
            'feature name': {
                'index': <index of feature column>,
                'distinct_values': <distinct categorical values>,
                'value_counts': {'distinct value': <elements count>, ...}
            }
        },
        'ordinal': {
            'feature name': {
                'index': <index of feature column>,
                'distinct_values': <distinct categorical values>,
                'value_counts': {'distinct value': <elements count>, ...}
            }
        },
        ...
    }
- Type:
dict
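To make the descriptor layout concrete, the following is a minimal sketch that builds a dictionary in the format shown above from plain Python lists, using only the standard library. It is not the library's implementation: the helpers `describe_numeric` and `describe_categorical`, the sample columns, the use of the sample standard deviation, and the quantile method are all assumptions made for illustration.

```python
from collections import Counter
from statistics import mean, median, quantiles, stdev


def describe_numeric(index, values):
    """Hypothetical helper: summary statistics for one numeric column."""
    # "inclusive" interpolates between the observed min and max; the method
    # actually used by the library is not stated in this page.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    return {
        "index": index,
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "std": stdev(values),  # sample standard deviation (assumption)
        "median": median(values),
        "q1": q1,
        "q3": q3,
    }


def describe_categorical(index, values):
    """Hypothetical helper: distinct values and counts for one column."""
    counts = Counter(values)
    return {
        "index": index,
        "distinct_values": list(counts),
        "value_counts": dict(counts),
    }


# Made-up sample data: one numeric and one categorical column.
columns = {
    "age": [20, 30, 40, 50, 60],
    "colour": ["red", "blue", "red", "green", "red"],
}

descriptor = {"numeric": {}, "categorical": {}, "ordinal": {}}
for index, (name, values) in enumerate(columns.items()):
    if all(isinstance(v, (int, float)) for v in values):
        descriptor["numeric"][name] = describe_numeric(index, values)
    else:
        descriptor["categorical"][name] = describe_categorical(index, values)
```

Here `descriptor["numeric"]["age"]` holds the eight summary fields and `descriptor["categorical"]["colour"]` holds the distinct values with their counts. The `'index'` entry records the column position, which is what ties a descriptor entry back to a column of `df`.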
- __init__(data: pandas.DataFrame, class_name: Optional[str] = None, categorial_columns: Optional[list] = None, ordinal_columns: Optional[list] = None)[source]
Methods
__init__(data[, class_name, ...])
from_csv(filename[, class_name, dropna]): Read a comma-separated values (csv) file into Dataset object.
from_dict(data[, class_name]): From dicts of Series, arrays, or dicts.
get_categorical_columns(): Provides the class_name
get_feature_name(index): Get the feature name by index
get_feature_names()
get_features_names()
get_number_of_features()
get_numeric_columns()
set_class_name(class_name): Set the class name.
set_descriptor(descriptor)
set_target_label(descriptor): Set the target column into the dataset descriptor
update_descriptor([categorial_columns, ...]): Creates the dataset descriptor dictionary
- classmethod from_csv(filename: str, class_name: Optional[str] = None, dropna: bool = True)[source]
Read a comma-separated values (csv) file into Dataset object.
- Parameters:
filename (str)
class_name: optional
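The page does not describe what `dropna` does. As a rough illustration only, the sketch below assumes it means "discard rows that contain a missing value", which is the usual pandas meaning of the term. The helper `read_rows` and the sample text are hypothetical and use the standard `csv` module rather than the library itself.

```python
import csv
import io


def read_rows(text, dropna=True):
    """Hypothetical stand-in: parse CSV text into a list of row dicts."""
    rows = list(csv.DictReader(io.StringIO(text)))
    if dropna:
        # Assumption: a row is dropped when any of its fields is empty.
        rows = [
            row for row in rows
            if all(value not in ("", None) for value in row.values())
        ]
    return rows


# Made-up sample: the second data row is missing its "age" value.
sample = "age,colour,label\n20,red,yes\n,blue,no\n40,green,yes\n"

kept = read_rows(sample)                     # incomplete row removed
everything = read_rows(sample, dropna=False) # all rows retained
```

Under this assumption, leaving `dropna` at its default of `True` would silently shrink the dataset whenever the file has gaps, so it is worth comparing row counts against the raw file.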
- classmethod from_dict(data: dict, class_name: Optional[str] = None)[source]
From dicts of Series, arrays, or dicts.
- Parameters:
data (dict)
class_name: optional
- get_feature_name(index)[source]
Get the feature name by index.
- Parameters:
index
- Returns:
the name of the corresponding feature
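Since every descriptor entry stores the `'index'` of its column, a lookup by index can be sketched as a search over the descriptor. The function `feature_name_by_index` below is a hypothetical illustration of that idea, not the library's code, and the small descriptor is made-up data in the documented format.

```python
def feature_name_by_index(descriptor, index):
    """Hypothetical helper: find the feature whose 'index' matches."""
    for kind in ("numeric", "categorical", "ordinal"):
        for name, info in descriptor.get(kind, {}).items():
            if info["index"] == index:
                return name
    return None  # no feature is stored at that position


# Made-up descriptor containing only the fields needed for the lookup.
example_descriptor = {
    "numeric": {"age": {"index": 0}, "income": {"index": 2}},
    "categorical": {"colour": {"index": 1}},
    "ordinal": {},
}

first = feature_name_by_index(example_descriptor, 0)
second = feature_name_by_index(example_descriptor, 1)
missing = feature_name_by_index(example_descriptor, 7)
```

Returning `None` for an unknown index is a choice made for this sketch; how the actual method behaves for an out-of-range index is not documented here.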
- set_class_name(class_name: str)[source]
Set the class name. Pass only the column name, as a string.
- Parameters:
class_name (str)