titus.producer.cart.Dataset

class titus.producer.cart.Dataset(fields, names)[source]

Bases: object

Canonical format for providing a dataset to the tree-builder.

Constructors are __init__ and fromIterable.

class Field(t)

Bases: object

Represents a field of the dataset; usually created by a Dataset constructor.

Dataset.Field objects may be in one of two states: Numpy and Python. The Dataset constructors produce Dataset.Fields in their Numpy representation, which is required by the tree-builder. In the Numpy representation, categorical string data are represented as integers from 0 to N-1, where N is the number of unique input strings, with each distinct integer representing a distinct input string. Strings and integers can be converted through the intToStr and strToInt dictionaries, or by converting the whole array into a Pythonic form with the toPython method.
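The integer encoding described above can be sketched in plain Python. This is an illustration, not the titus implementation: the helper name `encode_categories` is hypothetical, and assigning codes in order of first appearance is an assumption (the documentation only says that each distinct string gets a distinct integer from 0 to N-1).

```python
def encode_categories(values):
    """Encode strings as integers 0..N-1 (hypothetical helper, not part of titus).

    Codes are assigned in order of first appearance; this ordering is an assumption.
    """
    str_to_int = {}   # plays the role of the documented strToInt dictionary
    int_to_str = {}   # plays the role of the documented intToStr dictionary
    codes = []
    for v in values:
        if v not in str_to_int:
            code = len(str_to_int)
            str_to_int[v] = code
            int_to_str[code] = v
        codes.append(str_to_int[v])
    return codes, str_to_int, int_to_str


raw = ["red", "green", "red", "blue"]
codes, str_to_int, int_to_str = encode_categories(raw)

# Converting back through the integer-to-string dictionary recovers the input.
decoded = [int_to_str[c] for c in codes]
```

Here N is 3, so the codes are drawn from 0, 1 and 2, and the two dictionaries are inverses of each other.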

__init__(t)
add(v)
select(selection)

Creates a new Dataset.Field from this one by applying a boolean-valued Numpy array of the same length.

The new Dataset.Field is independent of the old one (this is a purely functional method).

Assumes that the Dataset.Field is currently in a Numpy representation.

Parameters: selection (1-d Numpy array of bool) – data points to select
Return type: 1-d Numpy array
Returns: subset of the original self.data
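The effect of a boolean selection, and what "purely functional" means here, can be mimicked with plain Python lists. This is a sketch under assumptions: the standalone function `select_values` is hypothetical, and `itertools.compress` stands in for the Numpy boolean indexing that the real method would apply to the field's data.

```python
from itertools import compress


def select_values(data, selection):
    """Return a new list with the items of `data` whose mask entry is True.

    Hypothetical stand-in for boolean-mask selection on a Numpy array.
    """
    if len(data) != len(selection):
        raise ValueError("selection must have the same length as data")
    return list(compress(data, selection))


original = [3.5, 1.2, 9.9, 4.0]
mask = [True, False, True, False]

subset = select_values(original, mask)

# The result is independent of the input: changing it leaves the original intact.
subset[0] = -1.0
```

The length check mirrors the documented requirement that the selection array has the same length as the field.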
toNumpy()

Changes this field into a Numpy representation in-place (destructively replaces the old representation).

toPython()

Changes this field into a Python representation in-place (destructively replaces the old representation).

Dataset.__init__(fields, names)
classmethod Dataset.fromIterable(iterable, limit=None, names=None)

Constructor for Dataset that takes a Python iterable (rows) of iterables (columns).

Each row must have the same number of fields with the same types (numbers.Real or basestring).

Parameters:
  • iterable (Python iterable) – input dataset
  • limit (positive integer or None) – maximum number of input rows
  • names (list of strings or None) – names of the fields; if not provided, names like var0, var1, etc. will be generated.
Return type: titus.producer.cart.Dataset
Returns: a dataset
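The row-to-column conversion that fromIterable is described as performing can be sketched as follows. This is not the library's code: the function `columns_from_rows` and its error messages are hypothetical, it returns plain lists rather than Dataset.Field objects, and it checks for `str` where the documentation names `basestring`.

```python
import numbers
from itertools import islice


def columns_from_rows(iterable, limit=None, names=None):
    """Turn rows of values into named columns (hypothetical sketch, not titus code)."""
    # Read at most `limit` rows; islice reads everything when limit is None.
    rows = [list(row) for row in islice(iterable, limit)]
    if not rows:
        raise ValueError("the input contains no rows")

    width = len(rows[0])
    for row in rows:
        if len(row) != width:
            raise ValueError("every row must have the same number of fields")

    # Transpose rows into columns.
    columns = [list(col) for col in zip(*rows)]

    # Each column must be entirely numeric or entirely string.
    for col in columns:
        all_real = all(isinstance(v, numbers.Real) for v in col)
        all_str = all(isinstance(v, str) for v in col)
        if not (all_real or all_str):
            raise TypeError("each field must hold only numbers or only strings")

    # Generate default names var0, var1, ... when none are given.
    if names is None:
        names = ["var%d" % i for i in range(width)]
    elif len(names) != width:
        raise ValueError("names must have one entry per field")
    return list(names), columns


rows = [[1.0, "a"], [2.0, "b"], [3.0, "a"]]
names, columns = columns_from_rows(rows, limit=2)
```

With the library itself, the corresponding call would follow the documented signature, for example `Dataset.fromIterable(rows, limit=2, names=["x", "label"])`, where the field names are illustrative.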