Bases: object
Represents a tree node and applies the CART algorithm to build decision and regression trees.
The constructors are __init__ and fromWholeDataset.
To build a tree, call splitUntil(condition). The user-supplied function condition(node, depth) takes a node (titus.producer.cart.TreeNode) and its depth (integer) and returns a bool: True to continue splitting, False to stop.
Constructor for a tree from a dataset of regressors (that which we split) and a predictand (that which we try to purify in the leaves).
Returns True if it is possible to split the predictand; False otherwise.
Split a categorical predictor so as to maximize the entropic gain between the rows inside and outside a subset of predictor values.
Return type: | (number, list of strings) |
---|---|
Returns: | (best gain term, best combination of regressor categories) |
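The subset search described above can be illustrated with a minimal sketch. This is not the Titus implementation: the names `entropy` and `best_categorical_split` are hypothetical, the gain is assumed to be the parent entropy minus the size-weighted entropies of the two groups, and the brute-force search over subsets is only practical for a handful of categories.

```python
from collections import Counter
from itertools import combinations
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a sequence of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())

def best_categorical_split(predictor, predictand):
    """Try subsets of predictor categories and keep the one with the largest gain.

    Returns (best gain, list of categories in the chosen subset),
    or (None, None) when the predictor has fewer than two categories.
    """
    categories = sorted(set(predictor))
    n = len(predictand)
    parent = entropy(predictand)
    best_gain, best_subset = None, None
    # A subset and its complement describe the same split,
    # so subsets of up to half the categories are enough.
    for size in range(1, len(categories) // 2 + 1):
        for subset in combinations(categories, size):
            chosen = set(subset)
            inside = [y for x, y in zip(predictor, predictand) if x in chosen]
            outside = [y for x, y in zip(predictor, predictand) if x not in chosen]
            gain = (parent
                    - (len(inside) / n) * entropy(inside)
                    - (len(outside) / n) * entropy(outside))
            if best_gain is None or gain > best_gain:
                best_gain, best_subset = gain, list(subset)
    return best_gain, best_subset

colors = ["red", "red", "blue", "green", "green", "blue"]
answers = ["yes", "yes", "no", "no", "no", "no"]
gain, subset = best_categorical_split(colors, answers)
```

With these toy rows, isolating "red" separates the two classes perfectly, so the gain equals the parent entropy (about 0.918 bits).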
Split a categorical predictor so as to maximize the n-times-variance gain between the rows inside and outside a subset of predictor values.
Return type: | (number, list of strings) |
---|---|
Returns: | (best gain term, best combination of regressor categories) |
Constructor for a tree from a dataset that includes the predictand (that which we try to purify in the leaves) as one of its fields.
Return type: | titus.producer.cart.TreeNode |
---|---|
Returns: | an unsplit tree |
Split a numerical predictor so as to maximize the entropic gain between the rows above and below the split threshold.
Split a numerical predictor so as to maximize the n-times-variance gain between the rows above and below the split threshold.
Parameters: | field (titus.producer.cart.Dataset.Field) – the field to consider when calculating the n-times variance gain term |
---|---|
Return type: | (number, number) |
Returns: | (best gain term, best cut value) |
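As a rough illustration of a threshold search by n-times-variance gain, the sketch below scans the midpoints between consecutive distinct predictor values. It is not the Titus code: `n_times_variance` and `best_numerical_split` are hypothetical names, and both the midpoint cut and the gain formula (parent n-times-variance minus that of the two sides) are assumptions of this sketch.

```python
def n_times_variance(values):
    """Sum of squared deviations from the mean, i.e. n times the variance."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values)

def best_numerical_split(predictor, predictand):
    """Scan candidate cuts and keep the one with the largest gain.

    Returns (best gain, best cut value), or (None, None) when the
    predictor has fewer than two distinct values.
    """
    parent = n_times_variance(predictand)
    distinct = sorted(set(predictor))
    best_gain, best_cut = None, None
    for low, high in zip(distinct, distinct[1:]):
        cut = (low + high) / 2.0  # assumption: cut halfway between neighbors
        below = [y for x, y in zip(predictor, predictand) if x < cut]
        above = [y for x, y in zip(predictor, predictand) if x >= cut]
        gain = parent - n_times_variance(below) - n_times_variance(above)
        if best_gain is None or gain > best_gain:
            best_gain, best_cut = gain, cut
    return best_gain, best_cut

xs = [1, 2, 3, 10, 11, 12]
ys = [1.0, 1.0, 1.0, 5.0, 5.0, 5.0]
gain, cut = best_numerical_split(xs, ys)
```

Here the cut at 6.5 leaves a constant predictand on each side, so the gain equals the parent's n-times-variance of 24.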
Create a PFA document to score with this tree.
Return type: | Pythonized JSON |
---|---|
Returns: | complete PFA document for running tree classification or regression |
Create an Avro schema representing the score type.
Return type: | Pythonized JSON |
---|---|
Returns: | score type (part of the pass and fail unions of the PFA TreeNode) |
Create a PFA type schema representing this tree.
Return type: | Pythonized JSON |
---|---|
Returns: | Avro schema for the tree node type |
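To make "Pythonized JSON" concrete, the sketch below builds a tree-node record schema as plain Python dicts and lists. The layout is an assumption, not the schema this method actually returns: the `value`, `pass`, and `fail` fields are taken from the descriptions on this page, while `field`, `operator`, the enum name `TreeFields`, and the helper `tree_node_schema` are hypothetical.

```python
import json

def tree_node_schema(field_names, value_type, score_type, name="TreeNode"):
    """Build an Avro record schema for a binary tree node (illustrative layout)."""
    return {
        "type": "record",
        "name": name,
        "fields": [
            # Which input field this node tests (assumed to be an enum of field names).
            {"name": "field",
             "type": {"type": "enum", "name": "TreeFields", "symbols": list(field_names)}},
            # Comparison operator, e.g. "<" (assumed to be a string).
            {"name": "operator", "type": "string"},
            # Comparison value.
            {"name": "value", "type": value_type},
            # Each branch is either another node (by name) or a score.
            {"name": "pass", "type": [name, score_type]},
            {"name": "fail", "type": [name, score_type]},
        ],
    }

schema = tree_node_schema(["x", "y"], "double", "string")
as_json = json.dumps(schema)  # plain dicts and lists serialize directly to JSON
```

Referring to the record by its own name inside the `pass` and `fail` unions is how Avro expresses a recursive type.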
Create a PFA data structure representing this tree.
Return type: | Pythonized JSON |
---|---|
Returns: | PFA data structure for the tree, to be inserted into the cell or pool’s init field |
Create an Avro schema representing the comparison value type.
Parameters: | dataType (Pythonized JSON) – Avro record schema of the input data |
---|---|
Return type: | Pythonized JSON |
Returns: | value type (value field of the PFA TreeNode) |
Returns the best score at this TreeNode, which might or might not be a leaf.
Convenience function for building up a tree until each leaf has only one unique value. Calls splitUntil.
Return the name of the input field at this split or None if this is a leaf node.
Convenience function for building up trees until each leaf has only one unique value or the depth reaches maxDepth. Calls splitUntil.
Parameters: | maxDepth (positive integer) – maximum allowed depth of the tree |
---|
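Because both convenience functions call splitUntil, each can be thought of as a particular condition(node, depth). The sketch below expresses them that way against a stand-in node. `StubNode`, its `predictand_values` attribute, and the helper names are hypothetical, and the exact depth at which the real method stops is an assumption.

```python
class StubNode:
    """Hypothetical stand-in for a tree node; not the Titus class."""
    def __init__(self, predictand_values):
        self.predictand_values = predictand_values

def is_pure(node):
    """True when the node's predictand has at most one unique value."""
    return len(set(node.predictand_values)) <= 1

def until_pure(node, depth):
    """Condition: keep splitting while the node is still impure."""
    return not is_pure(node)

def until_pure_or_max_depth(max_depth):
    """Build a condition that also stops once depth reaches max_depth."""
    def condition(node, depth):
        return depth < max_depth and not is_pure(node)
    return condition

mixed = StubNode(["a", "b", "a"])
pure = StubNode(["a", "a"])
limited = until_pure_or_max_depth(2)
```

Either function could then be passed wherever a condition(node, depth) is expected.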
Compute an optimized split in one field, adding two new TreeNodes below this one.
If the predictand is numerical (numbers.Real), the split minimizes n-times-variance; if categorical (basestring), it minimizes entropy.
Performs a recursive tree-split, calling the user-supplied condition(node, depth) at each new node.
If the predictand is numerical (numbers.Real), the node has attributes: datasetSize, predictandUnique, nTimesVariance, and gain.
If the predictand is categorical (basestring), the node has attributes: datasetSize, predictandDistribution, entropy, and gain.
Splits are performed in-place, changing this TreeNode.
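Parameters: | condition (function of (titus.producer.cart.TreeNode, integer) returning bool) – called as condition(node, depth); returns True to continue splitting, False to stop |
---|---|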
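The recursive, in-place behavior described above can be sketched with a toy node. This is only an illustration under stated assumptions: `ToyNode`, its `low` and `high` attributes, and the midpoint split are hypothetical (a real CART split optimizes a gain), and starting the depth count at 1 and consulting the condition at the root are choices made for this sketch.

```python
class ToyNode:
    """Hypothetical node holding a list of numbers; not the Titus class."""
    def __init__(self, values):
        self.values = values
        self.low = None    # child with values below the cut
        self.high = None   # child with values at or above the cut

    def can_split(self):
        """A node with a single distinct value cannot be split further."""
        return len(set(self.values)) > 1

    def split_once(self):
        """Toy split halfway between the minimum and maximum value."""
        cut = (min(self.values) + max(self.values)) / 2.0
        self.low = ToyNode([v for v in self.values if v < cut])
        self.high = ToyNode([v for v in self.values if v >= cut])

    def split_until(self, condition, depth=1):
        """Split in place, recursing while condition(node, depth) returns True."""
        if self.can_split() and condition(self, depth):
            self.split_once()
            self.low.split_until(condition, depth + 1)
            self.high.split_until(condition, depth + 1)

root = ToyNode([1, 2, 3, 4, 5, 6, 7, 8])
root.split_until(lambda node, depth: depth < 3)
```

After this call the root has been modified in place: it has two children, each of which has two children of its own, and those grandchildren are left unsplit because the condition returns False at depth 3.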
Return a generator that walks over all leaves in the tree, yielding a 2-tuple of node and depth.
Return type: | generator of (titus.producer.cart.TreeNode, int) |
---|---|
Returns: | generator of (node, depth) |
Return a generator that walks over all nodes in the tree, yielding a 2-tuple of node and depth.
Return type: | generator of (titus.producer.cart.TreeNode, int) |
---|---|
Returns: | generator of (node, depth) |
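Both generators described above follow the same pattern: a depth-first traversal that yields (node, depth) pairs, with the leaf version filtering out internal nodes. A minimal sketch, using a hypothetical `Node` class with a `children` list and assuming the root is reported at depth 1 and nodes are visited parent-first:

```python
class Node:
    """Hypothetical tree node with a label and a list of children."""
    def __init__(self, label, children=()):
        self.label = label
        self.children = list(children)

def walk(node, depth=1):
    """Yield (node, depth) for this node and every node below it."""
    yield node, depth
    for child in node.children:
        for pair in walk(child, depth + 1):
            yield pair

def walk_leaves(node, depth=1):
    """Yield (node, depth) only for nodes that have no children."""
    for n, d in walk(node, depth):
        if not n.children:
            yield n, d

tree = Node("root", [Node("a", [Node("a1"), Node("a2")]), Node("b")])
all_nodes = [(n.label, d) for n, d in walk(tree)]
leaves = [(n.label, d) for n, d in walk_leaves(tree)]
```

Because these are generators, nodes are produced lazily, so a caller can stop early without traversing the whole tree.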