Bases: object
Represents a tree node and applies the CART algorithm to build decision and regression trees.
The constructors are __init__ and fromWholeDataset.
Tree-building is initiated by calling splitUntil(condition), where condition(node, depth) is a user-supplied function. It takes a node (titus.producer.cart.TreeNode) and a depth (integer) and returns a bool: True to continue splitting, False to stop.
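As an illustration of the condition(node, depth) contract, here is a minimal sketch of a stopping rule. The names StubNode, keepSplitting, maxDepth and minSize are hypothetical and not part of Titus; StubNode only models the datasetSize attribute that split nodes are documented to carry, so that the example runs on its own.

```python
class StubNode(object):
    """Hypothetical stand-in for titus.producer.cart.TreeNode.

    Only the datasetSize attribute is modeled here.
    """
    def __init__(self, datasetSize):
        self.datasetSize = datasetSize


def keepSplitting(node, depth, maxDepth=5, minSize=10):
    """Return True to continue splitting, False to stop.

    maxDepth and minSize are illustrative thresholds (assumptions).
    """
    return depth < maxDepth and node.datasetSize >= minSize
```

With a real tree, a function like this would presumably be passed as the condition argument, as in tree.splitUntil(keepSplitting).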
Constructor for a tree from a dataset of regressors (that which we split) and a predictand (that which we try to purify in the leaves).
Parameters: |
|
---|
Returns True if it is possible to split the predictand; False otherwise.
Split a categorical predictor in such a way that maximizes entropic gain inside and outside of a subset of predictor values.
Parameters: |
|
---|---|
Return type: | (number, list of strings) |
Returns: | (best gain term, best combination of regressor categories) |
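The idea of searching for the subset of categories with the largest entropic gain can be sketched in plain Python. This is not the Titus implementation: the function names are hypothetical, the gain is assumed to be the conventional information gain (parent entropy minus the size-weighted entropies inside and outside the subset), and the search is a brute-force scan over all proper subsets.

```python
import math
from collections import Counter
from itertools import combinations


def entropy(labels):
    """Shannon entropy (in bits) of a list of categorical labels."""
    n = float(len(labels))
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log(c / n, 2) for c in Counter(labels).values())


def bestCategoricalSplit(regressor, predictand):
    """Return (best gain, best list of regressor categories).

    Assumption: gain = H(parent) - weighted H(inside) - weighted H(outside).
    """
    categories = sorted(set(regressor))
    n = float(len(predictand))
    parent = entropy(predictand)
    bestGain, bestSubset = None, None
    # Try every non-empty proper subset of the regressor's categories.
    for size in range(1, len(categories)):
        for subset in combinations(categories, size):
            chosen = set(subset)
            inside = [y for x, y in zip(regressor, predictand) if x in chosen]
            outside = [y for x, y in zip(regressor, predictand) if x not in chosen]
            gain = (parent
                    - (len(inside) / n) * entropy(inside)
                    - (len(outside) / n) * entropy(outside))
            if bestGain is None or gain > bestGain:
                bestGain, bestSubset = gain, list(subset)
    return bestGain, bestSubset
```

The number of subsets grows exponentially with the number of categories, so a scan like this is only practical for predictors with few distinct values.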
Split a categorical predictor in such a way that maximizes n-times-variance gain inside and outside of a subset of predictor values.
Parameters: |
|
---|---|
Return type: | (number, list of strings) |
Returns: | (best gain term, best combination of regressor categories) |
Constructor for a tree from a dataset that includes the predictand (that which we try to purify in the leaves) as one of its fields.
Parameters: |
|
---|---|
Return type: | titus.producer.cart.TreeNode |
Returns: | an unsplit tree |
Split a numerical predictor in such a way that maximizes entropic gain above and below the threshold of the split.
Split a numerical predictor in such a way that maximizes n-times-variance gain above and below the threshold of the split.
Parameters: | field (titus.producer.cart.Dataset.Field) – the field to consider when calculating the n-times variance gain term |
---|---|
Return type: | (number, number) |
Returns: | (best gain term, best cut value) |
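A threshold search of this kind can be sketched as follows. This is an illustrative sketch rather than the Titus code: the function names are hypothetical, n-times-variance is taken to be the sum of squared deviations from the mean, the gain is assumed to be the parent's n-times-variance minus that of the two sides, and candidate cuts are assumed to lie midway between adjacent distinct regressor values.

```python
def nTimesVariance(values):
    """n times the variance, i.e. the sum of squared deviations from the mean."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / float(n)
    return sum((v - mean) ** 2 for v in values)


def bestNumericalCut(regressor, predictand):
    """Return (best gain, best cut value) for a numerical regressor.

    Assumption: candidate cuts are midpoints between adjacent distinct values.
    """
    pairs = sorted(zip(regressor, predictand))
    parent = nTimesVariance([y for _, y in pairs])
    bestGain, bestCut = None, None
    for i in range(1, len(pairs)):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no threshold can separate equal regressor values
        cut = (lo + hi) / 2.0
        below = [y for _, y in pairs[:i]]
        above = [y for _, y in pairs[i:]]
        gain = parent - nTimesVariance(below) - nTimesVariance(above)
        if bestGain is None or gain > bestGain:
            bestGain, bestCut = gain, cut
    return bestGain, bestCut
```

For example, with regressor values [1, 2, 3, 4] and predictand values [10, 10, 20, 20], the cut at 2.5 separates the two predictand levels completely and removes all of the parent's n-times-variance.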
Create a PFA document to score with this tree.
Parameters: |
|
---|---|
Return type: | Pythonized JSON |
Returns: | complete PFA document for running tree classification or regression |
Create an Avro schema representing the score type.
Return type: | Pythonized JSON |
---|---|
Returns: | score type (part of the pass and fail unions of the PFA TreeNode) |
Create a PFA type schema representing this tree.
Parameters: |
|
---|---|
Return type: | Pythonized JSON |
Returns: | Avro schema for the tree node type |
Create a PFA data structure representing this tree.
Parameters: |
|
---|---|
Return type: | Pythonized JSON |
Returns: | PFA data structure for the tree, to be inserted into the cell or pool’s init field |
Create an Avro schema representing the comparison value type.
Parameters: | dataType (Pythonized JSON) – Avro record schema of the input data |
---|---|
Return type: | Pythonized JSON |
Returns: | value type (value field of the PFA TreeNode) |
Returns the best score at this TreeNode, which might or might not be a leaf.
Convenience function for building up a tree until each leaf has only one unique value. Calls splitUntil.
Convenience function for building up a tree until each leaf has only one unique value or the depth reaches maxDepth. Calls splitUntil.
Parameters: | maxDepth (positive integer) – maximum allowed depth of the tree |
---|
Compute an optimized split in one field, adding two new TreeNodes below this one.
If the predictand is numerical (numbers.Real), the split maximizes n-times-variance gain; if categorical (basestring), it maximizes entropic gain.
Performs a recursive tree-split, calling the user-supplied condition(node, depth) at each new node.
If the predictand is numerical (numbers.Real), the node has attributes: datasetSize, predictandUnique, nTimesVariance, and gain.
If the predictand is categorical (basestring), the node has attributes: datasetSize, predictandDistribution, entropy, and gain.
Splits are performed in-place, changing this TreeNode.
Parameters: |
|
---|