Cost-Sensitive
This is the documentation page for the python package costsensitive. For more details, see the project’s GitHub page:
Documentation

class costsensitive.CostProportionateClassifier(base_classifier, n_samples=10, extra_rej_const=0.1, njobs=1)
Bases: object
Cost-Proportionate Rejection Sampling
Turns a binary classifier with no native sample weighting method into a binary classifier that supports sample weights.
Parameters: base_classifier (object) –
 Binary classifier used for predicting in each sample. Must have:
 A fit method of the form ‘base_classifier.fit(X, y)’.
 A predict method.
n_samples (int) – Number of samples taken. One classifier is fit per sample.
extra_rej_const (float) – Extra rejection constant used for sampling (see ‘fit’ method).
njobs (int) – Number of parallel jobs to run. If it’s a negative number, will take the maximum available number of CPU cores.
Variables:  n_samples (int) – Number of samples taken. One classifier is fit per sample.
 classifiers (list of objects) – Classifier that was fit to each sample.
 base_classifier (object) – Unfitted base classifier that was originally passed.
 extra_rej_const (float) – Extra rejection constant used for sampling (see ‘fit’ method).
References
 [1] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
 Machine learning techniques - reductions between prediction quality metrics.

decision_function(X, aggregation='raw')
Calculate how strongly the classifiers prefer the positive class
Note
If passing aggregation = ‘raw’, it will output the proportion of the classifiers that voted for the positive class. If passing aggregation = ‘weighted’, it will output the predicted probability of the positive class, averaged across classifiers.
Calculating it with aggregation = ‘weighted’ requires the base classifier to have a ‘predict_proba’ method.
Parameters:  X (array (n_samples, n_features)) – Observations for which to determine class likelihood.
 aggregation (str, either ‘raw’ or ‘weighted’) – How to compute the ‘goodness’ of the positive class (see Note)
Returns: pred – Score for the positive class (see Note)
Return type: array (n_samples,)
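The two aggregation modes described in the Note can be sketched in plain Python. This is an illustration of the arithmetic only, not the package's code: `positive_class_score`, `votes` and `probas` are hypothetical names, and the inputs are assumed to be the per-classifier outputs for a single observation.

```python
def positive_class_score(votes, probas=None, aggregation="raw"):
    """Score the positive class for one observation (illustrative sketch).

    votes:  0/1 predictions, one per fitted classifier.
    probas: predicted positive-class probabilities, one per classifier.
    """
    if aggregation == "raw":
        # Proportion of classifiers that voted for the positive class.
        return sum(votes) / len(votes)
    if aggregation == "weighted":
        if probas is None:
            raise ValueError("'weighted' needs predicted probabilities")
        # Average predicted probability across classifiers.
        return sum(probas) / len(probas)
    raise ValueError("aggregation must be 'raw' or 'weighted'")


votes = [1, 0, 1, 1]
probas = [0.9, 0.4, 0.6, 0.7]
raw = positive_class_score(votes)                           # 3 of 4 votes
weighted = positive_class_score(votes, probas, "weighted")  # mean probability
```

The ‘weighted’ branch is the reason the base classifier needs a ‘predict_proba’ method: hard 0/1 votes carry no margin information.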

fit(X, y, sample_weight=None)
Fit a binary classifier with sample weights to data.
Note
Examples at each sample are accepted with probability = weight/Z, where Z = max(weight) + extra_rej_const. Larger values of extra_rej_const ensure that no example gets selected in every single sample, but result in smaller sample sizes, as more examples are rejected.
Parameters:  X (array (n_samples, n_features)) – Data on which to fit the model.
 y (array (n_samples,) or (n_samples, 1)) – Class of each observation.
 sample_weight (array (n_samples,) or (n_samples, 1)) – Weights indicating how important each observation is in the loss function.
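The acceptance rule in the Note above (probability = weight/Z, with Z = max(weight) + extra_rej_const) can be sketched as follows. This is a minimal standard-library illustration rather than the package's implementation; `rejection_sample` and the optional `rng` argument are hypothetical.

```python
import random


def rejection_sample(weights, extra_rej_const=0.1, rng=None):
    """Return the indices accepted in one cost-proportionate sample."""
    if rng is None:
        rng = random.Random()
    # Z is slightly larger than the largest weight, so every acceptance
    # probability weight / Z stays strictly below 1.
    z = max(weights) + extra_rej_const
    return [i for i, w in enumerate(weights) if rng.random() < w / z]


weights = [0.5, 1.0, 4.0, 0.1]
rng = random.Random(0)
# Draw several independent samples; one classifier would be fit on each.
samples = [rejection_sample(weights, 0.1, rng) for _ in range(10)]
```

Heavily weighted observations appear in most samples and lightly weighted ones in few, which is how an unweighted base classifier ends up respecting the weights on average.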

predict(X, aggregation='raw')
Predict the class of an observation
Note
If passing aggregation = ‘raw’, it will output the class predicted by most classifiers, breaking ties by predicting the positive class. If passing aggregation = ‘weighted’, it will weight each vote from a classifier according to the probabilities predicted.
Predicting with aggregation = ‘weighted’ requires the base classifier to have a ‘predict_proba’ method.
Parameters:  X (array (n_samples, n_features)) – Observations for which to predict their class.
 aggregation (str, either ‘raw’ or ‘weighted’) – How to compute the ‘goodness’ of the positive class (see Note)
Returns: pred – Predicted class for each observation.
Return type: array (n_samples,)
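The tie-breaking rule for aggregation = ‘raw’ can be made concrete with a short sketch. `majority_vote` is a hypothetical helper written for this illustration; it assumes classes are encoded as 0 (negative) and 1 (positive).

```python
def majority_vote(votes):
    """votes: 0/1 predictions for one observation, one per fitted classifier."""
    # The positive class wins when it gets at least half of the votes,
    # so an exact tie is resolved in favor of the positive class.
    return 1 if 2 * sum(votes) >= len(votes) else 0


tie = majority_vote([1, 0, 1, 0])   # two votes each: positive class wins
minority = majority_vote([1, 0, 0])  # positive class is outvoted
```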

class costsensitive.FilterTree(base_classifier, njobs=1)
Bases: object
Filter-Tree for Cost-Sensitive Multi-Class classification
Parameters: base_classifier (object) –
 Base binary classification algorithm. Must have:
 A fit method of the form ‘base_classifier.fit(X, y, sample_weight = w)’.
 A predict method.
njobs (int) – Number of parallel jobs to run. If it’s a negative number, will take the maximum available number of CPU cores. Parallelization is only for predictions, not for training.
Variables:  nclasses (int) – Number of classes in the data on which it was fit.
 classifiers (list of objects) – Classifier that compares each two classes belonging to a node.
 tree (object) – Binary tree with attributes childs and parents. Positive numbers for children indicate non-terminal nodes, while negative numbers and zero indicate a class (terminal node). The root is node zero.
 base_classifier (object) – Unfitted base classifier that was originally passed.
References
 [1] Beygelzimer, A., Langford, J., & Ravikumar, P. (2007).
 Multiclass classification with filter trees.

fit(X, C)
Fit a filter tree classifier
Note
Shifting the order of the classes within the cost array will produce different results, as it will build a different binary tree comparing different classes at each node.
Parameters:  X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
 C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).
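To make the shape of the cost array C concrete, here is a small sketch that builds one from ordinary class labels, charging nothing for the true class and a fixed cost for every other class. The helper `costs_from_labels` and the `miss_cost` value are assumptions made for illustration; real cost arrays would normally carry problem-specific costs.

```python
def costs_from_labels(labels, n_classes, miss_cost=1.0):
    """Build an (n_samples, n_classes) cost array as nested lists."""
    return [
        [0.0 if k == y else miss_cost for k in range(n_classes)]
        for y in labels
    ]


labels = [2, 0, 1]
C = costs_from_labels(labels, n_classes=3)
# Row i holds the cost of predicting each class for observation i,
# so the least costly column recovers the original label.
best = [row.index(min(row)) for row in C]
```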

predict(X)
Predict the least costly class for a given observation
Note
The implementation here happens in a Python loop rather than in some NumPy array operations, thus it will be slower than the other algorithms here, even though in theory it implies fewer comparisons.
Parameters:  X (array (n_samples, n_features)) – Data for which to predict minimum cost label.
Returns: y_hat – Label with expected minimum cost for each observation.
Return type: array (n_samples,)
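The node-by-node loop mentioned in the Note can be sketched as a walk from the root to a leaf. The encoding below is an assumption made for this illustration (it follows the description of the tree attribute, but is not taken from the package's code): `childs[node]` holds a (left, right) pair, positive values are node indices, and values less than or equal to zero encode the class `-value`. Each entry of `node_classifiers` is a hypothetical stand-in for a fitted binary classifier that returns 1 when the right branch wins.

```python
def filter_tree_predict(x, childs, node_classifiers):
    """Walk the tree from the root until a class (terminal node) is reached."""
    node = 0  # the root is node zero
    while True:
        left, right = childs[node]
        # The binary classifier at this node decides which branch survives.
        nxt = right if node_classifiers[node](x) == 1 else left
        if nxt <= 0:
            return -nxt  # terminal node: values <= 0 encode class -value
        node = nxt


# Four classes: node 1 compares classes 0 and 1, node 2 compares classes
# 2 and 3, and the root compares the winners of nodes 1 and 2.
childs = [(1, 2), (0, -1), (-2, -3)]
node_classifiers = [
    lambda x: int(x >= 2),
    lambda x: int(x >= 1),
    lambda x: int(x >= 3),
]
predictions = [filter_tree_predict(x, childs, node_classifiers)
               for x in (0.5, 1.5, 2.5, 3.5)]
```

Each prediction here needs two comparisons, whereas comparing all pairs of four classes would need six.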

class costsensitive.RegressionOneVsRest(base_regressor, njobs=1)
Bases: object
Regression One-Vs-Rest
Fits one regressor trying to predict the cost of each class. Predictions are the class with the minimum predicted cost across regressors.
Parameters: base_regressor (object) –
 Regressor to be used for the subproblems. Must have:
 A fit method of the form ‘base_regressor.fit(X, y)’.
 A predict method.
njobs (int) – Number of parallel jobs to run. If it’s a negative number, will take the maximum available number of CPU cores.
Variables:  nclasses (int) – Number of classes in the data on which it was fit.
 regressors (list of objects) – Regressor that predicts the cost of each class.
 base_regressor (object) – Unfitted base regressor that was originally passed.
References
 [1] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
 Machine learning techniques - reductions between prediction quality metrics.

decision_function(X, apply_softmax=True)
Get cost estimates for each observation
Note
If called with apply_softmax = False, this will output the predicted COST rather than the ‘goodness’, meaning that more is worse.
If called with apply_softmax = True, it will output one minus the softmax on the costs, producing a distribution over the choices summing up to 1 where more is better.
Parameters:  X (array (n_samples, n_features)) – Data for which to predict the cost of each label.
 apply_softmax (bool) – Whether to apply a softmax transform to the costs (see Note).
Returns: pred – Either predicted cost or a distribution of ‘goodness’ over the choices, according to the apply_softmax argument.
Return type: array (n_samples, n_classes)

fit(X, C)
Fit one regressor per class
Parameters:  X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
 C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).

predict(X)
Predict the least costly class for a given observation
Parameters: X (array (n_samples, n_features)) – Data for which to predict minimum cost labels.
Returns: y_hat – Label with expected minimum cost for each observation.
Return type: array (n_samples,)
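The reduction described for this class (one regressor per class, prediction by minimum predicted cost) can be sketched end to end with a toy regressor. Everything below is illustrative: `Line1D` is a hypothetical single-feature least-squares regressor that only exists to satisfy the fit/predict interface, and the two helper functions are not part of the package.

```python
import copy


class Line1D:
    """Toy least-squares regressor on a single feature (illustration only)."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        n = len(xs)
        mx, my = sum(xs) / n, sum(y) / n
        sxx = sum((x - mx) ** 2 for x in xs)
        sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
        self.slope = sxy / sxx if sxx else 0.0
        self.intercept = my - self.slope * mx
        return self

    def predict(self, X):
        return [self.intercept + self.slope * row[0] for row in X]


def fit_one_regressor_per_class(base_regressor, X, C):
    # Column k of C is the regression target for the regressor of class k.
    n_classes = len(C[0])
    return [copy.deepcopy(base_regressor).fit(X, [row[k] for row in C])
            for k in range(n_classes)]


def predict_min_cost(regressors, X):
    per_class = [r.predict(X) for r in regressors]
    # For each observation, pick the class with the lowest predicted cost.
    return [min(range(len(per_class)), key=lambda k: per_class[k][i])
            for i in range(len(X))]


X = [[0.0], [1.0], [2.0], [3.0]]
C = [[0.0, 3.0], [1.0, 2.0], [2.0, 1.0], [3.0, 0.0]]
regressors = fit_one_regressor_per_class(Line1D(), X, C)
y_hat = predict_min_cost(regressors, X)
```

Copying the unfitted base regressor before each fit keeps the original object untouched, mirroring the base_regressor attribute that stores the unfitted regressor.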

class costsensitive.WeightedAllPairs(base_classifier, weigh_by_cost_diff=True, njobs=1)
Bases: object
Weighted All-Pairs for Cost-Sensitive Classification
Note
This implementation also offers the option of weighting each observation in a pairwise comparison according to the absolute difference in costs between the two labels. Even though such a method might not enjoy theoretical bounds on its regret or error, in practice it can produce better results than the weighting scheme proposed in [1] and [2].
Parameters: base_classifier (object) –
 Base binary classification algorithm. Must have:
 A fit method of the form ‘base_classifier.fit(X, y, sample_weight = w)’.
 A predict method.
weigh_by_cost_diff (bool) – Whether to weight each subproblem according to the absolute difference in costs between labels, or according to the formula described in [1] (See Note)
njobs (int) – Number of parallel jobs to run. If it’s a negative number, will take the maximum available number of CPU cores. Note that making predictions with multiple jobs will require a lot more memory. Can also be set after the object has already been initialized.
Variables:  nclasses (int) – Number of classes in the data on which it was fit.
 classifiers (list of objects) – Classifier that compares each two classes. Classes i and j out of n classes, with i<j, are compared by the classifier at index i*(n-(i+1)/2)+j-i-1.
 weigh_by_cost_diff (bool) – Whether each subproblem was weighted according to the absolute difference in costs between labels, or according to the formula described in [1]
 base_classifier (object) – Unfitted base classifier that was originally passed.
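The position of each pairwise classifier in the list can be checked with a short sketch. It reads the index formula as i*(n-(i+1)/2)+j-i-1, which is the position of the pair (i, j) when all pairs with i<j are listed in lexicographic order; `pair_index` is a hypothetical helper, not a function of the package.

```python
from itertools import combinations


def pair_index(i, j, n):
    """Index of the classifier comparing classes i and j (i < j) out of n."""
    if not 0 <= i < j < n:
        raise ValueError("need 0 <= i < j < n")
    # i*(n - (i+1)/2) + j - i - 1, rewritten so it only uses integers:
    # i*(2n - i - 1) is always even, so the floor division is exact.
    return i * (2 * n - i - 1) // 2 + j - i - 1


n = 5
indices = [pair_index(i, j, n) for i, j in combinations(range(n), 2)]
```

For n classes this yields each index from 0 to n(n-1)/2 - 1 exactly once, in the same order that `itertools.combinations` enumerates the pairs.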
References
 [1] Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B. (2005)
 Error limiting reductions between classification tasks.
 [2] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
 Machine learning techniques - reductions between prediction quality metrics.

decision_function(X, method='mostwins')
Calculate a ‘goodness’ distribution over labels
Note
Predictions can be calculated either by counting which class wins the most pairwise comparisons (as in [1] and [2]), or, for classifiers with a ‘predict_proba’ method, by also taking into account the margins of the prediction difference for one class over the other for each comparison.
If passing method = ‘mostwins’, this ‘decision_function’ will output the proportion of comparisons that each class won. If passing method = ‘goodness’, it sums the outputs from ‘predict_proba’ from each pairwise comparison and divides it by the number of comparisons.
Using method = ‘goodness’ requires the base classifier to have a ‘predict_proba’ method.
Parameters:  X (array (n_samples, n_features)) – Data for which to predict the cost of each label.
 method (str, either ‘mostwins’ or ‘goodness’) – How to decide the best label (see Note)
Returns: pred – A goodness score (more is better) for each label and observation. If passing method=’mostwins’, it counts the proportion of comparisons that each class won. If passing method=’goodness’, it sums the outputs from ‘predict_proba’ from each pairwise comparison and divides it by the number of comparisons.
Return type: array (n_samples, n_classes)
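The difference between the two methods can be sketched for a single observation. Several details here are assumptions made for the illustration rather than facts about the package: `pair_proba[(i, j)]` is taken to be the predicted probability that class i is preferable to class j, a win is awarded at probability 0.5 or above, and scores are divided by the number of comparisons each class takes part in (n_classes - 1).

```python
from itertools import combinations


def all_pairs_scores(n_classes, pair_proba, method="mostwins"):
    """Per-class goodness scores for one observation (illustrative sketch)."""
    scores = [0.0] * n_classes
    for i, j in combinations(range(n_classes), 2):
        p = pair_proba[(i, j)]
        if method == "mostwins":
            # Only the winner of the comparison gets credit.
            scores[i if p >= 0.5 else j] += 1.0
        elif method == "goodness":
            # Both classes get credit in proportion to the predicted margin.
            scores[i] += p
            scores[j] += 1.0 - p
        else:
            raise ValueError("method must be 'mostwins' or 'goodness'")
    n_comparisons = n_classes - 1
    return [s / n_comparisons for s in scores]


pair_proba = {(0, 1): 0.95, (0, 2): 0.45, (1, 2): 0.45}
wins = all_pairs_scores(3, pair_proba, "mostwins")
goodness = all_pairs_scores(3, pair_proba, "goodness")
best_by_wins = wins.index(max(wins))
best_by_goodness = goodness.index(max(goodness))
```

In this example class 2 wins both of its comparisons narrowly, while class 0 loses one narrowly and wins the other by a wide margin, so the two methods pick different labels.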
References
 [1] Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B. (2005)
 Error limiting reductions between classification tasks.
 [2] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
 Machine learning techniques - reductions between prediction quality metrics.

fit(X, C)
Fit one classifier comparing each pair of classes
Parameters:  X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
 C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).

predict(X, method='mostwins')
Predict the least costly class for a given observation
Note
Predictions can be calculated either by counting which class wins the most pairwise comparisons (as in [1] and [2]), or, for classifiers with a ‘predict_proba’ method, by also taking into account the margins of the prediction difference for one class over the other for each comparison.
Using method = ‘goodness’ requires the base classifier to have a ‘predict_proba’ method.
Parameters:  X (array (n_samples, n_features)) – Data for which to predict minimum cost label.
 method (str, either ‘mostwins’ or ‘goodness’) – How to decide the best label (see Note)
Returns: y_hat – Label with expected minimum cost for each observation.
Return type: array (n_samples,)
References
 [1] Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B. (2005)
 Error limiting reductions between classification tasks.
 [2] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
 Machine learning techniques - reductions between prediction quality metrics.

class costsensitive.WeightedOneVsRest(base_classifier, weight_simple_diff=False, njobs=1)
Bases: object
Weighted One-Vs-Rest Cost-Sensitive Classification
Note
This will convert the problem into one subproblem per class.
If passing weight_simple_diff=True, the observations for each subproblem will be weighted according to the difference between the cost of the label being predicted and the minimum cost of any other label.
If passing weight_simple_diff=False, they will be weighted according to the formula described in [1], originally meant for the AllPairs variant.
The prediction is the class whose One-Vs-Rest classifier has the largest decision function value. If the base classifier has neither a ‘decision_function’ nor a ‘predict_proba’ method, it will output the class that its classifier predicted as positive, breaking ties by choosing the smallest index.
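The weighting used with weight_simple_diff=True can be sketched for a single class. This is one reading of the Note, written as an assumption rather than as the package's code: the subproblem for class k labels an observation as positive when k is strictly cheaper than every other class, and weights it by the absolute difference between the cost of k and the minimum cost among the other classes. `one_vs_rest_subproblem` is a hypothetical helper.

```python
def one_vs_rest_subproblem(C, k):
    """Binary labels and weights for the subproblem of class k."""
    labels, weights = [], []
    for row in C:
        other_min = min(c for m, c in enumerate(row) if m != k)
        # Positive when class k is the cheapest choice for this observation.
        labels.append(1 if row[k] < other_min else 0)
        # The further apart the costs, the more the observation matters.
        weights.append(abs(row[k] - other_min))
    return labels, weights


C = [[0.0, 2.0, 5.0],
     [3.0, 1.0, 1.5]]
labels_0, weights_0 = one_vs_rest_subproblem(C, k=0)
labels_1, weights_1 = one_vs_rest_subproblem(C, k=1)
```

Each (labels, weights) pair would then be passed to the base classifier as fit(X, labels, sample_weight = weights), one subproblem per class.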
Parameters: base_classifier (object) –
 Base binary classification algorithm. Must have:
 A fit method of the form ‘base_classifier.fit(X, y, sample_weight = w)’.
 A predict method.
weight_simple_diff (bool) – Whether to weight each subproblem according to the absolute difference in costs between labels, or according to the formula described in [1] (See Note)
njobs (int) – Number of parallel jobs to run. If it’s a negative number, will take the maximum available number of CPU cores.
Variables:  nclasses (int) – Number of classes in the data on which it was fit.
 classifiers (list of objects) – Classifier that predicts each class.
 weight_simple_diff (bool) – Whether each subproblem was weighted according to the absolute difference in costs between labels, or according to the formula described in [1].
 base_classifier (object) – Unfitted base classifier that was originally passed.
References
 [1] Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B. (2005, August).
 Error limiting reductions between classification tasks.

decision_function(X)
Calculate a ‘goodness’ distribution over labels
Parameters: X (array (n_samples, n_features)) – Data for which to predict the cost of each label.
Returns: pred – A goodness score (more is better) for each label and observation. If passing apply_softmax=True, these are standardized to sum up to 1 (per row).
Return type: array (n_samples, n_classes)

decision_function_predict(c, preds, X)

fit(X, C)
Fit one weighted classifier per class
Parameters:  X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
 C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).

predict(X)
Predict the least costly class for a given observation
Parameters: X (array (n_samples, n_features)) – Data for which to predict minimum cost label.
Returns: y_hat – Label with expected minimum cost for each observation.
Return type: array (n_samples,)