CostSensitive

This is the documentation page for the Python package costsensitive. For more details, see the project’s GitHub page:

https://www.github.com/david-cortes/costsensitive/

Installation

Package is available on PyPI, can be installed with

pip install costsensitive

Documentation

class costsensitive.CostProportionateClassifier(base_classifier, n_samples=10, extra_rej_const=0.1, njobs=-1)

Bases: object

Cost-Proportionate Rejection Sampling

Turns a binary classifier with no native sample weighting method into a binary classifier that supports sample weights.

Parameters:
  • base_classifier (object) –

    Binary classifier used for predicting in each sample. Must have:
    • A fit method of the form ‘base_classifier.fit(X, y)’.
    • A predict method.
  • n_samples (int) – Number of samples taken. One classifier is fit per sample.

  • extra_rej_const (float) – Extra rejection constant used for sampling (see ‘fit’ method).

  • njobs (int) – Number of parallel jobs to run. If negative, the maximum number of available CPU cores will be used.

Variables:
  • n_samples (int) – Number of samples taken. One classifier is fit per sample.
  • classifiers (list of objects) – Classifier that was fit to each sample.
  • base_classifier (object) – Unfitted base classifier that was originally passed.
  • extra_rej_const (float) – Extra rejection constant used for sampling (see ‘fit’ method).

References

[1] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
Machine learning techniques-reductions between prediction quality metrics.

decision_function(X, aggregation='raw')

Calculate how strongly the classifiers prefer the positive class

Note

If passing aggregation = ‘raw’, it will output the proportion of the classifiers that voted for the positive class. If passing aggregation = ‘weighted’, it will output the predicted probability of the positive class, averaged across classifiers.

Calculating it with aggregation = ‘weighted’ requires the base classifier to have a ‘predict_proba’ method.

Parameters:
  • X (array (n_samples, n_features)) – Observations for which to determine class likelihood.
  • aggregation (str, either ‘raw’ or ‘weighted’) – How to compute the ‘goodness’ of the positive class (see Note)
Returns:

pred – Score for the positive class (see Note)

Return type:

array (n_samples,)
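
Both aggregation modes amount to averaging over the fitted classifiers; they differ only in what is averaged (0/1 votes or positive-class probabilities). The plain-Python sketch below illustrates this; the function name and the nested-list layout (one list per classifier, one entry per observation) are hypothetical, not part of the package.

```python
def average_over_classifiers(per_classifier):
    """Average, for each observation, the outputs of all fitted classifiers.

    per_classifier[k][i] is the output of classifier k for observation i:
    - a 0/1 vote for the positive class gives the 'raw' score;
    - a predicted probability of the positive class gives the 'weighted' score.
    """
    n_classifiers = len(per_classifier)
    return [sum(col) / n_classifiers for col in zip(*per_classifier)]


# Three classifiers (rows) scoring three observations (columns).
votes = [[1, 0, 1],
         [1, 1, 0],
         [0, 0, 1]]
probas = [[0.9, 0.2, 0.6],
          [0.7, 0.6, 0.4],
          [0.2, 0.1, 0.8]]

raw = average_over_classifiers(votes)        # proportion of positive votes
weighted = average_over_classifiers(probas)  # mean positive-class probability
```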

fit(X, y, sample_weight=None)

Fit a binary classifier with sample weights to data.

Note

Examples in each sample are accepted with probability weight/Z, where Z = max(weight) + extra_rej_const. A positive extra_rej_const keeps every acceptance probability below 1, so that no example is guaranteed to be selected in every sample; larger values, however, result in smaller samples, as more examples are rejected.
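
A minimal stdlib sketch of the acceptance rule described above, assuming non-negative weights. The function names are hypothetical and this is not the package’s implementation; it only shows how each of the n_samples draws is formed.

```python
import random


def rejection_sample(weights, extra_rej_const=0.1, rng=random):
    """Indices of observations accepted into one cost-proportionate sample.

    Observation i is accepted with probability weights[i] / Z,
    where Z = max(weights) + extra_rej_const (weights assumed non-negative).
    """
    z = max(weights) + extra_rej_const
    return [i for i, w in enumerate(weights) if rng.random() < w / z]


def draw_samples(weights, n_samples=10, extra_rej_const=0.1, seed=None):
    """One independent sample per base classifier that would be fit."""
    rng = random.Random(seed)
    return [rejection_sample(weights, extra_rej_const, rng) for _ in range(n_samples)]


weights = [0.0, 1.0, 2.0]  # observation 0 has zero weight and is never accepted
samples = draw_samples(weights, n_samples=200, seed=0)
```

Each list of indices would then select the rows of X and y used to fit one copy of the base classifier. Because extra_rej_const > 0, even the highest-weight observation (accepted with probability 2/2.1 here) is occasionally left out.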

Parameters:
  • X (array (n_samples, n_features)) – Data on which to fit the model.
  • y (array (n_samples,) or (n_samples, 1)) – Class of each observation.
  • sample_weight (array (n_samples,) or (n_samples, 1)) – Weights indicating how important each observation is in the loss function.

predict(X, aggregation='raw')

Predict the class of an observation

Note

If passing aggregation = ‘raw’, it will output the class predicted by most classifiers, breaking ties by predicting the positive class. If passing aggregation = ‘weighted’, it will weight each classifier’s vote according to its predicted probabilities.

Predicting with aggregation = ‘weighted’ requires the base classifier to have a ‘predict_proba’ method.
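
The ‘raw’ voting rule, with ties broken towards the positive class, can be sketched in plain Python as follows. The function name and input layout are hypothetical; the ‘weighted’ variant is not sketched, since the exact weighting used by the package is not described here.

```python
def majority_vote(votes):
    """'raw' prediction: positive class when at least half of the classifiers vote for it.

    votes[k][i] is classifier k's 0/1 prediction for observation i; an exact tie
    (possible with an even number of classifiers) goes to the positive class.
    """
    n_classifiers = len(votes)
    return [int(2 * sum(col) >= n_classifiers) for col in zip(*votes)]


# Four classifiers (rows), three observations (columns).
votes = [[1, 0, 1],
         [0, 0, 1],
         [1, 1, 0],
         [0, 0, 0]]
preds = majority_vote(votes)  # 2-2 ties in columns 0 and 2 resolve to 1
```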

Parameters:
  • X (array (n_samples, n_features)) – Observations for which to predict their class.
  • aggregation (str, either ‘raw’ or ‘weighted’) – How to compute the ‘goodness’ of the positive class (see Note)
Returns:

pred – Predicted class for each observation.

Return type:

array (n_samples,)

class costsensitive.FilterTree(base_classifier, njobs=-1)

Bases: object

Filter-Tree for Cost-Sensitive Multi-Class classification

Parameters:
  • base_classifier (object) –

    Base binary classification algorithm. Must have:
    • A fit method of the form ‘base_classifier.fit(X, y, sample_weight = w)’.
    • A predict method.
  • njobs (int) – Number of parallel jobs to run. If negative, the maximum number of available CPU cores will be used. Parallelization is only used for predictions, not for training.

Variables:
  • nclasses (int) – Number of classes on the data in which it was fit.
  • classifiers (list of objects) – Classifiers, one per non-terminal node, each comparing the two classes that reach that node.
  • tree (object) – Binary tree with attributes childs and parents. Positive numbers for children indicate non-terminal nodes, while negative numbers and zero indicate a class (terminal node). Root is the node zero.
  • base_classifier (object) – Unfitted base classifier that was originally passed.
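
To make the child encoding concrete, here is a hypothetical traversal in plain Python. It assumes that a terminal child stores its class as minus the class index (so class 0 is stored as 0), which is consistent with, but not confirmed by, the description above; the real tree object may store this differently.

```python
def filter_tree_predict_one(childs, go_right):
    """Walk a filter tree from the root (node 0) down to a leaf.

    childs[node] is a (left, right) pair: a positive value is the index of
    another non-terminal node, while a value <= 0 is a leaf for class -value.
    go_right(node) stands in for that node's binary classifier and returns
    True when the right-hand side wins the comparison.
    """
    node = 0
    while True:
        child = childs[node][1 if go_right(node) else 0]
        if child <= 0:
            return -child
        node = child


# Four classes: the root compares the winner of {0, 1} (node 1) with that of {2, 3} (node 2).
childs = [(1, 2), (0, -1), (-2, -3)]
decisions = {0: True, 1: False, 2: False}
label = filter_tree_predict_one(childs, decisions.get)  # root -> node 2 -> class 2
```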

References

[1] Beygelzimer, A., Langford, J., & Ravikumar, P. (2007).
Multiclass classification with filter trees.

fit(X, C)

Fit a filter tree classifier

Note

Changing the order of the classes (i.e. of the columns of the cost array) will produce different results, as a different binary tree will be built, comparing different classes at each node.
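
In the filter-tree reduction of [1], each node is trained on a weighted binary problem between the classes that its two subtrees select for an observation, weighted by their absolute cost difference. The sketch below builds that problem for a single node; it is a simplified illustration with hypothetical names, not the package’s training code, and it shows why class order matters: the order determines which classes meet at each node.

```python
def node_training_set(costs, left_winners, right_winners):
    """Weighted binary problem for one filter-tree node.

    costs[i][k] is the cost of class k for observation i, and left_winners[i] /
    right_winners[i] are the classes that the node's left and right subtrees pick
    for observation i. Label 1 means the right-hand class is cheaper; the weight
    is the absolute cost difference, so ties contribute nothing.
    """
    labels, weights = [], []
    for row, a, b in zip(costs, left_winners, right_winners):
        labels.append(int(row[b] < row[a]))
        weights.append(abs(row[a] - row[b]))
    return labels, weights


costs = [[0.0, 1.0, 5.0, 2.0],
         [3.0, 2.0, 1.0, 4.0]]
labels, weights = node_training_set(costs, left_winners=[0, 1], right_winners=[2, 2])
```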

Parameters:
  • X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
  • C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).
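
The cost array C generalizes ordinary class labels. As an illustration (the helper name is hypothetical), plain multi-class labels correspond to a cost of 0 for the true class and a fixed cost for every other class; example-dependent costs simply replace those entries.

```python
def labels_to_costs(y, n_classes, error_cost=1.0):
    """Cost array (n_samples, n_classes): 0 for the true class, error_cost elsewhere."""
    return [[0.0 if k == label else error_cost for k in range(n_classes)] for label in y]


C = labels_to_costs([2, 0, 1], n_classes=3)
```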

predict(X)

Predict the least costly class for a given observation

Note

Prediction is implemented as a Python loop rather than as vectorized NumPy operations, so it will be slower than the other algorithms in this package, even though in theory it requires fewer comparisons.

Parameters:
  • X (array (n_samples, n_features)) – Data for which to predict minimum cost label.
Returns:

y_hat – Label with expected minimum cost for each observation.

Return type:

array (n_samples,)

class costsensitive.RegressionOneVsRest(base_regressor, njobs=-1)

Bases: object

Regression One-Vs-Rest

Fits one regressor trying to predict the cost of each class. Predictions are the class with the minimum predicted cost across regressors.
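
The prediction rule, taking the class with the lowest predicted cost, can be sketched in plain Python as below. The function name is hypothetical and the nested list stands in for the per-class regressor outputs.

```python
def predict_min_cost(pred_costs):
    """Class with the lowest predicted cost for each observation.

    pred_costs[i][k] is regressor k's cost estimate for observation i;
    ties go to the lowest class index.
    """
    return [min(range(len(row)), key=row.__getitem__) for row in pred_costs]


pred_costs = [[3.0, 1.0, 2.0],
              [0.5, 0.7, 0.1],
              [1.0, 1.0, 4.0]]
y_hat = predict_min_cost(pred_costs)
```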

Parameters:
  • base_regressor (object) –

    Regressor to be used for the sub-problems. Must have:
    • A fit method of the form ‘base_regressor.fit(X, y)’.
    • A predict method.
  • njobs (int) – Number of parallel jobs to run. If negative, the maximum number of available CPU cores will be used.

Variables:
  • nclasses (int) – Number of classes on the data in which it was fit.
  • regressors (list of objects) – Regressor that predicts the cost of each class.
  • base_regressor (object) – Unfitted base regressor that was originally passed.

References

[1] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
Machine learning techniques-reductions between prediction quality metrics.

decision_function(X, apply_softmax=True)

Get cost estimates for each observation

Note

If called with apply_softmax = False, this will output the predicted COST rather than the ‘goodness’ - meaning, more is worse.

If called with apply_softmax = True, it will output one minus the softmax of the costs, so that more is better. Note that such scores sum to n_classes - 1 per row, which equals 1 only when there are two classes.
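
A plain-Python sketch of the transform as described, applied to one row of predicted costs. The function name is hypothetical and the package’s exact normalization may differ; taken literally, one minus a softmax gives a row total of n_classes - 1.

```python
import math


def one_minus_softmax(costs):
    """One minus the softmax of a row of predicted costs (lower cost -> higher score)."""
    m = max(costs)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(c - m) for c in costs]
    total = sum(exps)
    return [1.0 - e / total for e in exps]


scores = one_minus_softmax([1.0, 2.0, 3.0])  # cheapest class gets the highest score
```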

Parameters:
  • X (array (n_samples, n_features)) – Data for which to predict the cost of each label.
  • apply_softmax (bool) – Whether to apply a softmax transform to the costs (see Note).
Returns:

pred – Either predicted cost or a distribution of ‘goodness’ over the choices, according to the apply_softmax argument.

Return type:

array (n_samples, n_classes)

fit(X, C)

Fit one regressor per class

Parameters:
  • X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
  • C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).

predict(X)

Predict the least costly class for a given observation

Parameters:
  • X (array (n_samples, n_features)) – Data for which to predict minimum cost labels.
Returns:

y_hat – Label with expected minimum cost for each observation.

Return type:

array (n_samples,)

class costsensitive.WeightedAllPairs(base_classifier, weigh_by_cost_diff=True, njobs=-1)

Bases: object

Weighted All-Pairs for Cost-Sensitive Classification

Note

This implementation also offers the option of weighting each observation in a pairwise comparison according to the absolute difference in costs between the two labels. Even though this method might not enjoy theoretical bounds on its regret or error, in practice it can produce better results than the weighting scheme proposed in [1] and [2].
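
The cost-difference weighting for every pair of classes can be sketched in plain Python as follows. The label convention (1 when the second class of the pair is cheaper) and the function name are assumptions for illustration; the alternative weighting formula from [1] is not reproduced here.

```python
from itertools import combinations


def all_pairs_problems(costs):
    """Weighted binary problems for each pair (i, j), i < j, using cost differences.

    costs[r][k] is the cost of class k for observation r. For pair (i, j), label 1
    means class j is cheaper than class i, and the weight is |cost_i - cost_j|.
    Pairs are produced in the order (0, 1), (0, 2), ..., (1, 2), ...
    """
    n_classes = len(costs[0])
    problems = {}
    for i, j in combinations(range(n_classes), 2):
        labels = [int(row[j] < row[i]) for row in costs]
        weights = [abs(row[i] - row[j]) for row in costs]
        problems[(i, j)] = (labels, weights)
    return problems


costs = [[0.0, 2.0, 1.0],
         [4.0, 1.0, 1.0]]
problems = all_pairs_problems(costs)
```

A zero weight (equal costs) means the pair carries no information for that observation, so such rows contribute nothing to that sub-problem.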

Parameters:
  • base_classifier (object) –

    Base binary classification algorithm. Must have:
    • A fit method of the form ‘base_classifier.fit(X, y, sample_weight = w)’.
    • A predict method.
  • weigh_by_cost_diff (bool) – Whether to weight each sub-problem according to the absolute difference in costs between labels, or according to the formula described in [1] (See Note)

  • njobs (int) – Number of parallel jobs to run. If negative, the maximum number of available CPU cores will be used. Note that making predictions with multiple jobs requires considerably more memory. This attribute can also be set after the object has been initialized.

Variables:
  • nclasses (int) – Number of classes on the data in which it was fit.
  • classifiers (list of objects) – Classifier that compares each two classes. Classes i and j out of n classes, with i<j, are compared by the classifier at index i*(n-(i+1)/2)+j-i-1.
  • weigh_by_cost_diff (bool) – Whether each sub-problem was weighted according to the absolute difference in costs between labels, or according to the formula described in [1]
  • base_classifier (object) – Unfitted base classifier that was originally passed.
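
The index formula can be checked against the lexicographic pair order (0, 1), (0, 2), ..., (1, 2), ... with a few lines of Python. The function name is hypothetical; it is the integer form of i*(n-(i+1)/2)+j-i-1.

```python
from itertools import combinations


def pair_index(i, j, n):
    """Position of the classifier comparing classes i < j among n classes.

    Integer form of i*(n-(i+1)/2)+j-i-1: since i*(i+1) is always even,
    i*n - i*(i+1)//2 is exact.
    """
    if not 0 <= i < j < n:
        raise ValueError("expected 0 <= i < j < n")
    return i * n - i * (i + 1) // 2 + j - i - 1


n = 5
indices = [pair_index(i, j, n) for i, j in combinations(range(n), 2)]
```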

References

[1] Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B. (2005).
Error limiting reductions between classification tasks.
[2] Beygelzimer, A., Langford, J., & Zadrozny, B. (2008).
Machine learning techniques-reductions between prediction quality metrics.

decision_function(X, method='most-wins')

Calculate a ‘goodness’ distribution over labels

Note

Predictions can be calculated either by counting which class wins the most pairwise comparisons (as in [1] and [2]), or, for classifiers with a ‘predict_proba’ method, by also taking into account the margin by which one class is preferred over the other in each comparison.

If passing method = ‘most-wins’, this ‘decision_function’ will output the proportion of comparisons that each class won. If passing method = ‘goodness’, it sums the outputs from ‘predict_proba’ from each pairwise comparison and divides it by the number of comparisons.

Using method = ‘goodness’ requires the base classifier to have a ‘predict_proba’ method.
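
Both aggregations are sketched below for a single observation, in plain Python. The names and input layout are hypothetical, and dividing by n_classes - 1 (the number of comparisons each class takes part in) is an assumption about what “the number of comparisons” refers to.

```python
def most_wins_scores(pair_winners, n_classes):
    """Proportion of its pairwise comparisons that each class won.

    pair_winners maps each pair (i, j), i < j, to the class that won it.
    """
    wins = [0] * n_classes
    for winner in pair_winners.values():
        wins[winner] += 1
    return [w / (n_classes - 1) for w in wins]


def goodness_scores(pair_probas, n_classes):
    """Average predicted probability of winning, over each class's comparisons.

    pair_probas maps each pair (i, j), i < j, to the predicted probability
    that class i is preferred over class j.
    """
    totals = [0.0] * n_classes
    for (i, j), p in pair_probas.items():
        totals[i] += p
        totals[j] += 1.0 - p
    return [t / (n_classes - 1) for t in totals]


winners = {(0, 1): 0, (0, 2): 2, (1, 2): 2}
probas = {(0, 1): 0.8, (0, 2): 0.3, (1, 2): 0.4}
mw = most_wins_scores(winners, 3)
gd = goodness_scores(probas, 3)
```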

Parameters:
  • X (array (n_samples, n_features)) – Data for which to predict the cost of each label.
  • method (str, either ‘most-wins’ or ‘goodness’) – How to decide the best label (see Note)
Returns:

pred – A goodness score (more is better) for each label and observation. If passing method=‘most-wins’, it counts the proportion of comparisons that each class won. If passing method=‘goodness’, it sums the outputs from ‘predict_proba’ from each pairwise comparison and divides it by the number of comparisons.

Return type:

array (n_samples, n_classes)

fit(X, C)

Fit one classifier comparing each pair of classes

Parameters:
  • X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
  • C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).

predict(X, method='most-wins')

Predict the least costly class for a given observation

Note

Predictions can be calculated either by counting which class wins the most pairwise comparisons (as in [1] and [2]), or, for classifiers with a ‘predict_proba’ method, by also taking into account the margin by which one class is preferred over the other in each comparison.

Using method = ‘goodness’ requires the base classifier to have a ‘predict_proba’ method.

Parameters:
  • X (array (n_samples, n_features)) – Data for which to predict minimum cost label.
  • method (str, either ‘most-wins’ or ‘goodness’) – How to decide the best label (see Note)
Returns:

y_hat – Label with expected minimum cost for each observation.

Return type:

array (n_samples,)

class costsensitive.WeightedOneVsRest(base_classifier, weight_simple_diff=False, njobs=-1)

Bases: object

Weighted One-Vs-Rest Cost-Sensitive Classification

Note

This will convert the problem into one sub-problem per class.

If passing weight_simple_diff=True, the observations for each subproblem will be weighted according to the difference between the cost of the label being predicted and the minimum cost of any other label.

If passing weight_simple_diff=False, they will be weighted according to the formula described in [1], originally meant for the All-Pairs variant.

The prediction for each observation is the class whose One-Vs-Rest classifier yields the highest decision value (from ‘decision_function’ or ‘predict_proba’). If the base classifier has neither method, the output is a class whose classifier predicted it as positive, breaking ties by choosing the smallest index.
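
A plain-Python sketch of the weight_simple_diff=True weighting for one class, plus the final arg-max step. The label convention (1 when class c is strictly cheaper than every other class), the function names, and the use of plain lists as per-class scores are assumptions for illustration; the formula from [1] used when weight_simple_diff=False is not reproduced.

```python
def one_vs_rest_problem(costs, c):
    """Binary problem for class c with simple cost-difference weights.

    Label 1 means class c is strictly cheaper than every other class; the weight is
    the absolute difference between the cost of c and the minimum cost of the others.
    """
    labels, weights = [], []
    for row in costs:
        best_other = min(v for k, v in enumerate(row) if k != c)
        labels.append(int(row[c] < best_other))
        weights.append(abs(row[c] - best_other))
    return labels, weights


def predict_from_scores(scores):
    """Per observation, the class with the highest score; ties go to the smallest index."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]


costs = [[0.0, 2.0, 3.0],
         [4.0, 1.0, 5.0]]
labels0, weights0 = one_vs_rest_problem(costs, 0)
```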

Parameters:
  • base_classifier (object) –

    Base binary classification algorithm. Must have:
    • A fit method of the form ‘base_classifier.fit(X, y, sample_weight = w)’.
    • A predict method.
  • weight_simple_diff (bool) – Whether to weight each sub-problem according to the absolute difference in costs between labels, or according to the formula described in [1] (See Note)

  • njobs (int) – Number of parallel jobs to run. If negative, the maximum number of available CPU cores will be used.

Variables:
  • nclasses (int) – Number of classes on the data in which it was fit.
  • classifiers (list of objects) – Classifier that predicts each class.
  • weight_simple_diff (bool) – Whether each sub-problem was weighted according to the absolute difference in costs between labels, or according to the formula described in [1].
  • base_classifier (object) – Unfitted base classifier that was originally passed.

References

[1] Beygelzimer, A., Dani, V., Hayes, T., Langford, J., & Zadrozny, B. (2005, August).
Error limiting reductions between classification tasks.

decision_function(X)

Calculate a ‘goodness’ distribution over labels

Parameters:
  • X (array (n_samples, n_features)) – Data for which to predict the cost of each label.
Returns:

pred – A goodness score (more is better) for each label and observation.

Return type:

array (n_samples, n_classes)

decision_function_predict(c, preds, X)

fit(X, C)

Fit one weighted classifier per class

Parameters:
  • X (array (n_samples, n_features)) – The data on which to fit a cost-sensitive classifier.
  • C (array (n_samples, n_classes)) – The cost of predicting each label for each observation (more means worse).

predict(X)

Predict the least costly class for a given observation

Parameters:
  • X (array (n_samples, n_features)) – Data for which to predict minimum cost label.
Returns:

y_hat – Label with expected minimum cost for each observation.

Return type:

array (n_samples,)
