sklearn.preprocessing.MultiLabelBinarizer

class sklearn.preprocessing.MultiLabelBinarizer(classes=None, sparse_output=False)[source]

Transform between an iterable of iterables and a multilabel format.

Although a list of sets or tuples is a very intuitive format for multilabel data, it is unwieldy to process. This transformer converts between this intuitive format and the supported multilabel format: a (samples x classes) binary matrix indicating the presence of a class label.

Parameters
classes : array-like of shape [n_classes] (optional)

Indicates an ordering for the class labels. All entries should be unique (cannot contain duplicate classes).

sparse_output : boolean (default: False)

Set to True if the output binary array should be in CSR sparse format.

Attributes
classes_ : array of labels

A copy of the classes parameter where provided, or otherwise, the sorted set of classes found when fitting.
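
As a minimal sketch of how the classes parameter fixes the column order, the snippet below passes an explicit, unsorted ordering and checks that classes_ and the indicator columns follow it. The label names are only illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

# Fix the column order explicitly instead of relying on the sorted order
# that is inferred when classes is left as None.
mlb = MultiLabelBinarizer(classes=["thriller", "sci-fi", "comedy"])
y_indicator = mlb.fit_transform([{"sci-fi", "thriller"}, {"comedy"}])

# classes_ mirrors the ordering given in the classes parameter.
print(list(mlb.classes_))

# Column j of the indicator matrix corresponds to classes_[j].
print(y_indicator)
```

Passing classes explicitly can be useful when several datasets must share the same column layout.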

See also

sklearn.preprocessing.OneHotEncoder

Encode categorical features using a one-hot (also known as one-of-K) scheme.

Examples

>>> from sklearn.preprocessing import MultiLabelBinarizer
>>> mlb = MultiLabelBinarizer()
>>> mlb.fit_transform([(1, 2), (3,)])
array([[1, 1, 0],
       [0, 0, 1]])
>>> mlb.classes_
array([1, 2, 3])
>>> mlb.fit_transform([{'sci-fi', 'thriller'}, {'comedy'}])
array([[0, 1, 1],
       [1, 0, 0]])
>>> list(mlb.classes_)
['comedy', 'sci-fi', 'thriller']

A common mistake is to pass in a flat list of strings. Each string is then treated as an iterable of characters, which leads to the following issue:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit(['sci-fi', 'thriller', 'comedy'])
MultiLabelBinarizer()
>>> mlb.classes_
array(['-', 'c', 'd', 'e', 'f', 'h', 'i', 'l', 'm', 'o', 'r', 's', 't',
       'y'], dtype=object)

To correct this, the list of labels should be passed in as:

>>> mlb = MultiLabelBinarizer()
>>> mlb.fit([['sci-fi', 'thriller', 'comedy']])
MultiLabelBinarizer()
>>> mlb.classes_
array(['comedy', 'sci-fi', 'thriller'], dtype=object)

Methods

fit(self, y)

Fit the label sets binarizer, storing classes_

fit_transform(self, y)

Fit the label sets binarizer and transform the given label sets

get_params(self[, deep])

Get parameters for this estimator.

inverse_transform(self, yt)

Transform the given indicator matrix into label sets

set_params(self, **params)

Set the parameters of this estimator.

transform(self, y)

Transform the given label sets

__init__(self, classes=None, sparse_output=False)[source]

Initialize self. See help(type(self)) for accurate signature.

fit(self, y)[source]

Fit the label sets binarizer, storing classes_

Parameters
y : iterable of iterables

A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns
self : returns this MultiLabelBinarizer instance

fit_transform(self, y)[source]

Fit the label sets binarizer and transform the given label sets

Parameters
y : iterable of iterables

A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns
y_indicator : array or CSR matrix, shape (n_samples, n_classes)

A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise.
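
The following sketch illustrates the return value when sparse_output=True, and checks the stated property that y_indicator[i, j] is 1 exactly when classes_[j] is in y[i]. The variable names (label_sets, mlb_sparse, dense) are illustrative, and scipy.sparse.issparse is used only to inspect the result.

```python
from scipy import sparse
from sklearn.preprocessing import MultiLabelBinarizer

label_sets = [(1, 2), (3,)]

# Request a sparse indicator matrix instead of a dense array.
mlb_sparse = MultiLabelBinarizer(sparse_output=True)
y_sparse = mlb_sparse.fit_transform(label_sets)

print(sparse.issparse(y_sparse))  # the result is a SciPy sparse matrix
print(y_sparse.format)            # storage format of that matrix

# Convert to a dense array to inspect individual entries.
dense = y_sparse.toarray()

# Entry (i, j) is 1 iff classes_[j] appears in the i-th label set.
for i, labels in enumerate(label_sets):
    for j, cls in enumerate(mlb_sparse.classes_):
        assert dense[i, j] == (1 if cls in labels else 0)
```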

get_params(self, deep=True)[source]

Get parameters for this estimator.

Parameters
deep : boolean, optional

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params : mapping of string to any

Parameter names mapped to their values.
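
As a small illustration, get_params on this estimator returns its two constructor arguments, and the mapping can be used to build an equivalent unfitted instance. The variable names below are illustrative.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer(sparse_output=True)

# The mapping contains the constructor arguments and their current values.
params = mlb.get_params()
print(params)

# The same mapping can be unpacked to construct an equivalent estimator.
mlb_copy = MultiLabelBinarizer(**params)
```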

inverse_transform(self, yt)[source]

Transform the given indicator matrix into label sets

Parameters
yt : array or sparse matrix of shape (n_samples, n_classes)

A matrix containing only 1s and 0s.

Returns
y : list of tuples

The set of labels for each sample such that y[i] consists of classes_[j] for each yt[i, j] == 1.
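
A minimal round-trip sketch: the indicator matrix produced by fit_transform is mapped back to label tuples, and a hand-built indicator row is decoded the same way. The sample labels are illustrative.

```python
import numpy as np
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
yt = mlb.fit_transform([("sci-fi", "thriller"), ("comedy",)])

# Recover one tuple of labels per row; labels follow the order of classes_.
labels = mlb.inverse_transform(yt)
print(labels)

# Any 0/1 matrix with n_classes columns can be decoded, not only
# matrices that came from transform.
manual = mlb.inverse_transform(np.array([[1, 0, 1]]))
print(manual)
```

Because each tuple is built from classes_, the label order within a tuple follows classes_ rather than the order in the original input.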

set_params(self, **params)[source]

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as pipelines). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Returns
self

transform(self, y)[source]

Transform the given label sets

Parameters
y : iterable of iterables

A set of labels (any orderable and hashable object) for each sample. If the classes parameter is set, y will not be iterated.

Returns
y_indicator : array or CSR matrix, shape (n_samples, n_classes)

A matrix such that y_indicator[i, j] = 1 iff classes_[j] is in y[i], and 0 otherwise.
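
The sketch below separates fitting from transforming: the classes are learned once with fit, and transform then encodes new label sets against the same columns. The sample data is illustrative and uses only labels that were seen during fitting.

```python
from sklearn.preprocessing import MultiLabelBinarizer

mlb = MultiLabelBinarizer()
mlb.fit([("sci-fi", "thriller"), ("comedy",)])

# Encode new samples with the columns learned during fit.
# An empty label set yields an all-zero row.
new_indicator = mlb.transform([("comedy", "thriller"), ()])
print(new_indicator)
```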