`qcut` is a pandas function (`pandas.qcut`) that bins numerical data into buckets of roughly equal size based on sample quantiles. To offer the same feature in polars, we can add a similar method to the polars `DataFrame`. Here is one way to do it:
First, add a `qcut` method to the `DataFrame` class in polars, with the following signature and docstring:

```python
def qcut(self, column: str, q: int, labels=None, duplicates='raise'):
    """
    Bin values based on quantiles.

    Parameters
    ----------
    column : str
        Name of the column to be binned.
    q : int
        Number of quantile bins to create.
    labels : list, optional
        Labels for the resulting bins. Length must match the number of bins
        (after any duplicate edges have been dropped). Defaults to the bin
        numbers 1, 2, ..., n.
    duplicates : {'raise', 'drop'}, default 'raise'
        If 'raise', raise a ValueError when the quantile edges are not unique.
        If 'drop', drop the duplicate edges, which produces fewer bins.

    Returns
    -------
    polars.DataFrame
        A new DataFrame with an extra column named f"{column}_qcut" that
        holds the bin label of each row.

    Examples
    --------
    >>> df = pl.DataFrame({'A': [0.1, 0.2, 0.3, 0.4, 0.5]})
    >>> df.qcut(column='A', q=3)['A_qcut'].to_list()
    [1, 1, 2, 3, 3]
    """
    ...
```
Then fill in the body with numpy: `numpy.quantile` computes the bin edges and `numpy.searchsorted` assigns each value to its bin. The function is then attached to `pl.DataFrame` so it can be called as a method:

```python
import numpy as np
import polars as pl


def qcut(self, column: str, q: int, labels=None, duplicates='raise'):
    """Bin values based on quantiles (parameters as documented above)."""
    if duplicates not in ('raise', 'drop'):
        raise ValueError("duplicates must be 'raise' or 'drop'")
    values = self[column].to_numpy()

    # Bin edges at the 0, 1/q, ..., 1 quantiles (numpy's default linear interpolation).
    edges = np.quantile(values, np.linspace(0, 1, q + 1))

    # Identical edges occur when the column has many tied values.
    unique_edges = np.unique(edges)
    if len(unique_edges) < len(edges):
        if duplicates == 'raise':
            raise ValueError(f"Bin edges must be unique: {edges.tolist()}")
        edges = unique_edges  # duplicates == 'drop': fewer bins than q

    n_bins = len(edges) - 1
    if labels is None:
        labels = list(range(1, n_bins + 1))  # default labels: 1, 2, ..., n_bins
    elif len(labels) != n_bins:
        raise ValueError(f"Expected {n_bins} labels, got {len(labels)}")

    # searchsorted(..., side='left') on the inner edges returns i such that
    # inner[i-1] < x <= inner[i]: right-closed bins numbered 0..n_bins-1,
    # with the minimum in bin 0 (like pandas' include_lowest=True).
    codes = np.searchsorted(edges[1:-1], values, side='left')

    result = pl.Series(f'{column}_qcut', [labels[i] for i in codes])
    return self.with_columns(result)


# Attach the function to polars' DataFrame class so it can be called as df.qcut(...).
pl.DataFrame.qcut = qcut
```
```python
import polars as pl

df = pl.DataFrame({
    'A': [0.1, 0.2, 0.3, 0.4, 0.5]
})
print(df.qcut(column='A', q=3))
```
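This should add an `A_qcut` column holding the bin number of each row (shown below as plain columns; polars prints its own table layout):

```
A    A_qcut
0.1  1
0.2  1
0.3  2
0.4  3
0.5  3
```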
Note: This is just one possible implementation of the `qcut` feature in polars. The actual implementation may differ based on the specific needs and requirements of the project.
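The `duplicates` parameter (modelled on the parameter of the same name in `pandas.qcut`) matters when a column has many tied values, because several quantile edges can then coincide. The numpy-only sketch below, on made-up data, shows what `'drop'` does in that case, using `numpy.quantile` for the edges and `numpy.searchsorted` for the bins:

```python
import numpy as np

# Made-up data with many ties, so some quantile edges coincide.
values = np.array([1, 1, 1, 1, 2, 3, 4, 5])
q = 4

edges = np.quantile(values, np.linspace(0, 1, q + 1))
print(edges.tolist())  # [1.0, 1.0, 1.5, 3.25, 5.0]: the first two edges are equal

# duplicates='raise' would reject these edges; 'drop' keeps only the unique
# ones, which produces fewer bins than requested.
unique_edges = np.unique(edges)
codes = np.searchsorted(unique_edges[1:-1], values, side='left')

print(len(unique_edges) - 1)  # 3 bins instead of 4
print(codes.tolist())         # [0, 0, 0, 0, 1, 1, 2, 2]
```

Because dropping edges changes the number of bins, any custom `labels` list has to match the reduced count.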
Asked: 2021-06-04 11:00:00 +0000
Last updated: Mar 10 '22