Python: categorizing data by the values in the data

python pandas

I have a pandas.DataFrame of the form:

low_bound   high_bound   name
0           10           'a'
10          20           'b'
20          30           'c'
30          40           'd'
40          50           'e'

I also have a very long pandas Series of the form:

value
5.7
30.4
21
35.1

I want to assign to each value of the Series its corresponding name according to the low_bound/high_bound/name DataFrame. Here is my expected result:

value         name
5.7           'a'
30.4          'd'
21            'c'
35.1          'd'

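For instance, 5.7 gets the name 'a' because 5.7 lies strictly between 0 and 10.

What is the most efficient way to do this? I know I can solve it by iterating over the Series, but maybe there is a faster vectorized solution that is escaping me.

Finally, note that my bounds can be custom and irregular. They are regular here only for the sake of the example.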

pandas has a function called cut that does what you want:

import pandas as pd

data = [{"low": 0, "high": 10, "name": "a"},
        {"low": 10, "high": 20, "name": "b"},
        {"low": 20, "high": 30, "name": "c"},
        {"low": 30, "high": 40, "name": "d"},
        {"low": 40, "high": 50, "name": "e"},]

myDF = pd.DataFrame(data)

#data to be binned
mySeries = pd.Series([5.7, 30.4, 21, 35.1])

#create bins from original data
bins = list(myDF["high"])
bins.insert(0,0)

# bins are right-closed by default: (0, 10], (10, 20], ...
print(pd.cut(mySeries, bins, labels=list(myDF["name"])))

This gives you the following, which you can then put back into a DataFrame or store however you like:

0    a
1    d
2    c
3    d
dtype: category
Categories (5, object): [a < b < c < d < e]
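As a minimal sketch of putting the result back into a DataFrame, the labels can be stored next to the original values. The names bounds, values, edges and result are illustrative and not part of the answer above, and the sketch assumes the bins are contiguous, so that each high equals the next low.

```python
import pandas as pd

# Bounds table, same shape as in the question (illustrative names)
bounds = pd.DataFrame({
    "low": [0, 10, 20, 30, 40],
    "high": [10, 20, 30, 40, 50],
    "name": ["a", "b", "c", "d", "e"],
})
values = pd.Series([5.7, 30.4, 21, 35.1], name="value")

# Bin edges: first lower bound followed by every upper bound
edges = [bounds["low"].iloc[0]] + bounds["high"].tolist()

# Keep the value and its label side by side
result = pd.DataFrame({
    "value": values,
    "name": pd.cut(values, bins=edges, labels=bounds["name"].tolist()),
})
print(result)
```

The name column is categorical; calling astype(str) on it gives plain strings if that is more convenient downstream.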


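Depending on how irregular your bins are (and what exactly you mean by "custom/irregular"), you may have to resort to looping over the Series. I can't think of a built-in that handles this for you, especially since it depends on the degree and type of irregularity in the bins.

As for looping, this approach works whenever you have a lower and an upper bound, regardless of "regularity":

for el in mySeries:
    print(myDF["name"][(myDF["low"] < el) & (myDF["high"] > el)])

I realize you may not want to loop over a huge Series, but at least we are not manually indexing into the DataFrame, which would probably make things even slower.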
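If the bins have gaps but do not overlap, one vectorized alternative to the loop is a pandas IntervalIndex. This is only a sketch under that assumption: the bounds table below is made up (note the gap between 25 and 30), and the names intervals, pos and names are illustrative. Intervals are taken as right-closed, i.e. (low, high].

```python
import numpy as np
import pandas as pd

# Hypothetical irregular bounds with a gap between 25 and 30
bounds = pd.DataFrame({
    "low": [0.0, 10.0, 30.0],
    "high": [10.0, 25.0, 40.0],
    "name": ["a", "b", "d"],
})
values = pd.Series([5.7, 30.4, 21.0, 27.0])

# One interval per row of the bounds table, closed on the right
intervals = pd.IntervalIndex.from_arrays(bounds["low"], bounds["high"], closed="right")

# Position of the containing interval for each value, -1 if none contains it
pos = intervals.get_indexer(values.to_numpy())

# Map positions to names; values that fall in a gap get None
labels = bounds["name"].to_numpy()
names = pd.Series(np.where(pos >= 0, labels[pos], None), index=values.index)
print(names)
```

Here 27.0 falls in the gap, so it gets no name instead of raising an error.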
You can do it like this:
buckets = [0, 10, 20, 30, 40]
buckets_name = ['a', 'b', 'c', 'd']

pd.cut(your_series, buckets , labels = buckets_name)

This answer is much simpler.
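One detail worth checking with this approach is which edge of each bucket is inclusive. By default pd.cut uses right-closed bins, (low, high], and right=False switches to [low, high). The sample values below are made up to show the difference at a boundary and outside the buckets.

```python
import pandas as pd

buckets = [0, 10, 20, 30, 40]
buckets_name = ["a", "b", "c", "d"]

# Made-up sample: 10 sits exactly on an edge, 45 is outside every bucket
your_series = pd.Series([5.7, 10, 30.4, 45])

# Default: (low, high], so 10 belongs to the first bucket
right_closed = pd.cut(your_series, buckets, labels=buckets_name)

# right=False: [low, high), so 10 belongs to the second bucket
left_closed = pd.cut(your_series, buckets, labels=buckets_name, right=False)

print(pd.DataFrame({"value": your_series,
                    "right_closed": right_closed,
                    "left_closed": left_closed}))
```

Values outside all buckets, such as 45 here, come back as NaN in both cases.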