Python: combining values into membership groups

Tags: python, pandas

I am trying to associate the numbers in two columns into membership groups. Here is what I have so far:

import pandas as pd
df = pd.DataFrame({'A':[0, 1, 3, 4, 6, 7, 8, 8, 8, 9, 9, 9, 9, 9, 11, 12, 13, 14, 15, 15, 15, 16, 16, 16, 16, 17, 17, 17, 17, 18, 18, 18, 18, 18, 19, 19, 19, 19, 20, 20, 21, 22, 24, 25, 26, 27, 28, 29, 29],
               'B':[1, 0, 4, 3, 7, 6, 112, 9, 114, 134, 135, 112, 8, 114, 14, 13, 12, 11, 16, 17, 18, 17, 15, 18, 19, 16, 18, 15, 19, 17, 16, 15, 19, 20, 20, 18, 17, 16, 19, 18, 22, 21, 25, 24, 27, 26, 29, 28, 30]})   

df = df.groupby('A')['B'].apply(lambda x: list(set(x))).reset_index()
^ Credit to jezrael

df['A']=df['A'].apply(lambda x : [x])
df_new=pd.DataFrame((df['A'] + df['B']),columns=["Combined"])
df_new["Combined"]=df_new["Combined"].sort_values().apply(lambda x: sorted(x))
This combines each number in column A with its grouped values from column B and sorts the result:

                       Combined
0                       [0, 1]
1                       [0, 1]
2                       [3, 4]
3                       [3, 4]
4                       [6, 7]
5                       [6, 7]
6             [8, 9, 112, 114]
7   [8, 9, 112, 114, 134, 135]
8                     [11, 14]
9                     [12, 13]
10                    [12, 13]
11                    [11, 14]
12            [15, 16, 17, 18]
13        [15, 16, 17, 18, 19]
14        [15, 16, 17, 18, 19]
15    [15, 16, 17, 18, 19, 20]
16        [16, 17, 18, 19, 20]
17                [18, 19, 20]
18                    [21, 22]
19                    [21, 22]
20                    [24, 25]
21                    [24, 25]
22                    [26, 27]
23                    [26, 27]
24                    [28, 29]
25                [28, 29, 30]
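How can I remove the duplicate lists in df_new? Perhaps the lists could be converted to string values.

Most importantly, I want to take each value from the original columns and associate it with the most inclusive of the combined lists it belongs to. For example, the number 8 in column A of df should be associated with row 7 of the Combined column in df_new, which holds the most complete list containing 8: [8, 9, 112, 114, 134, 135].

Thanks for your help.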

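I suggest converting the DataFrame to a numpy matrix, using the np.unique method to get a matrix of unique lists, and then converting back to a DataFrame, as follows: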

import numpy as np

# DataFrame.as_matrix() was removed in pandas 1.0; to_numpy() is its replacement
df_new["Combined"] = pd.DataFrame(np.unique(df_new.to_numpy()))

#                              0
# 0                       [0, 1]
# 1                       [3, 4]
# 2                       [6, 7]
# 3             [8, 9, 112, 114]
# 4   [8, 9, 112, 114, 134, 135]
# 5                     [11, 14]
# 6                     [12, 13]
# 7             [15, 16, 17, 18]
# 8         [15, 16, 17, 18, 19]
# 9     [15, 16, 17, 18, 19, 20]
# 10        [16, 17, 18, 19, 20]
# 11                [18, 19, 20]
# 12                    [21, 22]
# 13                    [24, 25]
# 14                    [26, 27]
# 15                    [28, 29]
# 16                [28, 29, 30]

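You can convert to tuple, use drop_duplicates, and then convert back to list.

This is necessary because the hash table used by pandas requires its elements to be hashable, which in practice means immutable. Tuples are immutable, while lists are not.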
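To make the hashing point concrete, here is a minimal sketch on a small made-up Series (not the question's data). The helper `is_hashable` is a hypothetical name introduced only for this illustration. It checks hashability with Python's built-in `hash`, then deduplicates through tuples.

```python
import pandas as pd

def is_hashable(obj):
    """Return True if Python can hash obj (hypothetical helper for this demo)."""
    try:
        hash(obj)
        return True
    except TypeError:
        return False

# Small made-up sample with one repeated list
s = pd.Series([[0, 1], [0, 1], [3, 4]], name="Combined")

# Lists cannot be hashed, tuples can
print(is_hashable([0, 1]))   # False
print(is_hashable((0, 1)))   # True

# Convert to tuples, drop the repeats, convert back to lists
deduped = s.map(tuple).drop_duplicates().map(list)
print(deduped.tolist())      # [[0, 1], [3, 4]]
```

The original index labels of the surviving rows are kept, which is why the output further down skips some row numbers.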

res = df_new['Combined'].map(tuple).drop_duplicates().map(list)

# 0                         [0, 1]
# 2                         [3, 4]
# 4                         [6, 7]
# 6               [8, 9, 112, 114]
# 7     [8, 9, 112, 114, 134, 135]
# 8                       [11, 14]
# 9                       [12, 13]
# 12              [15, 16, 17, 18]
# 13          [15, 16, 17, 18, 19]
# 15      [15, 16, 17, 18, 19, 20]
# 16          [16, 17, 18, 19, 20]
# 17                  [18, 19, 20]
# 18                      [21, 22]
# 20                      [24, 25]
# 22                      [26, 27]
# 24                      [28, 29]
# 25                  [28, 29, 30]
# Name: Combined, dtype: object
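The question also asks how to associate each value with the most inclusive list it belongs to, which the deduplication step alone does not do. One possible sketch follows, assuming "most inclusive" means the longest list containing the value (ties keep the first list seen). The `res` and `df` below are small hypothetical stand-ins for the real data, and `most_inclusive` is a name introduced here.

```python
import pandas as pd

# Hypothetical subset of the deduplicated lists (same shape as `res`)
res = pd.Series([[0, 1],
                 [8, 9, 112, 114],
                 [8, 9, 112, 114, 134, 135],
                 [28, 29],
                 [28, 29, 30]], name="Combined")

# For every value, remember the longest list that contains it
most_inclusive = {}
for group in res:
    for value in group:
        current = most_inclusive.get(value)
        if current is None or len(group) > len(current):
            most_inclusive[value] = group

# Hypothetical stand-in for the original column A
df = pd.DataFrame({"A": [0, 8, 9, 29]})
df["Group"] = df["A"].map(most_inclusive)

print(most_inclusive[8])   # [8, 9, 112, 114, 134, 135]
```

With this approach the value 8 maps to the six-element list rather than to [8, 9, 112, 114], which matches the example given in the question. Whether list length is the right notion of "most inclusive" for the real data is an assumption worth checking.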
