Python: How to combine joined strings in a pandas groupby
Tags: python, pandas, pandas-groupby


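I am trying to figure out how to count a given combination of two strings, regardless of which string comes first and which comes second.

Here is my code: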

import pandas as pd

mylist = [[('Smith JR', 'Kim YY'), ('Smith JR', 'Ron AA'), ('Kim YY', 'Ron AA')],
          [('Kim YY', 'Smith JR')], [('Smith JR', 'Ron AA')]]

flat_list = [item for sublist in mylist for item in sublist]

df = pd.DataFrame(flat_list, columns=["From", "To"])
df_graph = df.groupby(["From", "To"]).size().reset_index()
df_graph.columns = ["From", "To", "Count"]

print(df_graph)
which gives:

       From        To  Count
0    Kim YY    Ron AA      1
1    Kim YY  Smith JR      1
2  Smith JR    Kim YY      1
3  Smith JR    Ron AA      2
But since Kim YY - Smith JR and Smith JR - Kim YY describe a connection between the same two people, I would like it to give:

       From        To  Count
0    Kim YY    Ron AA      1
1    Kim YY  Smith JR      2
2  Smith JR    Ron AA      2
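I have seen many solutions that drop the duplicated rows, but they do not merge the counts of the rows the way I want. I can't seem to figure out how to combine the

1    Kim YY  Smith JR      1
2  Smith JR    Kim YY      1

rows so that only the Kim YY - Smith JR row remains, with a count of 2. Also, in my real data the count for a given row can be greater than 1.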

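Sort the two columns together before adding them to the DataFrame, which guarantees that a pair can only appear in one particular order. Then apply your counting method. The sort below uses np.sort: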

import numpy as np
import pandas as pd

mylist = [[('Smith JR', 'Kim YY'), ('Smith JR', 'Ron AA'), ('Kim YY', 'Ron AA')],
          [('Kim YY', 'Smith JR')], [('Smith JR', 'Ron AA')]]

flat_list = [item for sublist in mylist for item in sublist]

df = pd.DataFrame(flat_list, columns=["From", "To"])
# Create a new DataFrame with each pair sorted row-wise. You can also sort earlier if you prefer.
df = pd.DataFrame(np.sort(df[["From", "To"]].to_numpy(), axis=1), columns=["From", "To"])
# Now just apply the groupby.
df_graph = df.groupby(["From", "To"]).size().reset_index()
#Output:
     From        To  0
0  Kim YY    Ron AA  1
1  Kim YY  Smith JR  2
2  Ron AA  Smith JR  2
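The question also mentions that in the real data a row's count can already be greater than 1. In that case, counting rows would lose information, so one option is to apply the same row-wise sort and then sum the existing counts. The following is a minimal sketch under that assumption; the frame `df_graph` here is rebuilt by hand from the question's first output table, and the name `merged` is purely illustrative.

```python
import numpy as np
import pandas as pd

# Assumed input: the already-counted table from the question.
df_graph = pd.DataFrame({
    "From": ["Kim YY", "Kim YY", "Smith JR", "Smith JR"],
    "To": ["Ron AA", "Smith JR", "Kim YY", "Ron AA"],
    "Count": [1, 1, 1, 2],
})

# Sort the two names within each row so (A, B) and (B, A) become identical.
pairs = np.sort(df_graph[["From", "To"]].to_numpy(), axis=1)
merged = df_graph.assign(From=pairs[:, 0], To=pairs[:, 1])

# Sum the existing counts instead of counting rows.
merged = merged.groupby(["From", "To"], as_index=False)["Count"].sum()
print(merged)
```

Because the names are sorted alphabetically within each row, the Smith JR - Ron AA pair ends up stored as Ron AA - Smith JR, just as in the output above.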
Quick and dirty, but not that dirty

pd.Series([*map(frozenset, zip(df.From, df.To))]).value_counts()

(Smith JR, Ron AA)    2
(Kim YY, Smith JR)    2
(Kim YY, Ron AA)      1
dtype: int64
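The frozenset trick works because a frozenset is hashable and ignores element order, so both orderings of a pair map to the same key. The same idea can be sketched without pandas, using `collections.Counter` from the standard library. This is an illustrative alternative, not part of the original answer, and the name `pair_counts` is made up for the example.

```python
from collections import Counter

mylist = [[('Smith JR', 'Kim YY'), ('Smith JR', 'Ron AA'), ('Kim YY', 'Ron AA')],
          [('Kim YY', 'Smith JR')], [('Smith JR', 'Ron AA')]]

# A frozenset ignores order, so ('A', 'B') and ('B', 'A') share one key.
pair_counts = Counter(frozenset(pair) for sublist in mylist for pair in sublist)

for pair, count in pair_counts.items():
    # sorted() only gives a stable display order; the key itself is unordered.
    print(sorted(pair), count)
```

One caveat with this approach: a pair that links a name to itself, such as ('Kim YY', 'Kim YY'), collapses to a single-element frozenset, so sorted tuples may be a better key if self-links can occur.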