Python: removing pairs with differing values from a DataFrame
I have a pandas DataFrame with two columns of text values:
import pandas as pd
df = pd.DataFrame({"text": ["how are you", "this is an apple", "how are you", "hello my friend", "how are you", "this is an apple", "are you ok", "are you ok"],
"type": ["question", "statement", "question", "statement", "statement", "question", "question", "question"]})
print(df)
text type
0 how are you question
1 this is an apple statement
2 how are you question
3 hello my friend statement
4 how are you statement
5 this is an apple question
6 are you ok question
7 are you ok question
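I want to find the pairs (2 or more rows sharing the same value in the "text" column) that have different values in the "type" column, and remove them.
For example, you can see that the value "how are you" has both "question" and "statement". So my result should be: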
text type
3 hello my friend statement
6 are you ok question
7 are you ok question
because the text values "are you ok" and "hello my friend" each have only one unique value for "type".
I tried drop_duplicates(), but it did not work well.
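I was thinking of grouping by the "text" column, but I don't know how to check whether a group has different (non-unique) values in the "type" column.
This is a case for groupby().nunique():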
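A minimal sketch of that approach, assuming the goal is to keep only the rows whose "text" value is associated with exactly one "type" value. The use of transform("nunique") and the names n_types and result are assumptions for illustration, not necessarily the exact code of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["how are you", "this is an apple", "how are you", "hello my friend",
             "how are you", "this is an apple", "are you ok", "are you ok"],
    "type": ["question", "statement", "question", "statement",
             "statement", "question", "question", "question"],
})

# Number of distinct "type" values per "text", broadcast back onto every row
n_types = df.groupby("text")["type"].transform("nunique")

# Keep only the rows whose text appears with exactly one type
result = df[n_types.eq(1)]
print(result)
```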
Output:
text type
3 hello my friend statement
6 are you ok question
7 are you ok question
Try something different, pd.crosstab:
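# True for texts that do not appear with every type (equivalent to "exactly one type" only when there are two types)
s = ~pd.crosstab(df.text, df.type).ne(0).all(axis=1)
df.loc[df.text.isin(s.index[s])]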
text type
3 hello my friend statement
6 are you ok question
7 are you ok question
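The crosstab check above tests whether a text appears with every type, which matches "has more than one type" only when there are exactly two types. If the "type" column could hold three or more values, a variant that counts the non-zero cells per row may be safer. This is a sketch under that assumption, and the names type_counts and single_type_texts are made up for illustration:

```python
import pandas as pd

df = pd.DataFrame({
    "text": ["how are you", "this is an apple", "how are you", "hello my friend",
             "how are you", "this is an apple", "are you ok", "are you ok"],
    "type": ["question", "statement", "question", "statement",
             "statement", "question", "question", "question"],
})

# For each text, count how many distinct types it occurs with
type_counts = pd.crosstab(df["text"], df["type"]).ne(0).sum(axis=1)

# Texts that occur with exactly one type
single_type_texts = type_counts.index[type_counts.eq(1)]

result = df.loc[df["text"].isin(single_type_texts)]
print(result)
```

On the sample data this selects the same rows (3, 6 and 7) as the version above.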