Python: a question about data manipulation (Python, pandas)


I want to remove duplicate combinations of fruit and color observations where response = 'wrong'.

You can use drop_duplicates.

Example:

import pandas as pd
df = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
     {'fruit': 'apple',  'color': 'red', 'response': 'wrong'},
     {'fruit': 'pineapple',  'color': 'green',  'response': 'True' },
     {'fruit': 'pineapple',  'color': 'green',  'response': 'wrong' },
     {'fruit': 'orange',  'color': 'orange',  'response': 'wrong' }]

df = pd.DataFrame(df)
print(df.drop_duplicates(['fruit','color']))

Output:

    color      fruit response
0     red      apple    right
2   green  pineapple     True
4  orange     orange    wrong

Alternatively, first sort by the "response" column:

df.sort_values(['response'], inplace=True)

Output:

    color      fruit response
2   green  pineapple     True
0     red      apple    right
1     red      apple    wrong
3   green  pineapple    wrong
4  orange     orange    wrong

Then you can remove the duplicated values with drop_duplicates:

df.drop_duplicates(['color','fruit'], inplace=True)

Output:

    color      fruit response
2   green  pineapple     True
0     red      apple    right
4  orange     orange    wrong

You can then use sort_index to restore the original row order:

df.sort_index(axis=0, inplace=True)

This will give you the desired output.
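For reference, the sort / drop / re-sort steps can be sketched as one chain that avoids inplace=True. This is only a sketch on the sample data from the question: it works because the strings happen to compare as 'True' < 'right' < 'wrong', so the 'wrong' row ends up last within each fruit/color pair. That ordering is a property of these particular labels, not something pandas knows about.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame([
    {'fruit': 'apple', 'color': 'red', 'response': 'right'},
    {'fruit': 'apple', 'color': 'red', 'response': 'wrong'},
    {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
    {'fruit': 'pineapple', 'color': 'green', 'response': 'wrong'},
    {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'},
])

result = (
    df.sort_values('response')              # 'wrong' sorts after 'True' and 'right'
      .drop_duplicates(['fruit', 'color'])  # keeps the first row of each pair
      .sort_index()                         # restore the original row order
)
print(result)
```

If the label to discard did not sort last, the same chain would keep the wrong row, so this depends on the label values.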

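Expected result:

df = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'}]

If I change the order of the observations, the syntax above removes 'right' and keeps 'wrong':

df = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
      {'fruit': 'apple', 'color': 'red', 'response': 'wrong'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'wrong'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'right'}]

Expected result:

df = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'right'}]

I tried:

df.drop_duplicates(['fruit', 'color'], keep='first')

df.drop_duplicates(['fruit', 'color'], keep='last')

after sorting, but it did not help. It seems I would have to hard-code the order of the labels alphabetically. I want to avoid that in the real dataset, because the labels there have no order.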
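One possible way to avoid depending on row order or on how the labels sort is to build a boolean mask instead: drop a 'wrong' row only when its fruit/color pair also has at least one row that is not 'wrong'. The sketch below assumes the column names from the question; the helper name drop_wrong_duplicates and its label parameter are made up for illustration.

```python
import pandas as pd

def drop_wrong_duplicates(df, label='wrong'):
    """Hypothetical helper: drop rows whose response equals `label`
    when the same (fruit, color) pair has another, different response."""
    is_wrong = df['response'].eq(label)
    # True for every row whose (fruit, color) group contains a non-`label` row.
    has_other = (~is_wrong).groupby([df['fruit'], df['color']]).transform('any')
    return df[~(is_wrong & has_other)]

# Reordered data from the question: for orange, 'wrong' comes before 'right'.
df = pd.DataFrame([
    {'fruit': 'apple', 'color': 'red', 'response': 'right'},
    {'fruit': 'apple', 'color': 'red', 'response': 'wrong'},
    {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
    {'fruit': 'pineapple', 'color': 'green', 'response': 'wrong'},
    {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'},
    {'fruit': 'orange', 'color': 'orange', 'response': 'right'},
])

result = drop_wrong_duplicates(df)
print(result)
```

A pair that only has 'wrong' rows is left untouched by this mask, and a pair with several non-'wrong' rows keeps all of them; a follow-up drop_duplicates(['fruit', 'color']) could be applied if exactly one row per pair is needed.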