Python: a query about data manipulation
Tags: python, pandas

I want to remove the duplicate combinations of fruit and color observations where response = "wrong".

You can use drop_duplicates.
Ex:
import pandas as pd

# Sample observations: some (fruit, color) pairs appear more than once
data = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
        {'fruit': 'apple', 'color': 'red', 'response': 'wrong'},
        {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
        {'fruit': 'pineapple', 'color': 'green', 'response': 'wrong'},
        {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'}]
df = pd.DataFrame(data)

# Keep only the first row of each (fruit, color) combination
print(df.drop_duplicates(['fruit', 'color']))
Output:

    color      fruit response
0     red      apple    right
2   green  pineapple     True
4  orange     orange    wrong

First sort on the "response" column, then drop the duplicates:
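# Sort so that 'wrong' comes after 'True' and 'right'
df.sort_values(['response'], inplace=True)
# Keep the first row of each (color, fruit) pair
df.drop_duplicates(['color', 'fruit'], inplace=True)
# Restore the original row order
df.sort_index(axis=0, inplace=True)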
After the sort, the rows are ordered like this:

    color      fruit response
2   green  pineapple     True
0     red      apple    right
1     red      apple    wrong
3   green  pineapple    wrong
4  orange     orange    wrong

drop_duplicates then removes the repeated values. Output (shown before the index is re-sorted):

    color      fruit response
2   green  pineapple     True
0     red      apple    right
4  orange     orange    wrong
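The sort-then-drop approach works on this sample only because of how the response strings compare: uppercase letters sort before lowercase ones, and 'r' sorts before 'w'. A small check, using nothing beyond the three labels in the sample data:

```python
import pandas as pd

# The three response labels that appear in the sample data
labels = pd.Series(['wrong', 'right', 'True'])

# Plain string comparison: 'T' < 'r' < 'w'
ordered = labels.sort_values().tolist()
print(ordered)  # ['True', 'right', 'wrong']
```

So 'wrong' ends up last only by coincidence of spelling; a dataset with other labels may sort differently.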
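This gives you the desired output.

Expected result:
df = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'}]

If I change the order of the observations, the syntax above drops 'right' and keeps 'wrong':
df = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
      {'fruit': 'apple', 'color': 'red', 'response': 'wrong'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'wrong'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'right'}]

Expected result:
df = [{'fruit': 'apple', 'color': 'red', 'response': 'right'},
      {'fruit': 'pineapple', 'color': 'green', 'response': 'right'},
      {'fruit': 'orange', 'color': 'orange', 'response': 'right'}]

I tried:
df.drop_duplicates(['fruit', 'color'], keep='first')
df.drop_duplicates(['fruit', 'color'], keep='last')
after sorting, but it did not help. It seems I would have to hard-code the alphabetical order of the labels. I want to avoid that in the real dataset, because the labels there have no order.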
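One way to avoid depending on the alphabetical order of the labels is to sort on a boolean flag ("is this row 'wrong'?") instead of on the label text. The following is a minimal sketch, assuming 'wrong' is the only label that should lose to its duplicates. The helper name drop_wrong_duplicates and the temporary column _is_wrong are made up for illustration and do not come from the thread.

```python
import pandas as pd

# Reordered sample from the question: orange now has 'wrong' before 'right'
data = [
    {'fruit': 'apple', 'color': 'red', 'response': 'right'},
    {'fruit': 'apple', 'color': 'red', 'response': 'wrong'},
    {'fruit': 'pineapple', 'color': 'green', 'response': 'True'},
    {'fruit': 'pineapple', 'color': 'green', 'response': 'wrong'},
    {'fruit': 'orange', 'color': 'orange', 'response': 'wrong'},
    {'fruit': 'orange', 'color': 'orange', 'response': 'right'},
]
df = pd.DataFrame(data)


def drop_wrong_duplicates(frame, keys=('fruit', 'color'), bad='wrong'):
    """Hypothetical helper: keep one row per key pair, preferring non-`bad` rows."""
    keys = list(keys)
    return (
        frame
        # Flag the rows to deprioritize; the column name is an assumption
        .assign(_is_wrong=frame['response'].eq(bad))
        # False sorts before True; mergesort is stable, so ties keep row order
        .sort_values('_is_wrong', kind='mergesort')
        # The first row of each pair is now a non-`bad` row whenever one exists
        .drop_duplicates(subset=keys)
        .drop(columns='_is_wrong')
        # Restore the original row order
        .sort_index()
    )


result = drop_wrong_duplicates(df)
print(result)
```

On the reordered data this keeps rows 0, 2 and 5, with responses 'right', 'True' and 'right'. A pair whose only rows are 'wrong' still keeps one of them, which matches the first expected result in the question.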