Python merge returns an unexpected length
I have a problem with a relatively simple task. I have two dataframes:
df_sample
which I read from a CSV
+------+-----------+-------+-----------+
| key | Full Text | Date | Publisher |
+------+-----------+-------+-----------+
| abcd | foofoo | date1 | a |
| bcde | barbar | date2 | b |
| cdef | foobar | date3 | c |
+------+-----------+-------+-----------+
len(df_sample) = 20000
df_labels
which I read from Excel
+------+----------+--------+--------+
| key | relevant | other | other2 |
+------+----------+--------+--------+
| abcd | yes | blabla | blabla |
| bcde | no | blabla | blabla |
| cdef | no | blabla | blabla |
| defg | yes | blabla | blabla |
+------+----------+--------+--------+
len(df_labels) = 219000
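I want to join the two tables so that each key in the first dataframe is assigned its relevant value. The desired output looks like this: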
+------+-----------+-------+-----------+----------+
| key | Full Text | Date | Publisher | relevant |
+------+-----------+-------+-----------+----------+
| abcd | foofoo | date1 | a | yes |
| bcde | barbar | date2 | b | no |
| cdef | foobar | date3 | c | no |
+------+-----------+-------+-----------+----------+
I seem to have managed this, but why does my merge give a result of length 27377 rather than 20000 (the length of the original left table)?
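You're seeing more rows because the keys are not unique in both dfs; in your example the duplicates are in the second df. You need to decide whether you want the repeated rows, which is the current behaviour, or whether you want to drop the duplicate rows in the second df: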
df_labels = df_labels.drop_duplicates(subset='key')
By default this keeps only the first of each set of duplicates. If you want the alternative behaviour of keeping the last one, you can pass keep='last'.
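To make the effect concrete, here is a minimal sketch on toy data. The values are made up for illustration and the names merged_dup, labels_first, labels_last and merged_unique are hypothetical; only df_sample, df_labels, key and relevant come from the question. It assumes a left merge on key, which is what the question describes but does not show.

```python
import pandas as pd

# Toy stand-ins for the question's dataframes (illustrative values only).
df_sample = pd.DataFrame({
    "key": ["abcd", "bcde", "cdef"],
    "Full Text": ["foofoo", "barbar", "foobar"],
})
df_labels = pd.DataFrame({
    "key": ["abcd", "abcd", "bcde", "cdef", "defg"],  # "abcd" appears twice
    "relevant": ["yes", "no", "no", "no", "yes"],
})

# A left merge repeats a left row once per matching right row,
# so the duplicated "abcd" key produces an extra row.
merged_dup = df_sample.merge(df_labels, on="key", how="left")

# Dropping duplicate keys first keeps one label row per key.
labels_first = df_labels.drop_duplicates(subset="key")              # keeps first "abcd"
labels_last = df_labels.drop_duplicates(subset="key", keep="last")  # keeps last "abcd"

merged_unique = df_sample.merge(labels_first[["key", "relevant"]], on="key", how="left")

print(len(df_sample), len(merged_dup), len(merged_unique))  # 3 4 3
```

Which duplicate to keep depends on the data: with keep='last' the label for abcd in this toy example would be "no" rather than "yes", so it is worth looking at the duplicated rows before discarding any of them.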
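Check whether the key column values are unique in the second df; if they are duplicated you will get repeated rows. Also, are there any NaN values in either key column?
Of course, there were some duplicates in the second df... Thanks a lot for pointing me in the right direction!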
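The uniqueness and NaN checks suggested above could be sketched as follows. The data is again made up, and the names key_is_unique, dup_rows, nan_count and merge_ok are hypothetical. The validate argument of merge is a standard pandas option; using it here is a suggestion, not something the original answer mentions.

```python
import pandas as pd

# Toy label table with one duplicated key and one missing key (illustrative only).
df_labels = pd.DataFrame({
    "key": ["abcd", "abcd", "bcde", None],
    "relevant": ["yes", "no", "no", "yes"],
})
df_sample = pd.DataFrame({"key": ["abcd", "bcde"]})

# Is every key in the label table unique?
key_is_unique = df_labels["key"].is_unique

# Show all rows that share a key with another row (keep=False marks every copy).
dup_rows = df_labels[df_labels["key"].duplicated(keep=False)]

# Count missing keys.
nan_count = int(df_labels["key"].isna().sum())

# Ask pandas to raise instead of silently repeating rows:
# "many_to_one" requires the right-hand keys to be unique.
try:
    df_sample.merge(df_labels, on="key", how="left", validate="many_to_one")
    merge_ok = True
except pd.errors.MergeError:
    merge_ok = False

print(key_is_unique, len(dup_rows), nan_count, merge_ok)
```

Passing validate makes the assumption about key uniqueness explicit, so a merge that would otherwise grow from 20000 to 27377 rows fails loudly instead.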