Python 关于合并数据帧的问题_Python_Pandas_Dataframe

Python 关于合并数据帧的问题

python pandas dataframe

Python 关于合并数据帧的问题,python,pandas,dataframe,Python,Pandas,Dataframe,我有两个来自.csv文件的数据帧，我根据它们共享的一个公共列名称“name”来组合它们，我试图做的是在另一列上显示两个因子的差异。然而，我得到的错误是 Traceback (most recent call last): File "C:\Users\nhoss\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\sit

我有两个来自.csv文件的数据帧，我根据它们共享的一个公共列名称“name”来组合它们，我试图做的是在另一列上显示两个因子的差异。然而，我得到的错误是

Traceback (most recent call last):
  File "C:\Users\nhoss\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\indexes\base.py", line 2891, in get_loc
    return self._engine.get_loc(casted_key)
  File "pandas\_libs\index.pyx", line 70, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\index.pyx", line 101, in pandas._libs.index.IndexEngine.get_loc
  File "pandas\_libs\hashtable_class_helper.pxi", line 1675, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas\_libs\hashtable_class_helper.pxi", line 1683, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: '\ufeff2010'

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "C:\Users\nhoss\OneDrive\Desktop\Senior_Project\responserate.py", line 22, in <module>
    combinedresponse['DIFFERENCE'] = combinedresponse['\ufeff2010'] - combinedresponse['2000']
  File "C:\Users\nhoss\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\frame.py", line 2902, in __getitem__
    indexer = self.columns.get_loc(key)
  File "C:\Users\nhoss\AppData\Local\Packages\PythonSoftwareFoundation.Python.3.8_qbz5n2kfra8p0\LocalCache\local-packages\Python38\site-packages\pandas\core\indexes\base.py", line 2893, in get_loc
    raise KeyError(key) from err
KeyError: '\ufeff2010'
[Finished in 0.933s]

CSV文件：

回应2010.csv

2010,NAME,STATE,COUNTY_ID
52,Allegany County,36,3
64,Bronx County,36,5
68,Broome County,36,7
57,Cattaraugus County,36,9
64,Cayuga County,36,11
61,Chautauqua County,36,13
71,Chemung County,36,15
58,Chenango County,36,17
62,Clinton County,36,19
50,Columbia County,36,21
67,Cortland County,36,23
50,Delaware County,36,25
66,Dutchess County,36,27
70,Erie County,36,29
63,Fulton County,36,35
52,Essex County,36,31
59,Franklin County,36,33

2000ResponseRates.csv：

SS,CCC,NAME,2000
36,001,Albany County,70
36,003005,Allegany County,60
36,005,Bronx County,56
36,007,Broome County,72
36,009,Cattaraugus County,64
36,011,Cayuga County,60
36,013,Chautauqua County,66
36,015,Chemung County,75
36,017,Chenango County,65
36,019,Clinton County,68
36,021,Columbia County,62
36,023,Cortland County,64
36,025,Delaware County,53
36,027,Dutchess County,68
36,029,Erie County,74
36,031,Essex County,58
36,033,Franklin County,67

请按以下方式尝试：

combinedresponse  = pd.merge(respose2000, response2010, on="NAME", how="inner)
combinedresponse['DIFFERENCE'] = combinedresponse['2010'] - combinedresponse['2000']

您的

responsere2010.csv

文件可能存在一些格式问题。它可能正在以另一种编码方式读取字符串？

combinedresponse  = pd.merge(respose2000, response2010, on="NAME", how="inner)
combinedresponse['DIFFERENCE'] = combinedresponse['2010'] - combinedresponse['2000']