Python 2.7: cannot reindex from a duplicate axis
I am trying to merge multiple CSV files in a folder into one.
They look like this (there are actually more than two DataFrames):
df1
df2
I want to put all of the DataFrames in a list and then merge them with reduce. For that to work, they need to share the same index.
I am trying the following code:
import os
import pandas as pd

combined = []
reindex = [2,3,4,5,6]
folder = r'C:\path_to_files'
for f in os.listdir(folder):
    #read each file
    df = pd.read_csv(os.path.join(folder,f))
    #check for duplicates - returns an empty DataFrame
    print df[df.index.duplicated()]
    #reindex
    df.set_index([df.columns[0]], inplace=True)
    df=df.reindex(reindex, fill_value=0)
    #append
    combined.append(df)
#merge on 'LCC' column
final = reduce(lambda left, right: pd.merge(left, right, on=['LCC'], how='outer'), combined)
But this still returns:
Traceback (most recent call last):
File "<ipython-input-31-45f925f6d48d>", line 9, in <module>
df=df.reindex(reindex, fill_value=0)
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2741, in reindex
**kwargs)
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\generic.py", line 2229, in reindex
fill_value, copy).__finalize__(self)
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2687, in _reindex_axes
fill_value, limit, tolerance)
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\frame.py", line 2698, in _reindex_index
allow_dups=False)
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\generic.py", line 2341, in _reindex_with_indexers
copy=copy)
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\core\internals.py", line 3586, in reindex_indexer
self.axes[axis]._can_reindex(indexer)
File "C:\Users\spotter\AppData\Local\Continuum\Anaconda2_2\lib\site-packages\pandas\indexes\base.py", line 2293, in _can_reindex
raise ValueError("cannot reindex from a duplicate axis")
ValueError: cannot reindex from a duplicate axis
You need to check the index for duplicates after setting the first column as the index, not before:
#set index by first column
df.set_index([df.columns[0]], inplace=True)
#check for duplicates - no longer returns an empty DataFrame
print df[df.index.duplicated()]
#reindex
df=df.reindex(reindex, fill_value=0)
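The reason the order matters can be shown with a small sketch. The data and the column name 'area' are made up for illustration; only 'LCC' and the reindex list come from the question. Straight after read_csv the index is the default 0..n-1, so the duplicate check finds nothing. The repeated label only appears once the first column has become the index.

```python
import pandas as pd

# Hypothetical data: the first column 'LCC' contains the value 3 twice.
df = pd.DataFrame({'LCC': [2, 3, 3, 5], 'area': [10, 20, 30, 40]})

# Default RangeIndex: every label is unique, so nothing is flagged.
dupes_before = df[df.index.duplicated()]

# After the first column becomes the index, the repeated label is visible.
df = df.set_index(df.columns[0])
dupes_after = df[df.index.duplicated()]

# Reindexing an index with repeated labels raises ValueError.
try:
    df.reindex([2, 3, 4, 5, 6], fill_value=0)
    raised = False
except ValueError:
    raised = True

print(len(dupes_before))
print(len(dupes_after))
print(raised)
```

Here `df.index.is_unique` would be a quicker yes/no test if the duplicated rows themselves are not needed.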
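Alternatively, check for duplicates in the first column instead of the index. Passing keep=False returns all of the duplicated rows (if necessary):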
#check duplicates in first column
print df[df.iloc[:, 0].duplicated(keep=False)]
#set index + reindex
df.set_index([df.columns[0]], inplace=True)
df=df.reindex(reindex, fill_value=0)
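Once the duplicates are located, they still have to be resolved before reindex will work. The answer does not say how, so the following is only one possible sketch: it assumes that keeping the first row for each label is acceptable, which may not hold for the real data (aggregating with groupby could be more appropriate). The frames and the column names 'area_a' and 'area_b' are hypothetical stand-ins for the CSV files, and the merge is done on the index rather than on the 'LCC' column.

```python
from functools import reduce

import pandas as pd

reindex = [2, 3, 4, 5, 6]

# Hypothetical stand-ins for two of the CSV files.
frames = [
    pd.DataFrame({'LCC': [2, 3, 3, 5], 'area_a': [10, 20, 30, 40]}),
    pd.DataFrame({'LCC': [2, 4, 6], 'area_b': [1, 2, 3]}),
]

combined = []
for df in frames:
    df = df.set_index(df.columns[0])
    # Assumption: keep only the first row for each repeated index label.
    df = df[~df.index.duplicated(keep='first')]
    # The index is unique now, so reindex no longer raises.
    df = df.reindex(reindex, fill_value=0)
    combined.append(df)

# All frames share the same index, so merge on the index itself.
final = reduce(
    lambda left, right: pd.merge(left, right, left_index=True,
                                 right_index=True, how='outer'),
    combined,
)
print(final)
```

Because every frame ends up with the same index, `pd.concat(combined, axis=1)` would be a reasonable alternative to the reduce call.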