Python: change the starting column header and read from the next row when reading multiple files
I want the row containing "Name" to be the column header for all files, with the data starting on the row right after it. I have 3 files (and more to come) in which that header row sits at a different index:
2 3 4 5 6 7 ...
0 A B nan nan nan nan ...
1 Nan B nan nan C nan ...
2 Nan B nan nan C nan ...
3 AA B nan nan C nan ...
4 Name Address Type Size Comment Grade ... Brand
5 John ggg sports 8 Nil A .... Nike
6 John ggg sports 9 Nil B .... Nike
7 Mary ggg sports 6 Nil A .... Adidas
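I was thinking of doing something like this (it is wrong). Here is the code I meant to include (sorry, I forgot it earlier):
for file in files:
    df = pd.read_csv(file, header=0)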
You can use this:
loc = df['2'].str.contains('Name', na=False).idxmax()  # find the row that has the string 'Name' in column '2' (na=False treats missing cells as non-matches)
cols = df.iloc[loc].tolist()  # contents of that row, used later as the column names
df1 = df.iloc[loc + 1:].copy()  # keep only the rows after the row with 'Name'
df1.columns = cols  # rename the columns
print(df1)
Input
2 3 4 5 6 7 ... Brand
0 A B NaN NaN NaN NaN ... nike
1 Nan B NaN NaN C NaN ... nike
2 Nan B NaN NaN C NaN ... nike
3 AA B NaN NaN C NaN ... Adidas
4 Name Address Type Size Comment Grade ... Adidas
5 John ggg sports 8 Nil A .... Adidas
Output
Name Address Type Size Comment Grade ... Adidas
5 John ggg sports 8 Nil A .... Adidas
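The same idea can be sketched as a small reusable helper. This is only an illustration under assumptions: the function name `promote_header_row`, its `marker` parameter, and the sample frame are made up for the example, and the helper looks for a cell equal to the marker in any column rather than only in column '2'.

```python
import pandas as pd

def promote_header_row(df, marker="Name"):
    """Use the first row containing `marker` as the header and keep the rows below it."""
    # True for every row in which at least one cell equals the marker text
    hits = df.astype(str).eq(marker).any(axis=1)
    if not hits.any():
        raise ValueError(f"no row contains {marker!r}")
    pos = int(hits.to_numpy().argmax())      # position of the first matching row
    out = df.iloc[pos + 1:].copy()           # data rows below the header row
    out.columns = df.iloc[pos].tolist()      # the header row's values become the column names
    return out.reset_index(drop=True)

# Hypothetical sample data shaped like the question's files
raw = pd.DataFrame(
    {
        "2": ["A", None, "Name", "John", "Mary"],
        "3": ["B", "B", "Address", "ggg", "ggg"],
        "4": [None, None, "Type", "sports", "sports"],
    }
)
clean = promote_header_row(raw)
print(clean)
```

Working by position (`iloc`) instead of by index label keeps the helper usable even when the frame's index is not the default 0, 1, 2, ... range.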
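The for loop can look like this:
frames = []  # one cleaned DataFrame per file
for file in files:
    df = pd.read_csv(file, header=0)
    loc = df['2'].str.contains('Name', na=False).idxmax()  # find the row that has the string 'Name' in column '2'
    cols = df.iloc[loc].tolist()  # contents of that row, used as the column names
    df1 = df.iloc[loc + 1:].copy()  # keep only the rows after the row with 'Name'
    df1.columns = cols  # rename inside the loop, since the header row differs per file
    frames.append(df1)  # DataFrame.append returns a new frame (and was removed in pandas 2.0), so collect in a list
result = pd.concat(frames, ignore_index=True)
After the for loop, result holds the rows of every file under the shared column names, so earlier files are no longer overwritten by the last one.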
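To see the loop work end to end without touching the disk, here is a minimal sketch that uses `io.StringIO` objects in place of real files. The two sample "files" and their contents are invented for illustration; only the column label '2' and the 'Name' marker come from the question.

```python
import io
import pandas as pd

# Two in-memory "files" whose real header row sits on a different line
file_a = io.StringIO("2,3\nA,B\nName,Address\nJohn,ggg\n")
file_b = io.StringIO("2,3\nA,B\n,B\nName,Address\nMary,hhh\n")

frames = []
for f in (file_a, file_b):
    df = pd.read_csv(f, header=0)
    # Row whose column '2' contains 'Name'; missing cells count as non-matches
    loc = df["2"].str.contains("Name", na=False).idxmax()
    part = df.iloc[loc + 1:].copy()          # rows after the header row
    part.columns = df.iloc[loc].tolist()     # header row values become column names
    frames.append(part)

# Stack the cleaned pieces; ignore_index gives a fresh 0..n-1 index
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Collecting the pieces in a list and calling `pd.concat` once is what keeps every file's rows in the final result.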
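In code 1:
a = df.columns.str.startswith('Name')
df.columns is ['2', '3', '4', '5', '6', '7', ...], because those are your current column names, so this check is wrong. 'Name' is not in the column headers yet; it is only a value in the column named '2'.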
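Instead, you can do:
mask = df['2'].str.startswith('Name', na=False)  # boolean mask: True only on the 'Name' row
df.columns = df[mask].iloc[0].tolist()  # use that row's values as the column names
# make sure 'Name' occurs exactly once in that column
After setting the columns to the desired names, drop the row that held those names:
df = df.drop(mask[mask].index)  # mask[mask].index holds only the labels where the mask is True
So you need to run startswith on the column named '2', which creates a boolean mask. Build the mask before renaming, because the column is no longer called '2' afterwards.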
In code 2 you have a similar problem: .loc selects data by label, but 'Name' is currently neither a column label nor an index label; it is just a value.
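The difference between searching labels and searching values can be shown on a tiny frame. This is a sketch with made-up sample data, not the asker's files:

```python
import pandas as pd

df = pd.DataFrame({"2": ["A", "Name", "John"], "3": ["B", "Address", "ggg"]})

# Searching the column labels: the labels are '2' and '3', so nothing matches
label_hits = df.columns.str.startswith("Name")

# Searching the values stored in column '2': exactly one row matches
value_hits = df["2"].str.startswith("Name", na=False)

print(label_hits.tolist())
print(value_hits.tolist())
print("Name" in df.columns)   # membership test on the labels, not on the values
```

Until the header row has been promoted, anything that works on labels (`df.columns`, `.loc`) cannot find 'Name'; a boolean mask over the values can.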
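It may help to read up on columns and indexes in pandas, and on using conditions to create boolean masks. There is plenty of material available on this.
The question is not clear. What is the expected output? To change the starting column header from index 0 to the 'Name' row, do you want to drop the rows of df that come before the row containing 'Name'?
Yes, I accept any solution :)
The first two files were replaced by the last one. The format is correct, but only the last file shows up when I export. I have edited the question body to include the loop; how can I stop it from being replaced?
I get an error on this line: a = df[df['2'].str.startswith('Name')] Cannot index with vector containing NA / NaN values.
See the updated code I added: na=False fills the missing values with False. Some rows of df['2'] are missing, and that is what produces the error.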
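The effect of `na=False` can be checked on a small Series. This is a sketch with invented values; `dtype=object` is set explicitly as an assumption about how the column is stored, because that is the case in which a missing cell stays missing in the result.

```python
import pandas as pd

# Column '2' with one missing cell, stored as plain Python objects
col = pd.Series(["A", None, "Name", "John"], name="2", dtype=object)

raw_mask = col.str.startswith("Name")              # the missing cell stays missing
safe_mask = col.str.startswith("Name", na=False)   # the missing cell becomes False

print(raw_mask.isna().sum())   # number of non-boolean (missing) entries in the mask
print(safe_mask.tolist())
print(col[safe_mask].tolist()) # indexing works once the mask is purely boolean
```

A mask that still contains missing entries is not a valid boolean indexer, which is why the original line failed and the `na=False` version does not.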