Python: change the starting column header and read from the next row when reading multiple files

python pandas dataframe

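I want the "Name" row to be the starting column header for all files. Once the header row is in place, I want the data to start on the row right after it.

I have 3 files (plus more), and the header row sits at a different index in each of them: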

    2        3      4      5      6       7     ...    
0    A        B     nan    nan    nan     nan    ...     
1    Nan      B     nan    nan     C      nan    ...     
2    Nan      B     nan    nan     C      nan    ...     
3    AA       B     nan    nan     C      nan    ...     
4    Name  Address  Type   Size   Comment Grade  ...     Brand
5    John    ggg    sports  8     Nil      A     ....    Nike
6    John    ggg    sports  9     Nil      B     ....    Nike
7    Mary    ggg    sports  6     Nil      A     ....    Adidas

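I was thinking of doing something like this (it gives an error):

This is the code I meant to put here (I forgot to include it, sorry!):

for file in files:
    df = pd.read_csv(file, header=0)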

You can use this:

loc = df['2'].str.contains('Name', na=False).idxmax()  # find the row that has the string 'Name' in column '2'; na=False treats missing cells as non-matches
cols = df.iloc[loc].tolist()  # get the contents of that row to use later as column names
df1 = df.iloc[loc + 1:].copy()  # keep only the rows after the row with 'Name'
df1.columns = cols  # rename the columns
print(df1)

Input

2   3   4   5   6   7   ...     Brand
0   A   B   NaN     NaN     NaN     NaN     ...     nike
1   Nan     B   NaN     NaN     C   NaN     ...     nike
2   Nan     B   NaN     NaN     C   NaN     ...     nike
3   AA  B   NaN     NaN     C   NaN     ...     Adidas
4   Name    Address     Type    Size    Comment     Grade   ...     Adidas
5   John    ggg     sports  8   Nil     A   ....    Adidas

Output

Name    Address     Type    Size    Comment     Grade   ...     Adidas
5   John    ggg     sports  8   Nil     A   ....    Adidas
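As a self-contained illustration of the steps above, here is a minimal sketch on a made-up three-column frame. The helper name `promote_header_row` and the sample values are assumptions for the example, not part of the original answer.

```python
import numpy as np
import pandas as pd


def promote_header_row(df, key_col, marker="Name"):
    """Use the first row whose `key_col` contains `marker` as the header
    and return only the rows below it."""
    is_header = df[key_col].str.contains(marker, na=False)
    if not is_header.any():
        raise ValueError(f"no row containing {marker!r} in column {key_col!r}")
    pos = int(is_header.to_numpy().argmax())  # position of the first match
    out = df.iloc[pos + 1:].copy()            # rows after the header row
    out.columns = df.iloc[pos].tolist()       # header row becomes column names
    return out.reset_index(drop=True)


# Made-up data shaped like the question's table (junk rows, then 'Name').
raw = pd.DataFrame({
    "2": ["A", np.nan, "AA", "Name", "John", "Mary"],
    "3": ["B", "B", "B", "Address", "ggg", "ggg"],
    "4": [np.nan, np.nan, np.nan, "Type", "sports", "sports"],
})
clean = promote_header_row(raw, key_col="2")
print(clean)
```

The helper works by position rather than by index label, so it should not depend on the frame having a default integer index.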

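The for loop could look like this:

import pandas as pd

frames = []  # one trimmed DataFrame per file
for file in files:
    df = pd.read_csv(file, header=0)
    loc = df['2'].str.contains('Name', na=False).idxmax()  # find the row that has the string 'Name' in column '2'
    cols = df.iloc[loc].tolist()  # get the contents of that row to use later as column names
    df1 = df.iloc[loc + 1:]  # keep only the rows after the row with 'Name'
    frames.append(df1)  # collect the pieces; assigning to df on every pass keeps only the last file
df = pd.concat(frames, ignore_index=True)  # stack all files into one DataFrame

After the for loop, you can set df.columns = cols.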
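To try the loop without real files, the following sketch reads two in-memory CSVs whose `Name` row sits at a different position in each. The file contents are invented, and unlike the snippet above it renames each piece inside the loop, so that `pd.concat` aligns the files by the new column names.

```python
import io
import pandas as pd

# Two made-up CSV "files"; the 'Name' row is at a different index in each.
file_a = io.StringIO("2,3,4\nA,B,\n,B,\nName,Address,Type\nJohn,ggg,sports\n")
file_b = io.StringIO("2,3,4\nName,Address,Type\nMary,ggg,sports\nAnn,hhh,music\n")

frames = []
for file in (file_a, file_b):
    df = pd.read_csv(file, header=0)
    is_header = df["2"].str.contains("Name", na=False)
    loc = is_header.idxmax()                  # label of the first 'Name' row
    part = df.iloc[loc + 1:].copy()           # data rows below the header row
    part.columns = df.iloc[loc].tolist()      # promote the header row
    frames.append(part)                       # keep every file, not just the last

combined = pd.concat(frames, ignore_index=True)
print(combined)
```

Collecting the pieces in a list and concatenating once is what keeps the earlier files from being replaced by the last one.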
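In this line:
a = df.columns.str.startswith('Name')

df.columns is [2,3,4,5,6,7], because those are your column names, so this is wrong. At the moment 'Name' is not in the column headers; it is just a value in the column named '2'.
Instead, you can do: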
a = df[df['2'].str.startswith('Name', na=False)]  # one-row DataFrame holding the header row
df.columns = a.iloc[0].tolist()  # a DataFrame has no tolist(), so take its first row
# make sure there is exactly 1 occurrence of 'Name'

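After setting the columns to the desired names, you should drop the row that holds those names:
df = df.drop(a.index)  # a.index holds only the label of the 'Name' row; the index of the unfiltered boolean Series would list every row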
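Put together, the rename-then-drop approach might look like the sketch below. The sample frame is made up, and the `assert` is just one way to check the "exactly one occurrence" condition mentioned above.

```python
import numpy as np
import pandas as pd

# Made-up frame: two junk rows, the 'Name' header row, then data.
df = pd.DataFrame({
    "2": ["A", np.nan, "Name", "John", "Mary"],
    "3": ["B", "B", "Address", "ggg", "ggg"],
})

mask = df["2"].str.startswith("Name", na=False)  # True only on the header row
assert mask.sum() == 1, "expected exactly one 'Name' row"

header_label = mask[mask].index[0]           # index label of the header row
df.columns = df.loc[header_label].tolist()   # promote its values to column names
df = df.drop(mask[mask].index)               # drop only the header row
# Note: mask.index (unfiltered) lists every row and would empty the frame.
print(df)
```

This removes only the header row itself; the junk rows above it are still present and would need to be dropped separately if the data should start right after the header.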

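So you need to run startswith on the column named '2', which creates a boolean mask.
Your second piece of code has a similar problem: .loc selects data by label, but at the moment 'Name' is neither a column label nor an index label; it is just a value.
It may help to read up on columns and indexes in pandas, and on using conditions to build boolean masks. I found that there is plenty of material on this.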
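The difference between looking up a label and filtering by value can be seen in a small sketch like this one (the data is made up for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"2": ["A", np.nan, "Name", "John"]})

# 'Name' is a cell value, not an index label, so a label lookup fails.
try:
    df.loc["Name"]
    found_by_label = True
except KeyError:
    found_by_label = False

# A condition on the column gives one boolean per row; na=False makes the
# missing cell count as False instead of leaving a gap in the mask.
mask = df["2"].str.startswith("Name", na=False)

header_rows = df.loc[mask]  # .loc also accepts a boolean mask
print(found_by_label, mask.tolist(), header_rows.index.tolist())
```

Passing `na=False` is what keeps the mask purely boolean when the column has missing cells, which is the situation behind the indexing error reported in the comments.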
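The question is not clear. What is the expected output? If you want to change the starting column header from index 0 to the 'Name' row, do you want to remove the rows before the row containing 'Name' from df?

Yes, I accept any solution :)

The first two files have been replaced by the last file. The format is correct, but only the last file shows up when I export. I have edited the question body to include the loop. How can I keep the earlier files from being replaced?

I get an error on this line: a = df[df['2'].str.startswith('Name')] "Cannot index with vector containing NA / NaN values".

See the updated code I added: na=False fills the missing values with False. Some rows in df['2'] are missing, and that is what produces the error.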