Python: change the starting column header and read the following rows when reading multiple files

Tags: python, pandas, dataframe, columnheader


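I want the "Name" row to be the starting column header for all files. Once the column headers are set, I want the data to start right after them.

I have 3 files (and more to come), and the header row sits at a different index in each: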

    2        3      4      5      6       7     ...    
0    A        B     nan    nan    nan     nan    ...     
1    Nan      B     nan    nan     C      nan    ...     
2    Nan      B     nan    nan     C      nan    ...     
3    AA       B     nan    nan     C      nan    ...     
4    Name  Address  Type   Size   Comment Grade  ...     Brand
5    John    ggg    sports  8     Nil      A     ....    Nike
6    John    ggg    sports  9     Nil      B     ....    Nike
7    Mary    ggg    sports  6     Nil      A     ....    Adidas


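I was thinking of doing something like this (it raises an error):

Here is the loop I am using (I forgot to include it, sorry!):

for file in files:
    df = pd.read_csv(file, header=0)

You can use this: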

loc = df['2'].str.contains('Name', na=False).idxmax()  # find the row that has the string 'Name' in column '2'; na=False treats missing cells as non-matches
cols = df.iloc[loc].tolist()  # get the contents of that row to use later as column names
df1 = df.iloc[loc + 1:].copy()  # keep only the rows after the row with 'Name'
df1.columns = cols  # rename the columns
print(df1)
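To try these steps without any files, here is a minimal sketch on a small hand-built frame. The helper name `promote_header_row` and the sample values are made up for illustration. It assumes the marker text appears in the column named '2', and it works by row position, so it does not rely on the index labels.

```python
import numpy as np
import pandas as pd


def promote_header_row(df, column="2", marker="Name"):
    """Use the first row whose `column` value contains `marker` as the header.

    Hypothetical helper for illustration only.
    """
    mask = df[column].str.contains(marker, na=False)
    if not mask.any():
        raise ValueError(f"no row in column {column!r} contains {marker!r}")
    pos = int(np.argmax(mask.to_numpy()))  # position of the first True
    out = df.iloc[pos + 1:].copy()         # rows after the header row
    out.columns = df.iloc[pos].tolist()    # header row becomes the labels
    return out.reset_index(drop=True)


# Invented sample data: junk rows, then the real header, then the data.
raw = pd.DataFrame({
    "2": ["A", np.nan, "Name", "John", "Mary"],
    "3": ["B", "B", "Address", "ggg", "ggg"],
    "4": [np.nan, np.nan, "Type", "sports", "sports"],
})
clean = promote_header_row(raw)
print(clean)
```

Raising an error when no row matches is a design choice in this sketch: `idxmax` on an all-False mask would silently return the first row, which is easy to miss.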
Input:

     2     3        4       5     6        7      ...   Brand
0    A     B        NaN     NaN   NaN      NaN    ...   nike
1    Nan   B        NaN     NaN   C        NaN    ...   nike
2    Nan   B        NaN     NaN   C        NaN    ...   nike
3    AA    B        NaN     NaN   C        NaN    ...   Adidas
4    Name  Address  Type    Size  Comment  Grade  ...   Adidas
5    John  ggg      sports  8     Nil      A      ....  Adidas

Output:

     Name  Address  Type    Size  Comment  Grade  ...   Adidas
5    John  ggg      sports  8     Nil      A      ....  Adidas
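The for loop could look like this:

import pandas as pd

frames = []  # one cleaned frame per file
for file in files:  # files: the list of CSV paths from the question
    df = pd.read_csv(file, header=0)
    loc = df['2'].str.contains('Name', na=False).idxmax()  # find the row which has the string 'Name' in column '2'
    cols = df.iloc[loc].tolist()  # get the contents of that row to use as column names
    df1 = df.iloc[loc + 1:].copy()  # keep only the rows after the row with 'Name'
    df1.columns = cols  # rename inside the loop, because the header row differs per file
    frames.append(df1)

After exiting the for loop, you can combine the results in one line:
result = pd.concat(frames, ignore_index=True)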
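The loop can be tried end to end on in-memory data. This is a sketch under assumptions: the two `io.StringIO` objects stand in for real files, their contents are invented, and each file is assumed to have a first line `2,3,4` so that the column named '2' exists. It collects one cleaned frame per file in a list and joins them with `pd.concat`, so no file overwrites another.

```python
import io

import pandas as pd

# Two in-memory "files" whose real header sits on different rows.
file_a = io.StringIO("2,3,4\nA,B,\nName,Address,Type\nJohn,ggg,sports\n")
file_b = io.StringIO(
    "2,3,4\nA,B,\n,B,\nAA,B,\nName,Address,Type\nMary,ggg,sports\n"
)

frames = []
for handle in (file_a, file_b):
    df = pd.read_csv(handle, header=0, dtype=str)  # keep every cell as text
    is_header = df["2"].str.contains("Name", na=False)
    pos = int(is_header.to_numpy().argmax())  # position of the header row
    part = df.iloc[pos + 1:].copy()           # data rows after the header
    part.columns = df.iloc[pos].tolist()      # promote the header row
    frames.append(part)

# Join all files; ignore_index gives the result a fresh 0..n-1 index.
combined = pd.concat(frames, ignore_index=True)
print(combined)
```

`pd.concat` aligns the frames by column name, so this only produces tidy output when every file uses the same header names.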

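a = df.columns.str.startswith('Name')

This is wrong: df.columns is [2, 3, 4, 5, 6, 7], because those are your column names. 'Name' is currently not in the column headers; it is just a value in the column named '2'. Instead, you can do:

mask = df['2'].str.startswith('Name', na=False)  # boolean mask, True on the 'Name' row
a = df[mask].iloc[0]  # the header row as a Series
df.columns = a.tolist()
# you should be sure you have exactly 1 occurrence of 'Name'

After setting the columns to the desired names, drop the row that holds those names:

df = df.drop(df.index[mask])  # reuse the mask computed before the rename

So you need to run startswith on the column named '2' (which creates a boolean mask). Your code 2 has a similar problem: .loc selects data by label, but 'Name' is currently neither a column label nor an index label; it is just a value.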
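The difference between searching labels and searching values can be seen on a tiny frame. This is only an illustrative sketch with invented sample data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "2": ["A", np.nan, "Name", "John"],
    "3": ["B", "B", "Address", "ggg"],
})

# Searching the column labels: 'Name' is not a label, so nothing matches.
label_hits = df.columns.str.startswith("Name")
print(label_hits.tolist())

# Searching the values of column '2': exactly one row matches.
value_hits = df["2"].str.startswith("Name", na=False)
print(value_hits.tolist())

# .loc selects by label, so 'Name' is not a valid row key here.
try:
    df.loc["Name"]
except KeyError:
    print("KeyError: 'Name' is not an index label")
```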


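It may help to read up on columns and indexes in pandas, and on using conditions to build boolean masks. I found plenty of material on this; here is a good link: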

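Comments:

The question is not clear. What is the expected output? To change the starting column header from index 0 to the 'Name' row, do you want to remove from df the rows before the row containing 'Name'?

Yes, I will accept any solution :)

The first two files have been replaced by the last file. The format is correct, but when I export, only the last file shows up. I have edited the question body to include the loop; may I know how to stop it from being replaced?

I get an error on this line: a = df[df['2'].str.startswith('Name')] (cannot index with a vector containing NA / NaN values).

See the updated code I added: na=False fills missing values with False. Some rows in df['2'] are missing, and that is what causes the error.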
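The effect of `na=False` described in the last comment can be reproduced with a short sketch. The sample values are invented, and `dtype=object` is an assumption chosen to mimic how text columns with gaps are commonly stored; whether the unfilled mask actually raises may depend on the pandas version and dtype, so the example only reports the error if one occurs.

```python
import numpy as np
import pandas as pd

col = pd.Series(["A", np.nan, "Name", "John"], dtype=object)

loose = col.str.startswith("Name")             # missing cells may stay missing
strict = col.str.startswith("Name", na=False)  # missing cells become False

# A mask that still contains missing values cannot be used for filtering.
try:
    col[loose]
except ValueError as exc:
    print("indexing failed:", exc)

# The filled mask is purely boolean and filters cleanly.
print(col[strict].tolist())
```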