Skip a column if it does not exist when creating a df with Python usecols

python pandas

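I am using pandas to load thousands of CSVs. However, I am only interested in a few columns, and those columns might not be present in every CSV.

If one of the specified column names does not exist in one of the CSVs, the usecols parameter does not seem to work. What is the best workaround? Thanks.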

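import pandas as pd

nrFiles = 0  # counter must be initialised before the loop
# listFilenamesPath is a list of CSV file paths defined elsewhere
for fullPath in listFilenamesPath:
    # read_csv raises a ValueError here when a listed column is missing from the file
    df = pd.read_csv(fullPath, sep=";", usecols=['name', 'hostname', 'application family'])
    df.to_csv(fullPath, sep=';', index=False, header=True, encoding='utf-8')
    nrFiles = nrFiles + 1
    print(nrFiles, "files converted")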

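You can read the whole CSV without using usecols. That lets you check which columns the DataFrame actually has. If the DataFrame does not have the columns you need, you can ignore that file or handle it however you see fit.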
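
A minimal sketch of that check, assuming you want to skip a file entirely when any wanted column is missing. The helper name `load_if_complete` and the in-memory sample data are made up for illustration and are not part of the original question.

```python
import io
import pandas as pd

WANTED = ['name', 'hostname', 'application family']

def load_if_complete(source, wanted=WANTED, sep=';'):
    """Hypothetical helper: return the wanted columns, or None if any is missing."""
    # No usecols here, so a missing column cannot make read_csv fail
    df = pd.read_csv(source, sep=sep)
    missing = [col for col in wanted if col not in df.columns]
    if missing:
        print("skipping, missing columns:", missing)
        return None
    return df[wanted]

# Made-up sample data, for illustration only
complete = io.StringIO("name;hostname;application family;extra\na;h1;web;1\n")
partial = io.StringIO("name;extra\nb;2\n")

full_df = load_if_complete(complete)   # all three columns present
skipped = load_if_complete(partial)    # 'hostname' and 'application family' missing
```

In a real loop you would pass each file path instead of a `StringIO` object and `continue` when the helper returns `None`.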

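One workaround is to take the column names that appear both in the usecols list (the columns you are looking for) and in df.columns. You can then use this list of common column names to subset df.

The code, with the necessary comments: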

import pandas as pd

### the column names you want to look for in the dataframes
usecols = ['name', 'hostname', 'application family']
nrFiles = 0  ### counter must be initialised before the loop

### listFilenamesPath is a list of CSV file paths defined elsewhere
for fullPath in listFilenamesPath:
    ### read the entire dataframe without usecols
    df = pd.read_csv(fullPath, sep=";")
    ### keep the wanted column names that also appear in df.columns, in usecols order
    ### (list(set(usecols) & set(df.columns)) also works but does not preserve order)
    final_list = [col for col in usecols if col in df.columns]
    ### subset it using the final_list
    df = df[final_list]
    ### write your df to csv and continue as usual
    df.to_csv(fullPath, sep=';', index=False, header=True, encoding='utf-8')
    nrFiles = nrFiles + 1
    print(nrFiles, "files converted")
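
As an alternative that is not in the answer above: `pd.read_csv` also accepts a callable for `usecols`, which is evaluated against each column name in the file. The sketch below assumes a made-up in-memory CSV that lacks the `hostname` column; only the wanted columns that exist are parsed, so nothing fails when one is missing.

```python
import io
import pandas as pd

wanted = ['name', 'hostname', 'application family']

# Made-up CSV content without a 'hostname' column, for illustration only
csv_text = "name;application family;extra\nsrv1;web;1\nsrv2;db;2\n"

# The callable receives each column name found in the file;
# only names for which it returns True are loaded.
df = pd.read_csv(io.StringIO(csv_text), sep=';', usecols=lambda col: col in wanted)
print(df.columns.tolist())
```

This avoids parsing the unwanted columns at all, which may matter when loading thousands of files. The resulting columns follow the file's order, not the order of `wanted`.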

Demo: here is the CSV loaded as df:

I want to look for the following columns:

usecols = ['A', 'D', 'B']

I read the entire CSV. I take the columns common to df and the list I am looking for, which in this case are A and B, and subset df as follows:

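import pandas as pd

usecols = ['A', 'D', 'B']
df = pd.read_csv('test1.csv')
# the original used the undefined name `cols`; the list is called usecols
final_list = [col for col in usecols if col in df.columns]
df = df[final_list]
print(df)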

Output:
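
The demo's data and printed output are not included here. As a stand-in, the following self-contained sketch reproduces the idea with made-up CSV content (columns A, B, C and no D); the values are assumptions chosen only for illustration. It uses an order-preserving list comprehension to find the common columns.

```python
import io
import pandas as pd

# Made-up stand-in for test1.csv: has columns A, B, C but no D
csv_text = "A,B,C\n1,2,3\n4,5,6\n"

usecols = ['A', 'D', 'B']
df = pd.read_csv(io.StringIO(csv_text))

# Keep only the wanted columns that exist in this file, in usecols order
final_list = [col for col in usecols if col in df.columns]
df = df[final_list]
print(df)
```

With this sample, `final_list` contains A and B only, so the printed DataFrame has those two columns and D is silently skipped.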
