Python: why doesn't this pandas DataFrame append?

I expected that in the code below the for loop would iterate over all the csvs in the folder, and that the df DataFrame would be appended to after each csv is read. However, df here never appends anything and only ever contains the contents of the first csv. Any ideas? Thanks.

We are using Python 3.6 and pandas 0.21.

    path = "/home/ubuntu/QA/client_" + CLIENT_ID + "_raw_data_" + year + "/_ACTUAL_*_Accrual*.xls"

    if CLIENT_ID in ('7'):

    df_columns=pd.DataFrame(columns=['PropID','PROPERTY_CODE','TreeNodeID','ACCOUNT_CODE','TreeNodeName','ReportYear','Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec'])

    OUTPUT_CSV="Client_"+CLIENT_ID+"_"+year+"_"+ACCOUNTING_TYPE+"_QA.csv"

    df_columns.to_csv(OUTPUT_CSV, header=True, index=False, encoding='utf-8',na_rep="NA", mode='w')

    df = pd.DataFrame()

    for fname in glob.iglob(path):

        print (fname)

        df2 = pd.DataFrame()

        df2=pd.read_excel(fname,skiprows=4,converters={'TreeNodeCode':np.int64,'PropCode':np.str}).dropna(subset=['TreeNodeCode'],how='any')  # convert the account codes in the raw data; dropna drops a row if column 4 (the IAM account code) is NA

        print (df2)

        df=df.append(df2)

    df=df.rename(columns={'TreeNodeCode':'ACCOUNT_CODE'})

    df=df.rename(columns={'PropCode':'PROPERTY_CODE'})

    df['PROPERTY_CODE'] = df_QA['PROPERTY_CODE'].astype(np.str)

    df['ACCOUNT_CODE'] = df_QA['ACCOUNT_CODE'].astype(np.str)

    df_QA['PROPERTY_CODE'] = df_QA['PROPERTY_CODE'].astype(np.str)

    df_QA['ACCOUNT_CODE'] = df_QA['ACCOUNT_CODE'].astype(np.str)

    print ("this is df")

    print (df)

    print ("this is df_QA")

    print (df_QA)

    df_check=pd.merge(df,df_QA, how='inner',on=['PROPERTY_CODE','ACCOUNT_CODE'])

    #print (df_check)

    # tricks in this ticket: https://stackoverflow.com/questions/384192823/subtracting-multiple-columns-and-appending-results-in-pandas-dataframe

    df_check[['Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']] = df_check[['Jan_x','Feb_x','Mar_x','Apr_x','May_x','Jun_x','Jul_x','Aug_x','Sep_x','Oct_x','Nov_x','Dec_x']] - df_check[['Jan_y','Feb_y','Mar_y','Apr_y','May_y','Jun_y','Jul_y','Aug_y','Sep_y','Oct_y','Nov_y','Dec_y']].values

    #print (df_check)

    df_check2=df_check[['PropID','PROPERTY_CODE','TreeNodeID','ACCOUNT_CODE','TreeNodeName','ReportYear','Jan','Feb','Mar','Apr','May','Jun','Jul','Aug','Sep','Oct','Nov','Dec']]

    #print (df_check2)

    # tricks of panda query: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.query.html#pandas-dataframe-query

    df_check3=df_check2.query('Jan > 0 | Jan < 0 | Feb > 0 | Feb < 0 | Mar > 0 | Mar < 0 | Apr > 0 | Apr < 0 | May > 0 | May < 0 | Jun > 0 | Jun < 0 | Jul > 0 | Jul < 0 | Aug > 0 | Aug < 0 | Sep > 0 | Sep < 0 | Oct > 0 | Oct < 0 | Nov > 0 | Nov < 0 | Dec > 0 | Dec < 0')

    #print (df_check3)

    #print (df_check3.info())

    df_check3.to_csv(OUTPUT_CSV, header=False, index=False,

               na_rep="NA", mode='a')

I think you need to append each DataFrame to a list first, and then concatenate them once at the end:


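A minimal sketch of that pattern, using small hypothetical stand-in frames in place of the per-file results of pd.read_excel:

```python
import pandas as pd

# Hypothetical stand-ins for the frames pd.read_excel would return per file.
frames = [
    pd.DataFrame({'PropCode': ['A'], 'Jan': [1]}),
    pd.DataFrame({'PropCode': ['B'], 'Jan': [2]}),
]

# Collect each frame in a list inside the loop, then concatenate once.
# (Calling df.append on every iteration copies the whole frame each time.)
dfs = []
for df2 in frames:
    dfs.append(df2)

df = pd.concat(dfs, ignore_index=True)
```

In the question's loop this means replacing df=df.append(df2) with dfs.append(df2) and calling pd.concat(dfs) after the loop.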
Also, your code:

df=df.rename(columns={'TreeNodeCode':'ACCOUNT_CODE'})

df=df.rename(columns={'PropCode':'PROPERTY_CODE'})

df['PROPERTY_CODE'] = df_QA['PROPERTY_CODE'].astype(np.str)

df['ACCOUNT_CODE'] = df_QA['ACCOUNT_CODE'].astype(np.str)

df_QA['PROPERTY_CODE'] = df_QA['PROPERTY_CODE'].astype(np.str)

df_QA['ACCOUNT_CODE'] = df_QA['ACCOUNT_CODE'].astype(np.str)
should be simplified to:

df=df.rename(columns={'TreeNodeCode':'ACCOUNT_CODE', 'PropCode':'PROPERTY_CODE'})
cols = ['PROPERTY_CODE','ACCOUNT_CODE']
df_QA[cols] = df[cols] = df_QA[cols].astype(str)
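A small self-contained check of that simplification, with hypothetical toy frames standing in for df and df_QA:

```python
import pandas as pd

# Hypothetical toy frames standing in for df and df_QA.
df = pd.DataFrame({'TreeNodeCode': [100], 'PropCode': [7]})
df_QA = pd.DataFrame({'PROPERTY_CODE': [7], 'ACCOUNT_CODE': [100]})

# One rename call handles both column renames.
df = df.rename(columns={'TreeNodeCode': 'ACCOUNT_CODE', 'PropCode': 'PROPERTY_CODE'})

# Cast the shared key columns to str in both frames in one statement;
# the right-hand side is evaluated once and assigned to both frames.
cols = ['PROPERTY_CODE', 'ACCOUNT_CODE']
df_QA[cols] = df[cols] = df_QA[cols].astype(str)
```

After this, both frames hold string keys, which is what the later pd.merge on PROPERTY_CODE and ACCOUNT_CODE needs.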


Comments:

Do all the csvs have the same columns? Could you post the first rows of two of the csvs?

Thanks @jezrael. But I found something strange: the rows of the individual dfs are shifted in the concatenated df. For each PropCode, the rows move forward by one row. Any ideas?

I think it is a data issue. Maybe you need skiprows=3 in read_excel. Is the data confidential?