
Python: Merge CSV files on 'Date' and save to disk

python pandas csv merge

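I'm completely new to Python and am trying to write code for the following problem:

A) A directory contains multiple *.csv files, all with the same column headers and structure. Example file names: Google.csv, Alphabet.csv, Teva.csv, Bosch.csv

Sample contents of the file named Google.csv: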

Date,Open,High,Low,Close
2000-01-06,15,32,33.7,49.2
2000-01-07,33.1,10.1,57.3,62
2000-01-10,221,62.4,66.9,790.5
2000-01-11,3.3,1.78,43.2,52.1
2000-01-12,73.2,54.0,121.6,89.4

Sample contents of the file named Teva.csv:

Date,Open,High,Low,Close
2000-01-01,115,312,332.7,449.2
2000-01-02,33.1,10.1,59.3,662
2000-01-03,22.1,623.4,663.9,794.5
2000-01-06,34.3,13.78,43.2,52.1
2000-01-07,703.2,504.0,121.6,879.4

B) There is a file "List.csv" that contains some company names, a subset of the CSV files in the directory above. Sample contents:

Company
Google
Teva

C) There is another file, "Dates.txt", that contains only dates. Sample contents:

Date,
2000-01-01,
2000-01-02,
2000-01-03,
2000-01-06,
2000-01-07,
2000-01-08,
2000-01-09,
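A minimal sketch of reading these two inputs as flat Python lists, assuming the sample contents above (inlined with `io.StringIO` so the snippet runs on its own; with the real files, pass the file names to `pd.read_csv` instead). Selecting a column returns a Series, and `Series.tolist()` gives plain strings. The trailing comma on each line of Dates.txt produces an empty extra column, which `usecols=['Date']` skips.

```python
from io import StringIO

import pandas as pd

# Sample contents copied from the question; with real files, use
# pd.read_csv('List.csv') and pd.read_csv('Dates.txt', usecols=['Date']).
LIST_CSV = "Company\nGoogle\nTeva\n"
DATES_TXT = (
    "Date,\n2000-01-01,\n2000-01-02,\n2000-01-03,\n"
    "2000-01-06,\n2000-01-07,\n2000-01-08,\n2000-01-09,\n"
)

# Selecting the 'Company' column gives a Series; .tolist() turns it into a flat list.
tickers = pd.read_csv(StringIO(LIST_CSV))["Company"].tolist()

# The trailing commas create an empty 'Unnamed: 1' column; usecols keeps only 'Date'.
dates = pd.read_csv(StringIO(DATES_TXT), usecols=["Date"])["Date"].tolist()

print(tickers)  # ['Google', 'Teva']
print(dates)
```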

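My goal is to merge only the *.csv files (A) that are listed in List.csv (B), using the "Date" column of Dates.txt (C) as the key, keep only the column headed "Low", and save the result to disk as a CSV file.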

The final CSV file saved to disk should look like this:

Date,Google,Teva
2000-01-01,,332.7
2000-01-02,,59.3
2000-01-03,,663.9
2000-01-06,33.7,43.2
2000-01-07,57.3,121.6
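One way to read the desired layout is as a reshape from long to wide: one row per (Date, company) pair, pivoted so each company becomes its own column. The sketch below is an alternative approach, not the one used later in this thread. It assumes only the Date/Low values from the samples above (inlined via `StringIO`) and uses `DataFrame.pivot`, `reindex` and `dropna(how='all')`; note the expected output omits 2000-01-08 and 2000-01-09, where no company has a value.

```python
from io import StringIO

import pandas as pd

# Date/Low values taken from the Google.csv and Teva.csv samples.
GOOGLE = "Date,Low\n2000-01-06,33.7\n2000-01-07,57.3\n2000-01-10,66.9\n"
TEVA = ("Date,Low\n2000-01-01,332.7\n2000-01-02,59.3\n2000-01-03,663.9\n"
        "2000-01-06,43.2\n2000-01-07,121.6\n")
dates = ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-06",
         "2000-01-07", "2000-01-08", "2000-01-09"]

# Long format: one row per (Date, Company) pair.
long_df = pd.concat(
    [pd.read_csv(StringIO(text)).assign(Company=name)
     for name, text in [("Google", GOOGLE), ("Teva", TEVA)]],
    ignore_index=True,
)

# Wide format: one column per company, indexed by Date.
wide = long_df.pivot(index="Date", columns="Company", values="Low")

# Keep only the dates from Dates.txt, then drop dates where every company is NaN.
wide = wide.reindex(dates).dropna(how="all")
wide.index.name = "Date"    # header of the first CSV column
wide.columns.name = None    # drop the 'Company' axis label
print(wide.to_csv())
```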

Here is the code I've managed to cobble together:

import os
import csv
from shutil import copyfile

import numpy as np
import pandas as pd

pd.set_option('display.max_rows', 500)
pd.set_option('display.max_columns', 500)
pd.set_option('display.width', 1000)
os.chdir('D:/SO/')
#print(os.getcwd())

open('temp.txt', 'a').close()
dst = 'Dates.txt'
temp1 = 'temp.txt'
path = "D:/SO/dir/"
directory = os.fsencode(path)

with open('temp.txt', 'w', newline='') as temp_date:
    copyfile(dst, temp1)
    # Read the dates in Dates.txt for joining
    f1 = pd.read_csv('Dates.txt', index_col=1)
    df1 = pd.DataFrame(f1)
    with open('List.csv', 'r') as mylist:
        data = csv.reader(mylist, delimiter=",")
        #next(data, None)  # discard the header
        for i in data:
            # Add .csv to each line (CompanyName) in List.csv for searching the directory
            c = i[0] + '.csv'
            #print(c)
            for file in os.listdir(path):  # Search for the file in directory
                if c in file:              # if found,
                    print(file)
                    f2 = pd.read_csv(os.path.join(path, file))
                    df2 = pd.DataFrame(f2)
                    #print(df2.head(5))
                    # merge, then drop Open, High and Close (keeps Date and Low)
                    f3 = f1.merge(f2, how='left', on=['Date'])
                    df3 = pd.DataFrame(f3)
                    df3 = df3.drop(df3.columns[[1, 2, 4]], axis=1)
                    print(df3.head(10), '\n')
            continue

Output so far:

Google.csv
         Date   Low
0  2000-01-01   NaN
1  2000-01-02   NaN
2  2000-01-03   NaN
3  2000-01-06  33.7
4  2000-01-07  57.3
5  2000-01-08   NaN
6  2000-01-09   NaN 

Teva.csv
         Date    Low
0  2000-01-01  332.7
1  2000-01-02   59.3
2  2000-01-03  663.9
3  2000-01-06   43.2
4  2000-01-07  121.6
5  2000-01-08    NaN
6  2000-01-09    NaN

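Question: The code above does join/merge Dates.txt with each required file, but separately. What I need is a single CSV file with the dates in the first column, the first company in the second column, the second company in the third column, and so on. Can anyone help? I'm completely new to Python and couldn't find any Q&A on this forum that addresses this problem.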
Using Python 3.8.0 on Windows.
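A minimal sketch of extending the left-merge approach from the code above so that every merge lands in one DataFrame instead of a new `df3` per file: start from the dates, rename each file's `Low` column to the company name, and merge it in. The file contents are trimmed samples from the question, inlined via `StringIO` (an assumption to keep the snippet self-contained); with real files, read `os.path.join(path, name + '.csv')` instead, and pass a file name such as `'merged.csv'` (hypothetical) to `to_csv` to save to disk.

```python
from io import StringIO

import pandas as pd

# Trimmed stand-ins for the files in the question.
FILES = {
    "Google": "Date,Open,High,Low,Close\n2000-01-06,15,32,33.7,49.2\n"
              "2000-01-07,33.1,10.1,57.3,62\n",
    "Teva": "Date,Open,High,Low,Close\n2000-01-01,115,312,332.7,449.2\n"
            "2000-01-02,33.1,10.1,59.3,662\n2000-01-03,22.1,623.4,663.9,794.5\n"
            "2000-01-06,34.3,13.78,43.2,52.1\n2000-01-07,703.2,504.0,121.6,879.4\n",
}
tickers = ["Google", "Teva"]
dates = ["2000-01-01", "2000-01-02", "2000-01-03", "2000-01-06",
         "2000-01-07", "2000-01-08", "2000-01-09"]

# Accumulate: each loop iteration adds one column to the same DataFrame.
result = pd.DataFrame({"Date": dates})
for name in tickers:
    prices = pd.read_csv(StringIO(FILES[name]), usecols=["Date", "Low"])
    prices = prices.rename(columns={"Low": name})  # header becomes the company name
    result = result.merge(prices, how="left", on="Date")

# Drop dates with no value for any company, matching the desired output.
result = result.dropna(how="all", subset=tickers)
print(result.to_csv(index=False))  # pass a path instead to write to disk
```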
Update:

As suggested, I was able to achieve what I wanted by converting the list of lists into a flat list:

import csv

with open('temp.txt', 'r') as List_txt:
    list_csv = csv.reader(List_txt)
    #print(list_csv, '\n')
    flat_list = [val for sublist in list_csv for val in sublist]
    #print(flat_list, '\n')
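For context on why flattening was needed: `csv.reader` yields one list per row, so a one-column file comes back as a list of lists such as `[['Company'], ['Google'], ['Teva']]`. A short sketch of the nested comprehension, next to the equivalent `itertools.chain.from_iterable`, with the List.csv sample inlined via `StringIO` as a stand-in for the opened file:

```python
import csv
import io
from itertools import chain

# csv.reader yields one list per row, so a single column becomes a list of lists.
rows = list(csv.reader(io.StringIO("Company\nGoogle\nTeva\n")))
print(rows)  # [['Company'], ['Google'], ['Teva']]

# Nested comprehension: outer loop over rows, inner loop over the values in each row.
flat = [val for sublist in rows for val in sublist]
# Same result with itertools.chain.from_iterable.
flat_chain = list(chain.from_iterable(rows))

tickers = flat[1:]  # skip the 'Company' header
print(tickers)  # ['Google', 'Teva']
```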

Using pandas and a list comprehension, you can do the following:

import pandas as pd

# List of csv to retrieve
list_csv = pd.read_csv('../List.csv').tolist()
# List of dates
dates = pd.read_csv('../Dates.txt').tolist()
# Load only the csv's in the list
df = pd.concat([pd.read_csv(f'../{ticker}.csv', index_col='Date', usecols=['Date', 'Low']).rename(columns={'Low': ticker})
                for ticker in list_csv], axis=1)
# Filter dates
df = df[df.index.isin(dates)]
# Write to a new csv
df.to_csv('../merged_file.csv')

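Lines 1 and 2 give the error: 'DataFrame' object has no attribute 'tolist'. So I changed them to
pd.read_csv('List.csv').values.tolist()
However, the third line
df = pd.concat([pd.read_csv(f'D:/SO/dir/{ticker}.csv', index_col='Date', usecols=['Date', 'Low']).rename(columns={'Low': ticker}) for ticker in list_csv], axis=1)
gives the following error: File b"D:/SO/dir/['Google'].csv" does not exist: b"D:/SO/dir/['Google'].csv"

The fact that you see 'Google' inside brackets makes me think you have a list of lists rather than a list of tickers. Since I don't know the format of your data, you'll need to work with that format. For example, you can load the file, access the column to get a pandas Series, and then call .tolist() as I intended.

Done! Thanks for your help!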
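A sketch of the answer's pipeline with the suggested fix applied (read the `Company` column as a Series, then `.tolist()`), assuming the column headers shown in the question. File contents are inlined via `StringIO`, and `load` is a hypothetical helper standing in for `f'D:/SO/dir/{ticker}.csv'`. `sort_index()` is added because the order of the outer-joined index produced by `pd.concat` is not guaranteed to be chronological.

```python
from io import StringIO

import pandas as pd

# Stand-ins for List.csv, Dates.txt and the per-company files (samples from the question).
LIST_CSV = "Company\nGoogle\nTeva\n"
DATES_TXT = ("Date,\n2000-01-01,\n2000-01-02,\n2000-01-03,\n"
             "2000-01-06,\n2000-01-07,\n2000-01-08,\n2000-01-09,\n")
FILES = {
    "Google": "Date,Open,High,Low,Close\n2000-01-06,15,32,33.7,49.2\n"
              "2000-01-07,33.1,10.1,57.3,62\n2000-01-10,221,62.4,66.9,790.5\n",
    "Teva": "Date,Open,High,Low,Close\n2000-01-01,115,312,332.7,449.2\n"
            "2000-01-02,33.1,10.1,59.3,662\n2000-01-03,22.1,623.4,663.9,794.5\n"
            "2000-01-06,34.3,13.78,43.2,52.1\n2000-01-07,703.2,504.0,121.6,879.4\n",
}


def load(ticker):
    """Hypothetical helper: returns a file-like object instead of a real path."""
    return StringIO(FILES[ticker])


# Access the column to get a Series, then .tolist() gives a flat list of strings.
list_csv = pd.read_csv(StringIO(LIST_CSV))["Company"].tolist()
dates = pd.read_csv(StringIO(DATES_TXT), usecols=["Date"])["Date"].tolist()

# One 'Low' column per ticker, aligned on the Date index (outer join).
df = pd.concat(
    [pd.read_csv(load(ticker), index_col="Date", usecols=["Date", "Low"])
       .rename(columns={"Low": ticker})
     for ticker in list_csv],
    axis=1,
)

# Keep only the dates listed in Dates.txt, in chronological order.
df = df[df.index.isin(dates)].sort_index().rename_axis("Date")
print(df.to_csv())  # pass a path such as 'merged_file.csv' to write to disk
```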