Python: Merge multiple CSV files on 'Date' and save to disk
I am completely new to Python and am trying to write code for the following problem:
A) A directory contains multiple *.csv files, all with the same column headers and structure. Example file names: Google.csv, Alphabet.csv, Teva.csv, Bosch.csv
Example contents of the file named Google.csv:
Date,Open,High,Low,Close
2000-01-06,15,32,33.7,49.2
2000-01-07,33.1,10.1,57.3,62
2000-01-10,221,62.4,66.9,790.5
2000-01-11,3.3,1.78,43.2,52.1
2000-01-12,73.2,54.0,121.6,89.4
Example contents of the file named Teva.csv:
Date,Open,High,Low,Close
2000-01-01,115,312,332.7,449.2
2000-01-02,33.1,10.1,59.3,662
2000-01-03,22.1,623.4,663.9,794.5
2000-01-06,34.3,13.78,43.2,52.1
2000-01-07,703.2,504.0,121.6,879.4
B) There is a file "List.csv" that contains some company names, a subset of the csv files in the directory mentioned above. Example contents:
Company
Google
Teva
C) There is another file, "Dates.txt", that contains only some dates. Example contents:
Date,
2000-01-01,
2000-01-02,
2000-01-03,
2000-01-06,
2000-01-07,
2000-01-08,
2000-01-09,
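My goal is to merge only those *.csv files (A) that are listed in List.csv (B), using the "Date" column from Dates.txt (C) as the key, select only the column titled "Low", and save the result to disk as a csv file.
The final csv file saved to disk should look like this: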
Date,Google,Teva
2000-01-01,,332.7
2000-01-02,,59.3
2000-01-03,,663.9
2000-01-06,33.7,43.2
2000-01-07,57.3,121.6
Here is the code I have managed to piece together:
import os; import numpy as np; import csv; import pandas as pd; from shutil import copyfile
pd.set_option('display.max_rows', 500); pd.set_option('display.max_columns', 500); pd.set_option('display.width', 1000)
os.chdir('D:/SO/'); #print (os.getcwd())
open('temp.txt', 'a').close()
dst = 'Dates.txt'; temp1 = 'temp.txt'
path = "D:/SO/dir/"; directory = os.fsencode(path)
with open('temp.txt', 'w', newline='') as temp_date:
    copyfile(dst, temp1)
f1 = pd.read_csv('Dates.txt', index_col = 1); df1 = pd.DataFrame(f1); # Read the dates in Dates.txt for joining
with open('List.csv','r') as mylist:
    data = csv.reader(mylist, delimiter = ",")
    #next(data, None) # discard the header
    for i in data:
        c =i[0] + '.csv'; #print (c)#Add .csv to each line (CompanyName) in List.txt for searching the directory
        for file in os.listdir(path): # Search for the file in directory
            if c in file: # if found,
                print (file)
                f2 = pd.read_csv(os.path.join(path, file)); df2 = pd.DataFrame(f2); #print(df2.head(5))
                f3= f1.merge(f2, how='left',on=['Date']); df3 = pd.DataFrame(f3);
                df3 = df3.drop(df3.columns[[1,2,4]], axis=1); print(df3.head(10), '\n') # merge
                continue
Output so far:
Google.csv
         Date   Low
0  2000-01-01   NaN
1  2000-01-02   NaN
2  2000-01-03   NaN
3  2000-01-06  33.7
4  2000-01-07  57.3
5  2000-01-08   NaN
6  2000-01-09   NaN

Teva.csv
         Date    Low
0  2000-01-01  332.7
1  2000-01-02   59.3
2  2000-01-03  663.9
3  2000-01-06   43.2
4  2000-01-07  121.6
5  2000-01-08    NaN
6  2000-01-09    NaN
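Question:
The code above does join/merge Dates.txt with each required file, but separately. What I need is a single csv file with the dates in the first column, the first company in the second column, the second company in the third column, and so on. Can anyone help? I am completely new to Python and could not find a question on this forum that covers this problem.
Using Python 3.8.0 on Windows.
Update: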
As suggested, by converting the list of lists into a flat list, I was able to achieve what I wanted:
with open('temp.txt', 'r') as List_txt:
    list_csv = csv.reader(List_txt); #print(reader, '\n');
    flat_list = [val for sublist in list_csv for val in sublist]; #print(flat_list, '\n');
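To illustrate why the flattening step is needed, here is a minimal sketch. It assumes a stand-in for List.csv held in memory (the `sample` variable is hypothetical, not part of the original code): `csv.reader` yields one list per row, so a single-column file comes back as a list of one-element lists.

```python
import csv
import io

# Stand-in for List.csv (an assumption for illustration); io.StringIO lets
# the sketch run without any file on disk.
sample = io.StringIO("Company\nGoogle\nTeva\n")

reader = csv.reader(sample)
next(reader, None)        # skip the 'Company' header row
rows = list(reader)       # one list per row: [['Google'], ['Teva']]

# Flatten the list of lists into a simple list of company names.
flat_list = [val for sublist in rows for val in sublist]
print(flat_list)          # ['Google', 'Teva']
```

Each entry of `flat_list` is then a plain string that can be used directly to build a file name such as Google.csv.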
Using pandas and a list comprehension, you can do the following:
import pandas as pd
# List of csv to retrieve
list_csv = pd.read_csv('../List.csv').tolist()
# List of dates
dates = pd.read_csv('../Dates.txt').tolist()
#Load only the csv's in the list
df = pd.concat([pd.read_csv(f'../{ticker}.csv', index_col='Date', usecols=['Date', 'Low']).rename(columns={'Low': ticker}) for ticker in list_csv], axis=1)
# Filter dates
df = df[df.index.isin(dates)]
# Write to a new csv
df.to_csv('../merged_file.csv')
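As a side note on the concat and date-filter steps, the sketch below uses small in-memory frames (made-up subsets of the sample data, so no files are needed) to show how `pd.concat(..., axis=1)` aligns rows on the index, and how filtering with `isin` differs from `reindex`. Which of the two is wanted depends on whether dates without any data should appear in the output; that choice is an assumption left to the reader.

```python
import pandas as pd

# Hypothetical in-memory stand-ins for the 'Low' columns of two company files.
google = pd.DataFrame(
    {"Google": [33.7, 57.3, 66.9]},
    index=pd.Index(["2000-01-06", "2000-01-07", "2000-01-10"], name="Date"),
)
teva = pd.DataFrame(
    {"Teva": [332.7, 43.2, 121.6]},
    index=pd.Index(["2000-01-01", "2000-01-06", "2000-01-07"], name="Date"),
)
dates = ["2000-01-01", "2000-01-06", "2000-01-07", "2000-01-08"]

# axis=1 places the frames side by side and aligns rows on the index,
# keeping the union of all dates (an outer join).
combined = pd.concat([google, teva], axis=1).sort_index()

# isin keeps only rows whose date is listed; a listed date that has no data
# in any file (2000-01-08 here) does not appear at all.
filtered = combined[combined.index.isin(dates)]

# reindex yields exactly one row per listed date, with NaN where nothing matched.
reindexed = combined.reindex(dates)
```

With the sample files from the question, the `isin` variant matches the desired output, which omits 2000-01-08 and 2000-01-09.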
The first two statements raise an error saying that the DataFrame object has no attribute 'tolist'. So I changed them to
pd.read_csv('List.csv').values.tolist()
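However, the third statement,
df = pd.concat([pd.read_csv(f'D:/SO/dir/{ticker}.csv', index_col='Date', usecols=['Date', 'Low']).rename(columns={'Low': ticker}) for ticker in list_csv], axis=1)
gives the following error: File b"D:/SO/dir/['Google'].csv" does not exist: b"D:/SO/dir/['Google'].csv"
The fact that you see 'Google' inside brackets makes me think you have a list of lists rather than a list of tickers. Since I don't know the format of your data, you will have to adapt the code to it. For example, you could load the file, select the column to get a pandas Series, and then call .tolist() on it, as I intended.
Done! Thanks for your help!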
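Putting the thread together, one possible end-to-end version might look like the sketch below. It is not the answerer's exact code: the function name `merge_low_columns` and its parameters are hypothetical, and it assumes that List.csv has a `Company` header and that Dates.txt has a `Date` header with a trailing comma, as in the samples above.

```python
from pathlib import Path

import pandas as pd


def merge_low_columns(data_dir, list_file, dates_file, out_file):
    """Merge the 'Low' column of each listed company's CSV, keyed on 'Date'."""
    # Selecting the column gives a Series, so .tolist() returns a flat list
    # such as ['Google', 'Teva'] rather than [['Google'], ['Teva']].
    companies = pd.read_csv(list_file)["Company"].tolist()

    # usecols skips the empty second column created by the trailing comma.
    dates = pd.read_csv(dates_file, usecols=["Date"])["Date"].tolist()

    # One single-column frame per company, indexed by Date and named after it.
    frames = [
        pd.read_csv(
            Path(data_dir) / f"{name}.csv",
            index_col="Date",
            usecols=["Date", "Low"],
        ).rename(columns={"Low": name})
        for name in companies
    ]

    # Align all companies on Date, then keep only the dates listed in Dates.txt.
    merged = pd.concat(frames, axis=1).sort_index().rename_axis("Date")
    merged = merged[merged.index.isin(dates)]

    merged.to_csv(out_file)  # missing values are written as empty fields
    return merged
```

With the paths from the question, a call could look like `merge_low_columns('D:/SO/dir', 'D:/SO/List.csv', 'D:/SO/Dates.txt', 'D:/SO/merged_file.csv')`. Dates are compared as plain strings here, which works as long as every file uses the same YYYY-MM-DD format.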