Python: reading multiple tables from a single CSV file with pandas
Suppose I have a CSV file like the following:
Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space
...
The contents of the file are evenly spaced, as shown. I want each Month/Sales/Revenue section as its own DataFrame. I know I can do this manually as follows:
df_Jack = pd.read_csv('./sales.csv', skiprows=3, nrows=3)
df_Jill = pd.read_csv('./sales.csv', skiprows=12, nrows=3)
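I'm not too worried about the names of the DataFrames, since I think I can handle that myself. I just don't know how to iterate through the evenly spaced file, find each sales table, and store it as its own DataFrame.
Thanks in advance for your help.

How about creating a list of DataFrames?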
from io import StringIO
import pandas as pd
csvfile = StringIO("""Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
Jun,23,$4600
### Blank File Space
### Blank File Space""")
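# Read every line into three named columns; lines with fewer fields are padded with NaN.
# on_bad_lines='skip' replaces error_bad_lines=False, which was removed in pandas 2.0 (requires pandas >= 1.3).
df = pd.read_csv(csvfile, sep=',', on_bad_lines='skip', names=['Month','Sales','Revenue'])
# Drop the metadata and filler lines (they contain NaN), then the repeated header rows.
df1 = df.dropna()
df1 = df1[df1.Month != 'Month']
# Slice the remaining rows into consecutive chunks of three, one per table.
listofdf = [df1[i:i+3] for i in range(0, df1.shape[0], 3)]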
print(listofdf[0])
Output:
Month Sales Revenue
4 Jan 51 $1000
5 Feb 20 $1050
6 Mar 100 $10000
print(listofdf[1])
Month Sales Revenue
13 Apr 11 $1000
14 May 55 $3000
15 Jun 23 $4600
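The slicing above relies on every table having exactly three rows. If the blocks could differ in length, one possible variation is to number the blocks by their repeated header rows and group on that number. This is only a sketch: the shortened second block and the names `raw`, `is_header`, `block_id`, `keep`, and `tables` are made up for illustration.

```python
from io import StringIO

import pandas as pd

# Same layout as the sample file, but the second block has only two rows
# (an assumption made here to show that the block length no longer matters).
raw = StringIO("""Name: Jack
Place: Binghampton
Age:27
Month,Sales,Revenue
Jan,51,$1000
Feb,20,$1050
Mar,100,$10000
### Blank File Space
### Blank File Space
Name: Jill
Place: Hamptonshire
Age: 49
Month,Sales,Revenue
Apr,11,$1000
May,55,$3000
""")

cols = ['Month', 'Sales', 'Revenue']
df = pd.read_csv(raw, names=cols)

# Each repeated header row starts a new block; cumsum turns that into a block number.
is_header = df['Month'].eq('Month')
block_id = is_header.cumsum()

# Keep only complete data rows: metadata and filler lines contain NaN,
# and header rows are excluded explicitly.
keep = df[cols].notna().all(axis=1) & ~is_header

# One DataFrame per block, each with a fresh 0-based index.
tables = [g.reset_index(drop=True) for _, g in df[keep].groupby(block_id[keep])]

print(tables[1])
```

Because everything is read into the same columns as the header text, the values in each resulting table are still strings.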
Obviously you could do this:
dfs = [pd.read_csv('./sales.csv', skiprows=i, nrows=3) for i in range(3, n, 9)]
# where n is your expected end line...
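If you don't know `n` in advance, one option is to count the lines in the file first. The following is a rough sketch rather than a tested recipe for your real data: `read_blocks` and its parameters are hypothetical names, and it assumes every block is exactly 9 physical lines (3 metadata lines, 1 header, 3 data rows, 2 blank lines), with the first header on the fourth line.

```python
import os
import tempfile

import pandas as pd


def read_blocks(path, first_header=3, block_len=9, nrows=3):
    """Read one DataFrame per evenly spaced block (hypothetical helper)."""
    with open(path) as fh:
        n = sum(1 for _ in fh)  # total number of physical lines in the file
    # skiprows counts physical lines, so blank separator lines are included.
    return [
        pd.read_csv(path, skiprows=start, nrows=nrows)
        for start in range(first_header, n, block_len)
    ]


# Demo on a temporary copy of the sample data, with real blank lines between blocks.
sample = (
    "Name: Jack\nPlace: Binghampton\nAge:27\n"
    "Month,Sales,Revenue\nJan,51,$1000\nFeb,20,$1050\nMar,100,$10000\n"
    "\n\n"
    "Name: Jill\nPlace: Hamptonshire\nAge: 49\n"
    "Month,Sales,Revenue\nApr,11,$1000\nMay,55,$3000\nJun,23,$4600\n"
)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sales.csv")
    with open(path, "w") as fh:
        fh.write(sample)
    dfs = read_blocks(path)

print(dfs[1])
```

Note that this reopens the file once per block, and that extra trailing lines at the end of the file could produce a start offset with nothing left to read, so it is best suited to small, strictly regular files.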
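But another approach is to read the CSV yourself and pass the data to pandas:
import pandas as pd

dfs = {}  # maps each person's name to their DataFrame
with open('./sales.csv', 'r') as file:
    streaming = True
    while streaming:
        name = file.readline().rstrip().replace('Name: ', '')
        if not name:  # end of file or a stray blank line: stop
            break
        for _ in range(2):  # skip the Place and Age lines
            file.readline()
        headers = file.readline().rstrip().split(',')
        data = [file.readline().rstrip().split(',') for _ in range(3)]
        dfs[name] = pd.DataFrame.from_records(data, columns=headers)
        for _ in range(2):  # consume the two blank separator lines
            streaming = file.readline()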
I'll admit this is rather brutal and inelegant compared with the other answer... but it works. It actually gives you the DataFrames by name in a dictionary:
>>> dfs['Jack']
Month Sales Revenue
0 Jan 51 $1000
1 Feb 20 $1050
2 Mar 100 $10000
>>> dfs['Jill']
Month Sales Revenue
0 Apr 11 $1000
1 May 55 $3000
2 Jun 23 $4600
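One thing to keep in mind with this approach: because the rows were split by hand, every value in these DataFrames is a string, including Sales and Revenue. If you need numbers, a conversion step along these lines might help. It is a sketch only: `to_numeric_columns` is a made-up helper, and it assumes Revenue always looks like `$` followed by digits.

```python
import pandas as pd

# A frame shaped like dfs['Jack'] above: every value is a string.
jack = pd.DataFrame.from_records(
    [['Jan', '51', '$1000'], ['Feb', '20', '$1050'], ['Mar', '100', '$10000']],
    columns=['Month', 'Sales', 'Revenue'],
)


def to_numeric_columns(frame):
    """Return a copy with Sales and Revenue converted to numbers (hypothetical helper)."""
    out = frame.copy()
    out['Sales'] = pd.to_numeric(out['Sales'])
    # Strip the leading dollar sign before converting.
    out['Revenue'] = pd.to_numeric(out['Revenue'].str.lstrip('$'))
    return out


jack_clean = to_numeric_columns(jack)
total_revenue = int(jack_clean['Revenue'].sum())
print(total_revenue)  # 12050
```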