Pandas: reading multiple datasets from a single file
My text file contains the tables of each database. Is there a way for pandas to read this file and create a separate DataFrame for each database?
Database: ABC
+-----------------------------------------------+----------+------------+
| Tables | Columns | Total Rows |
+-----------------------------------------------+----------+------------+
| ApplicationUpdateBankLog | 13 | 0 |
| ChangeLogTemp | 12 | 1678363 |
| Sheet2$ | 10 | 359 |
| tempAllowApplications | 1 | 9 |
+-----------------------------------------------+----------+------------+
4 rows in set.
Database: XYZ
+--------------------------------------------------+----------+------------+
| Tables | Columns | Total Rows |
+--------------------------------------------------+----------+------------+
| BKP_QualificationDetails_12082014 | 14 | 7959877 |
| BillNotGeneratedCount | 11 | 2312 |
| VVshipBenefit | 19 | 197356 |
| VVBenefit_Bkup29012016 | 19 | 101318 |
+--------------------------------------------------+----------+------------+
4 rows in set.
You can use a dict comprehension to create a dict of DataFrames:
import pandas as pd
# pandas.compat.StringIO is not available in current pandas; use the standard library instead
from io import StringIO
temp=u"""Database: ABC
+-----------------------------------------------+----------+------------+
| Tables | Columns | Total Rows |
+-----------------------------------------------+----------+------------+
| ApplicationUpdateBankLog | 13 | 0 |
| ChangeLogTemp | 12 | 1678363 |
| Sheet2$ | 10 | 359 |
| tempAllowApplications | 1 | 9 |
+-----------------------------------------------+----------+------------+
4 rows in set.
Database: XYZ
+--------------------------------------------------+----------+------------+
| Tables | Columns | Total Rows |
+--------------------------------------------------+----------+------------+
| BKP_QualificationDetails_12082014 | 14 | 7959877 |
| BillNotGeneratedCount | 11 | 2312 |
| VVshipBenefit | 19 | 197356 |
| VVBenefit_Bkup29012016 | 19 | 101318 |
+--------------------------------------------------+----------+------------+
4 rows in set."""
# after testing, replace StringIO(temp) with the path to your file, e.g. 'filename.csv'
df = pd.read_csv(StringIO(temp), sep="|", names=['a', 'Tables', 'Columns', 'Total Rows'])
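The snippet above stops before `dfs` is defined. What follows is a minimal sketch of one way to build it, not necessarily the approach of the original answer: it walks the file line by line, groups the data rows under the preceding `Database: ...` line, and then builds one DataFrame per group with a dict comprehension. The names `sample`, `COLUMNS`, `parse_sections`, and `read_databases` are hypothetical, and `sample` is a trimmed copy of the question's data.

```python
from io import StringIO

import pandas as pd

# Trimmed copy of the question's data (assumption: the real file has the same layout).
sample = """Database: ABC
+----------------+----------+------------+
| Tables | Columns | Total Rows |
+----------------+----------+------------+
| ChangeLogTemp | 12 | 1678363 |
| Sheet2$ | 10 | 359 |
+----------------+----------+------------+
2 rows in set.
Database: XYZ
+----------------+----------+------------+
| Tables | Columns | Total Rows |
+----------------+----------+------------+
| VVshipBenefit | 19 | 197356 |
+----------------+----------+------------+
1 row in set.
"""

COLUMNS = ["Tables", "Columns", "Total Rows"]


def parse_sections(handle):
    """Group the table rows under the 'Database: ...' line that precedes them."""
    sections = {}
    current = None
    for line in handle:
        line = line.strip()
        if line.startswith("Database:"):
            current = line
            sections[current] = []
        elif line.startswith("|") and current is not None:
            cells = [cell.strip() for cell in line.strip("|").split("|")]
            if cells != COLUMNS:  # skip the header row repeated in every section
                sections[current].append(cells)
        # border lines ('+---+') and 'N rows in set.' fall through and are ignored
    return sections


def read_databases(handle):
    """Return a dict mapping each 'Database: ...' line to its own DataFrame."""
    sections = parse_sections(handle)
    return {
        name: pd.DataFrame(rows, columns=COLUMNS).astype(
            {"Columns": int, "Total Rows": int}
        )
        for name, rows in sections.items()
    }


# For a real file, pass an open file object instead of StringIO(sample).
dfs = read_databases(StringIO(sample))
```

The dict keys are the full header lines, so a single database is looked up as `dfs['Database: ABC']`. Parsing by hand avoids relying on how `read_csv` treats the border and summary lines, at the cost of a few more lines of code.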
print(dfs['Database: ABC'])
                     Tables  Columns  Total Rows
0  ApplicationUpdateBankLog       13           0
1             ChangeLogTemp       12     1678363
2                   Sheet2$       10         359
3     tempAllowApplications        1           9
print(dfs['Database: XYZ'])
                              Tables  Columns  Total Rows
0  BKP_QualificationDetails_12082014       14     7959877
1              BillNotGeneratedCount       11        2312
2                      VVshipBenefit       19      197356
3             VVBenefit_Bkup29012016       19      101318