Python: how to create multiple DataFrames from a file based on "\n" in the file?

python pandas


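I want to create a separate DataFrame wherever a \n is present in the file. The value above the \n should be the header of that particular DataFrame. I looked at an existing solution, but it works differently. Any help is appreciated, thanks.

My file looks like this:

file1.txt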

10C.vcf

Allele
Consequence
IMPACT
SYMBOL
Gene
Feature_type
Feature
1P1.vcf

Allele
Consequence
IMPACT
SYMBOL
Gene
Feature_type
Feature
13C.vcf

Allele
Consequence
IMPACT
SYMBOL
Gene
Feature_type
Feature
40C.vcf

Allele
Consequence
IMPACT
SYMBOL
Gene
Feature_type
Feature

Expected output:

df1

   10C.vcf
0  Allele
1  Consequence
2  IMPACT
3  SYMBOL
4  Gene
5  Feature_type
6  Feature

df2

   1P1.vcf
0  Allele
1  Consequence
2  IMPACT
3  SYMBOL
4  Gene
5  Feature_type
6  Feature

df3

   13C.vcf
0  Allele
1  Consequence
2  IMPACT
3  SYMBOL
4  Gene
5  Feature_type
6  Feature

df4

   40C.vcf
0  Allele
1  Consequence
2  IMPACT
3  SYMBOL
4  Gene
5  Feature_type
6  Feature


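Here is one way to do what you want. I have added comments to the code so you can follow what I am doing.

Approach: create a global variable, then use globals() to define the DataFrames dynamically. After that you can access each one by name, since it is now a dynamically created DataFrame.

If it is not clear how this works, see the Stack Overflow solution I wrote on the topic.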

The output will be:

df_10C_vcf:

0          Allele
1     Consequence
2          IMPACT
3          SYMBOL
4            Gene
5    Feature_type
6         Feature

df_1P1_vcf:

0          Allele
1     Consequence
2          IMPACT
3          SYMBOL
4            Gene
5    Feature_type
6         Feature

df_13C_vcf:

0          Allele
1     Consequence
2          IMPACT
3          SYMBOL
4            Gene
5    Feature_type
6         Feature

df_40C_vcf:

0          Allele
1     Consequence
2          IMPACT
3          SYMBOL
4            Gene
5    Feature_type
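
The globals() approach described above could look roughly like the following. This is only a sketch, not the answerer's code: it assumes that every line ending in ".vcf" starts a new block, it uses an inline sample_text instead of file1.txt, and the names sample_text, blocks and var_name are made up for illustration.

```python
import pandas as pd

# Stand-in for the contents of file1.txt (two blocks only, for brevity).
sample_text = "10C.vcf\n\nAllele\nConsequence\n1P1.vcf\n\nAllele\nConsequence\n"

# Assumption: a line ending in ".vcf" is a header that starts a new block.
blocks = {}
current = None
for line in sample_text.splitlines():
    line = line.strip()
    if not line:
        continue  # skip the blank line that follows each header
    if line.endswith(".vcf"):
        current = line
        blocks[current] = []
    elif current is not None:
        blocks[current].append(line)

# Define one module-level variable per block through globals().
# "." is not valid in a Python identifier, so it is replaced with "_".
for header, values in blocks.items():
    var_name = "df_" + header.replace(".", "_")
    globals()[var_name] = pd.DataFrame({header: values})

print(globals()["df_10C_vcf"])
```

Variables created this way are hard to track and easy to overwrite by accident, so a plain dictionary such as blocks is usually easier to work with than dynamically named globals.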


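import re
import pandas as pd
from io import StringIO

# 1. Read the txt file as a single string
with open('file1.txt', 'r') as f:
    txt = f.read()

# 2. Split the text in front of every "*.vcf" line
data = re.split(r'\n(?=[A-Za-z0-9]+\.vcf)', txt)
# or
# data = re.split(r'\n(?=.+\.vcf)', txt)
print(data)

# 3. Create a DataFrame from each chunk
# sep=r'\s+' is equivalent to the older delim_whitespace=True
for chunk in data:
    print(pd.read_csv(StringIO(chunk), sep=r'\s+'))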

Output

['10C.vcf\n\nAllele\nConsequence\nIMPACT\nSYMBOL\nGene\nFeature_type\nFeature', '1P1.vcf\n\nAllele\nConsequence\nIMPACT\nSYMBOL\nGene\nFeature_type\nFeature', '13C.vcf\n\nAllele\nConsequence\nIMPACT\nSYMBOL\nGene\nFeature_type\nFeature', '40C.vcf\n\nAllele\nConsequence\nIMPACT\nSYMBOL\nGene\nFeature_type\nFeature\n']


        10C.vcf
0        Allele
1   Consequence
2        IMPACT
3        SYMBOL
4          Gene
5  Feature_type
6       Feature
        1P1.vcf
0        Allele
1   Consequence
2        IMPACT
3        SYMBOL
4          Gene
5  Feature_type
6       Feature
        13C.vcf
0        Allele
1   Consequence
2        IMPACT
3        SYMBOL
4          Gene
5  Feature_type
6       Feature
        40C.vcf
0        Allele
1   Consequence
2        IMPACT
3        SYMBOL
4          Gene
5  Feature_type
6       Feature
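
If the DataFrames need to be kept rather than just printed, one option is to collect them in a dictionary keyed by header. A minimal sketch using the same regex split, assuming an inline sample_text in place of file1.txt; the names sample_text, chunks and frames are illustrative and not part of the answer:

```python
import re
from io import StringIO

import pandas as pd

# Stand-in for the contents of file1.txt (two blocks only, for brevity).
sample_text = "10C.vcf\n\nAllele\nConsequence\n1P1.vcf\n\nAllele\nConsequence\n"

# Split at each newline that is immediately followed by a "*.vcf" header.
chunks = re.split(r"\n(?=[A-Za-z0-9]+\.vcf)", sample_text)

# Parse every chunk and store the result under its header name.
frames = {}
for chunk in chunks:
    df = pd.read_csv(StringIO(chunk), sep=r"\s+")
    frames[df.columns[0]] = df

print(list(frames))
print(frames["1P1.vcf"])
```

Looking a frame up as frames["13C.vcf"] also avoids the problem that "13C.vcf" is not a valid variable name.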


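Could you write code that iterates over the lines of the file manually and builds a list with the contents of each DataFrame?

Why not read the whole file into a list and then split that list on \n\n? That gives you a list, and you can then loop over it and create a DataFrame for each item. I don't think you can use 13C.vcf as a DataFrame name, though. You would need to replace the characters that are not allowed in a variable name.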
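
The manual iteration suggested in the first comment might be sketched like this. It assumes that a header is any non-blank line directly followed by a blank line, and that blank lines appear nowhere else; split_blocks, sample_text and frames are hypothetical names used only for this example.

```python
import pandas as pd

def split_blocks(text):
    """Return {header: [values]} for text made of header/blank/values blocks."""
    lines = text.splitlines()
    blocks = {}
    current = None
    for i, line in enumerate(lines):
        if not line.strip():
            continue  # blank separator line
        # A non-blank line followed by a blank line is treated as a header.
        next_is_blank = i + 1 < len(lines) and not lines[i + 1].strip()
        if next_is_blank:
            current = blocks.setdefault(line.strip(), [])
        elif current is not None:
            current.append(line.strip())
    return blocks

# Stand-in for the contents of file1.txt (two blocks only, for brevity).
sample_text = "10C.vcf\n\nAllele\nConsequence\n1P1.vcf\n\nAllele\nConsequence\n"

blocks = split_blocks(sample_text)
frames = {header: pd.DataFrame({header: values}) for header, values in blocks.items()}

for header, df in frames.items():
    print(df)
```

Unlike the regex version, this does not depend on the headers ending in ".vcf", only on the blank line that follows them.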
