Python: converting a large string with DataFrame-like content into a DataFrame


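I have a file that I read with Python's read function. It returns one large string that looks essentially like a DataFrame but is still just a string. For example, it might look like this: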

1609441 test.test1.test3    1/15.34 -1  100 622 669
160441  test.test1.test3    2/11.101    -1  100 140216  177363
16041   test2.test8.test6   2/15.34 -1  100 2791    2346
160441  test.test7.test5    2/15.34 1   100 Bin Any 5   1794    2346
1609441 test4.test4.test4   2/15.34 1   100 E   Any 5   997 0
1642    test4.test3.test1   28.0.101    -1  100 5409155 10357332

If it were a real DataFrame, it would look like this:

1609441 test.test1.test3    1/15.34   -1    100   622       669
160441  test.test1.test3    2/11.101  -1    100   140216    177363
16041   test2.test8.test6   2/15.34   -1    100   2791      2346
160441  test.test7.test5    2/15.34   1     100   Bin       A          5    1794    2346
1609441 test4.test4.test4   2/15.34   1     100   E         A          5    997     0
1642    test4.test3.test1   28.0.101  -1    1     155       7332

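As you can see, the data varies a lot. Some rows have 10 fields, others only 7, and so on. Again, this is one large text string. I tried read_csv and read_fwf, but without success. Ideally, it would simply create a DataFrame with a fixed number of columns (I know the maximum number of columns) and fill in NaN wherever a value is missing.

Can this be achieved in any way?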

I tried read_csv, and it seems to work well:

import pandas as pd

t = '''1609441 test.test1.test3    1/15.34 -1  100 622 669
160441  test.test1.test3    2/11.101    -1  100 140216  177363
16041   test2.test8.test6   2/15.34 -1  100 2791    2346
160441  test.test7.test5    2/15.34 1   100 Bin Any 5   1794    2346
1609441 test4.test4.test4   2/15.34 1   100 E   Any 5   997 0
1642    test4.test3.test1   28.0.101    -1  100 5409155 10357332'''

with open('test.txt', 'w') as f:
    f.write(t)
    
pd.read_csv('test.txt', delim_whitespace=True, names=['1', '2', '3' ,'4', '5', '6' ,'7' ,'8', '9', '10'])

Does this not work on the full dataset?

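I was just wondering whether this could be done without first saving the test.txt file. That seems like an extra, unnecessary step that makes everything take longer, because all those lines have to be written to a file on disk instead of simply staying in RAM.

You can use the StringIO approach described in this answer.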
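The linked answer is not reproduced here, so the following is only a minimal sketch of the StringIO idea, under a few assumptions: the string comes from the sample in the question, ten is the maximum number of fields, and the column labels '1' to '10' are placeholders carried over from the answer above. io.StringIO wraps the string in a file-like object, so read_csv can parse it directly from memory without a temporary file.

```python
import io

import pandas as pd

# Sample text copied from the question; in practice this would be
# the large string returned by read().
text = """1609441 test.test1.test3    1/15.34 -1  100 622 669
160441  test.test1.test3    2/11.101    -1  100 140216  177363
16041   test2.test8.test6   2/15.34 -1  100 2791    2346
160441  test.test7.test5    2/15.34 1   100 Bin Any 5   1794    2346
1609441 test4.test4.test4   2/15.34 1   100 E   Any 5   997 0
1642    test4.test3.test1   28.0.101    -1  100 5409155 10357332"""

# Placeholder column labels; ten is assumed to be the maximum field count.
columns = [str(i) for i in range(1, 11)]

# StringIO presents the in-memory string as a file-like object, so nothing
# is written to disk. Rows with fewer than ten fields are padded with NaN.
df = pd.read_csv(io.StringIO(text), sep=r"\s+", names=columns)
print(df)
```

Because the seven-field rows leave columns '8' to '10' empty, those cells come back as NaN, which matches the fixed-width result the question asks for.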