How to format input from a text file into a defaultdict in Python
The text file has more than 50K lines in this format:
M:org.apache.mahout.common.RandomUtilsTest:testHashDouble():['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)']
M:org.apache.mahout.common.RandomUtilsTest:testHashFloat():['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction():['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2():['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
You can use ast.literal_eval to convert the string form of the list into an actual list:
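from collections import defaultdict
import ast

d = defaultdict(list)
with open('tst.txt') as fp:
    for line in fp:
        # Key: everything up to and including the ')' of the first '):'
        k = line[:line.index('):') + 1]
        # Value: the list literal that starts at the '[' of ':['
        v = ast.literal_eval(line[line.index(':[') + 1:])
        d[k] += v
print(dict(d))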
Output (one entry per line for readability):
{
'M:org.apache.mahout.common.RandomUtilsTest:testHashDouble()': ['(O)java.lang.Double:<init>(double)', '(M)java.lang.Double:hashCode()', '(S)org.apache.mahout.common.RandomUtils:hashDouble(double)', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(long,long)', '(O)java.lang.Double:<init>(double)'],
'M:org.apache.mahout.common.RandomUtilsTest:testHashFloat()': ['(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(M)java.util.Random:nextLong()', '(S)org.apache.mahout.common.RandomUtilsTest:assertEquals(java.lang.String,long,long)'],
'M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction()': ['(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.Vector,org.apache.mahout.math.function.DoubleDoubleFunction)', '(O)java.lang.StringBuilder:<init>()', '(I)org.apache.mahout.math.Vector:getQuick(int)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)'],
'M:org.apache.mahout.math.AbstractVectorTest:testAssignBinaryFunction2()': ['(S)org.apache.mahout.math.function.Functions:plus(double)', '(I)org.apache.mahout.math.Vector:assign(org.apache.mahout.math.function.DoubleFunction)', '(S)org.apache.mahout.math.AbstractVectorTest:assertEquals(java.lang.String,double,double,double)']
}
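The json module can be used to store a Python dictionary in a file and to load that file later, parsing it back into the same data types it had before it was written.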
d = {}
with open('filtered.txt') as infile:
    for line in infile:
        # Split once on the '():' that ends the method name.
        key, value = line.strip().split("():", 1)
        key = "{}()".format(key)
        d[key] = value  # value is still a string here, not a list
print(d)

# It is better and easier to write the data to the file with the json module.
import json
with open('data.txt', 'w') as json_file:
    json.dump(d, json_file)

# Later you can read the file back with the json module.
with open('data.txt') as f:
    # data is a dictionary again, which is easy to work with.
    data = json.load(f)
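If the values are kept as lists rather than strings, the same JSON round trip also preserves them. The following is a minimal sketch, not the original poster's code: the file name calls.json, the temporary directory, and the sample key are assumptions made for illustration.

```python
import json
import os
import tempfile
from collections import defaultdict

# Hypothetical sample data in the same shape as the question's dictionary.
calls_by_method = defaultdict(list)
calls_by_method["M:pkg.FooTest:testA()"] += ["(M)pkg.Foo:a()", "(S)pkg.Foo:b(int)"]

# Write to a throwaway location so the sketch does not touch real files.
path = os.path.join(tempfile.mkdtemp(), "calls.json")

# defaultdict is a dict subclass, so json.dump accepts it directly.
with open(path, "w", encoding="utf-8") as json_file:
    json.dump(calls_by_method, json_file, indent=2)

# json.load returns a plain dict; wrap it to get defaultdict behaviour back.
with open(path, encoding="utf-8") as json_file:
    restored = defaultdict(list, json.load(json_file))
```

Writing the dictionary this way in the first place would avoid having to parse the custom text format at all.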
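Comments:
Can you provide an example of the dict output format you expect for the input you just described? Just to avoid confusion.
The dictionary would hold the same content as above. The sample lines listed above come from a text file whose data was written from a dictionary created by the additional code I added above. Now I am trying to parse the data in the text file back into a dictionary for further use.
The best solution here is to improve the output format! Just use an existing format such as JSON.
Does this answer your question?
k, v = line[:line.index('():') - 1], ast.literal_eval(line[line.index(':[') + 1:]) raises ValueError: substring not found
Can I contact you directly? I sent you a message on Twitter.
@kit Some values in the file are malformed and do not contain the [ symbol.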
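Since some lines lack the [ symbol, line.index(':[') raises ValueError on them. One possible way to cope, sketched here under the assumption that malformed lines should be skipped and reported rather than repaired, is to use str.partition and catch parse errors. The helper names parse_line and load_calls and the sample lines are hypothetical, not part of the original answers.

```python
import ast
from collections import defaultdict


def parse_line(line):
    """Return (key, calls) for one line, or None if the line is malformed."""
    key, sep, rest = line.strip().partition(":[")
    if not sep:
        return None  # no ":[" marker, so there is no list to parse
    try:
        calls = ast.literal_eval("[" + rest)
    except (ValueError, SyntaxError):
        return None  # the list part is truncated or otherwise invalid
    if not isinstance(calls, list):
        return None  # e.g. trailing text turned the literal into a tuple
    return key, calls


def load_calls(lines):
    """Build a defaultdict(list) and collect the numbers of skipped lines."""
    result = defaultdict(list)
    skipped = []
    for number, line in enumerate(lines, start=1):
        parsed = parse_line(line)
        if parsed is None:
            skipped.append(number)
            continue
        key, calls = parsed
        result[key].extend(calls)
    return result, skipped


# Hypothetical sample lines; the second one has no list at all.
sample = [
    "M:pkg.FooTest:testA():['(M)pkg.Foo:a()', '(S)pkg.Foo:b(int)']\n",
    "M:pkg.FooTest:testB():\n",
    "M:pkg.FooTest:testA():['(M)pkg.Foo:c()']\n",
]
calls_by_method, skipped = load_calls(sample)
```

To run it on the real data, pass the open file object instead of the sample list, for example load_calls(open('tst.txt')), and inspect skipped to see which lines need fixing by hand.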