Python: getting the section under each heading as heading:section pairs in a dict
I have a text file like this:
APPENDIX -- GLOSSARY
-------------------------------------------------------------------
  Asymmetrical Encryption:
    Encryption using a pair of keys--the first encrypts a
  Big-O Notation, Complexity:
    Big-O notation is a way of describing the governing.
    In noting complexity orders, constants and multipliers are
    conventionally omitted, leaving only the dominant factor.
    Compexities one often sees are:
    #*------------- Common Big-O Complexities ---------------#
    O(1) constant
  Birthday Paradox:
    The name "birthday paradox" comes from the fact--surprising
  Cyclic Redundancy Check (CRC32):
    See Hash. Based on mod 2 polynomial operations, CRC32 produces a
    32-bit "fingerprint" of a set of data.
  Idempotent Function:
    The property that applying a function to its return value
    'G=lambda x:F(F(F(...F(x)...)))'.
I want to parse the text file so that it produces output like this:
{'Asymmetrical Encryption': 'Encryption using a pair of keys--the first encrypts a',
 'Big-O Notation, Complexity': 'Big-O notation is a way of describing the governing. In noting complexity orders, constants and multipliers are conventionally omitted, leaving only the dominant factor. Compexities one often sees are: #*------------- Common Big-O Complexities ---------------# O(1) constant',
 ... and so on}
This is what I did:
dic = {}
with open('appendix.txt', 'r') as f:
    data = f.read()
    lines = data.split(':\n\n')
    for line in lines:
        res = line.split(':\n ')
        field = res[0]
        val = res[1:]
        dic[field] = val
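This mishandles the ':' characters that appear inside the text, and the heading ends up empty. The output is not correct.

If you want to parse the text based on its leading whitespace, you can use a script like this: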
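class spaceParser(object):
    def __init__(self):
        # per-instance state, so separate parsers do not share one dict
        self.result = {}
        self.last_title = ""
        self.last_content = ""

    def process_content(self, content_line):
        if self.last_title:
            # join continuation lines with a single space
            text = content_line.strip()
            if text:
                self.last_content = (self.last_content + " " + text).strip()
            self.result[self.last_title] = self.last_content

    def process_title(self, content_line):
        # drop the trailing colon so the key is the bare heading
        self.last_title = content_line.strip().rstrip(":")
        self.last_content = ""

    def parse(self, raw_file):
        for line in raw_file:
            # look for patterns based on indentation
            # (assumes titles use two leading spaces and content uses four)
            if line[0:4] == "    ":
                # content type
                self.process_content(line)
            elif line[0:2] == "  ":
                # title type
                self.process_title(line)
            else:
                # other types (document header, rule line, blank lines)
                pass
        # store the last section
        self.process_content("")

parser = spaceParser()
with open('appendix.txt', 'r') as raw_file:
    parser.parse(raw_file)
print(parser.result)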
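
For a quick check without a file on disk, the same indentation rule can be sketched as a plain function. This is only an illustration, not the answer's own code: the name parse_glossary, its title_indent and body_indent parameters, and the two- and four-space widths are assumptions about how the file is laid out.

```python
import io

def parse_glossary(lines, title_indent=2, body_indent=4):
    """Map each heading to its section text, based on leading-space counts.

    The indent widths are assumptions about the file layout.
    """
    sections = {}
    title = None
    for line in lines:
        text = line.strip()
        if not text:
            continue  # skip blank lines
        indent = len(line) - len(line.lstrip(" "))
        if indent >= body_indent and title is not None:
            # body line: append to the current section with a single space
            sections[title] = (sections[title] + " " + text).strip()
        elif indent == title_indent:
            # heading line: start a new section, dropping the trailing colon
            title = text.rstrip(":")
            sections[title] = ""
        # anything else (document header, rule line) is ignored
    return sections

sample = io.StringIO(
    "APPENDIX -- GLOSSARY\n"
    "-------------------------------------------------------------------\n"
    "\n"
    "  Asymmetrical Encryption:\n"
    "    Encryption using a pair of keys--the first encrypts a\n"
    "\n"
    "  Cyclic Redundancy Check (CRC32):\n"
    "    See Hash. Based on mod 2 polynomial operations, CRC32 produces a\n"
    "    32-bit \"fingerprint\" of a set of data.\n"
)
glossary = parse_glossary(sample)
print(glossary)
```

Because only the indentation decides what counts as a heading, a body line that happens to end in a colon stays part of its section, which is where splitting on ':' goes wrong.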
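Unless you already know the field names, you will have a hard time doing this.

-1 for using global, and because you never call close on raw_file. (Almost) always use the with ... as syntax when working with files.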
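
To illustrate the point about with, here is a small sketch showing that the file is closed as soon as the block ends, without an explicit close call. The temporary file and the name file_closed_after_with are made up for the example.

```python
import os
import tempfile

# Create a throwaway file so the example runs on its own (sample content only).
fd, path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w") as tmp:
    tmp.write("  Idempotent Function:\n")
    tmp.write("    The property that applying a function to its return value\n")

# The with block closes raw_file on exit, even if the body raises.
with open(path, "r") as raw_file:
    lines = raw_file.readlines()

file_closed_after_with = raw_file.closed
os.remove(path)  # clean up the temporary file
print(file_closed_after_with, len(lines))
```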