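Extract specific lines from a file and create sections of data in Python
I am trying to write a Python script to extract lines from a file. The file is a text file containing a dump of python suds output. I want to:
1. Strip all characters except words and numbers. I don't want any "\n", "[", "]", "{", "=", etc. characters.
2. Find each section that starts with "ArrayOf_xsd_string".
3. Remove the next line, "item[] =", from the result.
4. Grab the remaining 6 lines and create a dictionary keyed on the unique number on the fifth line (123456, 234567, 345678), using that number as the key and the remaining lines as the values (pardon my ignorance if I am not explaining this in Pythonic terms).
5. Output the result to a file.
The data in the file looks like this: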
[(ArrayOf_xsd_string){
item[] =
"001",
"ABCD",
"1234",
"wordy type stuff",
"123456",
"more stuff, etc",
}, (ArrayOf_xsd_string){
item[] =
"002",
"ABCD",
"1234",
"wordy type stuff",
"234567",
"more stuff, etc",
}, (ArrayOf_xsd_string){
item[] =
"003",
"ABCD",
"1234",
"wordy type stuff",
"345678",
"more stuff, etc",
}]
I tried using re.compile, and below is my poor attempt at the code:
import re, string
f = open('data.txt', 'rb')
linelist = []
for line in f:
    line = re.compile('[\W_]+')
    line.sub('', string.printable)
    linelist.append(line)
    print linelist
newlines = []
for line in linelist:
    mylines = line.split()
    if re.search(r'\w+', 'ArrayOf_xsd_string'):
        newlines.append([next(linelist) for _ in range(6)])
        print newlines
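I am a Python newbie, and I have not yet found anything on Google or Stack Overflow about how to extract a specific number of lines after finding specific text. Any help is much appreciated.
Please ignore my code, as I am "shooting in the dark".
Here is the result I would like to see: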
123456: 001,ABCD,1234,wordy type stuff,more stuff etc
234567: 002,ABCD,1234,wordy type stuff,more stuff etc
345678: 003,ABCD,1234,wordy type stuff,more stuff etc
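I hope this helps explain my flawed code.
If you want to extract a specific number of lines after a line that matches, you can simply read the file into a list with readlines(), loop over it to find the matching line, and then take the next N lines from the list. Alternatively, you can use a while loop with readline(), which is preferable if the file may get large.
Here is the most straightforward fix to your code that I can think of, but it is not necessarily the best overall implementation. I suggest following the tips above unless you have a good reason not to, or you just want to get the job done as quickly as possible ;)
This should do what you want, if I have interpreted your requirements correctly. It says: take only the next line and then the following 17 lines (so, up to but not including the 20th line after the match), and append them to newlines one by one (you cannot append the whole list at once, because that list would become a single element of the list you are adding to).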
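The first tip (read the lines into a list, find the marker, slice out the next N lines) could look roughly like the sketch below. It is not the answerer's original code. The helper name `lines_after_matches` and its `skip` parameter are made up for illustration, the sample data is inlined rather than read from `data.txt`, and Python 3 syntax is assumed.

```python
def lines_after_matches(lines, marker, count, skip=1):
    """Return one list per marker line: the `count` lines that follow it,
    after skipping `skip` lines (here, the 'item[] =' line)."""
    groups = []
    for index, line in enumerate(lines):
        if marker in line:
            start = index + 1 + skip
            groups.append(lines[start:start + count])
    return groups


# Inlined sample; with a real file this would be something like f.readlines().
sample = '''[(ArrayOf_xsd_string){
item[] =
"001",
"ABCD",
"1234",
"wordy type stuff",
"123456",
"more stuff, etc",
}, (ArrayOf_xsd_string){
item[] =
"002",
"ABCD",
"1234",
"wordy type stuff",
"234567",
"more stuff, etc",
}]'''.splitlines()

groups = lines_after_matches(sample, 'ArrayOf_xsd_string', 6)
for group in groups:
    print(group)
```

Slicing never raises on a short tail, so a truncated last record simply yields fewer than `count` lines; whether that is acceptable depends on the data.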
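Good luck and have fun :)
A few points about your code:
Stripping all the non-alphanumeric characters is totally unnecessary and a waste of time; there is no need to build linelist at all. You can simply use plain old line.find("ArrayOf_xsd_string") or re.search(...).
In your regex, _ has to be listed explicitly, as you did in [\W_], because \W on its own does not match it. But the reassignment of line below overwrites the line you just read: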
for line in f:
    line = re.compile('[\W_]+') # overwrites the line you just read??
    line.sub('', string.printable)
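As a small check of these points, the sketch below compiles the pattern once, keeps the result under a separate name so that `line` is not overwritten, and compares how `[\W_]+` and `\W+` treat the underscore. The variable names `non_word` and `cleaned` are illustrative only, and Python 3 syntax is assumed.

```python
import re

non_word = re.compile(r'[\W_]+')   # compile once, outside any loop

line = '[(ArrayOf_xsd_string){\n'

# Store the result under a new name so `line` itself is not overwritten.
cleaned = non_word.sub('', line)
print(cleaned)                      # ArrayOfxsdstring

# `_` counts as a word character, so \W alone leaves it in place.
print(re.sub(r'\W+', '', line))     # ArrayOf_xsd_string

# A plain substring test needs no regex at all.
print('ArrayOf_xsd_string' in line)          # True
print(line.find('ArrayOf_xsd_string') >= 0)  # True
```

Note that stripping the underscores also changes the marker text itself, which is one more reason to search for the marker before cleaning anything.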
Here is my version, which reads the file directly and also handles multiple matches:
with open('data.txt', 'r') as f:
    theDict = {}
    found = -1
    for (lineno, line) in enumerate(f):
        if found < 0:
            if line.find('ArrayOf_xsd_string') >= 0:
                found = lineno
                entries = []
            continue
        # Grab following 6 lines (offset 1 is the 'item[] =' line, so start at 2)...
        if 2 <= (lineno-found) <= 6+1:
            # '\n' must be in the strip set, or the trailing '",' is never reached
            entry = line.strip(' "{}[]=:,\n')
            entries.append(entry)
        #then create a dict with the key from line 5
        if (lineno-found) == 6+1:
            key = entries.pop(4)
            theDict[key] = entries
            print key, ','.join(entries) # comma-separated, no quotes
            #break # if you want to end on first match
            found = -1 # to process multiple matches
Let's have some fun with iterators!
class SudsIterator(object):
    """extracts xsd strings from suds text file, and returns a
    (key, (value1, value2, ...)) tuple with key being the 5th field"""
    def __init__(self, filename):
        self.data_file = open(filename)
    def __enter__(self):                    # __enter__ and __exit__ are there to support
        return self                         # `with SudsIterator as blah` syntax
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.data_file.close()
    def __iter__(self):
        return self
    def next(self):                         # in Python 3+ this should be __next__
        """looks for the next 'ArrayOf_xsd_string' item and returns it as a
        tuple fit for stuffing into a dict"""
        data = self.data_file
        for line in data:
            if 'ArrayOf_xsd_string' not in line:
                continue
            ignore = next(data)
            val1 = next(data).strip()[1:-2] # discard beginning whitespace,
            val2 = next(data).strip()[1:-2] #   quotes, and comma
            val3 = next(data).strip()[1:-2]
            val4 = next(data).strip()[1:-2]
            key = next(data).strip()[1:-2]
            val5 = next(data).strip()[1:-2]
            break
        else:
            self.data_file.close()          # make sure file gets closed
            raise StopIteration()           # and keep raising StopIteration
        return key, (val1, val2, val3, val4, val5)
data = dict()
for key, value in SudsIterator('data.txt'):
    data[key] = value
print data
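For comparison, the same idea can be written as a generator function, which removes the need for hand-written `__iter__` and `next` methods. This is only a sketch in Python 3 syntax, not part of the answer above. The name `iter_records` and its `marker` and `key_index` parameters are hypothetical, and the sample lines are inlined instead of being read from `data.txt`.

```python
from itertools import islice


def iter_records(lines, marker='ArrayOf_xsd_string', key_index=4):
    """Yield (key, values) for each marker line found in `lines`.

    `lines` may be any iterable of strings, such as an open file.
    """
    lines = iter(lines)
    for line in lines:
        if marker not in line:
            continue
        next(lines, None)                     # skip the 'item[] =' line
        fields = [raw.strip().strip('",') for raw in islice(lines, 6)]
        if len(fields) < 6:                   # truncated record at end of input
            return
        key = fields.pop(key_index)           # the fifth field becomes the key
        yield key, tuple(fields)


sample = [
    '[(ArrayOf_xsd_string){\n',
    '   item[] = \n',
    '      "001",\n',
    '      "ABCD",\n',
    '      "1234",\n',
    '      "wordy type stuff",\n',
    '      "123456",\n',
    '      "more stuff, etc",\n',
    ' }]\n',
]

data = dict(iter_records(sample))
print(data)
```

With a real file, `dict(iter_records(f))` inside a `with open('data.txt') as f:` block would build the same dictionary, and the `with` statement takes care of closing the file.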
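Using Python 2.6.1, I get the following error when running the code: AttributeError: 'builtin_function_or_method' object has no attribute 'split'
(Fixed it. input.split('\n') was a leftover from testing; I inlined your example data while polishing the code.)
You could have fixed that yourself, or at least told me before accepting. I posted this a day before the answer you accepted. I think the style of this one is better and less confusing, especially the chained comparison if 2 <= ...
I made a small change to get it working: entry=line.strip("{}[]=:,\n"). How do I write the output to a file? I tried a "for line in" statement, but it only adds one line at a time.
You don't actually need to strip "[]=", because of the 2 <= ... comparison.
Thanks! This example works exactly as it says. I like the way it splits out each line, so that if I want to add more, I can do so easily. Perfect for us newbies!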
123456 001,ABCD,1234,wordy type stuff,more stuff, etc
234567 002,ABCD,1234,wordy type stuff,more stuff, etc
345678 003,ABCD,1234,wordy type stuff,more stuff, etc
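One of the comments above asks how to write the output to a file. A minimal sketch, in Python 3 syntax: open the file once, outside the loop, and write one line per dictionary entry. The commenter's own code is not shown, so the guess that the file was being reopened in 'w' mode on every iteration is only an assumption. The function name `write_records` and the temporary output path are illustrative.

```python
import os
import tempfile


def write_records(records, path):
    """Write one 'key: v1,v2,...' line per dictionary entry."""
    with open(path, 'w') as out:              # opened once, before the loop
        for key in sorted(records):
            out.write('%s: %s\n' % (key, ','.join(records[key])))


records = {
    '123456': ['001', 'ABCD', '1234', 'wordy type stuff', 'more stuff, etc'],
    '234567': ['002', 'ABCD', '1234', 'wordy type stuff', 'more stuff, etc'],
}

# Hypothetical location; any writable path works.
out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')
write_records(records, out_path)

with open(out_path) as check:
    written = check.read()
print(written)
```

Opening with 'a' instead of 'w' would append to an existing file rather than replace it.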