Extract specific lines from a file and create sections of data in Python


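I'm trying to write a Python script to extract lines from a file. The file is a text file containing a dump of python suds output.

I want to:

strip all characters except words and numbers; I don't want any "\n", "[", "]", "{", "=", etc. characters

find the section that starts with "ArrayOf_xsd_string"

remove the next line, "item[] =", from the result

grab the remaining 6 lines and create a dictionary based on the unique number on the fifth line (123456, 234567, 345678), using this number as the key and the remaining lines as the values (pardon my ignorance if I'm not explaining this in pythonic terms)

output the results to a file

The data in the file is a list: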

[(ArrayOf_xsd_string){
   item[] = 
      "001",
      "ABCD",
      "1234",
      "wordy type stuff",
      "123456",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] = 
      "002",
      "ABCD",
      "1234",
      "wordy type stuff",
      "234567",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] = 
      "003",
      "ABCD",
      "1234",
      "wordy type stuff",
      "345678",
      "more stuff, etc",
 }]

I tried re.compile; below is my poor attempt at the code:

import re, string

f = open('data.txt', 'rb')
linelist = []
for line in f:
  line = re.compile('[\W_]+')
  line.sub('', string.printable)
  linelist.append(line)
  print linelist

newlines = []
for line in linelist:
    mylines = line.split()
    if re.search(r'\w+', 'ArrayOf_xsd_string'):
      newlines.append([next(linelist) for _ in range(6)])
      print newlines

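I'm a Python newbie and haven't found anything on Google or Stack Overflow about how to extract a specific number of lines after finding specific text. Any help is greatly appreciated.

Please ignore my code, as I am "shooting in the dark":

Here is the result I would like to see: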

123456: 001,ABCD,1234,wordy type stuff,more stuff etc
234567: 002,ABCD,1234,wordy type stuff,more stuff etc
345678: 003,ABCD,1234,wordy type stuff,more stuff etc

I hope this helps explain my flawed code.

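If you want to extract a specific number of lines after a specific matching line, you can simply read the file into an array with readlines, loop over it to find the matching line, and then slice the next N lines out of the array. Alternatively, you can use a while loop with readline, which is preferable if the file may grow large.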
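The readlines-and-slice idea could look roughly like the sketch below. It is only an illustration written for Python 3: the helper name `lines_after_match`, its `skip` parameter, and the inline sample text (standing in for data.txt) are assumptions made here, not part of the original answer.

```python
import io

# Inline stand-in for the contents of data.txt (one record only).
SAMPLE = '''[(ArrayOf_xsd_string){
   item[] =
      "001",
      "ABCD",
      "1234",
      "wordy type stuff",
      "123456",
      "more stuff, etc",
 }]
'''

def lines_after_match(lines, marker, count, skip=0):
    """For every line containing `marker`, collect the `count` lines that
    follow it, after skipping `skip` lines (hypothetical helper)."""
    groups = []
    for i, line in enumerate(lines):
        if marker in line:
            start = i + 1 + skip
            groups.append(lines[start:start + count])
    return groups

# readlines() loads the whole file into a list, so slicing is trivial.
lines = io.StringIO(SAMPLE).readlines()
groups = lines_after_match(lines, 'ArrayOf_xsd_string', count=6, skip=1)
print(groups[0])
```

Here `skip=1` drops the `item[] =` line. Because the whole file is held in memory, this suits small files; for large ones the readline loop mentioned above avoids that cost.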

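Below is the most direct fix to your code that I can think of, but it is not necessarily the best overall implementation. I suggest following the tips above unless you have a good reason not to, or just want to get the job done as quickly as possible ;)

If I have interpreted your requirements correctly, this should do what you want. It says: don't take the next line, but take the 17 lines after it (so, up to but not including the 20th line after the match), and append them to newlines (you can't append the whole list at once, because that list would become a single element of the list you are adding it to).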
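The remark about not appending a whole list at once comes down to the difference between `list.append` and `list.extend`. A tiny illustration, using made-up sample lines and variable names that are assumptions for this demo only:

```python
chunk = ['"001",', '"ABCD",', '"1234",']

# append() adds the whole chunk as ONE element, producing a nested list.
nested = []
nested.append(chunk)

# extend() adds each line of the chunk as its own element.
flat = []
flat.extend(chunk)

print(len(nested))  # one element: the chunk itself
print(len(flat))    # one element per line
```

Which of the two is wanted depends on whether each match should stay grouped (append) or all lines should end up in one flat list (extend).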

Good luck and have fun :)

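A few comments on your code:

Stripping all the non-alphanumeric characters is totally unnecessary and a waste of time; there is no need whatsoever to build linelist. You know you can simply use plain old string.find("ArrayOf_xsd_string") or re.search(...)?

As for your regex, note that _ is a word character: it is matched by \w, not by \W, which is why [\W_] has to list it explicitly. The real problem is that the reassignment to line below overwrites the line you just read: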

for line in f:
  line = re.compile('[\W_]+') # overwrites the line you just read??
  line.sub('', string.printable)
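To make the reassignment problem concrete, here is a minimal Python 3 sketch of how the loop might look with the compiled pattern kept in its own variable. The name `NON_ALNUM` and the sample lines are assumptions for illustration; as noted above, this stripping step is not actually needed to solve the task.

```python
import re

# Compile once, outside the loop, and never rebind the loop variable to it.
NON_ALNUM = re.compile(r'[\W_]+')

raw_lines = [
    '   item[] = \n',
    '      "wordy type stuff",\n',
    ' }, (ArrayOf_xsd_string){\n',
]

# sub() returns a new string; the pattern is applied TO the line.
cleaned = [NON_ALNUM.sub('', line) for line in raw_lines]
print(cleaned)

# '_' counts as a word character, so only \w matches it.
underscore_is_word = re.match(r'\w', '_') is not None
underscore_is_nonword = re.match(r'\W', '_') is not None
```

Note that the stripping also removes the spaces inside values and the underscore in the marker, which is one more reason to search for `ArrayOf_xsd_string` on the raw line instead.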

Here is my version, which reads the file directly and also handles multiple matches:

with open('data.txt', 'r') as f:
    theDict = {}
    found = -1
    for (lineno,line) in enumerate(f):
        if found < 0:
            if line.find('ArrayOf_xsd_string')>=0:
                found = lineno
                entries = []
            continue
        # Grab following 6 lines...
        if 2 <= (lineno-found) <= 6+1:
            # strip() only trims the ends, so '\n' must be in the set too
            entry = line.strip(' "{}[]=:,\n')
            entries.append(entry)
        #then create a dict with the key from line 5
        if (lineno-found) == 6+1:
            key = entries.pop(4)
            theDict[key] = entries
            print('%s %s' % (key, ','.join(entries))) # comma-separated, no quotes
            #break # if you want to end on first match
            found = -1 # to process multiple matches

Let's have some fun with iterators:

class SudsIterator(object):
    """extracts xsd strings from suds text file, and returns a 
    (key, (value1, value2, ...)) tuple with key being the 5th field"""
    def __init__(self, filename):
        self.data_file = open(filename)
    def __enter__(self):  # __enter__ and __exit__ are there to support 
        return self       # `with SudsIterator as blah` syntax
    def __exit__(self, exc_type, exc_val, exc_tb):
        self.data_file.close()
    def __iter__(self):
        return self
    def next(self):     # in Python 3+ this should be __next__
        """looks for the next 'ArrayOf_xsd_string' item and returns it as a
        tuple fit for stuffing into a dict"""
        data = self.data_file
        for line in data:
            if 'ArrayOf_xsd_string' not in line:
                continue
            ignore = next(data)
            val1 = next(data).strip()[1:-2] # discard beginning whitespace,
            val2 = next(data).strip()[1:-2] #   quotes, and comma
            val3 = next(data).strip()[1:-2]
            val4 = next(data).strip()[1:-2]
            key = next(data).strip()[1:-2]
            val5 = next(data).strip()[1:-2]
            break
        else:
            self.data_file.close() # make sure file gets closed
            raise StopIteration()  # and keep raising StopIteration
        return key, (val1, val2, val3, val4, val5)
    __next__ = next     # alias so the class also iterates under Python 3

data = dict()
for key, value in SudsIterator('data.txt'):
    data[key] = value

print(data)
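For comparison, the same walk over the file can be sketched as a generator function, which avoids writing the iterator protocol by hand. This is an alternative written for Python 3, not the answer's own code: the function name `iter_records`, its parameters, and the inline sample (standing in for data.txt) are all assumptions.

```python
import io
from itertools import islice

# Inline stand-in for data.txt with two records.
SAMPLE = '''[(ArrayOf_xsd_string){
   item[] =
      "001",
      "ABCD",
      "1234",
      "wordy type stuff",
      "123456",
      "more stuff, etc",
 }, (ArrayOf_xsd_string){
   item[] =
      "002",
      "ABCD",
      "1234",
      "wordy type stuff",
      "234567",
      "more stuff, etc",
 }]
'''

def iter_records(lines, marker='ArrayOf_xsd_string', fields=6, key_index=4):
    """Yield (key, values) for each record found after `marker`."""
    lines = iter(lines)
    for line in lines:
        if marker not in line:
            continue
        next(lines, None)  # skip the 'item[] =' line
        # Trim whitespace, then the trailing comma, then the quotes.
        values = [raw.strip().strip(',').strip('"')
                  for raw in islice(lines, fields)]
        if len(values) < fields:
            return  # input ended in the middle of a record
        key = values.pop(key_index)
        yield key, tuple(values)

data = dict(iter_records(io.StringIO(SAMPLE)))
print(data)
```

With a real file, `with open('data.txt') as f: data = dict(iter_records(f))` would take the place of the `io.StringIO` call and also close the file.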

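Using Python 2.6.1, I get the following error when running the code: AttributeError: 'builtin_function_or_method' object has no attribute 'split'

(Fixed it: input.split('\n') was a leftover from testing; I inlined your sample data while polishing the code.)

You could have fixed that yourself, or at least told me before accepting. I posted this answer a day before the one you accepted. I think this one has better style and is less confusing, especially the chained comparison if 2

I made one small change to get it working: entry = line.strip("{}[]=:,\n"). How do I write the output to a file? I tried a "for line in" statement, but it only adds one line at a time.

You don't actually need to strip "[]=", because 2

Thanks! This example works exactly as described. I like how it splits out each line, so if I want to add more I can do so easily. Great for us newbies!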
123456 001,ABCD,1234,wordy type stuff,more stuff, etc
234567 002,ABCD,1234,wordy type stuff,more stuff, etc
345678 003,ABCD,1234,wordy type stuff,more stuff, etc
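As for writing the results to a file, one possible approach is to open the output file once and write one formatted line per key. The sketch below targets Python 3 and is not from the thread: the `write_results` helper, the temporary output path, and the hard-coded `results` dict (shaped like the dict built by the answers) are assumptions.

```python
import os
import tempfile

# Stand-in for the dict built by the parsing code.
results = {
    '123456': ['001', 'ABCD', '1234', 'wordy type stuff', 'more stuff, etc'],
    '234567': ['002', 'ABCD', '1234', 'wordy type stuff', 'more stuff, etc'],
}

def write_results(results, path):
    """Write one 'key: v1,v2,...' line per entry (hypothetical helper)."""
    # Open the file once; reopening it in 'w' mode per line would
    # overwrite the earlier lines.
    with open(path, 'w') as out:
        for key in sorted(results):
            out.write('%s: %s\n' % (key, ','.join(results[key])))

out_path = os.path.join(tempfile.mkdtemp(), 'output.txt')
write_results(results, out_path)

with open(out_path) as check:
    written = check.read()
print(written)
```

The `key: values` layout follows the desired output shown in the question; sorting the keys is an extra choice made here to keep the file order predictable.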
