Python: reading between lines in a text file

First, the contents of my example text file look like this:

Some Data
Nothing important
Start here
This is important
Grab this line too
And this ono too
End here
Text goes on, but isn't important
Next text
Blaah

Now I want to read in the text file, and I only want to grab the lines between "Start here" and "End here".

So my Python code looks like this:

filename = 'example_file.txt'

with open(filename, 'r') as input:
   for line in input: # First loop breaks at specific line
       if 'Start here' in line:
           break

   for line_1 in input: # Second loop grabs all lines
       print(line_1.strip())

   for line_2 in input: # Third loop breaks at specific line
       if 'End here' in line_2:
           break

But it doesn't work.

Here is the output when I run it:

This is important
Grab this line too
And this ono too
End here
Text goes on, but isn't important
Next text
Blaah

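As you can see, my script doesn't stop at "End here". The program starts at the right line, but it doesn't break at the right line.

What's wrong?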

You can use regular expressions (the `re` module) with the `re.DOTALL` option, so that newlines are matched like any other character.

import re

source = """Some Data
Nothing important
Start here
This is important
Grab this line too
And this ono too
End here
Text goes on, but isn't important
Next text
Blaah"""

# or else:
# source = open(filename, 'r').read() # or similar

result = re.search("Start here(.*)End here", source, re.DOTALL).group(1).strip()

print(result)

> This is important
> Grab this line too
> And this ono too

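How it works:

- `re.search` looks for a pattern anywhere in a string.
- Parentheses mark off parts of the match as groups. Group 0 is the whole match, and group 1 is what the first pair of parentheses captured. Groups are numbered in order and can be nested.
- `.*` means "any character, any number of times". It is what grabs everything between the two hard-coded markers (`Start here` and `End here`).
- `re.DOTALL` is the secret: it makes the dot match newline characters too. The dot is the symbol for "any character", so "dot all" means "let the dot match every character, even a newline".
- `group(1)` asks for the second group (indexing starts at zero), that is, the one inside the parentheses.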
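To make the group numbering concrete, here is a minimal sketch. The sample text is shortened, and the second `End here` line is an assumption added only to show a side effect of the pattern above: `.*` is greedy and runs to the last end marker, while `.*?` stops at the first one. It also guards against `re.search` returning `None` when a marker is missing.

```python
import re

# Shortened sample text; the second "End here" is an added assumption
# to show how greedy and non-greedy matching differ.
source = """Some Data
Start here
This is important
Grab this line too
End here
Text goes on
End here
Blaah"""

greedy = re.search(r"Start here(.*)End here", source, re.DOTALL)
lazy = re.search(r"Start here(.*?)End here", source, re.DOTALL)

# group(0) is the whole match, markers included.
whole_match = lazy.group(0)
# group(1) is only what the first pair of parentheses captured.
lazy_lines = lazy.group(1).strip().splitlines()
greedy_lines = greedy.group(1).strip().splitlines()

# re.search returns None when nothing matches, so check before calling .group().
missing = re.search(r"No such marker(.*?)End here", source, re.DOTALL)
missing_text = missing.group(1) if missing else None

print(lazy_lines)         # the two lines between the first pair of markers
print(len(greedy_lines))  # more lines, because the match runs to the last marker
```

If the file can contain the end marker more than once, the non-greedy form is probably the one you want.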

Check for the end marker inside the second loop:

for line_1 in input:
    if 'End here' in line_1:
        break
    print(line_1.strip())

filename = 'example_file.txt'

useful_content = []
with open(filename, 'r') as input:
    all_lines = input.readlines()  # read all lines into memory
    for idx in range(len(all_lines)):  # scan for the start marker
        if 'Start here' in all_lines[idx]:
            # this keeps the marker line itself; drop this append to exclude it
            useful_content.append(all_lines[idx].strip())
            idx = idx + 1
            # found the start of the useful content; collect until the end marker
            # (the bounds check avoids an IndexError if 'End here' is missing)
            while idx < len(all_lines) and 'End here' not in all_lines[idx]:
                useful_content.append(all_lines[idx].strip())
                idx = idx + 1
            break
for line in useful_content:
    print(line)

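Your problem is that you need to check for 'End here' in the second loop, because the second and third loops do not run at the same time. The second loop reads the file all the way to the end, so by the time the third loop starts there is nothing left for it to read and its body never executes.
With that in mind, this code will work: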
filename = 'mydata.txt'

with open(filename, 'r') as f:
    for line in f:
        if 'Start here' in line:
            break

    for line_1 in f:
        if 'End here' in line_1:
            break
        else:
            print(line_1.strip())
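To see why the third loop in the question never does anything, here is a small sketch. It uses `io.StringIO` as a stand-in for the opened file (an assumption made so the demo needs no file on disk); it is iterated line by line the same way a file object is.

```python
import io

# io.StringIO stands in for the opened file (an assumption for this demo).
f = io.StringIO("skip\nStart here\nA\nB\nEnd here\ntail\n")

for line in f:                          # loop 1: stops at the start marker
    if 'Start here' in line:
        break

second = [line.strip() for line in f]   # loop 2: no break, so it reads to the end
third = [line.strip() for line in f]    # loop 3: nothing is left to read

print(second)  # everything after the start marker, end marker included
print(third)   # empty list
```

Each loop picks up where the previous one stopped, because they all share the same underlying iterator.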

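However, we can still tidy it up a little:

- Each loop variable is only used inside its own loop, so we can reuse the name `line` for both loops.
- Nothing after a `break` runs in that iteration anyway, so we can drop the `else`.
- `open` uses read mode by default.

With that in mind, your final code looks like this: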
filename = 'mydata.txt'

with open(filename) as f:
    for line in f:
        if 'Start here' in line:
            break

    for line in f:
        if 'End here' in line:
            break
        print(line.strip())

Run it and you get the desired output:
This is important
Grab this line too
And this ono too
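If you need this in more than one place, the same two-loop idea can be wrapped in a generator. This is only a sketch: the function name `lines_between` and its parameters are made up for illustration, and `io.StringIO` stands in for the real file.

```python
import io

def lines_between(lines, start='Start here', end='End here'):
    """Yield stripped lines after the `start` marker and before the `end` marker."""
    it = iter(lines)      # one shared iterator, so lists work as well as files
    for line in it:       # skip everything up to and including the start marker
        if start in line:
            break
    for line in it:       # resumes right after the start marker
        if end in line:
            return
        yield line.strip()

sample = io.StringIO(
    "Some Data\nStart here\nThis is important\nGrab this line too\n"
    "And this ono too\nEnd here\nNext text\n"
)
result = list(lines_between(sample))
print(result)
```

With a real file you would call it as `lines_between(f)` inside the `with open(filename) as f:` block. If the start marker never appears, the generator simply yields nothing.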

Thanks heltonbiker, your solution works. But I'm a Python beginner. Can you explain what `group(1)` does? Why group one? I think the expression `(.*)` is the reason I can read between the lines, right?

Thanks Chong Tang, but I think your solution could cause a problem. Suppose the text file is very large: in your solution, everything is read into memory (RAM).
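On the memory concern: any approach that iterates over the file object, rather than calling `read()` or `readlines()`, only holds one line at a time. One possible lazy variant uses `itertools`; the helper name `between_markers` is hypothetical, and `io.StringIO` again stands in for a large file.

```python
import io
from itertools import dropwhile, islice, takewhile

def between_markers(lines, start='Start here', end='End here'):
    """Lazily yield the lines strictly between the two markers."""
    # Skip lines until the start marker shows up (the marker is still included).
    from_start = dropwhile(lambda l: start not in l, lines)
    # Drop the marker line itself.
    body = islice(from_start, 1, None)
    # Stop as soon as the end marker is seen; nothing after it is read.
    return takewhile(lambda l: end not in l, body)

sample = io.StringIO(
    "Some Data\nStart here\nThis is important\nGrab this line too\n"
    "And this ono too\nEnd here\nNext text\n"
)
picked = [line.strip() for line in between_markers(sample)]
print(picked)
```

Nothing is read until the result is iterated, so the rest of a large file after the end marker is never loaded.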