Regular expression in Python doesn't work

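I am working through an exercise from the book "Python for Informatics" that asks me to write a program simulating the UNIX grep command. However, my code doesn't work. I have simplified it here so that it only counts how many lines start with "From". I'm confused and hope you can explain what is going wrong.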

from urllib.request import urlopen
import re

fhand = urlopen('http://www.py4inf.com/code/mbox-short.txt')
sumFind = 0

for line in fhand:
    line = str(line) #convert from byte to string for re operation
    if re.search('^From',line) is not None:
        sumFind+=1

print(f'There are {sumFind} lines that match.')

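The output of the script is:

There are 0 lines that match.

The input text is the file at the URL used in the script above.

Thank you very much for your time.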

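The mistake is using `str` to convert bytes to a string. Called with a single bytes argument, `str` returns the printable representation of the object, prefix and quotes included, rather than decoding it: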

>>> str(b'foo')
"b'foo'"

What you need instead is:

line = line.decode()
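
To make the difference concrete, here is a minimal sketch comparing the two conversions on a single line. The sample line and the variable names are made up for illustration; they are not taken from the actual mbox file.

```python
import re

# Made-up sample line, in the bytes form that iterating over urlopen() yields.
raw = b"From alice@example.com Sat Jan  5 09:14:16 2008\n"

as_repr = str(raw)      # printable representation: begins with the characters b'
as_text = raw.decode()  # real text, decoded as UTF-8 by default

print(as_repr[:6])                  # shows the leading b' that defeats '^From'
print(re.search('^From', as_repr))  # None
print(re.search('^From', as_text))  # a match object
```

Calling `str(raw, 'utf-8')` with an explicit encoding also decodes; it is only the one-argument form that produces the representation.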

But the best approach is to pass a bytes pattern to the regex function, which is supported:

for line in fhand:
    if re.search(b'^From',line) is not None:
        sumFind+=1

Now I get 54 matches.
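
For trying the bytes-pattern loop without a network connection, the following sketch may help. It assumes an `io.BytesIO` object is an acceptable stand-in for the handle returned by `urlopen` (both yield bytes lines when iterated), and the four sample lines are invented.

```python
import io
import re

# Stand-in for the urlopen() handle; the content is invented sample data.
fhand = io.BytesIO(
    b"From alice@example.com Sat Jan  5 09:14:16 2008\n"
    b"Subject: hello\n"
    b"From: alice@example.com\n"
    b"X-Note: mail From nowhere\n"
)

pattern = re.compile(rb'^From')  # bytes pattern, compiled once outside the loop

sum_find = 0
for line in fhand:
    if pattern.search(line) is not None:
        sum_find += 1

print(f'There are {sum_find} lines that match.')  # 2 for this sample
```

The pattern and the data have to be of the same type: searching a bytes line with a `str` pattern (or the reverse) raises `TypeError` instead of silently returning no match.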

Note that you can reduce the whole loop to:

sum_find = sum(bool(re.match(b'From',line)) for line in fhand)

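`re.match` only matches at the start of the string, so it replaces the need to use `^` with `re.search`.
No explicit loop is needed: `sum` counts the number of times `re.match` returns a truthy value (converted explicitly to `bool`, so that it sums 0 or 1).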

Without a regex, it is even simpler:

sum_find = sum(line.startswith(b"From") for line in fhand)
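
As a quick check that the three ways of counting agree, here is a small sketch run on invented sample lines. A list is used instead of the file handle because a handle can only be iterated once, so running three counts over the same `fhand` would leave the second and third with nothing to read.

```python
import re

# Invented sample data; a list can be iterated repeatedly, unlike a file handle.
lines = [
    b"From alice@example.com Sat Jan  5 09:14:16 2008\n",
    b"Subject: hello\n",
    b"From: alice@example.com\n",
    b"  From indented, so not at the start\n",
]

with_search = sum(re.search(b'^From', line) is not None for line in lines)
with_match = sum(bool(re.match(b'From', line)) for line in lines)
with_startswith = sum(line.startswith(b"From") for line in lines)

print(with_search, with_match, with_startswith)  # 2 2 2
```

`startswith` returns a `bool` already, which is why it needs no explicit conversion before summing.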

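The problem is that the urllib module returns bytes rather than strings when reading from the URL/text file.

You can either:

Use bytes in the regex search: re.search(b'^From', line)

Use the "requests" module to download the file as a string and split it into lines: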

import requests

txt = requests.get("").text.split("\n")
for line in txt:
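
The snippet above is cut off in the source, so the following is only a sketch of how counting might proceed once the file is available as a decoded string. The helper name `count_from_lines` and the sample text are hypothetical, and the download itself is left out so the example runs without the network.

```python
import re

def count_from_lines(text):
    """Count the lines of an already-decoded string that start with 'From'.

    Hypothetical helper for illustration; not part of the original answer.
    """
    # splitlines() handles both \n and \r\n and adds no trailing empty entry.
    return sum(1 for line in text.splitlines() if re.search('^From', line))

# Invented sample standing in for the downloaded text.
sample = "From alice@example.com Sat Jan  5\nSubject: hello\nFrom: alice@example.com\n"
print(count_from_lines(sample))  # 2
```

With the requests package installed, the same helper could presumably be fed `requests.get(url).text`, where `url` is the address of the mbox file from the question.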