Find and print quoted text from a text file using Python (tags: Python, Quotation Marks)

I'm a Python beginner and want Python to capture all the quoted text in a text file. I tried the following:

filename = raw_input("Enter the full path of the file to be used: ")
input = open(filename, 'r')
import re
quotes = re.findall(ur'"[\^u201d]*["\u201d]', input)
print quotes

I get an error:

Traceback (most recent call last):
  File "/Users/nithin/Documents/Python/Capture Quotes", line 5, in <module>
    quotes = re.findall(ur'"[\^u201d]*["\u201d]', input)
  File "/Library/Frameworks/Python.framework/Versions/2.7/lib/python2.7/re.py", line 177, in findall
    return _compile(pattern, flags).findall(string)
TypeError: expected string or buffer

Can anyone help me?

Instead of a regular expression, you could try some of Python's built-in string methods. I'll leave the hard work to you:

message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
list_of_single_quote_items = message.split("'")
list_of_double_quote_items = message.split('"')

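The challenging part will be interpreting the resulting lists and handling all the edge cases (a string with only one quotation mark, escape sequences, and so on).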
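To make the split idea concrete: after splitting on the quote character, the pieces alternate between outside and inside the quotes, so every odd-indexed piece is quoted text. A minimal sketch of that, assuming quotes are never nested or escaped; split_quoted is a hypothetical helper name, not a built-in:

```python
def split_quoted(text, quote='"'):
    """Return the pieces of `text` that sit between pairs of `quote`."""
    # After splitting, pieces alternate: outside, inside, outside, inside, ...
    parts = text.split(quote)
    inside = parts[1::2]
    # len(parts) - 1 quote characters were found. If that number is odd,
    # the last "inside" piece was never closed, so drop it.
    if (len(parts) - 1) % 2 == 1:
        inside = inside[:-1]
    return inside


message = '''
"some text in quotes", some text not in quotes. Some more text 'In different kinds of quotes'.
'''
print(split_quoted(message, '"'))  # ['some text in quotes']
print(split_quoted(message, "'"))  # ['In different kinds of quotes']
```

Splitting on ' stays fragile for the reasons above: apostrophes in words such as "it's" are treated as quotation marks too.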

As Bakuriu pointed out, you need to add .read(), like this:

quotes = re.findall(ur'[^\u201d]*[\u201d]', input.read())

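open() only returns a file object, whereas f.read() returns a string. Also, I'm guessing you want everything between two quotation marks, rather than zero or more occurrences of [^\u201d] followed by a quotation mark. So I would try this: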

quotes = re.findall(ur'[\u201d][^\u201d]*[\u201d]', input.read(), re.U)

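re.U makes the match Unicode-aware. Alternatively, if the quotes are not pairs of right double quotation marks (\u201d) and you don't need Unicode, match plain straight double quotes with r'"[^"]*"'.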
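If the file might contain both kinds of quotes, straight "..." and typographic “...”, one option is a single pattern with one alternative per kind. A Python 3 sketch, assuming “ always opens and ” always closes; quoted_text and PAIRED_QUOTES are made-up names. In Python 3, str patterns are Unicode-aware by default, so re.U is not needed there.

```python
import re

# Either a straight pair "..." or a typographic pair “...”.
# Group 1 / group 2 capture only the text between the marks.
PAIRED_QUOTES = re.compile(r'"([^"]*)"|“([^”]*)”')


def quoted_text(text):
    """Return the text inside each quote pair, without the quote marks."""
    results = []
    for match in PAIRED_QUOTES.finditer(text):
        # Only one alternative takes part in a given match; the other
        # group is None. Check for None so empty quotes ("") give ''.
        if match.group(1) is not None:
            results.append(match.group(1))
        else:
            results.append(match.group(2))
    return results


print(quoted_text('He said “hi” and "bye"'))  # ['hi', 'bye']
```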

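Finally, you may want to choose a variable name other than input, because input is a built-in function in Python (not a keyword), and assigning to that name hides the built-in.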

Your result might look like this:

>>> input2 = """
cfrhubecf "ehukl wehunkl echnk
wehukb ewni; wejio;"
"werulih"
"""
>>> quotes = re.findall(r'"[^"]*"', input2, re.U)
>>> print quotes
['"ehukl wehunkl echnk\nwehukb ewni; wejio;"', '"werulih"']
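The same fix applied to a file, as a Python 3 sketch: read the contents with .read() so re.findall gets a string, and use a name other than input for the file object. UTF-8 is an assumed encoding, and quotes_in_file and text_file are illustrative names.

```python
import re


def quotes_in_file(path, encoding="utf-8"):
    """Return every "..." span in the file at `path`, quotes included."""
    # Named text_file rather than input, so the built-in input() is not shadowed.
    with open(path, encoding=encoding) as text_file:
        # re.findall needs a string: pass text_file.read(), not the file object.
        contents = text_file.read()
    # [^"]* matches any run of non-quote characters, including newlines,
    # so quotations that span several lines are found as well.
    return re.findall(r'"[^"]*"', contents)
```

Usage would be along the lines of: for q in quotes_in_file(input("Enter the full path of the file to be used: ")): print(q). Because nothing was assigned to the name input, input() still refers to the built-in.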

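Does it need to work across multiple lines? For example, does it need to match a quotation that contains a line break, such as "one two thr\nee"?

The error occurs because you are passing a file object where re.findall expects a string. Use re.findall(regex, input.read()).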

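Yes, it should work across multiple lines.

Beyond that, you are also trying to match a unicode pattern against an 8-bit str (the file contents). You need to know which character set the file uses, and either decode the file contents or encode the regular expression.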
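In Python 3 this kind of mismatch is an explicit error: using a str pattern on bytes raises TypeError. A hedged sketch of one way to put both sides on the same type, decoding the file's bytes to text before searching for “…” pairs. It assumes you know the file's encoding (UTF-8 is only a default guess here); quotes_from_bytes and CURLY_QUOTED are hypothetical names.

```python
import re

# \u201c and \u201d are LEFT and RIGHT DOUBLE QUOTATION MARK. In a str
# pattern the re module understands these escapes itself.
CURLY_QUOTED = re.compile(r'\u201c[^\u201d]*\u201d')


def quotes_from_bytes(raw, encoding="utf-8"):
    """Decode raw file bytes, then find curly-quoted spans in the text."""
    # Decode first, so pattern and subject are both text (str).
    text = raw.decode(encoding)
    return CURLY_QUOTED.findall(text)


# The bytes would normally come from open(path, "rb").read().
print(quotes_from_bytes("say “hi”".encode("utf-8")))  # ['“hi”']
print(quotes_from_bytes(b"say \x93hi\x94", "cp1252"))  # ['“hi”']
```

The second call shows why the encoding matters: in Windows-1252 the bytes 0x93 and 0x94 are “ and ”, while the same bytes are not valid UTF-8 on their own.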

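Also, the regex you wrote looks for a ", followed by zero or more of ^, u, 2, 0, 1 or d, followed by a " or ”. That doesn't sound like what you described in your post. Are you looking for anything enclosed in "…" or “…”? Or …?

Which one did you try? The last one? After assigning the input variable, try printing type(input) and len(input) to see whether it is what you expect. If you're working with Unicode, you may need to pass re.U as the last argument to re.findall, as I did above.

What do you mean by the input variable?

The variable called input in this line: input = open(filename, 'r'). Then print type(input), len(input).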
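A quick Python 3 check of what the question's original pattern actually matches, compared with a plain "between two straight quotes" pattern, on a made-up sample string. The pattern is copied from the question as a raw string; in Python 3 the re module itself interprets \u201d inside the pattern.

```python
import re

# The question's pattern, as a Python 3 raw string.
question_pattern = r'"[\^u201d]*["\u201d]'
# A pattern for "everything between two straight double quotes".
intended_pattern = r'"[^"]*"'

sample = 'say "hello" and "u2"'

question_matches = re.findall(question_pattern, sample)
intended_matches = re.findall(intended_pattern, sample)
print(question_matches)  # ['"u2"']
print(intended_matches)  # ['"hello"', '"u2"']
```

Only "u2" survives the original pattern, because the character class [\^u201d] accepts nothing but the characters ^, u, 2, 0, 1 and d.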
