Python: extract Email:, Name:, Phone: from adjacent lines of a record-like file
I have a text file with a list like the one below. Each of these is on a separate line in my text file:
Email: jonsmith@emailaddie.com
Name: Jon Smith
Phone Number: 555-1212
Email: jonsmith@emailaddie.com
Name: Jon Smith
Phone Number: 555-1212
Email: jonsmith@emailaddie.com
Name: Jon Smith
Phone Number: 555-1212
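I am trying to combine each group [email, name, phone] and export the groups to another text file, with each group on its own line.
Here is what I have tried so far (if I can get it to print to the terminal correctly, I know how to write it to another file).
I am running Ubuntu Linux.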
import re

stuff = list()

#get line
with open("a2.txt", "r") as ins:
    array = []
    for line in ins:
        if re.match("Email Address: ", line):
            array.append(line)
            if re.match("Phone Number: ", line):
                array.append(line)
                if re.match("Name: ", line):
                    array.append(line)
                    print(line)
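As noted in the comments, the nested if statements are all looking at the same line. No line in your sample matches all three regular expressions, so the code never extracts anything. In any case, there is no need for regular expressions here; a simple line.startswith() is enough to look for a single static string or a small set of static strings.
Instead, you want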
array = []
for line in ins:
    if line.startswith('Email Address:'):
        array.append(<<Capture the rest of the line>>)
    elif line.startswith('Name: '):
        array.append(<<Capture the rest of the line>>)
    elif line.startswith('Phone Number: '):
        array.append(<<Capture the rest of the line>>)
        print(array)
        array = []
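One way the placeholders above could be filled in is sketched here. This is a hedged reconstruction, not the answerer's exact code: the names FIELDS and collect_groups are hypothetical, and the prefixes follow the sample data ("Email:") rather than the "Email Address:" used in the question's code. It keeps a list of expected prefixes and an index into that list, and slices the prefix off each matching line.

```python
# Hypothetical names; prefixes listed in the order they are expected.
# The sample data uses "Email:"; change this if your file says "Email Address:".
FIELDS = ["Email: ", "Name: ", "Phone Number: "]


def collect_groups(lines, fields=FIELDS):
    """Return a list of [email, name, phone] lists from an iterable of lines."""
    groups = []
    array = []
    index = 0  # position in `fields` of the prefix we expect next
    for line in lines:
        prefix = fields[index]
        if line.startswith(prefix):
            # Capture the rest of the line, without the prefix or the newline.
            array.append(line[len(prefix):].strip())
            index += 1
            if index == len(fields):
                # All fields collected: store the group and start over.
                groups.append(array)
                array = []
                index = 0
    return groups


sample = [
    "Email: jonsmith@emailaddie.com\n",
    "Name: Jon Smith\n",
    "Phone Number: 555-1212\n",
]
for group in collect_groups(sample):
    print(group)
```

Lines that do not start with the expected prefix are simply skipped in this sketch; whether that is the right behaviour depends on how clean the input file is.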
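This may be a little hard to read at first, but you should get the hang of it quickly. We have an index which tells us which value from fields to expect next; when fields runs out, we print the collected information and wrap index back to zero. It also gives us a convenient way to refer to the field we currently expect.

If you are sure that the fields (email, name and phone number) will always appear in the same order, then the code below will work. If not, handle that case in the else branch, where you can either save the incomplete values or raise an exception.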
with open("path to the file") as fh:
    # Track the status for a group
    counter = 0
    # List of all information
    all_info = []
    # Containing information of current group
    current_info = ""
    for line in fh:
        if line.startswith("Email:") and counter == 0:
            counter = 1
            current_info = "{}{},".format(current_info, line)
        elif line.startswith("Name:") and counter == 1:
            counter = 2
            current_info = "{}{},".format(current_info, line)
        elif line.startswith("Phone Number:") and counter == 2:
            counter = 0
            all_info.append("{}{},".format(current_info, line).replace("\n",""))
            current_info = ""
        else:
            # You can handle incomplete information here.
            counter = 0
            current_info = ""
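To get from the collected groups to the output file the question asks for, each group can be written on its own line. The following is a minimal sketch under stated assumptions: join_groups and write_groups are hypothetical names, and the choice to restart a group when an Email: line turns up out of order is only one possible way to handle incomplete records. Unlike the loop above, it strips each line up front and does not leave a trailing comma.

```python
# Hypothetical helper names; prefixes in the order they are expected.
PREFIXES = ("Email:", "Name:", "Phone Number:")


def join_groups(lines):
    """Return one comma-separated string per complete Email/Name/Phone group."""
    all_info = []
    current = []  # fields collected so far for the current group
    for line in lines:
        line = line.strip()
        expected = PREFIXES[len(current)]
        if line.startswith(expected):
            current.append(line)
        elif line.startswith(PREFIXES[0]):
            # Assumption: an out-of-order Email: line starts a new group
            # and the incomplete one is dropped.
            current = [line]
        else:
            # Any other unexpected line discards the incomplete group.
            current = []
        if len(current) == len(PREFIXES):
            all_info.append(",".join(current))
            current = []
    return all_info


def write_groups(in_path, out_path):
    """Read records from in_path and write one group per line to out_path."""
    with open(in_path) as src, open(out_path, "w") as dst:
        for group in join_groups(src):
            dst.write(group + "\n")


demo = ["Email: jonsmith@emailaddie.com\n", "Name: Jon Smith\n", "Phone Number: 555-1212\n"]
print(join_groups(demo))
```

A call such as write_groups("a2.txt", "grouped.txt") would then produce the one-line-per-person file; the output file name here is only an example.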
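Is the code you pasted indented correctly?
@DroidX86 I think so, yes.
So you only want to print the lines when all 3 conditions match?
@DroidX86 Correct. This project started when I went through 3000 PDF files, converted them to text files, and then pulled the names, emails and phone numbers out of them using Python. But the output file ended up with each person's information on separate lines. I am trying to correct that with another Python script.
Thank you very much. I was going about this completely wrong. For me, the hardest part of programming has to be the approach, the thought process that comes before even writing a single line of code.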