Python: I need print to return only the unique sentences
I am trying to take a txt file containing 10 sentences (all words) and pass it as a command-line argument to a Python script. I want to print the sentences that contain words listed in dic. My script finds the matching sentences, but it prints each sentence as many times as it finds a matching word.
Is there another approach I can use for this? Also, I don't want the output lines to be separated by a blank line (\n).
Output:
eighty two is what i am bidding on the brent
eighty two is what i am bidding on the brent
eighty two is what i am bidding on the brent
call on sixty five to sixty seventy
call on sixty five to sixty seventy
call on sixty five to sixty seventy
call on sixty five to sixty seventy
call on sixty five to sixty seventy
no nothing is going on double
i am bidding on the option for eighty five
i am bidding on the option for eighty five
recross sell seller selling sept
recross sell seller selling sept
recross sell seller selling sept
recross sell seller selling sept
recross sell seller selling sept
blah blah blah blah close
Desired output:
eighty two is what i am bidding on the brent
call on sixty five to sixty seventy
no nothing is going on double
i am bidding on the option for eighty five
recross sell seller selling sept
blah blah blah blah close
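Add a break after the print(line) statement, so that the for loop over the words stops at the first match and the sentence is printed only once.
The blank line between sentences is caused by f.readline(), because the string it returns ends with \n and print then adds a second newline. You can remove it with line.strip(), but it is better to use the for line in f syntax.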
    for line in f:
        words = line.split()
        if len(words) > 3:
            for j in words:
                if j in dic:
                    print(line)
                    break  # stop at the first match so the line is printed once
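The same idea can be written without an explicit break by using any(), which stops at the first matching word. What follows is only a minimal sketch: the function name matching_lines, the min_words parameter, the shortened dic and the sample sentences held in an io.StringIO object (standing in for the opened file) are all assumptions made for illustration, not part of the original script.

```python
import io


def matching_lines(lines, dic, min_words=3):
    """Return each line (trailing newline removed) that has more than
    min_words words and at least one word in dic, at most once per line."""
    result = []
    for line in lines:
        words = line.split()
        # any() short-circuits on the first hit, like break in the loop above
        if len(words) > min_words and any(w in dic for w in words):
            result.append(line.rstrip("\n"))
    return result


# Hypothetical sample input standing in for the file passed on the command line
sample = io.StringIO(
    "eighty two is what i am bidding on the brent\n"
    "call on sixty five to sixty seventy\n"
    "hello there\n"
)
dic = {"bid", "brent", "call", "close"}  # shortened, illustrative word list

for sentence in matching_lines(sample, dic):
    print(sentence)  # newline already stripped, so no blank lines appear
```

Collecting the results in a list keeps the matching logic separate from printing, which makes it easy to check that every sentence appears only once.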
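I would suggest creating a set for your word dictionary, and a second set containing the words of each line of the file. You can then compare these sets with & to get their intersection, that is, the words common to both. This is more efficient than looping over a list to look for matching words.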
    import sys

    dic = set(["april", "aprils", "ask", "aug", "augee", "august", "bid", "bonds", "brent", "buy", "call", "callroll", "calls", "chance", "checking", "close", "collar", "condor", "cover"])
    filename = sys.argv[1]
    with open(filename) as f:
        for line in f:
            s = set(line.split())
            if s & dic:  # non-empty intersection: at least one shared word
                print(line.strip())
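To see what the intersection actually contains, the comparison can be pulled out into a small helper. This is a hedged sketch only: the name shared_words, the shortened dic and the sample line are assumptions made for illustration. It also shows set.isdisjoint, which accepts any iterable and so does not require building a second set when you only need a yes/no answer.

```python
def shared_words(line, dic):
    """Return the set of words that occur both in line and in dic."""
    return set(line.split()) & dic


dic = {"bid", "brent", "call", "close"}  # shortened, illustrative word list
line = "call on sixty five to sixty seventy\n"

# The intersection holds the matching words, however many times they occur
print(shared_words(line, dic))

# isdisjoint returns True when nothing is shared, so negate it to test for a match
if not dic.isdisjoint(line.split()):
    print(line.strip())
```

Because a set stores each word once, a sentence that contains several dictionary words still passes the test a single time, which is why the duplicate output disappears.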
Put a break after print(line) so that it doesn't check the remaining words.
@trincot, thanks, that solved 80% of the problem. I had completely forgotten about break. The other 20% was caused by the newline in the line string. See my answer.
To suppress the newline, I did:
    for line in f:
        words = line.split()
        if len(words) > 3:
            for j in words:
                if j in dic:
                    print(line.strip("\n"))
                    break
The solution using this method is:
    import sys

    dic = set(["april", "aprils", "ask", "aug", "augee", "august", "bid", "bonds", "brent", "buy", "call", "callroll", "calls", "chance", "checking", "close", "collar", "condor", "cover"])
    filename = sys.argv[1]
    with open(filename) as f:
        for line in f:
            if len(line.split()) > 3:
                s = set(line.split())
                if s & dic:
                    print(line.strip())
Thanks for optimizing the process.
What is the len > 3 check for?
I only want lines that contain more than 3 words. This is just for hand-picking speech-to-text output.