Python: find out whether a sentence has the starting words of another sentence or the ending words of the same sentence
For example, I have a set of sentences like this:
New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country.
Lets take a bus to get to point b from point a.
And a sentence like this:
is cool in the south of that country
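The output should be:
The weather is cool in the south of that country.
If I have an input like:
of United States The weather is cool
then the output should be: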
D.C. is the capital of United States The weather is cool in the south of that country.
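So far I have tried difflib and got the overlaps, but that does not fully solve the problem in all cases.

You could build one dictionary of starting expressions and one of ending expressions from the sentences, then look up the prefix and the suffix of the sentence you want to extend in those dictionaries. The "ends" dictionary maps every trailing run of words of a sentence to the words that precede it; the "starts" dictionary maps every leading run of words to the words that follow it. In both cases you need to build (and later check) a key for each run of words anchored at the beginning or at the end: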
sentences = """New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country
Lets take a bus to get to point b from point a""".split("\n")

# ends: every trailing run of words -> the words that come before it
ends = { tuple(sWords[i:]):sWords[:i] for s in sentences
                                      for sWords in [s.split()]
                                      for i in range(len(sWords)) }

# starts: every leading run of words -> the words that come after it
starts = { tuple(sWords[:i]):sWords[i:] for s in sentences
                                        for sWords in [s.split()]
                                        for i in range(1,len(sWords)+1) }

def extendSentence(sentence):
    sWords = sentence.split(" ")
    # shortest leading run of the input that also ends a known sentence
    prefix = next( (ends[p] for i in range(1,len(sWords)+1)
                            for p in [tuple(sWords[:i])] if p in ends),
                   [])
    # longest trailing run of the input that also starts a known sentence
    suffix = next( (starts[p] for i in range(len(sWords))
                              for p in [tuple(sWords[i:])] if p in starts),
                   [])
    return " ".join(prefix + [sentence] + suffix)
Output:
print(extendSentence("of United States The weather is cool"))
# D.C. is the capital of United States The weather is cool in the south of that country
print(extendSentence("is cool in the south of that country"))
# The weather is cool in the south of that country
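To make the two lookup tables more concrete, here is a small sketch that builds them for a single short sentence and prints every key with its value. The sample sentence and the variable name words are chosen only for illustration.

```python
# Build both lookup tables for one sentence so their contents are easy to read.
words = "The weather is cool".split()

# Trailing run of words -> the words that precede it.
ends = {tuple(words[i:]): words[:i] for i in range(len(words))}

# Leading run of words -> the words that follow it.
starts = {tuple(words[:i]): words[i:] for i in range(1, len(words) + 1)}

for key, value in ends.items():
    print("ends  ", " ".join(key), "->", value)
for key, value in starts.items():
    print("starts", " ".join(key), "->", value)
```

Each table gets one key per word of the sentence, so an input fragment such as "is cool" can be looked up directly in ends to recover the missing words "The weather".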
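Note that I had to remove the periods at the end of the sentences because they prevented matching. You will need to clean those up in the dictionary-building step.

I would add that you can easily check whether a string contains a substring pattern using in. For example, "is cool in the south of that country" in "The weather is cool in the south of that country." returns True. It would also help to see what you have tried and which cases it did not handle.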
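The period clean-up mentioned for the dictionary-building step could be sketched roughly as follows. This is only one possible approach, not the answerer's code: the helper names normalize, build_tables and extend_sentence are hypothetical, and the sketch assumes that stripping trailing punctuation from each word is enough to make the keys match. The keys are normalized, while the stored values keep the original words so the output preserves the original punctuation.

```python
import string

def normalize(word):
    # Hypothetical helper: drop trailing punctuation so "country." matches "country".
    return word.rstrip(string.punctuation)

def build_tables(sentences):
    # Keys use normalized words; values keep the original words for output.
    ends, starts = {}, {}
    for s in sentences:
        words = s.split()
        keys = [normalize(w) for w in words]
        for i in range(len(words)):
            ends[tuple(keys[i:])] = words[:i]
        for i in range(1, len(words) + 1):
            starts[tuple(keys[:i])] = words[i:]
    return ends, starts

def extend_sentence(sentence, ends, starts):
    words = sentence.split()
    keys = [normalize(w) for w in words]
    # Shortest leading run of the input that ends a known sentence.
    prefix = next((ends[tuple(keys[:i])] for i in range(1, len(keys) + 1)
                   if tuple(keys[:i]) in ends), [])
    # Longest trailing run of the input that starts a known sentence.
    suffix = next((starts[tuple(keys[i:])] for i in range(len(keys))
                   if tuple(keys[i:]) in starts), [])
    return " ".join(prefix + words + suffix)

sentences = [
    "New York is in New York State",
    "D.C. is the capital of United States",
    "The weather is cool in the south of that country.",
    "Lets take a bus to get to point b from point a.",
]
ends, starts = build_tables(sentences)
print(extend_sentence("of United States The weather is cool", ends, starts))
print(extend_sentence("is cool in the south of that country", ends, starts))
```

With this variant the sentences can keep their final periods in the input list. Note that abbreviations such as "D.C." are also normalized (to "D.C"), which is harmless here because the same normalization is applied to the query.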