Python: find whether a sentence contains the starting words of another sentence or the ending words of the same sentence


For example, I have a set of sentences like these:

New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country.
Lets take a bus to get to point b from point a.
And a sentence like this:

is cool in the south of that country

The output should be:

The weather is cool in the south of that country.

If I have an input like:

of United States The weather is cool

then the output should be:

D.C. is the capital of United States The weather is cool in the south of that country.

So far I have tried difflib and got the overlap, but that does not fully solve the problem in every case.
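The question does not show the difflib code, so the following is only a guess at one kind of case it can miss. difflib's SequenceMatcher.find_longest_match reports the longest run of words two texts share anywhere, whereas this task needs a run that ends one sentence and starts the other. A minimal sketch contrasting the two; longest_common_block and suffix_prefix_overlap are hypothetical helper names, not part of any library.

```python
from difflib import SequenceMatcher

def longest_common_block(a_words, b_words):
    """Longest run of words shared by both lists, wherever it occurs (what difflib reports)."""
    m = SequenceMatcher(None, a_words, b_words).find_longest_match(
        0, len(a_words), 0, len(b_words))
    return a_words[m.a:m.a + m.size]

def suffix_prefix_overlap(left_words, right_words):
    """Longest run of words that ends left_words and starts right_words."""
    for k in range(min(len(left_words), len(right_words)), 0, -1):
        if left_words[-k:] == right_words[:k]:
            return left_words[-k:]
    return []

sentence = "New York is in New York State".split()
fragment = "is in New".split()

# difflib finds a shared block in the middle of the sentence...
print(longest_common_block(fragment, sentence))   # ['is', 'in', 'New']
# ...but the sentence does not end with the words the fragment starts with.
print(suffix_prefix_overlap(sentence, fragment))  # []

# A genuine overlap: the sentence ends with the words the fragment starts with.
print(suffix_prefix_overlap("D.C. is the capital of United States".split(),
                            "of United States The weather is cool".split()))
# ['of', 'United', 'States']
```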

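You can build dictionaries of starting and ending expressions from the sentences. Then look up the prefixes and suffixes of the sentence you want to extend in those dictionaries. In both cases you need to build/check a key for every run of words starting from the beginning and from the end: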

sentences="""New York is in New York State
D.C. is the capital of United States
The weather is cool in the south of that country
Lets take a bus to get to point b from point a""".split("\n")

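# ends: every trailing run of words of a sentence -> the words that precede it
ends   = { tuple(sWords[i:]):sWords[:i] for s in sentences
           for sWords in [s.split()] for i in range(len(sWords)) }
# starts: every leading run of words of a sentence -> the words that follow it
starts = { tuple(sWords[:i]):sWords[i:] for s in sentences
           for sWords in [s.split()] for i in range(1,len(sWords)+1) }

def extendSentence(sentence):
    # split() (not split(" ")) so the words are tokenized the same way as the keys
    sWords   = sentence.split()
    # leading words of the input that end a known sentence -> prepend that sentence's beginning
    prefix   = next( (ends[p] for i in range(1,len(sWords)+1)
                      for p in [tuple(sWords[:i])] if p in ends),
                    [])
    # trailing words of the input that start a known sentence -> append that sentence's rest
    suffix   = next( (starts[p] for i in range(len(sWords))
                      for p in [tuple(sWords[i:])] if p in starts),
                    [])
    return " ".join(prefix + [sentence] + suffix)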
Output:

print(extendSentence("of United States The weather is cool"))

# D.C. is the capital of United States The weather is cool in the south of that country

print(extendSentence("is cool in the south of that country"))

# The weather is cool in the south of that country
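To make the two dictionary comprehensions concrete, here is the same construction applied to a single short sentence (chosen only for illustration). A fragment that begins with "is cool" finds ("is", "cool") in ends and gets "The weather" prepended; a fragment that ends with "The weather" finds it in starts and gets "is cool" appended.

```python
words = "The weather is cool".split()

# Every non-empty trailing run of words -> the words that come before it.
ends = {tuple(words[i:]): words[:i] for i in range(len(words))}
# Every non-empty leading run of words -> the words that come after it.
starts = {tuple(words[:i]): words[i:] for i in range(1, len(words) + 1)}

print(len(ends), len(starts))      # 4 4
print(ends[("is", "cool")])        # ['The', 'weather']
print(starts[("The", "weather")])  # ['is', 'cool']
```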

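Note that I had to remove the periods at the end of the sentences because they prevent matching. You would need to clean those up in the dictionary-building step.

I would add that you can use the in operator to easily check whether a string contains a substring. For example, 'is cool in the south of that country' in 'The weather is cool in the south of that country.' returns True. It would help to see what you have tried and which cases are not solved.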