Python: finding the distance between phrases in a text

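I have a question: how can I count the number of words between phrases in a text? For example, I have the following text:

Elon Musk is a technology entrepreneur and investor. He is the founder, CEO, and lead designer of SpaceX. Elon Musk has stated that the goals of SpaceX, Tesla, and SolarCity revolve around his vision to change the world and humanity.

I want to count how many words there are between "Elon Musk" and "SpaceX", return something like a list of numbers, and then find the average word distance. For example, [15, 6].

I know that for single words we can split the text into words. But how do I handle phrases?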

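You could split the text on periods, exclamation marks, and question marks, but how would your program tell a period that ends a phrase from one that marks an abbreviation? Beyond that, how would you handle brackets? Would they count as separate phrases?

I don't think your question has a straightforward answer unless you impose some serious restrictions on your phrases.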
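To make the abbreviation and bracket concern concrete, here is a minimal sketch. The sample sentence is made up for illustration and is not from the question. It shows a naive split on sentence punctuation breaking at "Dr.", while a plain word-level tokenization simply drops periods and brackets:

```python
import re

# Hypothetical sample text containing an abbreviation and brackets
sample = "Dr. Smith met Elon Musk (the CEO of SpaceX). They talked."

# Naive split on sentence-ending punctuation: the abbreviation "Dr."
# is wrongly treated as the end of a sentence
naive_sentences = [s.strip() for s in re.split(r"[.!?]", sample) if s.strip()]
print(naive_sentences)

# Word-level tokens: punctuation and brackets are dropped entirely
words = re.findall(r"\w+", sample)
print(words)
```

If only word counts matter, working on word tokens sidesteps the sentence-boundary problem, though whether bracketed words should count is still a decision you have to make.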

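As user Dominique mentioned, there are a lot of small details you have to account for. I have made a simple program that finds the distance between two words. You want the distance between "Elon Musk" and "SpaceX". Why not just find the distance between "Musk" and "SpaceX"?

Note: this example returns the distance between the first occurrences of the two words. In this program, we find the distance between "Musk" (the 2nd word) and "SpaceX" (the 18th word). There are 15 words between them.

Elon Musk is a technology entrepreneur and investor. He is the founder, CEO, and lead designer of SpaceX. Elon Musk has stated that the goals of SpaceX, Tesla, and SolarCity revolve around his vision to change the world and humanity.

Example (Python 3):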

# Initial text
phrase = 'Elon Musk is a technology entrepreneur and investor. He is the founder, CEO, and lead designer of SpaceX. Elon Musk has stated that the goals of SpaceX, Tesla, and SolarCity revolve around his vision to change the world and humanity.'

# Remove common punctuation characters (extend this string as needed)
phrase = ''.join(character for character in phrase if character not in '!.:,"')

# Split the text into a list of words
word_list = phrase.split()

# Words to measure between (word_1 must occur before word_2 in the text)
word_1 = 'Musk'
word_2 = 'SpaceX'

# Number of words strictly between the first occurrence of each word;
# list.index raises ValueError if either word is missing
words_between = word_list.index(word_2) - word_list.index(word_1) - 1

print(f'Distance between "{word_1}" and "{word_2}" is {words_between} words.')

Output:

Distance between "Musk" and "SpaceX" is 15 words.
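The program above handles single words and only their first occurrence. As a rough sketch of how the same idea could extend to multi-word phrases and to every occurrence, the text can be tokenized into words and each phrase matched as a run of consecutive tokens. The helper names `tokenize`, `phrase_positions`, and `words_between` are made up for this illustration, and pairing each first phrase with the next second phrase after it is an assumption about what the question wants:

```python
import re

def tokenize(text):
    """Split text into word tokens, dropping punctuation."""
    return re.findall(r"\w+", text)

def phrase_positions(tokens, phrase_tokens):
    """Return the start index of every occurrence of phrase_tokens in tokens."""
    n = len(phrase_tokens)
    return [i for i in range(len(tokens) - n + 1) if tokens[i:i + n] == phrase_tokens]

def words_between(text, phrase_1, phrase_2):
    """For each phrase_1, count the words up to the next phrase_2 after it."""
    tokens = tokenize(text)
    first = tokenize(phrase_1)
    starts_2 = phrase_positions(tokens, tokenize(phrase_2))
    distances = []
    for start in phrase_positions(tokens, first):
        end = start + len(first)  # index just past this phrase_1
        following = [s for s in starts_2 if s >= end]
        if following:  # skip a phrase_1 that has no phrase_2 after it
            distances.append(following[0] - end)
    return distances

text = ('Elon Musk is a technology entrepreneur and investor. He is the founder, '
        'CEO, and lead designer of SpaceX. Elon Musk has stated that the goals of '
        'SpaceX, Tesla, and SolarCity revolve around his vision to change the '
        'world and humanity.')

distances = words_between(text, "Elon Musk", "SpaceX")
print(distances)                        # [15, 6]
print(sum(distances) / len(distances))  # 10.5
```

Matching on tokens rather than raw substrings means "SpaceX" would not match inside a longer word, at the cost of ignoring punctuation entirely.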

There is some logic you haven't specified, but something like the following might do the trick:

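def find_distance(sentence, word1, word2):
    """Return the word count between each word1 and the next word2 after it."""
    distances = []
    while sentence:
        # Drop everything up to and including the next occurrence of word1
        _, found1, sentence = sentence.partition(word1)
        if not found1:
            break
        # Text between that word1 and the following word2
        text, found2, _ = sentence.partition(word2)
        # Only count when word2 actually occurs after word1
        if found2:
            distances.append(len(text.split()))
    return distances

If you call it with your text, you get the result you want, [15, 6]: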

print(find_distance(phrase, "Elon Musk", "SpaceX"))

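Note that the behaviour for cases like "Elon Musk is a technology Elon Musk entrepreneur…" has to be defined. Which occurrence do you want to pick, the first or the second?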
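One possible way to settle that ambiguity, sketched here under the assumption that each "SpaceX" should be paired with the closest "Elon Musk" before it, is to record match positions with `re.finditer` and count the words in the gap. The function name `nearest_distances` and the sample sentence are made up for illustration:

```python
import re

def nearest_distances(text, phrase_1, phrase_2):
    """Count the words between each phrase_2 and the closest phrase_1 before it."""
    # Character offsets where each phrase_1 ends
    ends_1 = [m.end() for m in re.finditer(re.escape(phrase_1), text)]
    distances = []
    for match in re.finditer(re.escape(phrase_2), text):
        preceding = [e for e in ends_1 if e <= match.start()]
        if preceding:  # ignore a phrase_2 with no phrase_1 before it
            gap = text[preceding[-1]:match.start()]
            distances.append(len(re.findall(r"\w+", gap)))
    return distances

# Hypothetical sentence with two "Elon Musk" occurrences before one "SpaceX"
sample = "Elon Musk is a technology Elon Musk entrepreneur at SpaceX"
print(nearest_distances(sample, "Elon Musk", "SpaceX"))  # [2]
```

With this rule the second "Elon Musk" wins, so only "entrepreneur at" is counted. Pairing with the first occurrence instead would be an equally valid choice, so the rule should follow what the average distance is meant to measure.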
