Python: apply <b></b> formatting to a list of words occurring in text


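Given a list of words, I want to emphasize those words in a string (using <b>...</b> tags), without using regular expressions.

For example, I have: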

list_of_words = ["python", "R", "Julia" ...]
a_Speech = "A paragraph about programming languages  ......R is good for statisticians . Python is good for programmers . ....."

The output should be:

a_Speech = "A paragraph about programming languages  ......<b>R</b> is good for statisticians . <b>Python</b> is good for programmers . ....."


I tried something like this:

def right_shift(astr, index, n):
    # shift by n = 3,n = 4  characters 

def function_name(a_speech): 

    for x in list_of_words: 
        if x in a_speech: 
             loc = a_speech.index(x) 
             right_shift(a_speech, loc, 3)
             a_speech[loc] = "<b>"

             right_shift(a_speech, loc+len(x), 4)          
             a_speech[loc+len(x)] = "</b>"

    return a_speech


This works. You need to split the speech on both spaces and periods, so we write a compound splitting function, is_split_char(), and pass it to itertools.groupby(), which is a very neat iterator.

import itertools

# A set is faster than a list for membership tests
bold_words = set(word.lower() for word in ["python", "R", "Julia"])

def bold_specific_words(bold_words, splitchars, text):
    """Generator to split on specified splitchars, and bold words in wordset, case-insensitive.
    Don't split contiguous blocks of splitchars.
    Don't discard the split chars, unlike string.split()."""

    def is_split_char(char, charset=splitchars):
        # True for separator characters, False for characters inside a word
        return char in charset

    for is_splitchar, chars in itertools.groupby(text, is_split_char):
        word = ''.join(chars)  # re-form the word (or separator run) from the sub-iterator
        if not is_splitchar and word.lower() in bold_words:
            yield '<b>' + word + '</b>'
        else:
            yield word

>>> ''.join(bold_specific_words(bold_words, ' .', a_Speech))
'A paragraph about programming languages  ......<b>R</b> is good for statisticians . <b>Python</b> is good for programmers . .....'
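To see why the separators survive, it can help to look at what itertools.groupby() actually yields. The following is only a small illustrative sketch; the sample text and the names `split_chars` and `runs` are made up for this example and do not appear in the answer.

```python
import itertools

text = "R is good . Python"
split_chars = " ."

# Group consecutive characters by whether they are split characters.
# Each group is a run of separators (True) or a run of word characters (False).
runs = [
    (is_sep, "".join(chars))
    for is_sep, chars in itertools.groupby(text, lambda c: c in split_chars)
]

for is_sep, run in runs:
    print(is_sep, repr(run))

# Nothing is discarded, so joining the runs reproduces the input.
print("".join(run for _, run in runs) == text)
```

Because every run is kept, including contiguous blocks such as ' . ', the bolded words can be stitched back into the original spacing and punctuation.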


Something like this might work: build up a list of substrings, including the tags, and join them at the end:

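def function_name(a_speech):

    loc = 0
    substrings = []
    # Note: each word is matched once, case-sensitively, and in list order
    for word in list_of_words:
        if word in a_speech[loc:]:
            currentloc = loc
            # str.index() takes the start offset as a positional argument
            loc = a_speech.index(word, currentloc)
            substrings.append(a_speech[currentloc:loc])
            substrings.append("<b>")
            substrings.append(word)
            substrings.append("</b>")
            # loc indexes the original string, so skip only the word itself
            loc += len(word)

    # keep whatever follows the last match
    substrings.append(a_speech[loc:])
    return "".join(substrings)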


(Note: untested. You may need to work out a few final details.)
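The attempt in the question assigns into the string by index, which Python does not allow because strings are immutable; the marked-up text has to be assembled as a new string. A minimal sketch of that idea follows. The helper name `wrap_span` and the sample sentence are assumptions made for illustration, not part of the question or the answers.

```python
def wrap_span(text, start, end, open_tag="<b>", close_tag="</b>"):
    """Return a new string with text[start:end] wrapped in the given tags."""
    return text[:start] + open_tag + text[start:end] + close_tag + text[end:]

speech = "R is good for statisticians ."

try:
    speech[0] = "<b>"  # item assignment on a str raises TypeError
except TypeError as exc:
    print("cannot assign in place:", exc)

loc = speech.index("R")
print(wrap_span(speech, loc, loc + len("R")))
```

Note that locating words with index() still matches substrings inside longer words, so this shows only the string-building step, not a full solution.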

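It's really not clear what this user is asking.

This is unclear. Please explain your question with a clearer example.

Come on, guys: although the OP's question isn't stated very explicitly, they want to apply <b>...</b> formatting to a list of words occurring in the text.

Why do I feel like I'm being asked to do people's homework?

Jens, to some extent that's fine. The OP has put some effort into this presumed homework and has now got stuck.

It doesn't help that the original question was also treated as a homework exercise.

This is case-sensitive, so it won't match "python" against "Python".

I think it's not just spaces and periods; there are also commas, semicolons, exclamation marks, etc. All in all, that brings you close to using regular expressions.

@Evert: In general, yes; in this case, the only punctuation in the OP's text is periods. Completely rewritten, and it fully works. You can parameterize is_split_char(char, charset=splitchars).

This will match letters inside the text, e.g. "Rescue". With the same success we could use replace, with less code and in a more Pythonic way.

You have to write a compound split function, which is not trivial. It's better written as a generator that yields substrings, rather than using indexes explicitly.

@Reishin Yes, it does (you mean substrings, not just letters inside a word). So one still has to find the word boundaries correctly.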
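The comments raise two concerns: punctuation other than spaces and periods, and substring matches such as the "R" in "Rescue". One possible way to address both without regular expressions is to let str.isalnum decide what counts as a word character, so every other character acts as a boundary. This is only a sketch under that assumption; the function name `bold_words_in` and the sample sentence are invented for illustration.

```python
import itertools

def bold_words_in(text, words):
    """Wrap whole words from `words` in <b>...</b>, case-insensitively."""
    wanted = {w.lower() for w in words}
    pieces = []
    # groupby splits text into alternating runs of alphanumeric
    # characters (candidate words) and everything else (separators).
    for is_word, chars in itertools.groupby(text, str.isalnum):
        run = "".join(chars)
        if is_word and run.lower() in wanted:
            run = "<b>" + run + "</b>"
        pieces.append(run)
    return "".join(pieces)

print(bold_words_in("Rescue me, R! Python; julia?", ["python", "R", "Julia"]))
```

Since "Rescue" forms a single alphanumeric run, it is compared as a whole and left alone. The trade-off is that underscores, hyphens, and apostrophes also count as boundaries here, which may or may not suit a given text.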