Warning: file_get_contents(/data/phpspider/zhask/data//catemap/2/python/362.json): failed to open stream: No such file or directory in /data/phpspider/zhask/libs/function.php on line 167

Warning: Invalid argument supplied for foreach() in /data/phpspider/zhask/libs/tag.function.php on line 1116

Notice: Undefined index: in /data/phpspider/zhask/libs/function.php on line 180

Warning: array_chunk() expects parameter 1 to be array, null given in /data/phpspider/zhask/libs/function.php on line 181
Python—使用HTML将给定列表中的字符串实例包围在另一个字符串中_Python_Html - Fatal编程技术网

Python—使用HTML将给定列表中的字符串实例包围在另一个字符串中

Python—使用HTML将给定列表中的字符串实例包围在另一个字符串中,python,html,Python,Html,我已经编写了一个函数,它用一个具有给定属性的HTML元素来包围搜索词。其思想是,生成的字符串稍后会写入日志文件,并突出显示搜索词 def inject_html(needle, haystack, html_element="span", html_attrs={"class":"matched"}): # Find all occurrences of a given string in some text # Surround the occurrences with a H

我已经编写了一个函数,它用一个具有给定属性的HTML元素来包围搜索词。其思想是,生成的字符串稍后会写入日志文件,并突出显示搜索词

def inject_html(needle, haystack, html_element="span", html_attrs={"class":"matched"}):
    # Find all occurrences of a given string in some text
    # Surround the occurrences with a HTML element and given HTML attributes
    new_str = haystack
    start_index = 0
    while True:
        try:
            # Get the bounds
            start = new_str.lower().index(needle.lower(), start_index)
            end = start + len(needle)

            # Needle is present, compose the HTML to inject
            html_open = "<" + html_element + " " + " ".join(["%s=\"%s\""%(k,html_attrs[k]) for k in html_attrs]) + ">"
            html_close = "</" + html_element + ">"

            new_str = new_str[0:start] + html_open + new_str[start:end] + html_close + new_str[end:len(new_str)]
            start_index = end + len(html_close) + len(html_open)

        except ValueError as ex:
            # String doesn't occur in text after index, break loop
            break
    return new_str
def inject_html(针形,草垛,html_element=“span”,html_attrs={“class”:“matched”}):
#在某些文本中查找给定字符串的所有匹配项
#用一个HTML元素和给定的HTML属性包围这些事件
new_str=干草堆
开始索引=0
尽管如此:
尝试:
#获得边界
开始=新的下针()索引(下针(),开始索引)
结束=开始+透镜(打捆针)
#针存在,编写HTML进行注入
html_open=“”
html_close=“”
new_str=new_str[0:start]+html_open+new_str[start:end]+html_close+new_str[end:len(new_str)]
开始索引=结束+len(html\u关闭)+len(html\u打开)
除ValueError外,例如:
#字符串不出现在索引后的文本中,请断开循环
打破
返回新的\u str
我想打开它来接受一系列的针,在草堆中用HTML来定位和包围它们。我可以很容易地通过另一个循环来围绕代码,循环遍历指针,定位和围绕搜索词的实例来实现这一点。问题是,这并不能防止意外地包围先前注入的HTML代码,例如

def inject_html(needles, haystack, html_element="span", html_attrs={"class":"matched"}):
    # Find all occurrences of a given string in some text
    # Surround the occurrences with a HTML element and given HTML attributes
    new_str = haystack
    for needle in needles:
        start_index = 0
        while True:
        try:
            # Get the bounds
            start = new_str.lower().index(needle.lower(), start_index)
            end = start + len(needle)

            # Needle is present, compose the HTML to inject
            html_open = "<" + html_element + " " + " ".join(["%s=\"%s\""%(k,html_attrs[k]) for k in html_attrs]) + ">"
            html_close = "</" + html_element + ">"

            new_str = new_str[0:start] + html_open + new_str[start:end] + html_close + new_str[end:len(new_str)]
            start_index = end + len(html_close) + len(html_open)

        except ValueError as ex:
            # String doesn't occur in text after index, break loop
            break
    return new_str

search_strings = ["foo", "pan", "test"]
haystack = "Foobar"
print(inject_html(search_strings,haystack))

<s<span class="matched">pan</span> class="matched">Foo</span>bar
def inject_html(针、草垛、html_element=“span”、html_attrs={“class”:“matched”}):
#在某些文本中查找给定字符串的所有匹配项
#用一个HTML元素和给定的HTML属性包围这些事件
new_str=干草堆
针入针:
开始索引=0
尽管如此:
尝试:
#获得边界
开始=新的下针()索引(下针(),开始索引)
结束=开始+透镜(打捆针)
#针存在,编写HTML进行注入
html_open=“”
html_close=“”
new_str=new_str[0:start]+html_open+new_str[start:end]+html_close+new_str[end:len(new_str)]
开始索引=结束+len(html\u关闭)+len(html\u打开)
除ValueError外,例如:
#字符串不出现在索引后的文本中,请断开循环
打破
返回新的\u str
搜索字符串=[“foo”、“pan”、“test”]
haystack=“Foobar”
打印(注入html(搜索字符串,haystack))
福巴
在第二次迭代中,代码从上一次迭代中插入的“span”中搜索并包围“pan”文本

您建议我如何更改原始函数以查找针的列表,而不会将HTML注入到不需要的位置(例如在现有标记中)

---更新---

我通过维护一个“免疫”范围列表来解决这个问题(这些范围已经被HTML包围,因此不需要再次检查)

def inject_html(needles, haystack, html_element="span", html_attrs={"class":"matched"}):
    # Find all occurrences of a given string in some text
    # Surround the occurrences with a HTML element and given HTML attributes
    immune = []
    new_str = haystack
    for needle in needles:
        next_index = 0
        while True:
            try:
                # Get the bounds
                start = new_str.lower().index(needle.lower(), next_index)
                end = start + len(needle)

                if not any([(x[0] > start and x[0] < end) or (x[1] > start and x[1] < end) for x in immune]):
                    # Needle is present, compose the HTML to inject
                    html_open = "<" + html_element + " " + " ".join(["%s=\"%s\""%(k,html_attrs[k]) for k in html_attrs]) + ">"
                    html_close = "</" + html_element + ">"

                    new_str = new_str[0:start] + html_open + new_str[start:end] + html_close + new_str[end:len(new_str)]
                    next_index = end + len(html_close) + len(html_open)

                    # Add the highlighted range (and HTML code) to the list of immune ranges
                    immune.append([start, next_index])

            except ValueError as ex:
                # String doesn't occur in text after index, break loop
                break

    return new_str
def inject_html(针、草垛、html_element=“span”、html_attrs={“class”:“matched”}):
#在某些文本中查找给定字符串的所有匹配项
#用一个HTML元素和给定的HTML属性包围这些事件
免疫=[]
new_str=干草堆
针入针:
下一个索引=0
尽管如此:
尝试:
#获得边界
开始=新的下一个索引(指针下一个索引)
结束=开始+透镜(打捆针)
如果没有任何([(x[0]>开始和x[0]开始和x[1]

不过,这并不是特别的python,我很想看看是否有人能想出更干净的东西。

我会用这样的东西:

def inject_html(phrases, text_body, html_element_name="span", html_attrs={"class":"matched"}):

    new_text_body = []

    html_start_tag = "<" + html_element_name + " ".join(["%s=\"%s\""%(k,html_attrs[k]) for k in html_attrs]) + ">"
    html_end_tag = "</" + html_element_name + ">"

    text_body_lines = text_body.split("\n")

    for line in text_body_lines:
        for p in phrases:
            if line.lower() == p.lower():
                line = html_start_tag + p + html_end_tag
                break

        new_text_body.append(line)

    return "\n".join(new_text_body)

如何找到
指针
?这是正则表达式的一个很好的候选项。
指针
是手动输入的,它们是我想用HTML标记包围的字符串。
指针
是否需要与给定行精确匹配(不区分大小写)?它是一个单词吗?
pinder
可以有很多行吗?是的,它必须是精确匹配的。它是一个任意的单行字符串。很接近,但我们要查找的短语可能不是行中的唯一内容。这只会在整行匹配时匹配。@ryansin得到了它-这就是我问
pinder是否需要是一个字符串时的意思对一个给定的行< /代码>的精确匹配。我明天应该有另一个PoC。抱歉,是的,我的错误被误解了。line@ryansin我已经用这些要求更新了我的答案。
import re

def inject_html(phrases, text_body, html_element_name="span", html_attrs={"class": "matched"}):

    html_start_tag = "<" + html_element_name + " " + " ".join(["%s=\"%s\"" % (k, html_attrs[k]) for k in html_attrs]) + ">"
    html_end_tag = "</" + html_element_name + ">"

    for p in phrases:
        text_body = re.sub(r"({})".format(p), r"{}\1{}".format(html_start_tag, html_end_tag), text_body, flags=re.IGNORECASE)

    return text_body
text = """
Somewhat more than forty years ago, Mr Baillie Fraser published a 
lively and instructive volume under the title _A Winter’s Journey  
(Tatar) from Constantinople to Teheran. Political complications 
had arisen between Russia and Turkey - an old story, of which we are 
witnessing a new version at the present time. The English government 
deemed it urgently necessary to send out instructions to our 
representatives at Constantinople and Teheran.
"""

new = inject_html(["TEHERAN", "Constantinople"], text)

print(new)

> Somewhat more than forty years ago, Mr Baillie Fraser published a lively and instructive volume under the title _A Winter’s Journey (Tatar) from <span class="matched">Constantinople</span> to <span class="matched">Teheran</span>. Political complications had arisen between Russia and Turkey - an old story, of which we are witnessing a new version at the present time. The English government deemed it urgently necessary to send out instructions to our representatives at <span class="matched">Constantinople</span> and <span class="matched">Teheran</span>.