How to search content using regular expressions in Python

I have a hierarchical dictionary with a "content" key:

where "content" holds the contents of an HTML file:

<p>It is common for content in Arabic, Hebrew, and other languages that use right-to-left scripts to include numerals or include text from  other scripts. Both of these typically flow  left-to-right within the overall right-to-left  context. </p> 
<p>This article tells you how to write HTML where text with different writing directions is mixed <em>within a paragraph or other HTML block</em> (ie. <dfn id="term_inline">inline or phrasal</dfn> content). (A companion article <a href="/International/questions/qa-html-dir"><cite>Structural markup and right-to-left text in HTML</cite></a> tells you how to use HTML markup for  elements such as <code class="kw">html</code>, and structural markup such as <code class="kw">p</code> or <code class="kw">div</code> and forms.)</p>
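At the point where the content is assigned to the "content" key, I want to use p (the position) to extract a few words before and after the phrase that was found, including the case where the phrase is at the start of a sentence:

For example:

How can I do this in Python, with a regular expression or some other method?
Thanks in advance.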

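I'm not sure whether your dictionary structure and the way you traverse it are relevant to your question, so I'll restate it:

"How do I use a regular expression to search for a word and get the words before and after it?"

The answer is to use regular-expression capture groups.

Here is an example that finds the one word before and the one word after the search word. You may need to adjust the expression to capture more words, or to handle punctuation: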

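import re

test_string = "How much wood could a wood chuck chuck if a wood chuck would chuck wood"
search_word = "wood"

# Group 1: the preceding word plus its trailing space (empty when there is none).
# Group 2: a leading space plus the following word (empty when there is none).
pattern = r'([^ ]*? |)%s( [^ ]*|)' % re.escape(search_word)

for match in re.finditer(pattern, test_string):
    print("entire match: %s" % match.group(0))
    print("prev word: %s" % match.group(1))
    print("next word: %s" % match.group(2))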
By the way, if you haven't already, visit www.regex101.com to test and tune your regex patterns.
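To capture several words on each side rather than one, the same capture-group idea can be extended with a bounded repetition. The following is only a sketch: the function name `words_around` and the parameter `n` are made up for illustration, and it assumes the phrase starts and ends with a word character so that `\b` applies. Because regex matches do not overlap, a context word consumed by one match is not available as context for the next match.

```python
import re

def words_around(text, phrase, n=3):
    """Yield (words_before, phrase, words_after) for each occurrence of phrase.

    Hypothetical helper: captures up to n whitespace-separated words
    on each side of the phrase using capture groups.
    """
    pattern = re.compile(
        r"((?:\S+\s+){0,%d})\b(%s)\b((?:\s+\S+){0,%d})"
        % (n, re.escape(phrase), n)
    )
    for m in pattern.finditer(text):
        # group(1) and group(3) are empty strings when the phrase sits
        # at the very start or end of the text, so split() returns [].
        yield m.group(1).split(), m.group(2), m.group(3).split()


text = ("This article tells you how to write HTML where text "
        "with different writing directions is mixed")
for before, found, after in words_around(text, "how to write", n=3):
    print(" ".join(before + [found] + after))
```

With the sample sentence from the question, this prints the three words on either side of "how to write" joined around the phrase itself.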

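The code snippets referred to in the question are as follows. The search function:

import re

from bs4 import BeautifulSoup  # assuming Beautiful Soup 4

def look_through(d, s):
    r = []
    content = readFile(d["path"])  # readFile is the asker's own helper (not shown)
    content = BeautifulSoup(content, "html.parser")
    content = content.get_text()
    # start offsets of every match of the pattern s in the extracted text
    pos = [m.start() for m in re.finditer(s, content)]
    if pos:
        if "phrase" not in d:
            d["phrase"] = [s]
        else:
            d["phrase"].append(s)
        for p in pos:
            r.append({"content": content, "phrase": d["phrase"], "name": d["name"]})
    for b in d["decendent"] or []:
        r += look_through(b, s)
    return r

The line that assigns the content to the "content" key:

r.append({"content": content, "phrase": d["phrase"], "name": d["name"]})

The desired result:

>>> look_through(dict, "how to write") 
[{"content": "article tells you how to write HTML where text", "phrase": "how to write", "name" : "Section_3"}]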
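Since `look_through` already collects match positions in `pos`, another option is to slice the text around each position `p` instead of extending the regex. This is a minimal sketch, not the asker's or the answerer's code: `snippet`, `find_snippets` and `n` are hypothetical names, and the phrase is treated as literal text (hence `re.escape`). Slicing by position also handles a match at the very start of the text, where there are simply no words before it.

```python
import re

def snippet(content, start, end, n=3):
    """Return content[start:end] with up to n words of context on each side."""
    # Guard n == 0: words[-0:] would return the whole list.
    before = content[:start].split()[-n:] if n else []
    after = content[end:].split()[:n]
    return " ".join(before + [content[start:end]] + after)

def find_snippets(content, phrase, n=3):
    """Build one context snippet per occurrence of the literal phrase."""
    return [
        snippet(content, m.start(), m.end(), n)
        for m in re.finditer(re.escape(phrase), content)
    ]


content = "This article tells you how to write HTML where text"
print(find_snippets(content, "how to write"))
```

Inside `look_through`, the value stored under `"content"` could then be something like `snippet(content, p, p + len(s))` for a literal phrase `s`, rather than the whole text.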