Python: getting the next node in mwparserfromhell (tags: python, mediawiki-api)


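I am using mwparserfromhell to parse MediaWiki wikitext. I need to find the places listed on a page. After parsing, I can get all the headings with filter_headings(). Now I need the content that sits under one of those headings.

I can get the heading itself ("Places"), but how do I get its content?

Here is the code. Any help would be appreciated.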

import mwparserfromhell
import urllib.request
import urllib.parse
import json


def main():
    search('muriel')


def search(name):
    wiki_parsed = get_json(name, True)
    headings = wiki_parsed.filter_headings()
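    # heading.title keeps the whitespace around the text ("== Places ==" gives
    # " Places "), so strip it before comparing.
    filtered_headings = [heading
                         for heading in headings
                         if heading.title.strip() == 'Places']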

    if len(filtered_headings) > 0:
        print(filtered_headings[0])

        # ===================================
        # need to get the content inside heading
        #
        # ?????????????????????????????????????


def get_json(name, ignore_cache=False):
    url = 'https://en.wikipedia.org/w/api.php'

    args = {'action': 'query',
            'titles': name,
            'prop': 'revisions',
            'rvprop': 'content',
            'format': 'json'}

    content = get_url_content(url, args)

    data = json.loads(content)
    wiki_text = (list(data['query']['pages'].values())[
                 0]['revisions'][0]['*'])
    parsed = mwparserfromhell.parse(wiki_text)

    return parsed


def get_url_content(url, req_params):
    url = url + '?' + urllib.parse.urlencode(req_params)
    # The context manager closes the connection even if read() raises.
    with urllib.request.urlopen(url) as fp:
        return fp.read().decode('utf-8', 'ignore')

if __name__ == "__main__":
    main()

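In MediaWiki syntax, content is not actually "inside" a section. A section heading is more like an anchor, a marker on the page. That means you can do things like this: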

== section 1 ==
{|
|
== section 2 ==
|}

which results in

<h2>section 1</h2>
<table>
  <tr>
    <td>
      <h2>section 2</h2>
    </td>
  </tr>
</table>


In that case, what is the content of section 1?
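To make the "heading as marker" point concrete, here is a minimal standard-library sketch that treats the text following each heading, up to the next heading, as that heading's "content". The regex and the helper name split_by_headings are assumptions made for illustration. This is not how MediaWiki or mwparserfromhell parse headings.

```python
import re

# Assumption: a heading is any line of the form "== title ==" with 1-6 equals signs.
HEADING_RE = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)


def split_by_headings(wikitext):
    """Map each heading title to the raw text between it and the next heading."""
    matches = list(HEADING_RE.finditer(wikitext))
    spans = {}
    # Pair every heading with the one after it (None for the last heading).
    for current, following in zip(matches, matches[1:] + [None]):
        end = following.start() if following else len(wikitext)
        spans[current.group(2)] = wikitext[current.end():end].strip("\n")
    return spans


sample = "== section 1 ==\n{|\n|\n== section 2 ==\n|}"
spans = split_by_headings(sample)
print(spans)
```

On the example above, "section 1" ends up with an opened but never closed table and "section 2" with a stray table terminator, which is why "the content of a section" is not always a meaningful unit.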

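I checked, and yes, the Heading node has no child nodes. How do I get the next node, or the sibling nodes? Is that possible?

Sure, wiki_parsed.nodes[wiki_parsed.nodes.index(filtered_headings[0]) + 1] for example. But as I explained, it does not necessarily mean what you expect it to mean. mwparserfromhell does not seem to handle nested sections anyway, though.

I am currently using the first approach you mentioned above. Fingers crossed that it keeps working until I hit a problem or my scraper fails. :) Thanks. Which Python library would you suggest for parsing wikitext?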