PHP: Fetch the content of a Wikipedia article

Tags: php, api, wikipedia

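I want to fetch the content of a Wikipedia article using the actual API. I am well aware of action=render and action=raw, but I want a plain-text version that is as simple as possible: no formatting, no links, and preferably no templates, no references, and no table of contents. As an example, here is an excerpt from the SO page: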

<p><b>Stack Overflow</b> is a <a href="http://en.wikipedia.org/wiki/Website" title="Website">website</a>, part of the <a href="http://en.wikipedia.org/wiki/Stack_Exchange_Network" title="Stack Exchange Network">Stack Exchange Network</a>,<sup id="cite_ref-blog_legal_1-0" class="reference"><a href="#cite_note-blog_legal-1"><span>[</span>2<span>]</span></a></sup><sup id="cite_ref-stackapps_legal_2-0" class="reference"><a href="#cite_note-stackapps_legal-2"><span>[</span>3<span>]</span></a></sup> featuring questions and answers on a wide range of topics in <a href="http://en.wikipedia.org/wiki/Computer_programming" title="Computer programming">computer programming</a>.<sup id="cite_ref-secrets_3-0" class="reference"><a href="#cite_note-secrets-3"><span>[</span>4<span>]</span></a></sup><sup id="cite_ref-slashdot_4-0" class="reference"><a href="#cite_note-slashdot-4"><span>[</span>5<span>]</span></a></sup><sup id="cite_ref-google-tech-talks_5-0" class="reference"><a href="#cite_note-google-tech-talks-5"><span>[</span>6<span>]</span></a></sup></p>

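Stack Overflow is a website, part of the Stack Exchange Network, featuring questions and answers on a wide range of topics in computer programming.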

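That still comes with all the templates and other markup. I want to cut those out entirely and find where the actual article begins. Then I need to simplify it further to just: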

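Stack Overflow is a website part of the Stack Exchange Network featuring questions and answers on a wide range of topics in computer programming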

How can I get past the templates and wiki formatting to reach the raw article content? This will be implemented in PHP.

The API provides everything you need. For the SO example, here is …
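As an illustration of asking the API for text directly, here is a minimal sketch. It assumes the wiki has the TextExtracts extension enabled (prop=extracts with explaintext), which may or may not be what this answer had in mind. The helper names buildExtractUrl and extractFromResponse are hypothetical, made up for this example.

```php
<?php
// Hypothetical helper: builds a MediaWiki API URL that requests a
// plain-text extract. Assumes the TextExtracts extension is available.
function buildExtractUrl(string $title, string $endpoint = 'https://en.wikipedia.org/w/api.php'): string
{
    $params = [
        'action'      => 'query',
        'prop'        => 'extracts',
        'explaintext' => 1,      // plain text instead of limited HTML
        'exintro'     => 1,      // only the text before the first section heading
        'redirects'   => 1,      // follow redirects to the target article
        'format'      => 'json',
        'titles'      => $title,
    ];
    return $endpoint . '?' . http_build_query($params);
}

// Hypothetical helper: pulls the first extract out of the JSON response.
// Returns null if the response is not valid JSON or has no extract.
function extractFromResponse(string $json): ?string
{
    $data = json_decode($json, true);
    if (!is_array($data) || !isset($data['query']['pages'])) {
        return null;
    }
    // Pages are keyed by page ID, which is not known in advance.
    foreach ($data['query']['pages'] as $page) {
        if (isset($page['extract'])) {
            return $page['extract'];
        }
    }
    return null;
}
```

To use it, fetch the URL returned by buildExtractUrl('Stack Overflow') with any HTTP client and pass the response body to extractFromResponse. A real request should also send a descriptive User-Agent header, since Wikimedia may reject requests without one.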

I don't think you can get plain text directly through the API. You will need to pick out what you want from these
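If you do end up picking the text out of rendered HTML yourself, a rough sketch of one way to do it is below. It assumes the input looks like the action=render excerpt in the question, with footnote markers wrapped in sup elements carrying the class "reference". The function name htmlToPlainText is hypothetical, and this is not a complete solution for tables, infoboxes, or other template output.

```php
<?php
// Hypothetical helper: reduces a rendered HTML fragment to plain text.
function htmlToPlainText(string $html): string
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of emitting them.
    libxml_use_internal_errors(true);
    // The XML prolog is a common trick to make loadHTML treat input as UTF-8.
    $doc->loadHTML('<?xml encoding="UTF-8"><div>' . $html . '</div>');
    libxml_clear_errors();

    // Remove footnote markers such as <sup class="reference">[2]</sup>.
    $xpath = new DOMXPath($doc);
    $query = '//sup[contains(concat(" ", normalize-space(@class), " "), " reference ")]';
    foreach (iterator_to_array($xpath->query($query)) as $node) {
        $node->parentNode->removeChild($node);
    }

    // textContent drops the remaining tags but keeps their text.
    $text = $doc->documentElement->textContent;

    // Collapse runs of whitespace into single spaces.
    return trim(preg_replace('/\s+/u', ' ', $text));
}
```

Applied to the excerpt in the question, this yields "Stack Overflow is a website, part of the Stack Exchange Network, featuring questions and answers on a wide range of topics in computer programming." Using a DOM parser rather than strip_tags is a deliberate choice here: strip_tags would keep the bracketed reference numbers in the output.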

Hope this helps.

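It is probably written in the documentation:

@hakre It doesn't look like it, unless I missed it? None of them seem to be what I'm after :/

Well, you don't always get a ready-made, out-of-the-box solution. You need to start working from somewhere.

But none of them does any better than the API itself, which is a long way from my goal.

Are you looking for something like this?

Unfortunately, when I tried to use it, it gave me an error: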