Parsing and modifying an HTML file with Java
Tags: java, html-parser


I have to parse a given HTML document, modify its content, and then save the modified version.

My HTML input:

<div>
<div class="post-text"><p>@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at <a href="https://stackoverflow.com/a/6594828/1861357">https://stackoverflow.com/a/6594828/1861357</a> and I only very slightly modified his method which replaces a node (a set of tags) with the data in the node plus whatever information you would like to add.</p>

<p>To store a String in memory I used a static <code>StringBuilder</code> to save the HTML in memory. </p>

<p>First we read in the HTML file (that is manually specified, this can be changed), then we make a series of checks to change whatever nodes with any data that we want.</p>

<p>The one problem that I didn't fix in the solution by MarcoS was that it split each individual word, instead of looking at a line. However I just used '-' for multiple words, because otherwise it places the string directly after that word.</p>

<p>So a full implementation: </p>
</div>
<div>
<div class="post-text" itemprop="description">

        <p>Recently I was recommended to use JSoup to parse and modify HTML documents. </p>

<p>However what if I have a HTML document that I want to modify (to send, store somewhere else, etc.), how might I go about doing that without changing the original document? </p>
</div>
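My problem is that I have to find the text "@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at https://stackoverflow.com/a/6594828/1861357 and I only" in the HTML and put a div tag (or anything else) around exactly that text, not around its parent tag or the whole paragraph. There will be HTML tags in between the words of the text I am searching for.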

I want the output to look like this:

 <div class="post-text"><p><div id="myDiv">@MarcoS had an excellent solution using a NodeTraversor to make a list of nodes to change at <a href="https://stackoverflow.com/a/6594828/1861357">https://stackoverflow.com/a/6594828/1861357</a> and I only</div>......</div>

Is RegEx the only solution, or can any HTML parser do this?

If you don't want to use an XML parser, you can try a regexp:

String xmlStr = "some_xml";
// Remove whitespace between adjacent tags, then trim both ends
xmlStr = xmlStr.replaceAll(">\\s+<", "><").trim();
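To make the regexp idea concrete for the wrapping problem in the question, here is a minimal sketch using only `java.util.regex`. It assumes the search phrase can be split on whitespace and that any tags between its words are complete tags. The class name `WrapPhrase`, the method `wrap`, the constant `GAP`, and the `example.com` URL are made up for illustration and are not from the original posts.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WrapPhrase {

    // Between two words of the phrase, allow whitespace and/or complete tags.
    private static final String GAP = "(?:\\s|<[^>]*>)+";

    /** Wraps the first occurrence of phrase in html with a div carrying the given id. */
    public static String wrap(String html, String phrase, String divId) {
        String[] words = phrase.trim().split("\\s+");
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < words.length; i++) {
            if (i > 0) {
                regex.append(GAP);
            }
            // Quote each word so characters like '.' or '/' are matched literally.
            regex.append(Pattern.quote(words[i]));
        }
        Matcher m = Pattern.compile(regex.toString()).matcher(html);
        if (!m.find()) {
            return html; // phrase not found: return the input unchanged
        }
        return html.substring(0, m.start())
                + "<div id=\"" + divId + "\">" + m.group() + "</div>"
                + html.substring(m.end());
    }

    public static void main(String[] args) {
        String html = "<div class=\"post-text\"><p>@MarcoS had a list of nodes to change at "
                + "<a href=\"https://example.com/a\">https://example.com/a</a>"
                + " and I only very slightly modified it.</p></div>";
        String phrase = "@MarcoS had a list of nodes to change at https://example.com/a and I only";
        System.out.println(wrap(html, phrase, "myDiv"));
    }
}
```

This only yields well-formed markup when the tags inside the matched span are balanced, as with the `<a>…</a>` pair here. If the phrase ends in the middle of an element, the inserted `</div>` will cut across it, and a real HTML parser such as JSoup is likely the safer tool.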