C# Html Agility Pack - remove an element but not its innerHtml

I can easily remove an element just by calling Remove(), like this:

HtmlDocument html = new HtmlDocument();

html.Load(Server.MapPath(@"~\Site\themes\default\index.cshtml"));

foreach (var item in html.DocumentNode.SelectNodes("//removeMe"))
{
    item.Remove();
}

But this also removes the innerHtml. What if I only want to remove the tag and keep its innerHtml?

For example:

<ul>
    <removeMe>
        <li>
            <a href="#">Keep me</a>
        </li>
    </removeMe>
</ul>

Any help would be greatly appreciated :)

Using regular expressions (do you have to use HtmlAgilityPack, or can you do without it?):

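using System.Text.RegularExpressions;

// The quotes inside the string literal must be escaped
string html = "<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>";

// Strip the opening tag (with any attributes), then the closing tag
html = Regex.Replace(html, "<removeMe.*?>", "", RegexOptions.Compiled);
html = Regex.Replace(html, "</removeMe>", "", RegexOptions.Compiled);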
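A slightly tighter variant of the same regex idea, as a sketch: one pattern removes both the opening and the closing tag, matching is case-insensitive, and \b stops the pattern from also hitting a differently named tag such as a hypothetical <removeMeToo>. UnwrapRemoveMe is an illustrative name. Like any regex approach, it assumes simple markup: a regex cannot tell a real tag from the same text inside a comment, a script block or an attribute value.

```csharp
using System;
using System.Text.RegularExpressions;

// Hypothetical helper: remove every <removeMe ...> and </removeMe> tag,
// keeping whatever sits between them. Assumes regex-friendly markup.
static string UnwrapRemoveMe(string html) =>
    Regex.Replace(html, @"</?removeMe\b[^>]*>", "", RegexOptions.IgnoreCase);

string input = "<ul><removeMe><li><a href=\"#\">Keep me</a></li></removeMe></ul>";
Console.WriteLine(UnwrapRemoveMe(input));
// prints: <ul><li><a href="#">Keep me</a></li></ul>
```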

This should work:

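foreach (var item in doc.DocumentNode.SelectNodes("//removeMe"))
{
    // Copy the child list first (needs using System.Linq) so moving nodes cannot disturb the loop
    var children = item.ChildNodes.ToList();

    if (item.PreviousSibling == null)
    {
        //First element -> insert the children in front of itemToRemove, keeping their order
        //(re-assigning the parent's InnerHtml would re-parse it and duplicate <removeMe>)
        foreach (HtmlNode node in children)
        {
            item.ParentNode.InsertBefore(node, item);
        }
    }
    else
    {
        //There is an element before itemToRemove -> add the children after the previous item
        foreach (HtmlNode node in children)
        {
            item.PreviousSibling.ParentNode.InsertAfter(node, item.PreviousSibling);
        }
    }
    item.Remove();
}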

Maybe this is what you are looking for:

foreach (HtmlNode node in html.DocumentNode.SelectNodes("//removeme"))
{
    HtmlNodeCollection children = node.ChildNodes; //get <removeme>'s children
    HtmlNode parent = node.ParentNode; //get <removeme>'s parent
    node.Remove(); //remove <removeme>
    //append the children to the parent; note they go to the END of the parent,
    //so they only stay in place if <removeme> was the parent's last child
    parent.AppendChildren(children);
}

Edit: L.B.'s answer is cleaner. Go with that one.

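There is a problem with the bool KeepGrandChildren implementation for anyone whose elements to remove contain text: if the removeme tag contains text, that text is deleted as well. For example,
text more text

becomes
more text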

Try this:

private static void RemoveElementKeepText(HtmlNode node)
{
    // Built-in alternative: node.ParentNode.RemoveChild(node, true);
    HtmlNode parent = node.ParentNode;
    HtmlNode prev = node.PreviousSibling;
    HtmlNode next = node.NextSibling;

    // Copy the child list (needs using System.Linq) before moving nodes around
    foreach (HtmlNode child in node.ChildNodes.ToList())
    {
        if (prev != null)
        {
            parent.InsertAfter(child, prev);
            prev = child; // advance, otherwise the children end up in reverse order
        }
        else if (next != null)
            parent.InsertBefore(child, next);
        else
            parent.AppendChild(child);
    }
    node.Remove();
}
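The answers above all rely on the same move: put each child of the element into the parent at the element's own position, then remove the emptied element. A minimal sketch of that idea, assuming the HtmlAgilityPack NuGet package is referenced; Unwrap is a hypothetical helper name, not part of HAP. Inserting every child directly before the element keeps document order and loose text without separate first-child or last-child cases. HAP normally reports element names in lower case, so the query uses //removeme.

```csharp
using System;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper: replace an element by its children, keeping their order.
static void Unwrap(HtmlNode node)
{
    HtmlNode parent = node.ParentNode;
    // Snapshot the children first so moving them cannot disturb the loop.
    foreach (HtmlNode child in node.ChildNodes.ToList())
    {
        // Each child lands just before the element, so the original order is preserved.
        parent.InsertBefore(child, node);
    }
    node.Remove();
}

var doc = new HtmlDocument();
doc.LoadHtml("<p><removeme>text</removeme> more text</p>");

// SelectNodes returns null when nothing matches, so guard against it.
var matches = doc.DocumentNode.SelectNodes("//removeme");
if (matches != null)
{
    foreach (HtmlNode node in matches)
    {
        Unwrap(node);
    }
}

Console.WriteLine(doc.DocumentNode.OuterHtml);
```

Because the loop walks matches in document order, an outer removeme is unwrapped before any removeme nested in it; the nested one has simply moved up a level by the time it is processed.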

There is a simple way:

// Protect <br> with a placeholder, take the plain text, then put the <br> tags back
element.InnerHtml = element.InnerHtml.Replace("<br>", "{1}");
var innerTextWithBR = element.InnerText.Replace("{1}", "<br>");

How about this?

var removedNodes = document.DocumentNode.SelectNodes("//removeme");
if (removedNodes != null)
    foreach (var rn in removedNodes)
    {
        // Replace <removeme> with a text node that holds its inner HTML
        HtmlTextNode innernodes = document.CreateTextNode(rn.InnerHtml);
        rn.ParentNode.ReplaceChild(innernodes, rn);
    }
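One consequence of this approach is worth knowing: CreateTextNode wraps the inner HTML in a single text node, so the serialized output looks right, but the kept markup is no longer parsed into elements. A small sketch, assuming the HtmlAgilityPack package, that shows this by querying for the li afterwards:

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><removeme><li><a href=\"#\">Keep me</a></li></removeme></ul>");

var removedNodes = doc.DocumentNode.SelectNodes("//removeme");
if (removedNodes != null)
{
    foreach (HtmlNode rn in removedNodes)
    {
        // The replacement is one text node whose text happens to be HTML.
        HtmlTextNode inner = doc.CreateTextNode(rn.InnerHtml);
        rn.ParentNode.ReplaceChild(inner, rn);
    }
}

// The serialized output still contains the <li> markup...
Console.WriteLine(doc.DocumentNode.OuterHtml);

// ...but it is plain text in the tree now, so XPath no longer finds an <li>
// (SelectNodes returns null when nothing matches).
Console.WriteLine(doc.DocumentNode.SelectNodes("//li") == null);
```

If you only need the resulting string this is fine; if you keep working with the tree afterwards, moving the child nodes (as in the other answers) keeps them as real elements.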

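Adding my two cents, because none of these approaches handled what I wanted (removing a given set of tags such as p and div, and handling nesting correctly while keeping the inner tags).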

This is what I came up with; it passes all my unit tests, which cover most of the cases I need to handle:

using System.Linq;
using HtmlAgilityPack;

public static class HtmlUtilities
{
    // Signature inferred from the tests below, e.g. StripTags(input, "p", "div")
    public static string StripTags(string html, params string[] tagNames)
    {
        var htmlDoc = new HtmlDocument();

        // load html
        htmlDoc.LoadHtml(html);

        // find formatting tags, in reverse document order so nested tags are handled before their ancestors
        var tags = (from tag in htmlDoc.DocumentNode.Descendants()
                    where tagNames.Contains(tag.Name)
                    select tag).Reverse();

        foreach (var item in tags)
        {
            if (item.PreviousSibling == null)
            {
                // Prepend children to parent node in reverse order
                foreach (HtmlNode node in item.ChildNodes.Reverse())
                {
                    item.ParentNode.PrependChild(node);
                }
            }
            else
            {
                // Insert children after previous sibling
                foreach (HtmlNode node in item.ChildNodes)
                {
                    item.ParentNode.InsertAfter(node, item.PreviousSibling);
                }
            }

            // remove from tree
            item.Remove();
        }

        // return transformed doc
        return htmlDoc.DocumentNode.WriteContentTo().Trim();
    }
}

These are the cases I used to test it:

[TestMethod]
public void StripTags_CanStripSingleTag()
{
    var input = "<p>tag</p>";
    var expected = "tag";
    var actual = HtmlUtilities.StripTags(input, "p");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripNestedTag()
{
    var input = "<p>tag <p>inner</p></p>";
    var expected = "tag inner";
    var actual = HtmlUtilities.StripTags(input, "p");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripTwoTopLevelTags()
{
    var input = "<p>tag</p> <div>block</div>";
    var expected = "tag block";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripMultipleNestedTags_2LevelsDeep()
{
    var input = "<p>tag <div>inner</div></p>";
    var expected = "tag inner";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripMultipleNestedTags_3LevelsDeep()
{
    var input = "<p>tag <div>inner <p>superinner</p></div></p>";
    var expected = "tag inner superinner";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripTwoTopLevelMultipleNestedTags_3LevelsDeep()
{
    var input = "<p>tag <div>inner <p>superinner</p></div></p> <div><p>inner</p> toplevel</div>";
    var expected = "tag inner superinner inner toplevel";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_IgnoresTagsThatArentSpecified()
{
    var input = "<p>tag <div>inner <a>superinner</a></div></p>";
    var expected = "tag inner <a>superinner</a>";
    var actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);

    input = "<wrapper><p>tag <div>inner</div></p></wrapper>";
    expected = "<wrapper>tag inner</wrapper>";
    actual = HtmlUtilities.StripTags(input, "p", "div");

    Assert.AreEqual(expected, actual);
}

[TestMethod]
public void StripTags_CanStripSelfClosingAndUnclosedTagsLikeBr()
{
    var input = "<p>tag</p><br><br/>";
    var expected = "tag";
    var actual = HtmlUtilities.StripTags(input, "p", "br");

    Assert.AreEqual(expected, actual);
}

It may not handle everything, but it does what I need.

Normally the correct expression would be
node.ParentNode.RemoveChild(node, true)
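For reference, a minimal sketch of calling that built-in overload (RemoveChild with keepGrandChildren set to true) on markup like the question's, assuming the HtmlAgilityPack package. Because an ordering problem with this method is reported below for version 1.4.9, it is worth checking the order of several children in the output against the HAP version you actually use.

```csharp
using System;
using HtmlAgilityPack;

var doc = new HtmlDocument();
doc.LoadHtml("<ul><removeme><li>one</li><li>two</li></removeme></ul>");

var nodes = doc.DocumentNode.SelectNodes("//removeme");
if (nodes != null)
{
    foreach (HtmlNode node in nodes)
    {
        // keepGrandChildren: true moves the node's children into the parent
        // before the node itself is removed.
        node.ParentNode.RemoveChild(node, true);
    }
}

// Compare the order of "one" and "two" with the input.
Console.WriteLine(doc.DocumentNode.OuterHtml);
```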

However, because of an ordering bug in HtmlNode.RemoveChild(), I created a similar method. Sorry, it's VB; if anyone needs a translation, I'll write one.

'The HTML Agility Pack (1.4.9) includes the HtmlNode.RemoveChild() method but it has an ordering bug with preserving child nodes.  
'The below implementation orders children correctly.
Private Shared Sub RemoveNode(node As HtmlAgilityPack.HtmlNode, keepChildren As Boolean)
    Dim parent = node.ParentNode
    If keepChildren Then
        For i = node.ChildNodes.Count - 1 To 0 Step -1
            parent.InsertAfter(node.ChildNodes(i), node)
        Next
    End If
    node.Remove()
End Sub

I have tested this code with the following test markup:

<removeme>
    outertextbegin
    <p>innertext1</p>
    <p>innertext2</p>
    outertextend
</removeme>

The output is:

outertextbegin
<p>innertext1</p>
<p>innertext2</p>
outertextend

Here is the C# version of the answer posted Dec 3 '14 at 17:57 by Pseudo Coder.

The site won't let me comment on or add to the original post. Maybe it will help someone.

private void removeNode(HtmlAgilityPack.HtmlNode node, bool keepChildren)
{
    var parent = node.ParentNode;
    if (keepChildren)
    {
        for ( int i = node.ChildNodes.Count - 1; i >= 0; i--)
        {
            parent.InsertAfter(node.ChildNodes[i], node);
        }            
    }
    node.Remove(); 
}

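Find the parent of the removeMe node, append the removeMe node's innerHtml to the parent's innerHtml, and then remove it? :-)
Thought about it, but if the parent contains 5 nested nodes and removeMe is the 3rd, appending removeMe's innerHtml to the parent puts it in a different position.
Maybe you could replace the removeMe node with its innerHtml, or insert it after the previous node. I don't have much experience replacing HTML with HAP, but browsing and traversing the DOM tree is very easy.
Another solution would be to use InsertAfter on removeMe to insert the innerHtml and then remove removeMe, but I don't know how to use InsertAfter correctly.
@Codemaster, good idea, trying it now. Edit: there is no Replace method, only a ReplaceChild method.
"bool KeepGrandChildren", this is definitely the best solution, thanks!
Nice, did it work for you? I always get an exception: Node "" was not found in the collection - my test html: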