C#: HTML is not updated when using Html Agility Pack
I am trying to remove the img and map elements from a piece of HTML:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var oldHtml = doc.DocumentNode.InnerHtml;
if (doc.DocumentNode.SelectNodes("//img[@usemap]") != null)
{
    HtmlNode img = doc.DocumentNode.SelectSingleNode("//img[@usemap]");
    img.ParentNode.RemoveChild(img);
}
if (doc.DocumentNode.SelectNodes("//map") != null)
{
    HtmlNode map = doc.DocumentNode.SelectSingleNode("//map");
    map.ParentNode.RemoveChild(map);
}
var newHtml = doc.DocumentNode.InnerHtml;
The new HTML still contains the img and map elements. Do I need to do something else before the HTML is updated?
Here is the HTML I am trying to strip:
<p><img src="/media/8301/HD00_498x299.jpg" width="498" height="299" alt="HD00.JPG" usemap="#imgmap201392714219"/><br />
<br />
<a title="Download ZIP DWG"
href="/media/8103/detailtekeningen-dwg-unidek-aero.zip"
target="_blank">Klik hier om alle DWG bestanden in
een zipfile te downloaden.</a><br />
<a title="Download DXF"
href="/media/8104/detailtekeningen-dxf-unidek-aero.zip"
target="_blank">Klik hier om alle DXF bestanden in een zipfile te
downloaden.</a><br />
<a title="Download PDF"
href="/media/8116/detailtekeningen-pdf-unidek-aero.zip"
target="_blank">Klik hier om alle PDF bestanden in een zipfile te
downloaden.</a><br />
<br />
<strong><a title="Bouwdetails berekende psi-waarden"
href="/{localLink:8014}" target="_blank">Link naar de technische
bouwdetails met verbeterde eigen ψ-waarden<br />
</a></strong> <map name="imgmap2012104102243"
id="imgmap2012104102243">
<area title="" href="/nl/producten/hellend-dak/unidek-aero/1"
shape="rect" coords="194,419,219,439" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/2"
shape="rect" coords="221,420,246,439" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/3"
shape="rect" coords="200,302,226,320" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/4"
shape="rect" coords="209,167,234,185" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/6"
shape="rect" coords="68,46,98,67" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/7"
shape="rect" coords="102,203,129,224" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/8"
shape="rect" coords="273,339,302,360" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/9"
shape="rect" coords="387,350,417,372" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/10"
shape="rect" coords="324,341,354,363" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/11"
shape="rect" coords="223,369,252,390" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/12"
shape="rect" coords="62,270,89,294" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/13"
shape="rect" coords="93,270,119,294" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="31,94,60,114" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="79,161,106,182" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="19,150,50,171" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="82,113,110,134" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/16"
shape="rect" coords="176,231,205,253" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/17"
shape="rect" coords="147,179,176,200" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/18"
shape="rect" coords="139,235,166,257" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/19"
shape="rect" coords="204,56,231,78" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/20"
shape="rect" coords="125,135,153,157" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/21"
shape="rect" coords="265,263,290,284" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/23"
shape="rect" coords="9,202,36,225" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/24"
shape="rect" coords="39,202,65,225" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/25"
shape="rect" coords="158,80,184,101" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/26"
shape="rect" coords="188,80,213,102" target="_blank" alt="" />
</map><map id="imgmap201392714219">
<area title="" href="/nl/producten/hellend-dak/unidek-aero/1"
shape="rect" coords="265,463,279,480" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/2"
shape="rect" coords="282,466,297,480" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/3"
shape="rect" coords="213,339,237,358" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/4"
shape="rect" coords="206,204,227,220" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/6"
shape="rect" coords="113,105,135,121" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/7"
shape="rect" coords="134,246,154,262" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/8"
shape="rect" coords="299,369,319,386" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/9"
shape="rect" coords="432,409,453,425" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/10"
shape="rect" coords="363,394,385,413" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/11"
shape="rect" coords="254,406,276,422" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/12"
shape="rect" coords="105,298,122,314" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/13"
shape="rect" coords="122,298,139,314" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/14"
shape="rect" coords="53,121,77,139" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/15"
shape="rect" coords="49,165,72,182" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/16"
shape="rect" coords="195,272,214,288" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/17"
shape="rect" coords="152,212,175,230" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/18"
shape="rect" coords="160,276,180,293" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/19"
shape="rect" coords="234,88,255,105" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/20"
shape="rect" coords="132,155,158,174" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/21"
shape="rect" coords="299,294,321,311" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/23"
shape="rect" coords="40,234,55,250" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/24"
shape="rect" coords="56,233,73,251" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/25"
shape="rect" coords="185,108,202,127" target="_blank" alt="" />
<area title="" href="/nl/producten/hellend-dak/unidek-aero/26"
shape="rect" coords="203,109,219,127" target="_blank" alt="" />
</map></p>
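When I debug, the img and map elements are found, but calling RemoveChild does not change the HTML at all. Also, when I try to change an attribute or anything else, nothing happens.

It seems that HtmlAgilityPack does not update the HtmlDocument.DocumentNode.InnerHtml property after nodes are removed. The simplest workaround is to use the OuterHtml property instead of InnerHtml: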
var newHtml = doc.DocumentNode.OuterHtml;
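Until now I have always used the OuterHtml property to check whether my changes produced the expected result, and I only just became aware of this behavior of InnerHtml.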
Update:
In the HTML sample you posted, there are 2 <map> elements. Your code removes only one. Try removing all <img> and <map> nodes this way:
if (doc.DocumentNode.SelectNodes("//img[@usemap]") != null)
{
    HtmlNodeCollection imgs = doc.DocumentNode.SelectNodes("//img[@usemap]");
    foreach (HtmlNode img in imgs)
    {
        img.ParentNode.RemoveChild(img);
    }
}
if (doc.DocumentNode.SelectNodes("//map") != null)
{
    HtmlNodeCollection maps = doc.DocumentNode.SelectNodes("//map");
    foreach (HtmlNode map in maps)
    {
        map.ParentNode.RemoveChild(map);
    }
}
var newHtml = doc.DocumentNode.OuterHtml;
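The same idea can be folded into one small helper that runs each XPath query once and reads the serialized markup only after all removals. This is a minimal sketch, not the answerer's code: the class name ImageMapStripper and the method RemoveAll are hypothetical, and it assumes the HtmlAgilityPack package is referenced.

```csharp
using System.Linq;
using HtmlAgilityPack;

public static class ImageMapStripper
{
    // Hypothetical helper (not part of Html Agility Pack): removes every node
    // matched by each XPath expression and returns the re-serialized markup.
    public static string RemoveAll(string html, params string[] xpaths)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html);

        foreach (var xpath in xpaths)
        {
            // SelectNodes returns null, not an empty collection, when nothing matches.
            var matches = doc.DocumentNode.SelectNodes(xpath);
            if (matches == null)
            {
                continue;
            }

            // Copy to a list so removals never touch the sequence being enumerated.
            foreach (var node in matches.ToList())
            {
                node.ParentNode.RemoveChild(node);
            }
        }

        // Read the serialized markup once, after all edits are done.
        return doc.DocumentNode.OuterHtml;
    }
}
```

A call such as ImageMapStripper.RemoveAll(html, "//img[@usemap]", "//map") would then cover both element types from the question.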
This worked for me:
var doc = new HtmlDocument();
doc.LoadHtml(html);
var root = doc.DocumentNode;
if (root != null)
{
    var replace = false;
    // SelectNodes returns null when nothing matches
    var images = root.SelectNodes("//img[@usemap]");
    if (images != null)
    {
        foreach (var image in images)
        {
            image.ParentNode.RemoveChild(image);
        }
        replace = true;
    }
    if (replace)
    {
        // OuterHtml is read only once, after the removals
        html = root.OuterHtml;
    }
}
var newhtml = html;
The images are removed from the HTML. So far, this is what I need to do in Umbraco before Html Agility Pack will work:
var documents = Document.GetDocumentsOfDocumentType(5125);
var document = documents.Where(x => x.Id == 5127).First();
var html = document.getProperty("content").Value.ToString();
html = html.Replace("\r\n", "");
html = umbraco.library.RemoveFirstParagraphTag(html);
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
I just found out that the flaw in Html Agility Pack is that you can only request .InnerHtml once. After that, it will not be updated. You request it twice:
HtmlDocument doc = new HtmlDocument();
doc.LoadHtml(html);
var oldHtml = doc.DocumentNode.InnerHtml;
if (doc.DocumentNode.SelectNodes("//img[@usemap]") != null)
{
    HtmlNode img = doc.DocumentNode.SelectSingleNode("//img[@usemap]");
    img.ParentNode.RemoveChild(img);
}
if (doc.DocumentNode.SelectNodes("//map") != null)
{
    HtmlNode map = doc.DocumentNode.SelectSingleNode("//map");
    map.ParentNode.RemoveChild(map);
}
var newHtml = doc.DocumentNode.InnerHtml;
If you remove this line:
var oldHtml = doc.DocumentNode.InnerHtml;
it should work. This seems to be a random bug in HtmlAgilityPack.
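One way to check whether a stale value comes from the InnerHtml/OuterHtml properties rather than from the node tree is to serialize the document through HtmlDocument.Save with a StringWriter and compare the result. The sketch below assumes that Save writes the current node tree instead of reusing a previously computed string, which has not been verified against the version discussed in this thread; the class name SerializationCheck is hypothetical.

```csharp
using System.IO;
using HtmlAgilityPack;

public static class SerializationCheck
{
    // Hypothetical helper: serializes the whole document into a string
    // through HtmlDocument.Save(TextWriter) instead of the InnerHtml or
    // OuterHtml properties.
    public static string SaveToString(HtmlDocument doc)
    {
        using (var writer = new StringWriter())
        {
            doc.Save(writer);
            return writer.ToString();
        }
    }
}
```

If SaveToString(doc) no longer contains the removed elements while doc.DocumentNode.InnerHtml still does, the removal itself succeeded and only the property value is out of date.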
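Sniffdk's solution works because he only gets .OuterHtml once. The HtmlAgilityPack folks need to fix this.

I tried OuterHtml, but the result is the same. The HTML is still not updated.

Updated my answer. I tested the code against your HTML sample (wrapped in [...]), and it works fine for me here; newHtml ends up containing only [...].

It looks like something is wrong with the HTML. This sample works: [...], but the HTML I get from the Umbraco RTE does not. Maybe that is because it has local links in the HTML.

Not sure why you think local links are related to this problem. Anyway, without being able to reproduce the problem, it is hard to diagnose further.

If you use the updated HTML I posted in the question, you should be able to reproduce the problem, because that HTML does not work.

PS - snifdk's solution below works because he only gets .OuterHtml once. The HtmlAgilityPack guys need to fix this :)

I had a similar problem with HtmlAgilityPack 1.4.6 (assignments to InnerHtml were not reflected in calls to OuterHtml), and it went away after upgrading to HtmlAgilityPack 1.4.9.5. Maybe that version also solves your problem?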
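Since the behavior appears to depend on the library version, it can help to confirm which HtmlAgilityPack assembly is actually loaded at runtime. A minimal sketch using standard .NET reflection; the class name HapVersion is hypothetical, and the assembly version reported here may not match the package version string exactly.

```csharp
using System;
using HtmlAgilityPack;

public static class HapVersion
{
    // Returns the version of the HtmlAgilityPack assembly that is loaded,
    // read from the assembly that defines HtmlDocument.
    public static Version Loaded()
    {
        return typeof(HtmlDocument).Assembly.GetName().Version;
    }
}
```

Logging HapVersion.Loaded() next to the failing output makes it easier to tell whether an upgrade has really taken effect.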