How do I strip a common attribute from every form element on a page in C#?


c# .net html regex


I have a string variable that contains the response of an HTML page. It contains hundreds of tags, including the following three HTML tags:

<tag1 prefix1314030136543="2">
<tag2 prefix131403013654="1" anotherAttribute="432">
<tag3 prefix13140301376543="4">

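I need to be able to strip any attribute whose name starts with "prefix", together with its value, regardless of the tag name. In the end I want: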

<tag1>
<tag2 anotherAttribute="432">
<tag3>

I'm using C#. I assume a regular expression is the solution, but I'm bad at regular expressions and hope someone can help me with this.

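A regular expression is not the solution: HTML is not a regular language, so it shouldn't be parsed with regular expressions. I've heard of a good way to parse and work with HTML. Check it out.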

Take a look.

Using a regular expression:

    (?<=<[^<>]*)\sprefix\w+="[^"]*"(?=[^<>]*>)

    var result = Regex.Replace(s,
        @"(?<=<[^<>]*)\sprefix\w+=""[^""]*""(?=[^<>]*>)", string.Empty);
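A minimal runnable sketch of the regex answer, assuming C# 9 or later (top-level statements) and using a hypothetical helper name, `StripPrefixAttributes`. Note the `*` after `[^""]`: without it only one-character values such as `"2"` would match. .NET supports variable-length lookbehind, so `(?<=<[^<>]*)` is legal there, but the pattern still only checks for a surrounding `<`/`>` and does not understand HTML.

```csharp
using System;
using System.Text.RegularExpressions;

// Removes every attribute whose name starts with "prefix", plus the whitespace before it.
// (?<=<[^<>]*)  : somewhere after a '<' with no '<' or '>' in between (i.e. inside a tag)
// \sprefix\w+   : whitespace, then an attribute name starting with "prefix"
// =""[^""]*""   : '=' followed by a double-quoted value of any length
// (?=[^<>]*>)   : the tag is closed later by '>'
static string StripPrefixAttributes(string html) =>
    Regex.Replace(html, @"(?<=<[^<>]*)\sprefix\w+=""[^""]*""(?=[^<>]*>)", string.Empty);

string input = "<tag1 prefix1314030136543=\"2\">\n" +
               "<tag2 prefix131403013654=\"1\" anotherAttribute=\"432\">\n" +
               "<tag3 prefix13140301376543=\"4\">";

Console.WriteLine(StripPrefixAttributes(input));
```

Because the leading `\s` is part of the match and no trailing whitespace is consumed, `<tag2 prefix…="1" anotherAttribute="432">` keeps exactly one space before `anotherAttribute`.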

This is a heavy-handed way of doing it:
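    string str = "<tag1 prefix131403013654=\"2\">";
    string attr = "prefix131403013654=\"";   // full attribute name plus the opening quote
    int searchFrom = 0;
    int point;
    while ((point = str.IndexOf(attr, searchFrom, StringComparison.Ordinal)) != -1) // at least one still exists...
    {
        // We know the value is surrounded by a leading and an ending double quote,
        // so find the second quote, starting right after the opening one.
        int secondQuote = str.IndexOf('"', point + attr.Length);
        if (secondQuote == -1) break; // unterminated value: stop

        if (point > 0 && str[point - 1] == ' ')
        {
            // Remove the preceding space as well, so "<tag1 ...>" becomes "<tag1>".
            str = str.Remove(point - 1, secondQuote - point + 2);
            searchFrom = point - 1;
        }
        else
        {
            // Not preceded by a space, so not an attribute; skip past it to avoid an endless loop.
            searchFrom = point + attr.Length;
        }
    }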


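Edited for better code. Edited again after testing, adding +1 to the length so the closing quote is removed too. This works. Basically, you can wrap this in a loop that goes through an ArrayList containing all the "remove these" values.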
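A sketch of that outer loop, assuming C# 9 or later (top-level statements). `RemoveAttribute` is a hypothetical helper that packages the IndexOf loop above for one full attribute name, and a generic `List<string>` stands in for the non-generic `ArrayList`.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: removes every occurrence of one fully known attribute name and its quoted value.
static string RemoveAttribute(string str, string name)
{
    string needle = name + "=\"";   // the name plus the opening quote, so "x" cannot match "x1"
    int searchFrom = 0;
    int point;
    while ((point = str.IndexOf(needle, searchFrom, StringComparison.Ordinal)) != -1)
    {
        int secondQuote = str.IndexOf('"', point + needle.Length); // closing quote of the value
        if (secondQuote == -1) break;                              // unterminated value: stop
        if (point > 0 && char.IsWhiteSpace(str[point - 1]))
        {
            // Drop the preceding whitespace, the name, and the quoted value.
            str = str.Remove(point - 1, secondQuote - point + 2);
            searchFrom = point - 1;
        }
        else
        {
            searchFrom = point + needle.Length; // part of a longer name (e.g. "data-x"); skip it
        }
    }
    return str;
}

// The "remove these" values.
var removeThese = new List<string> { "prefix1314030136543", "prefix131403013654", "prefix13140301376543" };

string html = "<tag1 prefix1314030136543=\"2\">\n" +
              "<tag2 prefix131403013654=\"1\" anotherAttribute=\"432\">\n" +
              "<tag3 prefix13140301376543=\"4\">";

foreach (string name in removeThese)
    html = RemoveAttribute(html, name);

Console.WriteLine(html);
```

Including the `="` in the search string matters here: `prefix131403013654` is itself a prefix of `prefix1314030136543`, and the trailing quote keeps one from matching the other.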
If you don't know the full name of the attribute, only its prefix, you can change it like this:
    string str = "<tag1 prefix131403013654=\"2\">";
    int searchFrom = 0;
    int point;
    while ((point = str.IndexOf("prefix", searchFrom, StringComparison.Ordinal)) != -1) // at least one still exists...
    {
        // The opening quote of the value follows the (unknown) rest of the attribute name.
        int firstQuote = str.IndexOf('"', point);
        // The closing quote is the next double quote after the opening one.
        int secondQuote = firstQuote == -1 ? -1 : str.IndexOf('"', firstQuote + 1);
        if (secondQuote == -1) break; // no complete value left

        if (point > 0 && str[point - 1] == ' ') // checking that it is actually an attribute name
        {
            // Remove the leading space, the name, and the quoted value.
            str = str.Remove(point - 1, secondQuote - point + 2);
            searchFrom = point - 1;
        }
        else
        {
            searchFrom = point + "prefix".Length; // not an attribute: move past it
        }
        // Like I said, a very heavy way of doing it.
    }


This will catch every attribute whose name starts with the prefix:
    html = Regex.Replace(html, @"(?<=<\w+[^<>]*)\s" + Regex.Escape(prefix) + @"\w*\s?=\s?""[^""]*""(?=[^<>]*>)", "");


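The lookbehind and lookahead make sure the match lies inside a tag; between them, the pattern matches the (escaped) prefix, the rest of the attribute name, an `=` with optional spaces around it, and the quoted value. If your regex doesn't match, keep in mind that there is always a whitespace character (`\s`) in front of the prefix.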
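A runnable sketch of the escaped-prefix version, assuming C# 9 or later (top-level statements); `StripAttributesStartingWith` is a hypothetical helper name. It uses the lookbehind `(?<=<\w+[^<>]*)` rather than `(?<=<\w+\s[^>]*)`: because the `\s` before the prefix is consumed by the main pattern, a lookbehind that also demands whitespace after the tag name cannot match the first attribute of a tag.

```csharp
using System;
using System.Text.RegularExpressions;

static string StripAttributesStartingWith(string html, string prefix) =>
    Regex.Replace(
        html,
        // (?<=<\w+[^<>]*) : inside an opening tag, somewhere after its name
        // \s              : the whitespace that always precedes an attribute name
        // prefix\w*       : the escaped prefix plus the rest of the name
        // \s?=\s?"[^"]*"  : '=' (optionally spaced) and a double-quoted value
        // (?=[^<>]*>)     : the tag is closed later by '>'
        @"(?<=<\w+[^<>]*)\s" + Regex.Escape(prefix) + @"\w*\s?=\s?""[^""]*""(?=[^<>]*>)",
        string.Empty);

string html = "<tag1 prefix1314030136543=\"2\">\n" +
              "<tag2 prefix131403013654 = \"1\" anotherAttribute=\"432\">\n" +
              "<tag3 prefix13140301376543=\"4\">";

Console.WriteLine(StripAttributesStartingWith(html, "prefix"));
```

`Regex.Escape` only matters if the prefix contains regex metacharacters such as `.` or `$`; for a plain word like `prefix` it returns the text unchanged.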