Using Xml.parse() to remove HTML tags and content where the tag content matches an array of values

javascript google-apps-script

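I have extracted some HTML from GmailApp using .getBody() and would like to return HTML with specific tags and their content filtered out wherever the content matches any value in an array (specifically, links with certain text). Looking around, I assumed the easiest way would be to use Xml.parse() and filter the resulting object, but I can't get any further than creating the XmlDocument.

For example, if: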

var html = '<div>some text then <div><a href="http://example1.com">foo</a></div> and then <span>some <a href="http://example2.com">baa</a>,and finally <a href="http://example3.com">close</a></span></div>';

how can I get back:

var newHtml = '<div>some text then <div></div> and then <span>some ,and finally <a href="http://example3.com">close</a></span></div>';

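I can get an object to work with, but it all falls apart from there. (I also considered just using .replace(), but given the problems with matching HTML using regex, that seems best avoided.)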

Below is a suggested option that uses a regular expression:

var html = '<div>some text then <div><a href="http://example1.com">foo</a></div> and then <span>some <a href="http://example2.com">baa</a>,and finally <a href="http://example3.com">close</a></span></div>';

var linksToRemove = ['baa', 'foo'];
var newHtml = cleanBody(html, linksToRemove);

/**
 * Removes links from html text
 * @param {string} html The html to be cleaned.
 * @param {string[]} exclude The array of link text to remove.
 * @returns {string} Cleaned html.
 */
function cleanBody(html, exclude) {
    // Strip line breaks and tabs so they cannot interrupt a match
    html = html.replace(/\r?\n|\r|\t/g, '');
    // Escape regex metacharacters so each link text is matched literally
    var escaped = exclude.map(function (text) {
        return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    });
    // Matches a whole <a ...>text</a> element whose text is one of the excluded values
    var re = '<a\\b[^>]*>(' + escaped.join('|') + ')<\\/a>';
    return html.replace(new RegExp(re, 'ig'), '');
}

There is an unwritten rule about not using regex to parse HTML, but in this instance (which is a simple find and replace) it is the approach I would use. XML.parse assumes a well-formed XML document. HTML often isn't, despite best intentions.

@Jonathon The problem I'm having with this is that the replace works fine on hard-coded test data but fails on the .getBody() HTML. My basic test regex is html.replace(/<a\b[^>]*>(Manage subscriptions)<\/a>/ig, "");

Does .getBody() return escaped HTML?

@Jonathon Found the problem: there is a carriage return in the getBody() response.

If you have a specific need for a format that you more or less know, or better still control, regex works fine. If you want to build a browser, it would be one of the worst ways to do things :D
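The carriage-return problem from the comments can be reproduced without Gmail. The sketch below is only an illustration under stated assumptions: the sample body, the example.com URL and the helper name removeLinks are hypothetical and do not come from the original answer. It shows a pattern with no whitespace handling failing on a link whose text is surrounded by line breaks, and a variant that tolerates whitespace around the link text instead of stripping line breaks from the whole body.

```javascript
// Hypothetical sample: line breaks surround the link text, as reported for .getBody() output.
var body = '<div>Footer <a href="http://example.com/manage">\r\nManage subscriptions\r\n</a> end</div>';

// Pattern with no whitespace handling: the \r\n after the opening tag prevents a match.
var strict = /<a\b[^>]*>(Manage subscriptions)<\/a>/ig;

// removeLinks is a hypothetical helper (an assumption, not the answer's cleanBody).
// It allows optional whitespace, including \r and \n, around the link text.
function removeLinks(html, exclude) {
    var escaped = exclude.map(function (text) {
        // Escape regex metacharacters so the link text is matched literally
        return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
    });
    var re = new RegExp('<a\\b[^>]*>\\s*(?:' + escaped.join('|') + ')\\s*<\\/a>', 'ig');
    return html.replace(re, '');
}

var unchanged = body.replace(strict, '');                  // the link is still present
var cleaned = removeLinks(body, ['Manage subscriptions']); // the link is removed
```

Compared with removing every line break and tab first, this leaves the rest of the message body untouched. It still would not match link text that is itself wrapped across lines.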