JavaScript: How do I escape quotes in HTML attribute values?

I'm using jQuery to build an HTML string, creating a row to insert into a table:

var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value='"+data.name+"'/></td>";
row += "</tr>";
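
data.name is a string returned from an ajax call and can contain any character. If it contains a single quote ('), it breaks the HTML by ending the attribute value early.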

How can I make sure the string is rendered correctly in the browser?

I think you could do this:

var row = "";
row += "<tr>";
row += "<td>Name</td>";
row += "<td><input value=\""+data.name+"\"/></td>";
row += "</tr>";
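
That works if you are worried about single quotes in data.name. Note that it breaks in the same way if data.name contains a double quote.

Ideally, you would create an INPUT element and set its value from data.name directly (the DOM has no setValue() method; assign input.value = data.name, or use jQuery's .val(data.name)).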

You just need to swap any ' characters with the equivalent HTML entity character code:

data.name.replace(/'/g, "&#39;");
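
Replacing only the quote leaves any & in the data untouched, so text such as &#39; that is already present in data.name would be decoded by the browser. A minimal sketch that escapes the ampersand first; the helper name escapeForSingleQuotedAttr is an assumption made for illustration, not something from the answer above:

```javascript
// Hypothetical helper (not from the original answer): escapes a value for use
// inside a single-quoted HTML attribute.
function escapeForSingleQuotedAttr(value) {
    return String(value)
        .replace(/&/g, '&amp;')   // must run first, so the entity added below is not re-escaped
        .replace(/'/g, '&#39;');  // the quote character that delimits the attribute
}

var name = "O'Reilly & Sons";
var cell = "<td><input value='" + escapeForSingleQuotedAttr(name) + "'/></td>";
```

The same idea applies to a double-quoted attribute, with " replaced by &quot; instead.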

Alternatively, you can create the whole thing using jQuery's DOM manipulation methods:

var row = $("<tr>").append("<td>Name</td><td></td>");
$("<input>", { value: data.name }).appendTo(row.children("td:eq(1)"));

Examples:

<div attr="Tim &quot;The Toolman&quot; Taylor"
<div attr='Tim "The Toolman" Taylor'
<div attr="Tim 'The Toolman' Taylor"
<div attr='Tim &#39;The Toolman&#39; Taylor'

So, quote the attribute value with " and use a function like this:

function escapeAttrNodeValue(value) {
    return value.replace(/(&)|(")|(\u00A0)/g, function(match, amp, quote) {
        if (amp) return "&amp;";
        if (quote) return "&quot;";
        return "&nbsp;";
    });
}

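Actually, you may need one of these two functions (which one depends on the context of use). These functions handle all kinds of string quotes and protect the HTML/XML syntax.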

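1. The quoteattr() function, for embedding text into HTML/XML:

The quoteattr() function is used in a context where the result will not be evaluated by JavaScript but must be interpreted by an XML or HTML parser, and where it must absolutely avoid breaking the syntax of an element attribute.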

function quoteattr(s, preserveCR) {
    preserveCR = preserveCR ? '&#13;' : '\n';
    return ('' + s) /* Forces the conversion to string. */
        .replace(/&/g, '&amp;') /* This MUST be the 1st replacement. */
        .replace(/'/g, '&apos;') /* The 4 other predefined entities, required. */
        .replace(/"/g, '&quot;')
        .replace(/</g, '&lt;')
        .replace(/>/g, '&gt;')
        /*
        You may add other replacements here for HTML only 
        (but it's not necessary).
        Or for XML, only if the named entities are defined in its DTD.
        */ 
        .replace(/\r\n/g, preserveCR) /* Must be before the next replacement. */
        .replace(/[\r\n]/g, preserveCR);
}
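
If you generate the content of a text element, newlines are preserved natively. However, if you generate the value of an attribute, the assigned value is normalized by the DOM as soon as it is set: all whitespace (SPACE, TAB, CR, LF) is compressed, leading and trailing whitespace is stripped, and every interior run of whitespace is reduced to a single SPACE.

There is one exception: the CR character is preserved and not treated as whitespace, but only if it is represented by a numeric character reference (NCR). The result is valid for all element attributes except those of type NMTOKEN, NMTOKENS, or ID: the presence of the referenced CR makes the assigned value invalid for those attributes (for example the id="..." attribute of HTML elements), and the DOM ignores the invalid value. In other attributes (of type CDATA), all CR characters represented by a numeric character reference are preserved and not normalized. Note that this trick does not preserve other whitespace (SPACE, TAB, LF), even when it is represented by an NCR, because normalization of all whitespace (with the exception of the NCR for CR) is mandatory in all attributes.

Note that this function itself does not perform any HTML/XML whitespace normalization, so it remains safe when generating the content of a text element (do not pass the second preserveCR parameter in that case).

So if you pass the optional second parameter (whose default is treated as false) and it evaluates to true, newlines are preserved using this NCR. Do this when you want to generate a literal attribute value and the attribute is of type CDATA (for example a title="..." attribute), not of type ID, IDLIST, NMTOKEN, or NMTOKENS (for example an id="..." attribute).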
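
To make the effect of the preserveCR flag concrete, here is a small sketch that isolates only the newline handling of quoteattr(). The function name encodeNewlines is an assumption for illustration; the two replacements are the same ones quoteattr() applies last:

```javascript
// Hypothetical helper isolating the newline step of quoteattr().
function encodeNewlines(s, preserveCR) {
    // With preserveCR, every newline becomes the numeric character reference
    // for CR; otherwise every newline variant is unified to a plain LF.
    var replacement = preserveCR ? '&#13;' : '\n';
    return String(s)
        .replace(/\r\n/g, replacement)   // CR+LF pairs first, so they count once
        .replace(/[\r\n]/g, replacement); // then any remaining lone CR or LF
}

var source = 'line 1\r\nline 2\rline 3\nline 4';
var forAttribute = encodeNewlines(source, true);
var forTextNode = encodeNewlines(source);
```

Here forAttribute carries one &#13; per line break, which is the form intended for CDATA-type attributes such as title, while forTextNode keeps real LF characters.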

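
As an alternative to quoteattr(), you can use the DOM API alone. If the HTML code you generate will be part of the current HTML document, you can create each HTML element individually with the document's DOM methods. That way you set attribute values directly through the DOM API instead of inserting the full HTML content through the innerHTML property of a single element: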

data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = document.createElement('tr');
var cell = document.createElement('td');
cell.innerText = 'Name';
row.appendChild(cell);
cell = document.createElement('td');
var input = document.createElement('input');
input.setAttribute('value', data.value);
cell.appendChild(input);
row.appendChild(cell); /* the element created above is named row, not tr */
/*
The HTML code is generated automatically and is now accessible in the
row.innerHTML property, which you are not required to insert in the
current document.

But you can continue by appending row into a 'tbody' element object, and then
insert this into a new 'table' element object, which you can append or insert
as a child of a DOM object of your document.
*/
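
Warning! This source code does not check that the encoded document is a valid plain-text document. However, it should never raise an exception (except in an out-of-memory condition): JavaScript/JSON source strings are just unrestricted streams of 16-bit code units; they do not need to be valid plain text and are not restricted by HTML/XML document syntax. This means the code is incomplete and should also replace:

  • all other code units representing C0 and C1 controls, using the \xNN notation (except TAB and LF, which are handled above but may be left unchanged without substitution)
  • all code units assigned to non-characters in Unicode, which should be replaced using the \uNNNN notation (for example \uFFFE or \uFFFF)
  • all code units usable as Unicode surrogates in the range \uD800..\uDFFF, as follows:
    • if they are not correctly paired into a valid UTF-16 pair representing a valid Unicode code point in the full range U+0000..U+10FFFF, these surrogate code units should be replaced individually using the notation \uDNNN
    • otherwise, if the code point represented by the code unit pair is not valid in Unicode plain text because it is assigned to a non-character, the two code units should be replaced using the notation \U00NNNNNN
  • finally, if the code point represented by the code unit (or by the pair of code units representing a code point in a supplementary plane) is invalid in HTML/XML source documents (see their specifications), whether or not that code point is assigned or reserved/unassigned, it should be replaced using the \uNNNN notation (if the code point is in the BMP) or \u00NNNNNN (if the code point is in a supplementary plane)

Note also that the last 5 replacements are not strictly necessary, but you do need…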
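
The first and third points of the list above can be sketched as two extra replacement passes. This is only an illustration under stated assumptions: the names escapeControls and escapeLoneSurrogates are hypothetical, non-characters and the \U00NNNNNN case are not handled, and in a real escaping function these passes would have to run after the step that doubles backslashes, so that the backslashes they add are not doubled again:

```javascript
// Hypothetical helpers (not part of the original answer).

// Replaces C0 and C1 control characters, except TAB and LF, with \xNN.
function escapeControls(s) {
    return String(s).replace(/[\x00-\x08\x0B-\x1F\x80-\x9F]/g, function (ch) {
        var hex = ch.charCodeAt(0).toString(16).toUpperCase();
        return '\\x' + ('0' + hex).slice(-2); // always two hex digits
    });
}

// Replaces surrogate code units that are not part of a valid pair with \uDNNN.
function escapeLoneSurrogates(s) {
    return String(s).replace(
        /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\uD800-\uDFFF]/g,
        function (match) {
            if (match.length === 2) return match; // valid pair: keep unchanged
            return '\\u' + match.charCodeAt(0).toString(16).toUpperCase();
        });
}
```

The regular expression in escapeLoneSurrogates tries the valid high+low pair first, so the second alternative only ever matches a surrogate that has no partner.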
data.value = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var row = '';
row += '<tr>';
row += '<td>Name</td>';
row += '<td><input value="' + quoteattr(data.value) + '" /></td>';
row += '</tr>';
function escape(s) {
    return ('' + s) /* Forces the conversion to string. */
        .replace(/\\/g, '\\\\') /* This MUST be the 1st replacement. */
        .replace(/\t/g, '\\t') /* These 2 replacements protect whitespaces. */
        .replace(/\n/g, '\\n')
        .replace(/\u00A0/g, '\\u00A0') /* Useful but not absolutely necessary. */
        .replace(/&/g, '\\x26') /* These 5 replacements protect from HTML/XML. */
        .replace(/'/g, '\\x27')
        .replace(/"/g, '\\x22')
        .replace(/</g, '\\x3C')
        .replace(/>/g, '\\x3E')
        ;
}
var title = "It's a \"title\"!";
var msg   = "Both strings contain \"quotes\" & 'apostrophes'...";
setTimeout(
    '__forceCloseDialog("myDialog", "' +
        escape(title) + '", "' +
        escape(msg) + '")',
    2000);
var msg =
    "It's just a \"sample\" <test>.\n\tTry & see yourself!";
/* This is similar to the above, but this JavaScript code will be reinserted below: */ 
var scriptCode =
    'alert("' +
    escape(msg) + /* important here!, because part of a JS string literal */
    '");';

/* First case (simple when inserting in a text element): */
document.write(
    '<script type="text/javascript">' +
    '\n//<![CDATA[\n' + /* (not really necessary but improves compatibility) */
    scriptCode +
    '\n//]]>\n' +       /* (not really necessary but improves compatibility) */
    '<\/script>');      /* "<\/" keeps an enclosing inline script from being closed early */

/* Second case (more complex when inserting in an HTML attribute value): */
document.write(
    '<span onclick="' +
    quoteattr(scriptCode) + /* important here, because part of an HTML attribute */
    '">Click here !</span>');
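
As a different technique from the hand-written escape() above, the standard JSON.stringify() can produce the string literal, quotes included, and a second pass can hide the characters that matter to an HTML/XML parser. This is a sketch under assumptions: toJsStringLiteral is a hypothetical name, and the result is meant for script content; the literal still contains double quotes, so it would need quoteattr() on top before going into an attribute such as onclick:

```javascript
// Hypothetical helper: builds a JavaScript string literal for any value.
function toJsStringLiteral(value) {
    return JSON.stringify(String(value))
        // Hide HTML-significant characters and the two line separators
        // behind \uNNNN escapes, which are valid inside a JS string literal.
        .replace(/[<>&'\u2028\u2029]/g, function (ch) {
            return '\\u' + ('0000' + ch.charCodeAt(0).toString(16)).slice(-4);
        });
}

var sampleMsg = "It's just a \"sample\" <test>.\n\tTry & see yourself!";
var sampleScriptCode = 'alert(' + toJsStringLiteral(sampleMsg) + ');';
```

Because JSON.stringify() already handles backslashes, quotes, and control characters, the only manual work left is the small set of characters listed in the character class.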
function unquoteattr(s) {
    /*
    Note: this can be implemented more efficiently by a loop searching for
    ampersands, from start to end of the source string, and parsing the
    character(s) found immediately after the ampersand.
    */
    s = ('' + s); /* Forces the conversion to string type. */
    /*
    You may optionally start by detecting CDATA sections (like
    `<![CDATA[` ... `]]>`), whose contents must not be reparsed by the
    following replacements, but separated, filtered out of the CDATA
    delimiters, and then concatenated into an output buffer.
    The following replacements are only for sections of source text
    found *outside* such CDATA sections, that will be concatenated
    in the output buffer only after all the following replacements and
    security checkings.

    This will require a loop starting here.

    The following code is only for the alternate sections that are
    not within the detected CDATA sections.
    */
    /* Decode by reversing the initial order of replacements. */
    s = s
        .replace(/\r\n/g, '\n') /* To do before the next replacement. */ 
        .replace(/[\r\n]/g, '\n') /* The g flag is needed to replace every occurrence. */
        .replace(/&#13;&#10;/g, '\n') /* These 3 replacements keep whitespaces. */
        .replace(/&#1[03];/g, '\n')
        .replace(/&#9;/g, '\t')
        .replace(/&gt;/g, '>') /* The 4 other predefined entities required. */
        .replace(/&lt;/g, '<')
        .replace(/&quot;/g, '"')
        .replace(/&apos;/g, "'")
        ;
    /*
    You may add other replacements here for predefined HTML entities only 
    (but it's not necessary). Or for XML, only if the named entities are
    defined in *your* assumed DTD.
    But you can add these replacements only if these entities will *not* 
    be replaced by a string value containing *any* ampersand character.
    Do not decode the '&amp;' sequence here !

    If you choose to support more numeric character entities, their
    decoded numeric value *must* be assigned characters or unassigned
    Unicode code points, but *not* surrogates or assigned non-characters,
    and *not* most C0 and C1 controls (except a few ones that are valid
    in HTML/XML text elements and attribute values: TAB, LF, CR, and
    NL='\x85').

    If you find valid Unicode code points that are invalid characters
    for XML/HTML, this function *must* reject the source string as
    invalid and throw an exception.

    In addition, the four possible representations of newlines (CR, LF,
    CR+LF, or NL) *must* be decoded only as if they were '\n' (U+000A).

    See the XML/HTML reference specifications !
    */
    /* Required check for security! */
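    /* String.prototype.match returns null when nothing matches. */
    var found = s.match(/&[^;]*;?/g) || [];
    for (var i = 0; i < found.length; i++)
        if (found[i] != '&amp;')
            throw new Error('unsafe entity found in the attribute literal content');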
     /* This MUST be the last replacement. */
    s = s.replace(/&amp;/g, '&');
    /*
    The loop needed to support CDATA sections will end here.
    This is where you'll concatenate the replaced sections (CDATA or
    not), if you have split the source string to detect and support
    these CDATA sections.

    Note that all backslashes found in CDATA sections do NOT have the
    semantic of escapes, and are *safe*.

    On the opposite, CDATA sections not properly terminated by a
    matching `]]>` section terminator are *unsafe*, and must be rejected
    before reaching this final point.
    */
    return s;
}
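
The comment at the top of unquoteattr() mentions that a single scan over the ampersands would be more efficient. A minimal sketch of that idea, using one replace() call with a callback and a lookup table; the names ATTR_ENTITIES and decodeAttrEntities are assumptions for illustration, CDATA sections are not handled, and the table only roughly mirrors the replacements above (a &#13;&#10; pair decodes to two newlines here):

```javascript
// Hypothetical single-pass decoder; the table is an assumption based on the
// entities that unquoteattr() decodes.
var ATTR_ENTITIES = {
    'amp': '&', 'lt': '<', 'gt': '>', 'quot': '"', 'apos': "'",
    '#9': '\t', '#10': '\n', '#13': '\n'
};

function decodeAttrEntities(s) {
    // Each ampersand is visited exactly once, so already decoded text is
    // never scanned again and '&amp;lt;' correctly yields '&lt;'.
    return String(s).replace(/&([^;&]*)(;?)/g, function (match, name, semi) {
        if (semi && Object.prototype.hasOwnProperty.call(ATTR_ENTITIES, name)) {
            return ATTR_ENTITIES[name];
        }
        throw new Error('unsafe entity found in the attribute literal content: ' + match);
    });
}
```

Decoding in one pass also removes the need to treat &amp; as a special last step, because the output of one replacement is never re-read as input.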
function unescape(s) {
    /*
    Note: this can be implemented more efficiently by a loop searching for
    backslashes, from start to end of source string, and parsing and
    dispatching the character found immediately after the backslash, if it
    must be followed by additional characters such as an octal or
    hexadecimal 7-bit ASCII-only encoded character, or an hexadecimal Unicode
    encoded valid code point, or a valid pair of hexadecimal UTF-16-encoded
    code units representing a single Unicode code point.

    8-bit encoded code units for non-ASCII characters should not be used, but
    if they are, they should be decoded into a 16-bit code units keeping their
    numeric value, i.e. like the numeric value of an equivalent Unicode
    code point (which means ISO 8859-1, not Windows 1252, including C1 controls).

    Note that Javascript or JSON does NOT require code units to be paired when
    they encode surrogates; and Javascript/JSON will also accept any Unicode
    code point in the valid range representable as UTF-16 pairs, including
    NULL, all controls, and code units assigned to non-characters.
    This means that all code points in \U00000000..\U0010FFFF are valid,
    as well as all 16-bit code units in \u0000..\uFFFF, in any order.
    It's up to your application to restrict these valid ranges if needed.
    */
    s = ('' + s) /* Forces the conversion to string. */
    /* Decode by reversing the initial order of replacements */
        .replace(/\\x3E/g, '>')
        .replace(/\\x3C/g, '<')
        .replace(/\\x22/g, '"')
        .replace(/\\x27/g, "'")
        .replace(/\\x26/g, '&') /* These 5 replacements protect from HTML/XML. */
        .replace(/\\u00A0/g, '\u00A0') /* Useful but not absolutely necessary. */
        .replace(/\\n/g, '\n')
        .replace(/\\t/g, '\t') /* These 2 replacements protect whitespaces. */
        ;
    /*
    You may optionally add here support for other numerical or symbolic
    character escapes.
    But you can add these replacements only if these entities will *not* 
    be replaced by a string value containing *any* backslash character.
    Do not decode to any doubled backslashes here !
    */
    /* Required check for security! */
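    /* Every remaining backslash must be the first half of a doubled backslash. */
    var found = s.match(/\\[\s\S]?/g) || [];
    for (var i = 0; i < found.length; i++)
        if (found[i] != '\\\\')
            throw new Error('Unsafe or unsupported escape found in the literal string content');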
    /* This MUST be the last replacement. */
    return s.replace(/\\\\/g, '\\');
}
$("<a>", { href: 'very<script>\'b"ad' }).text('click me')[0].outerHTML
const serialised = _.escape("Here's a string that could break HTML");

// Add it into a data attribute in the HTML (built here as a string)
const html = '<a data-value-serialised="' + serialised + '" onclick="callback(event)">link</a>';
// and then in the JS where this value will be read:
function callback(e) {
    $(e.currentTarget).data('valueSerialised'); // with a bit of help from jQuery

    const originalString = _.unescape(serialised); // can be used as part of a payload or whatever.
}