Javascript: Why can't I use accented characters next to a word boundary?



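I'm trying to build a dynamic regex that matches people's names. It works fine for most names, until I run into accented characters at the end of a name.

Example: Some Fancy Namé

The regex I've used so far is: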

/\b(Fancy Namé|Namé)\b/i
Used like this:

"Goal: Some Fancy Namé. Awesome.".replace(/\b(Fancy Namé|Namé)\b/i, '<a href="#">$1</a>');
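This doesn't match at all. If I replace é with e, it matches fine. If I try to match a name such as "Some Fancy Naméa", it works fine. If I remove the last word-boundary anchor, it works fine.

Why doesn't the \b word boundary work here? Any suggestions on how to get around this?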

I've considered using something like this, but I'm not sure what the performance penalty would be:

"Some fancy namé. Allow me to ellaborate.".replace(/([\s.,!?])(fancy namé|namé)([\s.,!?]|$)/g, '$1<a href="#">$2</a>$3')

Suggestions? Ideas?

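You could try using the \o or \x escapes in your regular expression. (JavaScript regexes have hex escapes such as \xE9 and \u00E9, but no \o escape.)

The end of the linked page may help you.

As for which characters the actual octal/hex values correspond to, I'm not sure.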
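A quick check (a sketch, not part of the answer above) of what the \x escape does and does not change: \xE9 is just another way of writing é inside the pattern, so the trailing \b still fails.

```javascript
const text = "Goal: Some Fancy Namé. Awesome.";

// \xE9 is the hex escape for U+00E9 (é); it matches exactly what a literal é matches.
const hexEscaped = /Nam\xE9/i;
console.log(hexEscaped.test(text)); // true

// Writing é as \xE9 does not change how \b classifies it: é is still not a
// word character, so there is no boundary between "é" and "." and this fails.
const withBoundary = /\bNam\xE9\b/i;
console.log(withBoundary.test(text)); // false
```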

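JavaScript's regex implementation is not Unicode-aware. It only knows "word characters" in the standard low-byte ASCII range, which excludes é and every other accented or non-English letter.

Because é is not a word character to JS, é followed by a space can never be treated as a word boundary. (It would match \b if used in the middle of a word, as in Namés.)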
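To make this visible, the following sketch (boundaryPositions is a hypothetical helper) lists every index at which \b matches. It relies on the sticky y flag (ES2015) to test the zero-width assertion at one exact index. In current engines \w and \b are still ASCII-based, even with the u flag.

```javascript
// \w in JavaScript means [A-Za-z0-9_], so é is not a word character.
console.log(/\w/.test("é")); // false

// Hypothetical helper: return every index at which \b matches in str.
function boundaryPositions(str) {
  const positions = [];
  for (let i = 0; i <= str.length; i++) {
    // The sticky flag makes the regex try to match only at lastIndex.
    const re = /\b/y;
    re.lastIndex = i;
    if (re.test(str)) positions.push(i);
  }
  return positions;
}

// Index 3 (m|é) and 10 (é|a) are boundaries; index 4 (é|.) is not.
console.log(boundaryPositions("Namé. Naméa")); // [0, 3, 6, 9, 10, 11]
```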

/([\s.,!?])(fancy namé|namé)([\s.,!?])/


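Yes, that would be the usual JS workaround (though probably with more punctuation characters). In other languages you would normally use lookbehind/lookahead assertions to avoid consuming the boundary characters before and after, but support for them in JS is poor/buggy, so they are best avoided.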
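Since the question asks for a dynamic regex, here is a minimal sketch of that workaround built from a list of names. The helpers escapeRegExp and buildNameRegex are hypothetical, and the delimiter set is an assumption. It uses a lookahead rather than a second capture group for the trailing delimiter, so two names separated by a single space can both match; lookahead is well supported in current engines (lookbehind only arrived in ES2018).

```javascript
// Hypothetical helper: escape regex metacharacters so a name is matched literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: build the "capture the delimiter" workaround from a list of names.
function buildNameRegex(names) {
  const alternatives = names
    .slice()
    .sort((a, b) => b.length - a.length) // longer names first, e.g. "Fancy Namé" before "Namé"
    .map(escapeRegExp)
    .join("|");
  // Assumed delimiter set: whitespace and common punctuation.
  const delim = "[\\s.,;:!?'\"()]";
  // Group 1 captures the leading delimiter (empty at the start), group 2 the name;
  // the lookahead checks the trailing delimiter without consuming it.
  return new RegExp("(^|" + delim + ")(" + alternatives + ")(?=" + delim + "|$)", "gi");
}

const nameRe = buildNameRegex(["Namé", "Fancy Namé"]);
const linked = "Some fancy namé. Allow me to elaborate. Namé Namé"
  .replace(nameRe, '$1<a href="#">$2</a>');
console.log(linked);
```

Sorting by length matters because alternation is tried left to right: with "Namé" first, the shorter name could never be extended to "Fancy Namé" at a position where both start.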

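Rob is correct. Quoting ECMAScript 3rd edition:

15.10.2.6 Assertion:

The production Assertion :: \b evaluates by ...

2. Call IsWordChar(e−1) and let a be the Boolean result.
3. Call IsWordChar(e) and let b be the Boolean result.

The internal helper function IsWordChar ... performs the following:

3. If c is one of the sixty-three characters in the table below, return true.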

a b c d e f g h i j k l m n o p q r s t u v w x y z
A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
0 1 2 3 4 5 6 7 8 9 _
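Since é is not one of those 63 characters, the position between é and a (as in Naméa) is treated as a word boundary, while the position between é and a following space or period is not.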
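The two quoted steps translate almost line for line into code. This is an illustrative sketch (isWordChar and isWordBoundary are my names, not an API); the first check in isWordChar, which treats positions outside the input as non-word, is step 1 of the same spec section.

```javascript
// IsWordChar(e): false outside the input, otherwise true only for the
// 63 characters a-z, A-Z, 0-9 and _.
function isWordChar(input, e) {
  if (e === -1 || e === input.length) return false;
  return /[A-Za-z0-9_]/.test(input[e]);
}

// \b at position e: a = IsWordChar(e - 1), b = IsWordChar(e);
// it succeeds when exactly one of them is true.
function isWordBoundary(input, e) {
  const a = isWordChar(input, e - 1);
  const b = isWordChar(input, e);
  return a !== b;
}

console.log(isWordBoundary("Namé.", 4)); // false: é and "." are both non-word
console.log(isWordBoundary("Naméa", 4)); // true: é is non-word, "a" is a word character
```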

If you know which characters should count as letters, you can use a negative lookahead assertion, for example:

/(^|[^\wÀ-ÖØ-öø-ſ])(Fancy Namé|Namé)(?![\wÀ-ÖØ-öø-ſ])/
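Because the leading group consumes the character before the name, the replacement has to put it back with $1. A usage sketch, assuming the g flag is wanted so that every occurrence is linked (the pattern above has no flags); the sample text is made up.

```javascript
const nameRe = /(^|[^\wÀ-ÖØ-öø-ſ])(Fancy Namé|Namé)(?![\wÀ-ÖØ-öø-ſ])/g;

const result = "Namé, meet Some Fancy Namé. Not Namée."
  // $1 restores the consumed leading character; $2 is the matched name.
  .replace(nameRe, '$1<a href="#">$2</a>');

console.log(result);
// <a href="#">Namé</a>, meet Some <a href="#">Fancy Namé</a>. Not Namée.
```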
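
replace() accepts a callback function as its second argument. (I don't know why so many JS tutorials ignore this useful feature.) So we can write our own word-boundary test.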

The solution proposed elsewhere, using the regexp /(\W|^)(fancy namé|namé)(\W|$)/ig, gives false positives for text such as Nénamé, because é itself counts as \W.
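A short check of that claim (a sketch; the sample text is made up): in Nénamé, the é before "namé" satisfies (\W|^), so the regex reports a match inside a longer word, while Namée is correctly rejected because the following "e" is \w.

```javascript
const naive = /(\W|^)(fancy namé|namé)(\W|$)/gi;

// é counts as \W, so the "é" in "Nénamé" is taken as the leading delimiter.
const sample = "Nénamé. Namée.";
const hits = [...sample.matchAll(naive)].map((m) => m[2]);

console.log(hits); // ["namé"], a false positive inside "Nénamé"
```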

String.prototype.isWordCharAt = function(i) {
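    // Rough heuristic for European languages: ASCII letters, plus every code unit
    // from U+00C0 up to U+1FFF (accented Latin, Greek, Cyrillic, ...); digits and '_' are not counted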
    return (this.charAt(i) >= 'A' && this.charAt(i) <= 'Z')
        || (this.charAt(i) >= 'a' && this.charAt(i) <= 'z')
        || (this.charCodeAt(i) >= 0xC0 && this.charCodeAt(i) < 0x2000)
    ;
};

"Namé. Goal: Some Fancy Namé. Namé. Nénamé. Namée. Nénamée. Namé"
.replace(/(Namé|Fancy Namé)/ig, function(
match, part1, /* part2, part3, ... */ offset, fullText) {
  // Keep in mind that the number of arguments changes
  // if the number of capturing parenthesis in regexp changes.
  // We could use 'arguments' pseudo-array instead.
  var len1 = part1.length;
  var leftWordBoundary;
  var rightWordBoundary;

  if (offset === 0) {
    leftWordBoundary = fullText.isWordCharAt(offset);
  }
  else {
    leftWordBoundary = (fullText.isWordCharAt(offset - 1)
      != fullText.isWordCharAt(offset));
  }

  if (offset + len1 == fullText.length) {
    rightWordBoundary = fullText.isWordCharAt(offset + len1 - 1);
  }
  else {
    rightWordBoundary = (fullText.isWordCharAt(offset + len1 - 1)
      != fullText.isWordCharAt(offset + len1));
  }

  if (leftWordBoundary && rightWordBoundary) {
    return '<a href="#">' + part1 + '</a>';
  }
  else {
    return part1;
  }
});
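On engines with Unicode property escapes (ES2018+), the same callback idea can be written without patching String.prototype. In this sketch the helper names are hypothetical, and "word character" is assumed to mean a Unicode letter, combining mark, decimal digit, or underscore. Characters are inspected one UTF-16 code unit at a time, so letters outside the Basic Multilingual Plane are not recognized.

```javascript
// Assumed definition of a word character: letter, combining mark, decimal digit, or "_".
const WORD_CHAR = /[\p{L}\p{M}\p{Nd}_]/u;

// Out-of-range positions count as non-word, like the start/end of input for \b.
function isWordCharAt(str, i) {
  return i >= 0 && i < str.length && WORD_CHAR.test(str[i]);
}

// pattern must have the g flag and no capturing groups, so the callback
// receives (match, offset, fullText).
function linkWholeWords(text, pattern) {
  return text.replace(pattern, (match, offset, fullText) => {
    const end = offset + match.length;
    const leftBoundary = isWordCharAt(fullText, offset - 1) !== isWordCharAt(fullText, offset);
    const rightBoundary = isWordCharAt(fullText, end - 1) !== isWordCharAt(fullText, end);
    return leftBoundary && rightBoundary ? '<a href="#">' + match + "</a>" : match;
  });
}

const out = linkWholeWords(
  "Namé. Goal: Some Fancy Namé. Namé. Nénamé. Namée. Nénamée. Namé",
  /(?:Namé|Fancy Namé)/gi
);
console.log(out);
```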
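Know your boundaries

Unfortunately, even if JavaScript someday supports Unicode fully and correctly, you will still have to be very careful with word boundaries. It is easy to misunderstand what a \b really does.

Here is Perl code that explains what \b is really doing, which holds true whether or not your pattern engine has been BNM-upgraded: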

  # if next is word char:
  #     then last isn't    word
  #     else last isn't nonword

    $word_boundary_before = qr{ (?(?=  \w ) (?<! \w ) | (?<! \W ) ) }x;

  # if last is word:
  #     then next isn't    word
  #     else next isn't nonword

    $word_boundary_after  = qr{ (?(?<= \w ) (?!  \w ) | (?!  \W ) ) }x;
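JavaScript has no conditional groups like Perl's (?(...)...|...), but since ES2018 it has lookbehind and \p{...} with the u flag, which is enough to spell out the definition of \b ("a word character on exactly one side") with a Unicode-aware notion of "word". A sketch, where the word class (letters, marks, decimal digits, connector punctuation) is an assumption approximating Perl's \w and wholeWord is a hypothetical helper; the start and end of the string count as non-word, as in ECMAScript's \b.

```javascript
// Assumed Unicode-aware word class: letters, marks, decimal digits, connector punctuation.
const W = "[\\p{L}\\p{M}\\p{Nd}\\p{Pc}]";

// \b means: word char behind and none ahead, or none behind and a word char ahead.
const BOUNDARY = `(?:(?<=${W})(?!${W})|(?<!${W})(?=${W}))`;

// Hypothetical helper: match `word` only as a whole word. The u flag is always added.
function wholeWord(word, extraFlags = "") {
  const escaped = word.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(BOUNDARY + escaped + BOUNDARY, "u" + extraFlags);
}

console.log(wholeWord("Namé").test("Goal: Some Fancy Namé. Awesome.")); // true
console.log(wholeWord("Namé").test("Namée"));                           // false
```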
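
You can use only the short names in Java and JavaScript, but Perl also lets you use the long names, which helps readability. Perl 5.12 supports about 3000 Unicode properties. Python still has no Unicode property support worth mentioning, and Ruby has only just started on it in version 1.9. PCRE has some limited support, mostly like Java 1.7.

Java 6 supports Unicode block properties such as \p{InGeneralPunctuation} or \p{Block=GeneralPunctuation}, and Java 7 supports Unicode script properties such as \p{IsHiragana} or \p{Script=Hiragana}.

However, it still doesn't support anything close to the full set, including nearly critical properties such as \p{White_Space}, \p{Dash} and \p{Quotation_Mark}, let alone two-part properties such as \p{Line_Break=Alphabetic}, \p{East_Asian_Width:Narrow}, \p{Numeric_Value=1000} or \p{Age:5.2}.

The former set is essential, especially given the lack of proper support for \s, and the latter set is nearly as important.
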
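For comparison with current JavaScript (which postdates this answer): ES2018 added \p{...} with the u flag, covering General_Category, Script, Script_Extensions and a fixed list of binary properties. A quick sketch; the helper supported() is hypothetical.

```javascript
// Binary and script properties from the list above that ES2018+ accepts (u flag required).
console.log(/\p{White_Space}/u.test("\u0085"));     // true: U+0085 NEXT LINE is White_Space
console.log(/\s/.test("\u0085"));                   // false: JavaScript's \s omits U+0085
console.log(/\p{Dash}/u.test("\u2014"));            // true: U+2014 EM DASH
console.log(/\p{Quotation_Mark}/u.test("\u00AB"));  // true: U+00AB «
console.log(/\p{Script=Hiragana}/u.test("\u3042")); // true: U+3042 あ

// Hypothetical helper: does this engine accept the pattern with the u flag?
function supported(source) {
  try {
    new RegExp(source, "u");
    return true;
  } catch (e) {
    return false; // unknown property names are a SyntaxError
  }
}

console.log(supported("\\p{Block=Basic_Latin}"));     // false
console.log(supported("\\p{Line_Break=Alphabetic}")); // false
console.log(supported("\\p{Age=5.2}"));               // false
```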
Short Name  Long Name
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 \pL        \p{Letter}
   \p{Lu}   \p{Uppercase_Letter}
   \p{Ll}   \p{Lowercase_Letter}
   \p{Lt}   \p{Titlecase_Letter}
   \p{Lm}   \p{Modifier_Letter}
   \p{Lo}   \p{Other_Letter}

Short Name  Long Name
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 \pM       \p{Mark}
   \p{Mn}  \p{Nonspacing_Mark}
   \p{Mc}  \p{Spacing_Mark}
   \p{Me}  \p{Enclosing_Mark}

Short Name  Long Name
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 \pN       \p{Number}
   \p{Nd}  \p{Decimal_Number},\p{Digit}
   \p{Nl}  \p{Letter_Number}
   \p{No}  \p{Other_Number}

Short Name  Long Name
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 \pP       \p{Punctuation}, \p{Punct}
   \p{Pc}  \p{Connector_Punctuation}
   \p{Pd}  \p{Dash_Punctuation}
   \p{Ps}  \p{Open_Punctuation}
   \p{Pe}  \p{Close_Punctuation}
   \p{Pi}  \p{Initial_Punctuation}
   \p{Pf}  \p{Final_Punctuation}
   \p{Po}  \p{Other_Punctuation}

Short Name  Long Name
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 \pS       \p{Symbol}
   \p{Sm}  \p{Math_Symbol}
   \p{Sc}  \p{Currency_Symbol}
   \p{Sk}  \p{Modifier_Symbol}
   \p{So}  \p{Other_Symbol}

Short Name  Long Name
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 \pZ       \p{Separator}
   \p{Zs}  \p{Space_Separator}
   \p{Zl}  \p{Line_Separator}
   \p{Zp}  \p{Paragraph_Separator}

Short Name  Long Name
¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯¯
 \pC       \p{Other}
   \p{Cc}  \p{Control}, \p{Cntrl}
   \p{Cf}  \p{Format}
   \p{Cs}  \p{Surrogate}
   \p{Co}  \p{Private_Use}
   \p{Cn}  \p{Unassigned}
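In JavaScript (ES2018+, u flag), both columns of these tables are accepted, always inside braces, along with the General_Category= prefixed form; Perl's brace-less one-letter form \pL is a SyntaxError there. A small sketch; matchesCategory is a hypothetical helper and the sample characters are my own choices.

```javascript
// Hypothetical helper: does the single character `ch` belong to category `name`?
function matchesCategory(ch, name) {
  return new RegExp(`^\\p{${name}}$`, "u").test(ch);
}

// [character, short name, long name]
const samples = [
  ["É", "Lu", "Uppercase_Letter"],
  ["7", "Nd", "Decimal_Number"],
  ["\u2013", "Pd", "Dash_Punctuation"], // EN DASH
  ["\u221E", "Sm", "Math_Symbol"],      // INFINITY
  ["\u3000", "Zs", "Space_Separator"],  // IDEOGRAPHIC SPACE
];

for (const [ch, short, long] of samples) {
  console.log(short, long, matchesCategory(ch, short), matchesCategory(ch, long));
}

console.log(matchesCategory("É", "General_Category=Lu")); // true

// The brace-less Perl form is rejected in u mode.
let braceless;
try {
  new RegExp("\\pL", "u");
  braceless = "accepted";
} catch (e) {
  braceless = "SyntaxError";
}
console.log(braceless); // SyntaxError
```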
$ uninames face
 ፦  4966  1366  ETHIOPIC PREFACE COLON
 ⁙  8281  2059  FIVE DOT PUNCTUATION
        = Greek pentonkion
        = quincunx
        x (die face-5 - 2684)
 ∯  8751  222F  SURFACE INTEGRAL
        # 222E 222E
 ☹  9785  2639 WHITE FROWNING FACE
 ☺  9786  263A WHITE SMILING FACE
        = have a nice day!
 ☻  9787  263B BLACK SMILING FACE
 ⚀  9856  2680 DIE FACE-1
 ⚁  9857  2681 DIE FACE-2
 ⚂  9858  2682 DIE FACE-3
 ⚃  9859  2683 DIE FACE-4
 ⚄  9860  2684 DIE FACE-5
 ⚅  9861  2685 DIE FACE-6
 ⾯  12207 2FAF KANGXI RADICAL FACE
        # 9762
 〠  12320 3020 POSTAL MARK FACE
 龜  64206 FACE CJK COMPATIBILITY IDEOGRAPH-FACE
        : 9F9C
$ uniprops -va LF 85 Greek:Sigma INFINITY BOM U+3000 U+12345

U+000A ‹U+000A› \N{ LINE FEED (LF) }:
    \s \v \R \pC \p{Cc}
    \p{All} \p{Any} \p{ASCII} \p{Assigned} \p{C} \p{Other} \p{Cc} \p{Cntrl} \p{Common} \p{Zyyy} \p{Control} \p{Pat_WS} \p{Pattern_White_Space} \p{PatWS} \p{PerlSpace} \p{PosixCntrl} \p{PosixSpace} \p{Space} \p{SpacePerl} \p{VertSpace} \p{White_Space} \p{WSpace}
    \p{Age:1.1} \p{Block=Basic_Latin} \p{Bidi_Class:B} \p{Bidi_Class=Paragraph_Separator} \p{Bidi_Class:Paragraph_Separator} \p{Bc=B} \p{Block:ASCII} \p{Block:Basic_Latin} \p{Blk=ASCII} \p{Canonical_Combining_Class:0} \p{Canonical_Combining_Class=Not_Reordered}
       \p{Canonical_Combining_Class:Not_Reordered} \p{Ccc=NR} \p{Canonical_Combining_Class:NR} \p{Script=Common} \p{Decomposition_Type:None} \p{Dt=None} \p{East_Asian_Width=Neutral} \p{East_Asian_Width:Neutral} \p{Grapheme_Cluster_Break:LF} \p{GCB=LF} \p{Hangul_Syllable_Type:NA}
       \p{Hangul_Syllable_Type=Not_Applicable} \p{Hangul_Syllable_Type:Not_Applicable} \p{Hst=NA} \p{Joining_Group:No_Joining_Group} \p{Jg=NoJoiningGroup} \p{Joining_Type:Non_Joining} \p{Jt=U} \p{Joining_Type:U} \p{Joining_Type=Non_Joining} \p{Line_Break:LF} \p{Line_Break=Line_Feed}
       \p{Line_Break:Line_Feed} \p{Lb=LF} \p{Numeric_Type:None} \p{Nt=None} \p{Numeric_Value:NaN} \p{Nv=NaN} \p{Present_In:1.1} \p{Age=1.1} \p{In=1.1} \p{Present_In:2.0} \p{In=2.0} \p{Present_In:2.1} \p{In=2.1} \p{Present_In:3.0} \p{In=3.0} \p{Present_In:3.1} \p{In=3.1}
       \p{Present_In:3.2} \p{In=3.2} \p{Present_In:4.0} \p{In=4.0} \p{Present_In:4.1} \p{In=4.1} \p{Present_In:5.0} \p{In=5.0} \p{Present_In:5.1} \p{In=5.1} \p{Present_In:5.2} \p{In=5.2} \p{Script:Common} \p{Sc=Zyyy} \p{Script:Zyyy} \p{Sentence_Break:LF} \p{SB=LF} \p{Word_Break:LF}
       \p{WB=LF}

U+0085 ‹U+0085› \N{ NEXT LINE (NEL) }:
    \s \v \R \pC \p{Cc}
    \p{All} \p{Any} \p{Assigned} \p{InLatin1} \p{C} \p{Other} \p{Cc} \p{Cntrl} \p{Common} \p{Zyyy} \p{Control} \p{Pat_WS} \p{Pattern_White_Space} \p{PatWS} \p{Space} \p{SpacePerl} \p{VertSpace} \p{White_Space} \p{WSpace}
    \p{Age:1.1} \p{Bidi_Class:B} \p{Bidi_Class=Paragraph_Separator} \p{Bidi_Class:Paragraph_Separator} \p{Bc=B} \p{Block:Latin_1} \p{Block=Latin_1_Supplement} \p{Block:Latin_1_Supplement} \p{Blk=Latin1} \p{Canonical_Combining_Class:0} \p{Canonical_Combining_Class=Not_Reordered}
       \p{Canonical_Combining_Class:Not_Reordered} \p{Ccc=NR} \p{Canonical_Combining_Class:NR} \p{Script=Common} \p{Decomposition_Type:None} \p{Dt=None} \p{East_Asian_Width=Neutral} \p{East_Asian_Width:Neutral} \p{Grapheme_Cluster_Break:CN} \p{Grapheme_Cluster_Break=Control}
       \p{Grapheme_Cluster_Break:Control} \p{GCB=CN} \p{Hangul_Syllable_Type:NA} \p{Hangul_Syllable_Type=Not_Applicable} \p{Hangul_Syllable_Type:Not_Applicable} \p{Hst=NA} \p{Joining_Group:No_Joining_Group} \p{Jg=NoJoiningGroup} \p{Joining_Type:Non_Joining} \p{Jt=U}
       \p{Joining_Type:U} \p{Joining_Type=Non_Joining} \p{Line_Break:Next_Line} \p{Lb=NL} \p{Line_Break:NL} \p{Line_Break=Next_Line} \p{Numeric_Type:None} \p{Nt=None} \p{Numeric_Value:NaN} \p{Nv=NaN} \p{Present_In:1.1} \p{Age=1.1} \p{In=1.1} \p{Present_In:2.0} \p{In=2.0}
       \p{Present_In:2.1} \p{In=2.1} \p{Present_In:3.0} \p{In=3.0} \p{Present_In:3.1} \p{In=3.1} \p{Present_In:3.2} \p{In=3.2} \p{Present_In:4.0} \p{In=4.0} \p{Present_In:4.1} \p{In=4.1} \p{Present_In:5.0} \p{In=5.0} \p{Present_In:5.1} \p{In=5.1} \p{Present_In:5.2} \p{In=5.2}
       \p{Script:Common} \p{Sc=Zyyy} \p{Script:Zyyy} \p{Sentence_Break:SE} \p{Sentence_Break=Sep} \p{Sentence_Break:Sep} \p{SB=SE} \p{Word_Break:Newline} \p{WB=NL} \p{Word_Break:NL} \p{Word_Break=Newline}

U+03A3 ‹Σ› \N{ GREEK CAPITAL LETTER SIGMA }:
    \w \pL \p{LC} \p{L_} \p{L&} \p{Lu}
    \p{All} \p{Any} \p{Alnum} \p{Alpha} \p{Alphabetic} \p{Assigned} \p{Greek} \p{Is_Greek} \p{InGreek} \p{Cased} \p{Cased_Letter} \p{LC} \p{Changes_When_Casefolded} \p{CWCF} \p{Changes_When_Casemapped} \p{CWCM} \p{Changes_When_Lowercased} \p{CWL} \p{Changes_When_NFKC_Casefolded}
       \p{CWKCF} \p{Lu} \p{L} \p{Gr_Base} \p{Grapheme_Base} \p{Graph} \p{GrBase} \p{Grek} \p{Greek_And_Coptic} \p{ID_Continue} \p{IDC} \p{ID_Start} \p{IDS} \p{Letter} \p{L_} \p{Uppercase_Letter} \p{Print} \p{Upper} \p{Uppercase} \p{Word} \p{XID_Continue} \p{XIDC} \p{XID_Start}
       \p{XIDS}
    \p{Age:1.1} \p{Bidi_Class:L} \p{Bidi_Class=Left_To_Right} \p{Bidi_Class:Left_To_Right} \p{Bc=L} \p{Block:Greek} \p{Block=Greek_And_Coptic} \p{Block:Greek_And_Coptic} \p{Blk=Greek} \p{Canonical_Combining_Class:0} \p{Canonical_Combining_Class=Not_Reordered}
       \p{Canonical_Combining_Class:Not_Reordered} \p{Ccc=NR} \p{Canonical_Combining_Class:NR} \p{Decomposition_Type:None} \p{Dt=None} \p{East_Asian_Width:A} \p{East_Asian_Width=Ambiguous} \p{East_Asian_Width:Ambiguous} \p{Ea=A} \p{Grapheme_Cluster_Break:Other} \p{GCB=XX}
       \p{Grapheme_Cluster_Break:XX} \p{Grapheme_Cluster_Break=Other} \p{Script=Greek} \p{Hangul_Syllable_Type:NA} \p{Hangul_Syllable_Type=Not_Applicable} \p{Hangul_Syllable_Type:Not_Applicable} \p{Hst=NA} \p{Joining_Group:No_Joining_Group} \p{Jg=NoJoiningGroup}
       \p{Joining_Type:Non_Joining} \p{Jt=U} \p{Joining_Type:U} \p{Joining_Type=Non_Joining} \p{Line_Break:AL} \p{Line_Break=Alphabetic} \p{Line_Break:Alphabetic} \p{Lb=AL} \p{Numeric_Type:None} \p{Nt=None} \p{Numeric_Value:NaN} \p{Nv=NaN} \p{Present_In:1.1} \p{Age=1.1} \p{In=1.1}
       \p{Present_In:2.0} \p{In=2.0} \p{Present_In:2.1} \p{In=2.1} \p{Present_In:3.0} \p{In=3.0} \p{Present_In:3.1} \p{In=3.1} \p{Present_In:3.2} \p{In=3.2} \p{Present_In:4.0} \p{In=4.0} \p{Present_In:4.1} \p{In=4.1} \p{Present_In:5.0} \p{In=5.0} \p{Present_In:5.1} \p{In=5.1}
       \p{Present_In:5.2} \p{In=5.2} \p{Script:Greek} \p{Sc=Grek} \p{Script:Grek} \p{Sentence_Break:UP} \p{Sentence_Break=Upper} \p{Sentence_Break:Upper} \p{SB=UP} \p{Word_Break:ALetter} \p{WB=LE} \p{Word_Break:LE} \p{Word_Break=ALetter}

U+221E ‹∞› \N{ INFINITY }:
    \pS \p{Sm}
    \p{All} \p{Any} \p{Assigned} \p{InMathematicalOperators} \p{Common} \p{Zyyy} \p{Sm} \p{S} \p{Gr_Base} \p{Grapheme_Base} \p{Graph} \p{GrBase} \p{Math} \p{Math_Symbol} \p{Pat_Syn} \p{Pattern_Syntax} \p{PatSyn} \p{Print} \p{Symbol}
    \p{Age:1.1} \p{Bidi_Class:ON} \p{Bidi_Class=Other_Neutral} \p{Bidi_Class:Other_Neutral} \p{Bc=ON} \p{Block:Mathematical_Operators} \p{Canonical_Combining_Class:0} \p{Canonical_Combining_Class=Not_Reordered} \p{Canonical_Combining_Class:Not_Reordered} \p{Ccc=NR}
       \p{Canonical_Combining_Class:NR} \p{Script=Common} \p{Decomposition_Type:None} \p{Dt=None} \p{East_Asian_Width:A} \p{East_Asian_Width=Ambiguous} \p{East_Asian_Width:Ambiguous} \p{Ea=A} \p{Grapheme_Cluster_Break:Other} \p{GCB=XX} \p{Grapheme_Cluster_Break:XX}
       \p{Grapheme_Cluster_Break=Other} \p{Hangul_Syllable_Type:NA} \p{Hangul_Syllable_Type=Not_Applicable} \p{Hangul_Syllable_Type:Not_Applicable} \p{Hst=NA} \p{Joining_Group:No_Joining_Group} \p{Jg=NoJoiningGroup} \p{Joining_Type:Non_Joining} \p{Jt=U} \p{Joining_Type:U}
       \p{Joining_Type=Non_Joining} \p{Line_Break:AI} \p{Line_Break=Ambiguous} \p{Line_Break:Ambiguous} \p{Lb=AI} \p{Numeric_Type:None} \p{Nt=None} \p{Numeric_Value:NaN} \p{Nv=NaN} \p{Present_In:1.1} \p{Age=1.1} \p{In=1.1} \p{Present_In:2.0} \p{In=2.0} \p{Present_In:2.1} \p{In=2.1}
       \p{Present_In:3.0} \p{In=3.0} \p{Present_In:3.1} \p{In=3.1} \p{Present_In:3.2} \p{In=3.2} \p{Present_In:4.0} \p{In=4.0} \p{Present_In:4.1} \p{In=4.1} \p{Present_In:5.0} \p{In=5.0} \p{Present_In:5.1} \p{In=5.1} \p{Present_In:5.2} \p{In=5.2} \p{Script:Common} \p{Sc=Zyyy}
       \p{Script:Zyyy} \p{Sentence_Break:Other} \p{SB=XX} \p{Sentence_Break:XX} \p{Sentence_Break=Other} \p{Word_Break:Other} \p{WB=XX} \p{Word_Break:XX} \p{Word_Break=Other}

U+FEFF ‹U+FEFF› \N{ ZERO WIDTH NO-BREAK SPACE }:
    \pC \p{Cf}
    \p{All} \p{Any} \p{Assigned} \p{InArabicPresentationFormsB} \p{C} \p{Other} \p{Case_Ignorable} \p{CI} \p{Cf} \p{Format} \p{Changes_When_NFKC_Casefolded} \p{CWKCF} \p{Common} \p{Zyyy} \p{Default_Ignorable_Code_Point} \p{DI} \p{Graph} \p{Print}
    \p{Age:1.1} \p{Bidi_Class:BN} \p{Bidi_Class=Boundary_Neutral} \p{Bidi_Class:Boundary_Neutral} \p{Bc=BN} \p{Block:Arabic_Presentation_Forms_B} \p{Canonical_Combining_Class:0} \p{Canonical_Combining_Class=Not_Reordered} \p{Canonical_Combining_Class:Not_Reordered} \p{Ccc=NR}
       \p{Canonical_Combining_Class:NR} \p{Script=Common} \p{Decomposition_Type:None} \p{Dt=None} \p{East_Asian_Width=Neutral} \p{East_Asian_Width:Neutral} \p{Grapheme_Cluster_Break:CN} \p{Grapheme_Cluster_Break=Control} \p{Grapheme_Cluster_Break:Control} \p{GCB=CN}
       \p{Hangul_Syllable_Type:NA} \p{Hangul_Syllable_Type=Not_Applicable} \p{Hangul_Syllable_Type:Not_Applicable} \p{Hst=NA} \p{Joining_Group:No_Joining_Group} \p{Jg=NoJoiningGroup} \p{Joining_Type:T} \p{Joining_Type=Transparent} \p{Joining_Type:Transparent} \p{Jt=T}
       \p{Line_Break:WJ} \p{Line_Break=Word_Joiner} \p{Line_Break:Word_Joiner} \p{Lb=WJ} \p{Numeric_Type:None} \p{Nt=None} \p{Numeric_Value:NaN} \p{Nv=NaN} \p{Present_In:1.1} \p{Age=1.1} \p{In=1.1} \p{Present_In:2.0} \p{In=2.0} \p{Present_In:2.1} \p{In=2.1} \p{Present_In:3.0}
       \p{In=3.0} \p{Present_In:3.1} \p{In=3.1} \p{Present_In:3.2} \p{In=3.2} \p{Present_In:4.0} \p{In=4.0} \p{Present_In:4.1} \p{In=4.1} \p{Present_In:5.0} \p{In=5.0} \p{Present_In:5.1} \p{In=5.1} \p{Present_In:5.2} \p{In=5.2} \p{Script:Common} \p{Sc=Zyyy} \p{Script:Zyyy}
       \p{Sentence_Break:FO} \p{Sentence_Break=Format} \p{Sentence_Break:Format} \p{SB=FO} \p{Word_Break:FO} \p{Word_Break=Format} \p{Word_Break:Format} \p{WB=FO}

U+3000 ‹U+3000› \N{ IDEOGRAPHIC SPACE }:
    \s \h \pZ \p{Zs}
    \p{All} \p{Any} \p{Assigned} \p{Blank} \p{InCJKSymbolsAndPunctuation} \p{Changes_When_NFKC_Casefolded} \p{CWKCF} \p{Common} \p{Zyyy} \p{Z} \p{Zs} \p{Gr_Base} \p{Grapheme_Base} \p{GrBase} \p{HorizSpace} \p{Print} \p{Separator} \p{Space} \p{Space_Separator} \p{SpacePerl}
       \p{White_Space} \p{WSpace}
    \p{Age:1.1} \p{Bidi_Class:White_Space} \p{Bc=WS} \p{Bidi_Class:WS} \p{Bidi_Class=White_Space} \p{Block:CJK_Symbols_And_Punctuation} \p{Canonical_Combining_Class:0} \p{Canonical_Combining_Class=Not_Reordered} \p{Canonical_Combining_Class:Not_Reordered} \p{Ccc=NR}
       \p{Canonical_Combining_Class:NR} \p{Script=Common} \p{Decomposition_Type:Non_Canon} \p{Decomposition_Type=Non_Canonical} \p{Decomposition_Type:Non_Canonical} \p{Dt=NonCanon} \p{Decomposition_Type:Wide} \p{Dt=Wide} \p{East_Asian_Width:F} \p{East_Asian_Width=Fullwidth}
       \p{East_Asian_Width:Fullwidth} \p{Ea=F} \p{Grapheme_Cluster_Break:Other} \p{GCB=XX} \p{Grapheme_Cluster_Break:XX} \p{Grapheme_Cluster_Break=Other} \p{Hangul_Syllable_Type:NA} \p{Hangul_Syllable_Type=Not_Applicable} \p{Hangul_Syllable_Type:Not_Applicable} \p{Hst=NA}
       \p{Joining_Group:No_Joining_Group} \p{Jg=NoJoiningGroup} \p{Joining_Type:Non_Joining} \p{Jt=U} \p{Joining_Type:U} \p{Joining_Type=Non_Joining} \p{Line_Break:ID} \p{Line_Break=Ideographic} \p{Line_Break:Ideographic} \p{Lb=ID} \p{Numeric_Type:None} \p{Nt=None}
       \p{Numeric_Value:NaN} \p{Nv=NaN} \p{Present_In:1.1} \p{Age=1.1} \p{In=1.1} \p{Present_In:2.0} \p{In=2.0} \p{Present_In:2.1} \p{In=2.1} \p{Present_In:3.0} \p{In=3.0} \p{Present_In:3.1} \p{In=3.1} \p{Present_In:3.2} \p{In=3.2} \p{Present_In:4.0} \p{In=4.0} \p{Present_In:4.1}
       \p{In=4.1} \p{Present_In:5.0} \p{In=5.0} \p{Present_In:5.1} \p{In=5.1} \p{Present_In:5.2} \p{In=5.2} \p{Script:Common} \p{Sc=Zyyy} \p{Script:Zyyy} \p{Sentence_Break:Sp} \p{SB=Sp} \p{Word_Break:Other} \p{WB=XX} \p{Word_Break:XX} \p{Word_Break=Other}
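For the first six characters of that run, here is roughly what JavaScript's built-in classes report (a sketch; describe and the selection of checks are mine). Two differences from the Perl output stand out: JavaScript's \s includes U+FEFF but not U+0085, and \w does not match Σ.

```javascript
const samples = [
  ["U+000A LINE FEED", "\u000A"],
  ["U+0085 NEXT LINE", "\u0085"],
  ["U+03A3 GREEK CAPITAL LETTER SIGMA", "\u03A3"],
  ["U+221E INFINITY", "\u221E"],
  ["U+FEFF ZERO WIDTH NO-BREAK SPACE", "\uFEFF"],
  ["U+3000 IDEOGRAPHIC SPACE", "\u3000"],
];

// Each check is anchored so it tests exactly one character.
const checks = {
  "\\s": /^\s$/,
  "\\w": /^\w$/,
  "\\p{White_Space}": /^\p{White_Space}$/u,
  "\\p{L}": /^\p{L}$/u,
  "\\p{Script=Greek}": /^\p{Script=Greek}$/u,
  "\\p{Sm}": /^\p{Sm}$/u,
};

// Return the names of all checks that match ch.
function describe(ch) {
  return Object.keys(checks).filter((name) => checks[name].test(ch));
}

for (const [label, ch] of samples) {
  console.log(label + ": " + (describe(ch).join(" ") || "(none)"));
}
```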

If you want to match "my_word", you can use a negative lookbehind, (?<!...), and a negative lookahead, (?!...), which check that the word is neither preceded nor followed by a letter:
new RegExp(`(?<![A-Za-zÀ-ÖØ-öø-ÿ])my_word(?![A-Za-zÀ-ÖØ-öø-ÿ])`, "gi");

The - in each class denotes a range of character codes (À-ÿ lies in Latin-1, beyond 7-bit ASCII). Here is a character table to check that the ranges cover what you need: http://seamons.com/projects/js/ascii_table.html
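A usage sketch (the sample text is made up; lookbehind needs an ES2018+ engine). Note that the class lists only letters, so a digit or "_" next to the word is not treated as part of it.

```javascript
const re = new RegExp(`(?<![A-Za-zÀ-ÖØ-öø-ÿ])my_word(?![A-Za-zÀ-ÖØ-öø-ÿ])`, "gi");

const text = "my_word, émy_word, my_wordé, (My_Word)";
// Collect the start index of every accepted match.
const found = [...text.matchAll(re)].map((m) => m.index);

console.log(found); // [0, 30]: "émy_word" and "my_wordé" are rejected
```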

As other answers have already pointed out, the JS regex engine does not consider "é" a word character. Since that is the case, and you want to match only when that letter is followed by another non-word character (or the end of the string), you can use the \B assertion there:

> "Goal: Some Fancy Namé. Awesome.".replace(/\b(Fancy Namé|Namé)\B/i, '<a href="#">$1</a>');
'Goal: Some <a href="#">Fancy Namé</a>. Awesome.'
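\B only fits here because the name happens to end in é. If the names are dynamic, one option (a sketch; nameRegex is a hypothetical helper) is to pick \b or \B for each end, depending on whether that end character is an ASCII word character. \B still cannot tell é from punctuation, so a following accented letter slips through, as the last check shows.

```javascript
// Hypothetical helper: choose \b or \B at each end of the name, depending on
// whether that end character is an ASCII word character ([A-Za-z0-9_]).
function nameRegex(name) {
  const escaped = name.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  const isAsciiWord = (ch) => /\w/.test(ch);
  const start = isAsciiWord(name[0]) ? "\\b" : "\\B";
  const end = isAsciiWord(name[name.length - 1]) ? "\\b" : "\\B";
  return new RegExp(start + "(" + escaped + ")" + end, "i");
}

const re = nameRegex("Namé");
console.log(re.test("Goal: Some Fancy Namé. Awesome.")); // true
console.log(re.test("Naméa"));                          // false: é|a is a boundary
console.log(re.test("Naméé"));                          // true: a false positive
```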