JavaScript: How to modify the _source field of an Elasticsearch document

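Question: Is there a way to strip HTML from a document's _source? The stripping could be periodic, triggered, or ideally done at index time.

I feed data into Elasticsearch and index it with an analyzer that strips unwanted HTML tags before indexing.

At query time, when I fetch the _source field, it still contains the original content, HTML included, and that is what gets returned to the client.

Notes:

  • I cannot clean the data before it is passed to Elasticsearch; I have no control over that step.
  • My client could strip the HTML with JavaScript after retrieving the data from Elasticsearch and before rendering it, but that is not an option either.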
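
For readers unfamiliar with this kind of setup, index-time stripping is typically done with a custom analyzer that uses the built-in html_strip char filter. The following is a minimal sketch of such index settings written as a JavaScript object. The analyzer name html_stripped and the field name content are assumptions made for illustration; they are not taken from the question.

```javascript
// Index settings and mapping for an analyzer that strips HTML before tokenizing.
// "html_stripped" and "content" are hypothetical names chosen for illustration.
const indexBody = {
  settings: {
    analysis: {
      analyzer: {
        html_stripped: {
          type: "custom",
          char_filter: ["html_strip"], // built-in char filter that removes HTML tags
          tokenizer: "standard",
          filter: ["lowercase"],
        },
      },
    },
  },
  mappings: {
    document: {
      properties: {
        // "string" is the older (pre-5.x) text type; adjust for your version.
        content: { type: "string", analyzer: "html_stripped" },
      },
    },
  },
};

// This JSON would be sent as the body of the create-index request.
console.log(JSON.stringify(indexBody, null, 2));
```

The char filter only changes what goes into the inverted index. The _source returned by a search is still the original JSON, which is exactly the behaviour described above.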

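The _source only stores the JSON that was indexed. You cannot change it.

If you want to strip the HTML completely before indexing and store the content that way, you can use the mapper attachment plugin: when you define the mapping, you can set the content type to "html".

The mapper attachment plugin is useful in many ways, especially when you handle multiple document types, but I think using it just to strip HTML tags is justification enough. The html_strip char filter cannot do this, because it only affects the analyzed tokens, not the stored value.

A word of warning, though: none of the HTML tags will be stored. So if you do need those tags, I suggest defining another field to hold the original content. Another note: you cannot specify multi-fields for a mapper attachment document, so the raw content has to be stored outside the attachment field. See the working example below.

You need to end up with the following mapping: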

{
  "html5-es" : {
    "aliases" : { },
    "mappings" : {
      "document" : {
        "properties" : {
          "delete" : {
            "type" : "boolean"
          },
          "file" : {
            "type" : "attachment",
            "fields" : {
              "content" : {
                "type" : "string",
                "store" : true,
                "term_vector" : "with_positions_offsets",
                "analyzer" : "autocomplete"
              },
              "author" : {
                "type" : "string",
                "store" : true,
                "term_vector" : "with_positions_offsets"
              },
              "title" : {
                "type" : "string",
                "store" : true,
                "term_vector" : "with_positions_offsets",
                "analyzer" : "autocomplete"
              },
              "name" : {
                "type" : "string"
              },
              "date" : {
                "type" : "date",
                "format" : "strict_date_optional_time||epoch_millis"
              },
              "keywords" : {
                "type" : "string"
              },
              "content_type" : {
                "type" : "string"
              },
              "content_length" : {
                "type" : "integer"
              },
              "language" : {
                "type" : "string"
              }
            }
          },
          "hash_id" : {
            "type" : "string"
          },
          "path" : {
            "type" : "string"
          },
          "raw_content" : {
            "type" : "string",
            "store" : true,
            "term_vector" : "with_positions_offsets",
            "analyzer" : "raw"
          },
          "title" : {
            "type" : "string"
          }
        }
      }
    },
    "settings" : {
      // insert your own settings here
    },
    "warmers" : { }
  }
}
With that in place, in NEST I assemble the content as follows:

Attachment attachment = new Attachment();
attachment.Content = Convert.ToBase64String(File.ReadAllBytes("path/to/document"));
attachment.ContentType = "html";

Document document = new Document();
document.File = attachment;
document.RawContent = InsertRawContentFromString(originalText);
I have tested this, and the result looks like this:

"file": {
    "_content": "PGh0bWwgeG1sbnM6TWFkQ2FwPSJodHRwOi8vd3d3Lm1hZGNhcHNvZnR3YXJlLmNvbS9TY2hlbWFzL01hZENhcC54c2QiPg0KICA8aGVhZCAvPg0KICA8Ym9keT4NCiAgICA8aDE+VG9waWMxMDwvaDE+DQogICAgPHA+RGVsZXRlIHRoaXMgdGV4dCBhbmQgcmVwbGFjZSBpdCB3aXRoIHlvdXIgb3duIGNvbnRlbnQuIENoZWNrIHlvdXIgbWFpbGJveC48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+YXNkZjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD4xMDwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5MYXZlbmRlci48L3A+DQogICAgPHA+wqA8L3A+DQogICAgPHA+MTAvNiAxMjowMzwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD41IDA5PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPjExIDQ3PC9wPg0KICAgIDxwPsKgPC9wPg0KICAgIDxwPkhhbGxvd2VlbiBpcyBpbiBPY3RvYmVyLjwvcD4NCiAgICA8cD7CoDwvcD4NCiAgICA8cD5qb2c8L3A+DQogIDwvYm9keT4NCjwvaHRtbD4=",
    "_content_length": 0,
    "_content_type": "html",
    "_date": "0001-01-01T00:00:00",
    "_title": "Topic10"
},
"delete": false,
"raw_content": "<h1>Topic10</h1><p>Delete this text and replace it with your own content. Check your mailbox.</p><p> </p><p>asdf</p><p> </p><p>10</p><p> </p><p>Lavender.</p><p> </p><p>10/6 12:03</p><p> </p><p>5 09</p><p> </p><p>11 47</p><p> </p><p>Halloween is in October.</p><p> </p><p>jog</p>"
},
"highlight": {
"file.content": [
    "\n    <em>Topic10</em>\n\n    Delete this text and replace it with your own content. Check your mailbox.\n\n     \n\n    asdf\n\n     \n\n    10\n\n     \n\n    Lavender.\n\n     \n\n    10/6 12:03\n\n     \n\n    5 09\n\n     \n\n    11 47\n\n     \n\n    Halloween is in October.\n\n     \n\n    jog\n\n  "
    ]
}
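
The _content value in the result above is simply the HTML file encoded as base64, while raw_content carries the markup as plain text. For a JavaScript client, the same document body could be assembled roughly as follows. This is a sketch for Node.js under stated assumptions: buildDocument and decodeAttachment are hypothetical helpers, not part of any Elasticsearch client, and unlike the NEST example it uses the same HTML string for both fields.

```javascript
// Build the body of a document for the "html5-es" index shown above.
// buildDocument is a hypothetical helper, not an Elasticsearch client API.
function buildDocument(html) {
  return {
    file: {
      // The attachment field takes the file bytes as a base64 string.
      _content: Buffer.from(html, "utf8").toString("base64"),
      _content_type: "html",
    },
    // Kept outside the attachment so the original markup stays retrievable.
    raw_content: html,
  };
}

// Decode the base64 payload back to text, e.g. to inspect a hit's _source.
function decodeAttachment(doc) {
  return Buffer.from(doc.file._content, "base64").toString("utf8");
}

const doc = buildDocument("<h1>Topic10</h1><p>Halloween is in October.</p>");
console.log(decodeAttachment(doc) === doc.raw_content); // prints true
```

Keeping raw_content as a sibling of file, rather than nesting it inside the attachment, follows the note above that multi-fields cannot be declared on the attachment itself.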
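Using transform you can edit/alter the _source. But note that it has been deprecated since 2.0.0, with no replacement.

Thanks @andrei stefan, though ideally not through a deprecated approach. Also note that, according to the transform documentation, "the result is not stored in the source."

You cannot change the source. The _source contains the data you sent to Elasticsearch. All the mappings you define are applied to the input JSON objects, but they do not modify the source.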
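
Since the _source itself cannot be rewritten, a client that must not see the markup can request the stripped text instead of the _source. Below is a hedged sketch of such a search body, assuming the html5-es mapping from the first answer, where file.content is stored with term vectors. buildSearchBody is a hypothetical helper, and the behaviour of these options should be checked against your Elasticsearch version.

```javascript
// Build a search body that returns highlighted, tag-free text and no _source.
// buildSearchBody is a hypothetical helper; the field name comes from the
// "html5-es" mapping shown earlier in this thread.
function buildSearchBody(text) {
  return {
    _source: false, // do not return the original, HTML-laden source
    query: { match: { "file.content": text } },
    highlight: {
      fields: {
        // number_of_fragments: 0 asks for the whole field rather than snippets
        "file.content": { number_of_fragments: 0 },
      },
    },
  };
}

console.log(JSON.stringify(buildSearchBody("Topic10"), null, 2));
```

With a body like this, each hit would carry only a highlight section similar to the one in the tested result, so the raw HTML never reaches the client.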