Extracting XML data with Logstash and XPath


I have the following sample XML data:

<root>
    <actors>
        <actor id="1" name="Christian Bale"></actor>
        <actor id="2" name="Liam Neeson"></actor>
        <actor id="3" name="Michael Caine"></actor>
    </actors>   
</root>

But what I need is an actor index whose columns are built from the attributes of each actor element, i.e. id and name.

This is the log from running my configuration:

Using bundled JDK:
OpenJDK 64-Bit Server VM warning: Option UseConcMarkSweepGC was deprecated in version 9.0 and will likely be removed in a future release.
WARNING: An illegal reflective access operation has occurred
WARNING: Illegal reflective access by org.jruby.ext.openssl.SecurityHelper (file:/C:/Users/CHEEWE~1.NGA/AppData/Local/Temp/jruby-11656/jruby5503754749915308062jopenssl.jar) to field java.security.MessageDigest.provider
WARNING: Please consider reporting this to the maintainers of org.jruby.ext.openssl.SecurityHelper
WARNING: Use --illegal-access=warn to enable warnings of further illegal reflective access operations
WARNING: All illegal access operations will be denied in a future release
Sending Logstash logs to D:/Logstash/logs which is now configured via log4j2.properties
[2020-12-07T17:54:43,527][INFO ][logstash.runner] Starting Logstash {"logstash.version"=>"7.10.0", "jruby.version"=>"jruby 9.2.13.0 (2.5.7) 2020-08-03 9a89c94bcc OpenJDK 64-Bit Server VM 11.0.8+10 on 11.0.8+10 +indy +jit [mswin32-x86_64]"}
[2020-12-07T17:54:43,843][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2020-12-07T17:54:45,899][INFO ][org.reflections.Reflections] Reflections took 43 ms to scan 1 urls, producing 23 keys and 47 values
[2020-12-07T17:54:47,229][INFO ][logstash.outputs.elasticsearch][main] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2020-12-07T17:54:47,482][WARN ][logstash.outputs.elasticsearch][main] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2020-12-07T17:54:47,544][INFO ][logstash.outputs.elasticsearch][main] ES Output version determined {:es_version=>7}
[2020-12-07T17:54:47,551][WARN ][logstash.outputs.elasticsearch][main] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>7}
[2020-12-07T17:54:47,618][INFO ][logstash.outputs.elasticsearch][main] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2020-12-07T17:54:47,689][INFO ][logstash.outputs.elasticsearch][main] Using a default mapping template {:es_version=>7, :ecs_compatibility=>:disabled}
[2020-12-07T17:54:47,786][INFO ][logstash.outputs.elasticsearch][main] Attempting to install template {:manage_template=>{"index_patterns"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s", "number_of_shards"=>1, "index.lifecycle.name"=>"logstash-policy", "index.lifecycle.rollover_alias"=>"logstash"}, "mappings"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}
[2020-12-07T17:54:47,846][INFO ][logstash.outputs.elasticsearch][main] Creating rollover alias
[2020-12-07T17:54:47,964][INFO ][logstash.javapipeline][main] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50, "pipeline.max_inflight"=>1000, "pipeline.sources"=>["D:/logstash/bin/logstash simple.conf"], :thread=>"#"}
[2020-12-07T17:54:49,256][INFO ][logstash.javapipeline][main] Pipeline Java execution initialization time {"seconds"=>1.29}
[2020-12-07T17:54:49,347][INFO ][logstash.javapipeline][main] Pipeline started {"pipeline.id"=>"main"}
The stdin plugin is now waiting for input:
[2020-12-07T17:54:49,446][INFO ][logstash.agent] Pipelines running {:count=>1, :running_pipelines=>[:main], :non_running_pipelines=>[]}
[2020-12-07T17:54:49,757][INFO ][logstash.agent] Successfully started Logstash API endpoint {:port=>9600}


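If both Elasticsearch and Logstash are running recent versions, ILM (index lifecycle management) is enabled by default. In that case the value of the index option is ignored and the default index name is logstash-{now/d}-00001. If you want to set the index name with the index option, set the ilm_enabled option to false.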
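
As a minimal sketch of that change, the elasticsearch output could look like the fragment below. The hosts and index values are the ones used in the question; only the ilm_enabled line is new, and it assumes a version of the elasticsearch output plugin that supports this option.

```
output {
    elasticsearch {
        hosts => ["http://localhost:9200/"]
        # With ILM turned off, the index option is honoured again
        # instead of the logstash-* rollover alias.
        ilm_enabled => false
        index => "actor"
    }
}
```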

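input {
    file {
        path => "D:/data.xml"
        start_position => "beginning"
        # "NUL" disables sincedb persistence on Windows, so the file is re-read on every run.
        sincedb_path => "NUL"
        exclude => "*.gz"
        type => "xml"
        # Merge the whole XML document into a single event.
        # The sample file has no <?xml ...?> declaration, and an unescaped "?" in a
        # regex makes the preceding "<" optional, so match the opening <root> tag instead.
        codec => multiline {
            pattern => "^<root>"
            negate => true
            what => "previous"
            # Flush the last buffered document after 1 second without new lines.
            auto_flush_interval => 1
        }
    }
}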

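filter {

    xml {
        source => "message"
        # The original set target twice ("id" and "root"); only one target is allowed.
        store_xml => true
        target => "root"
        # The actor elements carry their data in attributes, not in text nodes,
        # so select @id and @name rather than text().
        # Each destination field receives an array with one value per actor.
        xpath => {
            "/root/actors/actor/@id" => "id"
            "/root/actors/actor/@name" => "name"
        }
    }
}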

output {
    elasticsearch {
        hosts => ["http://localhost:9200/"]
        index => "actor"
    }

    stdout {
        codec => rubydebug
    }
}
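
To end up with one document per actor carrying id and name fields, one possible approach is to parse the XML into a structured field and then split on the list of actors. The filter below is a sketch under assumptions, not a tested configuration: it assumes the whole document arrives as one event in message, that the parsed structure is stored as [root][actors][actor] (the xml filter drops the outer root element and force_array => false avoids wrapping every value in an array), and that the file always contains at least two actor elements so that the field is a list.

```
filter {
    xml {
        source => "message"
        store_xml => true
        target => "root"
        # Keep single values as plain strings/hashes instead of one-element arrays.
        force_array => false
    }

    # Emit one event per entry of the actor list.
    split {
        field => "[root][actors][actor]"
    }

    # Promote the attributes to top-level fields and drop the intermediate data.
    mutate {
        rename => {
            "[root][actors][actor][id]" => "id"
            "[root][actors][actor][name]" => "name"
        }
        remove_field => ["root", "message"]
    }
}
```

Checking the events with the stdout rubydebug output before indexing them is a quick way to confirm the field paths assumed here.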