
Elasticsearch / Logstash output performance


I'm using Elasticsearch 5.1.1 and Logstash 5.1.1. It took about 2 hours to import 3 million rows from SQL Server into Elasticsearch via Logstash, with this pipeline:

input {
  jdbc {
    jdbc_driver_library => "D:\Usefull_Jars\sqljdbc4-4.0.jar"
    jdbc_driver_class => "com.microsoft.sqlserver.jdbc.SQLServerDriver"
    jdbc_connection_string => "jdbc:sqlserver://192.168.5.14:1433;databaseName=DataSource;integratedSecurity=false;user=****;password=****;"
    jdbc_user => "****"
    jdbc_password => "****"
    statement => "SELECT * FROM RawData"
    jdbc_fetch_size => 1000
  }
}
output {
  elasticsearch {
    hosts => "localhost"
    index => "testdata"
    document_type => "testfeed"
    document_id => "%{id}"
    flush_size => 512
  }
}
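(As a side note, not part of the original question: for large tables, the logstash-input-jdbc plugin can also page through the result set instead of streaming a single cursor, via its standard `jdbc_paging_enabled` and `jdbc_page_size` options. A sketch, with an illustrative page size:

```
input {
  jdbc {
    # ... same connection settings as above ...
    statement => "SELECT * FROM RawData"
    jdbc_paging_enabled => true   # plugin splits the query into pages
    jdbc_page_size => 50000       # rows per page; illustrative value, tune for your data
  }
}
```

Whether this helps depends on the table and driver; it mainly bounds memory use per query rather than guaranteeing a faster import.)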
I'm on a single Windows machine with 4 GB of RAM and a Core i3. Is there any other configuration I should add to speed up the import?

I tried changing the settings in logstash.yml, but it made no difference.

logstash.yml

pipeline:
  batch:
    size: 125
    delay: 2
#
# Or as flat keys:
#   pipeline.batch.size: 125
#   pipeline.batch.delay: 5
# ------------ Pipeline Settings --------------
# Set the number of workers that will, in parallel, execute the filters+outputs
# stage of the pipeline.
# This defaults to the number of the host's CPU cores.
pipeline.workers: 5
# How many workers should be used per output plugin instance
pipeline.output.workers: 5
# How many events to retrieve from inputs before sending to filters+workers
pipeline.batch.size: 125
# How long to wait before dispatching an undersized batch to filters+workers
# Value is in milliseconds.
# pipeline.batch.delay: 5
# ------------ Queuing Settings --------------
#
# Internal queuing model, "memory" for legacy in-memory based queuing and
# "persisted" for disk-based acked queueing. Default is memory
#
# queue.type: memory
#
# If using queue.type: persisted, the directory path where the data files will be stored.
# Default is path.data/queue
#
# path.queue:
#
# If using queue.type: persisted, the page data files size. The queue data consists of
# append-only data files separated into pages. Default is 250mb
#
# queue.page_capacity: 250mb
#
# If using queue.type: persisted, the maximum number of unread events in the queue.
# Default is 0 (unlimited)
#
# queue.max_events: 0
#
# If using queue.type: persisted, the total capacity of the queue in number of bytes.
# If you would like more unacked events to be buffered in Logstash, you can increase the
# capacity using this setting. Please make sure your disk drive has capacity greater than
# the size specified here. If both max_bytes and max_events are specified, Logstash will
# pick whichever criterion is reached first.
# Default is 1024mb or 1gb
#
# queue.max_bytes: 1024mb
#
# If using queue.type: persisted, the maximum number of acked events before forcing a checkpoint.
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.acks: 1024
#
# If using queue.type: persisted, the maximum number of written events before forcing a checkpoint.
# Default is 1024, 0 for unlimited
#
# queue.checkpoint.writes: 1024
#
# If using queue.type: persisted, the interval in milliseconds when a checkpoint is forced on the head page.
# Default is 1000, 0 for no periodic checkpoint.
#
# queue.checkpoint.interval: 1000

Thanks in advance…

How many worker threads is Logstash running with? You have pipeline.workers: 5 in logstash.yml. Could you post your logstash.yml?

I have a few suggestions: reduce jdbc_fetch_size to 300, set pipeline.workers: 8 and pipeline.output.workers: 8, and give Logstash more memory by editing the jvm.options file in the conf folder, replacing -Xms256m -Xmx1g with -Xms512m -Xmx2g.
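Sketched as concrete file edits (values taken from the suggestions above; whether 8 workers helps on a 2-core/4-thread i3 with 4 GB RAM is something to measure, not a given):

```
# logstash.yml — raise pipeline parallelism
pipeline.workers: 8
pipeline.output.workers: 8

# config/jvm.options — grow the Logstash heap
# replace:
#   -Xms256m
#   -Xmx1g
# with:
-Xms512m
-Xmx2g
```

And in the pipeline config, the smaller fetch size:

```
jdbc_fetch_size => 300
```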