Ruby on Rails log data analysis: choice of database
Tags: ruby-on-rails, mongodb, postgresql, mapreduce, relational-database
I get log data in the following format from various web applications:
Session  Timestamp  Event              Parameters
1        1          Started Session
1        2          Logged In          Username:"user1"
2        3          Started Session
1        3          Started Challenge  title:"Challenge 1", level:"2"
2        4          Logged In          Username:"user2"
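Now, someone wants to run analytics on this log data and would like to receive it as a JSON blob after appropriate transformations. For example, they may want a JSON blob in which the log data is grouped by Session, with TimeFromSessionStart and CountOfEvents added before the data is sent, so that they can carry out meaningful analysis. Here I should return: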
[
  {
    "session":1, "CountOfEvents":3, "Actions":[{"TimeFromSessionStart":0, "Event":"Started Session"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Started Challenge", "title":"Challenge 1", "level":"2"}]
  },
  {
    "session":2, "CountOfEvents":2, "Actions":[{"TimeFromSessionStart":0, "Event":"Started Session"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user2"}]
  }
]
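As a rough illustration of the transformation described above, the grouping and the two synthetic fields can be computed in plain Ruby, independent of which database holds the rows. This is only a sketch: the row hashes, the `summarize_sessions` method, and the constants are assumptions made for the example and are not part of the question.

```ruby
require 'json'

# Sample rows mirroring the log excerpt (hypothetical in-memory representation).
ROWS = [
  { session: 1, timestamp: 1, event: 'Started Session',   params: {} },
  { session: 1, timestamp: 2, event: 'Logged In',         params: { 'Username' => 'user1' } },
  { session: 2, timestamp: 3, event: 'Started Session',   params: {} },
  { session: 1, timestamp: 3, event: 'Started Challenge', params: { 'title' => 'Challenge 1', 'level' => '2' } },
  { session: 2, timestamp: 4, event: 'Logged In',         params: { 'Username' => 'user2' } }
].freeze

# Group rows by session, then attach CountOfEvents and TimeFromSessionStart.
def summarize_sessions(rows)
  grouped = rows.group_by { |row| row[:session] }
  grouped.sort_by { |session, _events| session }.map do |session, events|
    ordered = events.sort_by { |row| row[:timestamp] }
    start = ordered.first[:timestamp]
    actions = ordered.map do |row|
      { 'TimeFromSessionStart' => row[:timestamp] - start,
        'Event' => row[:event] }.merge(row[:params])
    end
    { 'session' => session, 'CountOfEvents' => ordered.size, 'Actions' => actions }
  end
end

SESSION_SUMMARY = summarize_sessions(ROWS)
puts JSON.pretty_generate(SESSION_SUMMARY)
```

With about a million rows this in-memory approach would likely be too slow or too memory-hungry, which is why the question asks about pushing the work into the database.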
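Here, TimeFromSessionStart, CountOfEvents, etc. (let us call these synthetic additional data) will not be hard-coded. I will build a web interface that lets the person decide what kind of synthetic data they need in the JSON blob, and I would like to give them a lot of flexibility in that choice.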
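One possible way to keep the synthetic fields out of hard-coded logic is a registry of named computations from which the web interface picks. The sketch below assumes per-event fields only; `SYNTHETIC_FIELDS`, `decorate`, and the `TimeSincePreviousEvent` field are hypothetical names introduced for illustration.

```ruby
# Hypothetical registry: each synthetic field is a lambda that receives the
# current event and the session's events in timestamp order.
SYNTHETIC_FIELDS = {
  'TimeFromSessionStart' => ->(event, ordered) { event[:timestamp] - ordered.first[:timestamp] },
  'TimeSincePreviousEvent' => ->(event, ordered) {
    index = ordered.index(event)
    index.zero? ? 0 : event[:timestamp] - ordered[index - 1][:timestamp]
  }
}.freeze

# Build one action hash per event, adding only the requested synthetic fields.
# Names that are not in the registry are ignored.
def decorate(events, requested)
  ordered = events.sort_by { |event| event[:timestamp] }
  chosen = SYNTHETIC_FIELDS.select { |name, _fn| requested.include?(name) }
  ordered.map do |event|
    extras = chosen.map { |name, fn| [name, fn.call(event, ordered)] }.to_h
    { 'Event' => event[:event] }.merge(extras)
  end
end

# Events of session 1 from the sample log.
SESSION_ONE = [
  { timestamp: 1, event: 'Started Session' },
  { timestamp: 2, event: 'Logged In' },
  { timestamp: 3, event: 'Started Challenge' }
].freeze

DECORATED = decorate(SESSION_ONE, ['TimeFromSessionStart'])
p DECORATED
```

Session-level fields such as CountOfEvents would need a second registry keyed the same way; that is left out here to keep the sketch short.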
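I expect the database to store around 1 million rows and to perform the transformations in a reasonable amount of time.
My question is about the choice of database. What are the relative advantages and disadvantages of using an SQL database such as PostgreSQL versus a NoSQL database such as MongoDB? From whatever I have read, I think NoSQL may not provide enough flexibility for adding the additional synthetic data. On the other hand, I may face flexibility issues in data representation if I use an SQL database.
I think the storage requirements for MongoDB and PostgreSQL will be comparable, since I will (probably) have to build similar indexes in both cases to speed up querying.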
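If I use PostgreSQL, I can store the data as follows: Session and Event can be string, Timestamp can be date, and Parameters can be hstore (the key-value pair type available in PostgreSQL). After that, I can use SQL queries to compute the synthetic (or additional) data, store it temporarily in variables in my Rails application (which will interact with the PostgreSQL database and act as the interface for the person who wants the JSON blob), and create the JSON blob from it.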
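Another possible approach is to use MongoDB for storing the log data and to use it as the backend for the Rails application, provided I can get enough flexibility for adding the additional synthetic data for analytics, along with some performance or storage improvement over PostgreSQL. In that case, however, I am not clear on the best way to store log data in MongoDB. I have also read that MongoDB will be somewhat slower than PostgreSQL and is mainly meant to run in the background.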
Edit:
From whatever I have read in the past few days, Apache Hadoop also seems to be a good choice, because it is faster than MongoDB (being multi-threaded).
Edit:
I am not asking for opinions; I would like to know the specific advantages or disadvantages of using a particular approach. Therefore, I do not think this question is opinion-based.
MongoDB is an ideal database for this.
You should check out logstash/kibana from elasticsearch. The primary use case for that stack is collecting, storing, and analyzing log data.
Mongo is also a good choice if you want to build this yourself, but I think you will find that elasticsearch's products solve your needs well and allow the integration you are looking for.
require 'test_helper'
require 'json'
require 'pp'

class LogDataTest < ActiveSupport::TestCase
  def setup
    LogData.delete_all
    # Stage 1 groups events by session and collects them into 'Actions'.
    # Stage 2 rebases every timestamp on the first timestamp of the session.
    # Note: '$first' follows input order, so this assumes documents arrive in timestamp order.
    @log_data_analysis_pipeline = [
      {'$group' => {
        '_id' => '$session',
        'session' => {'$first' => '$session'},
        'CountOfEvents' => {'$sum' => 1},
        'timestamp0' => {'$first' => '$timestamp'},
        'Actions' => {
          '$push' => {
            'timestamp' => '$timestamp',
            'event' => '$event',
            'parameters' => '$parameters'}}}},
      {'$project' => {
        '_id' => 0,
        'session' => '$session',
        'CountOfEvents' => '$CountOfEvents',
        'Actions' => {
          '$map' => { 'input' => "$Actions", 'as' => 'action',
            'in' => {
              'TimeFromSessionStart' => {
                '$subtract' => ['$$action.timestamp', '$timestamp0']},
              'event' => '$$action.event',
              'parameters' => '$$action.parameters'
            }}}}
      }
    ]
    @key_names = %w(session timestamp event parameters)
    # Columns are separated by two or more spaces; single spaces occur inside values.
    @log_data = <<-EOT.gsub(/^\s+/, '').split(/\n/)
      1  1  Started Session
      1  2  Logged In          Username:"user1"
      2  3  Started Session
      1  3  Started Challenge  title:"Challenge 1", level:"2"
      2  4  Logged In          Username:"user2"
    EOT
    docs = @log_data.collect{|line| line_to_doc(line)}
    LogData.create(docs)
    assert_equal(docs.size, LogData.count)
    puts
  end

  # Turn one log line into a document hash keyed by @key_names.
  def line_to_doc(line)
    doc = Hash[*@key_names.zip(line.split(/ {2,}/)).flatten]
    doc['session'] = doc['session'].to_i
    doc['timestamp'] = doc['timestamp'].to_i
    # eval is tolerable only for this trusted fixture; never eval untrusted log lines.
    doc['parameters'] = eval("{#{doc['parameters']}}") if doc['parameters']
    doc
  end

  test "versions" do
    puts "Mongoid version: #{Mongoid::VERSION}\nMoped version: #{Moped::VERSION}"
    puts "MongoDB version: #{LogData.collection.database.command({:buildinfo => 1})['version']}"
  end

  test "log data analytics" do
    pp LogData.all.to_a
    result = LogData.collection.aggregate(@log_data_analysis_pipeline)
    # Shape requested in the question, kept for comparison; it is not asserted against.
    json = <<-EOT
      [
        {
          "session":1, "CountOfEvents":3, "Actions":[{"TimeFromSessionStart":0, "Event":"Started Session"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user1"}, {"TimeFromSessionStart":2, "Event":"Started Challenge", "title":"Challenge 1", "level":"2"}]
        },
        {
          "session":2, "CountOfEvents":2, "Actions":[{"TimeFromSessionStart":0, "Event":"Started Session"}, {"TimeFromSessionStart":1, "Event":"Logged In", "Username":"user2"}]
        }
      ]
    EOT
    puts JSON.pretty_generate(result)
  end

  test "explain" do
    # Index a key inside the parameters hash and confirm the query uses it.
    LogData.collection.indexes.create('parameters.Username' => 1)
    pp LogData.collection.find({'parameters.Username' => 'user2'}).to_a
    pp LogData.collection.find({'parameters.Username' => 'user2'}).explain['cursor']
  end
end
class LogData
  include Mongoid::Document
  field :session, type: Integer
  field :timestamp, type: Integer
  field :event, type: String
  field :parameters, type: Hash
end
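The test fixture above relies on `eval` to turn the Parameters column into a hash, which is unsafe for real log input. A minimal alternative, assuming every parameter has the form `key:"value"` with no escaped quotes inside the value, is a regular-expression scan. The `parse_parameters` name is hypothetical.

```ruby
# Parse 'Username:"user1"' or 'title:"Challenge 1", level:"2"' into a Hash
# with string keys. Returns nil when there is no Parameters column.
# Assumes key:"value" pairs without escaped quotes inside values.
def parse_parameters(text)
  return nil if text.nil? || text.strip.empty?
  text.scan(/(\w+):"([^"]*)"/).to_h
end

p parse_parameters('title:"Challenge 1", level:"2"')
p parse_parameters(nil)
```

Unlike the `eval` version, this returns string keys directly and cannot execute code embedded in a log line.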
Run options:
# Running tests:
[1/3] LogDataTest#test_explain
[{"_id"=>"537258257f11ba8f03000005",
  "session"=>2,
  "timestamp"=>4,
  "event"=>"Logged In",
  "parameters"=>{"Username"=>"user2"}}]
"BtreeCursor parameters.Username_1"
[2/3] LogDataTest#test_log_data_analytics
[#<LogData _id: 537258257f11ba8f03000006, session: 1, timestamp: 1, event: "Started Session", parameters: nil>,
 #<LogData _id: 537258257f11ba8f03000007, session: 1, timestamp: 2, event: "Logged In", parameters: {"Username"=>"user1"}>,
 #<LogData _id: 537258257f11ba8f03000008, session: 2, timestamp: 3, event: "Started Session", parameters: nil>,
 #<LogData _id: 537258257f11ba8f03000009, session: 1, timestamp: 3, event: "Started Challenge", parameters: {"title"=>"Challenge 1", "level"=>"2"}>,
 #<LogData _id: 537258257f11ba8f0300000a, session: 2, timestamp: 4, event: "Logged In", parameters: {"Username"=>"user2"}>]
[
  {
    "session": 2,
    "CountOfEvents": 2,
    "Actions": [
      {
        "TimeFromSessionStart": 0,
        "event": "Started Session",
        "parameters": null
      },
      {
        "TimeFromSessionStart": 1,
        "event": "Logged In",
        "parameters": {
          "Username": "user2"
        }
      }
    ]
  },
  {
    "session": 1,
    "CountOfEvents": 3,
    "Actions": [
      {
        "TimeFromSessionStart": 0,
        "event": "Started Session",
        "parameters": null
      },
      {
        "TimeFromSessionStart": 1,
        "event": "Logged In",
        "parameters": {
          "Username": "user1"
        }
      },
      {
        "TimeFromSessionStart": 2,
        "event": "Started Challenge",
        "parameters": {
          "title": "Challenge 1",
          "level": "2"
        }
      }
    ]
  }
]
[3/3] LogDataTest#test_versions
Mongoid version: 3.1.6
Moped version: 1.5.2
MongoDB version: 2.6.1
Finished tests in 0.083465s, 35.9432 tests/s, 35.9432 assertions/s.
3 tests, 3 assertions, 0 failures, 0 errors, 0 skips
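The aggregation output above nests each event's parameters under a "parameters" key and uses a lowercase "event" key, whereas the question asks for flat action objects with an "Event" key. One way to bridge the gap is a small post-processing step in the Rails application. This is a sketch only; `flatten_action`, `flatten_sessions`, and the sample constant are hypothetical names, and the input shape is assumed to match the output printed above.

```ruby
# Merge the nested parameters into the action itself and rename 'event'.
def flatten_action(action)
  params = action['parameters'] || {}
  { 'TimeFromSessionStart' => action['TimeFromSessionStart'],
    'Event' => action['event'] }.merge(params)
end

# Flatten every action and return the sessions ordered by session number.
def flatten_sessions(sessions)
  sessions.sort_by { |s| s['session'] }.map do |s|
    s.merge('Actions' => s['Actions'].map { |a| flatten_action(a) })
  end
end

# Sample input shaped like the aggregation result shown above.
AGGREGATED = [
  { 'session' => 2, 'CountOfEvents' => 2, 'Actions' => [
    { 'TimeFromSessionStart' => 0, 'event' => 'Started Session', 'parameters' => nil },
    { 'TimeFromSessionStart' => 1, 'event' => 'Logged In', 'parameters' => { 'Username' => 'user2' } }
  ] },
  { 'session' => 1, 'CountOfEvents' => 1, 'Actions' => [
    { 'TimeFromSessionStart' => 0, 'event' => 'Started Session', 'parameters' => nil }
  ] }
].freeze

FLATTENED = flatten_sessions(AGGREGATED)
p FLATTENED
```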