Ruby: extracting a substring to use as a hash key?

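I have a log file and need to create a hash key for each URL in its records. Each line of the log is put into an array, and I loop over the array assigning hash keys.

I need to get from this:

"2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"

to this:

"/logschecks/scripts/setup1.php"

I have tried using match, scan, and split, but none of them got me where I need to go.

My method currently looks like this: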

def pathHistogram (rowsInFile)
  i = 0
  urlHash = Hash.new

  while i <= rowsInFile.length - 1

    urlKey = rowsInFile[i].scan(/<"GET ">/).last.first

    if urlHash.has_key?(urlKey) == true
      #get the number of stars already in there and add one. 
      urlHash[urlKey] = urlHash[urlKey] + '*'
      i = i + 1

    else 

      urlHash[urlKey] = '*'

      i = i + 1

    end
  end
end
Assuming every input line contains /logschecks/…:

x = "2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: \"GET /logschecks/scripts/setup1.php HTTP/1.1\", host: \"www.example.com\""


x[%r(/logscheck[/\w\.]+)] # => "/logschecks/scripts/setup1.php"

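You indicated in a comment on another answer that the pattern is basically "GET ... HTTP", where you are interested in the "..." part. That can be extracted very easily: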

line = '2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"'

line[/"GET (.*?) HTTP/, 1]
# => "/logschecks/scripts/setup1.php"
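To connect this extraction back to the question's star histogram, here is a minimal sketch. The method name `path_histogram`, the choice to skip lines with no GET request, and the shortened sample rows are assumptions made for illustration; they are not from the original post.

```ruby
# Hypothetical helper: counts each requested path with one '*' per hit.
def path_histogram(rows_in_file)
  url_hash = Hash.new('') # missing keys start out as an empty string
  rows_in_file.each do |row|
    url_key = row[/"GET (.*?) HTTP/, 1]
    next if url_key.nil? # assumption: ignore lines without a GET request
    url_hash[url_key] += '*' # builds a new string, so the default is never mutated
  end
  url_hash
end

# Made-up, shortened log lines for the demonstration.
rows = [
  'request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"',
  'request: "GET /index.php HTTP/1.1", host: "www.example.com"',
  'request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"',
  'a line with no request in it'
]

histogram = path_histogram(rows)
histogram.each { |url, stars| puts "#{url} #{stars}" }
# /logschecks/scripts/setup1.php **
# /index.php *
```

Iterating with `each` removes the manual index bookkeeping, and returning `url_hash` as the last expression lets the caller use the result.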

Scanning HTTP logs is not hard, but how you go about it depends on the format. The sample you gave is easier than a standard log, because it has some landmarks you can look for:

  • Use a pattern like:

    /request: "\S+ (\S+)/i

    That pattern skips over GET, POST, HEAD, or whatever method the request used:

    log_line[/request: "\S+ (\S+)/i, 1] # => "/logschecks/scripts/setup1.php"

  • If you are mining your logs, you will probably want to know the method as well. In that case, use something like:

    /request: "(\S+) (\S+)/i

    You would use it like this:

    method, url = log_line.match(/request: "(\S+) (\S+)/i).captures # => ["GET", "/logschecks/scripts/setup1.php"]
    method # => "GET"
    url # => "/logschecks/scripts/setup1.php"

  • You could also grab the entire request string, then split it to get the parts:

    /request: "([^"]+)"/i

    For example:

    log_line = %[2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"]
    method, url, http_ver = log_line[/request: "([^"]+)"/i, 1].split # => ["GET", "/logschecks/scripts/setup1.php", "HTTP/1.1"]
    method # => "GET"
    url # => "/logschecks/scripts/setup1.php"
    http_ver # => "HTTP/1.1"

  • Or, use named captures and reduce the code:

    log_line = %[2010/08/23 15:25:35 [error]: (4: No such file or directory), clent: 80.154.42.54, server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"]
    /request: "(?<method>\S+) (?<url>\S+) (?<http_ver>\S+)"/i =~ log_line
    method # => "GET"
    url # => "/logschecks/scripts/setup1.php"
    http_ver # => "HTTP/1.1"
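A real log may contain lines with no request field, in which case `match` returns nil and calling `.captures` on it raises an error. The sketch below shows one way to guard against that while counting requests per method and URL. The constant `REQUEST_PATTERN`, the helper `parse_request`, and the sample lines are hypothetical names and data introduced only for this illustration.

```ruby
# Same named-capture pattern as above, stored in a constant. Note that
# `=~` only creates local variables when a regexp literal is on the left,
# so with a constant we read the captures from the MatchData instead.
REQUEST_PATTERN = /request: "(?<method>\S+) (?<url>\S+) (?<http_ver>\S+)"/i

# Hypothetical helper: returns the request parts as a hash,
# or nil when the line has no request field.
def parse_request(log_line)
  match = REQUEST_PATTERN.match(log_line)
  return nil unless match

  { method: match[:method], url: match[:url], http_ver: match[:http_ver] }
end

# Made-up, shortened log lines for the demonstration.
sample_lines = [
  'server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"',
  'server: localhost, request: "POST /login.php HTTP/1.1", host: "www.example.com"',
  'server: localhost, request: "GET /logschecks/scripts/setup1.php HTTP/1.1", host: "www.example.com"',
  'worker process started'
]

counts = Hash.new(0)
sample_lines.each do |line|
  request = parse_request(line)
  next unless request # skip lines that carry no request

  counts[[request[:method], request[:url]]] += 1
end

counts.each { |(method, url), n| puts "#{method} #{url}: #{n}" }
```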
    
Comments:

We need to see what you tried. As is, you have given us the error you received, but no code to work with. How did you use match, scan, and split? What is your hash supposed to look like? Please edit with more information. Thanks.

I think you need single quotes around the input line, without the (unmatched) double quote at the start.

Cary, I'm not sure I understand what you mean. By input line, do you mean the GET?

Do you only need GET requests? Or do you also need to capture HEAD, POST, and other methods?

Not every line contains "/logschecks", but they all do contain "GET" before the URL and "HTTP" after it. Changing the code to [%r(GET[/\w\.]+)] results in a bunch of blank lines.

That is the solution, thanks. One question: why the "?"? I know .* means any number of characters, but what does the ? do? Thanks again.

@joerdie It makes the match non-greedy (or lazy), that is, it stops as soon as it encounters HTTP instead of trying to make HTTP part of the .* match. For example, if the line were "GET /some/path HTTP/1.1 other_log_stuff HTTP/X.Y" and we used /"GET (.*) HTTP/ (without the ?), it would return "/some/path HTTP/1.1 other_log_stuff" (it matches the first HTTP as part of .* and only stops at the last HTTP). With the lazy quantifier, the match on that string is "/some/path", just like we want. You can google "regex lazy quantifier" for more information.
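The greedy versus lazy difference described in the last comment can be checked directly. This is a small sketch using the example line from that comment; the variable names `greedy` and `lazy` are chosen here for illustration.

```ruby
# Example line taken from the comment above: it contains "HTTP" twice.
line = '"GET /some/path HTTP/1.1 other_log_stuff HTTP/X.Y"'

# Greedy: .* grabs as much as possible, then backs up to the LAST " HTTP".
greedy = line[/"GET (.*) HTTP/, 1]

# Lazy: .*? grabs as little as possible and stops at the FIRST " HTTP".
lazy = line[/"GET (.*?) HTTP/, 1]

puts greedy # /some/path HTTP/1.1 other_log_stuff
puts lazy   # /some/path
```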