Swift: How can I quickly parse an Apache log file?
Let's say I have a log file that I have already split into an array of strings. For example, I have these lines:
123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36"
123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0"
I can parse them with typical string operations, but I think there is a better way to do it with a regular expression. I tried to follow a similar pattern that someone else had used, but I don't quite understand it. This is my attempt; the pattern is the one in the code that follows, and when I try to use it I get no matches.
let lines = contents.split(separator: "\n")
let pattern = "([(\\d\\.)]+) - - \\[(.*?)\\] \"(.*?)\" (\\d+) - \"(.*?)\" \"(.*?)\""
let regex = try! NSRegularExpression(pattern: pattern, options: [])
for line in lines {
    let range = NSRange(location: 0, length: line.utf16.count)
    let parsedData = regex.firstMatch(in: String(line), options: [], range: range)
    print(parsedData)
}
Ideally I would extract the data into a model. The code needs to be fast, because there may be thousands of lines to process.
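Expected result
I am looking for each parsed line broken down into its parts: IP, OS, OS version, browser, browser version, etc. Any real parsing of the data would be enough.
The pattern has no matches because, where it expects a literal hyphen after the status code, the log line has 1+ digits.
To make the pattern a bit more performant, you can use "([^"]*)" to capture any character except " between the double quotes.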
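To illustrate the negated character class mentioned above, here is a minimal sketch. The helper `firstCapture(of:in:)` and the sample string are assumptions made for this example, not part of the original answer. Both patterns capture the same text; the negated class consumes everything up to the next quote in one run, whereas the lazy version checks for the closing quote after every character, which is typically slower on long inputs.

```swift
import Foundation

// Hypothetical helper for illustration: returns capture group 1 of the first match.
func firstCapture(of pattern: String, in text: String) -> String? {
    guard let regex = try? NSRegularExpression(pattern: pattern),
          let match = regex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
          let range = Range(match.range(at: 1), in: text) else {
        return nil
    }
    return String(text[range])
}

// A shortened piece of a log line, used only as sample input.
let sample = #""GET /jobs/ HTTP/1.1" 500 821 "-""#

// Negated character class: matches any run of characters that are not a quote.
let negated = firstCapture(of: #""([^"]*)""#, in: sample)
// Lazy quantifier: expands one character at a time until a quote follows.
let lazy = firstCapture(of: #""(.*?)""#, in: sample)

print(negated ?? "no match")
print(lazy ?? "no match")
```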
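(\d+(?:\.\d+)+) : Capture group 1; match 1+ digits, then repeat 1+ times a . followed by 1+ digits
 - -  : Match literally (space, hyphen, space, hyphen, space)
\[([^\]\[]+)\] : Match [, capture in group 2 1+ times any character except [ and ], and match ]
"([^"]*)" : Match ", capture in group 3 0+ times any character except ", and match "
(\d+) (\d+) : Capture groups 4 and 5, each matching 1+ digits
"[^"]+" : Same mechanism as before, but match only (no capture)
"([^"]+)" : Same mechanism as before, captured in group 6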
let pattern = #"(\d+(?:\.\d+)+) - - \[([^\]\[]+)\] "([^"]*)" (\d+) (\d+) "[^"]+" "([^"]+)""#
let regex = try! NSRegularExpression(pattern: pattern, options: .anchorsMatchLines)
let testString = #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#
let stringRange = NSRange(location: 0, length: testString.utf16.count)
let matches = regex.matches(in: testString, range: stringRange)
var result: [[String]] = []
for match in matches {
    var groups: [String] = []
    for rangeIndex in 1 ..< match.numberOfRanges {
        groups.append((testString as NSString).substring(with: match.range(at: rangeIndex)))
    }
    if !groups.isEmpty {
        result.append(groups)
    }
}
print(result)
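Since the question asks for a model and for speed, here is one possible sketch that builds on the pattern above. The `LogEntry` and `LogParser` types and their field names are hypothetical, chosen for illustration. The regex is compiled once and reused for every line, and the two numeric groups are converted to `Int`. The name `bytes` assumes the standard Apache combined log format, where the number after the status code is the response size.

```swift
import Foundation

// Hypothetical model; field names are chosen for illustration.
struct LogEntry {
    let ip: String
    let timestamp: String
    let request: String
    let status: Int
    let bytes: Int
    let userAgent: String
}

struct LogParser {
    // Compiled once and reused for every line.
    private let regex: NSRegularExpression

    init() throws {
        let pattern = #"(\d+(?:\.\d+)+) - - \[([^\]\[]+)\] "([^"]*)" (\d+) (\d+) "[^"]+" "([^"]+)""#
        regex = try NSRegularExpression(pattern: pattern)
    }

    // Returns nil when the line does not match the pattern.
    func parse(_ line: String) -> LogEntry? {
        let range = NSRange(line.startIndex..., in: line)
        guard let match = regex.firstMatch(in: line, range: range) else { return nil }
        // Convert capture groups 1...6 to Strings (group 0 is the whole match).
        let groups: [String] = (1..<match.numberOfRanges).compactMap { index -> String? in
            Range(match.range(at: index), in: line).map { String(line[$0]) }
        }
        guard groups.count == 6,
              let status = Int(groups[3]),
              let bytes = Int(groups[4]) else { return nil }
        return LogEntry(ip: groups[0], timestamp: groups[1], request: groups[2],
                        status: status, bytes: bytes, userAgent: groups[5])
    }
}

let line = #"123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0""#
if let parser = try? LogParser(), let entry = parser.parse(line) {
    print(entry.ip, entry.status, entry.bytes)
}
```

Splitting the user agent further into OS and browser fields would need a second pass over `userAgent`; that part is left out here.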
With the samples you have shown, could you please try the following:
^((?:\d+\.){3}\d+).*?\[([^]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$
Explanation: a detailed breakdown of the pattern above.
^( ##Start of the 1st capturing group, anchored at the start of the value.
(?:\d+\.){3}\d+ ##A non-capturing group matching 1+ digits followed by a ., repeated 3 times, then 1+ digits.
) ##Closing the 1st capturing group here.
.*?\[ ##Matching non-greedily up to [ here.
([^]]*) ##2nd capturing group: everything up to ] here.
\].*?" ##Matching ] and then non-greedily up to " here.
([^"]*) ##3rd capturing group: everything up to " here.
"\s* ##Matching " followed by zero or more whitespace characters.
(\d+) ##4th capturing group: 1+ digits.
\s* ##Matching zero or more whitespace characters.
(\d+) ##5th capturing group: 1+ digits.
\s*"-"\s*" ##Zero or more whitespace characters, "-", zero or more whitespace characters, then ".
([^"]*) ##6th capturing group: everything up to " here.
"$ ##Matching " at the end of the value.
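The correct regex pattern was provided by @RavinderSingh13, but I also want to add what I did to make it work in my code, so that others can use it in the future without having to search through every StackOverflow answer.
I needed a way to parse an Apache log file into usable objects in Swift. This takes two pieces: a String extension that returns the capture groups, and a function that parses the data. The parsing function comes first; the String extension and the EventLog model it relies on follow further down.
Note that the extension returns a [[String]], so you have to pull the subgroup out of the overall returned group, similar to parsing JSON.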
func parseData() {
    let documentsUrl: URL = FileManager.default.urls(for: .documentDirectory, in: .userDomainMask).first!
    let destinationFileUrl = documentsUrl.appendingPathComponent("logfile.log")
    do {
        let contents = try String(contentsOf: destinationFileUrl, encoding: .utf8)
        let lines = contents.split(separator: "\n")
        // {3}\d+ captures the full last octet of the IP address.
        let pattern = "^((?:\\d+\\.){3}\\d+).*?\\[([^]]*)\\].*?\"([^\"]*)\"\\s*(\\d+)\\s+(\\d+)\\s*\"-\"\\s*\"([^\"]*)\"$"
        for line in lines {
            // groups(for:) returns one array per match; index 0 is the whole match.
            // Lines that do not match are skipped instead of crashing on group[0].
            guard let subGroup = String(line).groups(for: pattern).first, subGroup.count == 7 else { continue }
            let ipAddress = subGroup[1]
            let date = subGroup[2]
            let getMethod = subGroup[3]
            let statusCode = subGroup[4]
            let secondStatusCode = subGroup[5]
            let versionInfo = subGroup[6]
            DispatchQueue.main.async {
                self.eventLogs.append(EventLog(ipAddress: ipAddress, date: date, getMethod: getMethod, statusCode: statusCode, secondStatusCode: secondStatusCode, versionInfo: versionInfo))
            }
        }
    } catch {
        print(error.localizedDescription)
    }
}
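As an alternative to splitting the file into lines first, the anchored pattern can be run over the whole text in one pass with the `.anchorsMatchLines` option. This is only a sketch: the two sample lines stand in for the file contents, and the `contents` and `ipAddresses` names are assumptions for this example.

```swift
import Foundation

// Two sample lines joined by a newline, standing in for the file contents.
let contents = [
    #"123.4.5.1 - - [03/Sep/2013:18:38:48 -0600] "GET /products/car/ HTTP/1.1" 200 3327 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/29.0.1547.65 Safari/537.36""#,
    #"123.4.5.6 - - [03/Sep/2013:18:38:58 -0600] "GET /jobs/ HTTP/1.1" 500 821 "-" "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:23.0) Gecko/20100101 Firefox/23.0""#
].joined(separator: "\n")

// [^\]] is an escaped spelling of the [^]] class used in the pattern above.
let pattern = #"^((?:\d+\.){3}\d+).*?\[([^\]]*)\].*?"([^"]*)"\s*(\d+)\s*(\d+)\s*"-"\s*"([^"]*)"$"#

var ipAddresses: [String] = []
if let regex = try? NSRegularExpression(pattern: pattern, options: .anchorsMatchLines) {
    let fullRange = NSRange(contents.startIndex..., in: contents)
    // With .anchorsMatchLines, ^ and $ match at the start and end of every line.
    for match in regex.matches(in: contents, range: fullRange) {
        if let range = Range(match.range(at: 1), in: contents) {
            ipAddresses.append(String(contents[range]))
        }
    }
}
print(ipAddresses)
```

Compiling the regex once this way also avoids rebuilding it for every line.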
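This looks more like an Apache log file to me.
@MartinR Yes, that was my typo. It has been corrected.
I don't know why I can't get this pattern to work. Whenever I try to use it, an invalid regex error is thrown.
@xTwisteDx I see it, you have to escape the square brackets inside the character class: \[([^\]\[]+)\]. I have updated the answer.
You are clearly a regex god. This solution worked.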
import Foundation

extension String {
    // Returns one array per match: index 0 is the whole match, 1... are the capture groups.
    func groups(for regexPattern: String) -> [[String]] {
        do {
            let text = self
            let regex = try NSRegularExpression(pattern: regexPattern)
            let matches = regex.matches(in: text,
                                        range: NSRange(text.startIndex..., in: text))
            return matches.map { match in
                return (0..<match.numberOfRanges).map {
                    let rangeBounds = match.range(at: $0)
                    // A group that did not take part in the match becomes an empty string.
                    guard let range = Range(rangeBounds, in: text) else {
                        return ""
                    }
                    return String(text[range])
                }
            }
        } catch let error {
            print("invalid regex: \(error.localizedDescription)")
            return []
        }
    }
}
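The shape of that return value is easier to see on a small input. The sketch below uses a hypothetical free function, `captureGroups(of:in:)`, that mirrors the extension so the example can run on its own. The pattern and input are made up for illustration.

```swift
import Foundation

// Hypothetical free function mirroring the shape of the extension's return value.
func captureGroups(of pattern: String, in text: String) -> [[String]] {
    guard let regex = try? NSRegularExpression(pattern: pattern) else { return [] }
    let matches = regex.matches(in: text, range: NSRange(text.startIndex..., in: text))
    return matches.map { match -> [String] in
        (0..<match.numberOfRanges).map { index -> String in
            Range(match.range(at: index), in: text).map { String(text[$0]) } ?? ""
        }
    }
}

let found = captureGroups(of: #"(\d+)-(\d+)"#, in: "10-20 30-40")
// Outer array: one element per match.
// Inner array: index 0 is the whole match, indices 1... are the capture groups.
for groups in found {
    print(groups)
}
```

For a single log line there is at most one match, which is why the parsing code only reads the first element of the outer array.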
class EventLog {
    let ipAddress: String
    let date: String
    let getMethod: String
    let statusCode: String
    let secondStatusCode: String
    let versionInfo: String
    init(ipAddress: String, date: String, getMethod: String, statusCode: String, secondStatusCode: String, versionInfo: String) {
        self.ipAddress = ipAddress
        self.date = date
        self.getMethod = getMethod
        self.statusCode = statusCode
        self.secondStatusCode = secondStatusCode
        self.versionInfo = versionInfo
    }
}
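The model above keeps every field as a String. If typed values are more useful, the date and status code can be converted after parsing. This is a sketch under the assumption that timestamps always look like the ones in the sample lines; the variable names are made up for this example.

```swift
import Foundation

// Formatter for timestamps such as "03/Sep/2013:18:38:48 -0600".
// en_US_POSIX keeps the month abbreviation independent of the user's locale.
let formatter = DateFormatter()
formatter.locale = Locale(identifier: "en_US_POSIX")
formatter.dateFormat = "dd/MMM/yyyy:HH:mm:ss Z"

let rawDate = "03/Sep/2013:18:38:48 -0600"
let rawStatus = "200"

// Both conversions return nil when the input does not have the expected form.
let parsedDate = formatter.date(from: rawDate)
let parsedStatus = Int(rawStatus)

if let parsedDate = parsedDate, let parsedStatus = parsedStatus {
    print(parsedDate, parsedStatus)
}
```

Creating the formatter once and reusing it matters for speed, because DateFormatter is relatively expensive to set up.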