Efficiently extracting a date from a string in Scala


I want to extract a date (e.g. 2015-01-01) in Scala from multiple strings like this one:

val s = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
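I know I could do basic string split/trim/strip operations to achieve this, but is there a cleaner way in Scala? Can I do it with some of the nice regex "hidden features" Scala provides?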

I tried this, but it didn't work:

val s = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
val regex = "(\\d+)-(\\d+)-(\\d+).txt"
val regex(year, month, date) = s

Use pattern matching with a regex extractor:

val str = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
val regex = ".*/(\\d{4}-\\d{2}-\\d{2})\\.txt".r // drop the "/" after ".*" if you don't need it

str match {
  case regex(date) => Some(date)
  case _ => None
}
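A minimal sketch of the same approach wrapped in a reusable function, assuming you want an Option[String] per path. The object name DateFromPath and the method extractDate are hypothetical; the dot before txt is escaped so it only matches a literal ".".

```scala
object DateFromPath {
  // Captures a yyyy-MM-dd date that sits after the last "/" and directly before ".txt".
  // A Regex used as an extractor in a pattern must match the whole string.
  private val DatePattern = ".*/(\\d{4}-\\d{2}-\\d{2})\\.txt".r

  // Some(date) when the path matches, None otherwise (no MatchError is thrown).
  def extractDate(path: String): Option[String] = path match {
    case DatePattern(date) => Some(date)
    case _                 => None
  }
}
```

For example, extractDate("basedir/somedir/tmp/BLAH/2015-01-01.txt") gives Some("2015-01-01"), and a path without a date, such as "basedir/notes.txt", gives None.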
Use the code above rather than the code below, because the code below throws a MatchError at runtime when the string does not match:

val regex(a) = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
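To see the difference concretely, here is a small sketch (BindingDemo and bindUnsafely are made-up names for illustration) that wraps the val-pattern binding in scala.util.Try: a matching path binds the date, while a non-matching one fails with scala.MatchError.

```scala
import scala.util.Try

object BindingDemo {
  val regex = ".*/(\\d{4}-\\d{2}-\\d{2})\\.txt".r

  // A pattern binding like `val regex(date) = path` throws scala.MatchError
  // when path does not match; Try turns that exception into a Failure.
  def bindUnsafely(path: String): Try[String] = Try {
    val regex(date) = path
    date
  }
}
```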
Instead of putting .* in front of the regex, you can use unanchored:

val regex = "(\\d{4}-\\d{2}-\\d{2})\\.txt".r.unanchored
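A sketch of the unanchored version inside a match, so a non-matching string gives None instead of a MatchError (UnanchoredDemo and extractDate are hypothetical names). Because an unanchored regex searches anywhere in the input, it also accepts paths such as x/2015-01-01.txt.bak; keep the anchored .*/ form if that matters to you.

```scala
object UnanchoredDemo {
  // unanchored makes the extractor search for the pattern anywhere in the input,
  // so no leading ".*/" is needed.
  private val DateTxt = "(\\d{4}-\\d{2}-\\d{2})\\.txt".r.unanchored

  def extractDate(path: String): Option[String] = path match {
    case DateTxt(date) => Some(date)
    case _             => None
  }
}
```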
Scala REPL

scala>  val regex = "(\\d{4}-\\d{2}-\\d{2}).txt".r.unanchored
regex: scala.util.matching.UnanchoredRegex = (\d{4}-\d{2}-\d{2}).txt

scala> val regex(a) = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
a: String = 2015-01-01
scala> val regex = ".*/(\\d{4}-\\d{2}-\\d{2}).txt".r
regex: scala.util.matching.Regex = .*/(\d{4}-\d{2}-\d{2}).txt

scala> val regex(a) = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
a: String = 2015-01-01
scala> val str = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
str: String = basedir/somedir/tmp/BLAH/2015-01-01.txt

scala>  val regex = ".*/(\\d{4}-\\d{2}-\\d{2}).txt".r
regex: scala.util.matching.Regex = .*/(\d{4}-\d{2}-\d{2}).txt

scala>
     |     str match {
     |       case regex(date) => Some(date)
     |       case _ => None
     |     }
res21: Option[String] = Some(2015-01-01)
If you also want to capture the directory:

scala> val regex = ".*/(.*)/(\\d{4}-\\d{2}-\\d{2}).txt".r
regex: scala.util.matching.Regex = .*/(.*)/(\d{4}-\d{2}-\d{2}).txt

scala> val s = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
s: String = "basedir/somedir/tmp/BLAH/2015-01-01.txt"

scala> val regex(dir, date) = s
dir: String = "BLAH"
date: String = "2015-01-01"
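Building on the directory-capturing regex, here is a sketch for the follow-up in the comments (collecting results from many paths into a map keyed by directory). It assumes the directory of interest is the last path segment before the file, uses [^/]+ instead of .* for that segment, and keeps every date found per directory; DirDateIndex and index are hypothetical names.

```scala
object DirDateIndex {
  // Group 1: the last directory (no "/" inside it); group 2: the yyyy-MM-dd file stem.
  private val DirAndDate = ".*/([^/]+)/(\\d{4}-\\d{2}-\\d{2})\\.txt".r

  // collect skips paths that do not match, so no MatchError is thrown.
  // Dates within each directory keep the order in which the paths were given.
  def index(paths: Seq[String]): Map[String, Seq[String]] =
    paths
      .collect { case DirAndDate(dir, date) => (dir, date) }
      .groupBy { case (dir, _) => dir }
      .map { case (dir, pairs) => dir -> pairs.map { case (_, date) => date } }
}
```

If you need the full file name as the value instead of the date, append ".txt" to the captured date or change group 2 to capture the whole file name.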

Yesterday I had to parse some log files with three or four different time representations on each line.

I'd recommend the minimal-regex, maximal-types approach:

$ scala
Welcome to Scala 2.12.0 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_101).
Type in expressions for evaluation. Or try :help.

scala> val s = "basedir/somedir/tmp/BLAH/2015-01-01.txt"
s: String = basedir/somedir/tmp/BLAH/2015-01-01.txt

scala> val r = raw".*/([\d-]*)\.txt".r
r: scala.util.matching.Regex = .*/([\d-]*)\.txt

scala> val r(date) = s
date: String = 2015-01-01

scala> import java.time._, format._, DateTimeFormatter._
import java.time._
import format._
import DateTimeFormatter._

scala> ISO_LOCAL_DATE.parse(date)
res0: java.time.temporal.TemporalAccessor = {},ISO resolved to 2015-01-01

scala> Instant.from(res0)
java.time.DateTimeException: Unable to obtain Instant from TemporalAccessor: {},ISO resolved to 2015-01-01 of type java.time.format.Parsed
  at java.time.Instant.from(Instant.java:378)
  ... 27 elided
Caused by: java.time.temporal.UnsupportedTemporalTypeException: Unsupported field: InstantSeconds
  at java.time.format.Parsed.getLong(Parsed.java:203)
  at java.time.Instant.from(Instant.java:373)
  ... 27 more

scala> LocalDate.from(res0)
res2: java.time.LocalDate = 2015-01-01
Another variant, for convenience:

scala> object LocalDateX { def unapply(s: String): Option[LocalDate] = util.Try(LocalDate.from(ISO_LOCAL_DATE.parse(s))).toOption }
defined object LocalDateX

scala> val r(LocalDateX(date)) = s
date: java.time.LocalDate = 2015-01-01
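The nested-extractor idea also works inside a match, so a path is accepted only if the regex matches and the captured text is a valid ISO date. This is a sketch assuming the same raw regex and a LocalDateX extractor like the one above; TypedDateExtraction and extract are hypothetical names. Because ISO_LOCAL_DATE resolves strictly, a stem like 2015-02-30 yields None.

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter.ISO_LOCAL_DATE
import scala.util.Try

object TypedDateExtraction {
  // Minimal regex: grab the run of digits and dashes just before ".txt".
  private val Stem = raw".*/([\d-]*)\.txt".r

  // Succeeds only if the text parses as a valid ISO local date.
  object LocalDateX {
    def unapply(s: String): Option[LocalDate] =
      Try(LocalDate.from(ISO_LOCAL_DATE.parse(s))).toOption
  }

  // The regex must match AND the captured stem must be a real calendar date.
  def extract(path: String): Option[LocalDate] = path match {
    case Stem(LocalDateX(date)) => Some(date)
    case _                      => None
  }
}
```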


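You can also do it in a couple of lines with SimpleDateFormat, which parses a date from the start of a string: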

import java.text.SimpleDateFormat
val date_format = new SimpleDateFormat("yyyy-MM-dd")
// parse() reads the leading "yyyy-MM-dd" and ignores the trailing time; format() prints it back
date_format.format(date_format.parse("2017-10-26 09:15:54.127"))
res39: String = 2017-10-26
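If you prefer java.time over SimpleDateFormat (java.time formatters are immutable and thread-safe, SimpleDateFormat is not), here is a sketch using DateTimeFormatter.parse with a ParsePosition, which likewise does not require the whole string to be consumed. LeadingDate and leadingDate are hypothetical names; the method throws DateTimeParseException if the text does not start with a valid date.

```scala
import java.text.ParsePosition
import java.time.LocalDate
import java.time.format.DateTimeFormatter

object LeadingDate {
  private val Iso = DateTimeFormatter.ISO_LOCAL_DATE

  // Parses only the leading yyyy-MM-dd; with a ParsePosition, any trailing text
  // (here the time of day) is left unparsed instead of causing an error.
  def leadingDate(text: String): LocalDate =
    LocalDate.from(Iso.parse(text, new ParsePosition(0)))
}
```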

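What if I also want to extract the directory the "YYYY-MM-dd.txt" file is in? Say I want ("BLAH", "2015-01-01") as a tuple rather than just "2015-01-01". Thanks.
@Darth.Vader I edited the answer and added what you want at the end. Have a look.
@Darth.Vader Basically, change the regex to capture the dir as well.
I am running this parser over a list of directories containing txt files. How can I put the results into a hashmap with the directory name as the key and the file name as the value?
I think there are wrappers around the time libs, but I don't know whether they provide convenient extractors.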