How to extract a date and time from a mixed string in MATLAB
I have a string vector with 100K rows in MATLAB. Each row is a mix of letters, digits, and the characters [./@;,]. I need to detect whether each row contains one of these patterns:
MM/dd/YYYY HH:mm
MM.dd.YYYY HH:mm
MM/dd/YY HH:mm
MM.dd.YY HH:mm
For example:
"Read back and verified on 1/15/13 1935 CM;"
"Was negative on 02.10.2015 @ 2015;"
"Result came back positive 4.2.2016 0821;"
The output should be (in datetime format):
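You can use the following set of regular expressions. I treat two-digit years like this: if the year is greater than the current year minus 2000, it is assumed to be 19xx, otherwise 20xx. Expect problems after 2099 ;-)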
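To make the two-digit rule concrete, here is a minimal sketch. `expandYear` is a hypothetical helper name that does not appear in the answer, and the current year is passed in explicitly (assumed to be 2025 here) so the result does not depend on when the code is run.

```matlab
% Hypothetical helper (not part of the original answer): expand a
% two-digit year yy. cur is the current year minus 2000.
% yy > cur is read as 19xx, anything else as 20xx.
expandYear = @(yy, cur) yy + 1900 + 100 * (yy <= cur);

cur = 25;                    % assumption: the code is run in 2025
y1 = expandYear(13, cur);    % 13 <= 25, so 2013
y2 = expandYear(89, cur);    % 89 >  25, so 1989
```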
Edit: I used strings as a variable name, but it is also a command. I have changed the variable name to string_list.
% Four alternatives, tried left to right; the four-digit-year forms come first.
patterns = [ ...
    '(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{4}) @? ?(?<hours>\d{2})(?<minutes>\d{2})|' ...   % {m}m/{d}d/YYYY {@ }hhmm
    '(?<month>\d{1,2})\.(?<day>\d{1,2})\.(?<year>\d{4}) @? ?(?<hours>\d{2})(?<minutes>\d{2})|' ... % {m}m.{d}d.YYYY {@ }hhmm
    '(?<month>\d{1,2})/(?<day>\d{1,2})/(?<year>\d{2}) @? ?(?<hours>\d{2})(?<minutes>\d{2})|' ...   % {m}m/{d}d/YY {@ }hhmm
    '(?<month>\d{1,2})\.(?<day>\d{1,2})\.(?<year>\d{2}) @? ?(?<hours>\d{2})(?<minutes>\d{2})' ...  % {m}m.{d}d.YY {@ }hhmm
    ];
string_list = [ ...
    "Read back and verified on 1/15/13 1935 CM;"
    "Was negative on 02.10.2015 @ 2015;"
    "Result came back positive 4.2.2016 0821;"
    "Some test for a year earlier than 2000 4.2.89 0821;"
    ];
% With a string array as input, regexp returns one cell per row.
matches = regexp(string_list, patterns, 'names');
today = datetime('today');
currentyear = today.Year - 2000;
dates = cell(size(matches));
for i = 1:numel(matches)
    if isempty(matches{i})
        dates{i} = NaT;          % no timestamp found in this row
        continue
    end
    m = matches{i}(1);           % use the first match in the row
    year = str2double(m.year);
    if year < 100
        % Two-digit year: later than the current year means 19xx.
        if year > currentyear
            year = year + 1900;
        else
            year = year + 2000;
        end
    end
    dates{i} = datetime(year, str2double(m.month), str2double(m.day), ...
        str2double(m.hours), str2double(m.minutes), 0);
end
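A possible vectorized alternative to the loop, offered only as a sketch and not as the answer's own method: it swaps the four named-token alternatives for a single pattern with a backreference, and it returns a datetime array with NaT for rows that have no timestamp. The names `pattern`, `tok`, `hasDate`, `nums`, and `todayDate`, as well as the extra sample row, are assumptions made for this illustration.

```matlab
% Sample rows from the answer, plus one assumed row without a timestamp.
string_list = [ ...
    "Read back and verified on 1/15/13 1935 CM;"
    "Was negative on 02.10.2015 @ 2015;"
    "Some test for a year earlier than 2000 4.2.89 0821;"
    "No timestamp in this row"
    ];

% One pattern: \2 repeats whichever separator (/ or .) was matched first.
% Tokens: 1 month, 2 separator, 3 day, 4 year, 5 hours, 6 minutes.
pattern = '(\d{1,2})([/.])(\d{1,2})\2(\d{4}|\d{2}) @? ?(\d{2})(\d{2})';

tok = regexp(string_list, pattern, 'tokens', 'once');
hasDate = ~cellfun(@isempty, tok);      % true where a timestamp was found

% Convert the numeric tokens of each matching row to a row of doubles.
nums = cellfun(@(t) str2double(t([1 3 4 5 6])), tok(hasDate), ...
    'UniformOutput', false);
nums = vertcat(nums{:});                % columns: month day year hh mm

% Two-digit years: same rule as the loop in the answer.
todayDate = datetime('today');
cur = todayDate.Year - 2000;
yr = nums(:, 3);
short = yr < 100;
yr(short) = yr(short) + 1900 + 100 * (yr(short) <= cur);

dates = NaT(size(string_list));         % NaT marks rows without a match
dates(hasDate) = datetime(yr, nums(:, 1), nums(:, 2), nums(:, 4), nums(:, 5), 0);
```

`hasDate` doubles as the per-row detection flag the question asks for. Unlike the four-line pattern, the backreference also rules out mixed separators such as 1/15.13.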
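Perhaps the for loop at the end can be simplified.

In your preferred output, the second 2015 should be 20:15 (it is a time, isn't it?), and the YYYY in MM.dd.YYYY HH:MM should be yyyy, shouldn't it?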
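Can the @ sign (you did not mention it in the patterns) appear in any of the patterns, or only in the one with MM.dd.YYYY?

@PatrickHappel The @ can appear in any of the patterns.

What if there is "at" instead of @? How would I write a pattern for that?

Repeat the four lines in patterns and replace @? with (?:at)?. Make sure every line in patterns except the last ends with |.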
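Instead of duplicating the four lines, the optional marker could also be written as one non-capturing group that accepts either form. The sketch below applies that idea to the MM.dd.YYYY line only; the group `(?:@|at)?` is a swapped-in variant rather than what the comment proposes, and the test sentences containing "at" are made up for illustration.

```matlab
% Assumed variant of the MM.dd.YYYY pattern line: the optional marker is
% either "@" or the word "at", followed by an optional space.
pattern = ['(?<month>\d{1,2})\.(?<day>\d{1,2})\.(?<year>\d{4}) ' ...
           '(?:@|at)? ?(?<hours>\d{2})(?<minutes>\d{2})'];

m1 = regexp('Was negative on 02.10.2015 @ 2015;',  pattern, 'names', 'once');
m2 = regexp('Was negative on 02.10.2015 at 2015;', pattern, 'names', 'once');
m3 = regexp('Was negative on 02.10.2015 2015;',    pattern, 'names', 'once');

% All three variants should yield the same time fields.
hoursFound = {m1.hours, m2.hours, m3.hours};
minutesFound = {m1.minutes, m2.minutes, m3.minutes};
```

The same substitution would apply to each of the other three pattern lines.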
Output of the answer's code:

dates =
  4×1 cell array
    {[15-Jan-2013 19:35:00]}
    {[10-Feb-2015 20:15:00]}
    {[02-Apr-2016 08:21:00]}
    {[02-Apr-1989 08:21:00]}