PHP, regular expression to parse data
I have data in the following format:
Football - 101 Carolina Panthers +15 -110 for Game
Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game
Football - 102 Pittsburgh Steelers -9 -120 for 1st Half
How can I convert it into a PHP array like this:
$game_data[] = array( 'sport_type' => 'Football',
'game_number' => 101,
'game_name' => 'Carolina Panthers',
'runline_odd' => '+15 -110',
'total_odd' => '',
'odd_type' => 'runline',
'period' => 'Game' );
$game_data[] = array( 'sport_type' => 'Football',
'game_number' => 101,
'game_name' => 'Carolina Panthers/Pittsburgh Steelers',
'runline_odd' => '',
'total_odd' => 'under 36½ -110',
'odd_type' => 'total_odd',
'period' => 'Game' );
$game_data[] = array( 'sport_type' => 'Football',
'game_number' => 102,
'game_name' => 'Pittsburgh Steelers',
'runline_odd' => '-9 -120',
'total_odd' => '',
'odd_type' => 'runline',
'period' => '1st Half' );
The following works, except for the case where the game name is followed by the word "under":
/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/
Explanation:
([^-]+): Matches anything other than -, which separates the sport type from the other details.
\s*-\s*: A - surrounded by optional whitespace.
(\d+): The game number.
([^\d+-]+): Anything other than +, - or a digit. Matches the game name.
((?:under\s*)?[\d\s+-]+): The runline odd or total odd.
\s*for\s*(.+): The literal word for, followed by the period.
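To see the groups in action, here is a minimal sketch that runs the pattern above, unchanged, against the sample lines. The helper name parse_line() and the combined 'odds' key are assumptions made for illustration and are not part of the original answer; the nullable return type assumes PHP 7.1 or later. Because the character classes are greedy, the captures carry trailing whitespace, so the sketch trims them.

```php
<?php
// The pattern from the answer, unchanged.
$pattern = '/([^-]+)\s*-\s*(\d+)\s*([^\d+-]+)\s*((?:under\s*)?[\d\s+-]+)\s*for\s*(.+)/';

// parse_line() is a hypothetical helper, not part of the original answer.
function parse_line(string $pattern, string $line): ?array
{
    if (!preg_match($pattern, $line, $m)) {
        return null; // the line does not fit the pattern
    }
    // The greedy classes also capture trailing spaces, so trim every group.
    $m = array_map('trim', $m);

    return array(
        'sport_type'  => $m[1],
        'game_number' => (int) $m[2],
        'game_name'   => $m[3],
        'odds'        => $m[4], // runline or total, not yet told apart
        'period'      => $m[5],
    );
}

$first  = parse_line($pattern, 'Football - 101 Carolina Panthers +15 -110 for Game');
$second = parse_line($pattern, 'Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game');
$third  = parse_line($pattern, 'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half');

print_r($first);
var_dump($second); // NULL: the ½ is not covered by [\d\s+-]
print_r($third);
```

The first and third lines parse as expected. The second line is rejected outright rather than mis-parsed, because the ½ character falls outside the odds character class.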
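P.S.: Normally I wouldn't solve the whole problem for someone, but the ½ character made it interesting enough. Now, I'm no regex super-expert, so this may not be the most optimized or elegant solution, but it seems to get the job done, at least with the sample input provided.
Edit: Oops. I didn't spot that "under" is actually part of the runline data, so this doesn't actually get the job done. I'll be back.
EDIT2: Modified the regex slightly; it now correctly distinguishes between runline_odd and runline_total.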
<?php
$input = array(
'Football - 101 Carolina Panthers +15 -110 for Game',
'Football - 101 Carolina Panthers/Pittsburgh Steelers under 36½ -110 for Game',
'Football - 102 Pittsburgh Steelers -9 -120 for 1st Half'
);
$regex = '^(?<sport_type>[[:alpha:]]*) - '.
'(?<game_number>[0-9]*) '.
'('.
'(?<game_nameb>[[:alpha:]\/ ]*?) '.
'(?<runline_total>(under ([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
'|'.
'(?<game_namea>[[:alpha:]\/ ]*) '.
'(?<runline_odd>((-|\+)?([0-9\x{00BD}]+){1}) ((-|\+)?([-+0-9\x{00BD}]+){1})) for '.
')'.
'(?<period>.*)$';
$game_data = array();
foreach ($input as $in) {
    $matches = false;
    $cnt = preg_match('/' . $regex . '/ui', $in, $matches);
    if ($cnt && is_array($matches) && count($matches)) {
        if (empty($matches['game_nameb'])) {
            $game_name = $matches['game_namea'];
            $runline_odd = $matches['runline_odd'];
            $total_odd = '';
        } else {
            $game_name = $matches['game_nameb'];
            $runline_odd = '';
            $total_odd = $matches['runline_total'];
        }
        $result = array(
            'sport_type' => $matches['sport_type'],
            'game_number' => $matches['game_number'],
            'game_name' => $game_name,
            'runline_odd' => $runline_odd,
            'total_odd' => $total_odd,
            'period' => $matches['period']
        );
        array_push($game_data, $result);
    }
}
var_dump($game_data);
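What have you tried so far? (Your question reads as if you are asking for a ready-made solution without wanting to learn and/or understand regular expressions at all.)
Welcome to StackOverflow! In the future, if you want to display large blocks of code, you can prefix each line with four spaces, as I did when I reformatted your question.
@Antal S-Z: Thanks :3
You can imagine the user has more than three lines of input, and this won't work every time...
@danip, true. I don't even know football, so I don't know all the possible cases. But based on these three lines (apart from the under case I mentioned), I do see a pattern: (string) - (number) (string) (number) for (string)
@Sarah: I'm aware of that. It took me a while to modify the regex, but it works now.
It doesn't match the second string :), which leaves the array with 2 elements.
@Sarah: As you can see from the sample output I provided, it does work fine, so I suspect the problem lies elsewhere. Can you check: a) which PHP version are you running it on? b) which encoding is the data in? I think this may be an encoding issue. preg_match with the u flag requires Unicode input. Maybe you need to utf8_encode your input.
@Sarah: Great, glad I could help :)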
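The encoding point can be checked directly. The sketch below assumes the mbstring extension is available and that non-UTF-8 input is ISO-8859-1; both are assumptions, and to_utf8() is a hypothetical helper name. It uses mb_convert_encoding rather than utf8_encode, since utf8_encode is deprecated as of PHP 8.2.

```php
<?php
// to_utf8() is a hypothetical helper. Treating non-UTF-8 input as
// ISO-8859-1 is an assumption; adjust it to the real source encoding.
function to_utf8(string $s): string
{
    if (mb_check_encoding($s, 'UTF-8')) {
        return $s; // already valid UTF-8, leave untouched
    }
    return mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1');
}

// "½" as a single ISO-8859-1 byte (0xBD), which is not valid UTF-8.
$latin1 = "under 36\xBD -110";
$utf8   = to_utf8($latin1);

// With the u flag, preg_match() returns false for an invalid UTF-8 subject.
$before = preg_match('/\x{00BD}/u', $latin1);
$after  = preg_match('/\x{00BD}/u', $utf8);

var_dump($before); // bool(false)
var_dump($after);  // int(1)
```

When preg_match returns false instead of 0, preg_last_error() can be compared against PREG_BAD_UTF8_ERROR to confirm that the subject string, not the pattern, is the problem.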
$ /usr/local/bin/php preg-match.php
array(3) {
[0]=>
array(6) {
["sport_type"]=>
string(8) "Football"
["game_number"]=>
string(3) "101"
["game_name"]=>
string(17) "Carolina Panthers"
["runline_odd"]=>
string(8) "+15 -110"
["total_odd"]=>
string(0) ""
["period"]=>
string(4) "Game"
}
[1]=>
array(6) {
["sport_type"]=>
string(8) "Football"
["game_number"]=>
string(3) "101"
["game_name"]=>
string(37) "Carolina Panthers/Pittsburgh Steelers"
["runline_odd"]=>
string(0) ""
["total_odd"]=>
string(15) "under 36½ -110"
["period"]=>
string(4) "Game"
}
[2]=>
array(6) {
["sport_type"]=>
string(8) "Football"
["game_number"]=>
string(3) "102"
["game_name"]=>
string(19) "Pittsburgh Steelers"
["runline_odd"]=>
string(7) "-9 -120"
["total_odd"]=>
string(0) ""
["period"]=>
string(8) "1st Half"
}
}
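The dump above has no odd_type key and returns game_number as a string, while the array requested in the question has an odd_type entry and an integer game number. One possible post-processing step is sketched below; add_odd_type() is a hypothetical helper, not part of the answer, and it assumes that exactly one of runline_odd and total_odd is non-empty in each row.

```php
<?php
// add_odd_type() is a hypothetical helper, not part of the original answer.
// It assumes each parsed row has exactly one of runline_odd / total_odd set.
function add_odd_type(array $game): array
{
    $game['game_number'] = (int) $game['game_number'];
    $game['odd_type'] = $game['total_odd'] !== '' ? 'total_odd' : 'runline';
    return $game;
}

// One row in the shape produced by the answer's var_dump output.
$parsed = array(
    'sport_type'  => 'Football',
    'game_number' => '101',
    'game_name'   => 'Carolina Panthers/Pittsburgh Steelers',
    'runline_odd' => '',
    'total_odd'   => 'under 36½ -110',
    'period'      => 'Game',
);

$game = add_odd_type($parsed);
var_dump($game['game_number']); // int(101)
var_dump($game['odd_type']);    // string(9) "total_odd"
```

Applied to the whole result, this would be array_map('add_odd_type', $game_data).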