Need some help with a regular expression (PHP)


I want to use preg_replace to parse a txt file into HTML and add formatting. The file looks like this:

09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!
This should be treated as one group and parsed into a table, like this:

<table>
<tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr>
<tr><td>1234567</td><td>(optional)</td><td>Today is a beautiful day</td></tr>
<tr><td>1234568</td><td>(optional)</td><td>Tomorrow will be even better</td></tr>
<tr><td>1234569</td><td>(optional)</td><td>December is the best month of the year!</td></tr>
</table>
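So I only want to capture and replace the first block and the last block: each one starts with the time/date line, followed by the lines below it that start with a 7-digit ID.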

Thanks for reading this far ;)

I think this does what you're trying to do.

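There's one line where I'm not clear why it should be ignored:

1234570 This line should be ignored

That line meets the requirement of 7 digits followed by some text.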
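A quick check makes the point concrete. Tested on its own against a simplified ID-line pattern (`/^\d{7}\h+.+$/`, written just for this illustration and not part of the answer's regex), the "ignored" line matches exactly like the other ID lines, while the football score line does not:

```php
<?php
// Simplified, illustrative pattern for an ID line:
// 7 digits, horizontal whitespace, then some text. ^ and $ force the whole line to fit.
$idLine = '/^\d{7}\h+.+$/';

$ignored = '1234570 This line should be ignored';
$score   = 'Liverpool - WBA 2-2';

var_dump(preg_match($idLine, $ignored)); // int(1): looks like any other ID line
var_dump(preg_match($idLine, $score));   // int(0)
```

So a line-by-line regex alone cannot tell the two cases apart; only the position of the line (directly under a date/time line or not) can.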

The regex I came up with is:

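/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)?\h*(.+)/m
Here is a regex101 demo:

Usage in PHP:

$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!

Liverpool - WBA 2-2

1234570 This line should be ignored

19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
// Each line that starts with a time/date or a 7-digit ID becomes three <td> cells
// (key, optional day name, rest of the line); all other lines are left unchanged.
// The day-name group lists all seven English day names explicitly.
echo preg_replace('/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2}|\d{7})\h*((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)?\h*(.+)/m', '<td>$1</td><td>$2</td><td>$3</td>', $string);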
Output:

<td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234567</td><td></td><td>Today is a beautiful day</td>
<td>1234568</td><td></td><td>Tomorrow will be even better</td>
<td>1234569</td><td></td><td>December is the best month of the year!</td>

Liverpool - WBA 2-2

<td>1234570</td><td></td><td>This line should be ignored</td>

<td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td>
<td>1234571</td><td></td><td>Today is a beautiful day</td>
<td>1234572</td><td></td><td>Tomorrow will be even better</td>
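The one-line pattern is compact but hard to read. As a sketch (not part of the original answer), the same structure can be written with the `x` (extended) modifier, which makes PCRE ignore literal whitespace in the pattern and treat `#` as the start of a comment. The day-name group lists the seven English day names explicitly, so Tuesday, Wednesday, Thursday and Saturday match as well; the two-line sample string is made up for the demonstration.

```php
<?php
// x: whitespace in the pattern is ignored and "#" starts a comment.
// m: ^ matches at the start of every line.
$pattern = '/
    ^
    (                                                  # group 1: first column
        \d{2}:\d{2}:\d{2} \h* \d{1,2}-\d{1,2}-\d{1,2}  #   time and date, e.g. 09:19:49 13-12-15
      | \d{7}                                          #   or a 7-digit ID
    )
    \h*
    ((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)?      # group 2: optional day name
    \h*
    (.+)                                               # group 3: rest of the line
/mx';

$sample = "09:19:49 13-12-15 Tuesday Hello World\n1234567 Today is a beautiful day";

preg_match_all($pattern, $sample, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    // $row[0] is the whole matched line; the columns are $row[1]..$row[3].
    // An unmatched day-name group comes back as an empty string.
    echo '<tr><td>' . $row[1] . '</td><td>' . $row[2] . '</td><td>' . $row[3] . "</td></tr>\n";
}
```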

OK, given your update it's a bit more complicated, but I think this will do it:

$string = '09:19:49 13-12-15 Sunday Hello World
1234567 Today is a beautiful day
1234568 Tomorrow will be even better
1234569 December is the best month of the year!

Liverpool - WBA 2-2

1234570 This line should be ignored

19:29:59 13-12-15 Sunday Hello World
1234571 Today is a beautiful day
1234572 Tomorrow will be even better';
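// Match a date/time line followed by one or more 7-digit ID lines; each such
// block is passed to the callback and replaced by a table.
echo preg_replace_callback('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)?\h*(.+?)\n((\d{7})\h+(.+?)(\n|$))+/', 
                    function ($matches) {
                        $lines = explode("\n", $matches[0]);
                        $theoutput = '<table><tr>';
                        foreach($lines as $line) {
                            if(preg_match('/(?:^|\n)(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)?\h*(.*)/', $line, $output)) {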
                                //it is the first date string line;
                                foreach($output as $key => $values) {
                                    if(!empty($key)) {
                                        $theoutput .= '<td>' . $values . '</td>';
                                    }
                                }
                            } else {
                                if(preg_match('/(\d{7})\h*(.*)/', $line, $output)) {
                                    $theoutput .= '</tr><tr>';
                                    foreach($output as $key => $values) {
                                        if(!empty($key)) {
                                            $theoutput .= '<td>' . $values . '</td>';
                                        }
                                    }
                                }
                            }
                        }
                        $theoutput .= '</tr></table>';
                        return $theoutput;
                    }, $string);
Output:

<table><tr><td>09:19:49 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234567</td><td>Today is a beautiful day</td></tr><tr><td>1234568</td><td>Tomorrow will be even better</td></tr><tr><td>1234569</td><td>December is the best month of the year!</td></tr></table>
Liverpool - WBA 2-2

1234570 This line should be ignored
<table><tr><td>19:29:59 13-12-15</td><td>Sunday</td><td>Hello World</td></tr><tr><td>1234571</td><td>Today is a beautiful day</td></tr><tr><td>1234572</td><td>Tomorrow will be even better</td></tr></table>
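If the multi-line regex plus callback becomes hard to maintain, a plain line-by-line loop is another option. This is a sketch of a different technique, not the answerer's code: a date/time line opens a table, 7-digit ID lines directly after it add rows, and any other line closes the open table and is copied through unchanged. The function names `linesToTables` and `td` are hypothetical, cell text is escaped with `htmlspecialchars`, and ID rows get an empty middle cell so every row has three columns as in the question's example.

```php
<?php
// Hypothetical helper: wrap HTML-escaped text in a table cell.
function td(string $text): string
{
    return '<td>' . htmlspecialchars($text) . '</td>';
}

// Hypothetical alternative to the regex-plus-callback approach: walk the
// text line by line and track whether a table is currently open.
function linesToTables(string $text): string
{
    $datePattern = '/^(\d{2}:\d{2}:\d{2}\h*\d{1,2}-\d{1,2}-\d{1,2})\h+((?:Mon|Tues|Wednes|Thurs|Fri|Satur|Sun)day)?\h*(.*)$/';
    $idPattern   = '/^(\d{7})\h+(.*)$/';

    $out   = [];   // finished output lines
    $table = null; // HTML of the table being built, or null if none is open

    foreach (preg_split('/\R/', $text) as $line) {
        if (preg_match($datePattern, $line, $m)) {
            // A date/time line closes any open table and starts a new one.
            if ($table !== null) {
                $out[] = $table . '</table>';
            }
            $table = '<table><tr>' . td($m[1]) . td($m[2]) . td($m[3]) . '</tr>';
        } elseif ($table !== null && preg_match($idPattern, $line, $m)) {
            // An ID line inside an open block becomes a row (empty middle cell).
            $table .= '<tr>' . td($m[1]) . td('') . td($m[2]) . '</tr>';
        } else {
            // Anything else ends the block and is copied through unchanged.
            if ($table !== null) {
                $out[] = $table . '</table>';
                $table = null;
            }
            $out[] = $line;
        }
    }
    if ($table !== null) {
        $out[] = $table . '</table>';
    }
    return implode("\n", $out);
}

$sample = "09:19:49 13-12-15 Sunday Hello World\n"
        . "1234567 Today is a beautiful day\n"
        . "\n"
        . "Liverpool - WBA 2-2\n"
        . "\n"
        . "1234570 This line should be ignored";

echo linesToTables($sample), "\n";
```

With this approach a stray ID line such as `1234570 This line should be ignored` stays plain text because no date/time line opened a table directly before it, which is the behaviour the updated question asks for.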

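Looks great, thanks! So far I can only get the second regex to pick up the first ID line, not the ones after it. I'll try to figure it out! :)
What is it getting stuck on? Which input does it fail on?
I think I've got it working now. The last part of the preg_replace_callback regex, the part that grabs the ID lines, looked like this: [ \t]*((\d{7})[ \t]*(.+?)(\n|$))+ but the leading whitespace also has to be inside the repeated capture group: ([ \t]*(\d{7})[ \t]*(.+?)(\n|$))+
By the way, what does \h do? I had changed it to [ \t].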
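\h is horizontal whitespace (support depends on the language/software running the regex); it means a space or a tab. You can use regex101 to debug specific parts; here is an example on regex101. In the top right corner it shows an explanation of the regex.
Made some adjustments, and now it's about 99% of what I want; still fine-tuning. Thanks for your help, Chris, much appreciated :)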
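A quick way to see the difference between `\h`, `[ \t]` and `[\t]` is to run each one against an ID line separated by a space and by a tab. In this small sketch with made-up input lines, `\h` and `[ \t]` accept both, while `[\t]` alone only matches the tab. (In PCRE, which PHP uses, `\h` also covers a few other horizontal space characters such as the non-breaking space, so it is not strictly identical to `[ \t]`.)

```php
<?php
// The three ways of writing "horizontal whitespace" discussed above.
$patterns = [
    '\h'    => '/^\d{7}\h+/',
    '[ \t]' => '/^\d{7}[ \t]+/',
    '[\t]'  => '/^\d{7}[\t]+/',   // tab only: a plain space does not match
];

// The same ID line, once separated by a space and once by a tab.
$inputs = [
    'space' => "1234567 Today is a beautiful day",
    'tab'   => "1234567\tToday is a beautiful day",
];

// Print 1 (match) or 0 (no match) for each pattern/input pair.
foreach ($patterns as $name => $re) {
    foreach ($inputs as $label => $line) {
        printf("%-6s %-5s %d\n", $name, $label, preg_match($re, $line));
    }
}
```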