PHP: parsing .srt files

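I need to parse .srt files with PHP and print all the subtitles in the file using variables.

I couldn't find the right regular expressions. While doing this, I need to get the id, time and subtitle into variables. When printing, there must be no array() output or anything like that; it must print the same content as the original file.

I mean I must print it like this: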

1
00:00:00,074 --> 00:00:02,564
Previously on Breaking Bad...

2
00:00:02,663 --> 00:00:04,393
Words...

$number <br> (e.g. 1)
$time <br> (e.g. 00:00:00,074 --> 00:00:02,564)
$subtitle <br> (e.g. Previously on Breaking Bad...)

By the way, I have this code, but it doesn't match the lines. It has to be edited, but how?

$srt_file = file('test.srt',FILE_IGNORE_NEW_LINES);
$regex = "/^(\d)+ ([\d]+:[\d]+:[\d]+,[\d]+) --> ([\d]+:[\d]+:[\d]+,[\d]+) (\w.+)/";

foreach($srt_file as $srt){

    preg_match($regex,$srt,$srt_lines);

    print_r($srt_lines);
    echo '<br />';

}

It won't match, because your $srt_file array probably looks like this:

Array
([0] => '1',
[1] => '00:00:00,074 --> 00:00:02,564',
[2] => 'Previously on Breaking Bad...',
[3] => '',
[4] => '2',
...
)

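Your regex isn't going to match any of these elements: it expects the id, time range and text on a single line separated by spaces, but file() puts each of them in a separate array element.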

If your intent is to work on the entire file as one long string, use file_get_contents() to read the whole file into a string, then use preg_match_all() to get all the regex matches.
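A minimal sketch of that approach, assuming well-formed cues (an id line, a time-range line, at least one text line, then a blank line or the end of the file). The helper name parseSrtString() is hypothetical, and the sample string stands in for file_get_contents('test.srt'):

```php
<?php
// Sketch: parse a whole .srt file held in one string with preg_match_all().
// Assumes well-formed cues: id line, time-range line, one or more text lines,
// then a blank line (or the end of the file). parseSrtString() is hypothetical.
function parseSrtString(string $content): array
{
    // Drop a UTF-8 byte-order mark, which would stop ^ matching the first id.
    if (strncmp($content, "\xEF\xBB\xBF", 3) === 0) {
        $content = substr($content, 3);
    }
    // Normalise Windows (\r\n) and old Mac (\r) line endings to \n.
    $content = str_replace(["\r\n", "\r"], "\n", $content);

    $regex = '/^(\d+)\n'                                               // 1: cue number
           . '(\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3})\n' // 2: time range
           . '(.+?)'                                                   // 3: text, may span lines
           . '(?=\n\n|\n*\z)/ms';                                      // ends at blank line or EOF

    preg_match_all($regex, $content, $matches, PREG_SET_ORDER);

    $cues = [];
    foreach ($matches as $m) {
        $cues[] = ['number' => $m[1], 'time' => $m[2], 'subtitle' => $m[3]];
    }
    return $cues;
}

// For a real file: parseSrtString(file_get_contents('test.srt'))
$srt = "1\n00:00:00,074 --> 00:00:02,564\nPreviously on Breaking Bad...\n\n"
     . "2\n00:00:02,663 --> 00:00:04,393\nWords...\n";

foreach (parseSrtString($srt) as $cue) {
    $number   = $cue['number'];
    $time     = $cue['time'];
    $subtitle = $cue['subtitle'];
    echo $number . "<br>\n" . $time . "<br>\n" . $subtitle . "<br><br>\n";
}
```

The `s` modifier lets the lazy text group span several lines, and the lookahead stops it at the first blank line, so multi-line subtitles stay in one cue.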

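Otherwise, you might loop through the array and match each line against several regex patterns to determine whether it is an id, a time range or text, then act accordingly. You will also need some logic to make sure you get the values in the correct order (id, time range, text).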
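A rough sketch of that line-by-line alternative, assuming the same cue layout. parseSrtLines() is a hypothetical name, and lines that fit none of the expected patterns are simply skipped:

```php
<?php
// Sketch: walk the file() array and classify each line with its own regex
// (id, time range or text), keeping them in the order id -> time -> text.
// A blank line closes the current cue. parseSrtLines() is hypothetical.
function parseSrtLines(array $lines): array
{
    $cues = [];
    $cue  = null;   // cue being built, or null between cues

    foreach ($lines as $i => $line) {
        if ($i === 0) {
            $line = preg_replace('/^\xEF\xBB\xBF/', '', $line);  // strip a UTF-8 BOM
        }
        $line = trim($line);  // also removes a leftover \r from Windows files

        if ($line === '') {
            if ($cue !== null) {      // blank line: the current cue is complete
                $cues[] = $cue;
                $cue = null;
            }
        } elseif ($cue === null) {
            if (preg_match('/^\d+$/', $line)) {   // expecting an id
                $cue = ['number' => $line, 'time' => '', 'subtitle' => ''];
            }
        } elseif ($cue['time'] === '') {
            if (preg_match('/^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}$/', $line)) {
                $cue['time'] = $line;             // expecting a time range
            }
        } else {
            // Everything after the time range is text, possibly several lines.
            $cue['subtitle'] .= ($cue['subtitle'] === '' ? '' : "\n") . $line;
        }
    }
    if ($cue !== null) {
        $cues[] = $cue;   // the file did not end with a blank line
    }
    return $cues;
}

// Usage: parseSrtLines(file('test.srt', FILE_IGNORE_NEW_LINES))
```

Tracking "what comes next" through the state of `$cue` is what enforces the id, time range, text order; the regexes alone cannot do that.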

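Use array_chunk() to split the file() array into chunks of 4, then ignore the last entry of each chunk, since it's the blank line, like this: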


foreach( array_chunk( file( 'test.srt'), 4) as $entry) {
    list( $number, $time, $subtitle) = $entry;
    echo $number . '<br />';
    echo $time . '<br />';
    echo $subtitle . '<br />';
}

Here is a short and simple state machine for parsing the SRT file line by line:

define('SRT_STATE_SUBNUMBER', 0);
define('SRT_STATE_TIME',      1);
define('SRT_STATE_TEXT',      2);
define('SRT_STATE_BLANK',     3);

$lines   = file('test.srt');

$subs    = array();
$state   = SRT_STATE_SUBNUMBER;
$subNum  = 0;
$subText = '';
$subTime = '';

foreach($lines as $line) {
    switch($state) {
        case SRT_STATE_SUBNUMBER:
            $subNum = trim($line);
            $state  = SRT_STATE_TIME;
            break;

        case SRT_STATE_TIME:
            $subTime = trim($line);
            $state   = SRT_STATE_TEXT;
            break;

        case SRT_STATE_TEXT:
            if (trim($line) == '') {
                $sub = new stdClass;
                $sub->number = $subNum;
                list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
                $sub->text   = $subText;
                $subText     = '';
                $state       = SRT_STATE_SUBNUMBER;

                $subs[]      = $sub;
            } else {
                $subText .= $line;
            }
            break;
    }
}

if ($state == SRT_STATE_TEXT) {
    // If the file was missing the trailing blank line, we'll be in this
    // state here. Build the last sub from the pending data and add it
    // (reusing $sub would overwrite the previous, already stored sub).
    $sub = new stdClass;
    $sub->number = $subNum;
    list($sub->startTime, $sub->stopTime) = explode(' --> ', $subTime);
    $sub->text = $subText;
    $subs[] = $sub;
}

print_r($subs);

Result:

Array
(
    [0] => stdClass Object
        (
            [number] => 1
            [stopTime] => 00:00:24,400
            [startTime] => 00:00:20,000
            [text] => Altocumulus clouds occur between six thousand
        )

    [1] => stdClass Object
        (
            [number] => 2
            [stopTime] => 00:00:27,800
            [startTime] => 00:00:24,600
            [text] => and twenty thousand feet above ground level.
        )

)

You can then loop over the subs array or access them by array offset:

echo $subs[0]->number . ' says ' . $subs[0]->text . "\n";

To display all subs by looping over each one and displaying it:

foreach($subs as $sub) {
    echo $sub->number . ' begins at ' . $sub->startTime .
         ' and ends at ' . $sub->stopTime . '.  The text is: <br /><pre>' .
         $sub->text . "</pre><br />\n";
}
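To print the subs back in the same layout as the .srt file (one value per line, a blank line between cues, no array() dumps), a small formatter over the objects built by the state machine may be enough. This is a sketch: formatSrt() is a hypothetical name, and the single hand-built object stands in for the parsed $subs array:

```php
<?php
// Sketch: print parsed cues back in the original SRT layout, one value per
// line and a blank line between cues, instead of dumping them with print_r().
// Assumes objects shaped like the state machine's output: number, startTime,
// stopTime and text. formatSrt() is a hypothetical helper name.
function formatSrt(array $subs, string $eol = "\n"): string
{
    $out = '';
    foreach ($subs as $sub) {
        $number   = $sub->number;
        $time     = $sub->startTime . ' --> ' . $sub->stopTime;
        $subtitle = rtrim($sub->text);   // text read via file() keeps its newline
        $out .= $number . $eol . $time . $eol . $subtitle . $eol . $eol;
    }
    return $out;
}

$sub = new stdClass;
$sub->number    = '1';
$sub->startTime = '00:00:00,074';
$sub->stopTime  = '00:00:02,564';
$sub->text      = "Previously on Breaking Bad...\n";

echo formatSrt([$sub]);
```

For a browser, the same output can be wrapped as `echo nl2br(htmlspecialchars(formatSrt($subs)));` so each line break becomes a `<br />`.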


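I created a class to convert .srt files to arrays. Each entry of the array has the following properties:

id: a number representing the subtitle id (2)
start: float, the start time in seconds (24.443)
end: float, the end time in seconds (27.647)
startString: the start time in human-readable format (00:00:24.443)
endString: the end time in human-readable format (00:00:27.647)
duration: the duration of the subtitle in milliseconds (3204)
text: the subtitle text (The Peacocks ruled over Arch City)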

The code (PHP 7):

<?php

namespace VideoSubtitles\Srt;


class SrtToArrayTool
{


    public static function getArrayByFile(string $file): array
    {

        $ret = [];

        $gen = function ($filename) {
            $file = fopen($filename, 'r');
            while (($line = fgets($file)) !== false) {
                yield rtrim($line);
            }
            fclose($file);
        };

        $c = 0;
        $item = [];
        $text = '';
        $n = 0;
        foreach ($gen($file) as $line) {

            if ('' !== $line) {
                if (0 === $n) {
                    $item['id'] = $line;
                    $n++;
                }
                elseif (1 === $n) {
                    $p = explode('-->', $line);
                    $start = str_replace(',', '.', trim($p[0]));
                    $end = str_replace(',', '.', trim($p[1]));
                    $startTime = self::toMilliSeconds(str_replace('.', ':', $start));
                    $endTime = self::toMilliSeconds(str_replace('.', ':', $end));
                    $item['start'] = $startTime / 1000;
                    $item['end'] = $endTime / 1000;
                    $item['startString'] = $start;
                    $item['endString'] = $end;
                    $item['duration'] = $endTime - $startTime;
                    $n++;
                }
                else {
                    if ($n >= 2) {
                        if ('' !== $text) {
                            $text .= PHP_EOL;
                        }
                        $text .= $line;
                    }
                }
            }
            else {
                if (0 !== $n) {
                    $item['text'] = $text;
                    $ret[] = $item;
                    $text = '';
                    $n = 0;
                }
            }
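            $c++;
        }

        // Flush the last entry if the file does not end with a blank line.
        if (0 !== $n) {
            $item['text'] = $text;
            $ret[] = $item;
        }

        return $ret;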
    }


    private static function toMilliSeconds(string $duration): int
    {
        $p = explode(':', $duration);
        return (int)$p[0] * 3600000 + (int)$p[1] * 60000 + (int)$p[2] * 1000 + (int)$p[3];
    }


}
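To see how the start, end and duration fields relate, here is a small worked example using the numbers from the property list (24.443 s to 27.647 s). srtTimeToMs() is a hypothetical helper, separate from the class's private toMilliSeconds(), and it accepts either a comma or a dot before the milliseconds:

```php
<?php
// Worked example: convert SRT timestamps to milliseconds and derive the
// start/end (seconds) and duration (milliseconds) fields.
// srtTimeToMs() is a hypothetical helper, not part of the class above.
function srtTimeToMs(string $t): int
{
    [$hms, $ms] = preg_split('/[,.]/', trim($t));  // "00:00:24" and "443"
    [$h, $m, $s] = explode(':', $hms);
    return (int)$h * 3600000 + (int)$m * 60000 + (int)$s * 1000 + (int)$ms;
}

$startMs = srtTimeToMs('00:00:24,443');  // 24443
$endMs   = srtTimeToMs('00:00:27,647');  // 27647

printf("start=%.3f end=%.3f duration=%d\n", $startMs / 1000, $endMs / 1000, $endMs - $startMs);
// prints: start=24.443 end=27.647 duration=3204
```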

You can use this project (it provides the Captioning\Format\SubripFile class used below).
Example code:
<?php
require_once __DIR__.'/../vendor/autoload.php';

use Captioning\Format\SubripFile;

try {
    $file = new SubripFile('your_file.srt');

    foreach ($file->getCues() as $line) {
        echo 'start: ' . $line->getStart() . "<br />\n";
        echo 'stop: ' . $line->getStop() . "<br />\n";
        echo 'startMS: ' . $line->getStartMS() . "<br />\n";
        echo 'stopMS: ' . $line->getStopMS() . "<br />\n";
        echo 'text: ' . $line->getText() . "<br />\n";
        echo "=====================<br />\n";
    }

} catch(Exception $e) {
    echo "Error: ".$e->getMessage()."\n";
}

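It can be done by splitting on line breaks in PHP. I did it successfully; here is my code:

$srt=preg_split("/\\r\\n\\r\\n/",trim($movie->SRT));
$result[$i]['IMDBID']=$movie->IMDBID;
$result[$i]['TMDBID']=$movie->TMDBID;

Here $movie->SRT holds subtitles in the format posted in this question. As you can see, each cue is separated from the next by two line breaks (a blank line).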
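The split above only handles \r\n line endings, so a file saved with Unix (\n) line endings would come back as a single block. A sketch that splits on blank lines of any style using PCRE's \R escape and then extracts the three values; splitSrtBlocks() is a hypothetical name:

```php
<?php
// Sketch: split the whole subtitle text on blank lines, whatever the line
// endings (\R matches \r\n, \n or \r), then take id, time and text from each
// block. splitSrtBlocks() is a hypothetical helper name.
function splitSrtBlocks(string $content): array
{
    $content = preg_replace('/^\xEF\xBB\xBF/', '', $content);  // strip a UTF-8 BOM
    $blocks  = preg_split('/\R\s*\R/', trim($content), -1, PREG_SPLIT_NO_EMPTY);

    $cues = [];
    foreach ($blocks as $block) {
        // At most 3 parts: id, time range, and all remaining text lines.
        $parts = preg_split('/\R/', trim($block), 3);
        if (count($parts) < 3) {
            continue;   // malformed block (e.g. no text): skip it
        }
        [$number, $time, $subtitle] = $parts;
        $cues[] = compact('number', 'time', 'subtitle');
    }
    return $cues;
}

// Usage with the variable from the answer above:
// foreach (splitSrtBlocks($movie->SRT) as $cue) { echo $cue['number'], "<br>"; }
```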
Simple, natural, trivial solution. SRT subs look like this, separated by two newlines:
3
00:00:07,350 --> 00:00:09,780
The ability to destroy a planet is
nothing next to the power of the force

You can parse the time with DateFormat.parse(), which Java already provides, so it's instant:
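import java.text.DateFormat;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Arrays;
import java.util.Date;

class Sub {
    Float start;   // boxed Float so it can stay null for a malformed block
    String text;

    Sub(String block) {
        this.start = null; this.text = null;
        String[] lines = block.split("\\r?\\n");   // accept \n and \r\n
        if (lines.length < 3) { return; }

        // Keep only the start time: drop everything after the first space.
        String timey = lines[1].replaceAll(" .+$", "");
        try {
            DateFormat dateFormat = new SimpleDateFormat("HH:mm:ss,SSS");
            Date zero = dateFormat.parse("00:00:00,000");
            Date date = dateFormat.parse(timey);
            this.start = (float)(date.getTime() - zero.getTime()) / 1000f;
        } catch (ParseException e) {
            e.printStackTrace();
        }

        // Join the text lines with spaces (String.join needs Java 8+;
        // Android's TextUtils.join works the same way).
        this.text = String.join(" ", Arrays.copyOfRange(lines, 2, lines.length));
    }
}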


Then, to get all the subs in the file:
    List<Sub> subs = new ArrayList<>();
    String[] tt = fileText.split("\\r?\\n\\r?\\n");   // cues are separated by a blank line
    for (String s:tt) { subs.add(new Sub(s)); }
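
It seems there is already some material on the web about this, such as libraries, so you may want to avoid reinventing the wheel. Google "parse srt file php" if in doubt ;)

I have googled it many times, with no good results. Some results work, but they don't print all the subtitles and they show array() contents. One of them also creates a new srt file, which isn't what I want to do. I have to print the whole content of the srt file with PHP, exactly as it appears in the file, but while doing so I need a loop and 3 variables to do some work.

The library I linked converts the srt file into objects, so from that starting point you can do whatever you want with those objects; the code on its home page is just an example of what you can do.

It's very simple to do now.

I wrote an answer in 2021.

The only reason this doesn't work is that the SubRip format allows one or more lines of text, terminated by a blank line.

Thanks for the info; I didn't know that from the OP's example.

By the way, I noticed something: it fails to capture the ID, text and time of some subtitles. Why is that?

I like this solution; it's much shorter than mine. I think you could get tricky: check whether the last entry is not a blank line and, if so, pop elements off the array in the loop until you hit the blank line, then readjust your subtitle.

This is very nice, but the output is shown in array form.

I tried your last code and it only showed the first subtitle.

Everything is fine. But how do I print all the subtitles the way the srt file looks? I mean line by line, nothing else.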