PHP - Filling arrays sequentially until reaching a maximum length

I need to use PHP to extract data from a text file formatted like this:

BEGIN
#1 
#2 
#3 
#4 
#5 
#6 
1       2015-05-31  2001-11-24  'Name Surname'      ID_1        0 
2       2011-04-01  ?           ?                   ID_2        1 
2       2013-02-24  ?           ?                   ID_3        1 
2       2014-02-28  ?           'Name Surname'      ID_4        2 
END
Logically, the information is organized as arrays like these:

Array ( [#1] => 1 [#2] => 2015-05-31 [#3] => 2001-11-24 [#4] => 'Name Surname' [#5] => ID_1 [#6] => 0 )
Array ( [#1] => 2 [#2] => 2011-04-01 [#3] => ?           [#4] => ?             [#5] => ID_2 [#6] => 1 )
Array ( [#1] => 2 [#2] => 2013-02-24 [#3] => ?           [#4] => ?             [#5] => ID_3 [#6] => 1 )
Array ( [#1] => 2 [#2] => 2014-02-28 [#3] => ?           [#4] => 'Name Surname' [#5] => ID_4 [#6] => 2 )
I am looking for a way to get this result. I am using the following code:

<?php 
    //ini_set('max_execution_time', 300); //300 seconds = 5 minutes

    function startsWith($str, $char){
        return $str[0] === $char;
    }

    $txt_path = "./test.txt";
    $txt_data = @file_get_contents($txt_path) or die("Could not access file: $txt_path");
    //echo $txt_data;

    $loop_pattern = "/BEGIN(.*?)END/s";
    preg_match_all($loop_pattern, $txt_data, $matches);
    $loops = $matches[0];
    //print_r($loops);
    $loops_count = count($loops);
    //echo $loops_count; // number of loops into the file
    foreach ($loops as $key => $value) {
        $value = trim($value);
        $pattern = array("/[[:blank:]]+/", "/BEGIN(.*)/", "/END(.*)/");
        $replacement = array(" ", "", "");
        $value = preg_replace($pattern, $replacement, $value);
        //print_r($value);
        //echo "<br><br>";
        $value_array = explode("\n", $value);
        $value_array_clean = array_filter($value_array, 'strlen');
        $value_array_clean_reindex = array_values($value_array_clean);
        //print_r($value_array_clean_reindex);
        //echo "<br><br>";
        $keys = array();
        $values = array();
        foreach ($value_array_clean_reindex as $key => $value) {
            $value = trim($value);
            if ( startsWith($value, "#") ) {
                array_push($keys, $value);
                $keys_count = count($keys);
            } else {
                array_push($values, $value);
                $values_count = count($values);

                $loop_dic = array();
                foreach ($values as $key => $value) {
                    $value = trim($value);
                    preg_match_all("/'(?:.|[^'])*'|\S+/", $value, $matches);
                    //print_r($matches[0]);
                    $loop_dic = array_combine($keys, $matches[0]);
                }

                print_r($loop_dic);
                echo "<br><br>";
            }
        }
    }
?>
But sometimes a problem arises at this statement:

$loop_dic = array_combine($keys, $matches[0]);
I found out that the original text file contains very long lines that get broken, producing new lines. Instead of:

2       2014-02-28  ?           'Name Surname'      ID_4        2 
the line is broken like this:

2       2014-02-28  ?           'Name Surname'      
ID_4        2 
So when I explode the string on \n, the two arrays that I then combine have different lengths, and array_combine() fails.

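I would like to ask for an alternative way to solve this, that is, to get arrays of equal length even when such breaks occur in the original file.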

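Searching the web, I came to think that, if I know (via count) the number of keys in each loop ([#1],…,[#6]), I could loop over the values and fill each array sequentially, adding values until it reaches that maximum length.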
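The idea above can be sketched as follows. This is a minimal illustration under assumptions, not a tested solution for the real files: $keys and $data are made-up stand-ins for one BEGIN…END block, and the token pattern only knows about single-quoted values.

```php
<?php
// Sketch: rebuild fixed-width rows from data that may be wrapped across lines.
// $keys and $data are illustrative stand-ins for one BEGIN…END block.
$keys = ['#1', '#2', '#3', '#4', '#5', '#6'];
$data = "1 2015-05-31 2001-11-24 'Name Surname' ID_1 0\n"
      . "2 2014-02-28 ? 'Name Surname'\n"   // this row is broken...
      . "ID_4 2\n";                          // ...and continues here

// A token is either a single-quoted string or a run of non-whitespace,
// so line breaks between tokens no longer matter.
preg_match_all("/'[^']*'|\S+/", $data, $m);
$tokens = $m[0];

// Fill rows sequentially: every count($keys) tokens make one row.
$rows = [];
foreach (array_chunk($tokens, count($keys)) as $chunk) {
    if (count($chunk) === count($keys)) {   // skip a trailing incomplete row
        $rows[] = array_combine($keys, $chunk);
    }
}
print_r($rows);
```

Because the tokens are collected across the whole block before chunking, a row split over two physical lines ends up in a single array of the right length.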

Thanks for your attention and help.

EDIT #1

Thanks to @fusion3k for the solution! Checking how it behaves on some input files revealed two further problems:

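1) Analyzing some errors, I found that the input files sometimes use double quotes (instead of single quotes), and that they can also contain multiline text blocks between semicolons, like this: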

;This is some text
in multiline with "double 
quotes" too
;
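Such a block must be treated as a single value for the given key, and it needs to be inlined, as @fusion3k's code does, by replacing \n with a space. I am trying to merge @fusion3k's working code with code written to handle this behaviour. The file structure can look like this: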

BEGIN
#1 
#2 
#3 
#4 
#5 
#6 
1       2015-05-31  2001-11-24  "Name Surname"      ID_1        0 
2       2011-04-01  ?           ?                   ID_2        1 
2       2013-02-24  ?           ?                   ID_3        1 
2       2014-02-28  ?           "Name Surname"      ID_4        2 
;This is some text
in multiline with "double 
quotes" too
;
2016-01-22  ?           "Name Surname"      ID_5        2 
END
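The merged code should produce something similar to the working code above, while allowing for the different text-block delimiters, namely semicolons (;), single quotes (') or double quotes ("), that mark a block of text to be treated as a single value for a key. For the text file content above, that would be this array: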

Array ( [#1] => Array ( [0] => 1 [1] => 2 [2] => 2 [3] => 2 [4] => This is some text in multiline with "double quotes" too ) [#2] => Array ( [0] => 2015-05-31 [1] => 2011-04-01 [2] => 2013-02-24 [3] => 2014-02-28 [4] => 2016-01-22 ) [#3] => Array ( [0] => 2001-11-24 [1] => ? [2] => ? [3] => ? [4] => ? ) [#4] => Array ( [0] => Name Surname [1] => ? [2] => ? [3] => Name Surname [4] => Name Surname ) [#5] => Array ( [0] => ID_1 [1] => ID_2 [2] => ID_3 [3] => ID_4 [4] => ID_5 ) [#6] => Array ( [0] => 0 [1] => 1 [2] => 1 [3] => 2 [4] => 2 ) )
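I worked on a simple string to find a "working" regex that accounts for semicolons and for single or double quotes. So far I have not found a file that uses all three delimiters for text blocks; what does occur is semicolon + single quotes, semicolon + double quotes, single quotes only, or double quotes only. Ideally the solution would cope with all three delimiters in the same text file: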

$string = 'something here 
;and there
;
oh, "that\'s all!"';
$string = str_replace( "\n", " ", $string );
$origin = array("/[[:blank:]]+/", "/\"/", "/;/");
$replacement = array(" ", "\" ", "; ");
$string = preg_replace($origin, $replacement, $string);
$pattern = '/([;"])\s+/';
print_r(array_filter(preg_split( $pattern, $string ), 'strlen'));
This is the output (as desired):

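Note on the text block between semicolons: it always starts on a new line that begins with a semicolon, and it ends with a semicolon on a new line, after which another new line begins.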
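One way to handle all three delimiters in a single pass is one alternation that tries the semicolon block first, then the two quote styles, then bare tokens. The sketch below relies on the rule just stated (the opening and closing semicolons sit at the start of a line); tokenizeBlock() is a hypothetical name, and PREG_UNMATCHED_AS_NULL requires PHP 7.2 or later.

```php
<?php
// Sketch: split a data block into values, honouring three kinds of delimiter.
// tokenizeBlock() is a hypothetical helper name.
function tokenizeBlock(string $data): array
{
    $pattern = '/
        ^;(.*?)^;[^\S\n]*$   # 1: block opened and closed by a semicolon at line start
      | \'([^\']*)\'           # 2: single-quoted value
      | "([^"]*)"              # 3: double-quoted value
      | (\S+)                  # 4: bare value
    /xms';
    preg_match_all($pattern, $data, $matches, PREG_SET_ORDER | PREG_UNMATCHED_AS_NULL);

    $tokens = [];
    foreach ($matches as $m) {
        // Exactly one of the four groups took part in each match.
        foreach (array_slice($m, 1) as $group) {
            if ($group !== null) {
                // Inline multiline values: collapse whitespace runs to one space.
                $tokens[] = trim(preg_replace('/\s+/', ' ', $group));
                break;
            }
        }
    }
    return $tokens;
}

$data = "2 2014-02-28 ? \"Name Surname\" ID_4 2\n"
      . ";This is some text\nin multiline with \"double \nquotes\" too\n;\n"
      . "2016-01-22 ? 'Name Surname' ID_5 2\n";
$tokens = tokenizeBlock($data);
print_r($tokens);
```

The double quotes inside the semicolon block are left alone because the whole block is consumed by the first alternative before the quote alternatives get a chance to match. The resulting flat token list can then be cut into rows of count($keys) values.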

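I don't know whether this could be written in a better or faster way. I then tried to merge it with @fusion3k's code to process the text file content above, but without success. I tried an if/elseif/else construct like the following: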

if ( preg_match('/;(.*?);|\'(.*?)\'/', $value, $matches) ) {// semicolon with single quotes in the $value string
    $value = str_replace( "\n", " ", $value );
    $origin = array("/[[:blank:]]+/", "/'/", "/;/");
    $replacement = array(" ", "' ", "; ");
    $value = preg_replace($origin, $replacement, $value);
    $pattern = '/'.str_repeat( "([;'])\s+", count( $keys ) ).'/';
    print_r(array_filter(preg_split( $pattern, $value ), 'strlen')); // I would have an array of values of the same length of the array for the keys
    echo "<br><br>";
} elseif ( preg_match('/;(.*?);|"(.*?)"/', $value, $matches) ) {// semicolon with double quotes in the $value string
    $value = str_replace( "\n", " ", $value );
    $origin = array("/[[:blank:]]+/", "/\"/", "/;/");
    $replacement = array(" ", "\" ", "; ");
    $value = preg_replace($origin, $replacement, $value);
    $pattern = '/'.str_repeat( "([;\"])\s+", count( $keys ) ).'/';
    print_r(array_filter(preg_split( $pattern, $value ), 'strlen')); // I would have an array of values of the same length of the array for the keys
    echo "<br><br>";
} else {// neither single quotes (or double quotes) nor semicolon in the $value string
    $pattern = '/'.str_repeat( "(\S+)\s+", count( $keys ) ).'/';
    preg_match_all( $pattern, $value, $matches );
    //print_r($matches);
    //echo "<br><br>";
    $loop_dic = array_combine( $keys, array_slice( $matches, 1 ) );
    print_r( $loop_dic ); // this is good...maybe in a better way?
    echo "<br><br>";
}
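2) Not all loops in the file (a large file) are picked up by the BEGIN…END extraction, that is, the preg_match_all() call with the /BEGIN(.*?)END/s pattern. I suppose the explanation may lie in the limits below. So setting: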

ini_set('max_execution_time', 300); // 300 seconds = 5 minutes
ini_set("pcre.backtrack_limit", "100000000"); // default 100k = "100000"
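seems to solve the problem, but I don't know whether it is the only way. In fact, if the file is large (17 MB or more), the browser is unresponsive for a short while before the page finishes loading (I tested on the latest version of Firefox). It might be better to parse the file in chunks until its full size has been read, but how?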
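One possible way to read the file in chunks is to stream it line by line and hand back one BEGIN…END block at a time, so that no regular expression ever runs over the whole file. This is only a sketch: readBlocks() is a hypothetical name, and it assumes that BEGIN and END each sit on a line of their own, as in the samples above.

```php
<?php
// Sketch: stream BEGIN…END blocks from a file one at a time.
// readBlocks() is a hypothetical helper name.
function readBlocks(string $path): Generator
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Could not access file: $path");
    }
    $block = null;                        // null means "outside a block"
    try {
        while (($line = fgets($handle)) !== false) {
            $trimmed = trim($line);
            if ($trimmed === 'BEGIN') {
                $block = [];              // start collecting lines
            } elseif ($trimmed === 'END') {
                if ($block !== null) {
                    yield implode("\n", $block);   // block body without BEGIN and END
                }
                $block = null;
            } elseif ($block !== null) {
                $block[] = rtrim($line, "\r\n");
            }
        }
    } finally {
        fclose($handle);
    }
}

// Usage (the path is illustrative):
// foreach (readBlocks('./test.txt') as $value) { /* parse one block */ }
```

Only one block is held in memory at a time, which should also make the raised pcre.backtrack_limit unnecessary for the extraction step, although I have not measured this on a 17 MB file.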


Thanks a lot for your attention and help.

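To solve your problem, the usual approach is to count the retrieved matches and, if there are fewer of them than keys, to continue the loop without re-initializing $loop_dic.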
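A minimal sketch of that usual approach, with made-up sample lines. Here $pending plays the role of the array that is not re-initialized, and the token pattern is an assumption that only handles single-quoted values contained within one line.

```php
<?php
// Sketch: carry an unfinished row over to the next line instead of resetting.
$keys  = ['#1', '#2', '#3', '#4', '#5', '#6'];
$lines = [
    "1 2015-05-31 2001-11-24 'Name Surname' ID_1 0",
    "2 2014-02-28 ? 'Name Surname'",   // broken row, first half
    "ID_4 2",                          // broken row, second half
];

$width   = count($keys);
$pending = [];   // tokens collected so far for the current row
$rows    = [];

foreach ($lines as $line) {
    preg_match_all("/'[^']*'|\S+/", $line, $m);
    $pending = array_merge($pending, $m[0]);

    // Emit a row each time enough tokens have accumulated;
    // array_splice() removes the emitted tokens from $pending.
    while (count($pending) >= $width) {
        $rows[] = array_combine($keys, array_splice($pending, 0, $width));
    }
}
print_r($rows);
```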
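I suggest the reverse approach: instead of exploding the string line by line before retrieving the values, replace the newlines with spaces. Your string structure is robust enough for this, and you know the number of fields, so it should work.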

The code outside the main foreach loop does not change. Likewise, the code that retrieves the text wrapped by BEGIN…END does not change:

foreach( $loops as $key => $value ) 
{
    $value = trim( $value );
    $pattern = array( "/[[:blank:]]+/", "/BEGIN(.*)/", "/END(.*)/" );
    $replacement = array( " ", "", "" );
    $value = preg_replace( $pattern, $replacement, $value );
To retrieve the keys, we use preg_match_all(), then remove the corresponding lines with preg_replace():
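A minimal sketch of this key-extraction step, run on a shortened sample. Both patterns are my assumptions for illustration, not the answer's original code; they treat every line that starts with # as a key line.

```php
<?php
// Sketch only: the two patterns are assumptions, not the answer's original code.
$value = "#1\n#2\n#3\n1 2015-05-31 2001-11-24\n2 2011-04-01 ?\n";

// Collect every line start that is a "#" followed by non-space characters.
preg_match_all('/^#\S+/m', $value, $matches);
$keys = $matches[0];

// Remove those key lines (with trailing blanks and newline) so only data rows remain.
$value = trim(preg_replace('/^#\S+[^\S\n]*\n?/m', '', $value));

print_r($keys);
echo $value, "\n";
```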

Now $value contains only the data rows. We replace all newlines with spaces:

    $value = str_replace( "\n", " ", $value );
Then we build the row pattern by repeating a single-field pattern once per key, and retrieve all rows with preg_match_all():
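A sketch of how such a row pattern might be built. The str_repeat() construction mirrors the one quoted in the question's edit; the field pattern itself (a single-quoted string or a bare token, followed by whitespace or the end of the string) is an assumption.

```php
<?php
// Sketch: one capturing group per key, repeated count($keys) times.
$keys  = ['#1', '#2', '#3', '#4', '#5', '#6'];

// Newlines have already been replaced by spaces at this point.
$value = "1 2015-05-31 2001-11-24 'Name Surname' ID_1 0 "
       . "2 2014-02-28 ? 'Name Surname' ID_4 2";

// Assumed field pattern: a quoted string or a bare token, then whitespace or end.
$field   = "('[^']*'|\S+)(?:\s+|\z)";
$pattern = '/' . str_repeat($field, count($keys)) . '/';
preg_match_all($pattern, $value, $matches);

// $matches[0] holds the whole rows; $matches[1] to $matches[6] hold one column each.
$values = array_combine($keys, array_slice($matches, 1));
print_r($values);
```

Allowing the end of the string after the last field keeps the final row from being dropped when the text does not end with whitespace.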

Finally, we use array_slice() to drop the global match and combine the rest with $keys, which gives the desired result. The foreach loop can then be closed:

    $values = array_combine( $keys, array_slice( $matches, 1 ) );
}

The main difference between my $values and your $loop_dic is that the main $values array holds columns. If you prefer one array per row, you can easily transpose it.
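One simple way to do that transposition, shown on a shortened, made-up $values array:

```php
<?php
// Sketch: turn a column-oriented array into one associative array per row.
$values = [
    '#1' => ['1', '2'],
    '#2' => ['2015-05-31', '2011-04-01'],
    '#5' => ['ID_1', 'ID_2'],
];

$rows = [];
foreach ($values as $key => $column) {
    foreach ($column as $i => $cell) {
        $rows[$i][$key] = $cell;   // row $i, field $key
    }
}
print_r($rows);
```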


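I have tested the code with many different "broken lines" and it works. I suggest you test it carefully with different strings to check that it works in every case.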

If the lines always break just before ID_, you can preprocess the string:
$txt_data = str_replace("\nID_", "ID_", $txt_data);
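Thanks @fusion3k. No, there is no rule for where the lines break. I only know that this is the cause of the problem: the explode on \n fails because it produces arrays of different lengths.

Awesome!!! I understand your approach; I will check it against some input files tomorrow. Thanks a lot for your time!! Regards

I couldn't wait: I tested it on different files and it looks perfect and very fast!!! For a file of 170925 lines and 17569148 characters, it takes 0.2635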