Php 将具有特定图案的行分组为一行作为csv文本文件_Php_Csv_Parsing_Multidimensional Array_Text Parsing

Php 将具有特定图案的行分组为一行作为csv文本文件

php csv parsing

Php 将具有特定图案的行分组为一行作为csv文本文件,php,csv,parsing,multidimensional-array,text-parsing,Php,Csv,Parsing,Multidimensional Array,Text Parsing,我正在为文本数据编写解析器。我几乎完成了。。。但现在轮到php脚本在php版本为5.3.13的服务器上运行了。而且没有办法升级。所以我试着重新写剧本，但是。。。我想我把它弄坏了。这根本不起作用首先，这里是我需要解析的源文本数据： 27 may 15:28 Id: 42 #1 Random Text Info: 3 Location: Street Guests: 2 (Text header 1) Apple 15 (T

我正在为文本数据编写解析器。我几乎完成了。。。但现在轮到php脚本在php版本为5.3.13的服务器上运行了。而且没有办法升级。所以我试着重新写剧本，但是。。。我想我把它弄坏了。这根本不起作用

首先，这里是我需要解析的源文本数据：

27 may 15:28 Id: 42 #1 Random Text

Info: 3 Location: Street Guests: 2              



(Text header 1) Apple                    15
(Text header 2) Milk          2
(Text header 1) Ice cream                   4
(Text header 3) Pencil            1
(Text header 1) Box                    1
   (Text header 2) Cardboard                 x1
   (Text header 3) White                 x1
   (Text header 1) Cube              x1
(Text header 1) Phone     1
   (Text header 1) Specific text                x1
   (Text header 1) Symbian                x1

其次，这里是所需的输出，我需要的结果文本文件：

42 ; 15:28
Apple ; 15 ; NOHANDLE ; NOHANDLE
Milk ; 2 ; NOHANDLE ; NOHANDLE
Ice cream ; 4 ; NOHANDLE ; NOHANDLE
Pencil ; 1 ; NOHANDLE ; NOHANDLE
Box ; 1 ; Cardboard, White, Cube ; NOHANDLE
Phone ; 1 ; Symbian ; Specific text

NOHANDLE是必需的，因为正如您所看到的，它是一个CSV文件。为了使CSV正常工作，每行需要具有相同的列数。因此，每当没有“child”字符串时，我都必须添加NOHADLE

最后，这里是我试图以正确的方式获得工作的代码：

<?php

$data = trim(file_get_contents('inbox_file_utf8_clean.txt'));


$all_lines = preg_split("/\r?\n/", $data);
$date_id_line = array_shift($all_lines);
if(!preg_match('/^\d+\s\w+\s(?<time>\d+:\d+)\sId:\s(?<id>\d+).*/', $date_id_line, $matches)) {
  trigger_error('Failed to match ID and timestamp', E_USER_ERROR);
}
$output_data = array(
  'info' => array(
    'id' => $matches['id'],
    'time' => $matches['time']
  ),
  'data' => array()
);

$all_text_headers = array_values(preg_grep('/^\s*\(/', $all_lines));

// The first "Text header" is a parent.
// Count the number of leading whitespaces to determine other parents
preg_match('/^\x20*/', $all_text_headers[0], $leading_space_matches);
$leading_spaces = $leading_space_matches[0];
$num_leading_spaces = strlen($leading_spaces);
$parent_lead = str_repeat(' ', $num_leading_spaces) . '(';
$parent = NULL;
foreach($all_text_headers as $index => $header_line) {
  array($lead, $item_value) = explode( ") ", $header_line);
  array($topic, $topic_count) = array_map('trim',
    preg_split('/\s{2,}/', $item_value, -1, PREG_SPLIT_NO_EMPTY)
  );

  $topic_count = (int) $topic_count;

  if($is_parent = ($parent === NULL || strpos($lead, $parent_lead) === 0)) {
    $parent = $topic;
  }

  // This only goes one level deep
  if($is_parent) {
    $output_data['data'][$parent] = array(
      'values' => array(),
      'count' => $topic_count
    );
  } else {
    $output_data['data'][$parent]['values'][] = $topic;
  }
};

$csv_delimiter = ';';

$handle = fopen('output_file.csv', 'wb');

fputcsv($handle, array_values($output_data['info']), $csv_delimiter);

foreach($output_data['data'] as $key => $values) {

  $row = [
    $key,
    $values['count'],
    implode(', ', $values['values']) ?: 'NOHANDLE',
    'NOHANDLE'
  ];
  fputcsv($handle, $row, $csv_delimiter);
}

fclose($handle);

?>

你说得对，你必须使用array（）而不是[]

和错误线

array($lead, $item_value) = explode( ") ", $header_line);

必须是这样的：

list($lead, $item_value) = explode(') ', $header_line);

在下一行中，您必须使用list（）

我试图做出所有更正：

<?php

$data = trim(file_get_contents('inbox_file_utf8_clean.txt'));


$all_lines = preg_split("/\r?\n/", $data);
$date_id_line = array_shift($all_lines);
if(!preg_match('/^\d+\s\w+\s(?<time>\d+:\d+)\sId:\s(?<id>\d+).*/', $date_id_line, $matches)) {
  trigger_error('Failed to match ID and timestamp', E_USER_ERROR);
}
$output_data = array(
  'info' => array(
    'id' => $matches['id'],
    'time' => $matches['time']
  ),
  'data' => array()
);

$all_text_headers = array_values(preg_grep('/^\s*\(/', $all_lines));

// The first "Text header" is a parent.
// Count the number of leading whitespaces to determine other parents
preg_match('/^\x20*/', $all_text_headers[0], $leading_space_matches);
$leading_spaces = $leading_space_matches[0];
$num_leading_spaces = strlen($leading_spaces);
$parent_lead = str_repeat(' ', $num_leading_spaces) . '(';
$parent = NULL;
foreach($all_text_headers as $index => $header_line) {
  list($lead, $item_value) = explode(') ', $header_line);
  list($topic, $topic_count) = array_map('trim',
    preg_split('/\s{2,}/', $item_value, -1, PREG_SPLIT_NO_EMPTY)
  );

  $topic_count = (int) $topic_count;

  if($is_parent = ($parent === NULL || strpos($lead, $parent_lead) === 0)) {
    $parent = $topic;
  }

  // This only goes one level deep
  if($is_parent) {
    $output_data['data'][$parent] = array(
      'values' => array(),
      'count' => $topic_count
    );
  } else {
    $output_data['data'][$parent]['values'][] = $topic;
  }
};

$csv_delimiter = ';';

$handle = fopen('output_file.csv', 'wb');

fputcsv($handle, array_values($output_data['info']), $csv_delimiter);

foreach($output_data['data'] as $key => $values) {

  $row = array(
    $key,
    $values['count'],
    implode(', ', $values['values']) ?: 'NOHANDLE',
    'NOHANDLE'
  );
  fputcsv($handle, $row, $csv_delimiter);
}

fclose($handle);

?>

数组（$lead，$item\u value）=分解（“）”，$header\u行）这是无效的语法。嗯。。。如何纠正？欢迎来到SO！你的要求有点不清楚。您能解释一下决定子项是逗号分隔还是分号分隔的逻辑吗？看起来两个共享“文本标题1”时，它们是以分号分隔的，但我不知道在更复杂的场景中应该如何组织它们。它是有效的！谢谢但并没有为“子”行指定编号。
<?php

$data = trim(file_get_contents('inbox_file_utf8_clean.txt'));


$all_lines = preg_split("/\r?\n/", $data);
$date_id_line = array_shift($all_lines);
if(!preg_match('/^\d+\s\w+\s(?<time>\d+:\d+)\sId:\s(?<id>\d+).*/', $date_id_line, $matches)) {
  trigger_error('Failed to match ID and timestamp', E_USER_ERROR);
}
$output_data = array(
  'info' => array(
    'id' => $matches['id'],
    'time' => $matches['time']
  ),
  'data' => array()
);

$all_text_headers = array_values(preg_grep('/^\s*\(/', $all_lines));

// The first "Text header" is a parent.
// Count the number of leading whitespaces to determine other parents
preg_match('/^\x20*/', $all_text_headers[0], $leading_space_matches);
$leading_spaces = $leading_space_matches[0];
$num_leading_spaces = strlen($leading_spaces);
$parent_lead = str_repeat(' ', $num_leading_spaces) . '(';
$parent = NULL;
foreach($all_text_headers as $index => $header_line) {
  list($lead, $item_value) = explode(') ', $header_line);
  list($topic, $topic_count) = array_map('trim',
    preg_split('/\s{2,}/', $item_value, -1, PREG_SPLIT_NO_EMPTY)
  );

  $topic_count = (int) $topic_count;

  if($is_parent = ($parent === NULL || strpos($lead, $parent_lead) === 0)) {
    $parent = $topic;
  }

  // This only goes one level deep
  if($is_parent) {
    $output_data['data'][$parent] = array(
      'values' => array(),
      'count' => $topic_count
    );
  } else {
    $output_data['data'][$parent]['values'][] = $topic;
  }
};

$csv_delimiter = ';';

$handle = fopen('output_file.csv', 'wb');

fputcsv($handle, array_values($output_data['info']), $csv_delimiter);

foreach($output_data['data'] as $key => $values) {

  $row = array(
    $key,
    $values['count'],
    implode(', ', $values['values']) ?: 'NOHANDLE',
    'NOHANDLE'
  );
  fputcsv($handle, $row, $csv_delimiter);
}

fclose($handle);

?>