使用PHP删除.txt文件中的重复行_Php_Unix_For Loop_Fopen

使用PHP删除.txt文件中的重复行

php unix for-loop

使用PHP删除.txt文件中的重复行,php,unix,for-loop,fopen,Php,Unix,For Loop,Fopen,我有多个带有目录的txt文件。文本文件都包含相同的标题。我正在读取所有的txt文件，并将其全部输出到一个文件中因为每个单独的文件都包含相同的头，所以它会将它们全部插入到新的合并文件中。如何删除新合并文件中的所有标题并将其中一个保留在顶部我一直在研究unix中的sort命令 sort filename | uniq 此命令起作用，但删除所有其他重复数据。是否只删除特定字符串“thisaheader”而将其保留在顶部当前代码 $header = array( "XX-XXXXXXXXX-XX

我有多个带有目录的txt文件。文本文件都包含相同的标题。我正在读取所有的txt文件，并将其全部输出到一个文件中

因为每个单独的文件都包含相同的头，所以它会将它们全部插入到新的合并文件中。如何删除新合并文件中的所有标题并将其中一个保留在顶部

我一直在研究unix中的sort命令

sort filename | uniq

此命令起作用，但删除所有其他重复数据。是否只删除特定字符串“thisaheader”而将其保留在顶部

当前代码

$header = array( "XX-XXXXXXXXX-XXXXXXX-X        XXXXXXXXXXXX" );


$files = glob( "/path/to/folder/*.txt" );

$output_file = "newfile_".date( "YmdHis" ).".txt";

$out = fopen( $output_file, "w" );

foreach( $header as $inputHeader ) {

    fwrite( $out, $inputHeader );
}

    foreach( $files as $file ) {

        $in = fopen( $file, "r" );

            while ( $line = fgets( $in ) ) {

                if( $header !== $line ) {

                    fwrite( $out, $line );

                }

            }

        fclose( $in );

     }

fclose( $out );

//the headers that were in the file with duplicates
$header1 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA111;
$header2 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA222";
$header3 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA333";
$header4 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA444";

//get all the files to be merged
$files = glob( "/PATH/TO/FILES/*.txt" );

//set the output filename
$output_file = "NewFile".date( "YmdHis" ).".txt";

//open the output file
$out = fopen( $output_file, "w" );

    //loop through the files to be merged
    foreach( $files as $file ) {

        //open each file
        $in = fopen( $file, "r" );

            //while each line in each file
            while ( $line = fgets( $in ) ) {

                //if the current line is not equal to header1, header2, header3 or header4
                if( preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header1 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header2 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header3 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header4 ) ) {

                       //write that line to the output file
                       fwrite( $out, $line );

                       //echo $line."\n";

                }else{
                       //write blank line to the file
                       fwrite( $out, "\n" );

                  }

            }
        //close the file
        fclose( $in );

     }
//close the output file
fclose( $out );

//get the contents of the output file
$header1 .= file_get_contents( $output_file );

//add the header to the top of the output file
file_put_contents( $output_file, $header1 );

重复多次的行

试着在开始书写时输入标题，然后在阅读行时检查标题

//cache our header lines
$header = "Header line";

$files = glob( "/path/to/files*.txt" );

//print_r($files);

$output_file = "newfile".date( "YmdHis" ).".txt";

$out = fopen( $output_file, "w" );

//input the header line at the top of our new file

fwrite( $out, $header);



foreach( $files as $file ) {

    $in = fopen( $file, "r" );

        while ( $line = fgets( $in ) ) {
            //header check, dont output header lines to new file
            if($header !== preg_replace('/\s+/', '', $line)){
                 fwrite( $out, $line );
            }
        }

    fclose( $in );
}

fclose( $out );

创建新文件后，添加此行将删除重复的行

$lines = array_unique(file("your_file.txt"));

如果文件只有1个头

$header_exist = false;

foreach($files as $file) {

  $in = fopen($file, "r");

  while($line = fgets($in)) {
    if(strpos($line, "This is a header") === false) {
      fwrite($out, $line);
    }
    else {
      if($header_exist === false) {
        $header_exist = true;
        fwrite($out, $line);
      }
    }
  }
  fclose($in);
}

所以我在@WillParky93的帮助下解决了这个问题。我在文件中有4个不同的头，它们都是重复的。在使用逻辑运算符之后

最终代码

$header = array( "XX-XXXXXXXXX-XXXXXXX-X        XXXXXXXXXXXX" );


$files = glob( "/path/to/folder/*.txt" );

$output_file = "newfile_".date( "YmdHis" ).".txt";

$out = fopen( $output_file, "w" );

foreach( $header as $inputHeader ) {

    fwrite( $out, $inputHeader );
}

    foreach( $files as $file ) {

        $in = fopen( $file, "r" );

            while ( $line = fgets( $in ) ) {

                if( $header !== $line ) {

                    fwrite( $out, $line );

                }

            }

        fclose( $in );

     }

fclose( $out );

//the headers that were in the file with duplicates
$header1 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA111;
$header2 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA222";
$header3 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA333";
$header4 = "DD-LLDRHD045-UHSTAYL-MR        LOCKFMDLA444";

//get all the files to be merged
$files = glob( "/PATH/TO/FILES/*.txt" );

//set the output filename
$output_file = "NewFile".date( "YmdHis" ).".txt";

//open the output file
$out = fopen( $output_file, "w" );

    //loop through the files to be merged
    foreach( $files as $file ) {

        //open each file
        $in = fopen( $file, "r" );

            //while each line in each file
            while ( $line = fgets( $in ) ) {

                //if the current line is not equal to header1, header2, header3 or header4
                if( preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header1 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header2 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header3 )&&
                    preg_replace('/\s+/', '', $line ) !=
                    preg_replace('/\s+/', '', $header4 ) ) {

                       //write that line to the output file
                       fwrite( $out, $line );

                       //echo $line."\n";

                }else{
                       //write blank line to the file
                       fwrite( $out, "\n" );

                  }

            }
        //close the file
        fclose( $in );

     }
//close the output file
fclose( $out );

//get the contents of the output file
$header1 .= file_get_contents( $output_file );

//add the header to the top of the output file
file_put_contents( $output_file, $header1 );

“这个命令可以工作，但会删除所有其他重复的数据”-确切地说，这就是该命令应该做的。“是否只删除特定字符串“This a header”但将其保留在顶部？”当然，您回答了自己的问题，

uniq

也会这样做：尝试此操作后，它会停止输出文件。我做错什么了吗？运行

error\u reporting（E\u ALL）的输出是什么；ini设置（“显示错误”，1）在页面顶部？它不会给出任何错误。我将$header标记从您输入的更改为我要查找的字符串。对吗？$header仍然是数组吗？你能贴上你新的$header标签吗？（如果必须，请忽略个人数据）是的，标题标记仍然是一个数组。但是，我注意到，当我运行脚本时，它在某个地方陷入了一个循环中。在fclose之前和foreach语句之后很高兴看到您已经解决了问题！现在我们应该重新考虑使用数组，因为我们知道我们的页眉没有对齐，这将有助于提高可读性，并为您添加/删除标题中的灵活性提供更多的帮助。future@WillParky93我会那样做的。谢谢你的帮助。