Removing duplicate lines from a .txt file using PHP
I have multiple txt files in a directory. The text files all contain the same header. I read all of the txt files and write their contents into a single file. Because every individual file contains the same header, all of those headers end up in the new merged file. How can I remove all of the headers from the merged file and keep just one at the top? I have been looking at the Unix sort command:
sort filename | uniq
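This command works, but it also removes all of the other duplicate data. Is there a way to remove only the specific string "thisaheader" while keeping one copy of it at the top?
Current code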
$header = array( "XX-XXXXXXXXX-XXXXXXX-X XXXXXXXXXXXX" );
$files = glob( "/path/to/folder/*.txt" );
$output_file = "newfile_".date( "YmdHis" ).".txt";
$out = fopen( $output_file, "w" );
foreach( $header as $inputHeader ) {
fwrite( $out, $inputHeader );
}
foreach( $files as $file ) {
$in = fopen( $file, "r" );
while ( $line = fgets( $in ) ) {
if( $header !== $line ) {
fwrite( $out, $line );
}
}
fclose( $in );
}
fclose( $out );
Lines that are repeated multiple times
Try writing the header when you start writing the output, then check for the header as you read each line.
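//cache our header line
$header = "Header line";
$files = glob( "/path/to/files*.txt" );
//print_r($files);
$output_file = "newfile".date( "YmdHis" ).".txt";
$out = fopen( $output_file, "w" );
//write the header line, followed by a line break, at the top of our new file
fwrite( $out, $header.PHP_EOL );
//strip whitespace from the header once so it can be compared with stripped lines
$header_stripped = preg_replace('/\s+/', '', $header);
foreach( $files as $file ) {
    $in = fopen( $file, "r" );
    //compare against false so a final line containing only "0" does not end the loop early
    while ( ( $line = fgets( $in ) ) !== false ) {
        //header check, don't output header lines to the new file
        if( $header_stripped !== preg_replace('/\s+/', '', $line) ){
            fwrite( $out, $line );
        }
    }
    fclose( $in );
}
fclose( $out );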
After creating the new file, adding this line will remove the duplicate lines:
$lines = array_unique(file("your_file.txt"));
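Note that the line above only builds a de-duplicated array in memory; it does not change the file, and like sort | uniq it drops every repeated line, not just the headers (although it keeps the original order). A minimal sketch of writing the result back, assuming that dropping all repeated lines is acceptable; the function name dedupe_file_lines is hypothetical:

```php
<?php
// Hypothetical helper: rewrites $path so that each distinct line appears once,
// keeping the first occurrence and the original order.
function dedupe_file_lines(string $path): void
{
    // FILE_IGNORE_NEW_LINES strips the line endings so comparisons are exact.
    $lines = file($path, FILE_IGNORE_NEW_LINES);
    if ($lines === false) {
        throw new RuntimeException("Cannot read $path");
    }
    $unique = array_unique($lines);
    // Re-add a line ending after every line that is kept.
    $body = $unique === [] ? '' : implode(PHP_EOL, $unique) . PHP_EOL;
    file_put_contents($path, $body);
}
```

Because this removes repeated data rows as well, it only fits the case where the header is the only line expected to repeat.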
If the files have only one header:
$header_exist = false;
foreach($files as $file) {
$in = fopen($file, "r");
while($line = fgets($in)) {
if(strpos($line, "This is a header") === false) {
fwrite($out, $line);
}
else {
if($header_exist === false) {
$header_exist = true;
fwrite($out, $line);
}
}
}
fclose($in);
}
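The loop above relies on $files and $out being defined elsewhere. As a rough, self-contained sketch of the same idea (keep the first header line, skip later ones), it could be wrapped as follows; the function name merge_keep_first_header and its parameters are hypothetical, and each input file is assumed to end with a line break:

```php
<?php
// Hypothetical wrapper around the loop above: merges $files into $outputFile
// and keeps only the first line that contains $headerText.
function merge_keep_first_header(array $files, string $outputFile, string $headerText): void
{
    $out = fopen($outputFile, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $outputFile for writing");
    }
    $headerSeen = false;
    foreach ($files as $file) {
        $in = fopen($file, 'r');
        if ($in === false) {
            continue; // skip inputs that cannot be opened
        }
        while (($line = fgets($in)) !== false) {
            $isHeader = strpos($line, $headerText) !== false;
            if ($isHeader && $headerSeen) {
                continue; // drop every header after the first one
            }
            $headerSeen = $headerSeen || $isHeader;
            fwrite($out, $line);
        }
        fclose($in);
    }
    fclose($out);
}
```

It might be called as merge_keep_first_header(glob("/path/to/folder/*.txt"), $output_file, "This is a header"). Since strpos matches substrings, any data row that happens to contain the header text would also be treated as a header.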
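So I solved this with the help of @WillParky93. The files contained 4 different headers, and all of them were repeated. I ended up combining the checks with logical operators.
Final code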
//the headers that were in the file with duplicates
$header1 = "DD-LLDRHD045-UHSTAYL-MR LOCKFMDLA111";
$header2 = "DD-LLDRHD045-UHSTAYL-MR LOCKFMDLA222";
$header3 = "DD-LLDRHD045-UHSTAYL-MR LOCKFMDLA333";
$header4 = "DD-LLDRHD045-UHSTAYL-MR LOCKFMDLA444";
//get all the files to be merged
$files = glob( "/PATH/TO/FILES/*.txt" );
//set the output filename
$output_file = "NewFile".date( "YmdHis" ).".txt";
//open the output file
$out = fopen( $output_file, "w" );
//loop through the files to be merged
foreach( $files as $file ) {
//open each file
$in = fopen( $file, "r" );
//while each line in each file
while ( $line = fgets( $in ) ) {
//if the current line is not equal to header1, header2, header3 or header4
if( preg_replace('/\s+/', '', $line ) !=
preg_replace('/\s+/', '', $header1 )&&
preg_replace('/\s+/', '', $line ) !=
preg_replace('/\s+/', '', $header2 )&&
preg_replace('/\s+/', '', $line ) !=
preg_replace('/\s+/', '', $header3 )&&
preg_replace('/\s+/', '', $line ) !=
preg_replace('/\s+/', '', $header4 ) ) {
//write that line to the output file
fwrite( $out, $line );
//echo $line."\n";
}else{
//write blank line to the file
fwrite( $out, "\n" );
}
}
//close the file
fclose( $in );
}
//close the output file
fclose( $out );
//get the contents of the output file
$header1 .= file_get_contents( $output_file );
//add the header to the top of the output file
file_put_contents( $output_file, $header1 );
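"This command works, but it also removes all of the other duplicate data" - exactly, that is what the command is supposed to do. "Is there a way to remove only the specific string 'This a header' but keep it at the top?" Sure, and you have answered your own question: uniq will do that too.
After trying this, it stops outputting the file. Am I doing something wrong?
What is the output if you put error_reporting(E_ALL); ini_set('display_errors', 1); at the top of the page?
It doesn't give any errors. I changed the $header value from what you entered to the string I'm looking for. Is that right?
Is $header still an array? Can you post your new $header? (Leave out any personal data if you have to.)
Yes, the header is still an array. However, I noticed that when I run the script it gets stuck in a loop somewhere, before the fclose and after the foreach statement.
Glad to see you have solved the problem! Now we should reconsider using an array, since we know our headers do not line up exactly. That would improve readability and give you more flexibility to add or remove headers in the future.
@WillParky93 I'll do that. Thanks for your help.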
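The array-based refactor suggested in the last comments might look roughly like the sketch below. It is an illustration rather than the code the asker ended up with: the names normalize_line and merge_without_headers are hypothetical, and unlike the final code it drops header lines instead of replacing them with blank lines and writes every listed header once at the top.

```php
<?php
// Strip all whitespace so headers still match when their spacing differs.
function normalize_line(string $line): string
{
    return (string) preg_replace('/\s+/', '', $line);
}

// Hypothetical helper: merges $files into $outputFile, writing each entry of
// $headers once at the top and skipping it wherever it occurs in the inputs.
function merge_without_headers(array $files, string $outputFile, array $headers): void
{
    $normalized = array_map('normalize_line', $headers);
    $out = fopen($outputFile, 'w');
    if ($out === false) {
        throw new RuntimeException("Cannot open $outputFile for writing");
    }
    foreach ($headers as $header) {
        fwrite($out, $header . PHP_EOL);
    }
    foreach ($files as $file) {
        $in = fopen($file, 'r');
        if ($in === false) {
            continue; // skip inputs that cannot be opened
        }
        while (($line = fgets($in)) !== false) {
            // A single in_array() lookup replaces the chain of && comparisons.
            if (!in_array(normalize_line($line), $normalized, true)) {
                fwrite($out, $line);
            }
        }
        fclose($in);
    }
    fclose($out);
}
```

Adding or removing a header then only means changing the array passed in, instead of editing the if condition.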