In Fortran, how do I write one file into another in reverse, block of lines by block of lines, from bottom to top?
I have an ASCII file that looks like this:
____________________________________________
Header1 ...
Header2 ...
Header3 ...
block(1)data1 block(1)data2 block(1)data3
block(1)data4 block(1)data5 block(1)data6
block(2)data1 block(2)data2 block(2)data3
block(2)data4 block(2)data5 block(2)data6
...
block(n)data1 block(n)data2 block(n)data3
block(n)data4 block(n)data5 block(n)data6
____________________________________________
I want to convert it into an ASCII file that looks like this:
____________________________________________
HeaderA ...
HeaderB ...
block(n)data1 block(n)data2 block(n)data3
block(n)data4 block(n)data5 block(n)data6
block(n-1)data1 block(n-1)data2 block(n-1)data3
block(n-1)data4 block(n-1)data5 block(n-1)data6
....
block(1)data1 block(1)data2 block(1)data3
block(1)data4 block(1)data5 block(1)data6
____________________________________________
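The data are mostly real numbers, and the data set is far too large to hold in an allocatable array, so I have to read and write on the fly.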
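I cannot find a way to read or write a file backward.
I would not use Fortran directly for this, but rather a series of Linux commands (or Cygwin/GNU utilities on Windows). Fortran is also possible (see the second possibility).
Outline (based on OS commands):
- Get the total number of lines (e.g. with wc).
- Take the first 3 lines of the file (e.g. using head) and write them as the header of the result file.
- Process the body:
  - Take only the last (total - 3) lines, i.e. everything after the header (e.g. with tail).
  - Run the result through an awk script that joins related lines.
  - Run tac on the result.
  - Run another awk script that splits the lines again.
  - Append the result to the result file.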
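The command-based outline can be sketched as a small shell function. This is only an illustration under assumptions that are not in the original post: the function name reverse_blocks is made up, the header is exactly 3 lines, every block is exactly 2 lines, and the data lines contain no TAB characters (TAB is used as the temporary join marker). With `tail -n +4` the line count from wc is not needed; tac is assumed to be available (GNU coreutils).

```shell
#!/bin/sh
# Hypothetical helper: reverse the order of 2-line blocks, header first.
# Assumes 3 header lines, 2 lines per block, no TABs in the data.
reverse_blocks() {
    src=$1
    dst=$2
    # Copy the header lines to the result file.
    head -n 3 "$src" > "$dst"
    # Body = everything from line 4 on. Join each pair of lines into one
    # TAB-separated line, reverse the joined lines, then split them again.
    tail -n +4 "$src" \
        | awk 'NR % 2 { first = $0; next } { print first "\t" $0 }' \
        | tac \
        | awk -F '\t' '{ print $1; print $2 }' >> "$dst"
}
```

It would be called as `reverse_blocks data.txt reversed_data.txt`. Note that tac may need temporary disk space when its input is a pipe rather than a regular file.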
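Outline (Fortran):
- Create an array holding the starting file position of each block (i.e. the results of ftell).
- Move the headers to the new file.
- Run through the array created above from end to start:
  - Perform an fseek to the indicated position.
  - Read the relevant number of lines and write them out again.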
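The same "record the positions, then seek" idea can be tried out with shell tools before writing the Fortran version. The sketch below is an assumption-laden stand-in, not the ftell/fseek code itself: the name reverse_by_offset is hypothetical, the header is taken to be 3 lines, each block 2 lines, and every line is terminated by a single newline byte. Here awk plays the role of ftell (it records the byte offset of each block start) and `tail -c +N` plays the role of fseek.

```shell
#!/bin/sh
# Hypothetical helper: reverse 2-line blocks by seeking to recorded offsets.
# Assumes 3 header lines, 2 lines per block, "\n" line endings.
reverse_by_offset() {
    src=$1
    dst=$2
    head -n 3 "$src" > "$dst"
    # Print the 1-based byte offset of the first line of every block.
    # LC_ALL=C makes length() count bytes instead of characters.
    # tac reverses the offset list so that the last block comes first.
    LC_ALL=C awk '
        { start = pos + 1; pos += length($0) + 1 }
        NR > 3 && (NR - 3) % 2 == 1 { print start }
    ' "$src" | tac | while read -r offset; do
        # tail -c +N starts at byte N; head keeps the 2 lines of the block.
        tail -c "+$offset" "$src" | head -n 2
    done >> "$dst"
}
```

Only the list of offsets is held outside the file, never the data. Starting two processes per block is slow, so this is meant to illustrate the bookkeeping rather than to process millions of blocks.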
header(1)
header(2)
header(3)
block(1).data1 block(1).data2 block(1).data3
block(1).data4 block(1).data5 block(1).data6
block(2).data1 block(2).data2 block(2).data3
block(2).data4 block(2).data5 block(2).data6
...
block(9999998).data1 block(9999998).data2 block(9999998).data3
block(9999998).data4 block(9999998).data5 block(9999998).data6
block(9999999).data1 block(9999999).data2 block(9999999).data3
block(9999999).data4 block(9999999).data5 block(9999999).data6
With a file size of 1.2 GB, this can be reversed by the following small awk script:
#!/usr/bin/awk -f
# If the line contains the word "header", print it immediately and move on to the next line.
/header/ {print; next}
# Store every other line in memory.
{
    line[n++] = $0
}
# When finished, print them out in order n-1, n, n-3, n-2, n-5, n-4, ...
END {
    for (i=n-2; i>=0; i-=2) {
        print(line[i])
        print(line[i+1])
    }
}
This takes less than 2 minutes.
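If that really isn't possible, you need to do what @High Performance Mark said: read the file in manageable chunks, reverse each chunk in memory, and concatenate the chunks at the end. Here is my version: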
program reverse_order
    use iso_fortran_env, only: IOSTAT_END
    implicit none
    integer, parameter :: max_blocks_in_memory = 10000
    integer, parameter :: max_line_length = 100
    character(len=max_line_length) :: line
    character(len=max_line_length) :: data(2, max_blocks_in_memory)
    character(len=*), parameter :: INFILE = 'data.txt'
    character(len=*), parameter :: OUTFILE = 'reversed_data.txt'
    character(len=*), parameter :: TMP_FILE_FORMAT = '("/tmp/", I10.10,".txt")'
    character(len=len("/tmp/XXXXXXXXXX.txt")) :: tmp_file_name
    integer :: in_unit, out_unit, tmp_unit
    integer :: num_headers, i, j, tmp_file_number
    integer :: ios

    ! Open the input and output files.
    open(newunit=in_unit, file=INFILE, action="READ", status='OLD')
    open(newunit=out_unit, file=OUTFILE, action='WRITE', status='REPLACE')

    ! Transfer the headers to the output file immediately.
    num_headers = 0
    do
        read(in_unit, '(A)') line
        if (index(line, 'header') == 0) exit
        num_headers = num_headers + 1
        write(out_unit, '(A)') trim(line)
    end do

    ! We've already read the first data line, so let's rewind and start anew.
    rewind(in_unit)
    ! Move past the headers.
    do i = 1, num_headers
        read(in_unit, *)
    end do

    tmp_file_number = 0
    ! Read the data from the input file, max_blocks_in_memory blocks at a time.
    read_loop : do
        do i = 1, max_blocks_in_memory
            read(in_unit, '(A)', iostat=ios) data(1, i)
            if (ios == IOSTAT_END) then ! Reached the end of the input file.
                if (i > 1) then ! Still have final values in memory, write them
                                ! to output immediately.
                    do j = i-1, 1, -1
                        write(out_unit, '(A)') trim(data(1, j))
                        write(out_unit, '(A)') trim(data(2, j))
                    end do
                end if
                exit read_loop
            end if
            read(in_unit, '(A)') data(2, i)
        end do
        ! Read a full chunk of blocks; write it in reverse order into a temporary file.
        tmp_file_number = tmp_file_number + 1
        write(tmp_file_name, TMP_FILE_FORMAT) tmp_file_number
        open(newunit=tmp_unit, file=tmp_file_name, action="WRITE", status="NEW")
        do j = max_blocks_in_memory, 1, -1
            write(tmp_unit, '(A)') data(1, j)
            write(tmp_unit, '(A)') data(2, j)
        end do
        close(tmp_unit)
    end do read_loop

    ! Finished with input file, don't need it any more.
    close(unit=in_unit)

    ! Concatenate all the temporary files in reverse order to the output file.
    do j = tmp_file_number, 1, -1
        write(tmp_file_name, TMP_FILE_FORMAT) j
        open(newunit=tmp_unit, file=tmp_file_name, action="READ", status="OLD")
        do
            read(tmp_unit, '(A)', iostat=ios) line
            if (ios == IOSTAT_END) exit
            write(out_unit, '(A)') trim(line)
        end do
        close(tmp_unit, status="DELETE") ! Done with this file, delete it after closing.
    end do
    close(unit=out_unit)
end program reverse_order
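Well, I have an answer, but it doesn't work, possibly because of compiler bugs or because of my rudimentary understanding of file positioning in Fortran. I tried opening the input file with access='stream' and form='formatted'. That way I could push the positions of the lines onto a stack and then pop them off so that they come out in reverse order. Then, walking through the lines in reverse order, I could write them to the output file.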
program readblk
    implicit none
    integer iunit, junit
    integer i, size
    character(20) line
    type LLnode
        integer POS
        type(LLnode), pointer :: next => NULL()
    end type LLnode
    type(LLNODE), pointer :: list => NULL(), current => NULL()
    integer POS, temp(2)
    open(newunit=iunit,file='readblk.txt',status='old',access='stream',form='formatted')
    open(newunit=junit,file='writeblk.txt',status='replace')
    do i = 1, 3
        do
            read(iunit,'(a)',advance='no',EOR=10,size=size) line
            write(junit,'(a)',advance='no') line
        end do
10      continue
        write(junit,'(a)') line(1:size)
    end do
    do
        inquire(iunit,POS=POS)
        allocate(current)
        current%POS = POS
        current%next => list
        list => current
        read(iunit,'()',end=20)
    end do
20  continue
    current => list
    list => current%next
    deallocate(current)
    do while(associated(list))
        temp(2) = list%POS
        current => list%next
        deallocate(list)
        temp(1) = current%POS
        list => current%next
        deallocate(current)
        do i = 1, 2
            write(*,*) temp(i)
            read(iunit,'(a)',advance='no',EOR=30,size=size,POS=temp(i)) line
            write(junit,'(a)',advance='no') line
            do
                read(iunit,'(a)',advance='no',EOR=30,size=size) line
                write(junit,'(a)',advance='no') line
            end do
30          continue
            write(junit,'(a)') line(1:size)
        end do
    end do
end program readblk
Here is my input file:
Header line 1
Header line 2
Header line 3
1a34567890123456789012345678901234567890
1b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890
Now, with ifort, my file positions are printed as
214
256
130
172
44
88
Note that the first position is at the end of record 3 rather than at the beginning of record 4. The output file is
Header line 1
Header line 2
Header line 3
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890
1a34567890123456789012345678901234567890
With gfortran, the file positions are printed as
214
256
130
172
46
88
This time, as I expected, the first position is at the beginning of record 4. However, the output file has incorrect contents:
Header line 1
Header line 2
Header line 3
3a34567890123456789012345678901234567890
3b34567890123456789012345678901234567890
2a34567890123456789012345678901234567890
2b34567890123456789012345678901234567890
3a34567890123456789012345678901234567890
3b345678901234567890123456789012341a34567890123456789012345678901234567890
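I had hoped for a more positive result. I don't know whether my results are due to bad programming or to compiler bugs, but I am posting this in case someone else can make use of my pure-Fortran solution.
Welcome to Stack Overflow! Could you please post an example of what you have tried and say specifically where you are stuck?
Why did you tag this out-of-memory? How big is the file?
Read lines 1..n, where n is determined by the size of your working memory. Write them in "reverse" order to temporary file 1. Repeat until you have a bunch of temporary files, then concatenate them in the right order.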
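The chunk-and-concatenate suggestion in the last comment can also be sketched with shell tools. This is a rough illustration under assumptions not stated in the thread: the function name reverse_in_chunks is hypothetical, the header is 3 lines, each block is 2 lines, the chunk size passed in is an even number of lines, the data contain no TAB characters, and mktemp -d and tac are available.

```shell
#!/bin/sh
# Hypothetical helper: reverse 2-line blocks chunk by chunk via temp files.
# $3 = lines per chunk; must be even so that no block is split.
reverse_in_chunks() {
    src=$1
    dst=$2
    lines_per_chunk=$3
    work=$(mktemp -d)
    head -n 3 "$src" > "$dst"
    # Cut the body into temporary chunk files: chunk_aa, chunk_ab, ...
    tail -n +4 "$src" | split -l "$lines_per_chunk" - "$work/chunk_"
    # Visit the chunks last to first; inside each chunk, join line pairs
    # with a TAB, reverse the joined lines, then split them again.
    for f in $(ls "$work" | sort -r); do
        paste - - < "$work/$f" | tac | tr '\t' '\n'
    done >> "$dst"
    rm -rf "$work"
}
```

A call such as `reverse_in_chunks data.txt reversed_data.txt 20000` would keep at most 10000 blocks per chunk, mirroring max_blocks_in_memory in the Fortran program; the chunk size is the knob that trades memory for the number of temporary files.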