In-memory storage of blocks in a shell script

shell awk

I need some advice on the following. I wrote a script; here is the algorithm:

Step 1: Get the RID's from file 1 and store that in "temp1"
Step 2: With the help of "temp1" get the corresponding blocks from file 2 and accumulate it
Step 3: With the help of the accumulated block, get the counts of each of the fields

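Code

The problem I am facing

The program runs slowly because of the I/O operations. As you can see, step 1 generates the temporary file TEMP 1, and in step 2 the file TEMP 2 is appended to many times, so this heavy I/O makes my program slow.

Solution

The solution is to keep temp1 and temp2 as in-memory variables.

Advice needed

To keep temp1 in memory, I need to know how to read the output from the terminal and store it in an array. Can you tell me how to do that?

Similarly, I need to store the output obtained for temp2 in an array.

Could you help me with this? Thanks, everyone.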

SAMPLE DATA
**TEMP 1**
RID 1= 472349723478923489
RID 2= 672349723478923489
RID 3= 772349723478923489
RID 4= 872349723478923489
RID 5= 972349723478923489
RID 6= 372349723478923489

**FILE 1**
asjdghasdh23712893712983712893qwsdhaksdhask **RID 1= 472349723478923489**

**FILE 2**
Starting of block 1
time
date 
hour
parameter 1
parameter 2
parameter 3
RID 1= 472349723478923489
parameter 3
parameter 4
parameter 5
Ending of block 1

Starting of block 2
time
date 
hour
parameter 1
parameter 2
parameter 3
RID 57= 3423423423423234
parameter 3
parameter 4
parameter 5
Ending of block 2

Starting of block 3
time
date 
hour
parameter 1
parameter 2
parameter 3
RID 3= 772349723478923489
parameter 3
parameter 4
parameter 5
Ending of block 3

**TEMP 2**
Block 1 and block 3 from file 2. Block 2 has RID 57, which is not present in TEMP 1, so it is not included here.

This might work for you:

awk 'NR==FNR{requestArray[$0];next};$0 in requestArray{print "Found"}' TEMP1 FILE2

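Explanation:

TEMP1 is read once at the start and stored in an array. NR and FNR are variables maintained by awk. Both are incremented each time a record is read, except that FNR is reset whenever the input file changes, so NR==FNR is true only while the first file is being read.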
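To make the NR/FNR behaviour concrete, here is a small sketch that prints both counters for every record of two files. The file names and contents are made up purely for illustration.

```shell
# Hypothetical demo files; names and contents are assumptions for illustration.
tmpdir=$(mktemp -d)
printf 'a\nb\n' > "$tmpdir/first.txt"
printf 'c\nd\ne\n' > "$tmpdir/second.txt"

# NR counts records across all input files; FNR restarts at 1 for each file.
nr_fnr_demo=$(awk '{ print NR, FNR, $0 }' "$tmpdir/first.txt" "$tmpdir/second.txt")
printf '%s\n' "$nr_fnr_demo"

rm -rf "$tmpdir"
```

The two counters agree on the lines of the first file and differ from the first line of the second file onward, which is why `NR==FNR{...;next}` runs only for the first file.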

Another approach is to use a BEGIN block.
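A minimal sketch of what the BEGIN-block variant could look like, assuming the keys are whole lines as in the one-liner above. The variable names `keyfile`, `line` and `wanted` are hypothetical, and the sample files are shortened versions of the data in the question.

```shell
# Shortened sample data; the temp directory layout is an assumption.
tmpdir=$(mktemp -d)
printf 'RID 1= 472349723478923489\nRID 3= 772349723478923489\n' > "$tmpdir/TEMP1"
printf 'time\nRID 1= 472349723478923489\nRID 57= 3423423423423234\nRID 3= 772349723478923489\n' > "$tmpdir/FILE2"

begin_matches=$(awk -v keyfile="$tmpdir/TEMP1" '
BEGIN {
    # Load every line of the key file into an array before any main input is read.
    while ((getline line < keyfile) > 0) wanted[line]
    close(keyfile)
}
# Main input is FILE2 only: print each line that is a key of the array.
$0 in wanted
' "$tmpdir/FILE2")
printf '%s\n' "$begin_matches"

rm -rf "$tmpdir"
```

Compared with the NR==FNR idiom, this keeps the key file out of the main input stream, so it also behaves sensibly when the key file is empty.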

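Now, since you want to eat spaghetti through your nose without pain, we are going to read your mind and guess what you want to do. We have two options: either we grind the spaghetti into a liquid, or you open your mouth.

Still, I will guess what the problem might be. file1 is some kind of structured data that contains a key-value pair named RID. I don't know what it is, what it does, or what it represents, but reading between the lines, file2 has a corresponding entry.

Now, file2 is also structured in some way. It looks as if some key-value pairs are grouped into some form of structured block, and there are multiple blocks.

Now the question is: what is our end goal? Reading between the lines, I guess you want the whole block when one key-value pair in the block, namely the RID key, matches some condition. Is that correct?

So what exactly is the condition? My guess is that you want to select the whole block from file 2 when its RID is present in file 1.

Then we have two options. Assuming your file2 is smaller than 9000 PB and the number of entries is also less than what an IEEE 52-bit fp can handle, store it in memory: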

echo ""| awk '
function cmd( E, A, this,v){ A[0]=0;while((E |getline v)>0)A[A[0]+=1]=v;A["RETURN_CODE"]=close(E);}
function grep( o, re, p, B, this, a,v ){
 B[0]=0;if(o~"-v"){while((getline v < p)>0){if(!match(v,re))B[B[0]+=1]=v;}return B[0];};
 if(o~"-o"){while((getline v < p)>0){a=v;while(match(a,re)){B[B[0]+=1]=substr(a,RSTART,RLENGTH);
 a=substr(a,RSTART+RLENGTH);}};return B[0];};while((getline v < p)>0){if(match(v,re))B[B[0]+=1]=v;}return B[0];
}
function dbg_printarray(ary , x , s,e, this , i ){x=(x=="")?"A":x;for(i=((s)?s:1);i<=((e)?e:ary[0]);i++){print x"["i"]=["ary[i]"]"}}
function agrep( o, re, A, B, this, a, i,k ){
 B[0]=0;k=0;if(o~"-v"){for(i=1;i<=A[0];i++){if(!match(A[i],re)) B[k+=1]=A[i];}B[0]=k;return k;};
 if(o~"-o"){for(i=1;i<=A[0];i++){a=A[i];while(match(a,re)){B[k+=1]=substr(a,RSTART,RLENGTH);a=substr(a,RSTART+RLENGTH);};
 };B[0]=k;return k;};for(i=1;i<=A[0];i++){if(match(A[i],re))B[k+=1]=A[i];};B[0]=k;return k;
}
{
    SAFETY_SINCE_WE_WALK_IN_THE_DARK=2000;
    input="file1";
    lookup_file="file2";
    output="output.data";
    we_died = 0;
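    # Instead of the -i option to zgrep, use the [Aa][Nn][Tt] way of representation.
    # Note: the sample data in the question uses the form "RID 1= 472349723478923489",
    # so PATTERN has to be adjusted to whatever format the real files use.
    PATTERN = "[Rr][Ii][Dd]=[0-9a-zA-Z]*";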
    if(for_gods_sake_we_are_scanning_normal_uncompressed_content){
        grep("-o", PATTERN , input , A);
    }else{
        cmd("zgrep -o \""PATTERN"\" \""input"\" ",A);
    }
    # Now, A has matching data. A[0] holds total. A[1] to A[A[0]] holds data.

    # Let's read lookup_file, one block at a time.
    # Since you did not give any specific characteristics of file2, we cannot optimize in any way.
    # Oh well.
    while((getline v < lookup_file)>0){
        # Throw away head until we reach a valid block header
        if(v!~"^Starting of block ") continue;
        # We are inside a block. Take the trailing number of the header line (held in v) as the block id.
        blockid = substr(v,match(v,"[0-9]+$"));
        # Get whatever data is inside the block until we reach the end.
        c=0;
        delete B;
        B[0]=0;
        B[B[0]+=1]=blockid;
        while(((getline v < lookup_file)>0)&&v!~"^Ending of block" && c < SAFETY_SINCE_WE_WALK_IN_THE_DARK){
            B[B[0]+=1]=v;
            if(v~"RID"){
                # store it so we can later play with it
                B["RID"]=v;
            };
            c++; # Safety counter: stops the loop if the end-of-block marker is missing.
        }
        # We either died or reached the end of the block.
        if(c >= SAFETY_SINCE_WE_WALK_IN_THE_DARK){
             we_died = 1;
            break;
       }
       # We assume B has whole block. B[0] has total. B[1] .. B[B[0]] has data. B["RID"] has RID for fast reference.
       # Now, since the data format of file2 is not explained at all, I am guessing
       # A[n] == "RID=DEADBEEF"
       # and
       # B["RID"] == "RID=DEADBEEF"
       # holds true, which is totally unlikely. what if it is "RID          =     \t\t\t DeadBeEf"
       # so this is really impossible to guess as the OP is not even sure what format they are using.
       #   sub("^[Rr][Ii][Dd][ \t]*=[ \t]*","",B["RID"])
       # or something should be done so we can compare the damn thing.
       matched_block = 0; matched_idx = 0;
       for(i=1;i<=A[0];i++){
         if(A[i]==B["RID"]){matched_idx = i; matched_block=1; break;}
      }
      if(matched_block){
         # This block in B[] matches A[matched_idx].
         # Do whatever you want to do with it.
        dbg_printarray(B,"B");
        print "A["matched_idx"]=["A[matched_idx]"]";
        print "Have fun";
      }
    }
}'
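For comparison, here is a shorter sketch of the same idea in a single awk pass with no temporary files. It is not the code above, and it makes several assumptions that should be checked against the real data: RIDs look like `RID <n>= <digits>` as in the sample data, every block is delimited by `Starting of block` and `Ending of block` lines, and the list of RIDs fits in memory. The names `wanted`, `block`, `keep` and `inblock` are hypothetical, and the final count is only an illustration of a step 3.

```shell
# Shortened versions of the sample files from the question.
tmpdir=$(mktemp -d)
cat > "$tmpdir/file1" <<'EOF'
asjdghasdh23712893712983712893qwsdhaksdhask RID 1= 472349723478923489
asjdghasdh23712893712983712893qwsdhaksdhask RID 3= 772349723478923489
EOF
cat > "$tmpdir/file2" <<'EOF'
Starting of block 1
time
RID 1= 472349723478923489
parameter 4
Ending of block 1

Starting of block 2
time
RID 57= 3423423423423234
parameter 4
Ending of block 2

Starting of block 3
time
RID 3= 772349723478923489
parameter 4
Ending of block 3
EOF

selected=$(awk '
NR == FNR {
    # First file: remember every RID as an array key (replaces TEMP 1).
    if (match($0, /RID [0-9]+= *[0-9]+/)) wanted[substr($0, RSTART, RLENGTH)]
    next
}
# Second file: start collecting a new block in memory (replaces TEMP 2).
/^Starting of block/ { block = ""; keep = 0; inblock = 1 }
inblock {
    block = block $0 "\n"
    # Mark the block for output if its RID line is one of the wanted keys.
    if (match($0, /^RID [0-9]+= *[0-9]+/) && (substr($0, RSTART, RLENGTH) in wanted)) keep = 1
}
/^Ending of block/ {
    if (inblock && keep) { printf "%s", block; count++ }
    inblock = 0
}
END { print "matched blocks: " (count + 0) }
' "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$selected"

rm -rf "$tmpdir"
```

With the shortened sample data this prints blocks 1 and 3 followed by a count line. Only one block is held in memory at a time, so the size of file2 matters less than the number of RIDs in file1.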

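I have to say, if you are not clear about what you want to achieve, only one person can help you. Next time, don't assume you are the best person to come up with the logic. In general, some experts are good at creating logic, and most of us pick whichever one fits our problem. Trying to solve a problem does not mean you have to reinvent the wheel. Just describe what you have and what you want to achieve. Chances are that someone has already done it, and most likely the whole framework and the duct tape that holds things together as well.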

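Or the problem lies in the way the problem is being approached.

Can you give some lines of file 1 and file 2, and the expected output?

I have edited the question. This is the problem I am facing now.

I'm afraid that does not make the question any clearer. Having sample data would help to understand what you want to do.

Thanks for the reply. I have provided some sample data. Please take a look.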
