Awk: remove specific parts of a file based on another file in bash


I am looking for a bash command that removes part of the text in File2 when the text after ">" is present in both File1 and File2.

Here is an example:

File1:

CUI02270
CUI02272
CUI02271
CUI02290
CUI02289
CUI022799


File2:

>CUI02270 |hypothetical protein pCPXV0248[Cowpox virus]
MGTVFVPYLLVKLALRVLVISNGYCHVPLKYIVLMIAHRVLLSSIVESTTLDIPDLRSTM
ELILLTASRLKFNLYRPNL
>CUI02271 |CPXV043 protein[Cowpox virus]
MLAFCYSLPNVGDVLKGKVYENGYALYIDLFDYPHSEAILAESVQMHMNRYFKYRDKLVG
KTVKVKVIRVDYTKGYIDVNYKRMCKHQ
>CUI02272 |hypothetical protein pCPXV0245[Cowpox virus]
MFTHPFVIDIYISFCIINSNHFNFYSFPYQFIPIFKISIHMHLNTLCQDSFRVRIVKKIN
V
>CUI02273 |CPXV044 protein[Cowpox virus]
MNPDNTIAVITETIPIGMQFDKVYLSTFNVWREILSNTTKTLDISSFYWSLLDEVGTNFG
TTILNEIVQLPKRGVRVRVAVNKSNKPLKDVETLQMAGVEVRYIDITNILGGVLHTKFWI
SDNTHIYLGSANMDWRSLTQVKELGIAIFNNRNLAADLTQIFEVYWYLGVNNLPYNWKNF
YPAYYNTDHPLSMNVSGVPHSVFIASAPQQLCTMERTNDLTALLSCIGNASKFVYVSVMN
FIPIIYSKAGNILFWPYIEDELRRTAIDRKVSVKLLISCWQRSSFIMRNFLRSIAMLKSK
NIDIEVKLFIVPDTDPPIPYSRVNHAKYMVTDKTAYIGTSNWTGNYFTDTCGTSINITPD
DGLGLRQQLEDIFMRDWNSKYSYELYDTSPTKRCRLLKNMKQCTNDIYSDEIQPEKEIPE
YSLE
>CUI02274 |CPXV045 protein[Cowpox virus]
MSANCMFNLDNDYIYCKYWKPITYPKALVFISHGAGEHSGRYDELAENISSLGILVFSHD
HIGHGRSNGEKMMIDDFGTYVRDVVQHVVTIKSTYPGVPVFLLGHSMGATISILAAYENP
NLFTAMILMSPLVNAEAVPRLNLLAAKLMGAITPNAPVGKLCPESVSRDMDEVYKYQYDP
LVNHEKIKAGFASQVLKATNKVRKIIPKINTPSLILQGTNNEISDVSGAYYFMQHANCNR
EIKIYEGAKHHLHKETDEVKKSVMKEIETWIFNRVK
>CUI022799 |CPXV046 protein[Cowpox virus]
MATKSDYEDAVFYFVDDDEICSRDSIIDLIDEYITWRNHVIVFNKDITSCGRLYKELMKF
DDAAIRYYGIDKINEIVEAMSEGDHYINLTEVHDQESLFATIGICAKITEHWGYKKISES
RFQSLGNITDLMTDDNINILILFLEKKLN
>CUI02276 |hypothetical protein pCPXV0240[Cowpox virus]
MDFCKIDVVVSFAHSLDNLINFINTIVPYSSIIELHQFLVESSTTGNIFVKHYNMISPRD
IFIY
I should end up with a new File3, for example:

>CUI02273 |CPXV044 protein[Cowpox virus]
MNPDNTIAVITETIPIGMQFDKVYLSTFNVWREILSNTTKTLDISSFYWSLLDEVGTNFG
TTILNEIVQLPKRGVRVRVAVNKSNKPLKDVETLQMAGVEVRYIDITNILGGVLHTKFWI
SDNTHIYLGSANMDWRSLTQVKELGIAIFNNRNLAADLTQIFEVYWYLGVNNLPYNWKNF
YPAYYNTDHPLSMNVSGVPHSVFIASAPQQLCTMERTNDLTALLSCIGNASKFVYVSVMN
FIPIIYSKAGNILFWPYIEDELRRTAIDRKVSVKLLISCWQRSSFIMRNFLRSIAMLKSK
NIDIEVKLFIVPDTDPPIPYSRVNHAKYMVTDKTAYIGTSNWTGNYFTDTCGTSINITPD
DGLGLRQQLEDIFMRDWNSKYSYELYDTSPTKRCRLLKNMKQCTNDIYSDEIQPEKEIPE
YSLE
>CUI02274 |CPXV045 protein[Cowpox virus]
MSANCMFNLDNDYIYCKYWKPITYPKALVFISHGAGEHSGRYDELAENISSLGILVFSHD
HIGHGRSNGEKMMIDDFGTYVRDVVQHVVTIKSTYPGVPVFLLGHSMGATISILAAYENP
NLFTAMILMSPLVNAEAVPRLNLLAAKLMGAITPNAPVGKLCPESVSRDMDEVYKYQYDP
LVNHEKIKAGFASQVLKATNKVRKIIPKINTPSLILQGTNNEISDVSGAYYFMQHANCNR
EIKIYEGAKHHLHKETDEVKKSVMKEIETWIFNRVK
>CUI02276 |hypothetical protein pCPXV0240[Cowpox virus]
MDFCKIDVVVSFAHSLDNLINFINTIVPYSSIIELHQFLVESSTTGNIFVKHYNMISPRD
IFIY
where CUI02270, CUI02272, CUI02271 and CUI022799 have been removed because they are present in both File1 and File2.

Does anyone have an idea?

Thank you for your help.

You are working with FASTA files, which are easy to process with awk:

$ awk '(NR==FNR){list[$1];next}
       /^>/{key=$0;sub(/^> */,"",key);sub(/ *[|].*$/,"",key);f=1}
       (key in list) {f=0}
       f' file1 file2
It works as follows:

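  • (NR==FNR){list[$1];next}
    While the first file is being read (NR==FNR), store each entry as a key in the array list and skip to the next record.
  • /^>/{key=$0;sub(/^> */,"",key);sub(/ *[|].*$/,"",key);f=1}
    When a sequence name is encountered, extract the key located between > and the first |. The two sub calls strip everything else, and the flag f is initialized to 1. This flag indicates whether the sequence should be printed.
  • (key in list) {f=0}
    Check whether the sequence with key key is in list. If it is, set the flag f to 0, because we do not want to print it.
  • f
    If f==1, this prints the line.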
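
The steps above can be tried end to end on a small data set. The following is only a sketch: the file names ids.txt, seqs.fa and filtered.fa and the SEQ… records are made up for illustration and are not from the question. It also sets the flag in a single step on each header line instead of setting it to 1 and then resetting it, which should behave the same way as the one-liner.

```shell
#!/bin/sh
# Hypothetical sample data (names and records are illustrative only).
printf '%s\n' 'SEQ1' 'SEQ3' 'SEQ22' > ids.txt

cat > seqs.fa <<'EOF'
>SEQ1 |first protein
AAAA
CC
>SEQ2 |second protein
GGGG
>SEQ3 |third protein
TTTT
>SEQ22 |fourth protein
ACGT
EOF

awk '
    # First file: remember every ID as an array key.
    NR == FNR { list[$1]; next }

    # Header line of the second file: extract the ID and decide once per record.
    /^>/ {
        key = $0
        sub(/^> */, "", key)        # drop the leading ">" and any spaces
        sub(/ *[|].*$/, "", key)    # drop everything from the first "|" on
        f = !(key in list)          # keep the record only if the ID is not listed
    }

    # Print header and sequence lines while the flag is set.
    f
' ids.txt seqs.fa > filtered.fa

cat filtered.fa
```

With this sample data, filtered.fa should contain only the SEQ2 record. SEQ2 survives even though SEQ22 is listed, because the lookup compares whole keys rather than substrings.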

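  • Sorry, this is not how StackOverflow works. Questions of the form "I want to do X, please give me tips and/or sample code" are considered off-topic. Please visit and read, and especially read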