
Python: parsing a file delimited by tabs and commas

Tags: python, perl, ubuntu, awk

I have a table with several thousand rows like these:

A   GO:0008150,GO:0050789,GO:0050794,GO:0051726,GO:0065007
B   GO:0008150,GO:0050789,GO:0050794,GO:0051726,GO:0065007

I would like to parse my table into the following format:

A   GO:0008150
A   GO:0050789
A   GO:0050794
A   GO:0051726
A   GO:0065007
B   GO:0008150
B   GO:0050789
B   GO:0050794
B   GO:0051726
B   GO:0065007

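Any help would be greatly appreciated. Thanks.

This is easy with awk: split() the second column on commas and loop over the resulting slices: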

$ awk '{n=split($2, a, ","); for (i=1;i<=n;i++) print $1,a[i]}' file
A GO:0008150
A GO:0050789
A GO:0050794
A GO:0051726
A GO:0065007
B GO:0008150
B GO:0050789
B GO:0050794
B GO:0051726
B GO:0065007
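
Since the question is also tagged python, the same split-and-loop idea could be sketched in Python roughly as follows. The function name expand_rows and the two-row sample are illustrative assumptions, not part of the original answer; in practice the lines would come from an open file object.

```python
def expand_rows(lines):
    """Yield one (label, term) pair per comma-separated term in column 2."""
    for line in lines:
        fields = line.split()      # split on any whitespace, like awk's default
        if len(fields) < 2:
            continue               # skip blank or malformed lines
        label, terms = fields[0], fields[1]
        for term in terms.split(","):
            yield label, term


# Hypothetical sample; replace with open("file") to read real data.
sample = [
    "A\tGO:0008150,GO:0050789",
    "B\tGO:0008150,GO:0065007",
]

for label, term in expand_rows(sample):
    print(label, term)
```

Because expand_rows is a generator, it handles one line at a time and should cope with a few thousand rows without loading the whole table into memory.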

You can use Python with the re module:

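import re

# Sample input: a label, whitespace, then a comma-separated list of GO terms.
text = '''A   GO:0008150,GO:0050789,GO:0050794,GO:0051726,GO:0065007
B   GO:0008150,GO:0050789,GO:0050794,GO:0051726,GO:0065007'''

# One pattern per row label; each captures the comma-separated list.
pattern = {
    'A': re.compile(r'A\s+(GO.*)\n'),
    'B': re.compile(r'B\s+(GO.*)\n*'),
}

# Split the captured list on commas and prefix every term with its label.
A = 'A  ' + '\nA  '.join(pattern['A'].findall(text)[0].split(','))
B = 'B  ' + '\nB  '.join(pattern['B'].findall(text)[0].split(','))
print(A)
print(B)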
Output:

A  GO:0008150
A  GO:0050789
A  GO:0050794
A  GO:0051726
A  GO:0065007
B  GO:0008150
B  GO:0050789
B  GO:0050794
B  GO:0051726
B  GO:0065007
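
The patterns above are written once per label, which becomes unwieldy for a table with thousands of distinct labels. One possible generalization, assuming every row is a label, some spaces or tabs, and then a comma-separated list with no embedded whitespace, is a single multiline pattern. The names ROW and explode are hypothetical.

```python
import re

# A label, spaces or tabs, then the comma-separated list, anchored per line.
ROW = re.compile(r"^(\S+)[ \t]+(\S+)$", re.MULTILINE)


def explode(text):
    """Return the table with one tab-separated 'label term' pair per line."""
    out = []
    for match in ROW.finditer(text):
        label, terms = match.group(1), match.group(2)
        out.extend(f"{label}\t{term}" for term in terms.split(","))
    return "\n".join(out)


# Hypothetical two-row sample in the question's layout.
sample_text = "A   GO:0008150,GO:0050789\nB   GO:0051726,GO:0065007"
print(explode(sample_text))
```

Lines that do not fit the two-column shape are simply not matched, so they are dropped rather than raising an error.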

awk without a loop; this needs an awk that supports a multi-character RS (gawk does):

$ awk -v RS=",|\n" 'NF==2{t=$1;$1=$2} {print t,$1}' file
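
To see why the loop-free version works: with RS set to ",|\n", every GO term becomes its own record, and only the first record of each line has two fields, so the label is remembered in t and reused for the records that follow. The Python sketch below imitates that logic for illustration only; the function name records_with_label is an assumption.

```python
import re


def records_with_label(text):
    """Mimic awk's RS=",|\\n" trick: carry the last seen label forward."""
    label = None
    rows = []
    for record in re.split(r",|\n", text):
        fields = record.split()
        if not fields:
            continue                  # e.g. the empty piece after a trailing newline
        if len(fields) == 2:
            label, term = fields      # first record of a line: label plus first term
        else:
            term = fields[0]          # later records: term only, label carried over
        rows.append((label, term))
    return rows


for label, term in records_with_label("A\tGO:0008150,GO:0050789\nB\tGO:0065007\n"):
    print(label, term)
```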

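Comments:

Could you show a code sample? "I would like to parse my table into the following format." What have you tried? Good luck.

Why the python tag?

Thanks. Could you explain what "a" is in the first part of the code?

@pali As you can see from the link I provided, it is the array that stores the slices.

Thanks for the information.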