Python: sort "unique" by several columns, then merge the values in the non-matching columns


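Consider a BED file containing genetic variants:

CHR START STOP RSID REF/ALT PHENOTYPE PVALUE
1 987654321 987654322 rs123456 A/T Height 6E-9
1 987654321 987654322 rs123456 A/T Stroke 8E-15

I would like to sort "unique" by the first 5 columns, then merge the contents of the columns whose values differ:

Example output:

CHR START STOP RSID REF/ALT PHENOTYPE PVALUE
1 987654321 987654322 rs123456 A/T Height,Stroke 6E-9,8E-15

Is this possible in Python or Unix, or do I need to write a script?

If it is possible in Python or Unix, which functions would allow me to do this?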


This question has been asked before, but it was never resolved.

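In Python, you can use pandas DataFrame.groupby and agg with a custom lambda function, lambda x: ','.join(x):

import pandas as pd
from io import StringIO

text = '''CHR START STOP RSID REF/ALT PHENOTYPE PVALUE
1 987654321 987654322 rs123456 A/T Height 6E-9
1 987654321 987654322 rs123456 A/T Stroke 8E-15'''

# Read PVALUE as a string so the values can be joined as text later
df = pd.read_csv(StringIO(text), sep=r'\s+', dtype={'PVALUE': str})
print(df)

   CHR      START       STOP      RSID REF/ALT PHENOTYPE PVALUE
0    1  987654321  987654322  rs123456     A/T    Height   6E-9
1    1  987654321  987654322  rs123456     A/T    Stroke  8E-15

# Group on the first five columns and comma-join the remaining two
df_res = (df.groupby(['CHR', 'START', 'STOP', 'RSID', 'REF/ALT'])
            .agg({'PHENOTYPE': lambda x: ','.join(x),
                  'PVALUE': lambda x: ','.join(x)})
            .reset_index())
print(df_res)

   CHR      START       STOP      RSID REF/ALT      PHENOTYPE      PVALUE
0    1  987654321  987654322  rs123456     A/T  Height,Stroke  6E-9,8E-15

Use DataFrame.sort_values to sort df_res in the desired order.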
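
The sorting step can be sketched as follows. This is a minimal example, assuming DataFrame.sort_values is the method meant here; the second variant (rs99) is hypothetical and only there so that the sort has a visible effect. Note that groupby already sorts by the group keys by default, so an explicit sort is only needed for a different order.

```python
import pandas as pd
from io import StringIO

# Hypothetical sample: the rs99 row is not in the question, it is added
# only so that re-sorting changes the row order.
text = '''CHR START STOP RSID REF/ALT PHENOTYPE PVALUE
2 500 501 rs99 C/G Weight 3E-8
1 987654321 987654322 rs123456 A/T Height 6E-9
1 987654321 987654322 rs123456 A/T Stroke 8E-15'''

keys = ['CHR', 'START', 'STOP', 'RSID', 'REF/ALT']
df = pd.read_csv(StringIO(text), sep=r'\s+', dtype={'PVALUE': str})

# Merge the non-key columns; the result comes back ordered by the keys.
df_res = (df.groupby(keys)
            .agg({'PHENOTYPE': ','.join, 'PVALUE': ','.join})
            .reset_index())

# Re-sort by start position instead, and renumber the index.
df_sorted = df_res.sort_values('START').reset_index(drop=True)
print(df_sorted)
```

Passing a list of column names (and a matching `ascending` list) to sort_values gives a multi-column ordering.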

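import pandas as pd

# Read PVALUE as text so that the values can be joined with commas
data = pd.read_csv('file_name.txt', dtype={'PVALUE': 'object'}, sep=' ')

# Join the p-values of each group; both groupby calls use the same keys,
# so the resulting rows line up
PVALUE = data.groupby(['CHR', 'START', 'STOP', 'RSID', 'REF/ALT'])['PVALUE'].apply(','.join).reset_index()['PVALUE']

# Join the phenotypes of each group, then attach the joined p-values
data = data.groupby(['CHR', 'START', 'STOP', 'RSID', 'REF/ALT'])['PHENOTYPE'].apply(','.join).reset_index()
data['PVALUE'] = PVALUE
print(data)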

Here is a general Python way to do this:

from collections import defaultdict

# Open the input file for reading and the output file for writing
with open("input.txt") as fin, open("output.txt", mode="w") as fout:
    grouped_columns = defaultdict(list)

    # Extract the header line
    headers = next(fin)

    # Collect the remaining columns in a defaultdict, keyed by the first 5 columns
    for line in fin:
        line = line.strip().split()
        grouped_columns[tuple(line[:5])].append(line[5:])

    # Write out the result; zip(*value) transposes the rows of a group into
    # columns, so each column is comma-joined whatever the group size
    fout.write(headers)
    for key, value in grouped_columns.items():
        fout.write(
            "%s %s\n"
            % (
                " ".join(key),
                " ".join(",".join(column) for column in zip(*value)),
            )
        )

output.txt:

CHR START STOP RSID REF/ALT PHENOTYPE PVALUE
1 987654321 987654322 rs123456 A/T Height,Stroke 6E-9,8E-15
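
The same grouping can also be sketched as an in-memory function, which is easier to try out without files. This is only an illustration: the helper name `merge_rows`, its `n_keys` parameter, and the third sample row (Asthma) are hypothetical and not part of the answers above. Unlike the file-based version, it also sorts the groups by key.

```python
from collections import defaultdict

def merge_rows(lines, n_keys=5):
    """Group whitespace-separated rows by their first n_keys fields and
    comma-join the remaining fields column by column."""
    grouped = defaultdict(list)
    for line in lines:
        fields = line.split()
        if fields:  # skip blank lines
            grouped[tuple(fields[:n_keys])].append(fields[n_keys:])
    # sorted() compares the keys as strings, so "10" sorts before "2".
    return [
        " ".join(key + tuple(",".join(col) for col in zip(*rows)))
        for key, rows in sorted(grouped.items())
    ]

# Hypothetical data: the first two rows are from the question, the third
# is made up to show that groups of any size are handled.
rows = [
    "1 987654321 987654322 rs123456 A/T Height 6E-9",
    "1 987654321 987654322 rs123456 A/T Stroke 8E-15",
    "1 987654321 987654322 rs123456 A/T Asthma 2E-10",
]
merged = merge_rows(rows)
print(merged[0])
```

For a true positional sort, the chromosome and coordinate fields would need to be converted to integers in the sort key.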
