Python: problem with a copy-number file format (needs restructuring)


python r


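I have a file in a special format, .cns, which is a segmentation file used for copy-number analysis. It is a text file that looks like this (with a header on the first line):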

chromosome,start,end,gene,log2
CHR1134402861395,"LOC102725121,DDX11L1,OR4F5,LOC10013331,LOC100132062,LOC100132287,LOC10013331,LINC00115,SAMD11",-0.28067

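We converted it to .csv so that it could be tab-delimited, but that did not work well. The .cns is comma-separated, but the genes are a single string enclosed in quotes. I hope this is useful. The output I need looks like this: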

gene log2

LOC102725121 -0.28067

DDX11L1 -0.28067

OR4F5 -0.28067

PIK3CA 0.35475

NPR 3.35475

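Would the first step be to split everything on commas and then transpose the columns? Finally, print the log2 value for each gene contained in the quoted string. If you could help me write an R or Python script, that would be very helpful. Maybe awk would work too. I'm using Linux Ubuntu 16.04. I'm not sure whether I've been clear, so let me know if this makes sense.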

Thanks, everyone!
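Because the gene list is quoted, a CSV parser should treat it as one field even though it contains commas. Here is a minimal sketch of that behaviour using Python's standard csv module. The sample row is hypothetical: the coordinates are made up and the gene list is shortened.

```python
import csv
import io

# Hypothetical sample in the layout described above; the coordinates are made up.
sample = (
    'chromosome,start,end,gene,log2\n'
    'chr1,1000,2000,"LOC102725121,DDX11L1,OR4F5",-0.28067\n'
)

rows = list(csv.reader(io.StringIO(sample)))
header, first = rows[0], rows[1]

print(len(first))  # 5 fields: the quoted gene list counts as one
print(first[3])    # LOC102725121,DDX11L1,OR4F5
```

So there is no need to split the whole line on commas by hand or to transpose anything. Only the fourth field needs a second split.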

I hope the following Python code helps:

import csv

# csv.reader keeps the quoted gene list together as a single field
rows = []
with open('copynumber.cns', 'r', newline='') as file:
    reader = csv.reader(file)
    for row in reader:
        rows.append(row)

# The header row goes through the same loop, so its gene and log2 labels are printed first
for row in rows:
    genes = row[3].split(',')  # fourth column: gene names, split on each comma
    for gene in genes:  # one output line per gene
        print(gene + ' ' + row[4])  # fifth column: the segment's log2 value
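A variant of the same idea, as a sketch only: it looks columns up by name instead of by position and writes tab-separated output. It assumes the header uses the names gene and log2, as in the question. The helper names gene_log2_pairs and write_pairs and the sample rows are hypothetical.

```python
import csv
import io


def gene_log2_pairs(lines):
    """Yield (gene, log2) for every gene named in each segment row."""
    for row in csv.DictReader(lines):
        for gene in row['gene'].split(','):
            gene = gene.strip()
            if gene:  # skip empty names left by stray commas
                yield gene, row['log2']


def write_pairs(lines, out):
    """Write a tab-separated gene/log2 table to the file-like object `out`."""
    writer = csv.writer(out, delimiter='\t', lineterminator='\n')
    writer.writerow(['gene', 'log2'])
    writer.writerows(gene_log2_pairs(lines))


# Hypothetical input; the coordinates are made up for illustration.
sample = io.StringIO(
    'chromosome,start,end,gene,log2\n'
    'chr1,1000,2000,"LOC102725121,DDX11L1,OR4F5",-0.28067\n'
    'chr3,3000,4000,PIK3CA,0.35475\n'
)
result = io.StringIO()
write_pairs(sample, result)
print(result.getvalue(), end='')
```

For a real file, the same function could be given an open file handle (opened with newline='') and sys.stdout in place of the two StringIO objects.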

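You have already identified several tools for this job. What have you tried, and what went wrong with those attempts? It looks like you want us to write some code for you. While many users are willing to write code for a programmer in distress, they usually only help when the poster has already tried to solve the problem on their own. A good way to demonstrate this effort is to include it in the question, along with the checks you should complete before posting. This usually means that what you need is half an hour with a local tutor, or a walk through a tutorial, rather than Stack Overflow.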
