Summarizing a list and generating a new matrix in Python
I have a list like the following small example:
[['chr19', '35789598', '35789629', '21', 'chr19', '35510000', '36200000'], ['chr19', '35789598', '35789629', '24', 'chr19', '35510000', '36200000'], ['chr19', '35789598', '35789629', '52', 'chr19', '35510000', '36200000'], ['chr19', '35789598', '35789629', '88', 'chr19', '35510000', '36200000'], ['chr19', '35798974', '35799005', '56', 'chr19', '35510000', '36200000'], ['chr19', '35883830', '35883861', '16', 'chr19', '35510000', '36200000'], ['chr19', '35884320', '35884351', '51', 'chr19', '35510000', '36200000']]
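As you can see, each inner list has 7 elements. I want to make a new list in which no two inner lists share the same first, second and third elements. In other words, if several inner lists have identical 1st, 2nd and 3rd elements, I want to keep only the first of them and remove the others.
Expected output for the small example: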
[['chr19', '35789598', '35789629', '21', 'chr19', '35510000', '36200000'], ['chr19', '35798974', '35799005', '56', 'chr19', '35510000', '36200000'], ['chr19', '35883830', '35883861', '16', 'chr19', '35510000', '36200000'], ['chr19', '35884320', '35884351', '51', 'chr19', '35510000', '36200000']]
Here is my Python code, which does not return the result I expect:
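result = []
for i in mat:              # mat is the list of lists shown above
    for j in i:            # j is a single string element, e.g. 'chr19'
        if j == j-1:       # fails: subtracting 1 from a str raises TypeError
            result.append(j)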
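For comparison, here is a minimal plain-Python sketch of what the loop above seems to be aiming for, assuming the input is the `mat` list shown earlier. Instead of comparing elements with their neighbours, it remembers each (1st, 2nd, 3rd) triple in a set and keeps a row only the first time its triple is seen. The function name `dedupe_by_first_three` is hypothetical, chosen only for illustration.

```python
# Sample input: the rows from the question.
mat = [
    ['chr19', '35789598', '35789629', '21', 'chr19', '35510000', '36200000'],
    ['chr19', '35789598', '35789629', '24', 'chr19', '35510000', '36200000'],
    ['chr19', '35789598', '35789629', '52', 'chr19', '35510000', '36200000'],
    ['chr19', '35789598', '35789629', '88', 'chr19', '35510000', '36200000'],
    ['chr19', '35798974', '35799005', '56', 'chr19', '35510000', '36200000'],
    ['chr19', '35883830', '35883861', '16', 'chr19', '35510000', '36200000'],
    ['chr19', '35884320', '35884351', '51', 'chr19', '35510000', '36200000'],
]


def dedupe_by_first_three(rows):
    """Hypothetical helper: keep the first row for each distinct (1st, 2nd, 3rd) triple."""
    seen = set()
    result = []
    for row in rows:
        key = tuple(row[:3])  # lists are unhashable, so use a tuple as the set key
        if key not in seen:   # first time this triple appears: keep the row
            seen.add(key)
            result.append(row)
    return result


result = dedupe_by_first_three(mat)
for row in result:
    print(row)
```

Because the rows are visited in order and only the first occurrence of each key is appended, this keeps the same four rows as the expected output, in a single pass and without needing pandas.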
I would use pandas:
import pandas as pd
data = [['chr19', '35789598', '35789629', '21', 'chr19', '35510000', '36200000'],
['chr19', '35789598', '35789629', '24', 'chr19', '35510000', '36200000'],
['chr19', '35789598', '35789629', '52', 'chr19', '35510000', '36200000'],
['chr19', '35789598', '35789629', '88', 'chr19', '35510000', '36200000'],
['chr19', '35798974', '35799005', '56', 'chr19', '35510000', '36200000'],
['chr19', '35883830', '35883861', '16', 'chr19', '35510000', '36200000'],
['chr19', '35884320', '35884351', '51', 'chr19', '35510000', '36200000']]
# Convert your list of lists to a DataFrame
df = pd.DataFrame(data)
print(df)
#        0         1         2   3      4         5         6
# 0  chr19  35789598  35789629  21  chr19  35510000  36200000
# 1  chr19  35789598  35789629  24  chr19  35510000  36200000
# 2  chr19  35789598  35789629  52  chr19  35510000  36200000
# 3  chr19  35789598  35789629  88  chr19  35510000  36200000
# 4  chr19  35798974  35799005  56  chr19  35510000  36200000
# 5  chr19  35883830  35883861  16  chr19  35510000  36200000
# 6  chr19  35884320  35884351  51  chr19  35510000  36200000
# Keep only the first row for each distinct combination of columns 0, 1 and 2
df = df.drop_duplicates(subset=[0, 1, 2], keep='first')
print(df)
#        0         1         2   3      4         5         6
# 0  chr19  35789598  35789629  21  chr19  35510000  36200000
# 4  chr19  35798974  35799005  56  chr19  35510000  36200000
# 5  chr19  35883830  35883861  16  chr19  35510000  36200000
# 6  chr19  35884320  35884351  51  chr19  35510000  36200000
# If you still need the data as a plain list of lists, convert it back
# (df.values alone returns a NumPy object array, not a nested list):
output = df.values.tolist()
# [['chr19', '35789598', '35789629', '21', 'chr19', '35510000', '36200000'],
#  ['chr19', '35798974', '35799005', '56', 'chr19', '35510000', '36200000'],
#  ['chr19', '35883830', '35883861', '16', 'chr19', '35510000', '36200000'],
#  ['chr19', '35884320', '35884351', '51', 'chr19', '35510000', '36200000']]
# Otherwise you can continue to use the DataFrame for your analysis
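To see what `drop_duplicates` is doing, here is a small sketch using `DataFrame.duplicated`, which returns a boolean mask that is `True` for every row whose columns 0, 1 and 2 already appeared in an earlier row (with `keep='first'`). Filtering with the inverted mask should select the same rows as `drop_duplicates`. The variable names `mask` and `kept` are just illustrative.

```python
import pandas as pd

data = [['chr19', '35789598', '35789629', '21', 'chr19', '35510000', '36200000'],
        ['chr19', '35789598', '35789629', '24', 'chr19', '35510000', '36200000'],
        ['chr19', '35789598', '35789629', '52', 'chr19', '35510000', '36200000'],
        ['chr19', '35789598', '35789629', '88', 'chr19', '35510000', '36200000'],
        ['chr19', '35798974', '35799005', '56', 'chr19', '35510000', '36200000'],
        ['chr19', '35883830', '35883861', '16', 'chr19', '35510000', '36200000'],
        ['chr19', '35884320', '35884351', '51', 'chr19', '35510000', '36200000']]
df = pd.DataFrame(data)

# True for each row whose (0, 1, 2) values were already seen in an earlier row
mask = df.duplicated(subset=[0, 1, 2], keep='first')
print(mask.tolist())         # [False, True, True, True, False, False, False]

# Keep the rows where the mask is False, i.e. the first occurrence of each triple
kept = df[~mask]
print(kept.index.tolist())   # [0, 4, 5, 6]
```

The mask form can be handy if you also want to inspect or count the rows that get dropped, for example with `df[mask]`.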
You should include your actual expected output as well. Do you want a NumPy matrix, or a simple nested list?
I have edited the question.