Python: removing multiple comma-separated entries in a column
I have a table in Excel named sorted_list, which looks like this:
+-------------------+--------------------------------+---+----------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------+------+
| P33151 | partially reviewed | 9 | other code | Homo sapiens (Human); Pan troglodytes (Chimpanzee) | 784 | 100% |
| B4DMA7 | unreviewed | 1 | B4DMA7 | Homo sapiens (Human) | 779 | 100% |
| A8K0L9 | unreviewed | 1 | A8K0L9 | Homo sapiens (Human) | 828 | 100% |
| B4DTP0 | unreviewed | 1 | B4DTP0 | Homo sapiens (Human) | 525 | 100% |
| D3DSM0 | unreviewed | 1 | D3DSM0 | Homo sapiens (Human) | 712 | 100% |
| A8K0L1 | unreviewed | 1 | A8K0L1 | Homo sapiens (Human) | 781 | 100% |
| P06756,L7RXH0 | partially reviewed and UniParc | 8 | P06756; L7RXH0; UPI0001BE65FF; UPI000DF0CE97; UPI0003E68261; UPI0002A11580; UPI0000112063; UPI0012318420 | Homo sapiens (Human); ? | 1048 | 100% |
| Q59EQ1 | unreviewed | 8 | A0A2J8RMA6; Q59EQ1; H3BR78; H3BPQ2; H3BSM4; H3BQH2; H3BP26; H3BQB5 | Pongo abelii (Sumatran orangutan) (Pongo pygmaeus abelii); Homo sapiens (Human) | 670 | 100% |
| A0A024R8K7 | partially reviewed and UniParc | 3 | A0A024R8K7; P16144-2; UPI0003EAE94B | Homo sapiens (Human) | 1752 | 100% |
| P11279,A0A024RDY3 | partially reviewed | 3 | P11279; A0A024RDY3; B3KRY3 | Homo sapiens (Human) | 417 | 100% |
| B4DFP0 | unreviewed | 1 | B4DFP0 | Homo sapiens (Human) | 382 | 100% |
| J3KRI5 | unreviewed | 2 | J3KRI5; H2QB90 | Homo sapiens (Human); Pan troglodytes (Chimpanzee) | 744 | 100% |
| B2RCN5 | unreviewed | 1 | B2RCN5 | Homo sapiens (Human) | 916 | 100% |
| Q9NR97 | reviewed | 1 | Q9NR97 | Homo sapiens (Human) | 1041 | 100% |
| Q02846 | reviewed | 1 | Q02846 | Homo sapiens (Human) | 1103 | 100% |
| Q9NY15 | reviewed | 1 | Q9NY15 | Homo sapiens (Human) | 2570 | 100% |
+-------------------+--------------------------------+---+----------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------------+------+------+
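I want to match the values in the first column against another table, but some rows in the first column contain more than one value.
I would like to reduce every row to a single value (dropping everything after the ",") and then match it against the column preppi['prot1'] of the other table, preppi.
The code I have used so far is: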
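import pandas as pd

# sorted_list is the DataFrame holding the Excel table shown above
col_one_list = sorted_list['id'].tolist()
print(col_one_list)
filepath = "/Users/saheeba/Downloads/preppi_final.csv"
preppi = pd.read_csv(filepath)
# keep only the preppi rows whose prot1 value appears in the id list
df = preppi.loc[preppi['prot1'].isin(col_one_list)]
print(df.shape)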
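However, rows whose first column contains two values, for example P06756,L7RXH0, are kept as they are, so the combined string is what gets compared against preppi['prot1'].
Any suggestions on how to avoid this?

Try creating a new column by splitting the first column on the delimiter (a comma here) and keeping the first element. For rows without the delimiter you get the only element there is (the value itself); for the remaining rows you get the first element. Once that column exists, apply the logic you already have to it.

This does not work for me, because the number of items is not the same in every row.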
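
The split-and-keep-first suggestion can be sketched as follows. This is a minimal illustration, not the asker's actual data: the two small DataFrames are hypothetical stand-ins for the Excel table and for preppi_final.csv, and the column name first_id is made up for the example. Because only the first piece of each split is kept, the number of comma-separated items in a row should not matter.

```python
import pandas as pd

# Hypothetical stand-in for the Excel table; only the 'id' column matters here.
sorted_list = pd.DataFrame(
    {"id": ["P33151", "P06756,L7RXH0", "P11279,A0A024RDY3", "Q9NR97"]}
)

# Hypothetical stand-in for preppi_final.csv.
preppi = pd.DataFrame({"prot1": ["P06756", "L7RXH0", "Q9NR97", "X00000"]})

# Split each id on the comma and keep only the first piece.
# Rows without a comma yield a one-element list, so .str[0] returns the value itself.
sorted_list["first_id"] = sorted_list["id"].str.split(",").str[0].str.strip()

# Apply the existing matching logic to the cleaned column.
df = preppi.loc[preppi["prot1"].isin(sorted_list["first_id"])]

print(sorted_list["first_id"].tolist())
print(df.shape)
```

With real data, the same two lines (the split and the isin filter) would be applied to the DataFrame read from the Excel file in place of the stand-in above.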