Python if-else condition in a DataFrame, and extracting column values
I have a DataFrame (df) that looks like this:
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| Gene | Gene name | Tissue | Cell type | Level | Reliability |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
| ENSG00000001561 | ENPP4 | adipose tissue | adipocytes | Low | Approved |
| ENSG00000001561 | ENPP4 | adrenal gland | glandular cells | High | Approved |
| ENSG00000001561 | ENPP4 | appendix | glandular cells | Medium | Approved |
| ENSG00000001561 | ENPP4 | appendix | lymphoid tissue | Low | Approved |
| ENSG00000001561 | ENPP4 | bone marrow | hematopoietic cells | Medium | Approved |
| ENSG00000002586 | CD99 | adipose tissue | adipocytes | Low | Supported |
| ENSG00000002586 | CD99 | adrenal gland | glandular cells | Medium | Supported |
| ENSG00000002586 | CD99 | appendix | glandular cells | Not detected | Supported |
| ENSG00000002586 | CD99 | appendix | lymphoid tissue | Not detected | Supported |
| ENSG00000002586 | CD99 | bone marrow | hematopoietic cells | High | Supported |
| ENSG00000002586 | CD99 | breast | adipocytes | Not detected | Supported |
| ENSG00000003056 | M6PR | adipose tissue | adipocytes | High | Approved |
| ENSG00000003056 | M6PR | adrenal gland | glandular cells | High | Approved |
| ENSG00000003056 | M6PR | appendix | glandular cells | High | Approved |
| ENSG00000003056 | M6PR | appendix | lymphoid tissue | High | Approved |
| ENSG00000003056 | M6PR | bone marrow | hematopoietic cells | High | Approved |
+-----------------+-----------+----------------+---------------------+--------------+-------------+
Expected output:
+-----------+--------+-------------------------------+
| Gene name | Level | Tissue |
+-----------+--------+-------------------------------+
| ENPP4 | Low | adipose tissue, appendix |
| ENPP4 | High | adrenal gland, bronchus |
| ENPP4 | Medium | appendix, breast, bone marrow |
| CD99 | Low | adipose tissue, appendix |
| CD99 | High | bone marrow |
| CD99 | Medium | adrenal gland |
| ... | ... | ... |
+-----------+--------+-------------------------------+
Code I used (with help from the following):
Error: KeyError: ('Level', 'occurred at index 172')
I don't understand what I am doing wrong. Any suggestions?
Try:
df.groupby(['Gene name','Level'], as_index=False)['Cell type'].agg(', '.join)
Output:
| | Gene name | Level | Cell type |
|---:|:------------|:-------------|:----------------------------------------------------------------------------------------------------------------|
| 0 | CD99 | High | hematopoietic cells |
| 1 | CD99 | Low | adipocytes |
| 2 | CD99 | Medium | glandular cells |
| 3 | CD99 | Not detected | glandular cells , lymphoid tissue , adipocytes |
| 4 | ENPP4 | High | glandular cells |
| 5 | ENPP4 | Low | adipocytes , lymphoid tissue |
| 6 | ENPP4 | Medium | glandular cells , hematopoietic cells |
| 7 | M6PR | High | adipocytes , glandular cells , glandular cells , lymphoid tissue , hematopoietic cells |
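The expected output in the question lists tissues per gene and level, while the statement above joins the `Cell type` column. A minimal sketch of the same groupby applied to `Tissue` follows. The small frame is a hand-typed subset of the question's table, and `join_unique` is a hypothetical helper (not part of pandas) that strips whitespace and drops repeated tissue names.

```python
import pandas as pd

# Hand-typed subset of the question's table (ENPP4 and CD99 rows only).
df = pd.DataFrame(
    {
        "Gene name": ["ENPP4"] * 5 + ["CD99"] * 6,
        "Tissue": [
            "adipose tissue", "adrenal gland", "appendix", "appendix", "bone marrow",
            "adipose tissue", "adrenal gland", "appendix", "appendix", "bone marrow", "breast",
        ],
        "Level": [
            "Low", "High", "Medium", "Low", "Medium",
            "Low", "Medium", "Not detected", "Not detected", "High", "Not detected",
        ],
    }
)


def join_unique(values):
    # Hypothetical helper: strip stray whitespace, then drop repeats
    # while keeping the order in which the values first appear.
    return ", ".join(dict.fromkeys(v.strip() for v in values))


# One row per (gene, level) pair, with the matching tissues joined into a string.
tissues_by_level = (
    df.groupby(["Gene name", "Level"], as_index=False)["Tissue"]
      .agg(join_unique)
)
print(tissues_by_level)
```

Filtering out the `Not detected` rows before grouping (for example with `df[df["Level"] != "Not detected"]`) would bring the result closer to the expected table, which only shows Low, Medium and High.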
Update, based on the comments below:
(df.groupby(['Gene name','Level'], as_index=False)['Cell type']
.agg(','.join).set_index(['Gene name','Level'])['Cell type']
.unstack().reset_index())
Output:
| Gene name | High | Low | Medium | Not detected |
|:------------|:----------------------------------------------------------------------------------------------------------------|:---------------------------------------|:-------------------------------------------|:---------------------------------------------------------|
| CD99 | hematopoietic cells | adipocytes | glandular cells | glandular cells , lymphoid tissue , adipocytes |
| ENPP4 | glandular cells | adipocytes , lymphoid tissue | glandular cells , hematopoietic cells | nan |
| M6PR | adipocytes , glandular cells , glandular cells , lymphoid tissue , hematopoietic cells | nan | nan | nan |
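The follow-up comment asks for columns named `Level: High`, `Level: Medium` and `Level: Low`. A self-contained sketch of one way to get there is shown here, assuming that only those three levels are wanted and that this column order is acceptable. The sample frame is a hand-typed reduction of the question's data, and `wide` is just an illustrative name.

```python
import pandas as pd

# Hand-typed reduction of the question's data.
df = pd.DataFrame(
    {
        "Gene name": ["ENPP4", "ENPP4", "ENPP4", "ENPP4",
                      "CD99", "CD99", "CD99", "M6PR", "M6PR"],
        "Cell type": ["adipocytes", "glandular cells", "glandular cells", "lymphoid tissue",
                      "adipocytes", "glandular cells", "hematopoietic cells",
                      "adipocytes", "glandular cells"],
        "Level": ["Low", "High", "Medium", "Low",
                  "Low", "Medium", "High", "High", "High"],
    }
)

wide = (
    df.groupby(["Gene name", "Level"])["Cell type"]
      .agg(", ".join)                              # one string per (gene, level) pair
      .unstack("Level")                            # levels become columns, gaps become NaN
      .reindex(columns=["High", "Medium", "Low"])  # fixed order; any other level is dropped
      .add_prefix("Level: ")                       # "High" -> "Level: High", and so on
      .rename_axis(columns=None)                   # remove the leftover "Level" axis name
      .reset_index()
)
print(wide)
```

Genes with no rows at a given level end up with NaN in that column, as M6PR does for Medium and Low in this sample.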
@ScottBoston If I may ask, how did you read in the data? I tried read_clipboard(), but the result was not right.
@sammywemmy I used these statements:
df = pd.read_clipboard(sep='|', header=None)
df = df.drop([0, 7], axis=1).set_axis(['Gene', 'Gene name', 'Tissue', 'Cell type', 'Level', 'Reliability'], axis=1)
Oh, I did not copy the header rows, only the data.
@sammywemmy I could also use .str.strip to clean up some of the whitespace, but I think most of what needs to be done is above.
@ScottBoston It worked, thanks! I am trying to add another complication. I would like the new df to have columns like this: Gene name | Level: High | Level: Medium | Level: Low, with each column taking the corresponding values from the first df. Any suggestions?
@ScottBoston That is a great one-liner. Thanks for your help (y)
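For anyone who wants to reproduce the loading step without a clipboard, here is a rough sketch that feeds a few of the question's data rows to `pd.read_csv` through `io.StringIO` and then applies the `.str.strip` cleanup mentioned in the comments. It assumes, as in the comment, that only the data rows were copied and not the header or border lines. The `raw` and `columns` names are illustrative.

```python
import io

import pandas as pd

# A few data rows copied from the question's table (no header, no border lines).
raw = """\
| ENSG00000001561 | ENPP4 | adipose tissue | adipocytes | Low | Approved |
| ENSG00000001561 | ENPP4 | adrenal gland | glandular cells | High | Approved |
| ENSG00000002586 | CD99 | appendix | lymphoid tissue | Not detected | Supported |
"""

columns = ["Gene", "Gene name", "Tissue", "Cell type", "Level", "Reliability"]

df = pd.read_csv(io.StringIO(raw), sep="|", header=None)
# The leading and trailing pipes produce two empty columns, labelled 0 and 7.
df = df.drop(columns=[0, 7]).set_axis(columns, axis=1)
# Trim the padding that surrounded each pipe-delimited cell.
df = df.apply(lambda col: col.str.strip())
print(df)
```

Stripping the whitespace up front also means that a later `', '.join` yields `adipocytes, lymphoid tissue` rather than `adipocytes , lymphoid tissue`.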