Python: replacing strings in a DataFrame using a dictionary without overwriting

python regex pandas dictionary

I am trying to transform a pandas DataFrame that has a column populated with values like these:

df['Alteration']

Q79K,E17K
Q79K,E17K
T315I

I would like to convert the single-letter amino acid codes into three-letter codes, so that the result looks like this:

Gln79Lys,Glu17Lys
Gln79Lys,Glu17Lys
Thr315Ile

So far I have tried using a dictionary with compiled regular expressions as keys:

AA_code = {re.compile('[C]'): 'Cys',re.compile('[D]'): 'Asp', 
re.compile('[S]'): 'Ser',re.compile('[Q]'): 'Gln',re.compile('[K]'): 'Lys', 
re.compile('[I]'): 'Ile',re.compile('[P]'): 'Pro',re.compile('[T]'): 'Thr', 
re.compile('[F]'): 'Phe',re.compile('[N]'): 'Asn',re.compile('[G]'): 'Gly', 
re.compile('[H]'): 'His',re.compile('[L]'): 'Leu',re.compile('[R]'): 'Arg', 
re.compile('[W]'): 'Trp',re.compile('[A]'): 'Ala',re.compile('[V]'): 'Val', 
re.compile('[E]'): 'Glu',re.compile('[Y]'): 'Tyr',re.compile('[M]'): 'Met'}

and the following code to perform the replacement based on that dictionary:

df['Replacement'] = df['Alteration'].replace(AA_code, regex=True)

However, I get some odd behavior: the replace function overwrites values it has already substituted, so the result looks like this instead:

Glyln79Leuys,Glu17Leuys
Glyln79Leuys,Glu17Leuys
Thr315Ile

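As far as I can tell, Glyln arises because the code first changes Q to Gln, and then the G in Gln is replaced by the G: Gly key-value pair in the dictionary, which yields Glyln. Is there a way around this?

Thanks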
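The cascade described in the question can be reproduced without pandas. The sketch below is only an illustration under an assumption: it models the dictionary-based replace as a series of independent passes, one per key, which is consistent with the output shown above. The names `pairs` and `replace_sequentially` are hypothetical and not part of the original code.

```python
# Each pass operates on the output of the previous pass, so letters
# introduced by an earlier replacement can be matched again later.
pairs = [("Q", "Gln"), ("K", "Lys"), ("G", "Gly"), ("L", "Leu")]


def replace_sequentially(text, pairs):
    """Hypothetical helper: apply every (old, new) pair one after another."""
    for old, new in pairs:
        text = text.replace(old, new)
    return text


# Q -> Gln, K -> Lys, then the new "G" -> Gly and the new "L" -> Leu.
result = replace_sequentially("Q79K", pairs)
print(result)  # Glyln79Leuys
```

The fix is therefore to scan the original string once, so that replaced text is never examined again.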

Create a lookup table and then use it inside a call to Series.str.replace, for example:
import pandas as pd

lookup = {
    'Q': 'Gln',
    'K': 'Lys',
    'E': 'Glu',
    'G': 'Gly'
    # needs completing...
}

s = pd.Series(['Q79K,E17K', 'Q79K,E17K', 'T315I'])
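# Build one character class from all keys so each string is scanned only once.
pattern = '([{}])'.format(''.join(lookup))
# Pass regex=True explicitly: newer pandas versions default to literal
# matching, which does not accept a callable replacement.
s.str.replace(pattern, lambda m: lookup[m.group(1)], regex=True)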

This gives you:
0    Gln79Lys,Glu17Lys
1    Gln79Lys,Glu17Lys
2                T315I
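The lookup above is deliberately left incomplete, which is why T315I comes back unchanged. As a rough sketch of the same single-pass idea with the full table, the snippet below uses only the standard `re` module; the letter-to-code pairs are copied from the `AA_code` dictionary in the question, while the names `AA_THREE`, `AA_PATTERN` and `to_three_letter` are assumptions made for illustration.

```python
import re

# Single-letter to three-letter table, taken from AA_code in the question.
AA_THREE = {
    "A": "Ala", "C": "Cys", "D": "Asp", "E": "Glu", "F": "Phe",
    "G": "Gly", "H": "His", "I": "Ile", "K": "Lys", "L": "Leu",
    "M": "Met", "N": "Asn", "P": "Pro", "Q": "Gln", "R": "Arg",
    "S": "Ser", "T": "Thr", "V": "Val", "W": "Trp", "Y": "Tyr",
}

# One character class covering every key, so the text is scanned only once
# and already-replaced text is never matched again.
AA_PATTERN = re.compile("[{}]".format("".join(AA_THREE)))


def to_three_letter(alteration):
    """Hypothetical helper: replace each one-letter code in a single pass."""
    return AA_PATTERN.sub(lambda m: AA_THREE[m.group(0)], alteration)


converted = [to_three_letter(v) for v in ["Q79K,E17K", "Q79K,E17K", "T315I"]]
print(converted)  # ['Gln79Lys,Glu17Lys', 'Gln79Lys,Glu17Lys', 'Thr315Ile']
```

With pandas, a function like this could be applied element-wise, for instance through `Series.map`.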

Jon's answer is great. Along the same lines, another approach is:
import pandas as pd

lookup = {
    'Q': 'Gln',
    'K': 'Lys',
    'E': 'Glu',
    'G': 'Gly'
     # needs completing...
}

s = pd.Series(['Q79K,E17K', 'Q79K,E17K', 'T315I'])
s.apply(lambda row: "".join([lookup[x] if x in lookup else x for x in row]))

Or, as @Jon Clements suggested in the comments:
s.apply(lambda row: "".join([lookup.get(x, x) for x in row]))

This gives you:
0    Gln79Lys,Glu17Lys
1    Gln79Lys,Glu17Lys
2                T315I
dtype: object

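If you're going with this approach, then lookup.get(x, x) for x in row is one way to avoid the explicit if/else check.

That's great. I was looking into how to avoid the if/else check myself. Thanks.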
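The per-character lookup in the second answer can also be written with Python's built-in `str.translate`, which is a different technique from the ones proposed in the answers and is shown here only as a possible alternative. This is a minimal sketch using the same partial lookup as above; the names `table` and `translated` are hypothetical.

```python
# Same partial table as in the answers; it still needs completing.
lookup = {
    "Q": "Gln",
    "K": "Lys",
    "E": "Glu",
    "G": "Gly",
}

# str.maketrans accepts a dict whose keys are single characters and whose
# values may be strings of any length.
table = str.maketrans(lookup)

# Every character is mapped exactly once, so inserted text is never re-read;
# characters missing from the table (digits, commas, T, I) pass through.
translated = [v.translate(table) for v in ["Q79K,E17K", "Q79K,E17K", "T315I"]]
print(translated)  # ['Gln79Lys,Glu17Lys', 'Gln79Lys,Glu17Lys', 'T315I']
```

Because characters without an entry are left as they are, no if/else check or `get` fallback is needed here either.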