
Python: iteratively adding new strings as elements of a list


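I am working with a dataframe that has a column containing numbers in the following format:
[[45, 45, 'D'], [46, 49, 'C'], [50, 66, 'S'], [67, 101, 'C'], [102, 103, 'S'], [104, 106, 'C'], [107, 108, 'S'], [109, 120, 'C'], [121, 121, 'S'], [122, 123, 'C'], [124, 140, 'S'], [141, 149, 'C'], [150, 176, 'S'], [177, 178, 'C'], [179, 181, 'S'], [182, 194, 'C'], [195, 213, 'S'], [214, 217, 'C']]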

These numbers correspond to the positions of characters in a string, i.e. the string:
MGILSFLPVLATESDWADCKSPQPWGHMLLWTAVLFLAPVAGTPAAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAPSSSPMGIIVAVVTGIAVAAIVAAVVALIYCRKKRISALPGYPECREMGETLPEKPANPTNPDEADKVGAENTITYSLLMHPDALEEPDDQNRI

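As you can see, some characters in the string do not correspond to any position in the number list (positions 1-44 are not covered). Those characters have to be removed to create a shorter sequence of letters.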
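Each range is 1-based and inclusive at both ends, which is why the code below slices with item[0]-1:item[1]. A minimal illustration on a made-up toy string; the helper name range_to_slice is hypothetical and only used for this example:

```python
# Toy sequence: position 1 is 'M', position 2 is 'G', and so on (1-based).
seq = "MGILSFLPVL"

def range_to_slice(start, end):
    """Convert a 1-based, inclusive [start, end] range into a Python slice."""
    # Python slices are 0-based and exclude the stop index,
    # so only the start needs shifting by one.
    return slice(start - 1, end)

# Positions 3-5 (inclusive) are 'I', 'L', 'S'.
print(seq[range_to_slice(3, 5)])  # ILS
# A single-position range such as [1, 1] keeps exactly one character.
print(seq[range_to_slice(1, 1)])  # M
```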

I can do this for a single row, but I am struggling to do it for every row in the dataframe.

Here is the code for one row:

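# res: list of [start, end, label] ranges (1-based, inclusive)
# strSeq: the full sequence string for this row
new_s = ''
for item in res:
    new_s += strSeq[item[0]-1:item[1]]
print(len(new_s), new_s)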
This is what I have been trying in order to do it for all rows:

shortenedSeq_list = []
counter = 0
stringstring = []
for rows in df.itertuples():
    strSeq2 = [rows.sequence]
    strremove2 = [rows.shortened_mobidb_consensus]
    for item in strremove2:
        res = ast.literal_eval(item)
        for item in res:
            stringstring.append(strSeq2[item[0]-1:item[1]])
stringstring
But this produces the following output:

 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 [],
 ['MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS'],
 [],
 [],
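The mostly empty output is consistent with strSeq2 = [rows.sequence] wrapping the sequence in a one-element list, so the slice is applied to the list rather than to the string. A one-element list only has index 0: any slice starting at 1 or later is empty, and a slice starting at 0 returns the whole one-element list. The loop also appends once per range rather than once per row. A small demonstration with a made-up string and made-up ranges:

```python
seq = "MGKGKPRGLN"           # a plain string: slicing picks characters
wrapped = [seq]              # what strSeq2 = [rows.sequence] creates

print(seq[1:4])              # GKG -> characters at indices 1..3
print(wrapped[1:4])          # []  -> the list has no index 1
print(wrapped[0:4])          # ['MGKGKPRGLN'] -> the whole one-element list

# Appending inside the per-range loop gives one entry per range, not per row.
ranges = [[1, 1, 'D'], [2, 5, 'S']]
out = []
for item in ranges:
    out.append(wrapped[item[0] - 1:item[1]])
print(out)                   # [['MGKGKPRGLN'], []]
```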
However, I want each element of the list to be the shortened sequence for one row.

Ultimately, I want to add this list to the dataframe as a column.
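One way to get one shortened sequence per row and store it as a column is to wrap the single-row logic in a function and apply it to both columns in step. This is a sketch assuming shortened_mobidb_consensus already holds Python lists (not strings); the helper name shorten, the new column name shortened_sequence and the tiny frame are all hypothetical:

```python
import pandas as pd

def shorten(sequence, ranges):
    """Keep only the characters covered by 1-based, inclusive [start, end, label] ranges."""
    # "".join produces the same string as building it with += in a loop.
    return "".join(sequence[start - 1:end] for start, end, _label in ranges)

# Made-up frame using the same column names as the question.
df = pd.DataFrame({
    "shortened_mobidb_consensus": [[[3, 5, 'S']], [[1, 2, 'D'], [4, 4, 'C']]],
    "sequence": ["MGILSFL", "MKVSTT"],
})

# zip walks both columns together, producing one shortened string per row.
df["shortened_sequence"] = [
    shorten(seq, ranges)
    for ranges, seq in zip(df["shortened_mobidb_consensus"], df["sequence"])
]
print(df["shortened_sequence"].tolist())  # ['ILS', 'MKS']
```

Building the whole list first and assigning it once avoids writing into the dataframe row by row.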

Update

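The numbers were being read as strings rather than lists, so res is now the list of numbers. This is the output of the working single-row code: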

173 AAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAP
where 173 is the length of the shortened sequence, followed by the sequence itself.
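The 173 can be cross-checked from the ranges alone: as long as the ranges do not overlap and stay within the sequence, each inclusive [start, end] contributes end - start + 1 characters. A quick check using the ranges from the first sample row:

```python
# Ranges copied from the first sample row (1-based, inclusive [start, end, label]).
ranges = [[45, 45, 'D'], [46, 49, 'C'], [50, 66, 'S'], [67, 101, 'C'], [102, 103, 'S'],
          [104, 106, 'C'], [107, 108, 'S'], [109, 120, 'C'], [121, 121, 'S'],
          [122, 123, 'C'], [124, 140, 'S'], [141, 149, 'C'], [150, 176, 'S'],
          [177, 178, 'C'], [179, 181, 'S'], [182, 194, 'C'], [195, 213, 'S'],
          [214, 217, 'C']]

# Each inclusive range [start, end] covers end - start + 1 positions.
expected_length = sum(end - start + 1 for start, end, _label in ranges)
print(expected_length)  # 173
```

Here the ranges are contiguous, so the same number also follows directly from 217 - 45 + 1.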

Sample of df:

shortened_mobidb_consensus;sequence
[[45, 45, 'D'], [46, 49, 'C'], [50, 66, 'S'], [67, 101, 'C'], [102, 103, 'S'], [104, 106, 'C'], [107, 108, 'S'], [109, 120, 'C'], [121, 121, 'S'], [122, 123, 'C'], [124, 140, 'S'], [141, 149, 'C'], [150, 176, 'S'], [177, 178, 'C'], [179, 181, 'S'], [182, 194, 'C'], [195, 213, 'S'], [214, 217, 'C']];MGILSFLPVLATESDWADCKSPQPWGHMLLWTAVLFLAPVAGTPAAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAPSSSPMGIIVAVVTGIAVAAIVAAVVALIYCRKKRISALPGYPECREMGETLPEKPANPTNPDEADKVGAENTITYSLLMHPDALEEPDDQNRI
[[1, 1, 'D'], [2, 143, 'S'], [144, 145, 'C']];MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS
[[1, 145, 'S']];MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS
[[1, 1, 'D'], [2, 2, 'C'], [3, 37, 'S'], [38, 39, 'C'], [40, 40, 'S'], [41, 41, 'C'], [42, 62, 'S'], [63, 65, 'C'], [66, 231, 'S']];MSKNILVLGGSGALGAEVVKFFKSKSWNTISIDFRENPNADHSFTIKDSGEEEIKSVIEKINSKSIKVDTFVCAAGGWSGGNASSDEFLKSVKGMIDMNLYSAFASAHIGAKLLNQGGLFVLTGASAALNRTSGMIAYGATKAATHHIIKDLASENGGLPAGSTSLGILPVTLDTPTNRKYMSDANFDDWTPLSEVAEKLFEWSTNSDSRPTNGSLVKFETKSKVTTWTNL
[[24, 29, 'D'], [30, 91, 'S'], [92, 92, 'D']];MKVSTTALAVLLCTMTLCNQVFSAPYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRNRQICADSKETWVQEYITDLELNA
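To try the answers without creating stringsample.txt, the sample can be read from an in-memory string instead. A sketch using io.StringIO with the last sample row above; ast.literal_eval turns the text of each list back into a real list while the file is read. The converter is keyed by column name here; pandas also accepts a column position such as 0:

```python
import ast
import io

import pandas as pd

# The last sample row above, in the same ';'-separated layout with its header line.
sample = (
    "shortened_mobidb_consensus;sequence\n"
    "[[24, 29, 'D'], [30, 91, 'S'], [92, 92, 'D']];"
    "MKVSTTALAVLLCTMTLCNQVFSAPYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRNRQICADSKETWVQEYITDLELNA\n"
)

# Without the converter, each cell of this column would be a plain str.
df = pd.read_csv(io.StringIO(sample), sep=';',
                 converters={'shortened_mobidb_consensus': ast.literal_eval})

ranges = df.loc[0, 'shortened_mobidb_consensus']
print(type(ranges).__name__)  # list
print(ranges[0])              # [24, 29, 'D']
```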
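Comments:
What is the output of the working code? What is res?
I have updated the output in the code above!
Also, a sample of df is needed, and ast is not defined. Basically, a lot of pieces are missing from this question.
ast is imported at the top of my Jupyter notebook. The dataframe is 40,000 rows long; I will add a sample above.
Still the same problem!
OK, let's wait for the df. It is still unclear where the problem occurs.
I have added a snippet of the relevant columns above.

Solution 1: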
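import ast

import pandas as pd

# Parse column 0 (shortened_mobidb_consensus) into real lists while reading the file.
df = pd.read_csv('stringsample.txt', sep=';', converters={0: ast.literal_eval})

for index, row in df.iterrows():
    new_s = ''
    res = row.shortened_mobidb_consensus
    # Each item is [start, end, label]; positions are 1-based and inclusive.
    for item in res:
        new_s += row.sequence[item[0]-1:item[1]]
    # Store this row's shortened sequence in a new 'output' column.
    df.loc[index, 'output'] = new_s
df['output']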
0    AAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNL...
1    MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGS...
2    MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGS...
3    MSKNILVLGGSGALGAEVVKFFKSKSWNTISIDFRENPNADHSFTI...
4    APYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRN...
Name: output, dtype: object
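Solution 2 (the code from the question, fixed):

import ast

import pandas as pd

df = pd.read_csv('stringsample.txt', sep=';')
stringstring = []
for rows in df.itertuples():
    strSeq2 = rows.sequence                       # keep the sequence as a string, not [string]
    strremove2 = rows.shortened_mobidb_consensus
    res = ast.literal_eval(strremove2)            # text of the list -> actual list
    new_s = ''
    for item in res:
        new_s += strSeq2[item[0]-1:item[1]]
    stringstring.append(new_s)                    # one entry per row, not per range
stringstring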
['AAPPKAVLKLEPQWINVLQEDSVTLTCRGTHSPESDSIQWFHNGNLIPTHTQPSYRFKANNNDSGEYTCQTGQTSLSDPVHLTVLSEWLVLQTPHLEFQEGETIVLRCHSWKDKPLVKVTFFQNGKSKKFSRSDPNFSIPQANHSHSGDYHCTGNIGYTLYSSKPVTITVQAP',
 'MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS',
 'MGKGKPRGLNSARKLRVHRRNNRWAETTYKKRLLGTAFKSSPFGGSSHAKGIVLEKIGIESKQPNSAIRKCVRVQLIKNGKKVTAFVPNDGCLNFVDENDEVLLAGFGRKGKAKGDIPGVRFKVVKVSGVSLLALWKEKKEKPRS',
 'MSKNILVLGGSGALGAEVVKFFKSKSWNTISIDFRENPNADHSFTIKDSGEEEIKSVIEKINSKSIKVDTFVCAAGGWSGGNASSDEFLKSVKGMIDMNLYSAFASAHIGAKLLNQGGLFVLTGASAALNRTSGMIAYGATKAATHHIIKDLASENGGLPAGSTSLGILPVTLDTPTNRKYMSDANFDDWTPLSEVAEKLFEWSTNSDSRPTNGSLVKFETKSKVTTWTNL',
 'APYGADTPTACCFSYSRKIPRQFIVDYFETSSLCSQPGVIFLTKRNRQICADSKETWVQEYITDLELNA']