Python: find the index of a substring in a string from a DataFrame (Python, Pandas, Dataframe, Indexing) - Fatal编程技术网



I have a DataFrame with two columns (and many rows): one holds a full sequence and the other holds a subsequence.

I want to find the index at which the subsequence starts within the full sequence and add it as another column.

I tried this:

df["start"] = df.sequence.index(df.sub_sequence)
But this returns:
TypeError: 'RangeIndex' object is not callable

What am I doing wrong?

Here is the df I have and the df I want to end up with:

Example DataFrame:

import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no

Expected result:

data2 = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2, 1, 3]}
df2 = pd.DataFrame(data2, columns=["sequence", "sub_sequence", "start"])

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3

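The TypeError comes from attribute lookup: on a Series, .index is the row-label Index (a RangeIndex for the default integer labels), not Python's str.index, so df.sequence.index(...) tries to call the Index object itself. A small sketch reproducing this on the example data from the question:

```python
import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data)

# On a Series, .index is the row-label attribute (a RangeIndex here), not str.index
index_type = type(df.sequence.index).__name__
print(index_type)  # RangeIndex

# Calling that Index object like a function reproduces the error from the question
error_message = ""
try:
    df.sequence.index(df.sub_sequence)
except TypeError as exc:
    error_message = str(exc)
print(error_message)  # 'RangeIndex' object is not callable

# str.index is a method of a single Python string, so it must be applied row by row
first_start = df.loc[0, "sequence"].index(df.loc[0, "sub_sequence"])
print(first_start)  # 2
```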
Use str.index together with zip in a list comprehension:

df['start'] = [seq.index(sub) for seq, sub in zip(df['sequence'], df['sub_sequence'])]
Or use DataFrame.apply along axis=1:
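The code for this variant did not survive on this page. A minimal sketch of what it plausibly looks like, assuming a row-wise lambda passed to DataFrame.apply (the original answer's exact line is not known):

```python
import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data)

# With axis=1, apply passes each row to the lambda as a Series,
# so both columns can be read by name and combined with str.index.
df["start"] = df.apply(lambda row: row["sequence"].index(row["sub_sequence"]), axis=1)

print(df)
```

Because apply builds a Series for every row and loops in Python, it is typically slower than the list comprehension on large frames; it is mainly a readability choice.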

Result:

  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
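Both versions use str.index, which raises ValueError when a subsequence does not occur in its sequence. If some rows may lack their subsequence, one option (not part of the original answer) is to swap in str.find, which returns -1 instead of raising. The fourth row below is hypothetical data added only to show the missing case:

```python
import pandas as pd

# The fourth row is hypothetical: its sub_sequence does not occur in its sequence.
data = {
    "sequence": ["abcde", "fghij", "klmno", "pqrst"],
    "sub_sequence": ["cde", "gh", "no", "xyz"],
}
df = pd.DataFrame(data)

# str.find returns the lowest start index of sub in seq, or -1 when sub is absent,
# so the comprehension never raises for a missing subsequence.
df["start"] = [seq.find(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])]

print(df)
```

Series.str.find does not remove the need for the row-wise loop here, because it takes a single substring for the whole column rather than one substring per row.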