Python: find the index of a substring within a string from a DataFrame
I have a DataFrame with two columns (and many rows): one holds a full sequence and the other holds a subsequence. I want to find the index at which the subsequence starts within the full sequence, and add it as another column. I tried this:
df["start"] = df.sequence.index(df.sub_sequence)
But this returns: TypeError: 'RangeIndex' object is not callable
What am I doing wrong?
Here is the df and the df I would like to end up with:
Example DataFrame:
import pandas as pd
data = {"sequence": ["abcde","fghij","klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=['sequence', 'sub_sequence'])
  sequence sub_sequence
0    abcde          cde
1    fghij           gh
2    klmno           no
Expected result:
data2 = {"sequence": ["abcde","fghij","klmno"], "sub_sequence": ["cde", "gh", "no"], "start": [2,1,3]}
df2 = pd.DataFrame(data2, columns=['sequence', 'sub_sequence', 'start'])
  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
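On the error itself: df.sequence.index is attribute access on the Series, so it returns the Series's row index (a RangeIndex here) rather than Python's str.index method, and calling that object raises the TypeError. A minimal sketch that reproduces this with the example data (the variable names row_index, error_message and first_start are illustrative only):

```python
import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

# .index on a Series is the row index of the Series, not str.index
row_index = df.sequence.index
print(type(row_index).__name__)

# Calling the index object is what triggers the TypeError from the question
try:
    df.sequence.index(df.sub_sequence)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)

# str.index operates on a single Python string at a time
first_start = df.sequence[0].index(df.sub_sequence[0])
print(first_start)
```

This is why the approaches below pair the two columns up row by row and call str.index on each individual string.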
Use str.index and zip in a list comprehension:
df['start'] = [seq.index(sub) for seq, sub in zip(df['sequence'], df['sub_sequence'])]
Or use DataFrame.apply along axis=1:
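A minimal sketch of the axis=1 variant, assuming DataFrame.apply with a row-wise lambda (the lambda and its parameter name row are illustrative, not taken from the original answer):

```python
import pandas as pd

data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "gh", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

# axis=1 passes each row to the function as a Series keyed by column name;
# str.index is then called on that row's full sequence
df["start"] = df.apply(lambda row: row["sequence"].index(row["sub_sequence"]), axis=1)
print(df)
```

On large frames the list comprehension is likely to be faster, since apply with axis=1 builds a Series for every row.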
Result:
  sequence sub_sequence  start
0    abcde          cde      2
1    fghij           gh      1
2    klmno           no      3
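One caveat: str.index raises ValueError when the subsequence does not occur in the sequence. If some rows may have no match, a possible variant is str.find, which returns -1 instead. A small sketch under that assumption (the non-matching "xx" row is hypothetical data added for illustration):

```python
import pandas as pd

# Second row is a hypothetical case where the subsequence is absent
data = {"sequence": ["abcde", "fghij", "klmno"], "sub_sequence": ["cde", "xx", "no"]}
df = pd.DataFrame(data, columns=["sequence", "sub_sequence"])

# str.find returns -1 for a missing substring instead of raising ValueError
df["start"] = [seq.find(sub) for seq, sub in zip(df["sequence"], df["sub_sequence"])]
print(df)
```

Whether -1 is an acceptable marker depends on how the start column is used afterwards.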