Pandas: Truncate a string in a column based on a substring pulled from another column (Python 3)

I have a DataFrame with two relevant columns, "rm_word" and "article".

Data sample:

,grouping,fts,article,rm_word
0,"1",fts,"This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super ***crazy***. It goes on and on.",crazy
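I want to check the last 100 characters of each "article" to determine whether the corresponding "rm_word" in its row appears there. If it does, I want to delete the entire sentence in which the "rm_word" appears, along with every sentence that follows it in the "article".

Desired result (when "crazy" is the "rm_word"):

This mask correctly identifies the articles that contain the "rm_word", but I'm having trouble with the sentence-deletion part.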

mask = [str(a) in b[-100:].lower() for a, b in zip(df["rm_word"], df["article"])]

print(df.loc[mask])
Any help would be appreciated! Thank you very much.

Does this work?

import pandas as pd

df = pd.DataFrame(
    columns=['article', 'rm_word'],
    data=[["This is the article. This is a sentence. This is a sentence. This is a sentence.", 'crazy'],
          ["This is the article. This is a sentence. This is a sentence. This is a sentence. This goes on for awhile and that's super crazy. It goes on and on.", 'crazy']]
)

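def clean_article(x):
    # Leave the row alone unless rm_word occurs in the last 100 characters.
    if x['rm_word'] not in x['article'][-100:].lower():
        return x
    # Keep only the text before the last occurrence of rm_word ...
    article = x['article'].rsplit(x['rm_word'], 1)[0]
    # ... then drop the unfinished sentence fragment that led up to it.
    article = article.split('.')[:-1]
    x['article'] = '.'.join(article) + '.'
    return x


df = df.apply(clean_article, axis=1)
df['article'].values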
Returns

array(['This is the article. This is a sentence. This is a sentence. This is a sentence.',
       'This is the article. This is a sentence. This is a sentence. This is a sentence.'],
      dtype=object)
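
If the articles may spell the word with different capitalization, a case-insensitive variant might look like the sketch below. It is only an illustration, not part of the answer above: the helper name `truncate_before_word` and its `tail` parameter are made up here, a sentence is assumed to end at a literal `.`, and when the word occurs more than once in the tail the cut is assumed to happen at the first of those occurrences.

```python
import re

import pandas as pd


def truncate_before_word(article, rm_word, tail=100):
    """Hypothetical helper: drop the sentence containing rm_word, and
    everything after it, if rm_word occurs in the last `tail` characters."""
    if not rm_word:
        return article
    pattern = re.compile(re.escape(str(rm_word)), flags=re.IGNORECASE)
    # Only look for the word inside the tail of the article.
    tail_start = max(len(article) - tail, 0)
    match = pattern.search(article, tail_start)
    if match is None:
        return article
    # Cut at the last period before the match (assumes '.' ends a sentence).
    cut = article.rfind('.', 0, match.start())
    return article[:cut + 1] if cut != -1 else ''


df = pd.DataFrame({
    'article': [
        "This is the article. This is a sentence.",
        "This is the article. This is a sentence. That's super CRAZY. It goes on.",
    ],
    'rm_word': ['crazy', 'crazy'],
})

# Row-wise call without DataFrame.apply; zip keeps the two columns paired.
df['article'] = [
    truncate_before_word(a, w) for a, w in zip(df['article'], df['rm_word'])
]
print(df['article'].tolist())
```

Splitting on `.` will also cut at abbreviations and decimal numbers, so real articles may need a proper sentence tokenizer in place of `rfind('.')`.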

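Do you also want to remove the "***" around the rm_word?
@kait I added the asterisks manually just to emphasize the rm_word. The entire sentence (and everything after it) should be deleted.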