Python: how to combine grouped text one by one


I have a DataFrame that looks like this:

        text  group
0      hello      1
1      world      1
2       it's      2
3       time      2
4         to      2
5    explore      2
6        one      3
7       more      3
8       line      3

I want to combine the words in the text column one by one, within each group, into a new column, as shown below:

        text  group                     result
0      hello      1                      hello
1      world      1                hello world
2       it's      2                       it's
3       time      2                  it's time
4         to      2               it's time to
5    explore      2       it's time to explore
6        one      3                        one
7       more      3                   one more
8       line      3              one more line
So far I have tried:

df['res']=df.groupby('group')['text'].transform(lambda x: ' '.join(x))
df['result']=df[['text','res']].apply(lambda x: ' '.join( x['res'].split()[:x['res'].split().index(x['text'])+1]),axis=1)
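The code above works for the data shown above, but it has some problems.

If a group contains a repeated word, index() returns the position of the first occurrence, so the code fails on data like this: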

        text  group                     result
0      hello      1                      hello
1      world      1                hello world
2       it's      2                       it's
3       time      2                  it's time
4         to      2               it's time to
5    explore      2       it's time to explore
6        one      3                        one
7       more      3                   one more
8       line      3              one more line
9      hello      4                      hello
10  repeated      4             hello repeated
11     hello      4                      hello #this must be hello repeated hello
12      came      4  hello repeated hello came
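Note: it fails in group 4.

My script is also clearly inefficient.

Can anyone suggest a way to solve both the index problem and the performance problem?


Any help is appreciated.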
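
The failure in group 4 can be reproduced without pandas. The snippet below is a minimal sketch that uses only the four words of group 4; the variable names are illustrative and do not come from the original code.

```python
# Words of group 4 from the table above.
words = ["hello", "repeated", "hello", "came"]

# list.index returns the position of the FIRST match, so the second
# "hello" (at position 2) is looked up as position 0.
positions = [words.index(w) for w in words]

# Slicing up to that position reproduces the wrong row for the second "hello".
prefixes = [" ".join(words[: p + 1]) for p in positions]

print(positions)
print(prefixes)
```

The third entry comes out as just "hello" because the lookup is by value, not by row position.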

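Using cumsum on strings is not easy, but here is one possible solution: first append a space to each value, take the cumulative sum within each group, and finally remove the trailing space with rstrip: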

df['res'] = df['text'].add(' ').groupby(df['group']).transform(pd.Series.cumsum).str.rstrip()
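
To show what the one-liner above does inside a single group, here is a plain-Python sketch of the same three steps. It is not the pandas implementation: it uses itertools.accumulate as a stand-in for the cumulative sum, and the helper name cumulative_join is made up for this example.

```python
from itertools import accumulate


def cumulative_join(words):
    """Running concatenation of words, mirroring add(' ') -> cumsum -> rstrip."""
    padded = [w + " " for w in words]      # same idea as .add(' ')
    running = accumulate(padded)           # running string concatenation
    return [s.rstrip() for s in running]   # same idea as .str.rstrip()


group4 = cumulative_join(["hello", "repeated", "hello", "came"])
print(group4)
```

Because each row is built from its position in the running concatenation rather than from a lookup by value, repeated words cause no problem.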

Another solution:

f = lambda x: [' '.join(x[:i]) for i in range(1, len(x)+1)]
df['res'] = df.groupby('group')['text'].transform(f)

Using groupby in a list comprehension:

df['res'] = [' '.join(d.text[:i]) for _, d in df.groupby('group') for i in range(1, len(d)+1)]

print(df)
        text  group                        res
0      hello      1                      hello
1      world      1                hello world
2       it's      2                       it's
3       time      2                  it's time
4         to      2               it's time to
5    explore      2       it's time to explore
6        one      3                        one
7       more      3                   one more
8       line      3              one more line
9      hello      4                      hello
10  repeated      4             hello repeated
11     hello      4       hello repeated hello
12      came      4  hello repeated hello came
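
One caveat about the list comprehension: it emits results group by group, so it lines up with the rows only when rows of the same group are adjacent and the groups appear in sorted order, as they do in the sample data. If that cannot be assumed, a single pass that keeps a running word list per group is one possible alternative. This is a sketch, not part of the original answer, and the function name running_text_by_group is hypothetical.

```python
def running_text_by_group(texts, groups):
    """Return, for each row, the words seen so far in that row's group."""
    seen = {}      # group key -> list of words seen so far in that group
    result = []
    for word, key in zip(texts, groups):
        seen.setdefault(key, []).append(word)
        result.append(" ".join(seen[key]))
    return result


# Rows of groups 3 and 4 are deliberately interleaved here.
texts = ["hello", "one", "repeated", "hello", "more"]
groups = [4, 3, 4, 4, 3]
interleaved = running_text_by_group(texts, groups)
print(interleaved)
```

The function accepts any two iterables of equal length, so passing the text and group columns of the DataFrame should work as well, with the returned list assigned to a new column.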