Python: split a column into multiple rows, where the split depends on the value of another column

python pandas

I have a pandas DataFrame that looks like this:

id        text
10000     Hi, how are you? [10000] Good thanks, yourself? [10000] I'm great.
20000     Is it hot there today? [20000] No, it's raining. [20000] Oh, too bad!
30000     What's your name [30000] Steve, and yours? [30000] Rita.

Here is the code that builds the df:

import pandas as pd

# Each text uses " [<id>] " as the separator between lines.
df = pd.DataFrame([
    [10000, "Hi, how are you? [10000] Good thanks, yourself? [10000] I'm great."],
    [20000, "Is it hot there today? [20000] No, it's raining. [20000] Oh, too bad!"],
    [30000, "What's your name [30000] Steve, and yours? [30000] Rita."]], columns=['id', 'text'])

I want to add a new column that splits the 'text' column into a list, using the value in the 'id' column as the delimiter:

id        text                                                lines

10000     Hi, how are you? [10000] Good thanks, yourself?     ["Hi, how are you?",
          [10000] I'm great.                                   "Good thanks, yourself?",
                                                               "I'm great."]
20000     Is it hot there today? [20000] No, it's raining.    ["Is it hot there today?",
          [20000] Oh, too bad!                                 "No, it's raining.",
                                                               "Oh, too bad!"]
30000     What's your name [30000] Steve, and yours?          ["What's your name",
          [30000] Rita.                                        "Steve, and yours?",
                                                               "Rita."]

I tried this:

df ['lines'] = df.apply(lambda x: x['text'].split(x['id']))

But I get a KeyError:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
pandas/index.pyx in pandas.index.IndexEngine.get_loc (pandas/index.c:4279)()

pandas/src/hashtable_class_helper.pxi in pandas.hashtable.Int64HashTable.get_item (pandas/hashtable.c:8543)()

TypeError: an integer is required

During handling of the above exception, another exception occurred:

KeyError                                  Traceback (most recent call last)
<ipython-input-14-e50f764c5674> in <module>()
----> 1 df ['lines'] = df.apply(lambda x: x['text'].split(x['id']))

KeyError: ('text', 'occurred at index id')

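Use axis=1 and an appropriate delimiter. Without axis=1, apply passes each column to the lambda, so x['text'] is looked up as a row label inside the 'id' column, which raises the KeyError. The delimiter also has to be a string such as ' [10000] ', not the integer id itself: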

In [548]: df.apply(lambda x: x['text'].split(' [%s] ' % x['id']), axis=1)
Out[548]:
0    [Hi, how are you?, Good thanks, yourself?, I'm...
1    [Is it hot there today?, No, it's raining., Oh...
2         [What's your name, Steve, and yours?, Rita.]
dtype: object
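
For completeness, here is a minimal, self-contained sketch of the same approach that also stores the result in the 'lines' column and, since the title asks for multiple rows, expands the lists into one row per line. It assumes that every row uses its own id consistently as the separator (10000, 20000, 30000), that pandas is 0.25 or later (for DataFrame.explode), and the helper name split_on_id is made up for illustration.

```python
import pandas as pd

# Sample data, assuming each text uses " [<id>] " with the row's own id as separator.
df = pd.DataFrame(
    [
        [10000, "Hi, how are you? [10000] Good thanks, yourself? [10000] I'm great."],
        [20000, "Is it hot there today? [20000] No, it's raining. [20000] Oh, too bad!"],
        [30000, "What's your name [30000] Steve, and yours? [30000] Rita."],
    ],
    columns=["id", "text"],
)


def split_on_id(row):
    # Hypothetical helper: build the string delimiter " [<id>] " from the
    # row's id and split the row's text on it.
    return row["text"].split(" [%s] " % row["id"])


# axis=1 hands each row (not each column) to the function.
df["lines"] = df.apply(split_on_id, axis=1)

# Optional: one row per line instead of one list per row.
rows = df[["id", "lines"]].explode("lines").reset_index(drop=True)

print(df["lines"].tolist())
print(rows)
```

If an id does not appear in its row's text, str.split simply returns a one-element list containing the whole text, so mismatched ids show up as unsplit rows rather than as errors.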