Python: exploding a column

I have the following dataset:

Date            Text
2020/05/12    Include details about your goal
2020/05/12    Describe expected and actual results
2020/05/13    Include any error messages
2020/05/13    The community is here to help you 
2020/05/14    Avoid asking opinion-based questions.
I removed punctuation, stop words, and so on, to prepare for the explode:

    import re
    import string
    from nltk.corpus import stopwords  # needs the NLTK 'stopwords' corpus to be downloaded

    stop_words = set(stopwords.words('english'))

    # punctuation to remove (apostrophes are kept)
    punctuation = string.punctuation.replace("'", '')
    punc = '[{}]'.format(re.escape(punctuation))  # escape regex metacharacters such as ] and \

    df = df.dropna(subset=['Text'])  # drop missing text before using the .str methods
    df['Text'] = df['Text'].str.replace(r'\d+', '', regex=True)  # remove numbers
    df['Text'] = df['Text'].str.replace(punc, ' ', regex=True)  # remove punctuation except apostrophes
    df['Text'] = df['Text'].str.replace(r'\s+', ' ', regex=True)  # collapse runs of whitespace
    df['Text'] = df['Text'].str.strip()  # remove leading and trailing whitespace
    df['Text'] = df['Text'].str.lower()  # convert to lowercase
    # drop stop words; each cell becomes a list of tokens
    df['Text'] = df['Text'].apply(lambda x: [word for word in x.split() if word not in stop_words])
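
To check that the cleaning steps touch every row and not just the first, they can be run on the five sample rows from the question. This is only a sketch: the `stop_words` set below is a small hypothetical stand-in for the NLTK English list, so the exact tokens that survive may differ from what `stopwords.words('english')` would leave.

```python
import re
import string

import pandas as pd

# Sample rows copied from the question.
df = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/12", "2020/05/13", "2020/05/13", "2020/05/14"],
    "Text": [
        "Include details about your goal",
        "Describe expected and actual results",
        "Include any error messages",
        "The community is here to help you",
        "Avoid asking opinion-based questions.",
    ],
})

# Hypothetical stand-in for stopwords.words('english'); NOT the real NLTK list.
stop_words = {"about", "your", "and", "any", "the", "is", "here", "to", "you"}

punctuation = string.punctuation.replace("'", "")
punc = "[{}]".format(re.escape(punctuation))

df["Text"] = (
    df["Text"]
    .str.replace(r"\d+", "", regex=True)    # remove numbers
    .str.replace(punc, " ", regex=True)     # punctuation (except apostrophes) -> space
    .str.replace(r"\s+", " ", regex=True)   # collapse runs of whitespace
    .str.strip()
    .str.lower()
    .apply(lambda text: [word for word in text.split() if word not in stop_words])
)

print(len(df))               # still one row per original sentence
print(df["Text"].tolist())   # each cell is now a list of tokens
```

If every cell prints as a list here but not on the real data, the difference is likely to be in the data itself (for example missing values or non-string cells) rather than in the cleaning steps.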
    
However, it only works for the first row, not for all rows. The next step is:

df_1 = df.explode('Text')
Can you tell me what is wrong?

The first row is split as follows:

Text                                   New_Text (to show the difference after cleaning the text)
Include details about your goal    ['include','details','goal']
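I do not get the other rows (so there are no rows starting with "Describe..." or "Avoid..."). My dataset has 1942 rows, but only 1 row is returned after cleaning the text.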
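
One way to narrow this down might be to look at the row count and the Python type of each cell right before calling `explode`. The sketch below uses a made-up two-row frame (not the asker's data): one cell holds a real list, the other a string that only looks like a list, which is one possible cause among several. `explode` splits the former and leaves the latter untouched.

```python
import pandas as pd

# Made-up example: one real list, one string that merely looks like a list.
df = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/12"],
    "Text": [["include", "details", "goal"], "['describe', 'expected']"],
})

print(len(df))                                   # row count before exploding
type_counts = df["Text"].map(type).value_counts()
print(type_counts)                               # how many cells are list vs. str

exploded = df.explode("Text")
print(exploded)                                  # only the real list is split
```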

Update:

Example of the expected output:

Date            Text
    2020/05/12    Include 
    2020/05/12    details 
    2020/05/12    goal
     ....         ...
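
For reference, this is the shape `DataFrame.explode` produces when each cell of `Text` really is a list. The frame below is a hand-built sketch of the first two cleaned rows, not the asker's actual data.

```python
import pandas as pd

# Token lists as they might look after cleaning (first two rows only).
df = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/12"],
    "Text": [
        ["include", "details", "goal"],
        ["describe", "expected", "actual", "results"],
    ],
})

# One output row per list element; Date is repeated for each token.
df_1 = df.explode("Text").reset_index(drop=True)
print(df_1)
```

`reset_index(drop=True)` is optional; without it, the exploded rows keep the index label of the row they came from.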
Fixing the problem (this does not work, but it should):

I think the code below should give me this result:

(pd.melt(test.Text.apply(pd.Series).reset_index(),
         id_vars=['Date'],
         value_name='Text')
 .set_index(['Date'])
 .drop('variable', axis=1)
 .dropna()
 .sort_index()
)

To make the date the index, run:
test = test.set_index(['Date'])
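
Put together, the `apply(pd.Series)` plus `melt` approach can be sketched on a small hand-made frame as follows. The two rows and the names `wide` and `long_df` are illustrative assumptions; the chain only works once `Date` is the index, because `reset_index()` is what turns it back into the `Date` column that `melt` uses as `id_vars`.

```python
import pandas as pd

# Hand-made token lists, with Date already set as the index.
test = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/13"],
    "Text": [["include", "details", "goal"], ["include", "error"]],
}).set_index("Date")

# One column per token position; shorter lists are padded with NaN.
wide = test["Text"].apply(pd.Series)

long_df = (
    pd.melt(wide.reset_index(), id_vars=["Date"], value_name="Text")
    .set_index("Date")
    .drop("variable", axis=1)   # 'variable' holds the token position
    .dropna()                   # remove the NaN padding
    .sort_index()
)
print(long_df)
```

Note that `sort_index()` groups the rows by date but does not promise to keep the tokens of one sentence in their original order.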

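The code has been revised again to match the updated question. It produces your desired output, with the date column and the word column expanded vertically (one word per row).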

import pandas as pd
import numpy as np
import io

data = '''
Date Text
2020/05/12 "Include details about your goal"
2020/05/12 "Describe expected and actual results"
2020/05/13 "Include any error messages"
2020/05/13 "The community is here to help you" 
2020/05/14 "Avoid asking opinion-based questions."
'''

test = pd.read_csv(io.StringIO(data), sep=r'\s+')
test.set_index('Date', inplace=True)
expand_df = test['Text'].str.split(' ', expand=True)
expand_df.reset_index(inplace=True)
# melt every word column; a fixed value_vars=np.arange(6) would drop the seventh word ("you")
expand_df = pd.melt(expand_df, id_vars='Date', value_name='text')
expand_df.dropna(axis=0, inplace=True)
expand_df = expand_df[['Date', 'text']]
expand_df
    Date    text
0   2020/05/12  Include
1   2020/05/12  Describe
2   2020/05/13  Include
3   2020/05/13  The
4   2020/05/14  Avoid
5   2020/05/12  details
6   2020/05/12  expected
7   2020/05/13  any
8   2020/05/13  community
9   2020/05/14  asking
10  2020/05/12  about
11  2020/05/12  and
12  2020/05/13  error
13  2020/05/13  is
14  2020/05/14  opinion-based
15  2020/05/12  your
16  2020/05/12  actual
17  2020/05/13  messages
18  2020/05/13  here
19  2020/05/14  questions.
20  2020/05/12  goal
21  2020/05/12  results
23  2020/05/13  to
28  2020/05/13  help
33  2020/05/13  you
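
Because `melt` stacks the wide frame column by column, the words come out grouped by position (all first words, then all second words, and so on). If the words of each sentence should stay together, one possible alternative is `stack()`, sketched below on two of the sample rows. The variable name `words` is an assumption for illustration, and the explicit `dropna()` is there only to remove padding in case the installed pandas version keeps it when stacking.

```python
import pandas as pd

test = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/13"],
    "Text": ["Include details about your goal", "Include any error messages"],
})

words = (
    test.set_index("Date")["Text"]
    .str.split(expand=True)            # wide frame, one column per word position
    .stack()                           # long Series indexed by (Date, position)
    .dropna()                          # remove any padding that remains
    .reset_index(level=1, drop=True)   # discard the word-position level
    .rename("text")
    .reset_index()                     # Date becomes a regular column again
)
print(words)
```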

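After cleaning df.Text, I have tokens, for example ['include', 'details', 'goal'], and the same for the other rows. Could you tell me all the steps (the ones I wrote in the question), just to make sure I apply df.Text.str.split at the right step? Thanks.
The result of df.Text.str.split() was pasted from the data at the top of the question. Without the data from before the conversion, I cannot show you the process.
The problem is that this works fine for the first row but, unfortunately, not for the others.
How about adding a demonstration of the first string to the question?
Fixed the code.
Is the code you wrote the code that was added?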