Python: exploding a column (Python, pandas) - Fatal编程技术网

Python: exploding a column

python pandas


I have the following dataset:

Date            Text
2020/05/12    Include details about your goal
2020/05/12    Describe expected and actual results
2020/05/13    Include any error messages
2020/05/13    The community is here to help you 
2020/05/14    Avoid asking opinion-based questions.

I removed punctuation, stop words, and so on, to prepare the column for explode:

    import re
    import string
    from nltk.corpus import stopwords

    stop_words = stopwords.words('english')

    # punctuation to remove
    punctuation = string.punctuation.replace("'", '')  # don't remove apostrophes from strings
    punc = r'[{}]'.format(re.escape(punctuation))  # escape so ], ^, - and \ stay literal inside the class

    df.Text = df.Text.str.replace(r'\d+', '', regex=True)  # remove numbers
    df.Text = df.Text.str.replace(punc, ' ', regex=True)  # remove punctuation except apostrophes
    df.Text = df.Text.str.replace(r'\s+', ' ', regex=True)  # collapse runs of whitespace
    df.Text = df.Text.str.strip()  # remove whitespace from beginning and end of string
    df.Text = df.Text.str.lower()  # convert all to lowercase
    df.dropna(inplace=True)
    df.Text = df.Text.apply(lambda x: [word for word in x.split() if word not in stop_words])  # remove stop words
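For reference, here is a minimal self-contained sketch of the same cleaning steps run on the five sample rows. It assumes a small hand-written `STOP_WORDS` set as a stand-in for NLTK's English list (so the result will differ from the real list), and it writes the tokens to a new `Tokens` column instead of overwriting `Text`.

```python
import re
import string

import pandas as pd

# Hypothetical stand-in for stopwords.words('english'); the real NLTK list is much longer.
STOP_WORDS = {"about", "your", "and", "any", "the", "is", "here", "to", "you"}

df = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/12", "2020/05/13", "2020/05/13", "2020/05/14"],
    "Text": [
        "Include details about your goal",
        "Describe expected and actual results",
        "Include any error messages",
        "The community is here to help you",
        "Avoid asking opinion-based questions.",
    ],
})

# Character class matching every punctuation mark except the apostrophe.
punc = "[{}]".format(re.escape(string.punctuation.replace("'", "")))

cleaned = (
    df["Text"]
    .str.replace(r"\d+", "", regex=True)   # remove numbers
    .str.replace(punc, " ", regex=True)    # punctuation becomes a space
    .str.replace(r"\s+", " ", regex=True)  # collapse whitespace runs
    .str.strip()
    .str.lower()
)

# Tokenize and drop stop words; every row should end up holding a list.
df["Tokens"] = cleaned.apply(lambda s: [w for w in s.split() if w not in STOP_WORDS])
print(df[["Date", "Tokens"]])
```

With this sample data all five rows keep a token list, which suggests the cleaning steps by themselves do not discard rows.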

However, it only works for the first row, not for all rows. The next step is

df_1 = df.explode('Text')

Can you tell me what is wrong?

The first row is split as follows:

Text                                   New_Text (to show the difference after cleaning the text)
Include details about your goal    ['include','details','goal']

I get no other rows (so nothing starting with "Describe..." or "Avoid..."). My dataset has 1942 rows, but only 1 row is returned after cleaning the text.
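The question does not show enough to say where the rows disappear. One thing that may be worth checking (an assumption, not a diagnosis) is the `df.dropna(inplace=True)` call: without `subset`, it drops a row when any column is missing, not only `Text`. The sketch below uses an invented `Source` column to show the difference.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: "Source" is an invented extra column that has gaps.
df = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/12", "2020/05/13"],
    "Text": [
        "Include details about your goal",
        "Describe expected and actual results",
        "Include any error messages",
    ],
    "Source": ["web", np.nan, np.nan],
})

all_columns = df.dropna()               # drops a row if ANY column is NaN
text_only = df.dropna(subset=["Text"])  # drops a row only if Text is NaN

print(len(df), len(all_columns), len(text_only))
```

Printing `len(df)` after each cleaning step in the real pipeline is a simple way to find the step at which the row count falls from 1942 to 1.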

Update:

Example output:

Date          Text
2020/05/12    Include
2020/05/12    details
2020/05/12    goal
....          ...
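As a rough sketch of how `explode` produces this shape, assuming `Text` already holds real Python lists (the two rows below are illustrative):

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/13"],
    "Text": [["include", "details", "goal"], ["include", "error", "messages"]],
})

# explode() emits one row per list element and repeats the other columns.
df_1 = df.explode("Text").reset_index(drop=True)
print(df_1)
```

Note that `explode` only expands list-like cells. A cell that holds a plain string is passed through unchanged, so it can help to confirm the cell types with `df["Text"].map(type)` first.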

Attempted fix (does not work, but should):

I think the code below should give me this result:

(pd.melt(test.Text.apply(pd.Series).reset_index(), 
             id_vars=['Date'],
             value_name='Text')
     .set_index(['Date'])
     .drop('variable', axis=1)
     .dropna()
     .sort_index()
     )

To turn the date into the index, do the following:

test = test.set_index(['Date'])
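Put together, the melt-based attempt can be sketched as follows. This assumes `test` is indexed by `Date` and that `Text` already contains token lists; the two rows are made up for illustration, and `kind="mergesort"` is added only so that the sort keeps rows with the same date in a stable order.

```python
import pandas as pd

test = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/13"],
    "Text": [["include", "details", "goal"], ["community", "help"]],
}).set_index("Date")

# One column per token position; shorter lists are padded with NaN.
wide = test["Text"].apply(pd.Series)

tidy = (
    pd.melt(wide.reset_index(), id_vars=["Date"], value_name="Text")
    .set_index("Date")
    .drop(columns="variable")      # the token position is not needed
    .dropna()                      # remove the NaN padding
    .sort_index(kind="mergesort")  # stable sort by date
)
print(tidy)
```

Without the `set_index` step, `reset_index()` produces a column called `index` rather than `Date`, and `id_vars=['Date']` fails to find it.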

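Since the question was updated, the code has been revised again. It now produces the output you asked for, with the Date column and the word column expanded vertically.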

import pandas as pd
import numpy as np
import io

data = '''
Date Text
2020/05/12 "Include details about your goal"
2020/05/12 "Describe expected and actual results"
2020/05/13 "Include any error messages"
2020/05/13 "The community is here to help you" 
2020/05/14 "Avoid asking opinion-based questions."
'''

test = pd.read_csv(io.StringIO(data), sep=r'\s+')  # raw string avoids an invalid-escape warning
test.set_index('Date', inplace=True)
expand_df = test['Text'].str.split(' ', expand=True)  # one column per word position
expand_df.reset_index(inplace=True)
# melt every word column; a hard-coded np.arange(6) would skip the seventh word ("you")
expand_df = pd.melt(expand_df, id_vars='Date', value_name='text')
expand_df.dropna(axis=0, inplace=True)
expand_df = expand_df[['Date', 'text']]
expand_df
    Date    text
0   2020/05/12  Include
1   2020/05/12  Describe
2   2020/05/13  Include
3   2020/05/13  The
4   2020/05/14  Avoid
5   2020/05/12  details
6   2020/05/12  expected
7   2020/05/13  any
8   2020/05/13  community
9   2020/05/14  asking
10  2020/05/12  about
11  2020/05/12  and
12  2020/05/13  error
13  2020/05/13  is
14  2020/05/14  opinion-based
15  2020/05/12  your
16  2020/05/12  actual
17  2020/05/13  messages
18  2020/05/13  here
19  2020/05/14  questions.
20  2020/05/12  goal
21  2020/05/12  results
23  2020/05/13  to
28  2020/05/13  help
33  2020/05/13  you
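As an alternative to the melt-based answer, the same long format can probably be reached more directly with `str.split` followed by `explode`. This is a sketch under the assumption that pandas 0.25 or later is available; it keeps the words of each sentence together instead of grouping them by word position.

```python
import pandas as pd

test = pd.DataFrame({
    "Date": ["2020/05/12", "2020/05/12", "2020/05/13", "2020/05/13", "2020/05/14"],
    "Text": [
        "Include details about your goal",
        "Describe expected and actual results",
        "Include any error messages",
        "The community is here to help you",
        "Avoid asking opinion-based questions.",
    ],
})

tokens = (
    test.assign(Text=test["Text"].str.split())  # each cell becomes a list of words
    .explode("Text")                            # one row per word, Date repeated
    .reset_index(drop=True)
)
print(tokens)
```

Because no fixed number of columns is involved, there is no need to know the length of the longest sentence in advance.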

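After cleaning df.Text, I have tokens, for example ['include', 'details', 'goal']. The same goes for the other rows. Could you tell me all the steps? (The ones I wrote in the question, just to make sure I am applying df.Text.str.split at the right step.) Thanks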

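The result of df.Text.str.split() was pasted from the data at the top of the question. Without the data as it was before the transformation, I cannot show you the process.

The problem is that this works fine for the first row but, unfortunately, not for the other rows.

How about adding a demonstration with the first string to the question?

Fixed the code. Is the code you wrote the code that was added?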