Python AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 31978')


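I am very new to pandas (I have only been using it for a few days) and am still learning and exploring it. I have a large CSV file with hundreds of thousands of rows. My goal is to concatenate multiple rows into one row based on several columns. Most importantly, the rows have to be matched with reference to the date/time, and the date/time has to be included in the output. My CSV file looks like this: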

       Body                      UDH               Original Sender ID           Received Date/Time
Hi John, Can You            ABC0010101                  GGQMS                   01/02/2001 01:03:19
Wait A moment?              ABC0010102                  GGQMS                   01/02/2001 01:03:20
Whats is                    050004000111              112233445566              01/03/2001 11:16:01
Carrine Doing               050004000112              112233445566              01/03/2001 11:16:01
Over There?                 050004000113              112233445566              01/03/2001 11:16:02
Where is                    CD10F1011                   zwerty                  01/03/2001 15:22:10
Your Homework?              CD10F1012                   zwerty                  01/03/2001 15:22:11
Order for Pizza             AACCDD55001               112233445566              01/04/2001 19:20:21
Now for cheap $.            AACCDD55002               112233445566              01/04/2001 19:20:22
John, you know              G0500781                    GGQMS                   01/04/2001 10:21:21
Where can I get it?         G0500782                    GGQMS                   01/04/2001 10:21:21
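As you can see, that is my CSV file. The UDH acts as the primary key: its characters from the first through the second-to-last identify which message a Body belongs to. The other part is the Received Date/Time: the second part of a Body arrives 1 second later, or possibly more than 1 second later.

I have managed to concatenate the Body, but some messages consist of a third part, and I have not been able to concatenate those completely.

Here is my current code: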

 def problem3():
    filep2 = pd.read_csv(r'/Users/John/Downloads/Practice1/my_r.csv')

    #data cleaning
    filep2['Received Date/Time']= filep2['Received Date/Time'].astype('datetime64[ns]')
    filep2['UDH']=filep2['UDH'].astype(object)
    filep2['Original Sender ID']=filep2['Original Sender ID'].astype(object)
    filep2['Account User Name']=filep2['Account User Name'].astype(object)
    filep2['Body']=filep2['Body'].astype(str)
    filep2['UDH']=filep2['UDH'].str.strip()
    df = pd.DataFrame(filep2)

    #Filter null row in UDH column
    df=df[df['UDH'].notnull()]
    df=df.sort_values(by ='UDH')

    df['Body'] = df.apply(multiple_condition, axis=1)    
    df.to_csv(r'/Users/John/Downloads/Practice1/my_c.csv', index=False, header=True) 

def multiple_condition (df):
    if (df['UDH'].str.len() == 8):
         df=df.groupby(df[['UDH'].str[:7],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
         return df
    elif (df['UDH'].str.len() == 9):
         df= df.groupby(df[['UDH'].str[:8],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index() 
         return df
    elif (df['UDH'].str.len() == 10):
         df= df.groupby(df[['UDH'].str[:9],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
         return df
    elif (df['UDH'].str.len() == 11):
         df=df.groupby(df[['UDH'].str[:10],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index() 
         return df
    elif (df['UDH'].str.len() == 12):
         df=df.groupby(df[['UDH'].str[:11],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index() 
         return df
The code above raises the error in the title of this question. The error message is given below.

Updated error message

  Traceback (most recent call last):

  File "<ipython-input-85-8ca58b5f49ad>", line 1, in <module>
    runfile('/Users/syafiq/Downloads/RoutingPractice01.py', wdir='/Users/syafiq/Downloads')

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
    execfile(filename, namespace)

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
    exec(compile(f.read(), filename, 'exec'), namespace)

  File "/Users/John/Downloads/RoutingPractice01.py", line 79, in <module>
    problem3()

  File "/Users/John/Downloads/RoutingPractice01.py", line 35, in problem3
    filep2['Received Date/Time']= filep2['Received Date/Time'].astype('datetime64[ns]')

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
    indexer = self.columns.get_loc(key)

  File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))

  File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc

  File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item

  File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item

KeyError: 'Received Date/Time'
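Note: I have tried many ways to get the desired output, but I still cannot solve this and keep getting errors. I have tried countless different approaches with no luck and keep hitting a brick wall. The UDH is the identifier for a group of Body rows.

I am still new to pandas, and it has been a while since I got my hands dirty with Python. I would really appreciate it if someone could point out where I went wrong and help me get the desired output.

Thanks a lot, much appreciated! :)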

I don't need apply(); I use groupby() directly.

I use io.StringIO() only to simulate a file.

text = '''       Body                      UDH               Original Sender ID           Received Date/Time
Hi John, Can You            ABC0010101                  GGQMS                   01/02/2001 01:03:19
Wait A moment?              ABC0010102                  GGQMS                   01/02/2001 01:03:20
Whats is                    050004000111              112233445566              01/03/2001 11:16:01
Carrine Doing               050004000112              112233445566              01/03/2001 11:16:01
Over There?                 050004000113              112233445566              01/03/2001 11:16:02
Where is                    CD10F1011                   zwerty                  01/03/2001 15:22:10
Your Homework?              CD10F1012                   zwerty                  01/03/2001 15:22:11
Order for Pizza             AACCDD55001               112233445566              01/04/2001 19:20:21
Now for cheap $.            AACCDD55002               112233445566              01/04/2001 19:20:22
John, you know              G0500781                    GGQMS                   01/04/2001 10:21:21
Where can I get it?         G0500782                    GGQMS                   01/04/2001 10:21:21'''

import pandas as pd
import io

# a regex separator needs a raw string and the python parsing engine
df = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')

#df['Received Date/Time'] = df['Received Date/Time'].astype('datetime64[ns]')
#df['UDH'] = df['UDH'].astype(object)
#df['Original Sender ID'] = df['Original Sender ID'].astype(object)
#df['Account User Name'] = df['Account User Name'].astype(object)
#df['Body'] = df['Body'].astype(str)
#df['UDH'] = df['UDH'].str.strip()

#Filter null row in UDH column
#df = df[df['UDH'].notnull()]
#df = df.sort_values(by ='UDH')

#groups = df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])
#for name, data in groups:
    #print(name)
#    data['Received Date/Time'] = data['Received Date/Time'].min()
    #print(data)

groups = df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])
df2 = groups.agg({'Body':' '.join, 'Received Date/Time':'max'}).reset_index()

#groups = df.groupby([df['UDH'].str[:-1]])
#df2 = groups.agg({'Body':' '.join, 'Received Date/Time':max, 'Original Sender ID':min}).reset_index()

df2 = df2.sort_values('Received Date/Time')

pd.options.display.width = 200
print(df2)
Result

           UDH Original Sender ID                                Body   Received Date/Time
2    ABC001010              GGQMS     Hi John, Can You Wait A moment?  01/02/2001 01:03:20
0  05000400011       112233445566  Whats is Carrine Doing Over There?  01/03/2001 11:16:02
3     CD10F101             zwerty             Where is Your Homework?  01/03/2001 15:22:11
4      G050078              GGQMS  John, you know Where can I get it?  01/04/2001 10:21:21
1   AACCDD5500       112233445566    Order for Pizza Now for cheap $.  01/04/2001 19:20:22
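
One caveat about the result above: the 'Received Date/Time' column is still text there, so max compares the values character by character. That happens to work for the sample data, but it can pick the wrong row when the parts of a message straddle a month or year boundary. Below is a minimal sketch of the same grouping with the column parsed first. The two-row frame is made up for illustration, and the month/day/year format is an assumption about the real file.

```python
import pandas as pd

# Hypothetical two-part message whose parts straddle a year boundary.
df = pd.DataFrame({
    'Body': ['Happy', 'New Year!'],
    'UDH': ['ZZ9901', 'ZZ9902'],
    'Original Sender ID': ['GGQMS', 'GGQMS'],
    'Received Date/Time': ['12/31/2001 23:59:59', '01/01/2002 00:00:00'],
})

# As text, '12/31/2001 ...' sorts after '01/01/2002 ...', so max returns the earlier moment.
string_max = df['Received Date/Time'].max()

# Parse with an explicit format (month/day/year is an assumption about the file).
df['Received Date/Time'] = pd.to_datetime(
    df['Received Date/Time'], format='%m/%d/%Y %H:%M:%S'
)

# Same grouping as above: every UDH character except the last, plus the sender.
merged = (
    df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])
      .agg({'Body': ' '.join, 'Received Date/Time': 'max'})
      .reset_index()
)

print(string_max)
print(merged)
```

With real timestamps in the column, sort_values('Received Date/Time') also orders the merged messages chronologically rather than alphabetically.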

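Always put the full error message (starting at the word "Traceback") in the question (not in a comment) as text (not a screenshot). It contains other useful information.

Hi @furas, I have updated the post as requested.

If it says that a string has no .str, then you should try it without .str, i.e. len(df['UDH'])==9

BTW: in multiple_condition you should probably use .str[:-1] instead of .str[:7], .str[:8], etc. Maybe that would reduce all the if/elif branches to a single line of code (without if/elif).

Hold on a moment @furas, I am trying to fix it. I have now updated my code based on your suggestion. It does resolve the error, but the Body column seems to be empty. I am fairly new to programming, so I am a bit slow at fixing things and working out which line is wrong.

First of all, I really appreciate your help. However, after I tried your code it gave me an error, and I have updated my post. I have never used import io before. Thanks for the introduction!

Your error KeyError: 'Received Date/Time' probably means that the file you read has no column 'Received Date/Time' - you can check what you have read. I used io only to simulate a file and put the sample data directly in the code, so that others can run it without having to create a file with the data. You can read the data from your file and skip io.StringIO()

I would like to thank you for the generous time you spent helping me solve this error and for sharing your knowledge. I really appreciate it! :D Thank you very much! This is an interesting problem.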
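
Both errors discussed in these comments can be reproduced on a tiny frame. The sketch below is only an illustration: the two-row frame is made up, and the trailing space in the 'UDH ' header is an assumed cause of a KeyError, not something confirmed for the asker's file. It shows how to check the column names that were actually read, and why .str fails inside apply(..., axis=1): the function receives one row at a time, so row['UDH'] is already a plain Python string.

```python
import pandas as pd

df = pd.DataFrame({
    'Body': ['Where is', 'Your Homework?'],
    'UDH ': ['CD10F1011', 'CD10F1012'],  # trailing space in the header (made up)
})

# Inspect what was actually read; stray whitespace is one possible cause of KeyError.
print(df.columns.tolist())
df.columns = df.columns.str.strip()

def row_length_wrong(row):
    # With axis=1, `row` is a Series holding ONE row, so row['UDH'] is a str.
    return row['UDH'].str.len()

def row_length_ok(row):
    # A plain string is measured with len(), not with the .str accessor.
    return len(row['UDH'])

error_text = None
try:
    df.apply(row_length_wrong, axis=1)
except AttributeError as exc:
    error_text = str(exc)

lengths = df.apply(row_length_ok, axis=1)

# On the whole column, the .str accessor is the right tool and needs no apply().
column_lengths = df['UDH'].str.len()

print(error_text)
print(lengths.tolist(), column_lengths.tolist())
```

The column-wise form on the last lines is what the groupby() answer relies on, which is why it needs neither apply() nor the if/elif chain.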