Python AttributeError: ("'str' object has no attribute 'str'", 'occurred at index 31978')
I'm very new to pandas (I have only been using it for a few days) and am still learning and exploring how to use it. I have a large CSV file with hundreds of thousands of rows. My goal is to concatenate multiple rows into one row based on several columns, most importantly by reference to the date/time, which also needs to be included later. My CSV file looks like this:
Body                 UDH           Original Sender ID  Received Date/Time
Hi John, Can You     ABC0010101    GGQMS               01/02/2001 01:03:19
Wait A moment?       ABC0010102    GGQMS               01/02/2001 01:03:20
Whats is             050004000111  112233445566        01/03/2001 11:16:01
Carrine Doing        050004000112  112233445566        01/03/2001 11:16:01
Over There?          050004000113  112233445566        01/03/2001 11:16:02
Where is             CD10F1011     zwerty              01/03/2001 15:22:10
Your Homework?       CD10F1012     zwerty              01/03/2001 15:22:11
Order for Pizza      AACCDD55001   112233445566        01/04/2001 19:20:21
Now for cheap $.     AACCDD55002   112233445566        01/04/2001 19:20:22
John, you know       G0500781      GGQMS               01/04/2001 10:21:21
Where can I get it?  G0500782      GGQMS               01/04/2001 10:21:21
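As you can see, the above is my CSV file. UDH acts as the primary key: its characters from the first to the second-to-last identify which message a Body part belongs to. The other relevant column is Received Date/Time, where the second part of a message arrives 1 second later, or possibly more than 1 second later.
I have managed to concatenate the bodies, but some messages consist of three parts, and for those I could not concatenate the Body completely.
Here is my current code: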
def problem3():
    filep2 = pd.read_csv(r'/Users/John/Downloads/Practice1/my_r.csv')
    #data cleaning
    filep2['Received Date/Time']= filep2['Received Date/Time'].astype('datetime64[ns]')
    filep2['UDH']=filep2['UDH'].astype(object)
    filep2['Original Sender ID']=filep2['Original Sender ID'].astype(object)
    filep2['Account User Name']=filep2['Account User Name'].astype(object)
    filep2['Body']=filep2['Body'].astype(str)
    filep2['UDH']=filep2['UDH'].str.strip()
    df = pd.DataFrame(filep2)
    #Filter null row in UDH column
    df=df[df['UDH'].notnull()]
    df=df.sort_values(by ='UDH')
    df['Body'] = df.apply(multiple_condition, axis=1)
    df.to_csv(r'/Users/John/Downloads/Practice1/my_c.csv', index=False, header=True)

def multiple_condition (df):
    if (df['UDH'].str.len() == 8):
        df=df.groupby(df[['UDH'].str[:7],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
        return df
    elif (df['UDH'].str.len() == 9):
        df= df.groupby(df[['UDH'].str[:8],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
        return df
    elif (df['UDH'].str.len() == 10):
        df= df.groupby(df[['UDH'].str[:9],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
        return df
    elif (df['UDH'].str.len() == 11):
        df=df.groupby(df[['UDH'].str[:10],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
        return df
    elif (df['UDH'].str.len() == 12):
        df=df.groupby(df[['UDH'].str[:11],'Original Sender ID','Received Date/Time'])['Body'].apply(' '.join).reset_index()
        return df
The code above produces the error in the title of this question. The error message is shown below.
Updated error message:
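For readers who want to see the title error in isolation, here is a minimal sketch that reproduces it. It assumes a hypothetical two-row frame in place of the real CSV, and the helper names `udh_length` and `udh_length_broken` are made up for illustration. The point it shows: with `apply(..., axis=1)` the function receives one row at a time, so `row['UDH']` is a plain Python `str`, which has no `.str` accessor.

```python
import pandas as pd

# Hypothetical two-row frame standing in for the real CSV.
df = pd.DataFrame({
    "UDH": ["G0500781", "G0500782"],
    "Body": ["John, you know", "Where can I get it?"],
})

def udh_length(row):
    # With axis=1, `row` is a Series holding one row, so row["UDH"] is a plain str.
    return len(row["UDH"])

def udh_length_broken(row):
    # A plain str has no `.str` accessor, so this raises AttributeError.
    return row["UDH"].str.len()

lengths = df.apply(udh_length, axis=1)

try:
    df.apply(udh_length_broken, axis=1)
    error_message = None
except AttributeError as exc:
    error_message = str(exc)

print(error_message)

# On a whole column the `.str` accessor works, and no apply() is needed.
vectorized = df["UDH"].str.len()
```

The `.str` accessor belongs to a Series (a whole column), not to the individual values inside it, which is why the column-level call at the end succeeds.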
Traceback (most recent call last):
File "<ipython-input-85-8ca58b5f49ad>", line 1, in <module>
runfile('/Users/syafiq/Downloads/RoutingPractice01.py', wdir='/Users/syafiq/Downloads')
File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 827, in runfile
execfile(filename, namespace)
File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/spyder_kernels/customize/spydercustomize.py", line 110, in execfile
exec(compile(f.read(), filename, 'exec'), namespace)
File "/Users/John/Downloads/RoutingPractice01.py", line 79, in <module>
problem3()
File "/Users/John/Downloads/RoutingPractice01.py", line 35, in problem3
filep2['Received Date/Time']= filep2['Received Date/Time'].astype('datetime64[ns]')
File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/pandas/core/frame.py", line 2980, in __getitem__
indexer = self.columns.get_loc(key)
File "/Users/John/opt/anaconda3/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 2899, in get_loc
return self._engine.get_loc(self._maybe_cast_indexer(key))
File "pandas/_libs/index.pyx", line 107, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/index.pyx", line 131, in pandas._libs.index.IndexEngine.get_loc
File "pandas/_libs/hashtable_class_helper.pxi", line 1607, in pandas._libs.hashtable.PyObjectHashTable.get_item
File "pandas/_libs/hashtable_class_helper.pxi", line 1614, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: 'Received Date/Time'
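Note: I have tried many different ways to get the desired output described above, but I either could not solve it or ran into errors. I keep hitting a brick wall. UDH is the identifier for a group of Body parts.
I'm still new to pandas, and it has been a while since I last got my hands dirty with Python. If anyone can point out where I went wrong, I would really appreciate it, and I would be grateful for any help getting the desired output.
Thank you very much! :)

You don't need apply(); you can use groupby() directly.
I use io.StringIO() only to simulate a file.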
text = '''Body                 UDH           Original Sender ID  Received Date/Time
Hi John, Can You     ABC0010101    GGQMS               01/02/2001 01:03:19
Wait A moment?       ABC0010102    GGQMS               01/02/2001 01:03:20
Whats is             050004000111  112233445566        01/03/2001 11:16:01
Carrine Doing        050004000112  112233445566        01/03/2001 11:16:01
Over There?          050004000113  112233445566        01/03/2001 11:16:02
Where is             CD10F1011     zwerty              01/03/2001 15:22:10
Your Homework?       CD10F1012     zwerty              01/03/2001 15:22:11
Order for Pizza      AACCDD55001   112233445566        01/04/2001 19:20:21
Now for cheap $.     AACCDD55002   112233445566        01/04/2001 19:20:22
John, you know       G0500781      GGQMS               01/04/2001 10:21:21
Where can I get it?  G0500782      GGQMS               01/04/2001 10:21:21'''

import pandas as pd
import io

# Columns are separated by two or more spaces; a regex separator needs the python engine
df = pd.read_csv(io.StringIO(text), sep=r'\s{2,}', engine='python')

# Optional cleaning steps from the question, left disabled here
#df['Received Date/Time'] = df['Received Date/Time'].astype('datetime64[ns]')
#df['UDH'] = df['UDH'].astype(object)
#df['Original Sender ID'] = df['Original Sender ID'].astype(object)
#df['Account User Name'] = df['Account User Name'].astype(object)
#df['Body'] = df['Body'].astype(str)
#df['UDH'] = df['UDH'].str.strip()

# Filter null rows in the UDH column
#df = df[df['UDH'].notnull()]
#df = df.sort_values(by='UDH')

# Inspect the groups
#groups = df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])
#for name, data in groups:
#    print(name)
#    data['Received Date/Time'] = data['Received Date/Time'].min()
#    print(data)

# Group by UDH without its last character (the part number) and by sender,
# then join the Body parts and keep the latest Received Date/Time
groups = df.groupby([df['UDH'].str[:-1], 'Original Sender ID'])
df2 = groups.agg({'Body': ' '.join, 'Received Date/Time': 'max'}).reset_index()

# Alternative: group by UDH only
#groups = df.groupby([df['UDH'].str[:-1]])
#df2 = groups.agg({'Body': ' '.join, 'Received Date/Time': 'max', 'Original Sender ID': 'min'}).reset_index()

df2 = df2.sort_values('Received Date/Time')

pd.options.display.width = 200
print(df2)
Result:
           UDH Original Sender ID                                Body   Received Date/Time
2    ABC001010              GGQMS     Hi John, Can You Wait A moment?  01/02/2001 01:03:20
0  05000400011       112233445566  Whats is Carrine Doing Over There?  01/03/2001 11:16:02
3     CD10F101             zwerty             Where is Your Homework?  01/03/2001 15:22:11
4      G050078              GGQMS  John, you know Where can I get it?  01/04/2001 10:21:21
1   AACCDD5500       112233445566    Order for Pizza Now for cheap $.  01/04/2001 19:20:22
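One caveat about the result above: the Received Date/Time values are still strings, so the maximum is chosen lexicographically. That happens to work for this sample, but with month-first dates it can go wrong across years (as text, "12/31/2000" sorts after "01/01/2001"). A hedged variant, assuming the timestamps are month/day/year and using a shortened four-row copy of the sample data, parses the column before aggregating:

```python
import pandas as pd

# Four rows from the sample data, built directly instead of read from a file.
df = pd.DataFrame({
    "Body": ["Hi John, Can You", "Wait A moment?",
             "John, you know", "Where can I get it?"],
    "UDH": ["ABC0010101", "ABC0010102", "G0500781", "G0500782"],
    "Original Sender ID": ["GGQMS"] * 4,
    "Received Date/Time": ["01/02/2001 01:03:19", "01/02/2001 01:03:20",
                           "01/04/2001 10:21:21", "01/04/2001 10:21:21"],
})

# Assumption: the format is month/day/year hour:minute:second.
df["Received Date/Time"] = pd.to_datetime(
    df["Received Date/Time"], format="%m/%d/%Y %H:%M:%S"
)

# Sort by UDH so the parts of each message are joined in sequence
# (this relies on the part number being the single last character).
df = df.sort_values("UDH")

df2 = (
    df.groupby([df["UDH"].str[:-1], "Original Sender ID"])
      .agg({"Body": " ".join, "Received Date/Time": "max"})
      .reset_index()
      .sort_values("Received Date/Time")
)

print(df2)
```

With real datetimes, both the `max` aggregation and the final sort are chronological rather than alphabetical.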
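Always put the full error message (starting at the word "Traceback") in the question (not in a comment), as text (not a screenshot). It contains other useful information.

Hi @furas, as requested, I have updated the post.

If it says that a string has no .str, then maybe you should try it without .str? That is, len(df['UDH']) == 9.

BTW: in multiple_condition you should probably use .str[:-1] instead of .str[:7], .str[:8], etc. That might reduce all the if/elif branches to a single line of code (with no if/elif).

Hold on @furas, I'm trying to fix it. I have updated my code following your suggestion. It does resolve the error, but the Body column now appears to be empty. I'm fairly new to programming, so I'm a bit slow at fixing things and working out which line is wrong.

First of all, I really appreciate your help. However, after I tried your code it gave me an error; I have updated my post. I had never used import io before. Thanks for the introduction!

Your error KeyError: 'Received Date/Time' probably means that the file you read has no column named 'Received Date/Time'. You can check what you actually read. I used io only to simulate a file and put the sample data directly in the code, so that other people can run it without creating a data file. You can read the data from your file without using io.StringIO().

I want to thank you for the generous time you spent helping me solve this error and for sharing your knowledge. I really appreciate it! :D Thank you very much!

It was an interesting problem.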
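The suggestion to "check what you actually read" can be sketched as follows. The real cause of the KeyError in the asker's file is not known from this thread; the example assumes one common cause, stray spaces around the header names, and uses a made-up one-row file to show it.

```python
import io
import pandas as pd

# Hypothetical file content whose header names carry stray spaces.
raw = "Body, UDH , Received Date/Time \nHi,ABC0010101,01/02/2001 01:03:19\n"

df = pd.read_csv(io.StringIO(raw))

# Inspect the column names exactly as pandas read them.
print(df.columns.tolist())

# The lookup fails while the names still contain the extra spaces.
has_column_before = "Received Date/Time" in df.columns

# Strip whitespace from the header names, then the lookup succeeds.
df.columns = df.columns.str.strip()
has_column_after = "Received Date/Time" in df.columns
```

If the printed list shows a single long column name instead, the file most likely uses a different delimiter than the one passed to read_csv().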