Python TypeError: normalize() argument 2 must be str, not Series, with a DataFrame of strings

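I have a DataFrame containing each day's news, and I am trying to analyze the sentiment intensity of each day, that is, whether the overall sentiment of the news on a given day is positive, negative, or neutral. Here is the df_news DataFrame: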

    Date    name
0   2017-10-20  Gucci debuts art installation at its Ginza sto...
1   2018-08-01  Gucci Joins Paris Fashion Week for Its Spring ...
2   2018-04-20  Gucci launches its new creative hub Gucci ArtL...
3   2017-10-20  Gucci to launch homeware line Gucci Decor - CP...
4   2017-12-07  GUCCI opens new store at Miami Design District...
5   2018-01-12  Gucci opens Gucci Garden in Florence - LUXUO
6   2018-02-26  GUCCI's wild experiment with the Fall Winter 2...
7   2018-08-09  Gucci Revamped London Flagship Store | The Imp...
8   2018-08-01  Alessandro Michele Announces new Gucci Home co...
9   2017-10-20  Before He Picks Up the CFDA’s International Aw...
I tried to get the sentiment intensity with the following code:

However, for some dates I get a TypeError. Because of the try/except, those rows are skipped, and I end up with the following table:

    name    compound    neg neu pos
Date                    
2017-10-20  Gucci debuts art installation at its Ginza sto...               
2018-08-01  Gucci Joins Paris Fashion Week for Its Spring ...               
2018-04-20  Gucci launches its new creative hub Gucci ArtL...   0.4404  0   0.756   0.244
2017-10-20  Gucci to launch homeware line Gucci Decor - CP...               
2017-12-07  GUCCI opens new store at Miami Design District...   0   0   1   0
2018-01-12  Gucci opens Gucci Garden in Florence - LUXUO    0   0   1   0
2018-02-26  GUCCI's wild experiment with the Fall Winter 2...   0   0   1   0
2018-08-09  Gucci Revamped London Flagship Store | The Imp...   0.3182  0   0.602   0.398
2018-08-01  Alessandro Michele Announces new Gucci Home co...               
2017-10-20  Before He Picks Up the CFDA’s International Aw...               
However, when I remove the try/except to see why it fails, I get the following error:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-26-2e9dbfc62bce> in <module>
      4 for date, row in df_news.T.iteritems():
      5 #    try:
----> 6     sentence = unicodedata.normalize('NFKD', df_news.loc[date, 'name']).encode('ascii','ignore')
      7     #print((sentence))
      8     ss = sid.polarity_scores(str(sentence))

TypeError: normalize() argument 2 must be str, not Series
Loading the data should give back:

    Date    name
0   2017-10-20  Gucci debuts art installation at its Ginza sto...
1   2018-08-01  Gucci Joins Paris Fashion Week for Its Spring ...
2   2018-04-20  Gucci launches its new creative hub Gucci ArtL...
3   2017-10-20  Gucci to launch homeware line Gucci Decor - CP...
4   2017-12-07  GUCCI opens new store at Miami Design District...
5   2018-01-12  Gucci opens Gucci Garden in Florence - LUXUO
6   2018-02-26  GUCCI's wild experiment with the Fall Winter 2...
7   2018-08-09  Gucci Revamped London Flagship Store | The Imp...
8   2018-08-01  Alessandro Michele Announces new Gucci Home co...
9   2017-10-20  Before He Picks Up the CFDA’s International Aw...
EDIT: I grouped the articles that appeared on the same day and put them in a list:

# get date out of the index to column    
df_news = df_news.reset_index()
# optional
df_news['Date'] = pd.to_datetime(df_news['Date'])
# groupby and output group rows as list
df_news = df_news.groupby('Date')['name'].apply(list)
df_news.head()
It gives me back:

Date
2017-10-20    [Gucci debuts art installation at its Ginza st...
2017-12-07    [GUCCI opens new store at Miami Design Distric...
2018-01-12       [Gucci opens Gucci Garden in Florence - LUXUO]
2018-02-26    [GUCCI's wild experiment with the Fall Winter ...
2018-04-20    [Gucci launches its new creative hub Gucci Art...
2018-08-01    [Gucci Joins Paris Fashion Week for Its Spring...
2018-08-09    [Gucci Revamped London Flagship Store | The Im...
Name: name, dtype: object
So when I try to apply Stael's answer:

sentence = df_news.loc[date, 'name'].apply(lambda x: unicodedata.normalize('NFKD', x).encode('ascii','ignore'))
that is, normalizing each item in the Series, I get the following error:

---------------------------------------------------------------------------
IndexingError                             Traceback (most recent call last)
<ipython-input-173-1bc93a0a065c> in <module>
      5     try:
      6         #sentence = unicodedata.normalize('NFKD', df_news.loc[date, 'name']).encode('ascii','ignore')
----> 7         sentence = df_news.loc[date, 'name'].apply(lambda x: unicodedata.normalize('NFKD', x).encode('ascii','ignore'))
      8         ss = sid.polarity_scores(str(sentence))
      9         df_news.set_value(date, 'compound', ss['compound'])

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
   1470             except (KeyError, IndexError):
   1471                 pass
-> 1472             return self._getitem_tuple(key)
   1473         else:
   1474             # we by definition only have the 0th axis

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _getitem_tuple(self, tup)
    873 
    874         # no multi-index, so validate all of the indexers
--> 875         self._has_valid_tuple(tup)
    876 
    877         # ugly hack for GH #836

C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexing.py in _has_valid_tuple(self, key)
    218         for i, k in enumerate(key):
    219             if i >= self.obj.ndim:
--> 220                 raise IndexingError('Too many indexers')
    221             try:
    222                 self._validate_key(k, i)

IndexingError: Too many indexers
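
A note on the "Too many indexers" message: after the groupby shown in the edit, df_news appears to be a one-dimensional Series of lists rather than a DataFrame, so a two-part lookup such as .loc[date, 'name'] passes one indexer too many. The sketch below only illustrates that shape change; the headlines and the variable name grouped are assumptions, not taken from the post.

```python
import pandas as pd

# Hypothetical sample data: two headlines share the date 2017-10-20.
df_news = pd.DataFrame({
    "Date": ["2017-10-20", "2017-12-07", "2017-10-20"],
    "name": ["headline A", "headline B", "headline C"],
})
df_news["Date"] = pd.to_datetime(df_news["Date"])

# Same grouping as in the edit: one list of headlines per date.
grouped = df_news.groupby("Date")["name"].apply(list)

# The result is a Series (one axis), so it accepts a single label only.
print(type(grouped).__name__, grouped.ndim)

# A single-label lookup returns the list of headlines for that day.
first_day = grouped.loc[pd.Timestamp("2017-10-20")]
print(first_day)
```

With a Series, only the date label is needed; there is no 'name' column left to select.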

It looks to me like, with this line:

sentence = unicodedata.normalize('NFKD', df_news.loc[date, 'name']).encode('ascii','ignore')
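you are trying to call normalize on every item in the df_news.loc[...] Series, but pandas does not apply the function across the whole Series for you. I think what you want to do is something like this: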

sentence = df_news.loc[date, 'name'].apply(lambda x: unicodedata.normalize('NFKD', x).encode('ascii','ignore'))
This is one way of applying a function (normalize) to every item in a Series.
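
As a minimal, self-contained sketch of that idea (the sample headlines and the helper name to_ascii are assumptions made up for illustration): Series.apply calls the function once per element, so unicodedata.normalize only ever sees a str. The helper also decodes the bytes back to str, because calling str() on a bytes object produces text with a leading b'...' wrapper.

```python
import unicodedata

import pandas as pd

# Hypothetical sample: two headlines that share one date in the index.
names = pd.Series(
    ["Gucci opens café in Florence", "Before He Picks Up the CFDA’s Award"],
    index=["2017-10-20", "2017-10-20"],
)


def to_ascii(text):
    """Decompose with NFKD, drop non-ASCII characters, and return a str."""
    return unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")


# apply() runs to_ascii on each element of the Series, one string at a time.
cleaned = names.apply(to_ascii)
print(cleaned.tolist())
```

Note that this only works when the lookup actually returns a Series; on a plain string there is no .apply method.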


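EDIT:

Theory 2: when you call df_news.loc[date, 'name'], you are selecting the item where index == date and column == 'name'. From your question, though, some dates are repeated in your index, which means that sometimes, instead of getting a single record on which to call unicodedata.normalize, you get a Series, and that causes the error.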

You will notice that the records left unfilled when you use the try/except clause are exactly the ones with duplicated dates.


You need to handle this somehow, perhaps by using the row from iteritems instead of the date, but that is something you will have to work out yourself.
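
To make the duplicate-date behaviour concrete, here is a small sketch with made-up headlines (the data is an assumption; only the column name 'name' and the Date index come from the question). It shows the same .loc call returning a str for a unique date and a Series for a repeated one, and then reads the text from each row instead of looking it up by date. It uses iterrows(), since iteritems() is no longer available in recent pandas releases.

```python
import pandas as pd

# Hypothetical frame indexed by date; 2017-10-20 appears twice.
df_news = pd.DataFrame(
    {"name": ["headline A", "headline B", "headline C"]},
    index=pd.Index(["2017-10-20", "2017-12-07", "2017-10-20"], name="Date"),
)

unique_hit = df_news.loc["2017-12-07", "name"]     # a single str
duplicate_hit = df_news.loc["2017-10-20", "name"]  # a Series with two rows

print(type(unique_hit).__name__, type(duplicate_hit).__name__)

# Reading the value from the row itself avoids the label lookup entirely,
# so every row yields exactly one string, duplicated dates or not.
titles = [row["name"] for _, row in df_news.iterrows()]
print(titles)
```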

Seeing that you have run into a third error in another post, I figured I should have another go at this.

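First and foremost, I get the impression that you do not fully understand your own code. An error like AttributeError: 'list' object has no attribute 'apply' tells me that you do not know what your variables are at the point where you operate on them, so I think you need to go more slowly and work out what each part of the code is doing before moving on to the next one.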

That said, your problem is not as complicated as you are making it. You are trying to apply these two lines of code

    sentence = unicodedata.normalize('NFKD', df_news.loc[date, 'name']).encode('ascii','ignore')

    ss = sid.polarity_scores(str(sentence))
to every entry in the 'name' column of your DataFrame, which is not hard.

You can do that quite easily:

scores = []
for entry in df['name']:
    sentence = unicodedata.normalize('NFKD', entry).encode('ascii','ignore')
    scores.append(sid.polarity_scores(str(sentence)))
This gives you a list of the dictionaries you were calling ss.

You can then add them as columns in your DataFrame like this:

df['new_column'] = [i['example_key'] for i in scores]
This is not the best or most efficient way, but it is a very simple one that gets you what you are trying to do.
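
A hedged sketch of the same idea for all score keys at once. To keep it runnable without NLTK, fake_polarity_scores is a hypothetical stand-in for sid.polarity_scores that returns the same four keys (neg, neu, pos, compound); the headlines and the per-day mean at the end are also assumptions, shown as one possible way to get a single value per date rather than anything the answer prescribes.

```python
import pandas as pd


def fake_polarity_scores(text):
    """Hypothetical stand-in for sid.polarity_scores (same keys, toy logic)."""
    pos = 1.0 if "opens" in text else 0.0
    return {"neg": 0.0, "neu": 1.0 - pos, "pos": pos, "compound": pos}


# Made-up data with a repeated date, as in the question.
df = pd.DataFrame({
    "Date": ["2017-10-20", "2017-12-07", "2017-10-20"],
    "name": ["Gucci debuts art", "GUCCI opens new store", "Gucci to launch homeware"],
})

# One dictionary of scores per headline, in row order.
scores = [fake_polarity_scores(entry) for entry in df["name"]]

# A list of dicts becomes one column per key; reuse df's index so rows align.
df = df.join(pd.DataFrame(scores, index=df.index))

# One possible per-day summary: the mean compound score of that day's headlines.
daily = df.groupby("Date")["compound"].mean()
print(daily)
```

Because every headline keeps its own row, repeated dates need no special handling until the final groupby.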

Good luck.


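If you previously grouped by day and built lists of strings (which, by the way, I do not think you should do), then you need another layer of iteration: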

scores = []
for sentence_list in df['name']:
    for entry in sentence_list:
        sentence = unicodedata.normalize('NFKD', entry).encode('ascii','ignore')
        scores.append(sid.polarity_scores(str(sentence)))
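
As an alternative to the nested loop, and only as a sketch under the assumption that the grouped object is a Series of lists indexed by date (the sample values below are made up), Series.explode() flattens it back to one headline per row while keeping the date attached to each headline:

```python
import pandas as pd

# Hypothetical grouped Series: one list of headlines per date.
grouped = pd.Series(
    [["headline A", "headline C"], ["headline B"]],
    index=pd.Index(["2017-10-20", "2017-12-07"], name="Date"),
    name="name",
)

# explode() emits one row per list element and repeats the index label.
flat = grouped.explode()

# Each pair is (date, headline), ready to be scored one string at a time.
pairs = list(flat.items())
print(pairs)
```

Unlike the plain nested loop, this keeps track of which date each score belongs to.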

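Hmm, then it raises a SyntaxError: invalid syntax at ss = sid.polarity_scores(str(sentence)).

What do you expect the variable sentence to be? In the first case you are operating on a Series, so you presumably want something like a Series to come out of it. You cannot accept either a str or a Series; that makes no sense.

Sorry!! I was missing a parenthesis. The actual error is an AttributeError on a 'str' object at df_news.loc[date,'name'].apply(lambda…

OK, I think I am starting to understand. I think df_news.loc[date,'name'] sometimes gives you a string and sometimes gives you a Series. I see from your question that the date '2017-10-20' appears more than once in the index. In that case you get a Series rather than a string, and you need to handle that somehow before you can normalize it.

@乘客: I have edited my answer to try to make it clearer.

Thanks for your help. But I still get a TypeError with sentence = unicodedata.normalize('NFKD', entry).encode('ascii','ignore'), because entry is a list. When I instead do sentence = df_news.loc[date, 'name'], I can apply ss = sid.polarity_scores(str(sentence)) to some sentences, with a returned neu score of 1; for the others it does not seem to work.

I do not see any reason to group the strings into lists. I assume you did that because there are duplicates in the date index, but that is not really a problem: this approach should handle duplicated dates.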