In Python, how do I multiply every value in each element of an SArray?


I'm using Graphlab, but I think this question applies to pandas as well.

import graphlab
sf = graphlab.SFrame({'id': [1, 2, 3], 'user_score': [{"a":4, "b":3}, {"a":5, "b":7}, {"a":2, "b":3}], 'weight': [4, 5, 2]})

I want to create a new column in which every value of each "user_score" element is multiplied by the number in "weight". That is:

sf = graphlab.SFrame({'id': [1, 2, 3], 'user_score': [{"a":4, "b":3}, {"a":5, "b":7}, {"a":2, "b":3}], 'weight': [4, 5, 2], 'new': [{"a":16, "b":12}, {"a":25, "b":35}, {"a":4, "b":6}]})

I tried writing the simple function below, but it didn't work. Any ideas?

def trans(x, y):
    d = dict()
    for k, v in x.items():
        d[k] = v*y
    return d

sf.apply(trans(sf['user_score'], sf['weight']))

It fails with the following error message:

AttributeError: 'SArray' object has no attribute 'items'

I'm using a pandas DataFrame, but this should work in your case too:

import pandas as pd
df['new']=[dict((k,v*y) for k,v in x.items()) for x, y in zip(df['user_score'], df['weight'])]

Input DataFrame:

df
Out[34]: 
   id          user_score  weight
0   1  {u'a': 4, u'b': 3}       4
1   2  {u'a': 5, u'b': 7}       5
2   3  {u'a': 2, u'b': 3}       2

Output:

df
Out[36]: 
   id          user_score  weight                   new
0   1  {u'a': 4, u'b': 3}       4  {u'a': 16, u'b': 12}
1   2  {u'a': 5, u'b': 7}       5  {u'a': 25, u'b': 35}
2   3  {u'a': 2, u'b': 3}       2    {u'a': 4, u'b': 6}
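
For anyone who wants to run the approach above end to end, here is a minimal sketch. It rebuilds the sample data from the question as a pandas DataFrame; the helper name scale_scores is only for illustration and is not part of pandas or Graphlab.

```python
import pandas as pd

# Same sample data as in the question, rebuilt as a pandas DataFrame.
df = pd.DataFrame({
    'id': [1, 2, 3],
    'user_score': [{'a': 4, 'b': 3}, {'a': 5, 'b': 7}, {'a': 2, 'b': 3}],
    'weight': [4, 5, 2],
})


def scale_scores(scores, weight):
    """Return a new dict with every value in `scores` multiplied by `weight`."""
    return {key: value * weight for key, value in scores.items()}


# Pair each dict with the weight from the same row and scale it.
df['new'] = [scale_scores(s, w) for s, w in zip(df['user_score'], df['weight'])]

print(df['new'].tolist())
```

The list comprehension walks both columns in step, so each dict only ever meets the weight from its own row.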

Here is one of many possible solutions:

In [69]: df
Out[69]:
   id        user_score  weight
0   1  {'b': 3, 'a': 4}       4
1   2  {'b': 7, 'a': 5}       5
2   3  {'b': 3, 'a': 2}       2

In [70]: df['user_score'] = df['user_score'].apply(lambda x: pd.Series(x)).mul(df.weight, axis=0).to_dict('records')

In [71]: df
Out[71]:
   id          user_score  weight
0   1  {'b': 12, 'a': 16}       4
1   2  {'b': 35, 'a': 25}       5
2   3    {'b': 6, 'a': 4}       2
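
The same idea can be written as a self-contained script, roughly as follows. This is a sketch that assumes every dict in user_score has the same keys (otherwise the expanded frame would contain NaN), and it writes to a new column instead of overwriting user_score. The intermediate names expanded and scaled are only for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [1, 2, 3],
    'user_score': [{'a': 4, 'b': 3}, {'a': 5, 'b': 7}, {'a': 2, 'b': 3}],
    'weight': [4, 5, 2],
})

# Expand each dict into its own columns ('a', 'b'), keeping the row index.
expanded = pd.DataFrame(df['user_score'].tolist(), index=df.index)

# Multiply every column by the weight of the same row (axis=0 aligns on rows).
scaled = expanded.mul(df['weight'], axis=0)

# Collapse each row back into a dict and store the result as a new column.
df['new'] = scaled.to_dict('records')

print(df['new'].tolist())
```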

It's subtle, but I think what you want is:

sf.apply(lambda row: trans(row['user_score'], row['weight']))

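The apply function takes a function as its argument and passes each row to that function in turn. In your version, trans is evaluated before apply is ever called, which is why the error message complains that an SArray was passed to trans where a dict was expected.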
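
The difference between passing a function and passing the result of calling it can be reproduced without Graphlab. In the sketch below, apply_rows is a hypothetical stand-in for a row-wise apply (it is not the Graphlab API), and plain lists play the role of the SArray columns.

```python
def trans(scores, weight):
    """Multiply every value in the dict `scores` by `weight`."""
    return {key: value * weight for key, value in scores.items()}


def apply_rows(rows, fn):
    """Hypothetical stand-in for a row-wise apply: call fn once per row dict."""
    return [fn(row) for row in rows]


rows = [
    {'id': 1, 'user_score': {'a': 4, 'b': 3}, 'weight': 4},
    {'id': 2, 'user_score': {'a': 5, 'b': 7}, 'weight': 5},
    {'id': 3, 'user_score': {'a': 2, 'b': 3}, 'weight': 2},
]

# Whole "columns", analogous to sf['user_score'] and sf['weight'].
user_score_column = [row['user_score'] for row in rows]
weight_column = [row['weight'] for row in rows]

# Wrong: trans runs immediately on the whole columns, before apply_rows is
# even entered, and a list (like an SArray) has no .items() method.
error_message = None
try:
    apply_rows(rows, trans(user_score_column, weight_column))
except AttributeError as exc:
    error_message = str(exc)

# Right: pass a function; it is called later, once per row, with dict and int.
new_column = apply_rows(rows, lambda row: trans(row['user_score'], row['weight']))

print(error_message)
print(new_column)
```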

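Using apply means it is, by definition, not vectorized.

@Jeff, thank you for your comment! I've corrected my answer.

@Jeff, correct me if I'm wrong, but I thought something like df['num_column'].apply(np.sum) is vectorized, because it is applied to the whole column rather than element by element. Or did I misunderstand?

Yes, if you pass a ufunc (which np.sum is), it will be executed like np.sum(df['num_columns']), so this is "vectorized". The point is that .apply is almost always not vectorized (and so carries a performance penalty compared to almost anything else).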
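
To make the distinction in these comments concrete, here is a small sketch contrasting a per-element apply with whole-column operations. The Series s and its values are made up for illustration. The results are identical; only the way the work is dispatched differs.

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 2.0, 3.0, 4.0])

# Element by element: the Python lambda is called once for each value.
doubled_apply = s.apply(lambda v: v * 2)

# Vectorized: a single operation over the whole underlying array.
doubled_vectorized = s * 2

# Reducing the whole column in one call.
total = np.sum(s)

print(doubled_apply.equals(doubled_vectorized))
print(total)
```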