Python: Assigning a Koalas column from a NumPy result

I am trying to replicate some pandas functionality in Koalas on Databricks. In pandas:

import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [450, 1, 26],
                   'b': [1, 450, 70],
                  })
thresh = [x for x in range(26)] # create a list 0 to 25
df["c"] = np.where((df.a.isin(thresh) | df.b.isin(thresh)), 1, 0) # find the values within the threshold and flag column 'c'
df
# returns
Out[32]: 
     a    b  c
0  450    1  1
1    1  450  1
2   26   70  0

In Koalas:

import numpy as np
import databricks.koalas as ks

df = ks.DataFrame({'a': [450, 1, 26],
                   'b': [1, 450, 70],
                  })

thresh = [x for x in range(26)] # create a list 0 to 25
df = df.assign(c=np.where((df.a.isin(thresh) | df.b.isin(thresh)), 1, 0)) # find the values within the threshold and flag column 'c'
# returns
PandasNotImplementedError: The method `pd.Series.__iter__()` is not implemented. If you want to collect your data as an NumPy array, use 'to_numpy()' instead.

How can I either use to_numpy() correctly, as the error suggests, or wrap the NumPy result in a ks.Series() so that assign() receives the result?

df = df.assign(c=ks.Series(np.where((df.a.isin(thresh) | df.b.isin(thresh)), 1, 0)))

gives the same error as above.

Is there a way to replicate this pandas functionality in Koalas?

To do this with a ks.DataFrame you don't need np.where. The condition is already a boolean Series, so you can cast it with astype:

df = df.assign(c= (df.a.isin(thresh) | df.b.isin(thresh)).astype(int) )
df
     a    b  c
0  450    1  1
1    1  450  1
2   26   70  0
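
As a sanity check on why the cast can stand in for np.where here, the sketch below compares the two in plain pandas, so it runs without a Spark cluster. The names mask, via_where and via_astype are illustrative only and are not from the original post. The equivalence holds only because the two output values are exactly 1 and 0.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': [450, 1, 26],
                   'b': [1, 450, 70]})
thresh = list(range(26))  # 0 through 25

# Boolean Series: True where either column falls inside the threshold list
mask = df.a.isin(thresh) | df.b.isin(thresh)

# np.where(mask, 1, 0) maps True -> 1 and False -> 0 ...
via_where = np.where(mask, 1, 0)
# ... which is the same mapping a bool -> int cast performs
via_astype = mask.astype(int)

assert (via_where == via_astype.to_numpy()).all()
print(via_astype.tolist())  # [1, 1, 0]
```

The cast keeps the result as a column expression, so nothing has to iterate over the Series, and iteration is what triggered the Koalas error in the question.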