Python SparkPandasNotImplementedError: .iloc requires numeric slice or conditional boolean index

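I keep getting the following error on Databricks:

SparkPandasNotImplementedError: .iloc requires numeric slice or conditional boolean Index. You are trying to use pandas function .iloc[..., ...], use spark function select, where

Here is my code: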

import re 
import nltk
import heapq
corpus = []
for i in range(0, len(Y)):
    describe = re.sub('[^a-zA-Z]', ' ', Y.iloc[i, 0])
    describe = describe.lower()
    describe = describe.split()
    describe = ' '.join(describe)
    corpus.append(describe)

The code runs fine in Spyder but fails in Databricks.
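For reference, here is a self-contained run of the same loop against a small pandas DataFrame, which is what a local Spyder session normally works with. The column name description and the sample strings are hypothetical stand-ins for the real Y.

```python
import re

import pandas as pd

# Hypothetical stand-in for the question's Y: a pandas DataFrame whose
# first column holds free-text descriptions.
Y = pd.DataFrame({"description": ["Hello, World! 42", "  Spark   & Koalas  "]})

corpus = []
for i in range(len(Y)):
    # Positional scalar access [row, column] is valid on a pandas DataFrame.
    describe = re.sub("[^a-zA-Z]", " ", Y.iloc[i, 0])
    # Lowercase and collapse runs of whitespace into single spaces.
    corpus.append(" ".join(describe.lower().split()))

print(corpus)  # ['hello world', 'spark koalas']
```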

I was able to reproduce the same issue with the following code:

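import numpy as np
import pandas as pd
import databricks.koalas as ks

dates = pd.date_range('20130101', periods=6)
pdf = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list('ABCD'))
df = ks.from_pandas(pdf)  # convert the pandas DataFrame to a Koalas DataFrame

print(pdf.iloc[0, 0])  # pandas: prints a scalar
print(df.iloc[0, 0])   # Koalas: raises SparkPandasNotImplementedError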

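Since the question doesn't describe the variable Y, I assume it is a DataFrame. The difference is in the kind of DataFrame: locally in Spyder it is a pandas DataFrame, while in Databricks it is a Koalas DataFrame.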
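If you are not sure which kind of DataFrame Y is, a small helper can normalize it. This is a minimal sketch: ensure_pandas is a hypothetical name, and it assumes the Koalas DataFrame's to_pandas() method, which collects every row to the driver and is therefore only suitable for data that fits in memory.

```python
import pandas as pd


def ensure_pandas(df):
    """Return a pandas DataFrame, converting a Koalas DataFrame if needed.

    Assumes a Koalas DataFrame exposes to_pandas(), which loads all rows
    into the driver's memory, so only use this on reasonably small data.
    """
    if isinstance(df, pd.DataFrame):
        return df  # already pandas: nothing to do
    if hasattr(df, "to_pandas"):
        return df.to_pandas()  # e.g. a Koalas DataFrame
    raise TypeError(
        f"Unsupported DataFrame type: {type(df).__module__}.{type(df).__name__}"
    )
```

With Y = ensure_pandas(Y) placed before the loop, Y.iloc[i, 0] behaves as it does locally.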

According to the Koalas documentation, the iloc[int, int] operation is not supported on a Koalas DataFrame.

So if you want to process the first-column value of each row in Databricks, there are two solutions:

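1. Make sure Y is a pandas DataFrame within the same script on Databricks; your original loop then works unchanged.
2. If Y must be a Koalas DataFrame, try the code below instead: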

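import re

corpus = []
# Here, `Y` is a Koalas DataFrame; iterrows() yields (index, row) pairs,
# where each row is a pandas Series
for _, row in Y.iterrows():
    # Take the value of the first column by position
    describe = re.sub('[^a-zA-Z]', ' ', row.iloc[0])
    describe = describe.lower()
    describe = describe.split()
    describe = ' '.join(describe)
    corpus.append(describe)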

As the sample code and its result show, the iterrows function gives you the first-column value of each row.
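The sketch below checks the iterrows pattern on plain pandas and shows that it produces the same corpus as the positional loop; clean is a hypothetical helper wrapping the question's cleaning steps, and the sample data is made up. It also shows a column-wise alternative that selects the first column once and maps over it; that variant is only shown here for pandas and is not claimed to apply to Koalas. Both approaches run Python per row, so expect them to be slow on large data.

```python
import re

import pandas as pd


def clean(text):
    """Keep letters only, lowercase, and collapse whitespace."""
    return " ".join(re.sub("[^a-zA-Z]", " ", text).lower().split())


# Hypothetical sample data; the column name is illustrative only.
Y = pd.DataFrame({"description": ["Hello, World! 42", "  Spark   & Koalas  "]})

# Row-wise: iterrows yields (index, Series) pairs; take the first value by position.
corpus = [clean(row.iloc[0]) for _, row in Y.iterrows()]

# Column-wise alternative on pandas: select the first column once, then map.
corpus_vectorized = Y.iloc[:, 0].map(clean).tolist()

print(corpus)  # ['hello world', 'spark koalas']
```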