Python Pandas version 0.22.0 - drop_duplicates() got an unexpected keyword argument 'keep'
I am trying to remove duplicates in a dataframe using drop_duplicates(subset=[''], keep=False). It apparently works fine in my Jupyter notebook, but when I try to execute it as a .py file through the terminal, I get the following error:
Traceback (most recent call last):
File "/home/source/fork/PySpark_Analytics/Notebooks/Krish/beryllium_pandas.py", line 54, in <module>
dffsamelname = dffsameflname.drop_duplicates(subset=['INDIVIDUAL_LASTNAME'], keep=False)
File "/var/webeng/opensource/aetna-anaconda/lib/python2.7/site-packages/pandas/util/decorators.py", line 88, in wrapper
return func(*args, **kwargs)
TypeError: drop_duplicates() got an unexpected keyword argument 'keep'
I want both of the duplicated records to be dropped, so keep=False is required.
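A minimal sketch of what keep=False does in pandas, using hypothetical sample data (the column name INDIVIDUAL_LASTNAME is taken from the question; the values are made up):

```python
import pandas as pd

# Hypothetical sample data: two people share the last name "Smith"
df = pd.DataFrame({
    "INDIVIDUAL_LASTNAME": ["Smith", "Jones", "Smith", "Lee"],
    "ID": [1, 2, 3, 4],
})

# keep=False drops every row whose last name is duplicated,
# rather than keeping the first or last occurrence
deduped = df.drop_duplicates(subset=["INDIVIDUAL_LASTNAME"], keep=False)
print(deduped["INDIVIDUAL_LASTNAME"].tolist())  # ['Jones', 'Lee']
```

Both "Smith" rows are removed, which is exactly the behavior keep="first" or keep="last" cannot give you.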
If I remove keep=False, it works fine. -- It may be that your object is not a native pandas dataframe but a pyspark dataframe.
Judging from that, subset seems to be the only accepted parameter. Could you add the lines where you import and create the dataframe?
Could you print pd.__version__ right before that call?

These are my imports and the Spark session setup:

import pyspark
from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext, SQLContext, functions
from datetime import date, timedelta
from pyspark.sql import Row
from pyspark.sql.functions import col, explode
from pyspark.sql import SparkSession
import pandas as pd
import numpy as np

# Set up the Spark session
sqlSession = SparkSession \
    .builder \
    .appName("Beryllium model") \
    .enableHiveSupport() \
    .config("hive.exec.dynamic.partition", "true") \
    .config("hive.exec.dynamic.partition.mode", "nonstrict") \
    .getOrCreate()

Just checking whether you assigned the result of toPandas() to something, since it is not an in-place function. -- I had converted the dataframe with toPandas() before doing this.
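The assignment pitfall raised above applies to drop_duplicates() as well: like toPandas(), it returns a new object rather than modifying the frame in place. A pandas-only sketch with made-up data:

```python
import pandas as pd

df = pd.DataFrame({"INDIVIDUAL_LASTNAME": ["Smith", "Smith", "Lee"]})

# drop_duplicates() returns a new DataFrame; it does not modify df
# in place, so a bare call like this silently discards the result
df.drop_duplicates(subset=["INDIVIDUAL_LASTNAME"], keep=False)
print(len(df))  # still 3

# The result must be assigned back (or to a new name)
df = df.drop_duplicates(subset=["INDIVIDUAL_LASTNAME"], keep=False)
print(len(df))  # 1: only "Lee" remains
```

The same applies to toPandas(): write pdf = sdf.toPandas() and then call drop_duplicates on pdf, not on the original Spark dataframe.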
# Trying to drop both the records with same last name
dffsamelname = dffsameflname.drop_duplicates(subset=['INDIVIDUAL_LASTNAME'], keep=False)
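Since the error usually means the object is a pyspark DataFrame (whose dropDuplicates only takes a subset argument, not keep), a quick guard can make the failure obvious. The helper name assert_is_pandas and the sample data below are hypothetical, for illustration only:

```python
import pandas as pd

def assert_is_pandas(df):
    """Fail fast if df is not a native pandas DataFrame
    (e.g. a pyspark.sql.DataFrame, which lacks the keep argument)."""
    if not isinstance(df, pd.DataFrame):
        raise TypeError(
            "expected pandas.DataFrame, got %s.%s; call .toPandas() "
            "(and assign its result) first"
            % (type(df).__module__, type(df).__name__)
        )

# Hypothetical stand-in for dffsameflname after a proper toPandas() call
sample_df = pd.DataFrame({"INDIVIDUAL_LASTNAME": ["Smith", "Smith"]})
assert_is_pandas(sample_df)  # passes for a real pandas frame
```

Calling the guard right before drop_duplicates turns the confusing TypeError about 'keep' into a message that names the actual type you are holding.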