Python pandas 0.22.0 - drop_duplicates() got an unexpected keyword argument 'keep'



I'm trying to remove duplicates from a dataframe using drop_duplicates(subset=[''], keep=False). It works fine in my Jupyter notebook, but when I run the same code as a .py file from the terminal, I get the following error:

Traceback (most recent call last):
  File "/home/source/fork/PySpark_Analytics/Notebooks/Krish/beryllium_pandas.py", line 54, in <module>
    dffsamelname = dffsameflname.drop_duplicates(subset=['INDIVIDUAL_LASTNAME'], keep=False)

  File "/var/webeng/opensource/aetna-anaconda/lib/python2.7/site-packages/pandas/util/decorators.py", line 88, in wrapper
    return func(*args, **kwargs)
TypeError: drop_duplicates() got an unexpected keyword argument 'keep'
I want both of the duplicated records dropped, so keep=False is required.


If I remove keep=False, it works fine.

It may be that your object is not a native pandas DataFrame but a pyspark DataFrame.
Judging from the pyspark documentation, subset is the only parameter its drop_duplicates() accepts. Could you add the lines where you import the libraries and create the dataframe?
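The intended behavior can be illustrated with a plain pandas DataFrame (a minimal sketch; the column name and sample values are made up to mirror the question's INDIVIDUAL_LASTNAME column):

```python
import pandas as pd

df = pd.DataFrame({
    "INDIVIDUAL_LASTNAME": ["Smith", "Jones", "Smith", "Lee"],
    "ID": [1, 2, 3, 4],
})

# keep=False drops every row whose last name is duplicated,
# rather than keeping the first or last occurrence.
unique_only = df.drop_duplicates(subset=["INDIVIDUAL_LASTNAME"], keep=False)
print(unique_only["INDIVIDUAL_LASTNAME"].tolist())  # ['Jones', 'Lee']
```

On a pyspark DataFrame the same call would fail, because pyspark's drop_duplicates() takes only subset; keep=False has no pyspark equivalent.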


Print pd.__version__ immediately before that line to see which pandas the script is actually picking up. Also, just checking: did you assign the result of toPandas() to something? It is not an in-place function.

I did convert the dataframe with toPandas() before this step. My imports and Spark session setup:

from pyspark import SparkContext, SparkConf
from pyspark.sql import HiveContext, SQLContext, functions
from datetime import date, timedelta
from pyspark.sql import Row
from pyspark.sql.functions import col, explode
from pyspark.sql import SparkSession
import pandas as pd
import numpy as np

# Set up the Spark session
sqlSession = SparkSession\
    .builder\
    .appName("Beryllium model")\
    .enableHiveSupport()\
    .config("hive.exec.dynamic.partition", "true")\
    .config("hive.exec.dynamic.partition.mode", "nonstrict")\
    .getOrCreate()
# Trying to drop both the records with same last name
dffsamelname = dffsameflname.drop_duplicates(subset=['INDIVIDUAL_LASTNAME'], keep=False)
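One way to confirm which kind of dataframe the script is actually operating on is to check its concrete type right before the failing line (a diagnostic sketch; the variable name follows the question, and diagnose is a hypothetical helper):

```python
import pandas as pd

def diagnose(df):
    """Print the pandas version in use and the dataframe's concrete type.

    A pyspark DataFrame fails the isinstance check; its drop_duplicates()
    accepts only `subset`, so keep=False raises a TypeError there.
    """
    print(pd.__version__)
    print(type(df))
    return isinstance(df, pd.DataFrame)

# If diagnose(dffsameflname) returns False, convert first and assign the
# result, since toPandas() is not in-place:
#     dffsameflname = dffsameflname.toPandas()
```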