Dask regex extract comparison fails with NotImplementedError
I have a Dask DataFrame that looks like this:
class1 statement class2 value
<geoentity_Pic_de_Font_Blanca_2986043> <hasLatitude> 42.64991^^<degrees> 42.64991
<geoentity_Pic_de_Font_Blanca_2986043> <hasLongitude> 1.53335^^<degrees> 1.53335
<geoentity_Pic_de_Font_Blanca_2986043> <hasGeonamesEntityId> 2986043 NaN
<geoentity_Pic_de_Font_Blanca_2986043> rdfs:label Pic de Font Blanca NaN
But this gave me the following error:
E:\WPy-3710\python-3.7.1.amd64\lib\site-packages\dask\dataframe\core.py in __getitem__(self, key)
3347 graph = HighLevelGraph.from_collections(name, dsk, dependencies=[self, key])
3348 return new_dd_object(graph, name, self, self.divisions)
-> 3349 raise NotImplementedError(key)
3350
3351 def __setitem__(self, key, value):
NotImplementedError: Dask DataFrame Structure:
0 1
npartitions=442
bool bool
... ...
... ... ...
... ...
... ...
Dask Name: and_, 3978 tasks
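My dtypes are:
I don't know why this fails, because the extract itself seems to return the correct substring. Does anyone know what I'm doing wrong?

It's hard to say without a reproducible example, but it looks like you are trying to index a Dask DataFrame with another Dask DataFrame, which is not supported and is probably not what you want.

Using just pandas: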
In [18]: df = pd.DataFrame({"A": ['a1', 'b2', 'c3']})
In [19]: df[df.A.str.extract('(\d)') == '1']
Out[19]:
A
0 NaN
1 NaN
2 NaN
This is because .str.extract returns a DataFrame by default (expand=True). Set expand=False to get a 1D Series:
In [20]: df[df.A.str.extract('(\d)', expand=False) == '1']
Out[20]:
A
0 a1
This also works with Dask:
In [21]: df = dd.from_pandas(df, 2)
In [22]: df[df.A.str.extract('(\d)', expand=False) == '1']
Out[22]:
Dask DataFrame Structure:
A
npartitions=1
0 object
2 ...
Dask Name: getitem, 5 tasks
In [23]: _.compute()
Out[23]:
A
0 a1
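
As a rough sketch of how this might apply to data shaped like the question's, the snippet below uses plain pandas on a toy frame built from the rows shown above. The regex pattern and the compared value are assumptions made for illustration, since the original filtering code was not posted; only the column names and cell values come from the question.

```python
import pandas as pd

# Toy frame modeled on the rows shown in the question.
df = pd.DataFrame({
    "statement": ["<hasLatitude>", "<hasLongitude>", "rdfs:label"],
    "class2": ["42.64991^^<degrees>", "1.53335^^<degrees>", "Pic de Font Blanca"],
})

# Hypothetical pattern: a decimal number directly before "^^".
pattern = r"(\d+\.\d+)\^\^"

# expand=True is the default and yields a one-column DataFrame.
as_frame = df["class2"].str.extract(pattern)
# expand=False yields a Series when the pattern has a single group.
as_series = df["class2"].str.extract(pattern, expand=False)

print(type(as_frame).__name__)
print(type(as_series).__name__)

# A boolean Series selects rows; rows with no match compare as False.
mask = as_series == "42.64991"
result = df[mask]
print(result)
```

Checking the type of the extract result before comparing is a quick way to confirm that the mask is a Series. The Dask transcript above passes the same expand=False argument.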