Dask regex extract comparison fails with NotImplementedError

I have a Dask DataFrame that looks like this:

class1                                  statement             class2              value
<geoentity_Pic_de_Font_Blanca_2986043>  <hasLatitude>         42.64991^^<degrees> 42.64991
<geoentity_Pic_de_Font_Blanca_2986043>  <hasLongitude>        1.53335^^<degrees>  1.53335
<geoentity_Pic_de_Font_Blanca_2986043>  <hasGeonamesEntityId> 2986043             NaN
<geoentity_Pic_de_Font_Blanca_2986043>  rdfs:label            Pic de Font Blanca  NaN

But this gives me the following error:

E:\WPy-3710\python-3.7.1.amd64\lib\site-packages\dask\dataframe\core.py in __getitem__(self, key)
   3347             graph = HighLevelGraph.from_collections(name, dsk, dependencies=[self, key])
   3348             return new_dd_object(graph, name, self, self.divisions)
-> 3349         raise NotImplementedError(key)
   3350 
   3351     def __setitem__(self, key, value):

NotImplementedError: Dask DataFrame Structure:
                    0     1
npartitions=442            
                 bool  bool
                  ...   ...
...               ...   ...
                  ...   ...
                  ...   ...
Dask Name: and_, 3978 tasks

My dtypes are:


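I don't understand why this fails, since the extraction on its own seems to return the correct substrings. Does anyone know what I'm doing wrong?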

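It's hard to say without a reproducible example, but it looks like you are trying to index a Dask DataFrame with another Dask DataFrame. That isn't supported, and it probably isn't what you want anyway.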

Using just pandas:

In [18]: df = pd.DataFrame({"A": ['a1', 'b2', 'c3']})

In [19]: df[df.A.str.extract('(\d)') == '1']
Out[19]:
     A
0  NaN
1  NaN
2  NaN
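This is because .str.extract returns a DataFrame by default (expand=True), so the comparison produces a 2-D boolean DataFrame rather than a row mask. Set expand=False to get a 1-D Series instead: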
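A minimal pandas sketch of the difference, using the same toy frame as the session above. The variable names are illustrative only. It checks the type returned with and without expand=False, and shows that selecting column 0 of the expanded result gives the same 1-D mask (unnamed capture groups get integer column labels starting at 0).

```python
import pandas as pd

df = pd.DataFrame({"A": ["a1", "b2", "c3"]})

# Default expand=True: a DataFrame, even though there is only one capture group
extracted_frame = df.A.str.extract(r"(\d)")
# expand=False with exactly one group: a Series
extracted_series = df.A.str.extract(r"(\d)", expand=False)

print(type(extracted_frame).__name__)   # DataFrame
print(type(extracted_series).__name__)  # Series

# Comparing the Series yields a 1-D boolean mask, which selects rows
mask = extracted_series == "1"
print(df[mask])

# Equivalent alternative: take the single (unnamed, so labelled 0) column
mask_alt = extracted_frame[0] == "1"
print(mask.tolist() == mask_alt.tolist())  # True
```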

In [20]: df[df.A.str.extract('(\d)', expand=False) == '1']
Out[20]:
    A
0  a1
The same works with Dask:

In [21]: df = dd.from_pandas(df, 2)

In [22]: df[df.A.str.extract('(\d)', expand=False) == '1']
Out[22]:
Dask DataFrame Structure:
                    A
npartitions=1
0              object
2                 ...
Dask Name: getitem, 5 tasks

In [23]: _.compute()
Out[23]:
    A
0  a1
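Applied to the question's data, a sketch might look like the following. The question's actual code isn't shown, so the pattern r"_(\d+)>$" (the trailing digits in class1) and the comparison against class2 are assumptions about what was intended. It also assumes class2 is stored as strings; if it were numeric, comparing it with the extracted strings would never be True.

```python
import pandas as pd

# Rows copied from the question's table (the value column is left out)
df = pd.DataFrame({
    "class1": ["<geoentity_Pic_de_Font_Blanca_2986043>"] * 4,
    "statement": ["<hasLatitude>", "<hasLongitude>",
                  "<hasGeonamesEntityId>", "rdfs:label"],
    "class2": ["42.64991^^<degrees>", "1.53335^^<degrees>",
               "2986043", "Pic de Font Blanca"],
})

# Hypothetical pattern: digits between the last underscore and the closing '>'
geonames_id = df["class1"].str.extract(r"_(\d+)>$", expand=False)

# Series == Series gives a 1-D boolean mask, so row selection works
matches = df[geonames_id == df["class2"]]
print(matches)
```

With Dask, the same expression should work on a frame created with dd.from_pandas, followed by .compute(), as in the session above. The key point is still expand=False, so the mask is a Series and not a DataFrame.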