compute() in Dask dataframe not working

I am trying out a simple parallel computation in Dask. This is my code:

  import time
  import dask as dask
  import dask.distributed as distributed
  import dask.dataframe as dd
  import dask.delayed as delayed
  from dask.distributed import Client,progress

  client = Client('localhost:8786')
  df = dd.read_csv('file.csv')
  ddf = df.groupby(['col1'])[['col2']].sum() 
  ddf = ddf.compute()
  print ddf
According to the documentation this should work, but when I run it I get the following:

  Traceback (most recent call last):
    File "dask_prg1.py", line 17, in <module>
      ddf = ddf.compute()
    File "/usr/local/lib/python2.7/site-packages/dask/base.py", line 156, in compute
      (result,) = compute(self, traverse=False, **kwargs)
    File "/usr/local/lib/python2.7/site-packages/dask/base.py", line 402, in compute
      results = schedule(dsk, keys, **kwargs)
    File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 2159, in get
      direct=direct)
    File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 1562, in gather
      asynchronous=asynchronous)
    File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 652, in sync
      return sync(self.loop, func, *args, **kwargs)
    File "/usr/local/lib/python2.7/site-packages/distributed/utils.py", line 275, in sync
      six.reraise(*error[0])
    File "/usr/local/lib/python2.7/site-packages/distributed/utils.py", line 260, in f
      result[0] = yield make_coro()
    File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1099, in run
      value = future.result()
    File "/usr/local/lib/python2.7/site-packages/tornado/concurrent.py", line 260, in result
      raise_exc_info(self._exc_info)
    File "/usr/local/lib/python2.7/site-packages/tornado/gen.py", line 1107, in run
      yielded = self.gen.throw(*exc_info)
    File "/usr/local/lib/python2.7/site-packages/distributed/client.py", line 1439, in _gather
      traceback)
    File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 122, in read_block_from_file
      with lazy_file as f:
    File "/usr/local/lib/python2.7/site-packages/dask/bytes/core.py", line 166, in __enter__
      f = SeekableFile(self.fs.open(self.path, mode=mode))
    File "/usr/local/lib/python2.7/site-packages/dask/bytes/local.py", line 58, in open
      return open(self._normalize_path(path), mode=mode)
  IOError: [Errno 2] No such file or directory: 'file.csv'

I don't understand what is going wrong. Please help me fix this. Thanks in advance.

You probably want to pass an absolute file path to read_csv. The reason is that you are handing the job of opening and reading the file to the Dask workers, and they may not have been started with the same working directory as your script/session.
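For illustration, a minimal sketch of this suggestion, assuming the same localhost:8786 scheduler as in the question and that the resolved absolute path is reachable by every worker (e.g. a shared filesystem mount):

  import os
  import dask.dataframe as dd
  from dask.distributed import Client

  client = Client('localhost:8786')

  # Turn the relative name into an absolute path on the submitting machine.
  # Every worker must be able to open this exact path, so it should point at
  # a location visible to all of them (shared mount, NFS, etc.).
  path = os.path.abspath('file.csv')

  df = dd.read_csv(path)
  ddf = df.groupby(['col1'])[['col2']].sum()
  print(ddf.compute())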

That is not the problem, I tried that. I also tried omitting the compute() statement and running it, and it worked fine, so I think the problem is with the compute() statement.

Are your workers on the same machine? Do they have permission to see the same file?

Thanks, that was the problem. The workers are on different machines, and one of them probably cannot access the csv file.
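As a quick way to confirm that diagnosis, you can ask every worker whether it can see the file before calling read_csv. A sketch, assuming the same scheduler address and a hypothetical shared path /data/file.csv:

  import os
  from dask.distributed import Client

  client = Client('localhost:8786')
  path = '/data/file.csv'  # hypothetical path; use your actual absolute path

  # client.run executes the function on every worker and returns a dict
  # keyed by worker address, e.g. {'tcp://10.0.0.1:45231': True, ...}.
  # Any worker reporting False cannot open the file and will fail in read_csv.
  print(client.run(os.path.exists, path))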