Error when integrating NLTK with Hadoop

I am trying to integrate NLTK with Hadoop. Basically, I want to tag the words with their parts of speech. I tried the following link:

However, I still get an error when running the MapReduce program:

14/12/09 11:45:53 ERROR streaming.StreamJob: Job not successful. Error: # of failed Map Tasks exceeded allowed limit. FailedCount: 1. LastFailedTask: task_201412091132_0004_m_000000
14/12/09 11:45:53 INFO streaming.StreamJob: killJob...
Streaming Command Failed!
My mapper program is:

#!/usr/bin/env python

import sys
import os
import re
#import sys
import zipimport

importer = zipimport.zipimporter('nltkandyaml.mod')
yaml = importer.load_module('yaml')
nltk = importer.load_module('nltk')


# input comes from STDIN (standard input)
for line in sys.stdin:

  line = line.strip()

  words = line.split()

  for word in words:

    a=nltk.pos_tag(word)
    print '%s\t%s' % (word, 1)
I am using the same reducer program as in the word count example. I am new to Hadoop. Please help.
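
The "Job not successful" message above only says that a map task failed, not why. One way to narrow it down is to make the mapper testable without Hadoop. The following is a minimal sketch, not the original mapper: `map_lines`, `stub_tagger`, and `main` are hypothetical names, the `word/tag` key format is an assumption chosen so that a word count reducer can still sum the `1` values, and `stub_tagger` stands in for `nltk.pos_tag` so the logic can be checked on a machine where NLTK is not importable. It also passes the whole token list to the tagger, since `nltk.pos_tag` expects a list of tokens and would treat a single string as a sequence of characters.

```python
import sys
import traceback


def map_lines(lines, tagger):
    """Yield 'word/tag<TAB>1' records for each non-empty input line.

    `tagger` is any callable that takes a list of tokens and returns
    (token, tag) pairs, which is the shape nltk.pos_tag returns.
    """
    for line in lines:
        tokens = line.strip().split()
        if not tokens:
            continue
        # Tag the whole token list at once; a single string would be
        # iterated character by character.
        for word, tag in tagger(tokens):
            # Assumed key format: word/tag, with a count of 1.
            yield "%s/%s\t%d" % (word, tag, 1)


def stub_tagger(tokens):
    # Hypothetical stand-in so the mapper can be tested without NLTK.
    return [(token, "NN") for token in tokens]


def main(tagger=stub_tagger):
    try:
        for record in map_lines(sys.stdin, tagger):
            print(record)
    except Exception:
        # Write the traceback to stderr so the real cause is visible
        # instead of only the generic streaming failure message.
        traceback.print_exc(file=sys.stderr)
        sys.exit(1)
```

Once the NLTK import itself works, calling `main(nltk.pos_tag)` at the end of the script would swap in the real tagger. Piping a small input file into the script locally, before submitting the streaming job, should surface any Python traceback directly.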