Tokenize error in Python


I got this example program, example-extracttext.py, from the python-docx library. It extracts the text from a docx file.

#!/usr/bin/env python
"""
This file opens a docx (Office 2007) file and dumps the text.

If you need to extract text from documents, use this file as a basis for your
work.

Part of Python's docx module - http://github.com/mikemaccana/python-docx
See LICENSE for licensing information.
"""

import sys

from docx import opendocx, getdocumenttext

if __name__ == '__main__':
    try:
        document = opendocx(sys.argv[1])
        newfile = open(sys.argv[2], 'w')
    except:
        print(
            "Please supply an input and output file. For example:\n"
            "  example-extracttext.py 'My Office 2007 document.docx' 'outp"
            "utfile.txt'"
        )
        exit()

    # Fetch all the text out of the document we just created
    paratextlist = getdocumenttext(document)

    # Make explicit unicode version
    newparatextlist = []
    for paratext in paratextlist:
        newparatextlist.append(paratext.encode("utf-8"))

    # Print out text of document with two newlines under each paragraph
    newfile.write('\n\n'.join(newparatextlist))
It runs fine, but when I put another program named tokenize.py (shown below) in the same directory:

import sys

import nltk.data
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
fo = open(sys.argv[1], "r")
data = fo.read()
print '\n-----\n'.join(tokenizer.tokenize(data))
it gives the following error:

Traceback (most recent call last):
  File "./example-extracttext.py", line 14, in <module>
    from docx import opendocx, getdocumenttext
  File "/usr/local/lib/python2.7/dist-packages/docx-0.2.1-py2.7.egg/docx.py", line 12, in <module>
    from lxml import etree
  File "parsertarget.pxi", line 4, in init lxml.etree (src/lxml/lxml.etree.c:178742)
  File "/usr/lib/python2.7/inspect.py", line 39, in <module>
    import tokenize
  File "/home/sriram/NLP_TOOLS/EDITING_TOOL/NLP/sriram_work/tokenize.py", line 3, in <module>
    import nltk.data
  File "/usr/local/lib/python2.7/dist-packages/nltk/__init__.py", line 106, in <module>
    from decorators import decorator, memoize
  File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 176, in <module>
    @decorator
  File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 154, in decorator
    if inspect.isclass(caller):
AttributeError: 'module' object has no attribute 'isclass'

Please tell me how to resolve this. I want to use both programs from a single shell script.

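So where is the code for tokenize.py?

I have already given the code in the second snippet.

Try replacing
if inspect.isclass(caller):
with
if inspect is caller:

@rasikeperera: Why on earth would you do that? That is a completely different test, and it modifies a third-party package that does not need modifying.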