Tokenization error in Python

I got this example program, example-extracttext.py, from the python-docx library. It extracts the text from a docx file:

#!/usr/bin/env python
"""
This file opens a docx (Office 2007) file and dumps the text.

If you need to extract text from documents, use this file as a basis for your
work.

Part of Python's docx module - http://github.com/mikemaccana/python-docx
See LICENSE for licensing information.
"""

import sys

from docx import opendocx, getdocumenttext

if __name__ == '__main__':
    try:
        document = opendocx(sys.argv[1])
        newfile = open(sys.argv[2], 'w')
    except:
        print(
            "Please supply an input and output file. For example:\n"
            "  example-extracttext.py 'My Office 2007 document.docx' 'outp"
            "utfile.txt'"
        )
        exit()

    # Fetch all the text out of the document we just created
    paratextlist = getdocumenttext(document)

    # Encode each paragraph as UTF-8 bytes before writing (Python 2)
    newparatextlist = []
    for paratext in paratextlist:
        newparatextlist.append(paratext.encode("utf-8"))

    # Write out the text of the document with two newlines under each paragraph
    newfile.write('\n\n'.join(newparatextlist))
    newfile.close()

It works fine, but when I put another program called tokenize.py (shown below) in the same directory,

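import sys

import nltk.data

# Load NLTK's pre-trained Punkt sentence tokenizer for English
tokenizer = nltk.data.load('tokenizers/punkt/english.pickle')
with open(sys.argv[1], "r") as fo:
    data = fo.read()
# Print the sentences separated by a dashed line
print('\n-----\n'.join(tokenizer.tokenize(data)))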

example-extracttext.py fails with the following error:

Traceback (most recent call last):
  File "./example-extracttext.py", line 14, in <module>
    from docx import opendocx, getdocumenttext
  File "/usr/local/lib/python2.7/dist-packages/docx-0.2.1-py2.7.egg/docx.py", line 12, in <module>
    from lxml import etree
  File "parsertarget.pxi", line 4, in init lxml.etree (src/lxml/lxml.etree.c:178742)
  File "/usr/lib/python2.7/inspect.py", line 39, in <module>
    import tokenize
  File "/home/sriram/NLP_TOOLS/EDITING_TOOL/NLP/sriram_work/tokenize.py", line 3, in <module>
    import nltk.data
  File "/usr/local/lib/python2.7/dist-packages/nltk/__init__.py", line 106, in <module>
    from decorators import decorator, memoize
  File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 176, in <module>
    @decorator
  File "/usr/local/lib/python2.7/dist-packages/nltk/decorators.py", line 154, in decorator
    if inspect.isclass(caller):
AttributeError: 'module' object has no attribute 'isclass'
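The traceback shows /usr/lib/python2.7/inspect.py running `import tokenize` and loading /home/sriram/NLP_TOOLS/EDITING_TOOL/NLP/sriram_work/tokenize.py instead of the standard-library module, because the directory of the script being run is searched first on `sys.path`. The following is a minimal sketch that reproduces this kind of shadowing in a temporary directory. It assumes Python 3.7+ rather than the question's Python 2.7, and the helper `resolve_module_in` and the file `probe.py` are hypothetical names used only for illustration:

```python
import os
import subprocess
import sys
import tempfile


def resolve_module_in(directory, module_name="tokenize"):
    """Run a probe script from `directory` and return the file that `import module_name` loads."""
    # Python puts the running script's own directory first on sys.path,
    # just as it did for example-extracttext.py in the question.
    probe = os.path.join(directory, "probe.py")
    with open(probe, "w") as f:
        f.write("import {0}\nprint({0}.__file__)\n".format(module_name))
    env = dict(os.environ)
    env.pop("PYTHONSAFEPATH", None)  # Python 3.11+: this variable disables that behaviour
    result = subprocess.run(
        [sys.executable, probe],
        capture_output=True, text=True, env=env, check=True,
    )
    return os.path.realpath(result.stdout.strip())


with tempfile.TemporaryDirectory() as tmp:
    tmp_real = os.path.realpath(tmp)
    stdlib_path = resolve_module_in(tmp)  # no local tokenize.py yet
    # Drop in a user script that happens to be called tokenize.py
    with open(os.path.join(tmp, "tokenize.py"), "w") as f:
        f.write("# not the standard-library tokenize module\n")
    shadow_path = resolve_module_in(tmp)

print("without local file:", stdlib_path)
print("with local file:   ", shadow_path)
```

On a typical install the first path points into the standard library and the second into the temporary directory. The same lookup order is what made `inspect`, imported while lxml initialised, pick up the question's tokenize.py, which has no `isclass` or other attributes that the real module provides.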

Please tell me how to fix this. I want to use both programs in one shell script.

So where is the code for tokenize.py?
I already provided it in the second code snippet.
Try replacing

if inspect.isclass(caller):

with

if inspect is caller:

@rasikeperera: Why on earth would you do that? That is a completely different test, and it modifies a third-party package that does not need modifying.