Python: How to parse text delimited by newlines

Tags: python, nlp

How can I parse newline-separated tokens like the ones below:

Wolff PERSON
is O
in O    
Argentina LOCATION

The O
US LOCATION
Envoy O 
noted O
and convert them into complete sentences like these, using Python?

Wolff is in Argentina
The US Envoy noted
You can use itertools.groupby for this:

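>>> from io import StringIO  # on Python 2: from StringIO import StringIO
>>> from itertools import groupby
>>> s = '''Wolff PERSON
... is O
... in O    
... Argentina LOCATION
... 
... The O
... US LOCATION
... Envoy O 
... noted O'''
>>> c = StringIO(s)  # stands in for an open file object
>>> for k, g in groupby(c, key=str.isspace):  # k is True for blank lines
...     if not k:
...         # keep only the first whitespace-separated word of each line
...         print(' '.join(x.split(None, 1)[0] for x in g))
...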
Wolff is in Argentina
The US Envoy noted
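To make the one-liner inside the loop easier to follow, here is a minimal sketch that unpacks it into separate steps. The variable names (lines, groups, pairs, words) are only illustrative and are not part of the answer above.

```python
from itertools import groupby

# A few lines as they would come from iterating over a file (newlines kept).
lines = ["Wolff PERSON\n", "is O\n", "in O    \n", "Argentina LOCATION\n",
         "\n", "The O\n"]

# groupby yields (key, group) pairs; the key is True for runs of blank lines.
groups = [(k, list(g)) for k, g in groupby(lines, key=str.isspace)]

# Take the first non-blank group and process it step by step.
first_group = groups[0][1]

# split(None, 1) splits once on the first run of whitespace: [token, rest].
pairs = [x.split(None, 1) for x in first_group]

# Keep only the token, dropping the tag.
words = [p[0] for p in pairs]

sentence = " ".join(words)
print(sentence)
```

Each non-blank group is handled the same way, which gives one sentence per block of the input.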
If the input actually comes from a string rather than a file, then:

>>> for k, g in groupby(s.splitlines(), key=lambda x: not x.strip()):
...     if not k:
...         print(' '.join(x.split(None, 1)[0] for x in g))
...
Wolff is in Argentina
The US Envoy noted
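The two variants use different key functions for a reason. The sketch below, which uses a shortened sample string as an assumption, shows what each source of lines produces and how str.isspace treats the blank separator in each case.

```python
from io import StringIO

# Shortened sample input (an assumption, not the full data from the question).
s = "Wolff PERSON\nis O\n\nThe O\n"

# Iterating a file-like object keeps the trailing newline on every line.
file_like_lines = list(StringIO(s))
print(file_like_lines)

# str.splitlines() strips the newlines, so the separator becomes ''.
split_lines = s.splitlines()
print(split_lines)

# '\n' counts as whitespace, but the empty string does not.
newline_is_space = "\n".isspace()
empty_is_space = "".isspace()
print(newline_is_space, empty_is_space)
```

Because "".isspace() is False, key=str.isspace would not detect the blank separator after splitlines(), which is why the string variant tests not x.strip() instead.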

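Comments:

Wouldn't c = s.splitlines() be enough instead of StringIO?

@PaulMcGuire I assumed the OP's input comes from a file, which is why I used StringIO. And no, this code will not work with str.splitlines, because it strips the trailing \n, leaving empty strings, and that makes the str.isspace condition fail.

@AshwiniChaudhary Thank you. Could you explain what this line does: ' '.join(x.split(None, 1)[0] for x in g)

@DevEx Here g is the grouper object containing the lines of the current group. We loop over each line in the group, split it only once on whitespace, and keep only the first word. These words are then joined with a space.

I tried this: f = open("file.txt", "r") reader = f.readlines() c = StringIO(reader) for k, g in groupby(c, key=str.isspace): if not k: print ' '.join(x.split(None, 1)[0] for x in g). What am I doing wrong?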
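One likely cause, judging only from the snippet in the last comment: f.readlines() returns a list of strings, while StringIO expects a single string. A file object is already iterable line by line, so it can be passed to groupby directly and StringIO is not needed. The following is a minimal sketch under that assumption. The function name sentences_from_file is hypothetical, and the temporary file only stands in for file.txt.

```python
import os
import tempfile
from itertools import groupby


def sentences_from_file(path):
    """Return one sentence per blank-line-separated block of 'token TAG' lines."""
    result = []
    with open(path) as f:
        # The file object yields lines with their trailing newline,
        # so a blank line is "\n" and str.isspace detects it.
        for is_blank, group in groupby(f, key=str.isspace):
            if not is_blank:
                result.append(" ".join(line.split(None, 1)[0] for line in group))
    return result


# Demo with a temporary file standing in for file.txt (an assumption).
data = ("Wolff PERSON\nis O\nin O    \nArgentina LOCATION\n"
        "\n"
        "The O\nUS LOCATION\nEnvoy O \nnoted O\n")
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write(data)
    path = tmp.name

try:
    result = sentences_from_file(path)
    for sentence in result:
        print(sentence)
finally:
    os.remove(path)
```

Returning a list rather than printing inside the loop is just a design choice that makes the result easier to reuse.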