Python: is there a better way to tokenize some strings?
I am trying to write Python string tokenization code for some NLP work, and came up with the following code:
str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
s= []
a=0
for line in str:
    s.append([])
    s[a].append(line.split())
    a+=1
print(s)
The result is:
[[['I', 'am', 'Batman.']], [['I', 'loved', 'the', 'tea.']], [['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]]
As you can see, the list now has an extra dimension. For example, if I want the word "Batman", I have to type s[0][0][2] instead of s[0][2], so I changed the code to:
str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
s= []
a=0
m = []
for line in str:
    s.append([])
    m=(line.split())
    for word in m:
        s[a].append(word)
    a += 1
print(s)
This gives me the correct output:
[['I', 'am', 'Batman.'], ['I', 'loved', 'the', 'tea.'], ['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]
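But I have a feeling this can be done with a single loop. The dataset I will be importing is going to be very large, and a complexity of n would be much better than n^2. So, is there a better way to do this with a single loop?
You should use split() on each string in the loop.
List comprehension example: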
str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
[s.split() for s in str]
[['I', 'am', 'Batman.'],
['I', 'loved', 'the', 'tea.'],
['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]
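Since the question mentions a very large dataset, the same idea can also be written lazily. The following is only a sketch: the helper name `tokenize_lines` and the variable `sentences` are hypothetical and do not appear in the original answers. It still calls `split()` once per line, but yields each token list as it is needed instead of building the whole result up front.

```python
def tokenize_lines(lines):
    """Yield one list of whitespace-separated tokens per input line."""
    for line in lines:
        yield line.split()

# Sample data taken from the question.
sentences = ['I am Batman.', 'I loved the tea.', 'I will never go to that mall again!']

# Tokens are produced one line at a time as the loop advances.
for tokens in tokenize_lines(sentences):
    print(tokens)
```

If indexing such as s[0][2] is needed, wrapping the call in `list(...)` gives the same nested list as the comprehension.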
See this:
>>> list1 = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
>>> [i.split() for i in list1]
# by default, split() splits on runs of whitespace and returns a list
[['I', 'am', 'Batman.'], ['I', 'loved', 'the', 'tea.'], ['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]
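To make the point about the default behaviour concrete, here is a small sketch. The `samples` strings are made up for illustration and are not from the question; they contain irregular whitespace to show what `split()` does with it.

```python
# Illustrative strings (an assumption, not from the question) with messy whitespace.
samples = ["I  am\tBatman.", "  I loved the tea.\n"]

# With no argument, split() treats any run of whitespace as a single
# separator and ignores leading and trailing whitespace.
tokens = [line.split() for line in samples]
print(tokens)

# With an explicit separator, adjacent separators produce empty strings.
explicit = "I  am".split(" ")
print(explicit)
```

For plain whitespace tokenization, calling `split()` without an argument is therefore usually the more forgiving choice.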
Your original code is almost there:
>>> str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
>>> s=[]
>>> for line in str:
...     s.append(line.split())
...
>>> print(s)
[['I', 'am', 'Batman.'], ['I', 'loved', 'the', 'tea.'], ['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]
line.split() already gives you a list, so append that directly in your loop.
Or go straight to a comprehension:
[line.split() for line in str]
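When you say s.append([]), you have an empty list at index a, like this:
L = []
If you then append the result of the split to that list, as in L.append([1]), you end up with a list inside that list: [[1]]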
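The nesting described above can be reproduced in a few lines. This is a minimal sketch: the names `lines`, `nested`, `flat`, and `inner` are chosen here for illustration. It also shows `list.extend`, which is not used in the original answer but adds each word individually instead of adding the whole list as one element.

```python
lines = ['I am Batman.', 'I loved the tea.']

# append() adds the list returned by split() as a single element,
# which creates the extra level of nesting seen in the question.
nested = []
for line in lines:
    inner = []
    inner.append(line.split())
    nested.append(inner)

# extend() adds each word on its own, so there is no extra level.
flat = []
for line in lines:
    inner = []
    inner.extend(line.split())
    flat.append(inner)

print(nested[0])  # [['I', 'am', 'Batman.']]
print(flat[0])    # ['I', 'am', 'Batman.']
```

In practice, appending line.split() straight to the outer list, as in the loop above, avoids the intermediate empty list entirely.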