Is there a better way to tokenize some strings in Python?


I am trying to write some Python string tokenization code for an NLP task, and came up with the following:

str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
s= []
a=0
for line in str:
    s.append([])
    s[a].append(line.split())
    a+=1
print(s)

The result is:

[[['I', 'am', 'Batman.']], [['I', 'loved', 'the', 'tea.']], [['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]]

As you can see, the list now has an extra dimension. For example, to get the word "Batman" I have to type s[0][0][2] instead of s[0][2], so I changed the code to:

str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
s= []
a=0
m = []
for line in str:
    s.append([])
    m=(line.split())
    for word in m:
        s[a].append(word)
    a += 1
print(s)

This gives me the correct output:

[['I', 'am', 'Batman.'], ['I', 'loved', 'the', 'tea.'], ['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]

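But I have a feeling this could be done with a single loop. The dataset I will be importing is going to be very large, and a complexity of n would be much better than n^2. So, is there a better way to do this, ideally with one loop?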

You should call split() on each string in the loop.

A list comprehension example:

str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']

[s.split() for s in str]

[['I', 'am', 'Batman.'],
 ['I', 'loved', 'the', 'tea.'],
 ['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]

[line.split() for line in str]
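
The same comprehension can be wrapped in a small helper. This is only a minimal sketch: the names `tokenize_lines` and `sentences` are made up for illustration, and the list is renamed so that it no longer shadows the built-in `str` type.

```python
def tokenize_lines(lines):
    """Split each string on whitespace; returns one list of tokens per line."""
    return [line.split() for line in lines]


sentences = ['I am Batman.', 'I loved the tea.', 'I will never go to that mall again!']
tokens = tokenize_lines(sentences)

# Two indices are enough: sentence first, then word.
print(tokens[0][2])  # Batman.
```

Note that the punctuation stays attached to the word, since split() only looks at whitespace.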

See this:

>>> list1 = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
>>> [i.split() for i in list1]  
# split() with no arguments splits on runs of whitespace and returns a list

[['I', 'am', 'Batman.'], ['I', 'loved', 'the', 'tea.'], ['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]
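
To make the "splits on whitespace" remark concrete, here is a small sketch comparing split() with no argument against split(' '). The sample string `text` is invented for this illustration.

```python
text = 'I  am\tBatman.\n'

# No argument: any run of whitespace counts as one separator,
# and leading/trailing whitespace is dropped.
default_tokens = text.split()

# Explicit single-space separator: every space is a boundary, so the
# double space yields an empty string, and the tab and newline stay
# inside the last token.
space_tokens = text.split(' ')

print(default_tokens)  # ['I', 'am', 'Batman.']
print(space_tokens)    # ['I', '', 'am\tBatman.\n']
```

For tokenizing free text, the no-argument form is usually the safer default.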

Your original code is almost there:

>>> str = ['I am Batman.','I loved the tea.','I will never go to that mall again!']
>>> s=[]
>>> for line in str:
...   s.append(line.split())
...
>>> print(s)
[['I', 'am', 'Batman.'], ['I', 'loved', 'the', 'tea.'], ['I', 'will', 'never', 'go', 'to', 'that', 'mall', 'again!']]

line.split() already gives you a list, so append that directly inside your loop. Or go straight to a comprehension:

[line.split() for line in str]
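
On the performance concern: the two-loop version in the question already does work proportional to the total size of the input, so the single loop or comprehension mainly buys readability rather than a better complexity class. If memory is the worry with a very large dataset, one option is to yield the tokens lazily. The sketch below assumes the lines can be consumed one at a time; `iter_tokens` and `sentences` are hypothetical names, not part of the original answers.

```python
def iter_tokens(lines):
    """Yield one token list per line instead of building the full result."""
    for line in lines:
        yield line.split()


sentences = ['I am Batman.', 'I loved the tea.', 'I will never go to that mall again!']

# Each token list is produced only when the loop asks for it.
for tokens in iter_tokens(sentences):
    print(len(tokens), tokens)

first = next(iter_tokens(sentences))
```

Because `lines` can be any iterable of strings, an open file object could be passed in the same way.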

When you call s.append([]), you get an empty list at index a, like this:

L = []

If you then append the result of split() to that list, as in L.append([1]), you end up with a list inside that list: [[1]]
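
The difference can be seen side by side in a short sketch. The variable names `nested` and `flat` are chosen for this illustration, and list.extend is shown as one alternative to the inner loop from the question.

```python
line = 'I am Batman.'

nested = []
nested.append(line.split())  # adds the whole list as ONE element

flat = []
flat.extend(line.split())    # adds each word as its own element

print(nested)  # [['I', 'am', 'Batman.']]
print(flat)    # ['I', 'am', 'Batman.']
```

In the question's first version, s[a] plays the role of `nested`, which is where the extra dimension comes from.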