Python TypeError: coercing to Unicode: need string or buffer, float found
Here is my code:
import numpy as np
import pandas as pd
from tqdm import tqdm
import re
import time
import os
print u'read data ...'
train_data = pd.read_csv('Train.csv', index_col='SentenceId', delimiter='\t', encoding='utf-8')
test_data = pd.read_csv('Test.csv', index_col='SentenceId', delimiter='\t', encoding='utf-8')
train_label = pd.read_csv('Label.csv', index_col='SentenceId', delimiter='\t', encoding='utf-8')
addition_data = pd.read_csv('addition_data.csv', header=None, encoding='utf-8')[0]
train_data.dropna(inplace=True) # drop some empty sentences
...
def findall(sub_string, string):
    start = 0
    idxs = []
    while True:
        idx = string[start:].find(sub_string)
        if idx == -1:
            return idxs
        else:
            idxs.append(start + idx)
            start += idx + len(sub_string)
tags = {'pos':1, 'neu':2, 'neg':3}
def label2tag(i):
    s = train_data.loc[i]['Content']
    r = np.array([0]*len(s))
    try:
        l = train_label.loc[[i]].as_matrix()
    except:
        return r
    for i in l:
        for j in findall(i[0], s):
            r[j:j+len(i[0])] = tags[i[1]]
    return r
print u'translating target into tags ...'
train_data['label'] = map(label2tag, tqdm(iter(train_data.index)))
Here is the traceback of the error I get:
Traceback (most recent call last):
  File "shibie.py", line 88, in <module>
    train_data['label'] = map(label2tag, tqdm(iter(train_data.index)))
  File "shibie.py", line 83, in label2tag
    for j in findall(i[0], s):
  File "shibie.py", line 66, in findall
    idx = string[start:].find(sub_string)
TypeError: coercing to Unicode: need string or buffer, float found
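The last frame indicates that sub_string reached find() as a float rather than as text. The following is a minimal sketch that reproduces the same kind of failure in isolation. It assumes, without confirmation from the question, that the float is a NaN of the sort pandas produces for an empty cell; the helper name try_find and the sample text are hypothetical. The exact wording of the message differs between Python 2 and Python 3.

```python
def try_find(text, sub_string):
    """Return the index from text.find, or the TypeError message if it fails."""
    try:
        return text.find(sub_string)
    except TypeError as exc:
        return "TypeError: %s" % exc


# Assumption: a NaN float stands in for a missing cell read by pandas.
nan = float("nan")

print(try_find(u"good phone", u"phone"))  # a text argument is searched normally
print(try_find(u"good phone", nan))       # a float argument raises TypeError
```

If this matches what is happening, checking the label data for missing values before calling findall would be a reasonable next step.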
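The code above runs on my own computer, but on my school's Ubuntu machine it produces many errors. I wondered whether this was because my file contains blanks, but I checked and found none.

Please upload the source code. In the findall function, just before the line idx = string[start:].find(sub_string), insert print(sub_string) and make sure the printed values are what you expect. Also, what behavior do you want from this evaluation: findall("baabaa", "baababaaba")? [0, 6] or [0, 3, 6]?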
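Both points raised in the comment, checking what is passed as sub_string and deciding whether overlapping matches count, can be sketched as follows. This is only an illustration under assumptions: the name find_all, the overlapping parameter, the choice to return an empty list for float input, and the sample strings "baab" and "baabaabaab" are all hypothetical and do not come from the original code.

```python
def find_all(sub_string, string, overlapping=False):
    """Return the start indices of sub_string in string.

    Hypothetical variant of the question's findall, for illustration only.
    """
    # Guard: a float (for example NaN from a missing value) cannot be
    # searched for, so treat it as "no matches" instead of raising.
    if isinstance(sub_string, float) or isinstance(string, float):
        return []
    # Guard: an empty pattern would otherwise never advance the search.
    if not sub_string:
        return []
    idxs = []
    start = 0
    # Advance by one character to allow overlaps, else skip the whole match.
    step = 1 if overlapping else len(sub_string)
    while True:
        idx = string.find(sub_string, start)
        if idx == -1:
            return idxs
        idxs.append(idx)
        start = idx + step


print(find_all(u"baab", u"baabaabaab"))                    # [0, 6]
print(find_all(u"baab", u"baabaabaab", overlapping=True))  # [0, 3, 6]
print(find_all(float("nan"), u"baabaabaab"))               # []
```

Silently returning an empty list hides bad rows, so printing or logging the offending value may be preferable while debugging.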