Python TypeError: coercing to Unicode: need string or buffer, float found


This is my code:

import numpy as np
import pandas as pd
from tqdm import tqdm
import re
import time
import os

print u'read data ...'
train_data = pd.read_csv('Train.csv', index_col='SentenceId', delimiter='\t', encoding='utf-8')
test_data = pd.read_csv('Test.csv', index_col='SentenceId', delimiter='\t', encoding='utf-8')
train_label = pd.read_csv('Label.csv', index_col='SentenceId', delimiter='\t', encoding='utf-8')
addition_data = pd.read_csv('addition_data.csv', header=None, encoding='utf-8')[0]
train_data.dropna(inplace=True) # drop some empty sentences
...
def findall(sub_string, string):
    start = 0
    idxs = []
    while True:
        idx = string[start:].find(sub_string)
        if idx == -1:
            return idxs
        else:
            idxs.append(start + idx)
            start += idx + len(sub_string)

tags = {'pos':1, 'neu':2, 'neg':3}

def label2tag(i):
    s = train_data.loc[i]['Content']
    r = np.array([0]*len(s))
    try:
        l = train_label.loc[[i]].as_matrix()
    except:
        return r
    for i in l:
        for j in findall(i[0], s):
            r[j:j+len(i[0])] = tags[i[1]]
    return r

print u'translating target into tags ...'
train_data['label'] = map(label2tag, tqdm(iter(train_data.index)))

This is the traceback of the error I get:

Traceback (most recent call last):
  File "shibie.py", line 88, in <module>
    train_data['label'] = map(label2tag, tqdm(iter(train_data.index)))
  File "shibie.py", line 83, in label2tag
    for j in findall(i[0], s):
  File "shibie.py", line 66, in findall
    idx = string[start:].find(sub_string)
TypeError: coercing to Unicode: need string or buffer, float found

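The code above runs fine on my own computer, but on my school's Ubuntu machine it produces many errors. I am not sure whether this is caused by blanks in my files, but I checked and found none.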
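
The traceback says that a float reached `find` as `sub_string`, and that value comes from a row of `train_label`. One plausible cause (an assumption, since the CSV files are not shown) is an empty cell in Label.csv: pandas reads missing cells as float NaN, and `dropna` in the code above is applied only to `train_data`, not to `train_label`. The sketch below uses a made-up `labels` list and a hypothetical helper `non_string_items` to show how such values can be located and why they break `find`. It is written for Python 3, where the error message is worded differently from the Python 2 one in the question.

```python
def non_string_items(values):
    """Return (position, value) pairs for entries that are not text."""
    return [(pos, v) for pos, v in enumerate(values) if not isinstance(v, str)]


# Made-up label column for illustration only: the middle entry stands in
# for an empty CSV cell, which pandas would load as float NaN.
labels = ["good", float("nan"), "bad"]

# Locate the entries that would break str.find().
bad = non_string_items(labels)

# Passing the NaN to str.find() raises TypeError, as in the traceback.
raised = False
try:
    "some sentence".find(labels[1])
except TypeError:
    raised = True
```

If this turns out to be the cause, calling `dropna` on `train_label` as well, or skipping non-string values before calling `findall`, should avoid the error.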

Please upload the source code. In the findall function, just before the line
idx = string[start:].find(sub_string)
insert
print(sub_string)
and make sure the printed values are what you expect. Also, which behavior do you want from this call:
findall("baabaa", "baabaabaabaa")
[0, 6] or
[0, 3, 6]?
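
The two candidate results in the comment correspond to non-overlapping and overlapping matching. A minimal sketch of both behaviors follows; the function name `find_occurrences` and the `overlapping` parameter are hypothetical and not part of the original code. It also returns an empty list for non-string or empty input, which is an assumption about the desired handling of bad label values rather than something the question specifies. The sketch targets Python 3; under Python 2 the `isinstance` check would need `basestring` instead of `str`.

```python
def find_occurrences(sub_string, string, overlapping=False):
    """Return the start indices of sub_string in string.

    With overlapping=False the search resumes after the end of each match;
    with overlapping=True it resumes one character after each match start.
    """
    # Assumed policy: ignore non-text values (e.g. float NaN) and empty
    # strings; an empty sub_string would otherwise never advance the search.
    if not isinstance(sub_string, str) or not sub_string:
        return []
    step = 1 if overlapping else len(sub_string)
    idxs = []
    start = 0
    while True:
        idx = string.find(sub_string, start)
        if idx == -1:
            return idxs
        idxs.append(idx)
        start = idx + step


# Illustrative inputs only.
non_overlapping = find_occurrences("baabaa", "baabaabaabaa")
with_overlap = find_occurrences("baabaa", "baabaabaabaa", overlapping=True)
```

The original `findall` advances by `len(sub_string)` after each match, so it corresponds to the non-overlapping variant.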