TypeError: Expected binary or unicode string, got 618.0

I have been trying to apply this ML linear model to my own dataset (https://www.tensorflow.org/tutorials/estimator/linear).

Language: Python 3.8.3
Libraries: TensorFlow 2.4.0, NumPy 1.19.3, pandas, Matplotlib, and others:

import os
import sys
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from IPython.display import clear_output
from six.moves import urllib


python tensorflow


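ss1517 is the name of my dataset. It is a CSV file with 4116 rows and 20 columns, and it contains many NaN values (no column is free of them).

CATEGORICAL_COLUMNS lists the categorical columns in my dataset.
NUMERIC_COLUMNS lists the numeric columns in my dataset.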

CATEGORICAL_COLUMNS = ['Location Name', 'Location Code', 'Borough', 'Register', 'Building Name', 'Schools in Building', 'ENGroupA', 'RangeA']
NUMERIC_COLUMNS = ['Geographical District Code', '# Schools', 'Major N', 'Oth N', 'NoCrim N', 'Prop N', 'Vio N', 'AvgOfOth N', 'AvgOfNoCrim N', 'AvgOfProp N', 'AvgOfVio N']

feature_columns = []  # used only to train the linear model
for feature_name in CATEGORICAL_COLUMNS:
  vocabulary = traindata[feature_name].unique()
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

Every time I try to fill the NaN values with df.fillna(method="ffill"), df.fillna(method="bfill"), df.fillna(value=0), or df.fillna(value="randomstringvalues"), I get this error.
I also tried using df.dropna().


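Needless to say, the code does not run when I leave the NaN values in place.

I have two questions.

First, how should I handle my NaN values so that I don't see this error again (TypeError: Expected binary or unicode string, got 618.0)?
Second, how can I get rid of this error and quickly apply my dataset to this model?

Note: I am sure I have not made any typos.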
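I can't see your data, so this is a guess. Open your .csv file and search for 618.0. Perhaps some rows are missing some of the expected values, and the parser is loading a numeric value where a categorical one is expected.
Another way to check for a formatting problem is to open the CSV in Excel and see whether every row is laid out correctly.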
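
Searching the raw file by hand can miss the value, so the same check can be sketched in pandas. This is only an illustration on made-up data: the helper name find_non_string_values and the toy frame are assumptions, not part of the original post. It lists, for each column that is meant to be categorical, any non-missing value that is not a string.

```python
import pandas as pd


def find_non_string_values(df, columns):
    """Return {column: [non-string, non-missing values]} for the given columns."""
    report = {}
    for name in columns:
        values = df[name].dropna()  # NaN/None are handled separately
        bad = [v for v in values.unique() if not isinstance(v, str)]
        if bad:
            report[name] = bad
    return report


# Toy stand-in for the real data (hypothetical values).
sample = pd.DataFrame({
    "Borough": ["K", "M", 618.0, None],
    "Register": ["A", "B", "C", "D"],
})
report = find_non_string_values(sample, ["Borough", "Register"])
print(report)
```

On the real data this would be called as find_non_string_values(traindata, CATEGORICAL_COLUMNS). A column that pandas parsed as entirely numeric would show up here too, which may explain why a text search for 618.0 in the file finds nothing.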
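My guess is that your data contains some non-Unicode characters, such as: � ä
That is, anything that is not a letter, number, or symbol.
There are two options here. You can find all of these characters and replace them with other characters,
or you can use the correct encoding when reading the CSV file.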
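My CSV file is a combination of two files, which I combined like this: ss1517 = pd.concat([ss1516, ss1617], axis=0, join="inner"). I searched for 618.0 but found nothing, and as you can see, the rows are well-formed for a CSV file (not strictly formatted like an XLSX file, but comma-separated).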
traindata = ss1517.iloc[0:2470, :]  # first 60% of the dataset is the training set
evaldata = ss1517.iloc[2470:4116, :]  # remaining 40% is the evaluation set
ytrain = traindata.pop("AvgOfMajor N")
yeval = evaldata.pop("AvgOfMajor N")

CATEGORICAL_COLUMNS = ['Location Name', 'Location Code', 'Borough', 'Register', 'Building Name', 'Schools in Building', 'ENGroupA', 'RangeA']
NUMERIC_COLUMNS = ['Geographical District Code', '# Schools', 'Major N', 'Oth N', 'NoCrim N', 'Prop N', 'Vio N', 'AvgOfOth N', 'AvgOfNoCrim N', 'AvgOfProp N', 'AvgOfVio N']

feature_columns = []  # used only to train the linear model
for feature_name in CATEGORICAL_COLUMNS:
  vocabulary = traindata[feature_name].unique()
  feature_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(feature_name, vocabulary))
for feature_name in NUMERIC_COLUMNS:
  feature_columns.append(tf.feature_column.numeric_column(feature_name, dtype=tf.float32))

def make_input_fn(data_df, label_df, num_epochs=10, shuffle=True, batch_size=32):
  def input_function():# inner function, this will be returned.
    ds = tf.data.Dataset.from_tensor_slices((dict(data_df), label_df)) # Create tf.data.Dataset object with data and its label
    if shuffle:
      ds = ds.shuffle(1000) # randomize order of data
    ds = ds.batch(batch_size).repeat(num_epochs)
    return ds # return a batch of dataset
  return input_function # return the input_function

train_input_fn = make_input_fn(traindata, ytrain) 
eval_input_fn = make_input_fn(evaldata, yeval, num_epochs=1, shuffle=False) 

linear_est = tf.estimator.LinearClassifier(feature_columns=feature_columns)
linear_est.train(train_input_fn) #train
result = linear_est.evaluate(eval_input_fn) #get model metrics/stats by testing on testing data

clear_output() #clears console output
print(result["accuracy"]) #the result variable is simply dict of stats about our model
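
One possible explanation, offered as an assumption since the data is not shown: df.fillna(value=0) puts the number 0 into string columns, and a categorical column may already hold floats, so unique() returns a vocabulary that mixes strings and numbers. A minimal sketch of cleaning each group of columns with a value of the matching type before building the feature columns follows. The helper name clean_features, the "missing" label, and filling numeric gaps with 0.0 are illustrative choices, not something taken from the tutorial.

```python
import numpy as np
import pandas as pd


def clean_features(df, categorical, numeric, missing_label="missing"):
    """Return a copy with string-only categorical columns and float32 numeric columns."""
    out = df.copy()
    for name in categorical:
        col = out[name].astype(object)
        # Replace missing entries with a string label, then force every value to str.
        out[name] = col.where(col.notna(), missing_label).astype(str)
    for name in numeric:
        # Unparseable entries become NaN, then NaN becomes 0.0 (an assumption).
        out[name] = pd.to_numeric(out[name], errors="coerce").fillna(0.0).astype(np.float32)
    return out


# Toy stand-in for the real data (hypothetical values).
toy = pd.DataFrame({
    "Borough": ["K", None, 618.0],
    "Vio N": [1.0, np.nan, 3.0],
})
cleaned = clean_features(toy, ["Borough"], ["Vio N"])
print(cleaned["Borough"].tolist())
print(cleaned["Vio N"].dtype)
```

The same function would need to be applied to both traindata and evaldata before the vocabulary loop, so that traindata[feature_name].unique() contains only strings.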

data = pandas.read_csv(myfile, encoding='utf-8', quotechar='"', delimiter=',')
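
If the file's encoding is not known, one rough approach is to try a few candidates in order. The sketch below assumes the helper name read_csv_with_fallback and the candidate list, neither of which comes from the thread, and it demonstrates the idea on a throwaway file rather than the real dataset.

```python
import os
import tempfile

import pandas as pd


def read_csv_with_fallback(path, encodings=("utf-8", "cp1252", "latin-1")):
    """Try each encoding in turn; return (DataFrame, encoding_that_worked)."""
    for encoding in encodings:
        try:
            return pd.read_csv(path, encoding=encoding), encoding
        except UnicodeDecodeError:
            continue  # wrong guess, try the next candidate
    raise ValueError(f"none of {encodings} could decode {path}")


# Demo on a throwaway file written as cp1252 (hypothetical content).
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "wb") as handle:
        handle.write("Borough\nKädi\n".encode("cp1252"))
    frame, used = read_csv_with_fallback(demo_path)
print(used, frame["Borough"].tolist())
```

Note that latin-1 can decode any byte sequence, so a successful read with it does not prove the text was decoded correctly. It is worth inspecting a few values by eye.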