TypeError when trying to build a decision tree in Python
Tags: python, pandas, machine-learning, decision-tree

I am trying to build a decision tree. This is my data:

import pandas as pd

d = {'height':[0,0,1,1,1],'length':[1,1,0,0,1],'width':[0,0,1,1,1],'label':['Apple','Apple','Grape','Grape','Lemon']}

training_data = pd.DataFrame(d)
training_data
This is the code I use to set up the questions that partition the data:

class Question:
    
    #used for the threshold used to partition the data
    def __init__(self, column, value):
        self.column = column #storing a column number
        self.value = value #storing a column value
        
    def match(self,example):
        
        #comparing feature value in an example to the 
        #feature value in the question
        
        val = example[self.column]
        if is_numeric(val):

            if val == 0:
                return int(val) >= self.value

            if val == 1:
                return val <= self.value
        
        #if the value is numeric, see if the value is greater than or
        #equal to three for example, return this in a separate branch
        else:
            return val == self.value
        #if the value is not numeric return it in the other branch
        #with things that aren't numeric and aren't greater than or
        #equal to three, for example

    def __repr__(self):
        
        #printing the question in a readable format
        
        condition = '=='
        
        if is_numeric(self.value):
            condition = '>='
        return "Is %s %s %s?" % (
            header[self.column], condition, int(self.value))   
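This is the error I get, please help! I know my data is all numeric, but I don't understand why a value is being treated as a string. I tried converting the values to float, then to int, and I also tried pd.to_numeric().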

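TypeError                                 Traceback (most recent call last)
<ipython-input-79-48db17d94fd7> in <module>
      1 #example: find the best question to ask for this training dataset
      2 
----> 3 best_gain, best_question = find_best_split(training_data)
      4 best_question

<ipython-input-78-776718e68801> in find_best_split(df)
     20 
     21             #try splitting the dataset
---> 22             true_rows, false_rows = partition(df, question)
     23 
     24             #allowing this to skip the previous step if the partitioning

<ipython-input-64-c3975579f55f> in partition(df, question)
     18         true_rows, false_rows = [],[]
     19         for loc, row in df.iterrows():
---> 20             if question.match(row):
     21                 true_rows.append(row)
     22             else:

<ipython-input-59-30d3649cbfba> in match(self, example)
     18 
     19             if val == 0:
---> 20                 return int(val) >= self.value
     21 
     22             if val == 1:

TypeError: '>=' not supported between instances of 'int' and 'str'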
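
The attempts with float, int and pd.to_numeric() mentioned above can be checked against the column dtypes. The sketch below rebuilds the question's DataFrame and inspects the feature columns; the names feature_dtypes and all_integer are illustrative and not from the original post. If the feature columns are already integers, the conversions would not be expected to change anything, and the str in the error message would have to come from somewhere else, for example from self.value.

```python
import pandas as pd

# Same data as in the question.
d = {'height': [0, 0, 1, 1, 1], 'length': [1, 1, 0, 0, 1],
     'width': [0, 0, 1, 1, 1],
     'label': ['Apple', 'Apple', 'Grape', 'Grape', 'Lemon']}
training_data = pd.DataFrame(d)

# Feature columns are all columns except the last one ('label').
feature_dtypes = training_data.iloc[:, :-1].dtypes
print(feature_dtypes)

# True when every feature column already has an integer dtype.
all_integer = all(pd.api.types.is_integer_dtype(t) for t in feature_dtypes)
print(all_integer)
```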

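Where in the data flow does it become a string? I don't see that you have traced its history. Seventy lines of code on a first attempt is mostly your work to debug. Please provide a minimal, reproducible example and show where the intermediate results differ from what you expected. We should be able to copy and paste one contiguous block of code, run it as a file, and reproduce your problem along with output that traces the problem points. That lets us test our suggestions against your test data and desired output.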
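
One way to follow the comment's advice is to trace the values that end up in each Question. The sketch below is a guess at where the string enters, not a confirmed diagnosis: it evaluates the same comprehension that find_best_split (shown below) uses to collect candidate values, on the question's data. The names labels and value_types are illustrative.

```python
import pandas as pd

# Same data as in the question.
d = {'height': [0, 0, 1, 1, 1], 'length': [1, 1, 0, 0, 1],
     'width': [0, 0, 1, 1, 1],
     'label': ['Apple', 'Apple', 'Grape', 'Grape', 'Lemon']}
training_data = pd.DataFrame(d)

# Iterating over a DataFrame yields its column labels, not its rows.
labels = [row for row in training_data]
print(labels)

# So row[col] indexes into a label string and picks a single character.
col = 0
values = set([row[col] for row in training_data])
print(values)

# The candidate values are therefore strings, which match() later
# compares against an int.
value_types = {type(v).__name__ for v in values}
print(value_types)
```

If this is what happens in the original notebook, every Question is built with a one-character string as its value, which would explain the 'int' and 'str' in the error message.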
def find_best_split(df):
    #keeping track of best information gain
    best_gain = 0
    #keep track of the feature/value that produced it
    best_question = None
    current_uncertainty = gini(df)
    n_features = len(df.columns[0:-1]) #number of columns, goes from 0 to x
    
    #iterating through the "features"(columns) in the range of columns
    for col in range(n_features):
        
        #set() builds an unordered collection of unique elements
        #unique values in the columns
        values = set([row[col] for row in df])
        
        #iterating through all values
        for val in values:
            
            question = Question(col, val)
            
            #try splitting the dataset
            true_rows, false_rows = partition(df, question)
            
            #allowing this to skip the previous step if the partitioning
            #question doesn't end up separating the data
            
            if len(true_rows) == 0 or len(false_rows) == 0:
                continue
                
            gain = info_gain(true_rows, false_rows, current_uncertainty)
            
            #can normally just use >, but >= is specific to this example
            #and we will see why
            if gain >= best_gain:
                best_gain, best_question = gain, question
                
    return best_gain, best_question
best_gain, best_question = find_best_split(training_data)
best_question
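
A possible way to collect the candidate split values by column position is sketched here. It assumes, as in the question, that the last column holds the label, and it is not a tested fix for the whole tree; unique_values, column_values, first_row and first_value are illustrative names. It selects each feature column with iloc instead of iterating over the DataFrame directly.

```python
import pandas as pd

# Same data as in the question.
d = {'height': [0, 0, 1, 1, 1], 'length': [1, 1, 0, 0, 1],
     'width': [0, 0, 1, 1, 1],
     'label': ['Apple', 'Apple', 'Grape', 'Grape', 'Lemon']}
training_data = pd.DataFrame(d)

n_features = len(training_data.columns) - 1  # every column except 'label'

unique_values = {}
for col in range(n_features):
    # iloc[:, col] selects the column at position col as a Series;
    # unique() returns the distinct values stored in that column.
    column_values = training_data.iloc[:, col].unique()
    name = training_data.columns[col]
    unique_values[name] = sorted(int(v) for v in column_values)

print(unique_values)

# Positional access to one value of a row, written explicitly with iloc.
first_row = training_data.iloc[0]
first_value = first_row.iloc[0]
print(first_value, type(first_value).__name__)
```

The same explicit iloc access could be used inside match() in place of example[self.column], since integer keys on a label-indexed Series rely on a positional fallback that recent pandas versions discourage.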