Python: converting regression tree output to a table
This code fits a regression tree in Python. I want to convert its text-based output to a table format, but the solutions I was given do not work.
import pandas as pd
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn import tree
dataset = np.array(
[['Asset Flip', 100, 1000],
['Text Based', 500, 3000],
['Visual Novel', 1500, 5000],
['2D Pixel Art', 3500, 8000],
['2D Vector Art', 5000, 6500],
['Strategy', 6000, 7000],
['First Person Shooter', 8000, 15000],
['Simulator', 9500, 20000],
['Racing', 12000, 21000],
['RPG', 14000, 25000],
['Sandbox', 15500, 27000],
['Open-World', 16500, 30000],
['MMOFPS', 25000, 52000],
['MMORPG', 30000, 80000]
])
X = dataset[:, 1:2].astype(int)
y = dataset[:, 2].astype(int)
regressor = DecisionTreeRegressor(random_state = 0)
regressor.fit(X, y)
text_rule = tree.export_text(regressor )
print(text_rule)
The output I get looks like this:
print(text_rule)
|--- feature_0 <= 20750.00
| |--- feature_0 <= 7000.00
| | |--- feature_0 <= 1000.00
| | | |--- feature_0 <= 300.00
| | | | |--- value: [1000.00]
| | | |--- feature_0 > 300.00
| | | | |--- value: [3000.00]
| | |--- feature_0 > 1000.00
| | | |--- feature_0 <= 2500.00
| | | | |--- value: [5000.00]
| | | |--- feature_0 > 2500.00
| | | | |--- feature_0 <= 4250.00
| | | | | |--- value: [8000.00]
| | | | |--- feature_0 > 4250.00
| | | | | |--- feature_0 <= 5500.00
| | | | | | |--- value: [6500.00]
| | | | | |--- feature_0 > 5500.00
| | | | | | |--- value: [7000.00]
| |--- feature_0 > 7000.00
| | |--- feature_0 <= 13000.00
| | | |--- feature_0 <= 8750.00
| | | | |--- value: [15000.00]
| | | |--- feature_0 > 8750.00
| | | | |--- feature_0 <= 10750.00
| | | | | |--- value: [20000.00]
| | | | |--- feature_0 > 10750.00
| | | | | |--- value: [21000.00]
| | |--- feature_0 > 13000.00
| | | |--- feature_0 <= 16000.00
| | | | |--- feature_0 <= 14750.00
| | | | | |--- value: [25000.00]
| | | | |--- feature_0 > 14750.00
| | | | | |--- value: [27000.00]
| | | |--- feature_0 > 16000.00
| | | | |--- value: [30000.00]
|--- feature_0 > 20750.00
| |--- feature_0 <= 27500.00
| | |--- value: [52000.00]
| |--- feature_0 > 27500.00
| | |--- value: [80000.00]
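I want to convert these rules into a pandas table, similar to the form below. How can this be done?
A plotted version of the rules is shown below for reference. Note that in the table I have shown only the leftmost part of the rules.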
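The code below is a modified version of an existing solution:
import pandas as pd
from sklearn.tree import _tree

def tree_to_df(reg_tree, feature_names):
    tree_ = reg_tree.tree_
    # Map every node to the name of the feature it splits on
    feature_name = [
        feature_names[i] if i != _tree.TREE_UNDEFINED else "undefined!"
        for i in tree_.feature
    ]

    def recurse(node, row, ret):
        if tree_.feature[node] != _tree.TREE_UNDEFINED:
            name = feature_name[node]
            threshold = tree_.threshold[node]
            # Add the rule to the row and search the left branch
            row[-1].append(name + " <= " + str(threshold))
            recurse(tree_.children_left[node], row, ret)
            # Add the rule to the row and search the right branch
            row[-1].append(name + " > " + str(threshold))
            recurse(tree_.children_right[node], row, ret)
        else:
            # Add the output rule and start a new row
            label = tree_.value[node]
            ret.append("return " + str(label[0][0]))
            row.append([])

    # Initialize
    rules = [[]]
    vals = []

    # Call the recursive function with the initial values
    recurse(0, rules, vals)

    # Convert to a table and return it
    df = pd.DataFrame(rules).dropna(how='all')
    df['Return'] = pd.Series(vals)
    return df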
This returns a DataFrame:
    0                   1                   2                   3                 Return
0   feature <= 20750.0  feature <= 7000.0   feature <= 1000.0   feature <= 300.0  return 1000.0
1   feature > 300.0     None                None                None              return 3000.0
2   feature > 1000.0    feature <= 2500.0   None                None              return 5000.0
3   feature > 2500.0    feature <= 4250.0   None                None              return 8000.0
4   feature > 4250.0    feature <= 5500.0   None                None              return 6500.0
5   feature > 5500.0    None                None                None              return 7000.0
6   feature > 7000.0    feature <= 13000.0  feature <= 8750.0   None              return 15000.0
7   feature > 8750.0    feature <= 10750.0  None                None              return 20000.0
8   feature > 10750.0   None                None                None              return 21000.0
9   feature > 13000.0   feature <= 16000.0  feature <= 14750.0  None              return 25000.0
10  feature > 14750.0   None                None                None              return 27000.0
11  feature > 16000.0   None                None                None              return 30000.0
12  feature > 20750.0   feature <= 27500.0  None                None              return 52000.0
13  feature > 27500.0   None                None                None              return 80000.0
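In the table above, each row lists only the conditions added since the previous leaf, so the conditions inherited from earlier rows have to be read off by eye. A minimal sketch of one alternative is to carry the path down the recursion so that every leaf row is self-contained. The small lists below are made-up stand-ins laid out like the `children_left`, `children_right` and `threshold` arrays of a fitted tree's `tree_` attribute; the flat `value` list and the `leaf_paths` helper are assumptions for illustration, not part of scikit-learn.

```python
# Hypothetical stand-in for the parallel arrays of a fitted tree's `tree_`
# attribute. -1 marks "no child" (a leaf), as in scikit-learn's layout.
LEAF = -1

children_left = [1, 2, LEAF, LEAF, LEAF]
children_right = [4, 3, LEAF, LEAF, LEAF]
threshold = [20750.0, 7000.0, 0.0, 0.0, 0.0]   # ignored at leaf nodes
value = [0.0, 0.0, 1000.0, 15000.0, 52000.0]   # only leaf entries are used


def leaf_paths(node=0, path=()):
    """Yield (conditions, prediction) for every leaf, from root to leaf."""
    if children_left[node] == LEAF:
        yield list(path), value[node]
        return
    t = threshold[node]
    # Left child satisfies "<= threshold", right child "> threshold"
    yield from leaf_paths(children_left[node], path + (f"feature <= {t}",))
    yield from leaf_paths(children_right[node], path + (f"feature > {t}",))


rows = list(leaf_paths())
for conditions, prediction in rows:
    print(" AND ".join(conditions), "->", prediction)
```

With a real model, the same walk would presumably read `regressor.tree_.children_left` and the other arrays directly (note that `tree_.value` is nested, hence `label[0][0]` in the function above), and the list of condition lists could be passed to `pd.DataFrame` to get one column per depth level.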
If you are dealing with a classification decision tree, you can try this:
import pandas as pd
text="""
|--- Age <= 0.63
| |--- EstimatedSalary <= 0.61
| | |--- Age <= -0.16
| | | |--- class: 0
| | |--- Age > -0.16
| | | |--- EstimatedSalary <= -0.06
| | | | |--- class: 0
| | | |--- EstimatedSalary > -0.06
| | | | |--- EstimatedSalary <= 0.40
| | | | | |--- EstimatedSalary <= 0.03
| | | | | | |--- class: 1
"""
def tree_parser(text):
    lines = text.splitlines()
    # The number of '|' characters on a line gives its depth in the tree
    max_levels = max([l.count('|') for l in lines])
    result = {}
    for i in range(0, max_levels + 1):
        result['Column' + str(i)] = []
    for line in lines:
        level = line.count('|')
        currvalue = result.get('Column' + str(level), [])
        # Strip only the leading tree markers so negative thresholds keep their sign
        currvalue.append(line.lstrip('| -'))
        result['Column' + str(level)] = currvalue
        for i in range(0, max_levels + 1):
            # Pad the deeper columns once a leaf is reached
            if i > level and line.find('class') != -1:
                result['Column' + str(i)].append(None)
            # Repeat the parent's value so that all columns stay the same length
            if i < level:
                parent_value = result.get('Column' + str(i), [])
                if len(parent_value) != len(currvalue):
                    parent_value.append(parent_value[len(parent_value) - 1])
    return result

result = tree_parser(text)
df = pd.DataFrame(result)
# Column0 only holds the empty first line of the text
df = df.drop(columns=['Column0'])
df.to_csv('treeout1.csv', index=False)
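The parser above fills the columns by position. A different way to read the same text, sketched here under the assumption that the input has the `|--- ` layout produced by `export_text`, is to keep a stack of the conditions currently in force and emit one record per leaf. The names `parse_rules`, `LINE` and `sample` are made up for this example, and it uses only the standard library.

```python
import re

# A short sample in the same layout as the text above
sample = """\
|--- Age <= 0.63
|   |--- EstimatedSalary <= 0.61
|   |   |--- class: 0
|   |--- EstimatedSalary > 0.61
|   |   |--- class: 1
|--- Age > 0.63
|   |--- class: 1
"""

# Leading pipes and spaces give the depth; the rest is the rule or the leaf
LINE = re.compile(r"^([| ]*)\|--- (.*)$")


def parse_rules(text):
    """Return one dict per leaf with the full list of conditions leading to it."""
    rows, stack = [], []
    for line in text.splitlines():
        match = LINE.match(line)
        if not match:
            continue  # skip blank or unrelated lines
        depth = match.group(1).count("|")
        body = match.group(2).strip()
        del stack[depth:]  # drop conditions from deeper or sibling branches
        if body.startswith(("class:", "value:")):
            rows.append({"conditions": list(stack), "leaf": body})
        else:
            stack.append(body)
    return rows


rows = parse_rules(sample)
for row in rows:
    print(" AND ".join(row["conditions"]), "->", row["leaf"])
```

Because nothing but the leading markers is removed, negative thresholds such as `-0.16` keep their sign, and the same function should also accept the `value: [...]` leaves of a regression tree.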
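Can you share an example of the output you are looking for?
@quizzic_panini I have added the output format and a visual representation of the rules. Thanks.
This works like a charm! In the last line, it should be "vals" instead of "values".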