Python: can I show the different death penalty methods and predict future years?
I want to be able to predict the rise/fall of death penalties in the following dataset. This is US death penalty data since 1976, which can be found at: . I would like the y-axis to show the number of death penalties over the years, with the different methods in different colors, and the x-axis to show the years from 1999 onwards. Here is my code so far:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import datetime as dt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
df['Date'] = pd.to_datetime(df['Date'])
res = df[~(df['Date'] < '1999-01-01')]
print(res)
Count = res['Date'].value_counts()
print(Count)
time= df['Date'] = pd.to_datetime(df['Date'])
df['Date']=df['Date'].map(dt.datetime.toordinal)
print (time)
x = np.array(time)
y = np.array(Count)
xtrain, xtest, ytrain, ytest = train_test_split(x,y,test_size=1/3, random_state=0)
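One likely problem with the code above: `time` is built from the full dataframe while `Count` comes from the filtered post-1999 subset, so `x` and `y` end up with different lengths and `train_test_split` refuses them. A minimal sketch reproducing this (the array lengths 1442 and 834 are taken from the error message discussed in the comments; the contents are made up):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Hypothetical stand-ins for the mismatched arrays: x covers every row
# of the full dataframe, y only the rows after the 1999 filter.
x = np.arange(1442)
y = np.arange(834)

try:
    train_test_split(x, y, test_size=1/3, random_state=0)
except ValueError as e:
    # scikit-learn reports the inconsistent sample counts, e.g.
    # "Found input variables with inconsistent numbers of samples: [1442, 834]"
    print(e)
```

The fix is to derive both arrays from the same (filtered) frame so they line up row for row.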
It sounds like what you want is to reshape your data so that you have a time series per 'Method', which you can then use in a predictive model. It's probably worth pointing out that the distribution of 'Method' is heavily skewed (values from 1999 onwards), so most of them will be hard or impossible to predict:
df['Method'].value_counts()
# Lethal Injection 923
# Electrocution 17
# Gas Chamber 1
# Firing Squad 1
Here is a solution that will help you reshape the data to get time-series data per 'Method' (I've added more explanation at the end):
We can check that the new shape of the data gives us the correct 'Method' counts:
Explanation
df['Date'] = pd.to_datetime(df['Date'])
# Filter out rows where the year is less than 1999
df = df[df['Date'].dt.year >= 1999]
# Set the index to be the datetime
df = df.set_index('Date')
# This bit gets interesting - we're grouping by each method and then resampling
# within each group so that we get a row per month, where each month now has a
# count of all the previous rows associated with that month. As the dataframe is
# now filled with the same count value for each column, we arbitrarily take the
# first one which is 'Name'
# Note: you can change the resampling frequency to any time period you want,
# I've just chosen month as it is granular enough to cover the whole period
df2 = df.groupby('Method').resample('1M').agg('count')['Name'].to_frame()
# Name
# Method Date
# Electrocution 1999-06-30 1
# 1999-07-31 1
# 1999-08-31 1
# 1999-09-30 0
# 1999-10-31 0
# ... ...
# Lethal Injection 2016-08-31 0
# 2016-09-30 0
# 2016-10-31 2
# 2016-11-30 1
# 2016-12-31 2
df2 = df2.reset_index().pivot(index='Date',columns='Method',values='Name').fillna(0)
# Method Electrocution Firing Squad Gas Chamber Lethal Injection
# Date
# 1999-01-31 0.0 0.0 0.0 10.0
# 1999-02-28 0.0 0.0 0.0 12.0
# 1999-03-31 0.0 0.0 1.0 7.0
# 1999-04-30 0.0 0.0 0.0 10.0
# 1999-05-31 0.0 0.0 0.0 6.0
# ... ... ... ... ...
# 2016-08-31 0.0 0.0 0.0 0.0
# 2016-09-30 0.0 0.0 0.0 0.0
# 2016-10-31 0.0 0.0 0.0 2.0
# 2016-11-30 0.0 0.0 0.0 1.0
# 2016-12-31 0.0 0.0 0.0 2.0
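As a sanity check, the column sums of the pivoted frame should reproduce the raw per-method counts from `value_counts()`. A self-contained sketch of the same groupby/resample/pivot pipeline on a tiny made-up frame (the real dataset's 'Date', 'Method' and 'Name' columns are assumed; the rows here are invented):

```python
import pandas as pd

# Synthetic stand-in for the real dataset (rows are made up).
df = pd.DataFrame({
    'Date': pd.to_datetime(['1999-03-02', '1999-03-15', '1999-06-01',
                            '2000-01-20', '2000-02-11']),
    'Method': ['Lethal Injection', 'Gas Chamber', 'Lethal Injection',
               'Lethal Injection', 'Electrocution'],
    'Name': ['a', 'b', 'c', 'd', 'e'],
}).set_index('Date')

# Same reshape as above: monthly counts per method, then pivot to wide form.
monthly = df.groupby('Method').resample('1M').agg('count')['Name'].to_frame()
wide = monthly.reset_index().pivot(index='Date', columns='Method', values='Name').fillna(0)

# Column totals should match df['Method'].value_counts().
print(wide.sum())
print(df['Method'].value_counts())
```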
Hi, could you please add some more detail about the specific part you're stuck on (as well as what you've tried and any error messages)?
Yeah, sorry, I'm stuck because it throws an error: "ValueError: Found input variables with inconsistent numbers of samples: [1442, 834]".
Do you get the error when you run the last line of code (train_test_split)?
Yes, I think it's due to the code I edited beforehand.
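Once the data is in the wide monthly form, the original goal of predicting the rise or fall can be sketched per method, e.g. by fitting a linear trend to one column and extrapolating. A minimal illustration on a synthetic monthly series (the numbers are invented; with the real data you would use a column of the pivoted frame, such as 'Lethal Injection'):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic monthly counts standing in for one 'Method' column:
# a gently declining trend plus noise, clipped so counts stay non-negative.
rng = np.random.default_rng(0)
months = np.arange(60)  # 5 years of month indices
counts = np.clip(8 - 0.1 * months + rng.normal(0, 1, 60), 0, None)

# Fit a simple linear trend: month index -> monthly count.
model = LinearRegression().fit(months.reshape(-1, 1), counts)

# Extrapolate the trend 12 months ahead.
future = np.arange(60, 72).reshape(-1, 1)
forecast = model.predict(future)

# A negative slope indicates a declining trend.
print("slope per month:", model.coef_[0])
```

As noted above, only 'Lethal Injection' has enough observations for this to be meaningful; the other methods are too sparse to fit a trend to.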