Linear regression analysis on a date column in SQL Server
I have the following code block, which uses linear regression (least squares) to calculate the formula of a trend line. It computes the R-squared value and the regression coefficients (intercept Alpha and slope Beta) from the X and Y values. When X and Y are int and float, it calculates exact values.
CREATE FUNCTION [dbo].[LinearReqression] (@Data AS XML)
RETURNS TABLE AS RETURN (
WITH Array AS (
SELECT x = n.value('@x', 'float'),
y = n.value('@y', 'float')
FROM @Data.nodes('/r/n') v(n)
),
Medians AS ( -- despite the name, these are the means (AVG), not medians
SELECT xbar = AVG(x), ybar = AVG(y)
FROM Array ),
BetaCalc AS (
SELECT Beta = SUM(xdelta * (y - ybar)) / NULLIF(SUM(xdelta * xdelta), 0)
FROM Array
CROSS JOIN Medians
CROSS APPLY ( SELECT xdelta = (x - xbar) ) xd ),
AlphaCalc AS (
SELECT Alpha = ybar - xbar * beta
FROM Medians
CROSS JOIN BetaCalc),
SSCalc AS (
SELECT SS_tot = SUM((y - ybar) * (y - ybar)),
SS_err = SUM((y - (Alpha + Beta * x)) * (y - (Alpha + Beta * x)))
FROM Array
CROSS JOIN Medians
CROSS JOIN AlphaCalc
CROSS JOIN BetaCalc )
SELECT r_squared = CASE WHEN SS_tot = 0 THEN 1.0
ELSE 1.0 - ( SS_err / SS_tot ) END,
Alpha, Beta
FROM AlphaCalc
CROSS JOIN BetaCalc
CROSS JOIN SSCalc
)
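To check the math outside SQL Server, here is a minimal Python sketch that follows the CTEs step by step: means, slope, intercept, then the sums of squares. The function name `linear_regression` and the list `FLOAT_DATA` are hypothetical names for illustration; they are not part of the original T-SQL.

```python
# Hypothetical cross-check (not part of the original T-SQL): ordinary least squares
# computed the same way as the CTEs in dbo.LinearReqression.

def linear_regression(points):
    """Fit y = alpha + beta * x; return (r_squared, alpha, beta).

    alpha and beta are None when every x is equal, mirroring NULLIF(..., 0).
    """
    n = len(points)
    # "Medians" CTE: despite the name, these are the means (AVG)
    xbar = sum(x for x, _ in points) / n
    ybar = sum(y for _, y in points) / n
    # "BetaCalc" CTE: slope = sum(dx * dy) / sum(dx * dx)
    sxy = sum((x - xbar) * (y - ybar) for x, y in points)
    sxx = sum((x - xbar) ** 2 for x, _ in points)
    beta = sxy / sxx if sxx != 0 else None
    # "AlphaCalc" CTE: intercept
    alpha = ybar - xbar * beta if beta is not None else None
    # "SSCalc" CTE and the final SELECT
    ss_tot = sum((y - ybar) ** 2 for _, y in points)
    if ss_tot == 0:
        r_squared = 1.0
    elif beta is None:
        r_squared = None  # SS_err is NULL in SQL, so r_squared is NULL too
    else:
        ss_err = sum((y - (alpha + beta * x)) ** 2 for x, y in points)
        r_squared = 1.0 - ss_err / ss_tot
    return r_squared, alpha, beta


# Same (x, y) pairs as the usage example
FLOAT_DATA = [(1.2, 1.0), (1.6, 1), (2.0, 1.5), (2.0, 1.75), (2.1, 1.85), (2.1, 2),
              (2.2, 3), (2.2, 3), (2.3, 3.5), (2.4, 4), (2.5, 4), (3, 4.5)]

print(linear_regression(FLOAT_DATA))
```

The `1.0 - SS_err / SS_tot` form is the usual definition of R-squared for a least-squares fit with an intercept.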
Usage:
DECLARE @DataTable TABLE (
SourceID INT,
x FLOAT, -- the sample X values below are numbers; a DATE column would reject them
y FLOAT
) ;
INSERT INTO @DataTable ( SourceID, x, y )
SELECT ID = 0, x = 1.2, y = 1.0
UNION ALL SELECT 1, 1.6, 1
UNION ALL SELECT 2, 2.0, 1.5
UNION ALL SELECT 3, 2.0, 1.75
UNION ALL SELECT 4, 2.1, 1.85
UNION ALL SELECT 5, 2.1, 2
UNION ALL SELECT 6, 2.2, 3
UNION ALL SELECT 7, 2.2, 3
UNION ALL SELECT 8, 2.3, 3.5
UNION ALL SELECT 9, 2.4, 4
UNION ALL SELECT 10, 2.5, 4
UNION ALL SELECT 11, 3, 4.5 ;
-- Create and view XML data array
DECLARE @DataXML XML ;
SET @DataXML = (
SELECT -- FLOAT values are formatted in XML like "1.000000000000000e+000", increasing the character count
-- Converting them to VARCHAR first keeps the XML small; note that CAST(float AS VARCHAR) keeps at most 6 significant digits,
-- which is enough for this sample data but can lose precision for other values
-- They are unpacked as FLOAT in the function either way
[@x] = CAST(x AS VARCHAR(20)),
[@y] = CAST(y AS VARCHAR(20))
FROM @DataTable
FOR XML PATH('n'), ROOT('r') ) ;
SELECT @DataXML ;
-- Get the results
SELECT * FROM dbo.LinearReqression (@DataXML) ;
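The usage code packs the rows into an element-per-row XML document that the function shreds with `nodes('/r/n')`. The Python sketch below illustrates that round trip, assuming `FOR XML PATH('n'), ROOT('r')` with `[@x]`/`[@y]` columns produces one `<n x="…" y="…"/>` element per row under a `<r>` root. The helpers `pack` and `unpack` are hypothetical and only imitate the shape of the data.

```python
import xml.etree.ElementTree as ET

# Hypothetical illustration of the XML "array" format used by the usage example:
# <r><n x="1.2" y="1"/><n x="1.6" y="1"/>...</r>

def pack(rows):
    """Build <r><n x=".." y=".."/>...</r>, like FOR XML PATH('n'), ROOT('r')."""
    root = ET.Element("r")
    for x, y in rows:
        # repr() of a Python float is the shortest string that round-trips exactly
        ET.SubElement(root, "n", x=repr(float(x)), y=repr(float(y)))
    return ET.tostring(root, encoding="unicode")

def unpack(xml_text):
    """Read every /r/n element back as floats, like n.value('@x', 'float')."""
    root = ET.fromstring(xml_text)
    return [(float(n.get("x")), float(n.get("y"))) for n in root.findall("n")]

rows = [(1.2, 1.0), (1.6, 1), (2.0, 1.5)]
xml_text = pack(rows)
print(xml_text)
print(unpack(xml_text))
```

Every row is serialized to text and parsed back, which is extra work compared with handing the rows to the function directly.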
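In my case, the X axis can also be a date column. How can I calculate the same regression analysis with a date column?
The short answer: calculating a trend line for dates is almost the same as calculating one for floats. For dates, pick some start date and use the number of days between the start date and each date as X.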
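As a quick illustration of that idea outside SQL Server: subtracting two Python `date` objects gives the number of whole days between them, which is what `DATEDIFF(day, start, x)` returns for `date` values. The helper name `days_since` is hypothetical, and the start date 2001-01-01 is arbitrary.

```python
from datetime import date

def days_since(start, d):
    """Whole days from start to d, like DATEDIFF(day, start, d) for DATE values."""
    return (d - start).days

START = date(2001, 1, 1)  # arbitrary reference date
dates = [date(2001, 1, 13), date(2001, 1, 17), date(2001, 1, 31)]
print([days_since(START, d) for d in dates])  # [12, 16, 30]
```

These day counts can then be used as ordinary numeric X values in the same least-squares formulas.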
I did not check your function itself; I assume the formulas in it are correct.
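Also, I don't understand why you generate XML from a table and then parse it back into a table inside the function. That is rather inefficient. You can simply pass the table.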
Based on your function, I created two variants: one for floats and one for dates.
I am using SQL Server 2008 for this example; table-valued parameters require SQL Server 2008 or later.
First, create a user-defined table type so that we can pass a table into the function:
CREATE TYPE [dbo].[FloatRegressionDataTableType] AS TABLE(
[x] [float] NOT NULL,
[y] [float] NOT NULL
)
GO
Then create a function that accepts that table:
CREATE FUNCTION [dbo].[LinearRegressionFloat] (@ParamData dbo.FloatRegressionDataTableType READONLY)
RETURNS TABLE AS RETURN (
WITH Array AS (
SELECT x,
y
FROM @ParamData
),
Medians AS (
SELECT xbar = AVG(x), ybar = AVG(y)
FROM Array ),
BetaCalc AS (
SELECT Beta = SUM(xdelta * (y - ybar)) / NULLIF(SUM(xdelta * xdelta), 0)
FROM Array
CROSS JOIN Medians
CROSS APPLY ( SELECT xdelta = (x - xbar) ) xd ),
AlphaCalc AS (
SELECT Alpha = ybar - xbar * beta
FROM Medians
CROSS JOIN BetaCalc),
SSCalc AS (
SELECT SS_tot = SUM((y - ybar) * (y - ybar)),
SS_err = SUM((y - (Alpha + Beta * x)) * (y - (Alpha + Beta * x)))
FROM Array
CROSS JOIN Medians
CROSS JOIN AlphaCalc
CROSS JOIN BetaCalc )
SELECT r_squared = CASE WHEN SS_tot = 0 THEN 1.0
ELSE 1.0 - ( SS_err / SS_tot ) END,
Alpha, Beta
FROM AlphaCalc
CROSS JOIN BetaCalc
CROSS JOIN SSCalc
)
GO
Likewise, create a type for a table with dates:
CREATE TYPE [dbo].[DateRegressionDataTableType] AS TABLE(
[x] [date] NOT NULL,
[y] [float] NOT NULL
)
GO
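And create a function that accepts such a table. For each given date it uses DATEDIFF to calculate the number of days between 2001-01-01 and the given date x, then casts the result to float so that the rest of the calculation is done in floating point. DATEDIFF returns int, and AVG over an int returns an int, so without the cast the mean of x is truncated; you can try removing the cast to float and you will see different results. You can choose any other start date; it does not have to be 2001-01-01. A different start date changes Alpha (the trend value at the start date) but not Beta or r_squared.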
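Why the cast matters: `DATEDIFF` returns `int`, and `AVG` over an `int` expression returns an `int` in SQL Server, so without the cast the mean `xbar` is truncated. The Python sketch below imitates that truncation on the day offsets of the test dates (days since 2001-01-01) to show how the slope shifts. It is an approximation of the SQL behavior, not a reproduction of it, and the helpers `sql_int_avg` and `slope` are hypothetical.

```python
# Day offsets from 2001-01-01 for the test dates, and the matching y values
DAYS = [12, 16, 20, 20, 21, 21, 22, 22, 23, 24, 25, 30]
YS = [1.0, 1, 1.5, 1.75, 1.85, 2, 3, 3, 3.5, 4, 4, 4.5]

def sql_int_avg(values):
    """Imitates AVG over an int column: integer result, truncated toward zero."""
    total, n = sum(values), len(values)
    q = abs(total) // n
    return q if total >= 0 else -q

def slope(xs, ys, xbar):
    """Beta = sum(dx * dy) / sum(dx * dx), using the supplied xbar."""
    ybar = sum(ys) / len(ys)
    num = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    den = sum((x - xbar) ** 2 for x in xs)
    return num / den

exact_mean = sum(DAYS) / len(DAYS)  # 21.333...
truncated_mean = sql_int_avg(DAYS)  # 21
print(slope(DAYS, YS, exact_mean))      # about 0.2464
print(slope(DAYS, YS, truncated_mean))  # about 0.2449
```

Only the mean is wrong in the truncated case, yet that is enough to change Beta in the third significant digit.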
CREATE FUNCTION [dbo].[LinearRegressionDate] (@ParamData dbo.DateRegressionDataTableType READONLY)
RETURNS TABLE AS RETURN (
WITH Array AS (
SELECT CAST(DATEDIFF(day, '2001-01-01', x) AS float) AS x,
y
FROM @ParamData
),
Medians AS (
SELECT xbar = AVG(x), ybar = AVG(y)
FROM Array ),
BetaCalc AS (
SELECT Beta = SUM(xdelta * (y - ybar)) / NULLIF(SUM(xdelta * xdelta), 0)
FROM Array
CROSS JOIN Medians
CROSS APPLY ( SELECT xdelta = (x - xbar) ) xd ),
AlphaCalc AS (
SELECT Alpha = ybar - xbar * beta
FROM Medians
CROSS JOIN BetaCalc),
SSCalc AS (
SELECT SS_tot = SUM((y - ybar) * (y - ybar)),
SS_err = SUM((y - (Alpha + Beta * x)) * (y - (Alpha + Beta * x)))
FROM Array
CROSS JOIN Medians
CROSS JOIN AlphaCalc
CROSS JOIN BetaCalc )
SELECT r_squared = CASE WHEN SS_tot = 0 THEN 1.0
ELSE 1.0 - ( SS_err / SS_tot ) END,
Alpha, Beta
FROM AlphaCalc
CROSS JOIN BetaCalc
CROSS JOIN SSCalc
)
GO
Here is how to test the functions:
-- test float data
DECLARE @FloatDataTable [dbo].[FloatRegressionDataTableType];
INSERT INTO @FloatDataTable (x, y)
VALUES
(1.2, 1.0)
,(1.6, 1)
,(2.0, 1.5)
,(2.0, 1.75)
,(2.1, 1.85)
,(2.1, 2)
,(2.2, 3)
,(2.2, 3)
,(2.3, 3.5)
,(2.4, 4)
,(2.5, 4)
,(3, 4.5);
SELECT * FROM dbo.LinearRegressionFloat(@FloatDataTable);
-- test date data
DECLARE @DateDataTable [dbo].[DateRegressionDataTableType];
INSERT INTO @DateDataTable (x, y)
VALUES
('2001-01-13', 1.0)
,('2001-01-17', 1)
,('2001-01-21', 1.5)
,('2001-01-21', 1.75)
,('2001-01-22', 1.85)
,('2001-01-22', 2)
,('2001-01-23', 3)
,('2001-01-23', 3)
,('2001-01-24', 3.5)
,('2001-01-25', 4)
,('2001-01-26', 4)
,('2001-01-31', 4.5);
SELECT * FROM dbo.LinearRegressionDate(@DateDataTable);
Here are the two result sets:
LinearRegressionFloat:
r_squared            Alpha                Beta
-------------------- -------------------- --------------------
0.798224907472009    -2.66524390243902    2.46417682926829
LinearRegressionDate:
r_squared            Alpha                Beta
-------------------- -------------------- --------------------
0.79822490747201     -2.66524390243902    0.246417682926829
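The two result sets line up because each test date lies exactly 10 × x days after 2001-01-01 (for example, 1.2 corresponds to 2001-01-13, which is 12 days later). Multiplying X by 10 divides Beta by 10, while Alpha (the trend value at X = 0, i.e. at the start date) and r_squared stay the same. The Python sketch below checks this with the same least-squares formulas, and also shows that moving the start date changes only Alpha. The helpers `fit` and `day_offsets` are hypothetical.

```python
from datetime import date

def fit(xs, ys):
    """Ordinary least squares: return (r_squared, alpha, beta) for y = alpha + beta * x."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys))
    sxx = sum((x - xbar) ** 2 for x in xs)
    beta = sxy / sxx
    alpha = ybar - xbar * beta
    ss_tot = sum((y - ybar) ** 2 for y in ys)
    ss_err = sum((y - (alpha + beta * x)) ** 2 for x, y in zip(xs, ys))
    return 1.0 - ss_err / ss_tot, alpha, beta

XS = [1.2, 1.6, 2.0, 2.0, 2.1, 2.1, 2.2, 2.2, 2.3, 2.4, 2.5, 3.0]
YS = [1.0, 1, 1.5, 1.75, 1.85, 2, 3, 3, 3.5, 4, 4, 4.5]
DATES = [date(2001, 1, d) for d in (13, 17, 21, 21, 22, 22, 23, 23, 24, 25, 26, 31)]

def day_offsets(start):
    # Like CAST(DATEDIFF(day, start, x) AS float)
    return [float((d - start).days) for d in DATES]

float_fit = fit(XS, YS)
date_fit = fit(day_offsets(date(2001, 1, 1)), YS)
shifted_fit = fit(day_offsets(date(2000, 12, 1)), YS)  # a different start date

print(float_fit)    # beta is about 2.464
print(date_fit)     # same r_squared and alpha, beta about 0.2464
print(shifted_fit)  # same r_squared and beta, different alpha
```

With the start date moved 31 days earlier, Alpha becomes the old Alpha minus 31 × Beta, which is just the same trend line evaluated at the new origin.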
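Alternatively, a datetime value can be converted to float (in SQL Server, fractional days since 1900-01-01) or to bigint (the number of seconds since any reference point you choose).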
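A Python sketch of those two conversions: fractional days or whole seconds elapsed since a fixed reference point. The reference `datetime(1900, 1, 1)` mirrors SQL Server's `datetime` base date but is otherwise arbitrary, and the function names are hypothetical.

```python
from datetime import datetime

REFERENCE = datetime(1900, 1, 1)  # SQL Server's datetime base date; any fixed point works

def to_fractional_days(dt, ref=REFERENCE):
    """Days since ref, including the fraction of the day (e.g. noon adds 0.5)."""
    return (dt - ref).total_seconds() / 86400

def to_seconds(dt, ref=REFERENCE):
    """Whole seconds since ref, as an integer (a bigint in SQL Server terms)."""
    return int((dt - ref).total_seconds())

ts = datetime(1900, 1, 2, 12, 0)
print(to_fractional_days(ts))  # 1.5
print(to_seconds(ts))          # 129600
```

Whichever unit you pick, Beta is expressed per that unit: a slope fitted on seconds is the per-day slope divided by 86,400.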