If I have a large list of coordinates, how do I extract the y values corresponding to a specific x value?

Tags: database, wolfram-mathematica, extract, standard-deviation


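I have three data sets compiled into one large data set.

Data1 has x values ranging from 0 to 47 (ordered), with many y values (each with a small error) attached to each x value. In total there are about 100,000 y values.
Data2 and Data3 are similar, but with x values of 48-80 and 80-95 respectively.

The end goal is to produce a standard deviation for each of the x values (so 96 in total), based on the large number of y values. So I thought I should first extract the y values for each x value from these data sets, and then determine the standard deviation in the usual way.

In Mathematica, I have tried using the Select and Part functions, but without success.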

See whether you can adapt this:

exampledata={{1,1},{1,2},{1,4},{2,1},{2,2},{2,2},{3,4},{3,5},{3,12}};
(*first a manual calculation to see what the answer should be*)
{StandardDeviation[{1,2,4}],StandardDeviation[{1,2,2}],StandardDeviation[{4,5,12}]}
(*and now automate the calculation*)
(*if your x values are not exact this will need to be changed*)
x=Union[Map[First,exampledata]];
y[x_]:=Map[Last,Cases[exampledata,{x,_}]];
std=Map[StandardDeviation[y[#]]&,x]

(*{Sqrt[7/3], 1/Sqrt[3], Sqrt[19]}*)


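Since you have 100,000 pairs, this may speed things up. You said your data is sorted by x, so I do not sort here. If your data is not sorted, this will produce incorrect results, because SplitBy only groups consecutive elements.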

exampledata={{1,1},{1,2},{1,4},{2,1},{2,2},{2,2},{3,4},{3,5},{3,12}};
y[x_]:=Map[Last,x];
std=Map[StandardDeviation[y[#]]&, SplitBy[exampledata,First]]

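This gives exactly the same result with fewer passes over the data. You can compare the timings of the two methods and verify that they produce exactly the same result.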
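One way to run that comparison is sketched below. The synthetic `testdata` (100,000 random pairs with integer x from 0 to 95, sorted by x) and the names `xs`, `t1`, `r1`, `t2`, `r2` are assumptions made for illustration; they are not the asker's data.

```mathematica
(* hypothetical test data: 100000 {x, y} pairs, x in 0..95, sorted by x *)
SeedRandom[1];
testdata = Sort[Table[{RandomInteger[{0, 95}], RandomReal[]}, {100000}]];

(* method 1: one Cases scan over the whole list per distinct x value *)
xs = Union[testdata[[All, 1]]];
{t1, r1} = AbsoluteTiming[
   Map[StandardDeviation[Cases[testdata, {#, v_} :> v]] &, xs]];

(* method 2: a single pass with SplitBy; relies on the data being sorted by x *)
{t2, r2} = AbsoluteTiming[
   Map[StandardDeviation[#[[All, 2]]] &, SplitBy[testdata, First]]];

(* the two timings in seconds, and whether both methods agree *)
{t1, t2, r1 == r2}
```

The timings depend on the machine, so only the relative difference between `t1` and `t2` is meaningful.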

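Reading it again, I am not entirely sure I understood your verbal description of the data structure correctly. I assumed you have one long list of {x, y} points with many repeated x values. If it looks like I misunderstood, you could include a small snippet of Mathematica code with some sample data, and I will edit my code to match.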
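If the data is a flat list of {x, y} pairs but is not guaranteed to be sorted by x, a grouping function that does not depend on order may be a safer choice. The following is a minimal sketch using GroupBy, which is a swapped-in alternative and not part of the answer above; the shuffled `unsorted` list is the same example data in a different order.

```mathematica
(* same pairs as exampledata, deliberately out of order *)
unsorted = {{3, 12}, {1, 2}, {2, 2}, {1, 1}, {3, 4}, {2, 1}, {1, 4}, {3, 5}, {2, 2}};

(* group by x, keep only the y values, then reduce each group to its standard deviation *)
sdByX = KeySort[GroupBy[unsorted, First -> Last, StandardDeviation]]

(* <|1 -> Sqrt[7/3], 2 -> 1/Sqrt[3], 3 -> Sqrt[19]|> *)
```

The result is an association keyed by x, so `sdByX[2]` looks up the value for x = 2. An x value that occurs only once would need separate handling, since StandardDeviation expects at least two elements.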

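Statistically speaking, it would be better to provide a prediction interval for a predicted value of y.

There is a video about this here:

To illustrate, here is some example data, which is stored as a QR code. The code that produces the results below is listed at the end.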

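Output:

y0 (predicted value at x0 = 50): 39.8094

Normal[lm] (fitted model): 26.4425 - 0.00702613 a + 0.0054873 a^2

lm["RSquared"]: 0.886419

For x0 = 50, y0 = 39.8094, with a 95% prediction interval of 39.8094 ± 12.1118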

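Addressing your requirement:

"The end goal is to produce a standard deviation for each of the x values (so 96 in total), based on the large number of y values"

The best measure is probably the standard error, which can be obtained via

lm["SinglePredictionConfidenceIntervalTable"]

and

lm["SinglePredictionErrors"]

These provide the "standard errors for the predicted response of single observations". If an x has multiple y values, there is still only one standard error per x value.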
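To make the last point concrete, here is a rough sketch on synthetic data. The data (five noisy y values for each x from 0 to 95), the noise level, and the names `synthetic`, `fit`, `se` and `seByX` are all assumptions for illustration and are unrelated to the QR-code data used in this answer.

```mathematica
(* hypothetical data: 5 noisy y values at each of the 96 x values 0..95 *)
SeedRandom[2];
synthetic = Flatten[
   Table[{x, 26 + 0.005 x^2 + RandomVariate[NormalDistribution[0, 6]]},
    {x, 0, 95}, {5}], 1];

(* same quadratic basis as the fit used in this answer *)
fit = LinearModelFit[synthetic, {1, a, a^2}, a];

(* one standard error per data row, in the same order as the data *)
se = fit["SinglePredictionErrors"];

(* rows sharing an x share a standard error, so keep the first one per x *)
seByX = GroupBy[Transpose[{synthetic[[All, 1]], se}], First -> Last, First];

Length[seByX]
```

`seByX` then holds one standard error for each distinct x value, regardless of how many y values that x has.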

Reference: (Details and Options)

You should show what you have tried.

qrimage = Import["https://i.stack.imgur.com/s7Ul7.png"];

data = Uncompress@BarcodeRecognize@qrimage;

ListPlot[data, Frame -> True, Axes -> None]

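(* confidence levels covering 1 and 2 standard deviations of a normal distribution, about 0.6827 and 0.9545 *)
cl = Map[Function[σ, 2 (CDF[NormalDistribution[0, 1], σ] - 0.5)], {1, 2}];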

(* trying a quadratic linear fit *)
lm = LinearModelFit[data, {1, a, a^2}, a];
bands = lm["SinglePredictionBands", ConfidenceLevel -> #] & /@ cl;

(* x value for an observation outside of the sample observations *)
x0 = 50;

(* Predicted value of y *)
y0 = lm[x0]

(* Least-squares regression of Y on X *)
Normal[lm]

(* Confidence interval for y0 given x0 *)
b1 = bands /. a -> x0;

(* R^2 goodness of fit *)
lm["RSquared"]

b2 = {bands, {Normal[lm]}};

(* Prediction intervals plotted over the data range *)
Show[
 Plot[b2, {a, 0, 100}, PlotRange -> {{0, 100}, Automatic}, Filling -> {1 -> {2}}],
 ListPlot[data],
 ListPlot[{{x0, lm[x0]}}, PlotStyle -> Red],
 Graphics[{Red, Line[{{x0, Min[b1]}, {x0, Max[b1]}}]}],
 Frame -> True, Axes -> None]

Row[{"For x0 = ", x0, ", y0 = ", y0,
  " with 95% prediction interval ", y0, " ± ", y0 - Min[b1]}]