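How do I determine the standard deviation (stddev) of a set of values?
I need to know whether a number is more than 1 stddev away from the mean of a set of numbers, etc.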
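To answer the question directly, here is a minimal sketch of the check itself. Most answers below use C#; this one uses Java, the language of the last answer, and translates directly. The class and method names (OutlierCheck, isBeyond) are hypothetical, and it assumes the sample standard deviation (divide by n - 1) is what you want.

```java
import java.util.Arrays;

// Hypothetical helper: is x more than k sample standard deviations from the mean of data?
final class OutlierCheck {
    private OutlierCheck() {}

    /** Sample standard deviation (divides by n - 1), computed with two passes. */
    static double sampleStdDev(double[] data) {
        if (data.length < 2) throw new IllegalArgumentException("need at least 2 values");
        double mean = Arrays.stream(data).average().getAsDouble();
        double sumSq = 0.0;
        for (double x : data) {
            double d = x - mean;
            sumSq += d * d;
        }
        return Math.sqrt(sumSq / (data.length - 1));
    }

    /** True if |x - mean| > k * stddev, where mean and stddev describe data. */
    static boolean isBeyond(double x, double[] data, double k) {
        double mean = Arrays.stream(data).average().getAsDouble();
        return Math.abs(x - mean) > k * sampleStdDev(data);
    }
}
```

For example, with data {2, 4, 4, 4, 5, 5, 7, 9} (mean 5, sample standard deviation about 2.14), isBeyond(7.5, data, 1.0) is true and isBeyond(6.0, data, 1.0) is false.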
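While the sum-of-squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You can basically end up with a negative variance...
Also, never, ever compute a² as pow(a, 2); a * a is almost certainly faster.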
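The cancellation problem described above is easy to reproduce. This Java sketch computes the sample variance of 4, 7, 13, 16 shifted by 10^9 (true value 30 for any shift) once with the one-pass sum-of-squares formula and once with a two-pass formula. The class name and the test data are made up for illustration.

```java
// Compares the one-pass sum-of-squares formula with the two-pass formula.
final class CancellationDemo {
    private CancellationDemo() {}

    /** Sample variance via (sum(x^2) - (sum x)^2 / n) / (n - 1): one pass, prone to cancellation. */
    static double naiveVariance(double[] xs) {
        double sum = 0.0, sumSq = 0.0;
        for (double x : xs) {
            sum += x;
            sumSq += x * x; // x * x rather than Math.pow(x, 2)
        }
        int n = xs.length;
        return (sumSq - sum * sum / n) / (n - 1);
    }

    /** Sample variance via two passes: compute the mean, then sum squared deviations from it. */
    static double twoPassVariance(double[] xs) {
        double mean = 0.0;
        for (double x : xs) mean += x;
        mean /= xs.length;
        double ss = 0.0;
        for (double x : xs) {
            double d = x - mean;
            ss += d * d;
        }
        return ss / (xs.length - 1);
    }
}
```

With xs = {1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16}, twoPassVariance returns exactly 30, while naiveVariance is far off: it subtracts two numbers near 4e18, where adjacent doubles are 512 apart, so the true difference of 90 cannot be recovered (and in general the result can even come out negative).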
By far the best way to compute the standard deviation is Welford's method. My C# is very rusty, but it might look something like this:
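public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0; // running mean
    double S = 0.0; // running sum of squared deviations from the mean
    int k = 1;
    foreach (double value in valueList)
    {
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
        k++;
    }
    // After the loop k == n + 1, so k - 2 == n - 1 (sample standard deviation)
    return Math.Sqrt(S / (k - 2));
}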
If you have the entire population (rather than a sample), use return Math.Sqrt(S / (k-1)); instead.
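For readers who want to try the recurrence outside C#, here is a minimal Java port of the Welford loop above that exposes both divisors. The class and method names are illustrative, not from any library.

```java
// Welford's online algorithm, ported from the C# loop above for illustration.
final class Welford {
    private int count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the current mean (S in the C# code)

    void add(double x) {
        count++;
        double delta = x - mean;  // deviation from the old mean
        mean += delta / count;
        m2 += delta * (x - mean); // old-mean deviation times new-mean deviation
    }

    double mean() { return mean; }

    /** Sample standard deviation: divide by n - 1. */
    double sampleStdDev() { return Math.sqrt(m2 / (count - 1)); }

    /** Population standard deviation: divide by n. */
    double populationStdDev() { return Math.sqrt(m2 / count); }
}
```

Feeding 2, 4, 4, 4, 5, 5, 7, 9 gives a mean of 5, a population standard deviation of 2 (up to rounding) and a sample standard deviation of sqrt(32/7), about 2.138.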
Edit: I've updated the code according to Jason's remarks...
Edit: I've also updated the code according to Alex's remarks...
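This is 10 times faster than Jaime's solution, but be aware that, as Jaime pointed out:
"While the sum-of-squares algorithm works fine most of the time, it can cause big trouble if you are dealing with very large numbers. You basically may end up with a negative variance."
If you think you are dealing with very large numbers or a very large quantity of numbers, you should calculate with both methods; if the results are equal, you know for sure that you can use "my" method.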
public static double StandardDeviation(double[] data)
{
    double sumAll = 0;  // sum of x
    double sumAllQ = 0; // sum of x²
    for (int i = 0; i < data.Length; i++)
    {
        double x = data[i];
        sumAll += x;
        sumAllQ += x * x;
    }

    // Sample standard deviation: sqrt((Σx² - (Σx)² / n) / (n - 1))
    return System.Math.Sqrt((sumAllQ - (sumAll * sumAll) / data.Length) / (data.Length - 1));
}
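The "compute it both ways" advice above can be sketched like this in Java. The class name and the relative tolerance are arbitrary choices for illustration. If the sum-of-squares variance goes negative, Math.sqrt returns NaN, every comparison with NaN is false, and the check reports disagreement.

```java
// Computes the sample standard deviation two ways and reports whether they agree.
final class CrossCheck {
    private CrossCheck() {}

    /** One-pass sum-of-squares formula, as in the answer above (fast, but can cancel badly). */
    static double sumOfSquaresStdDev(double[] data) {
        double sum = 0.0, sumSq = 0.0;
        for (double x : data) {
            sum += x;
            sumSq += x * x;
        }
        int n = data.length;
        return Math.sqrt((sumSq - sum * sum / n) / (n - 1));
    }

    /** Welford's method (numerically stable). */
    static double welfordStdDev(double[] data) {
        double mean = 0.0, s = 0.0;
        int k = 0;
        for (double x : data) {
            k++;
            double oldMean = mean;
            mean += (x - oldMean) / k;
            s += (x - oldMean) * (x - mean);
        }
        return Math.sqrt(s / (k - 1));
    }

    /** True if both results agree within relTol, relative to the larger of the two. */
    static boolean agree(double[] data, double relTol) {
        double a = sumOfSquaresStdDev(data);
        double b = welfordStdDev(data);
        return Math.abs(a - b) <= relTol * Math.max(Math.abs(a), Math.abs(b));
    }
}
```

With relTol = 1e-9, agree returns true for 2, 4, 4, 4, 5, 5, 7, 9 but false for 4, 7, 13, 16 shifted by 1e9, where only Welford's result (sqrt(30)) is correct.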
Jaime's accepted answer is fine, except that you need to divide by k-2 in the last line (you need to divide by "number_of_elements - 1"). Better still, start k at 0:
public static double StandardDeviation(List<double> valueList)
{
    double M = 0.0;
    double S = 0.0;
    int k = 0;
    foreach (double value in valueList)
    {
        k++;
        double tmpM = M;
        M += (value - tmpM) / k;
        S += (value - tmpM) * (value - M);
    }
    return Math.Sqrt(S / (k - 1)); // k == n after the loop
}
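The loop above implements the following recurrences, where M_k is the running mean and S_k the running sum of squared deviations after k values, with M_0 = S_0 = 0. After the last value, S_n equals the sum of squared deviations from the final mean.

```latex
\begin{aligned}
M_k &= M_{k-1} + \frac{x_k - M_{k-1}}{k}, \\
S_k &= S_{k-1} + (x_k - M_{k-1})(x_k - M_k), \\
s &= \sqrt{\frac{S_n}{n-1}} \ \text{(sample)}, \qquad
\sigma = \sqrt{\frac{S_n}{n}} \ \text{(population)}.
\end{aligned}
```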
Code snippet:
// Requires: using System; using System.Collections.Generic; using System.Linq;
public static double StandardDeviation(List<double> valueList)
{
    if (valueList.Count < 2) return 0.0;
    double sumOfSquares = 0.0;
    double average = valueList.Average(); // LINQ extension method (.NET 3.5+)
    foreach (double value in valueList)
    {
        double d = value - average;
        sumOfSquares += d * d;
    }
    return Math.Sqrt(sumOfSquares / (valueList.Count - 1));
}
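You can avoid making two passes over the data by accumulating the mean and the mean square:

cnt = 0
mean = 0
meansqr = 0
loop over array
    cnt++
    mean += value
    meansqr += value*value
mean /= cnt
meansqr /= cnt

and then forming

sigma = sqrt(meansqr - mean^2)

Multiplying the variance (the quantity inside the square root) by a factor of cnt/(cnt-1), to get the sample rather than the population value, is often appropriate as well.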
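A minimal Java rendering of the pseudocode above, with the optional cnt/(cnt-1) correction applied to the variance. The class name and the array return type are choices made for this sketch; because it is the same formula, it inherits the sum-of-squares cancellation caveat discussed earlier.

```java
// One pass: accumulate the count, the sum and the sum of squares, then divide.
final class MeanSquare {
    private MeanSquare() {}

    /** Returns {mean, population sigma, sample sigma}; needs at least 2 values for the sample sigma. */
    static double[] stats(double[] values) {
        int cnt = 0;
        double mean = 0.0;    // holds the running sum until the division below
        double meansqr = 0.0; // holds the running sum of squares until the division below
        for (double value : values) {
            cnt++;
            mean += value;
            meansqr += value * value;
        }
        mean /= cnt;
        meansqr /= cnt;
        // Population variance; rounding can push it slightly below 0, so clamp at 0
        double popVar = Math.max(0.0, meansqr - mean * mean);
        double sampleVar = popVar * cnt / (cnt - 1); // the cnt/(cnt-1) correction factor
        return new double[] {mean, Math.sqrt(popVar), Math.sqrt(sampleVar)};
    }
}
```

For 2, 4, 4, 4, 5, 5, 7, 9 this returns a mean of 5, a population sigma of 2 and a sample sigma of sqrt(32/7).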
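By the way, the first pass over the data in Demi's and McWafflestix's answers is hidden in the calls to Average. That kind of thing is certainly trivial on small lists, but if the list exceeds the size of the cache, or even of the working set, it becomes a big deal.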
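I found that Rob's helpful answer didn't quite match what I was seeing in Excel. To match Excel, I passed the Average of valueList into the StandardDeviation calculation.
Here are my two cents... Obviously you could calculate the moving average (ma) from valueList inside the function, but I happen to have it already before I need the standard deviation.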
public double StandardDeviation(List<double> valueList, double ma)
{
    double xMinusMovAvg = 0.0;
    double Sigma = 0.0; // sum of squared deviations from ma
    int k = valueList.Count;
    foreach (double value in valueList)
    {
        xMinusMovAvg = value - ma;
        Sigma = Sigma + (xMinusMovAvg * xMinusMovAvg);
    }
    return Math.Sqrt(Sigma / (k - 1));
}
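A Java sketch of the same computation with a caller-supplied mean. It assumes mean is the arithmetic mean of the same values; if it is some other reference value, such as a moving average over a different window, the result is the root-mean-square deviation from that value (with an n - 1 divisor) rather than the sample standard deviation. Excel's STDEV.S (and the older STDEV) also divides by n - 1. The class name is hypothetical.

```java
// Sample standard deviation around a mean the caller has already computed.
final class KnownMean {
    private KnownMean() {}

    static double sampleStdDev(double[] values, double mean) {
        double sigma = 0.0; // sum of squared deviations from mean
        for (double v : values) {
            double d = v - mean;
            sigma += d * d;
        }
        return Math.sqrt(sigma / (values.length - 1)); // n - 1 divisor
    }
}
```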
Using extension methods:
using System;
using System.Collections.Generic;

namespace SampleApp
{
    internal class Program
    {
        private static void Main()
        {
            List<double> data = new List<double> { 1, 2, 3, 4, 5, 6 };
            double mean = data.Mean();
            double variance = data.Variance();
            double sd = data.StandardDeviation();
            Console.WriteLine("Mean: {0}, Variance: {1}, SD: {2}", mean, variance, sd);
            Console.WriteLine("Press any key to continue...");
            Console.ReadKey();
        }
    }

    public static class MyListExtensions
    {
        public static double Mean(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.Mean(0, values.Count);
        }

        // Mean of values[start..end), end exclusive
        public static double Mean(this List<double> values, int start, int end)
        {
            double s = 0;
            for (int i = start; i < end; i++) s += values[i];
            return s / (end - start);
        }

        public static double Variance(this List<double> values)
        {
            return values.Variance(values.Mean(), 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean)
        {
            return values.Variance(mean, 0, values.Count);
        }

        public static double Variance(this List<double> values, double mean, int start, int end)
        {
            double variance = 0;
            for (int i = start; i < end; i++)
            {
                double d = values[i] - mean;
                variance += d * d;
            }
            // Note: the n - 1 divisor is used only when start > 0, so for the
            // whole list (start == 0) this returns the population variance.
            int n = end - start;
            if (start > 0) n -= 1;
            return variance / n;
        }

        public static double StandardDeviation(this List<double> values)
        {
            return values.Count == 0 ? 0 : values.StandardDeviation(0, values.Count);
        }

        public static double StandardDeviation(this List<double> values, int start, int end)
        {
            double mean = values.Mean(start, end);
            double variance = values.Variance(mean, start, end);
            return Math.Sqrt(variance);
        }
    }
}
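Java has no extension methods, so here is a sketch of the same range-based idea as plain static methods, with one deliberate difference: the caller chooses the divisor explicitly instead of it depending on whether start > 0. The class name and the boolean sample parameter are hypothetical.

```java
// Mean and variance over values[start, end), with an explicit choice of divisor.
final class RangeStats {
    private RangeStats() {}

    static double mean(double[] values, int start, int end) {
        double s = 0.0;
        for (int i = start; i < end; i++) s += values[i];
        return s / (end - start);
    }

    /** sample = true divides by n - 1 (Bessel's correction); false divides by n. */
    static double variance(double[] values, int start, int end, boolean sample) {
        int n = end - start;
        double m = mean(values, start, end);
        double ss = 0.0;
        for (int i = start; i < end; i++) {
            double d = values[i] - m;
            ss += d * d;
        }
        return ss / (sample ? n - 1 : n);
    }

    static double stdDev(double[] values, int start, int end, boolean sample) {
        return Math.sqrt(variance(values, start, end, sample));
    }
}
```

For 1, 2, 3, 4, 5, 6 this gives 17.5 / 6 ≈ 2.9167 (population) or 3.5 (sample); the C# Main above prints the population value.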
The Math.NET library provides this for you out of the box.
PM> Install-Package MathNet.Numerics
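using System.Collections.Generic;
using MathNet.Numerics.Statistics; // PopulationStandardDeviation / StandardDeviation extension methods

var populationStdDev = new List<double> { 1d, 2d, 3d, 4d, 5d }.PopulationStandardDeviation();
var sampleStdDev = new List<double> { 2d, 3d, 4d }.StandardDeviation();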
For more information, see http://numerics.mathdotnet.com/docs/DescriptiveStatistics.html.
/// <summary>
/// Calculates the standard deviation, same as the MATLAB std(X,0) function.
/// <seealso href="http://www.mathworks.co.uk/help/techdoc/ref/std.html"/>
/// </summary>
/// <param name="values">Enumerable data</param>
/// <returns>Standard deviation (0 for fewer than two values)</returns>
// Requires: using System; using System.Collections.Generic;
// As an extension method, this must be declared inside a static class.
public static double GetStandardDeviation(this IEnumerable<double> values)
{
    // Validation
    if (values == null) throw new ArgumentNullException(nameof(values));

    // One foreach pass: calling ElementAt(i) in a loop would re-enumerate
    // non-list sequences from the start each time, giving O(n^2) work.
    int length = 0;
    double sum = 0.0, sum2 = 0.0;
    foreach (double item in values)
    {
        length++;
        sum += item;
        sum2 += item * item;
    }

    // Avoids division by zero
    if (length < 2) return 0;

    return Math.Sqrt((sum2 - sum * sum / length) / (length - 1));
}
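The trouble with all the other answers is that they assume you have your data in one big array. If your data comes in on the fly, this is a better approach. The class works regardless of how, or whether, you store your data. It also gives you the choice of Welford's method (which the code below calls "Waldorf") or the sum-of-squares method. Both methods use a single pass.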
public final class StatMeasure {
    private StatMeasure() {}

    public interface Stats1D {
        /** Add a value to the population */
        void addValue(double value);

        /** Get the mean of all the added values */
        double getMean();

        /** Get the standard deviation from a sample of the population. */
        double getStDevSample();

        /** Gets the standard deviation for the entire population. */
        double getStDevPopulation();
    }

    // Welford's online algorithm (named "Waldorf" in this class)
    private static class WaldorfPopulation implements Stats1D {
        private double mean = 0.0;
        private double sSum = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            double tmpMean = mean;
            double delta = value - tmpMean;
            mean += delta / ++count;
            sSum += delta * (value - mean);
        }

        @Override
        public double getMean() { return mean; }

        @Override
        public double getStDevSample() { return Math.sqrt(sSum / (count - 1)); }

        @Override
        public double getStDevPopulation() { return Math.sqrt(sSum / count); }
    }

    private static class StandardPopulation implements Stats1D {
        private double sum = 0.0;
        private double sumOfSquares = 0.0;
        private int count = 0;

        @Override
        public void addValue(double value) {
            sum += value;
            sumOfSquares += value * value;
            count++;
        }

        @Override
        public double getMean() { return sum / count; }

        @Override
        public double getStDevSample() {
            return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / (count - 1));
        }

        @Override
        public double getStDevPopulation() {
            return Math.sqrt((sumOfSquares - ((sum * sum) / count)) / count);
        }
    }

    /**
     * Returns a way to measure a population of data using Welford's method
     * (named "Waldorf" in this class). This method is better if your values are
     * large relative to their spread: there the sum-of-squares formula loses
     * precision to cancellation, can even produce a negative variance, and for
     * extremely large magnitudes the sum of x-squared may overflow. It also keeps
     * the mean up to date after every value, which suits recalculating the mean
     * and standard deviation continuously, for example if you are continually
     * updating a graphic of the data as it flows in.
     *
     * @return A Stats1D object that uses Welford's method.
     */
    public static Stats1D getWaldorfStats() { return new WaldorfPopulation(); }

    /**
     * Returns a way to measure a population of data using the sum-of-squares
     * method. This does slightly less work per value than Welford's method
     * (no division), but runs the risk of losing precision as described above.
     *
     * @return A Stats1D object that uses the sum-of-squares method
     */
    public static Stats1D getSumOfSquaresStats() { return new StandardPopulation(); }
}
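Combining this answer's streaming approach with the original question, a detector that judges each new value against the statistics of the values seen before it might look like the following. It inlines Welford's update instead of depending on StatMeasure so that it stands alone; the class name and the "k standard deviations" threshold are assumptions for illustration.

```java
// Streaming check: flags each incoming value that lies more than k sample standard
// deviations from the mean of the values seen before it, then adds it to the stats.
final class StreamingOutliers {
    private final double k;
    private int count = 0;
    private double mean = 0.0;
    private double m2 = 0.0; // sum of squared deviations from the running mean

    StreamingOutliers(double k) { this.k = k; }

    /** Returns true if x is an outlier relative to the earlier values; then adds x. */
    boolean offer(double x) {
        boolean outlier = false;
        if (count >= 2) { // a sample standard deviation needs at least 2 values
            double sd = Math.sqrt(m2 / (count - 1));
            outlier = Math.abs(x - mean) > k * sd;
        }
        // Welford update
        count++;
        double delta = x - mean;
        mean += delta / count;
        m2 += delta * (x - mean);
        return outlier;
    }
}
```

After warming up on 2, 4, 4, 4, 5, 5, 7, 9 with k = 1, offering 7.5 returns true (2.5 from the mean of 5, versus a standard deviation of about 2.14), while offering 6.0 instead returns false.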