Weighted standard deviation in NumPy?
numpy.average() has a weights option, but numpy.std() does not. Does anyone have a suggestion for a workaround?
How about the following short "manual calculation"?
import math
import numpy

def weighted_avg_and_std(values, weights):
    """
    Return the weighted average and standard deviation.

    values, weights -- NumPy ndarrays with the same shape.
    """
    average = numpy.average(values, weights=weights)
    # Fast and numerically precise:
    variance = numpy.average((values - average)**2, weights=weights)
    return (average, math.sqrt(variance))
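One way to sanity-check the function above: when the weights are whole numbers, they act like repetition counts, so the result should match the plain (population, ddof=0) mean and standard deviation of the data with each value repeated that many times. The sample values and weights below are made up for illustration.

```python
import math
import numpy as np

def weighted_avg_and_std(values, weights):
    """Weighted average and (population) standard deviation."""
    average = np.average(values, weights=weights)
    variance = np.average((values - average) ** 2, weights=weights)
    return average, math.sqrt(variance)

# Illustrative data: integer weights behave like repetition counts.
values = np.array([1.0, 2.0, 4.0])
weights = np.array([3, 1, 2])

avg, std = weighted_avg_and_std(values, weights)

# Expand the data so each value appears `weight` times,
# then compare against the unweighted NumPy results.
expanded = np.repeat(values, weights)
assert math.isclose(avg, expanded.mean())
assert math.isclose(std, expanded.std())  # np.std defaults to ddof=0
```

Note that this is the population form (divide by the sum of the weights); it applies no degrees-of-freedom correction.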
statsmodels has a class for computing weighted statistics, statsmodels.stats.weightstats.DescrStatsW:
import numpy as np
from statsmodels.stats.weightstats import DescrStatsW

array = np.array([1,2,1,2,1,2,1,3])
weights = np.ones_like(array)
weights[3] = 100

weighted_stats = DescrStatsW(array, weights=weights, ddof=0)

weighted_stats.mean      # weighted mean of data (equivalent to np.average(array, weights=weights))
# 1.97196261682243

weighted_stats.std       # standard deviation, using the ddof given above (0 is the default)
# 0.21434289609681711

weighted_stats.std_mean  # standard deviation of weighted mean
# 0.020818822467555047

weighted_stats.var       # variance, using the ddof given above (0 is the default)
# 0.045942877107170932
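The nice thing about this class is that if you want to compute several statistical properties, subsequent calls are very fast, because results that have already been computed (even intermediate ones) are cached.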
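There does not seem to be such a function in numpy/scipy yet, but there is a ticket proposing this added functionality. Attached to it you will find Statistics.py, which implements weighted standard deviations.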
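Until something like that lands, a small helper built on numpy.average() can cover the common case, including reduction along an axis. This is only a sketch: weighted_std is a hypothetical name, not a NumPy or SciPy function, and it computes the population form (no degrees-of-freedom correction).

```python
import numpy as np

def weighted_std(values, weights, axis=None):
    """Hypothetical helper: weighted population standard deviation.

    With axis=None, `weights` must have the same shape as `values`.
    With an integer axis, `weights` may be 1-D with length
    values.shape[axis], as accepted by np.average.
    """
    values = np.asarray(values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    average = np.average(values, axis=axis, weights=weights)
    if axis is not None:
        # Restore the reduced axis so the mean broadcasts against `values`.
        average = np.expand_dims(average, axis)
    variance = np.average((values - average) ** 2, axis=axis, weights=weights)
    return np.sqrt(variance)

# Illustrative data: one weighted standard deviation per row.
data = np.array([[1.0, 2.0, 3.0],
                 [2.0, 2.0, 2.0]])
row_std = weighted_std(data, [1, 2, 1], axis=1)
```

Here the first row has weighted mean 2 and weighted variance 0.5, and the second row is constant, so its standard deviation is 0.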
Here is a good example:
import pandas as pd
import numpy as np

# X is the dataset, as a pandas DataFrame; w holds one weight per row
mean = np.ma.average(X, axis=0, weights=w)  # weighted sample mean (fast, efficient and precise)
mean = pd.Series(mean, index=list(X.keys()))  # convert to a pandas Series (purely cosmetic, no difference in the computed values)
xm = X - mean  # xm = deviations of X from the mean
xm = xm.fillna(0)  # replace NaN with 0 so the other covariance entries are still computed correctly
sigma2 = 1. / (w.sum() - 1) * xm.mul(w, axis=0).T.dot(xm)  # unbiased weighted sample covariance
The correct equation for the weighted unbiased sample covariance, URL (version: 2016-06-28)
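As a cross-check, the same computation can be written in plain NumPy and compared against np.cov with its fweights argument. This assumes the weights are integer frequency counts, which is the case where dividing by (sum of weights - 1) gives the unbiased estimate; the small dataset below is made up for illustration.

```python
import numpy as np

# Illustrative data: 4 observations (rows) of 2 variables (columns).
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [4.0, 3.0],
              [3.0, 5.0]])
w = np.array([1, 2, 3, 1])  # assumed integer frequency weights, one per row

mean = np.average(X, axis=0, weights=w)  # weighted mean of each column
xm = X - mean                            # deviations from the weighted mean

# Weighted covariance with the (sum(w) - 1) normalisation.
sigma2 = (xm * w[:, None]).T @ xm / (w.sum() - 1)

# np.cov applies the same normalisation when given fweights (default ddof=1).
reference = np.cov(X, rowvar=False, fweights=w)
assert np.allclose(sigma2, reference)
```

For weights that are not counts (for example reliability weights), a different normalisation applies, so this check should not be read as covering that case.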