| @cindex statistics |
| @cindex mean |
| @cindex standard deviation |
| @cindex variance |
| @cindex estimated standard deviation |
| @cindex estimated variance |
| @cindex t-test |
| @cindex range |
| @cindex min |
| @cindex max |
| |
| This chapter describes the statistical functions in the library. The |
| basic statistical functions include routines to compute the mean, |
| variance and standard deviation. More advanced functions allow you to |
| calculate absolute deviations, skewness, and kurtosis as well as the |
| median and arbitrary percentiles. The algorithms use recurrence |
| relations to compute average quantities in a stable way, without large |
| intermediate values that might overflow. |
| |
| The functions are available in versions for datasets in the standard |
| floating-point and integer types. The versions for double precision |
| floating-point data have the prefix @code{gsl_stats} and are declared in |
| the header file @file{gsl_statistics_double.h}. The versions for integer |
| data have the prefix @code{gsl_stats_int} and are declared in the header |
| file @file{gsl_statistics_int.h}. |
| |
| @menu |
| * Mean and standard deviation and variance:: |
| * Absolute deviation:: |
| * Higher moments (skewness and kurtosis):: |
| * Autocorrelation:: |
| * Covariance:: |
| * Weighted Samples:: |
| * Maximum and Minimum values:: |
| * Median and Percentiles:: |
| * Example statistical programs:: |
| * Statistics References and Further Reading:: |
| @end menu |
| |
| @node Mean and standard deviation and variance |
| @section Mean, Standard Deviation and Variance |
| |
| @deftypefun double gsl_stats_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the arithmetic mean of @var{data}, a dataset of |
| length @var{n} with stride @var{stride}. The arithmetic mean, or |
| @dfn{sample mean}, is denoted by @math{\Hat\mu} and defined as, |
| @tex |
| \beforedisplay |
| $$ |
| {\Hat\mu} = {1 \over N} \sum x_i |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| \Hat\mu = (1/N) \sum x_i |
| @end example |
| |
| @end ifinfo |
| @noindent |
| where @math{x_i} are the elements of the dataset @var{data}. For |
| samples drawn from a Gaussian distribution the variance of |
| @math{\Hat\mu} is @math{\sigma^2 / N}. |
| @end deftypefun |
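
The recurrence-based averaging mentioned at the start of the chapter can be sketched as follows. This is only an illustration, not the library's source: @code{sketch_mean} is a hypothetical name, and the update rule shown (each new element moves the running mean by a fraction of its distance from it) is one common way to avoid accumulating a large intermediate sum.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a library function): arithmetic mean computed
   with the running recurrence  mean_k = mean_{k-1} + (x_k - mean_{k-1}) / k,
   so no large intermediate sum is ever formed. */
double sketch_mean(const double data[], size_t stride, size_t n)
{
    assert(stride > 0 && n > 0);

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* element i lives at data[i * stride] */
        mean += (data[i * stride] - mean) / (double)(i + 1);
    }
    return mean;
}
```

With a stride of 2 the helper visits every second element, so the same array can hold interleaved columns.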
| |
| @deftypefun double gsl_stats_variance (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the estimated, or @dfn{sample}, variance of |
| @var{data}, a dataset of length @var{n} with stride @var{stride}. The |
| estimated variance is denoted by @math{\Hat\sigma^2} and is defined by, |
| @tex |
| \beforedisplay |
| $$ |
| {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - {\Hat\mu})^2 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| \Hat\sigma^2 = (1/(N-1)) \sum (x_i - \Hat\mu)^2 |
| @end example |
| |
| @end ifinfo |
| @noindent |
| where @math{x_i} are the elements of the dataset @var{data}. Note that |
| the normalization factor of @math{1/(N-1)} results from the derivation |
| of @math{\Hat\sigma^2} as an unbiased estimator of the population |
| variance @math{\sigma^2}. For samples drawn from a Gaussian distribution |
| the variance of @math{\Hat\sigma^2} itself is @math{2 \sigma^4 / (N-1)}. |
| |
| This function computes the mean via a call to @code{gsl_stats_mean}. If |
| you have already computed the mean then you can pass it directly to |
| @code{gsl_stats_variance_m}. |
| @end deftypefun |
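
As a rough sketch of the relationship between the two variance functions, the code below computes the sample variance in two passes: first the mean, then the sum of squared deviations divided by @math{N-1}. The names @code{sketch_variance} and @code{sketch_variance_m} are hypothetical, and the plain summation used here is an assumption made for brevity rather than the library's algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sample variance about a mean supplied by the
   caller, with the unbiased 1/(N-1) normalization. */
double sketch_variance_m(const double data[], size_t stride, size_t n,
                         double mean)
{
    assert(stride > 0 && n > 1);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double delta = data[i * stride] - mean;
        sum += delta * delta;
    }
    return sum / (double)(n - 1);
}

/* Hypothetical helper: computes the mean first, then reuses the
   function above, mirroring the variance / variance_m pairing. */
double sketch_variance(const double data[], size_t stride, size_t n)
{
    assert(stride > 0 && n > 1);

    double mean = 0.0;
    for (size_t i = 0; i < n; i++) {
        mean += data[i * stride];
    }
    mean /= (double)n;
    return sketch_variance_m(data, stride, n, mean);
}
```

A caller that already holds the mean can skip the first pass by calling the @code{_m} form directly.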
| |
| @deftypefun double gsl_stats_variance_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) |
| This function returns the sample variance of @var{data} relative to the |
| given value of @var{mean}. The variance is computed with @math{\Hat\mu} |
| replaced by the value of @var{mean} that you supply, |
| @tex |
| \beforedisplay |
| $$ |
| {\Hat\sigma}^2 = {1 \over (N-1)} \sum (x_i - mean)^2 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| \Hat\sigma^2 = (1/(N-1)) \sum (x_i - mean)^2 |
| @end example |
| @end ifinfo |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| @deftypefunx double gsl_stats_sd_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) |
| The standard deviation is defined as the square root of the variance. |
| These functions return the square root of the corresponding variance |
| functions above. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_variance_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) |
| This function computes an unbiased estimate of the variance of |
| @var{data} when the population mean @var{mean} of the underlying |
| distribution is known @emph{a priori}. In this case the estimator for |
| the variance uses the factor @math{1/N} and the sample mean |
| @math{\Hat\mu} is replaced by the known population mean @math{\mu}, |
| @tex |
| \beforedisplay |
| $$ |
| {\Hat\sigma}^2 = {1 \over N} \sum (x_i - \mu)^2 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| \Hat\sigma^2 = (1/N) \sum (x_i - \mu)^2 |
| @end example |
| @end ifinfo |
| @end deftypefun |
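
The only difference from the sample variance is the normalization: when the population mean is known, no degree of freedom is spent estimating it, so the sum is divided by @math{N}. A minimal sketch, using the hypothetical name @code{sketch_variance_fixed_mean}:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: variance about a population mean that is known
   in advance, normalized by 1/N instead of 1/(N-1). */
double sketch_variance_fixed_mean(const double data[], size_t stride,
                                  size_t n, double mean)
{
    assert(stride > 0 && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double delta = data[i * stride] - mean;
        sum += delta * delta;
    }
    return sum / (double)n;
}
```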
| |
| |
| @deftypefun double gsl_stats_sd_with_fixed_mean (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) |
| This function calculates the standard deviation of @var{data} for a |
| fixed population mean @var{mean}. The result is the square root of the |
| corresponding variance function. |
| @end deftypefun |
| |
| @node Absolute deviation |
| @section Absolute deviation |
| |
| @deftypefun double gsl_stats_absdev (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function computes the absolute deviation from the mean of |
| @var{data}, a dataset of length @var{n} with stride @var{stride}. The |
| absolute deviation from the mean is defined as, |
| @tex |
| \beforedisplay |
| $$ |
| absdev = {1 \over N} \sum |x_i - {\Hat\mu}| |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| absdev = (1/N) \sum |x_i - \Hat\mu| |
| @end example |
| |
| @end ifinfo |
| @noindent |
| where @math{x_i} are the elements of the dataset @var{data}. The |
| absolute deviation from the mean provides a more robust measure of the |
| width of a distribution than the variance. This function computes the |
| mean of @var{data} via a call to @code{gsl_stats_mean}. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_absdev_m (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}) |
| This function computes the absolute deviation of the dataset @var{data} |
| relative to the given value of @var{mean}, |
| @tex |
| \beforedisplay |
| $$ |
| absdev = {1 \over N} \sum |x_i - mean| |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| absdev = (1/N) \sum |x_i - mean| |
| @end example |
| |
| @end ifinfo |
| @noindent |
| This function is useful if you have already computed the mean of |
| @var{data} (and want to avoid recomputing it), or wish to calculate the |
| absolute deviation relative to another value (such as zero, or the |
| median). |
| @end deftypefun |
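
The following sketch illustrates the absolute deviation about an arbitrary center, which is what makes the @code{_m} form usable with zero or the median as well as the mean. The name @code{sketch_absdev_m} is hypothetical and the code is an illustration of the formula only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: mean absolute deviation of the data about a
   caller-supplied center (the mean, the median, zero, ...). */
double sketch_absdev_m(const double data[], size_t stride, size_t n,
                       double center)
{
    assert(stride > 0 && n > 0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double delta = data[i * stride] - center;
        sum += (delta < 0.0) ? -delta : delta;   /* |x_i - center| */
    }
    return sum / (double)n;
}
```

For the data 1, 2, 3, 4, 10 the deviation about the median 3 is 11/5, while the deviation about zero is simply the mean of the absolute values, 4.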
| |
| @node Higher moments (skewness and kurtosis) |
| @section Higher moments (skewness and kurtosis) |
| |
| @deftypefun double gsl_stats_skew (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function computes the skewness of @var{data}, a dataset of length |
| @var{n} with stride @var{stride}. The skewness is defined as, |
| @tex |
| \beforedisplay |
| $$ |
| skew = {1 \over N} \sum |
| {\left( x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^3 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| skew = (1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^3 |
| @end example |
| |
| @end ifinfo |
| @noindent |
| where @math{x_i} are the elements of the dataset @var{data}. The skewness |
| measures the asymmetry of the tails of a distribution. |
| |
| The function computes the mean and estimated standard deviation of |
| @var{data} via calls to @code{gsl_stats_mean} and @code{gsl_stats_sd}. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_skew_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd}) |
| This function computes the skewness of the dataset @var{data} using the |
| given values of the mean @var{mean} and standard deviation @var{sd}, |
| @tex |
| \beforedisplay |
| $$ |
| skew = {1 \over N} |
| \sum {\left( x_i - mean \over sd \right)}^3 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| skew = (1/N) \sum ((x_i - mean)/sd)^3 |
| @end example |
| |
| @end ifinfo |
| @noindent |
| This function is useful if you have already computed the mean and |
| standard deviation of @var{data} and want to avoid recomputing them. |
| @end deftypefun |
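
A minimal sketch of the skewness formula with a precomputed mean and standard deviation is given below. The name @code{sketch_skew_m_sd} is hypothetical; the code simply averages the cubes of the standardized deviations and is not taken from the library.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: skewness as the average cubed standardized
   deviation, using a mean and standard deviation supplied by the caller. */
double sketch_skew_m_sd(const double data[], size_t stride, size_t n,
                        double mean, double sd)
{
    assert(stride > 0 && n > 0 && sd > 0.0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double z = (data[i * stride] - mean) / sd;
        sum += z * z * z;
    }
    return sum / (double)n;
}
```

A symmetric dataset such as 1, 2, 3 gives zero, while 0, 0, 0, 4 (mean 1, sample standard deviation 2) gives a positive value because of its long right tail.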
| |
| @deftypefun double gsl_stats_kurtosis (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function computes the kurtosis of @var{data}, a dataset of length |
| @var{n} with stride @var{stride}. The kurtosis is defined as, |
| @tex |
| \beforedisplay |
| $$ |
| kurtosis = \left( {1 \over N} \sum |
| {\left(x_i - {\Hat\mu} \over {\Hat\sigma} \right)}^4 |
| \right) |
| - 3 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| kurtosis = ((1/N) \sum ((x_i - \Hat\mu)/\Hat\sigma)^4) - 3 |
| @end example |
| |
| @end ifinfo |
| @noindent |
| The kurtosis measures how sharply peaked a distribution is, relative to |
| its width. The kurtosis is normalized to zero for a Gaussian |
| distribution. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_kurtosis_m_sd (const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{mean}, double @var{sd}) |
| This function computes the kurtosis of the dataset @var{data} using the |
| given values of the mean @var{mean} and standard deviation @var{sd}, |
| @tex |
| \beforedisplay |
| $$ |
| kurtosis = \left( {1 \over N} \sum |
| {\left(x_i - mean \over sd \right)}^4 \right) |
| - 3 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| kurtosis = ((1/N) \sum ((x_i - mean)/sd)^4) - 3 |
| @end example |
| |
| @end ifinfo |
| @noindent |
| This function is useful if you have already computed the mean and |
| standard deviation of @var{data} and want to avoid recomputing them. |
| @end deftypefun |
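
The kurtosis formula can be sketched in the same way as the skewness: average the fourth powers of the standardized deviations, then subtract 3 so that a Gaussian distribution scores zero. The name @code{sketch_kurtosis_m_sd} is hypothetical and the code is illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: excess kurtosis from a caller-supplied mean and
   standard deviation; the "- 3" makes a Gaussian come out as zero. */
double sketch_kurtosis_m_sd(const double data[], size_t stride, size_t n,
                            double mean, double sd)
{
    assert(stride > 0 && n > 0 && sd > 0.0);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double z = (data[i * stride] - mean) / sd;
        sum += z * z * z * z;
    }
    return sum / (double)n - 3.0;
}
```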
| |
| @node Autocorrelation |
| @section Autocorrelation |
| |
| @deftypefun double gsl_stats_lag1_autocorrelation (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}) |
| This function computes the lag-1 autocorrelation of the dataset @var{data}. |
| @tex |
| \beforedisplay |
| $$ |
| a_1 = {\sum_{i = 2}^{n} (x_{i} - \Hat\mu) (x_{i-1} - \Hat\mu) |
| \over |
| \sum_{i = 1}^{n} (x_{i} - \Hat\mu) (x_{i} - \Hat\mu)} |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| a_1 = (\sum_@{i = 2@}^@{n@} (x_i - \Hat\mu) (x_@{i-1@} - \Hat\mu)) |
|       / (\sum_@{i = 1@}^@{n@} (x_i - \Hat\mu)^2) |
| @end example |
| @end ifinfo |
| @end deftypefun |
| |
| |
| @deftypefun double gsl_stats_lag1_autocorrelation_m (const double @var{data}[], const size_t @var{stride}, const size_t @var{n}, const double @var{mean}) |
| This function computes the lag-1 autocorrelation of the dataset |
| @var{data} using the given value of the mean @var{mean}. |
| |
| @end deftypefun |
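
The sketch below shows one way to evaluate a lag-1 autocorrelation about a given mean. It assumes that the numerator runs over the @math{n-1} adjacent pairs of elements while the denominator runs over all @math{n} elements; the name @code{sketch_lag1_autocorrelation_m} is hypothetical and the code is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: lag-1 autocorrelation about a supplied mean.
   Numerator: products of deviations of neighbouring elements.
   Denominator: sum of squared deviations of every element. */
double sketch_lag1_autocorrelation_m(const double data[], size_t stride,
                                     size_t n, double mean)
{
    assert(stride > 0 && n > 1);

    double first = data[0] - mean;
    double num = 0.0;
    double den = first * first;

    for (size_t i = 1; i < n; i++) {
        const double prev = data[(i - 1) * stride] - mean;
        const double curr = data[i * stride] - mean;
        num += prev * curr;
        den += curr * curr;
    }
    return num / den;   /* undefined (NaN) if every element equals mean */
}
```

A steadily increasing series yields a positive value, and a strictly alternating series such as 1, -1, 1, -1 yields a negative one.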
| |
| @node Covariance |
| @section Covariance |
| @cindex covariance, of two datasets |
| |
| @deftypefun double gsl_stats_covariance (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}) |
| This function computes the covariance of the datasets @var{data1} and |
| @var{data2} which must both be of the same length @var{n}. |
| @tex |
| \beforedisplay |
| $$ |
| covar = {1 \over (n - 1)} \sum_{i = 1}^{n} (x_{i} - \Hat x) (y_{i} - \Hat y) |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| covar = (1/(n - 1)) \sum_@{i = 1@}^@{n@} (x_i - \Hat x) (y_i - \Hat y) |
| @end example |
| @end ifinfo |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_covariance_m (const double @var{data1}[], const size_t @var{stride1}, const double @var{data2}[], const size_t @var{stride2}, const size_t @var{n}, const double @var{mean1}, const double @var{mean2}) |
| This function computes the covariance of the datasets @var{data1} and |
| @var{data2} using the given values of the means, @var{mean1} and |
| @var{mean2}. This is useful if you have already computed the means of |
| @var{data1} and @var{data2} and want to avoid recomputing them. |
| @end deftypefun |
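
A small sketch of the covariance formula with precomputed means follows. Each dataset has its own stride, as in the prototypes above. The name @code{sketch_covariance_m} is hypothetical, and the direct summation is an assumption made to keep the example short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: sample covariance of two equal-length datasets
   about caller-supplied means, normalized by 1/(n - 1). */
double sketch_covariance_m(const double data1[], size_t stride1,
                           const double data2[], size_t stride2,
                           size_t n, double mean1, double mean2)
{
    assert(stride1 > 0 && stride2 > 0 && n > 1);

    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double d1 = data1[i * stride1] - mean1;
        const double d2 = data2[i * stride2] - mean2;
        sum += d1 * d2;
    }
    return sum / (double)(n - 1);
}
```

Passing the same array and mean for both datasets reproduces the sample variance.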
| |
| |
| @node Weighted Samples |
| @section Weighted Samples |
| |
| The functions described in this section allow the computation of |
| statistics for weighted samples. The functions accept an array of |
| samples, @math{x_i}, with associated weights, @math{w_i}. Each sample |
| @math{x_i} is considered as having been drawn from a Gaussian |
| distribution with variance @math{\sigma_i^2}. The sample weight |
| @math{w_i} is defined as the reciprocal of this variance, @math{w_i = |
| 1/\sigma_i^2}. Setting a weight to zero corresponds to removing a |
| sample from a dataset. |
| |
| @deftypefun double gsl_stats_wmean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the weighted mean of the dataset @var{data} with |
| stride @var{stride} and length @var{n}, using the set of weights @var{w} |
| with stride @var{wstride} and length @var{n}. The weighted mean is defined as, |
| @tex |
| \beforedisplay |
| $$ |
| {\Hat\mu} = {{\sum w_i x_i} \over {\sum w_i}} |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| \Hat\mu = (\sum w_i x_i) / (\sum w_i) |
| @end example |
| @end ifinfo |
| @end deftypefun |
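
The weighted mean can be sketched as below. The example also shows the effect noted at the start of this section: a sample with zero weight contributes nothing to either sum, so it is effectively removed. The name @code{sketch_wmean} is hypothetical and the code is an illustration rather than the library's algorithm.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: weighted mean  (sum w_i x_i) / (sum w_i).
   Samples with zero weight drop out of both sums. */
double sketch_wmean(const double w[], size_t wstride,
                    const double data[], size_t stride, size_t n)
{
    assert(wstride > 0 && stride > 0 && n > 0);

    double wsum = 0.0;
    double wxsum = 0.0;
    for (size_t i = 0; i < n; i++) {
        const double wi = w[i * wstride];
        wsum += wi;
        wxsum += wi * data[i * stride];
    }
    return wxsum / wsum;   /* undefined if all weights are zero */
}
```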
| |
| |
| @deftypefun double gsl_stats_wvariance (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the estimated variance of the dataset @var{data} |
| with stride @var{stride} and length @var{n}, using the set of weights |
| @var{w} with stride @var{wstride} and length @var{n}. The estimated |
| variance of a weighted dataset is defined as, |
| @tex |
| \beforedisplay |
| $$ |
| \Hat\sigma^2 = {{\sum w_i} \over {(\sum w_i)^2 - \sum (w_i^2)}} |
| \sum w_i (x_i - \Hat\mu)^2 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| \Hat\sigma^2 = ((\sum w_i)/((\sum w_i)^2 - \sum (w_i^2))) |
| \sum w_i (x_i - \Hat\mu)^2 |
| @end example |
| |
| @end ifinfo |
| @noindent |
| Note that this expression reduces to an unweighted variance with the |
| familiar @math{1/(N-1)} factor when there are @math{N} equal non-zero |
| weights. |
| @end deftypefun |
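
The normalization factor in the weighted variance is easiest to check numerically. The sketch below evaluates the formula about a supplied weighted mean; with three equal weights it gives the same result as the unweighted @math{1/(N-1)} estimate. The name @code{sketch_wvariance_m} is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: weighted variance estimate
     (sum w) / ((sum w)^2 - sum w^2) * sum w_i (x_i - wmean)^2
   about a caller-supplied weighted mean. */
double sketch_wvariance_m(const double w[], size_t wstride,
                          const double data[], size_t stride, size_t n,
                          double wmean)
{
    assert(wstride > 0 && stride > 0 && n > 1);

    double a = 0.0;      /* sum of weights          */
    double b = 0.0;      /* sum of squared weights  */
    double s = 0.0;      /* weighted squared deviations */
    for (size_t i = 0; i < n; i++) {
        const double wi = w[i * wstride];
        const double delta = data[i * stride] - wmean;
        a += wi;
        b += wi * wi;
        s += wi * delta * delta;
    }
    return (a / (a * a - b)) * s;
}
```

For weights 2, 2, 2 and data 1, 2, 3 the factor is 6/(36 - 12) = 1/4 and the weighted sum of squares is 4, giving 1, which matches the unweighted sample variance of the same data.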
| |
| @deftypefun double gsl_stats_wvariance_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}) |
| This function returns the estimated variance of the weighted dataset |
| @var{data} using the given weighted mean @var{wmean}. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wsd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| The standard deviation is defined as the square root of the variance. |
| This function returns the square root of the corresponding variance |
| function @code{gsl_stats_wvariance} above. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wsd_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}) |
| This function returns the square root of the corresponding variance |
| function @code{gsl_stats_wvariance_m} above. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wvariance_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean}) |
| This function computes an unbiased estimate of the variance of weighted |
| dataset @var{data} when the population mean @var{mean} of the underlying |
| distribution is known @emph{a priori}. In this case the estimator for |
| the variance replaces the sample mean @math{\Hat\mu} by the known |
| population mean @math{\mu}, |
| @tex |
| \beforedisplay |
| $$ |
| \Hat\sigma^2 = {{\sum w_i (x_i - \mu)^2} \over {\sum w_i}} |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| \Hat\sigma^2 = (\sum w_i (x_i - \mu)^2) / (\sum w_i) |
| @end example |
| @end ifinfo |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wsd_with_fixed_mean (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, const double @var{mean}) |
| The standard deviation is defined as the square root of the variance. |
| This function returns the square root of the corresponding variance |
| function above. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wabsdev (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function computes the weighted absolute deviation from the weighted |
| mean of @var{data}. The absolute deviation from the mean is defined as, |
| @tex |
| \beforedisplay |
| $$ |
| absdev = {{\sum w_i |x_i - \Hat\mu|} \over {\sum w_i}} |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| absdev = (\sum w_i |x_i - \Hat\mu|) / (\sum w_i) |
| @end example |
| @end ifinfo |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wabsdev_m (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}) |
| This function computes the absolute deviation of the weighted dataset |
| @var{data} about the given weighted mean @var{wmean}. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wskew (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function computes the weighted skewness of the dataset @var{data}. |
| @tex |
| \beforedisplay |
| $$ |
| skew = {{\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^3} \over {\sum w_i}} |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| skew = (\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^3) / (\sum w_i) |
| @end example |
| @end ifinfo |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wskew_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd}) |
| This function computes the weighted skewness of the dataset @var{data} |
| using the given values of the weighted mean and weighted standard |
| deviation, @var{wmean} and @var{wsd}. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wkurtosis (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function computes the weighted kurtosis of the dataset @var{data}. |
| |
| @tex |
| \beforedisplay |
| $$ |
| kurtosis = {{\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^4} \over {\sum w_i}} - 3 |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| kurtosis = ((\sum w_i ((x_i - \Hat\mu)/\Hat\sigma)^4) / (\sum w_i)) - 3 |
| @end example |
| @end ifinfo |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_wkurtosis_m_sd (const double @var{w}[], size_t @var{wstride}, const double @var{data}[], size_t @var{stride}, size_t @var{n}, double @var{wmean}, double @var{wsd}) |
| This function computes the weighted kurtosis of the dataset @var{data} |
| using the given values of the weighted mean and weighted standard |
| deviation, @var{wmean} and @var{wsd}. |
| @end deftypefun |
| |
| @node Maximum and Minimum values |
| @section Maximum and Minimum values |
| |
| The following functions find the maximum and minimum values of a |
| dataset (or their indices). If the data contain @code{NaN}s then a |
| @code{NaN} will be returned, since the maximum or minimum value is |
| undefined. For functions which return an index, the location of the |
| first @code{NaN} in the array is returned. |
| |
| @deftypefun double gsl_stats_max (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the maximum value in @var{data}, a dataset of |
| length @var{n} with stride @var{stride}. The maximum value is defined |
| as the value of the element @math{x_i} which satisfies @c{$x_i \ge x_j$} |
| @math{x_i >= x_j} for all @math{j}. |
| |
| If you want instead to find the element with the largest absolute |
| magnitude you will need to apply @code{fabs} or @code{abs} to your data |
| before calling this function. |
| @end deftypefun |
| |
| @deftypefun double gsl_stats_min (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the minimum value in @var{data}, a dataset of |
| length @var{n} with stride @var{stride}. The minimum value is defined |
| as the value of the element @math{x_i} which satisfies @c{$x_i \le x_j$} |
| @math{x_i <= x_j} for all @math{j}. |
| |
| If you want instead to find the element with the smallest absolute |
| magnitude you will need to apply @code{fabs} or @code{abs} to your data |
| before calling this function. |
| @end deftypefun |
| |
| @deftypefun void gsl_stats_minmax (double * @var{min}, double * @var{max}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function finds both the minimum and maximum values @var{min}, |
| @var{max} in @var{data} in a single pass. |
| @end deftypefun |
| |
| @deftypefun size_t gsl_stats_max_index (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the index of the maximum value in @var{data}, a |
| dataset of length @var{n} with stride @var{stride}. The maximum value is |
| defined as the value of the element @math{x_i} which satisfies |
| @c{$x_i \ge x_j$} |
| @math{x_i >= x_j} for all @math{j}. When there are several equal maximum |
| elements then the first one is chosen. |
| @end deftypefun |
| |
| @deftypefun size_t gsl_stats_min_index (const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the index of the minimum value in @var{data}, a |
| dataset of length @var{n} with stride @var{stride}. The minimum value |
| is defined as the value of the element @math{x_i} which satisfies |
| @c{$x_i \le x_j$} |
| @math{x_i <= x_j} for all @math{j}. When there are several equal |
| minimum elements then the first one is chosen. |
| @end deftypefun |
| |
| @deftypefun void gsl_stats_minmax_index (size_t * @var{min_index}, size_t * @var{max_index}, const double @var{data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the indexes @var{min_index}, @var{max_index} of |
| the minimum and maximum values in @var{data} in a single pass. |
| @end deftypefun |
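
A single-pass search for both indices might look like the sketch below. It follows the conventions stated in this section: ties resolve to the first occurrence, and the index of the first @code{NaN} is reported if one is present. The name @code{sketch_minmax_index} is hypothetical and the code is not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: indices of the minimum and maximum in one pass.
   Strict comparisons keep the first of several equal extrema; a NaN
   (detected by x != x) ends the search and its index is reported. */
void sketch_minmax_index(size_t *min_index, size_t *max_index,
                         const double data[], size_t stride, size_t n)
{
    assert(stride > 0 && n > 0);

    size_t imin = 0, imax = 0;
    double min = data[0], max = data[0];

    for (size_t i = 0; i < n; i++) {
        const double xi = data[i * stride];
        if (xi != xi) {            /* NaN compares unequal to itself */
            imin = i;
            imax = i;
            break;
        }
        if (xi < min) { min = xi; imin = i; }
        if (xi > max) { max = xi; imax = i; }
    }
    *min_index = imin;
    *max_index = imax;
}
```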
| |
| @node Median and Percentiles |
| @section Median and Percentiles |
| |
| The median and percentile functions described in this section operate on |
| sorted data. For convenience we use @dfn{quantiles}, measured on a scale |
| of 0 to 1, instead of percentiles (which use a scale of 0 to 100). |
| |
| @deftypefun double gsl_stats_median_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}) |
| This function returns the median value of @var{sorted_data}, a dataset |
| of length @var{n} with stride @var{stride}. The elements of the array |
| must be in ascending numerical order. There are no checks to see |
| whether the data are sorted, so the function @code{gsl_sort} should |
| always be used first. |
| |
| When the dataset has an odd number of elements the median is the value |
| of element @math{(n-1)/2}. When the dataset has an even number of |
| elements the median is the mean of the two nearest middle values, |
| elements @math{(n-1)/2} and @math{n/2}. Since the algorithm for |
| computing the median involves interpolation this function always returns |
| a floating-point number, even for integer data types. |
| @end deftypefun |
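
The odd and even cases described above can be sketched with integer division, since @math{(n-1)/2} and @math{n/2} coincide exactly when @math{n} is odd. The name @code{sketch_median_sorted} is hypothetical, and the input is assumed to be sorted in ascending order already.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: median of data that is ALREADY sorted ascending.
   For odd n the two middle indices coincide; for even n the two middle
   values are averaged. */
double sketch_median_sorted(const double sorted_data[], size_t stride,
                            size_t n)
{
    assert(stride > 0 && n > 0);

    const size_t lhs = (n - 1) / 2;   /* integer division */
    const size_t rhs = n / 2;

    if (lhs == rhs) {
        return sorted_data[lhs * stride];
    }
    return (sorted_data[lhs * stride] + sorted_data[rhs * stride]) / 2.0;
}
```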
| |
| @deftypefun double gsl_stats_quantile_from_sorted_data (const double @var{sorted_data}[], size_t @var{stride}, size_t @var{n}, double @var{f}) |
| This function returns a quantile value of @var{sorted_data}, a |
| double-precision array of length @var{n} with stride @var{stride}. The |
| elements of the array must be in ascending numerical order. The |
| quantile is determined by @var{f}, a fraction between 0 and 1. For |
| example, to compute the value of the 75th percentile @var{f} should have |
| the value 0.75. |
| |
| There are no checks to see whether the data are sorted, so the function |
| @code{gsl_sort} should always be used first. |
| |
| The quantile is found by interpolation, using the formula |
| @tex |
| \beforedisplay |
| $$ |
| \hbox{quantile} = (1 - \delta) x_i + \delta x_{i+1} |
| $$ |
| \afterdisplay |
| @end tex |
| @ifinfo |
| |
| @example |
| quantile = (1 - \delta) x_i + \delta x_@{i+1@} |
| @end example |
| |
| @end ifinfo |
| @noindent |
| where @math{i} is @code{floor}(@math{(n - 1)f}) and @math{\delta} is |
| @math{(n-1)f - i}. |
| |
| Thus the minimum value of the array (@code{sorted_data[0*stride]}) is |
| given by @var{f} equal to zero, the maximum value |
| (@code{sorted_data[(n-1)*stride]}) is given by @var{f} equal to one and |
| the median value is given by @var{f} equal to 0.5. Since the algorithm |
| for computing quantiles involves interpolation this function always |
| returns a floating-point number, even for integer data types. |
| @end deftypefun |
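
The interpolation rule can be sketched as follows. The name @code{sketch_quantile_sorted} is hypothetical. The sketch assumes sorted input and @var{f} in the range 0 to 1, and it handles the last element separately so that @math{x_@{i+1@}} is never read past the end of the array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: quantile of ALREADY sorted data by linear
   interpolation, with i = floor((n - 1) f) and delta = (n - 1) f - i. */
double sketch_quantile_sorted(const double sorted_data[], size_t stride,
                              size_t n, double f)
{
    assert(stride > 0 && n > 0);
    assert(f >= 0.0 && f <= 1.0);

    const double index = f * (double)(n - 1);
    const size_t i = (size_t)index;     /* truncation == floor, index >= 0 */
    const double delta = index - (double)i;

    if (i == n - 1) {                   /* f == 1: no right-hand neighbour */
        return sorted_data[i * stride];
    }
    return (1.0 - delta) * sorted_data[i * stride]
           + delta * sorted_data[(i + 1) * stride];
}
```

For the two-element array 0, 10 and @var{f} equal to 0.25, the index is 0.25, so the result lies a quarter of the way from the first element to the second.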
| |
| |
| @comment @node Statistical tests |
| @comment @section Statistical tests |
| |
| @comment FIXME, do more work on the statistical tests |
| |
| @comment -@deftypefun double gsl_stats_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2}) |
| @comment -@deftypefunx Statistics double gsl_stats_int_ttest (const double @var{data1}[], double @var{data2}[], size_t @var{n1}, size_t @var{n2}) |
| |
| @comment The function @code{gsl_stats_ttest} computes the t-test statistic for |
| @comment the two arrays @var{data1}[] and @var{data2}[], of lengths @var{n1} and |
| @comment -@var{n2} respectively. |
| |
| @comment The t-test statistic measures the difference between the means of two |
| @comment datasets. |
| |
| @node Example statistical programs |
| @section Examples |
| Here is a basic example of how to use the statistical functions: |
| |
| @example |
| @verbatiminclude examples/stat.c |
| @end example |
| |
| The program should produce the following output, |
| |
| @example |
| @verbatiminclude examples/stat.out |
| @end example |
| |
| |
| Here is an example using sorted data, |
| |
| @example |
| @verbatiminclude examples/statsort.c |
| @end example |
| |
| This program should produce the following output, |
| |
| @example |
| @verbatiminclude examples/statsort.out |
| @end example |
| |
| @node Statistics References and Further Reading |
| @section References and Further Reading |
| |
| The standard reference for almost any topic in statistics is the |
| multi-volume @cite{Advanced Theory of Statistics} by Kendall and Stuart. |
| |
| @itemize @asis |
| @item |
| Maurice Kendall, Alan Stuart, and J. Keith Ord. |
| @cite{The Advanced Theory of Statistics} (multiple volumes) |
| reprinted as @cite{Kendall's Advanced Theory of Statistics}. |
| Wiley, ISBN 047023380X. |
| @end itemize |
| |
| @noindent |
| Many statistical concepts can be more easily understood through a |
| Bayesian approach. The following book by Gelman, Carlin, Stern and |
| Rubin gives comprehensive coverage of the subject. |
| |
| @itemize @asis |
| @item |
| Andrew Gelman, John B. Carlin, Hal S. Stern, Donald B. Rubin. |
| @cite{Bayesian Data Analysis}. |
| Chapman & Hall, ISBN 0412039915. |
| @end itemize |
| |
| @noindent |
| For physicists the Particle Data Group provides useful reviews of |
| Probability and Statistics in the ``Mathematical Tools'' section of its |
| Annual Review of Particle Physics. |
| |
| @itemize @asis |
| @item |
| @cite{Review of Particle Properties} |
| R.M. Barnett et al., Physical Review D54, 1 (1996) |
| @end itemize |
| |
| @noindent |
| The Review of Particle Physics is available online at |
| the website @uref{http://pdg.lbl.gov/}. |
| |
| |