| @cindex sorting |
| @cindex heapsort |
| This chapter describes functions for sorting data, both directly and |
| indirectly (using an index). All the functions use the @dfn{heapsort} |
| algorithm. Heapsort is an @math{O(N \log N)} algorithm which operates |
| in-place and does not require any additional storage. It also provides |
| consistent performance, the running time for its worst-case (ordered |
| data) being not significantly longer than the average and best cases. |
| Note that the heapsort algorithm does not preserve the relative ordering |
| of equal elements---it is an @dfn{unstable} sort. However the resulting |
| order of equal elements will be consistent across different platforms |
| when using these functions. |
| |
| @menu |
| * Sorting objects:: |
| * Sorting vectors:: |
| * Selecting the k smallest or largest elements:: |
| * Computing the rank:: |
| * Sorting Examples:: |
| * Sorting References and Further Reading:: |
| @end menu |
| |
| @node Sorting objects |
| @section Sorting objects |
| |
| The following function provides a simple alternative to the standard |
| library function @code{qsort}. It is intended for systems lacking |
| @code{qsort}, not as a replacement for it. The function @code{qsort} |
| should be used whenever possible, as it will be faster and can provide |
| stable ordering of equal elements. Documentation for @code{qsort} is |
| available in the @cite{GNU C Library Reference Manual}. |
| |
| The functions described in this section are defined in the header file |
| @file{gsl_heapsort.h}. |
| |
| @cindex comparison functions, definition |
| @deftypefun void gsl_heapsort (void * @var{array}, size_t @var{count}, size_t @var{size}, gsl_comparison_fn_t @var{compare}) |
| |
| This function sorts the @var{count} elements of the array @var{array}, |
| each of size @var{size}, into ascending order using the comparison |
| function @var{compare}. The type of the comparison function is defined by, |
| |
| @example |
| int (*gsl_comparison_fn_t) (const void * a, |
| const void * b) |
| @end example |
| |
| @noindent |
| A comparison function should return a negative integer if the first |
| argument is less than the second argument, @code{0} if the two arguments |
| are equal and a positive integer if the first argument is greater than |
| the second argument. |
| |
| For example, the following function can be used to sort doubles into |
| ascending numerical order. |
| |
| @example |
| int |
| compare_doubles (const double * a, |
| const double * b) |
| @{ |
| if (*a > *b) |
| return 1; |
| else if (*a < *b) |
| return -1; |
| else |
| return 0; |
| @} |
| @end example |
| |
| @noindent |
| The appropriate function call to perform the sort is, |
| |
| @example |
| gsl_heapsort (array, count, sizeof(double), |
| compare_doubles); |
| @end example |
| |
| Note that unlike @code{qsort} the heapsort algorithm cannot be made into |
| a stable sort by pointer arithmetic. The trick of comparing pointers for |
| equal elements in the comparison function does not work for the heapsort |
| algorithm. The heapsort algorithm performs an internal rearrangement of |
| the data which destroys its initial ordering. |
| @end deftypefun |
| |
| @cindex indirect sorting |
| @deftypefun int gsl_heapsort_index (size_t * @var{p}, const void * @var{array}, size_t @var{count}, size_t @var{size}, gsl_comparison_fn_t @var{compare}) |
| |
| This function indirectly sorts the @var{count} elements of the array |
| @var{array}, each of size @var{size}, into ascending order using the |
| comparison function @var{compare}. The resulting permutation is stored |
| in @var{p}, an array of length @var{n}. The elements of @var{p} give the |
| index of the array element which would have been stored in that position |
| if the array had been sorted in place. The first element of @var{p} |
| gives the index of the least element in @var{array}, and the last |
| element of @var{p} gives the index of the greatest element in |
| @var{array}. The array itself is not changed. |
| @end deftypefun |
| |
| @node Sorting vectors |
| @section Sorting vectors |
| |
| The following functions will sort the elements of an array or vector, |
| either directly or indirectly. They are defined for all real and integer |
| types using the normal suffix rules. For example, the @code{float} |
| versions of the array functions are @code{gsl_sort_float} and |
| @code{gsl_sort_float_index}. The corresponding vector functions are |
| @code{gsl_sort_vector_float} and @code{gsl_sort_vector_float_index}. The |
| prototypes are available in the header files @file{gsl_sort_float.h} |
| @file{gsl_sort_vector_float.h}. The complete set of prototypes can be |
| included using the header files @file{gsl_sort.h} and |
| @file{gsl_sort_vector.h}. |
| |
| There are no functions for sorting complex arrays or vectors, since the |
| ordering of complex numbers is not uniquely defined. To sort a complex |
| vector by magnitude compute a real vector containing the magnitudes |
| of the complex elements, and sort this vector indirectly. The resulting |
| index gives the appropriate ordering of the original complex vector. |
| |
| @cindex sorting vector elements |
| @cindex vector, sorting elements of |
| @deftypefun void gsl_sort (double * @var{data}, size_t @var{stride}, size_t @var{n}) |
| This function sorts the @var{n} elements of the array @var{data} with |
| stride @var{stride} into ascending numerical order. |
| @end deftypefun |
| |
| @deftypefun void gsl_sort_vector (gsl_vector * @var{v}) |
| This function sorts the elements of the vector @var{v} into ascending |
| numerical order. |
| @end deftypefun |
| |
| @cindex indirect sorting, of vector elements |
| @deftypefun void gsl_sort_index (size_t * @var{p}, const double * @var{data}, size_t @var{stride}, size_t @var{n}) |
| This function indirectly sorts the @var{n} elements of the array |
| @var{data} with stride @var{stride} into ascending order, storing the |
| resulting permutation in @var{p}. The array @var{p} must be allocated with |
| a sufficient length to store the @var{n} elements of the permutation. |
| The elements of @var{p} give the index of the array element which would |
| have been stored in that position if the array had been sorted in place. |
| The array @var{data} is not changed. |
| @end deftypefun |
| |
| @deftypefun int gsl_sort_vector_index (gsl_permutation * @var{p}, const gsl_vector * @var{v}) |
| This function indirectly sorts the elements of the vector @var{v} into |
| ascending order, storing the resulting permutation in @var{p}. The |
| elements of @var{p} give the index of the vector element which would |
| have been stored in that position if the vector had been sorted in |
| place. The first element of @var{p} gives the index of the least element |
| in @var{v}, and the last element of @var{p} gives the index of the |
| greatest element in @var{v}. The vector @var{v} is not changed. |
| @end deftypefun |
| |
| @node Selecting the k smallest or largest elements |
| @section Selecting the k smallest or largest elements |
| |
| The functions described in this section select the @math{k} smallest |
| or largest elements of a data set of size @math{N}. The routines use an |
| @math{O(kN)} direct insertion algorithm which is suited to subsets that |
| are small compared with the total size of the dataset. For example, the |
| routines are useful for selecting the 10 largest values from one million |
| data points, but not for selecting the largest 100,000 values. If the |
| subset is a significant part of the total dataset it may be faster |
| to sort all the elements of the dataset directly with an @math{O(N \log |
| N)} algorithm and obtain the smallest or largest values that way. |
| |
| @deftypefun int gsl_sort_smallest (double * @var{dest}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n}) |
| This function copies the @var{k} smallest elements of the array |
| @var{src}, of size @var{n} and stride @var{stride}, in ascending |
| numerical order into the array @var{dest}. The size @var{k} of the subset must be |
| less than or equal to @var{n}. The data @var{src} is not modified by |
| this operation. |
| @end deftypefun |
| |
| @deftypefun int gsl_sort_largest (double * @var{dest}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n}) |
| This function copies the @var{k} largest elements of the array |
| @var{src}, of size @var{n} and stride @var{stride}, in descending |
| numerical order into the array @var{dest}. @var{k} must be |
| less than or equal to @var{n}. The data @var{src} is not modified by |
| this operation. |
| @end deftypefun |
| |
| @deftypefun int gsl_sort_vector_smallest (double * @var{dest}, size_t @var{k}, const gsl_vector * @var{v}) |
| @deftypefunx int gsl_sort_vector_largest (double * @var{dest}, size_t @var{k}, const gsl_vector * @var{v}) |
| These functions copy the @var{k} smallest or largest elements of the |
| vector @var{v} into the array @var{dest}. @var{k} |
| must be less than or equal to the length of the vector @var{v}. |
| @end deftypefun |
| |
| The following functions find the indices of the @math{k} smallest or |
| largest elements of a dataset, |
| |
| @deftypefun int gsl_sort_smallest_index (size_t * @var{p}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n}) |
| This function stores the indices of the @var{k} smallest elements of |
| the array @var{src}, of size @var{n} and stride @var{stride}, in the |
| array @var{p}. The indices are chosen so that the corresponding data is |
| in ascending numerical order. @var{k} must be |
| less than or equal to @var{n}. The data @var{src} is not modified by |
| this operation. |
| @end deftypefun |
| |
| @deftypefun int gsl_sort_largest_index (size_t * @var{p}, size_t @var{k}, const double * @var{src}, size_t @var{stride}, size_t @var{n}) |
| This function stores the indices of the @var{k} largest elements of |
| the array @var{src}, of size @var{n} and stride @var{stride}, in the |
| array @var{p}. The indices are chosen so that the corresponding data is |
| in descending numerical order. @var{k} must be |
| less than or equal to @var{n}. The data @var{src} is not modified by |
| this operation. |
| @end deftypefun |
| |
| @deftypefun int gsl_sort_vector_smallest_index (size_t * @var{p}, size_t @var{k}, const gsl_vector * @var{v}) |
| @deftypefunx int gsl_sort_vector_largest_index (size_t * @var{p}, size_t @var{k}, const gsl_vector * @var{v}) |
| These functions store the indices of the @var{k} smallest or largest |
| elements of the vector @var{v} in the array @var{p}. @var{k} must be less than or equal to the length of the vector |
| @var{v}. |
| @end deftypefun |
| |
| |
| @node Computing the rank |
| @section Computing the rank |
| |
| The @dfn{rank} of an element is its order in the sorted data. The rank |
| is the inverse of the index permutation, @var{p}. It can be computed |
| using the following algorithm, |
| |
| @example |
| for (i = 0; i < p->size; i++) |
| @{ |
| size_t pi = p->data[i]; |
| rank->data[pi] = i; |
| @} |
| @end example |
| |
| @noindent |
| This can be computed directly from the function |
| @code{gsl_permutation_inverse(rank,p)}. |
| |
| The following function will print the rank of each element of the vector |
| @var{v}, |
| |
| @example |
| void |
| print_rank (gsl_vector * v) |
| @{ |
| size_t i; |
| size_t n = v->size; |
| gsl_permutation * perm = gsl_permutation_alloc(n); |
| gsl_permutation * rank = gsl_permutation_alloc(n); |
| |
| gsl_sort_vector_index (perm, v); |
| gsl_permutation_inverse (rank, perm); |
| |
| for (i = 0; i < n; i++) |
| @{ |
| double vi = gsl_vector_get(v, i); |
| printf ("element = %d, value = %g, rank = %d\n", |
| i, vi, rank->data[i]); |
| @} |
| |
| gsl_permutation_free (perm); |
| gsl_permutation_free (rank); |
| @} |
| @end example |
| |
| @node Sorting Examples |
| @section Examples |
| |
| The following example shows how to use the permutation @var{p} to print |
| the elements of the vector @var{v} in ascending order, |
| |
| @example |
| gsl_sort_vector_index (p, v); |
| |
| for (i = 0; i < v->size; i++) |
| @{ |
| double vpi = gsl_vector_get (v, p->data[i]); |
| printf ("order = %d, value = %g\n", i, vpi); |
| @} |
| @end example |
| |
| @noindent |
| The next example uses the function @code{gsl_sort_smallest} to select |
| the 5 smallest numbers from 100000 uniform random variates stored in an |
| array, |
| |
| @example |
| @verbatiminclude examples/sortsmall.c |
| @end example |
| The output lists the 5 smallest values, in ascending order, |
| |
| @example |
| $ ./a.out |
| @verbatiminclude examples/sortsmall.out |
| @end example |
| |
| @node Sorting References and Further Reading |
| @section References and Further Reading |
| |
| The subject of sorting is covered extensively in Knuth's |
| @cite{Sorting and Searching}, |
| |
| @itemize @asis |
| @item |
| Donald E. Knuth, @cite{The Art of Computer Programming: Sorting and |
| Searching} (Vol 3, 3rd Ed, 1997), Addison-Wesley, ISBN 0201896850. |
| @end itemize |
| |
| @noindent |
| The Heapsort algorithm is described in the following book, |
| |
| @itemize @asis |
| @item Robert Sedgewick, @cite{Algorithms in C}, Addison-Wesley, |
| ISBN 0201514257. |
| @end itemize |
| |
| |