blob: a276978755f8846fd9cb9c0f76dadd59582c2ae2 [file] [log] [blame]
<HTML>
<BODY>
<H2>Overview</H2>
Example that uses parallel_do to do parallel preorder traversal of a sparse graph.
<P>
Each vertex in the graph is called a "cell".
Each cell has a value.
The value is a matrix.
Some of the cells have operators
that compute the cell's value, using other cell's values as input.
A cell that uses the value of cell x is called a successor of x.
</P><P>
The algorithm works as follows.
<OL>
<LI> Compute the set of cells that have no inputs. This set is called <TT>root_set</TT>.
<LI> Each cell has an associated field <TT>ref_count</TT> that is an atomic integer.
Initialize <TT>ref_count</TT> to the number of inputs for the Cell.
<LI> Update each cell in <TT>root_set</TT>, by applying a <TT>parallel_do</TT> to a <TT>root_set</TT>
<LI> After updating a cell, for each of its successors
<OL>
<LI> Atomically decrement the successor's <TT>ref_count</TT>
<LI> If the count became zero, add the cell to the set of cells to be updated,
by calling <TT>parallel_do_feeder_impl::add</TT>.
</OL>
</OL>
</P><P>
The times printed are for the traversal and update,
and do not include time for computing the root_set.
</P>
<B>NOTE: </B>It is important to understand that this example is unlikely to show speedup
if the cell values are changed to type "float". The reason is twofold.
<UL>
<LI> The smaller value type causes each Cell to be significantly smaller than a cache line,
which leads to false sharing conflicts.
<LI> The time to update the cells becomes very small, and consequently the overhead of
parallel_do swamps the useful work.
</UL>
<H2>Files</H2>
<DL>
<DT><A HREF="parallel_preorder.cpp">parallel_preorder.cpp</A>
<DD>Source code for example.
<DT><A HREF="Graph.cpp">Graph.cpp</A>
<DD>Source code for example.
<DT><A HREF="Graph.h">Graph.h</A>
<DD>Source code for example.
<DT><A HREF="Matrix.h">Matrix.h</A>
<DD>Source code for example.
<DT><A HREF="Makefile">Makefile</A>
<DD>Makefile for building example.
</DL>
<H2>Directories</H2>
<DL>
<DT><A HREF="msvs">msvs</A>
<DD>Contains Microsoft* Visual Studio* 2005 workspace for building and running the example.
<DT><A HREF="xcode">xcode</A>
<DD>Contains Xcode* IDE workspace for building and running the example.
</DL>
<H2>To Build</H2>
General build directions can be found <A HREF=../../index.html#build>here</A>.
<H2>Usage</H2>
<DL>
<DT><TT>parallel_preorder [<I>M</I>[:<I>N</I>] [<I>Rounds</I> [<I>'pause'</I>]]]</TT>
<DD><I>M</I> and <I>N</I> are a range of numbers of threads to be used.
<DD><I>Rounds</I> is the number of rounds the example runs internally. Default value
is 50; reduce it to shorten example run time.
<DD>If 'pause' is specified, the application will wait for a user to hit return before it exits.
<DT>To run a short version of this example, e.g., for use with Intel&reg; Threading Tools:
<DD>Build a <I>debug</I> version of the example
(see the <A HREF=../../index.html#build>build directions</A>).
<BR>Run it with the desired number of threads and smaller number of rounds, e.g., <TT>parallel_preorder 4 5</TT>.
</DL>
<HR>
<A HREF="../index.html">Up to parent directory</A>
<p></p>
Copyright &copy; 2005-2010 Intel Corporation. All Rights Reserved.
<p></p>
Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
registered trademarks or trademarks of Intel Corporation or its
subsidiaries in the United States and other countries.
<p></p>
* Other names and brands may be claimed as the property of others.
</BODY>
</HTML>