src/parsec/disk-image/parsec/parsec-benchmark/pkgs/libs/tbblib/src/examples/parallel_do/parallel_preorder/index.html - public/gem5-resources - Git at Google

 <HTML>
 <BODY>

 <H2>Overview</H2>
 Example that uses parallel_do to do parallel preorder traversal of a sparse graph.
 <P>
 Each vertex in the graph is called a "cell".
 Each cell has a value.
 The value is a matrix.
 Some of the cells have operators
 that compute the cell's value, using other cell's values as input.
 A cell that uses the value of cell x is called a successor of x.
 </P><P>
 The algorithm works as follows.
 <OL>
 <LI> Compute the set of cells that have no inputs. This set is called <TT>root_set</TT>.
 <LI> Each cell has an associated field <TT>ref_count</TT> that is an atomic integer.
      Initialize <TT>ref_count</TT> to the number of inputs for the Cell.
 <LI> Update each cell in <TT>root_set</TT>, by applying a <TT>parallel_do</TT> to a <TT>root_set</TT>
 <LI> After updating a cell, for each of its successors
 <OL>
 <LI> Atomically decrement the successor's <TT>ref_count</TT>
 <LI> If the count became zero, add the cell to the set of cells to be updated,
      by calling <TT>parallel_do_feeder_impl::add</TT>.
 </OL>
 </OL>
 </P><P>
 The times printed are for the traversal and update,
 and do not include time for computing the root_set.
 </P>
 <B>NOTE: </B>It is important to understand that this example is unlikely to show speedup
 if the cell values are changed to type "float".  The reason is twofold.
 <UL>
 <LI> The smaller value type causes each Cell to be significantly smaller than a cache line,
      which leads to false sharing conflicts.
 <LI> The time to update the cells becomes very small, and consequently the overhead of
      parallel_do swamps the useful work.
 </UL>

 <H2>Files</H2>
 <DL>
 <DT><A HREF="parallel_preorder.cpp">parallel_preorder.cpp</A>
 <DD>Source code for example.
 <DT><A HREF="Graph.cpp">Graph.cpp</A>
 <DD>Source code for example.
 <DT><A HREF="Graph.h">Graph.h</A>
 <DD>Source code for example.
 <DT><A HREF="Matrix.h">Matrix.h</A>
 <DD>Source code for example.
 <DT><A HREF="Makefile">Makefile</A>
 <DD>Makefile for building example.
 </DL>

 <H2>Directories</H2>
 <DL>
 <DT><A HREF="msvs">msvs</A>
 <DD>Contains Microsoft* Visual Studio* 2005 workspace for building and running the example.
 <DT><A HREF="xcode">xcode</A>
 <DD>Contains Xcode* IDE workspace for building and running the example.
 </DL>

 <H2>To Build</H2>
 General build directions can be found <A HREF=../../index.html#build>here</A>.

 <H2>Usage</H2>
 <DL>
 <DT><TT>parallel_preorder [<I>M</I>[:<I>N</I>] [<I>Rounds</I> [<I>'pause'</I>]]]</TT>
 <DD><I>M</I> and <I>N</I> are a range of numbers of threads to be used.
 <DD><I>Rounds</I> is the number of rounds the example runs internally. Default value
     is 50; reduce it to shorten example run time.
 <DD>If 'pause' is specified, the application will wait for a user to hit return before it exits.
 <DT>To run a short version of this example, e.g., for use with Intel&reg; Threading Tools:
 <DD>Build a <I>debug</I> version of the example
     (see the <A HREF=../../index.html#build>build directions</A>).
     <BR>Run it with the desired number of threads and smaller number of rounds, e.g., <TT>parallel_preorder 4 5</TT>.
 </DL>

 <HR>
 <A HREF="../index.html">Up to parent directory</A>
 <p></p>
 Copyright &copy; 2005-2010 Intel Corporation.  All Rights Reserved.
 <p></p>
 Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
 registered trademarks or trademarks of Intel Corporation or its
 subsidiaries in the United States and other countries.
 <p></p>
 * Other names and brands may be claimed as the property of others.
 </BODY>
 </HTML>
	<HTML>
	<BODY>

	<H2>Overview</H2>
	Example that uses parallel_do to do parallel preorder traversal of a sparse graph.
	<P>
	Each vertex in the graph is called a "cell".
	Each cell has a value.
	The value is a matrix.
	Some of the cells have operators
	that compute the cell's value, using other cell's values as input.
	A cell that uses the value of cell x is called a successor of x.
	</P><P>
	The algorithm works as follows.
	<OL>
	<LI> Compute the set of cells that have no inputs. This set is called <TT>root_set</TT>.
	<LI> Each cell has an associated field <TT>ref_count</TT> that is an atomic integer.
	Initialize <TT>ref_count</TT> to the number of inputs for the Cell.
	<LI> Update each cell in <TT>root_set</TT>, by applying a <TT>parallel_do</TT> to a <TT>root_set</TT>
	<LI> After updating a cell, for each of its successors
	<OL>
	<LI> Atomically decrement the successor's <TT>ref_count</TT>
	<LI> If the count became zero, add the cell to the set of cells to be updated,
	by calling <TT>parallel_do_feeder_impl::add</TT>.
	</OL>
	</OL>
	</P><P>
	The times printed are for the traversal and update,
	and do not include time for computing the root_set.
	</P>
	<B>NOTE: </B>It is important to understand that this example is unlikely to show speedup
	if the cell values are changed to type "float". The reason is twofold.
	<UL>
	<LI> The smaller value type causes each Cell to be significantly smaller than a cache line,
	which leads to false sharing conflicts.
	<LI> The time to update the cells becomes very small, and consequently the overhead of
	parallel_do swamps the useful work.
	</UL>

	<H2>Files</H2>
	<DL>
	<DT><A HREF="parallel_preorder.cpp">parallel_preorder.cpp</A>
	<DD>Source code for example.
	<DT><A HREF="Graph.cpp">Graph.cpp</A>
	<DD>Source code for example.
	<DT><A HREF="Graph.h">Graph.h</A>
	<DD>Source code for example.
	<DT><A HREF="Matrix.h">Matrix.h</A>
	<DD>Source code for example.
	<DT><A HREF="Makefile">Makefile</A>
	<DD>Makefile for building example.
	</DL>

	<H2>Directories</H2>
	<DL>
	<DT><A HREF="msvs">msvs</A>
	<DD>Contains Microsoft* Visual Studio* 2005 workspace for building and running the example.
	<DT><A HREF="xcode">xcode</A>
	<DD>Contains Xcode* IDE workspace for building and running the example.
	</DL>

	<H2>To Build</H2>
	General build directions can be found <A HREF=../../index.html#build>here</A>.

	<H2>Usage</H2>
	<DL>
	<DT><TT>parallel_preorder [<I>M</I>[:<I>N</I>] [<I>Rounds</I> [<I>'pause'</I>]]]</TT>
	<DD><I>M</I> and <I>N</I> are a range of numbers of threads to be used.
	<DD><I>Rounds</I> is the number of rounds the example runs internally. Default value
	is 50; reduce it to shorten example run time.
	<DD>If 'pause' is specified, the application will wait for a user to hit return before it exits.
	<DT>To run a short version of this example, e.g., for use with Intel® Threading Tools:
	<DD>Build a <I>debug</I> version of the example
	(see the <A HREF=../../index.html#build>build directions</A>).
	<BR>Run it with the desired number of threads and smaller number of rounds, e.g., <TT>parallel_preorder 4 5</TT>.
	</DL>

	<HR>
	<A HREF="../index.html">Up to parent directory</A>
	<p></p>
	Copyright © 2005-2010 Intel Corporation. All Rights Reserved.
	<p></p>
	Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
	registered trademarks or trademarks of Intel Corporation or its
	subsidiaries in the United States and other countries.
	<p></p>
	* Other names and brands may be claimed as the property of others.
	</BODY>
	</HTML>