src/parsec/disk-image/parsec/parsec-benchmark/pkgs/libs/tbblib/src/examples/parallel_for/polygon_overlay/index.html - public/gem5-resources - Git at Google

 <HTML>
 <BODY>

 <H2>Overview</H2>
 Polygon Overlay example that demonstrates the use of parallel_for.
 <P>
 This example is a simple implementation of polygon overlay, as described in
 <a href="http://citeseer.ist.psu.edu/cache/papers/cs/11981/ftp:zSzzSzftp.cs.vu.nlzSzpubzSzbalzSzcowichanzSzPolygonzSzreport.pdf/langendoen95parallelizing.pdf">
 	<i>Parallelizing the Polygon Overlay Problem Using Orca</i>, by H.F. Langendoen</a>.
 </P>
 The solution was implemented in three forms:
 <UL>
 	<LI>The naive serial solution.
 	<LI>The naive parallel solution, by splitting list of polygons from one map and intersecting
 	each sub-list against the entire list of polygons from the second map.
 	<LI>A parallel solution where each map is split into submaps, with each resulting submap being
 	intersected against the corresponding submap from the other map.  This solution requires some
 	redundancy (some polygons are members of more than one submap).  To prevent multiple copies
 	of a polygon from being placed in the solution map, if both polygons are duplicated (that is,
 	if they both appear in more than one map), they are intersected but the result is not placed
 	in the solution map.
 </UL>
 The only optimization in each solution is that the area of the generated sub-polygons are subtracted from
 the original area of one of the source polygons.  When the remaining area is zero, the intersection process
 is halted.
 <P>
 <i>A word about the speedup of the submap case.</i>  One may get superlinear speedup in this case (for instance a
 laptop with Intel&reg; Core(TM) Duo processor got a speedup of about 20 percent over serial.)  This results from two effects:
 </P>
 <UL>
 	<LI>the number of threads used, and
 	<LI>the fact that for each submap, the number of polygons is smaller than that for the other two cases.
 </UL>
 If there are, say, 400 polygons in each map, then on average the number of intersections calculated is
 approximately 80,000 (400 * 200, where 200 is the average number of polygons examined before stopping.)
 If the maps are split into 2 submaps, the time for each submap is about 200*100, or 20,000.  So even
 comparing the two sets of submaps serially should result in a speedup somewhere around 2.  This number
 is affected by the number of redundant polygons being compared; this effect would eventually swamp the gain
 from comparing smaller numbers of polygons per submap.  And remember the submaps are created by intersecting each
 map with a rectangular polygon covering the submap being generated, which is additional work taking about N * O(400)
 in the case above, where N is the number of submaps generated, that can be done in parallel.
 <P>
 Running the default release pover while varying the number of submaps from 1 to 1000, the speedup on the submap
 case for a 2-processor system looks like<br>
 <img src="speedup.gif" alt="Table of speedup for the algorithm"><br>
 </P>
 <P>
 One further optimization would be to sort one map, say <b>map1</b>, by maxY, and sort the other map (<b>map2</b>)
 by minY.  For <b>p1</b> in <b>map1</b>, start testing for intersection at the first <b>p2</b> in <b>map2</b>
 that intersected the last polygon tested in <b>map1</b>.  This would speed up the intersection process greatly,
 but the optimization would apply to all the methods, and the sort would have to be accounted for in the timing.
 </P>
 <P>
 The source maps are generated pseudo-randomly in the manner described in the paper above.  That is, if
 we need N polygons, then N "boxes" are chosen at random, then one-at-a-time the areas are expanded in
 one of fours directions until the area hits an adjacent polygon.  When this process is finished, the
 resulting map is inspected and any remaining unoccupied "boxes" are made into additional polygons, as
 large as possible in each case.  So the actual number of polygons in each map will in general be larger
 than the number of polygons requested (sometimes by 10% or more.)
 </P>
 <P>
 One limitation of the program is that if the number of polygons in the source map is greater than the number of
 "boxes" (pixels in the GUI case), the maps cannot be generated.
 </P>

 <H2>Files</H2>
 <DL>
 <DT><A HREF="polyover.cpp">polyover.cpp</A>
 <DD>Source code for main program.
 <DT><A HREF="polyover.h">polyover.h</A>
 <DD>Global variables, classes and enums.
 <DT><A HREF="pover_video.cpp">pover_video.cpp</A>
 <DD>Source code for the GUI interface.
 <DT><A HREF="pover_video.h">pover_video.h</A>
 <DD>Defines for the GUI version.
 <DT><A HREF="Makefile">Makefile</A>
 <DD>Makefile for building example.
 </DL>

 <H2>Directories</H2>
 <DL>
 <DT><A HREF="msvs">msvs</A>
 <DD>Contains Microsoft* Visual Studio* 2005 workspace for building and running the example.
 <DT><A HREF="xcode">xcode</A>
 <DD>Contains Mac OS* Xcode* workspace for building and running the example.
 </DL>

 <H2>To Build</H2>
 General build directions can be found <A HREF="../../index.html#build">here</A>.  For the various UI options, see the <A HREF="../../common/index.html">common GUI code</A> build instructions.

 <P>
 For Windows* systems, Microsoft* Visual Studio* projects are provided for each of the above versions.
 </P>

 <H2>Usage</H2>
 Building via the above make commands, or via Visual Studio projects on Windows* systems, produces executable files
 named pover.exe.  To run these executables directly, use one or more of the following commands.
 <DL>
 <DT><TT>pover.exe</TT>
 <DD>Run this version (release or debug).
 <DT><TT>pover.exe n:m</TT>
 <DD>Run this version (release or debug) (m-n+1) times, with n threads to m threads inclusive.
 <DT>To run a short version of this example, e.g., for use with Intel&reg; Threading Tools:
 <DD>Build a <I>debug</I> version with the GUI turned off
     (e.g., <TT>make UI=con debug</TT>; see also the build directions above).
     <BR>Run it with a small dataset, e.g., <TT>pover.exe --polys 10 --size 5x5</TT>.
 </DL>

 <H2>Notes</H2>
 <UL>
 <LI>While running with the GUI display should yield reasonable performance in most cases, <I>running with no GUI
     display is strongly recommended</I> in order to demonstrate the full performance and scalability of the example.
 <LI>If using the X-windows (X11) GUI on Mac OS* systems, X11 might not be installed on the system by default.
     To install X11 on Mac OS* systems, use the operating system install disk, choose "Optional installs" and select X11 from
     the "Applications" list.  Alternatively, if X11 is not available, build without the GUI (see build targets above).
 </UL>

 <HR>
 <A HREF="../index.html">Up to parent directory</A>
 <p></p>
 Copyright &copy; 2005-2010 Intel Corporation.  All Rights Reserved.
 <p></p>
 Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
 registered trademarks or trademarks of Intel Corporation or its
 subsidiaries in the United States and other countries.
 <p></p>
 * Other names and brands may be claimed as the property of others.
 </BODY>
 </HTML>
	<HTML>
	<BODY>

	<H2>Overview</H2>
	Polygon Overlay example that demonstrates the use of parallel_for.
	<P>
	This example is a simple implementation of polygon overlay, as described in
	<a href="http://citeseer.ist.psu.edu/cache/papers/cs/11981/ftp:zSzzSzftp.cs.vu.nlzSzpubzSzbalzSzcowichanzSzPolygonzSzreport.pdf/langendoen95parallelizing.pdf">
	<i>Parallelizing the Polygon Overlay Problem Using Orca</i>, by H.F. Langendoen</a>.
	</P>
	The solution was implemented in three forms:
	<UL>
	<LI>The naive serial solution.
	<LI>The naive parallel solution, by splitting list of polygons from one map and intersecting
	each sub-list against the entire list of polygons from the second map.
	<LI>A parallel solution where each map is split into submaps, with each resulting submap being
	intersected against the corresponding submap from the other map. This solution requires some
	redundancy (some polygons are members of more than one submap). To prevent multiple copies
	of a polygon from being placed in the solution map, if both polygons are duplicated (that is,
	if they both appear in more than one map), they are intersected but the result is not placed
	in the solution map.
	</UL>
	The only optimization in each solution is that the area of the generated sub-polygons are subtracted from
	the original area of one of the source polygons. When the remaining area is zero, the intersection process
	is halted.
	<P>
	<i>A word about the speedup of the submap case.</i> One may get superlinear speedup in this case (for instance a
	laptop with Intel® Core(TM) Duo processor got a speedup of about 20 percent over serial.) This results from two effects:
	</P>
	<UL>
	<LI>the number of threads used, and
	<LI>the fact that for each submap, the number of polygons is smaller than that for the other two cases.
	</UL>
	If there are, say, 400 polygons in each map, then on average the number of intersections calculated is
	approximately 80,000 (400 * 200, where 200 is the average number of polygons examined before stopping.)
	If the maps are split into 2 submaps, the time for each submap is about 200*100, or 20,000. So even
	comparing the two sets of submaps serially should result in a speedup somewhere around 2. This number
	is affected by the number of redundant polygons being compared; this effect would eventually swamp the gain
	from comparing smaller numbers of polygons per submap. And remember the submaps are created by intersecting each
	map with a rectangular polygon covering the submap being generated, which is additional work taking about N * O(400)
	in the case above, where N is the number of submaps generated, that can be done in parallel.
	<P>
	Running the default release pover while varying the number of submaps from 1 to 1000, the speedup on the submap
	case for a 2-processor system looks like<br>
	<img src="speedup.gif" alt="Table of speedup for the algorithm"><br>
	</P>
	<P>
	One further optimization would be to sort one map, say <b>map1</b>, by maxY, and sort the other map (<b>map2</b>)
	by minY. For <b>p1</b> in <b>map1</b>, start testing for intersection at the first <b>p2</b> in <b>map2</b>
	that intersected the last polygon tested in <b>map1</b>. This would speed up the intersection process greatly,
	but the optimization would apply to all the methods, and the sort would have to be accounted for in the timing.
	</P>
	<P>
	The source maps are generated pseudo-randomly in the manner described in the paper above. That is, if
	we need N polygons, then N "boxes" are chosen at random, then one-at-a-time the areas are expanded in
	one of fours directions until the area hits an adjacent polygon. When this process is finished, the
	resulting map is inspected and any remaining unoccupied "boxes" are made into additional polygons, as
	large as possible in each case. So the actual number of polygons in each map will in general be larger
	than the number of polygons requested (sometimes by 10% or more.)
	</P>
	<P>
	One limitation of the program is that if the number of polygons in the source map is greater than the number of
	"boxes" (pixels in the GUI case), the maps cannot be generated.
	</P>

	<H2>Files</H2>
	<DL>
	<DT><A HREF="polyover.cpp">polyover.cpp</A>
	<DD>Source code for main program.
	<DT><A HREF="polyover.h">polyover.h</A>
	<DD>Global variables, classes and enums.
	<DT><A HREF="pover_video.cpp">pover_video.cpp</A>
	<DD>Source code for the GUI interface.
	<DT><A HREF="pover_video.h">pover_video.h</A>
	<DD>Defines for the GUI version.
	<DT><A HREF="Makefile">Makefile</A>
	<DD>Makefile for building example.
	</DL>

	<H2>Directories</H2>
	<DL>
	<DT><A HREF="msvs">msvs</A>
	<DD>Contains Microsoft* Visual Studio* 2005 workspace for building and running the example.
	<DT><A HREF="xcode">xcode</A>
	<DD>Contains Mac OS* Xcode* workspace for building and running the example.
	</DL>

	<H2>To Build</H2>
	General build directions can be found <A HREF="../../index.html#build">here</A>. For the various UI options, see the <A HREF="../../common/index.html">common GUI code</A> build instructions.

	<P>
	For Windows* systems, Microsoft* Visual Studio* projects are provided for each of the above versions.
	</P>

	<H2>Usage</H2>
	Building via the above make commands, or via Visual Studio projects on Windows* systems, produces executable files
	named pover.exe. To run these executables directly, use one or more of the following commands.
	<DL>
	<DT><TT>pover.exe</TT>
	<DD>Run this version (release or debug).
	<DT><TT>pover.exe n:m</TT>
	<DD>Run this version (release or debug) (m-n+1) times, with n threads to m threads inclusive.
	<DT>To run a short version of this example, e.g., for use with Intel® Threading Tools:
	<DD>Build a <I>debug</I> version with the GUI turned off
	(e.g., <TT>make UI=con debug</TT>; see also the build directions above).
	<BR>Run it with a small dataset, e.g., <TT>pover.exe --polys 10 --size 5x5</TT>.
	</DL>

	<H2>Notes</H2>
	<UL>
	<LI>While running with the GUI display should yield reasonable performance in most cases, <I>running with no GUI
	display is strongly recommended</I> in order to demonstrate the full performance and scalability of the example.
	<LI>If using the X-windows (X11) GUI on Mac OS* systems, X11 might not be installed on the system by default.
	To install X11 on Mac OS* systems, use the operating system install disk, choose "Optional installs" and select X11 from
	the "Applications" list. Alternatively, if X11 is not available, build without the GUI (see build targets above).
	</UL>

	<HR>
	<A HREF="../index.html">Up to parent directory</A>
	<p></p>
	Copyright © 2005-2010 Intel Corporation. All Rights Reserved.
	<p></p>
	Intel, Pentium, Intel Xeon, Itanium, Intel XScale and VTune are
	registered trademarks or trademarks of Intel Corporation or its
	subsidiaries in the United States and other countries.
	<p></p>
	* Other names and brands may be claimed as the property of others.
	</BODY>
	</HTML>