target: use a workqueue for I/O completions

Instead of abusing the target processing thread for offloading I/O
completion in the backends to user context add a new workqueue.  This means
completions can be processed as fast as available CPU time allows it,
including in parallel with other completions and more importantly I/O
submission or QUEUE FULL retries.  This should give much better performance
especially on loaded systems.

As a fallout we can merge all the completed states into a single
one.

On the downside this change complicates lun reset handling a bit by
requiring us to cancel a work item only for those states that have it
initialized.  The alternative would be to either always initialize the work
item to a dummy handler, or always use the same handler and do a switch on
the state. The long term solution will be a flag that says that the command
has an initialized work item, but that's only going to be useful once we
have more users.

Signed-off-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org>
diff --git a/include/target/target_core_base.h b/include/target/target_core_base.h
index 07104bf..8e2c83d 100644
--- a/include/target/target_core_base.h
+++ b/include/target/target_core_base.h
@@ -86,9 +86,7 @@
 	TRANSPORT_WRITE_PENDING	= 3,
 	TRANSPORT_PROCESS_WRITE	= 4,
 	TRANSPORT_PROCESSING	= 5,
-	TRANSPORT_COMPLETE_OK	= 6,
-	TRANSPORT_COMPLETE_FAILURE = 7,
-	TRANSPORT_COMPLETE_TIMEOUT = 8,
+	TRANSPORT_COMPLETE	= 6,
 	TRANSPORT_PROCESS_TMR	= 9,
 	TRANSPORT_ISTATE_PROCESSING = 11,
 	TRANSPORT_NEW_CMD_MAP	= 16,
@@ -492,6 +490,8 @@
 	struct completion	transport_lun_stop_comp;
 	struct scatterlist	*t_tasks_sg_chained;
 
+	struct work_struct	work;
+
 	/*
 	 * Used for pre-registered fabric SGL passthrough WRITE and READ
 	 * with the special SCF_PASSTHROUGH_CONTIG_TO_SG case for TCM_Loop