As a follow-up to my previous post, Plan guides and the risk of validation with built-in functions, I wanted to discuss an idea I had while tuning the query mentioned in that post:
Is there a way to force a query to run in parallel?
As mentioned in the last post, I was tuning a stored procedure against a tight deadline, and I was very limited in the code changes I could make before it hit.
This procedure used a user-defined table type to pass 5,000 rows as a parameter to the procedure. A table-valued parameter is essentially a table variable, which means it has no statistics object. SQL Server uses statistics to estimate how many rows in a table will meet a given condition, whether filtering, joining, and so on, and the engine then uses that estimate to decide how to process the operation. Without statistics, the estimate is always 1 row. Because of this, Microsoft does not recommend table-valued parameters for more than about 1,000 rows.
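To make the setup concrete, here is a minimal sketch of a user-defined table type passed as a table-valued parameter. The type, table, and procedure names are hypothetical, not taken from the actual system being tuned:

```sql
-- Hypothetical sketch: a user-defined table type used as a
-- table-valued parameter (TVP). All object names are invented.
CREATE TYPE dbo.OrderIdList AS TABLE
(
    OrderId INT NOT NULL PRIMARY KEY
);
GO

CREATE PROCEDURE dbo.ProcessOrders
    @Orders dbo.OrderIdList READONLY  -- TVPs must be declared READONLY
AS
BEGIN
    -- @Orders has no statistics object, so the optimizer
    -- estimates 1 row regardless of how many rows were passed in.
    SELECT o.*
    FROM dbo.Orders AS o
    JOIN @Orders AS p ON p.OrderId = o.OrderId;
END;
```

With 5,000 rows in the parameter but an estimate of 1, the optimizer tends to pick join strategies and serial plans that suit a single-row input.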
Changing the stored procedure to read the data from a staging table instead of the table-valued parameter is probably the long-term fix, but that may require changes to the application, and time. I still needed to tune this stored procedure before the deadline hit.
If I had only been able to change the stored procedure (and not the application), I would have tried copying the data from the table-valued parameter into a #temp table to get the statistics I needed. The data would then have been stored in tempdb twice, but that would probably have been acceptable here.
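That workaround can be sketched as follows, reusing the hypothetical names from above. Unlike a table variable, a #temp table gets real statistics, so the optimizer sees the true row count:

```sql
-- Inside the procedure body: copy the TVP into a #temp table so the
-- optimizer gets statistics (at the cost of storing the data in
-- tempdb twice). Object names are hypothetical.
SELECT OrderId
INTO #Orders
FROM @Orders;

-- Optionally index the temp table if the join would benefit.
CREATE CLUSTERED INDEX cx_Orders ON #Orders (OrderId);

-- This join now gets a row estimate based on actual statistics,
-- which can unlock better join choices and a parallel plan.
SELECT o.*
FROM dbo.Orders AS o
JOIN #Orders AS p ON p.OrderId = o.OrderId;
```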
So in the previous post I set out a plan to get the code changes in place eventually and, in the meantime, to plug the gap with a plan guide that told the SQL engine exactly how to process the joins. This worked and delivered the extra performance I needed, but there seemed to be room for more.
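For illustration, a plan guide that forces a join strategy on a statement inside a stored procedure looks roughly like this. The object names, the statement text, and the choice of HASH JOIN are all hypothetical; the real hint was chosen from the #temp table experiment described below:

```sql
-- Hypothetical plan guide attaching a join hint to one statement
-- in dbo.ProcessOrders. With @type = N'OBJECT', @params must be NULL
-- and @stmt must match the statement in the module exactly.
EXEC sp_create_plan_guide
    @name            = N'pg_ProcessOrders_HashJoin',
    @stmt            = N'SELECT o.*
    FROM dbo.Orders AS o
    JOIN @Orders AS p ON p.OrderId = o.OrderId;',
    @type            = N'OBJECT',
    @module_or_batch = N'dbo.ProcessOrders',
    @params          = NULL,
    @hints           = N'OPTION (HASH JOIN)';
```

The appeal of a plan guide here is that it changes optimizer behaviour without touching the procedure text, which mattered given how little code I was allowed to change.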
When experimenting with the code, I substituted a #temp table for the UDT table to see how the execution plan looked when statistics were present. This gave me the information I needed to make an informed decision on the JOIN hint I was going to use in the plan guide. I also noticed that with statistics present the execution plan was a parallel plan: thanks to the #temp table's statistics, the optimizer knew there were enough rows to warrant the extra overhead of splitting the work into segments and distributing it across the available processors.
Now I was able to tell the engine how I wanted it to join the data, but I also wanted to tell it to do so as a parallel operation. This is when the idea mentioned at the top occurred to me: I was aware of the MAXDOP (maximum degree of parallelism) query hint, but was there a MINDOP hint?
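For context, MAXDOP only works in one direction: it caps how many schedulers a query may use, but it cannot force the optimizer to produce a parallel plan in the first place. A hypothetical example:

```sql
-- MAXDOP sets an upper bound on parallelism for a single query;
-- there is no built-in hint to set a lower bound.
-- Table and column names are hypothetical.
SELECT o.OrderId, o.Total
FROM dbo.Orders AS o
OPTION (MAXDOP 4);  -- use at most 4 schedulers for this query
```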
There isn’t, but I did find a Connect item opened by Paul White asking for exactly such a feature. He only beat me to it by about four years.
In the comments of that Connect item, Microsoft mentions that it is possible to force parallelism with trace flag 8649. I tested passing this trace flag to my stored procedure, and it gave me the parallel plan and the performance I wanted. Normally I would have been showing it off and bragging about the cool feature I had found that was going to save the day, but… I don’t like trace flags, and this one is undocumented and unsupported. Not the sort of thing I want to be throwing into production.
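The usual way to scope a trace flag to a single query is the QUERYTRACEON hint, which is how I tested it; something along these lines, with hypothetical object names:

```sql
-- Undocumented and unsupported: trace flag 8649 effectively removes
-- the cost penalty for parallelism, so the optimizer produces a
-- parallel plan even when its row estimates suggest a serial one.
SELECT o.*
FROM dbo.Orders AS o
JOIN @Orders AS p ON p.OrderId = o.OrderId
OPTION (QUERYTRACEON 8649);
```

Note that QUERYTRACEON normally requires sysadmin-level permissions, which is another reason it is awkward to bake into a production procedure.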
So I lived with it. The plan guide got the procedure performing well enough, but it would have been nice to go further with a documented and supported MINDOP hint.
If you think a MINDOP hint would be useful, give Paul’s Connect item a vote. Also be sure to read this excellent article by Paul, which I found only after reading the Connect item; it goes into much more detail on how you could find yourself needing something like a MINDOP hint.