Broken Update With Join

Last month I wrote a post titled Broken Left Join which covered an easy to make mistake that I see fairly regularly when reviewing code. This month I’m covering UPDATE statements that run into trouble because of JOINs.

Updates with joins are not covered by the ANSI standard. To see the problems that can occur with joins in update statements, let’s setup some data to run a test.

IF OBJECT_ID(N'dbo.Child', N'U') IS NOT NULL
DROP TABLE dbo.Child;
GO
IF OBJECT_ID(N'dbo.Parent', N'U') IS NOT NULL
DROP TABLE dbo.Parent;
GO
CREATE TABLE dbo.Parent
(
 ParentID INT NOT NULL IDENTITY(1,1),
 Name VARCHAR(100) NOT NULL,
 FavoriteChild VARCHAR(100) NULL
);
GO
ALTER TABLE dbo.Parent ADD CONSTRAINT PK_ParentID PRIMARY KEY (ParentID);
GO
CREATE TABLE dbo.Child
(
 ChildID INT NOT NULL IDENTITY(1,1),
 ParentID INT NOT NULL,
 Name VARCHAR(100),
 DateOfBirth DATE 
);
GO
ALTER TABLE dbo.Child ADD CONSTRAINT PK_ChildID PRIMARY KEY (ChildID);
GO
ALTER TABLE dbo.Child ADD CONSTRAINT 
 FK_Parent_ParentID FOREIGN KEY (ParentID) 
 REFERENCES dbo.Parent(ParentID);
GO
CREATE NONCLUSTERED INDEX IX_ParentID ON dbo.Child(ParentID, Name);
GO
INSERT INTO dbo.Parent (Name)
VALUES ('Homer'),('Anakin'),('Peter');
INSERT INTO dbo.Child (ParentID, Name, DateOfBirth)
VALUES (1,'Bart', '19900121'),(1,'Lisa', '19920503'),(1,'Maggie', '19950717'),
 (2,'Luke', '19700101'),(2,'Leia', '19720518'),
 (3,'Meg', '20001023'),(3,'Chris', '19980912'),(3,'Stewie', '20061217');
GO

I’ve created two tables named Parent and Child with a foreign key in the Child table that references that ParentID in the Parent table. Selecting from these tables produces the data below.

Parnet Child Hierarchy - Initial values

Next I’ll execute an UPDATE to set each parents favorite child. I’ll do this by joining to the child table in the update statement.

When updating with a join it is only possible to update values in one of the tables. If you try to update more than one, you will see the following:

Update join updating multiple tables throws error

The statement below runs fine as it’s only updating values in one of the tables.

UPDATE p
SET p.FavoriteChild = c.Name
FROM Parent p
JOIN Child c ON p.ParentID = c.ParentID;

Now which child do you think will end up being the favorite for each parent?

To give you a clue, I’ll show you the execution plan.

Execution plan with any aggregate

 

What’s that aggregate operator doing there? We’re not aggregating any values!

Let’s take a closer look at the Defined Values property in the properties panel for that stream aggregate.

Any scalar operator defined value

That’s right; SQL Server will just pick any child. It will not update the parent row for each child. When the MERGE statement runs into this problem, it raises an error and rolls back the UPDATE but our query will silently pick any value and move on. We don’t want to update data like this, we want our query to dictate the logic explicitly. This will require some changes to our query.

I prefer to write these types of updates with the use of a CTE. This allows us to easily highlight the SELECT in the CTE and execute it to see which rows will be updated (@variables can cause problems here though). Adding a COUNT can help to identify the problem in the previous query.

Highlight and execute the code inside the CTE below to check for duplicates. If this returns 0 rows then you are good to remove the COUNT, GROUP BY and HAVING then add the name columns before executing the whole statement.

WITH cte_UpdateJoin
AS
(
 SELECT 
   p.ParentID,
   COUNT(*) AS UpdateCheck
 FROM Parent p
 JOIN Child c ON p.ParentID = c.ParentID
 GROUP BY p.ParentID
 HAVING COUNT(*) > 1
)
UPDATE cte_UpdateJoin 
SET FavoriteChild = Name;

If the code in the CTE returned rows (like ours did) then we have a problem.

Update duplicate check results

If we have multiple child rows for a parent row, we need to filter all but one row or aggregate these child rows so that we only have one value to apply in the UPDATE. In this example, I’m going to make a sweeping generalisation that all parents prefer their first born. While this isn’t always true it does give us some logic that we can apply to our UPDATE statement.

WITH cte_UpdateJoin
AS
(
 SELECT 
   p.FavoriteChild, 
   c.Name, 
   DateOfBirth,
   MIN(DateOfBirth) OVER (PARTITION BY p.ParentID) AS MinDOB
 FROM Parent p
 JOIN Child c ON p.ParentID = c.ParentID
)
UPDATE cte_UpdateJoin 
SET FavoriteChild = Name
WHERE DateOfBirth = MinDOB;

The main advantage of using a CTE here is that we can use window functions in the CTE. These are not allowed in the SET clause. The window function in the CTE returns the earliest date of birth of all children for each parent. My WHERE clause then makes sure that I only use the name of the first born child.

Update via CTE after duplicate check results

This is a contrived example to illustrate the problem (twins would cause this to break) but I hope this highlights something you need to check when writing or reviewing UPDATE statements with JOINs.

SQL Server Update with JOIN
Ted, my first born

About the author

James Anderson

I'm a freelance SQL Server DBA in Southampton, UK. Performance tuning code and improving development processes are what I love to do for work. Outside of work, I live to travel to as many places as possible with my wife Sarah.

Sign Up

James Anderson SQL Server DBA

Hi, I'm James Anderson. When I'm not blogging about SQL Server, I do this.

The SQL People

DBA Stackexchange Profile