Recently released PostgreSQL version 15 contains a new command:
MERGE
. We can use MERGE
to do “upsert” operations, much like when providing
an ON CONFLICT
clause to INSERT
, but MERGE
can be used for other tasks as
well. There’s an interesting difference when doing upserts between the two
commands, and that’s what we’re going to discuss here.
Using INSERT
… ON CONFLICT
Suppose we have a table defined as (note the unique email constraint):
CREATE TABLE people (
id bigserial PRIMARY KEY,
email citext,
name text,
UNIQUE(email)
)
After the table is created, we can run the follow INSERT
with ON CONFLICT
4
times. The first time will insert a new record, and the other times will update
the name
since the email
conflicts with the existing record:
INSERT INTO people (email, name)
VALUES ('alex@email.com', 'Alex')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
Then, we run the following query to insert another record:
INSERT INTO people (email, name)
VALUES ('billie@email.com', 'Billie')
ON CONFLICT (email) DO UPDATE SET name = EXCLUDED.name
Afterwards, we query the table for all of its records:
SELECT * FROM people
id | email | name
----+------------------+--------
1 | alex@email.com | Alex
5 | billie@email.com | Billie
As we can see, there’s a gap in the value of the IDs, and that’s because the
sequence used for the id
column is advanced with every INSERT
query even it
ends up doing an update via ON CONFLICT
. Because of that behavior, it’s
highly suggested to use bigint
/bigserial
for auto-incrementing columns when
you are going to upsert data into the table.
Using MERGE
Now, let’s see what happens if we use MERGE
instead. We’ll keep the table as
it was, and we’ll upsert that last record several times using a MERGE
query:
MERGE INTO people
USING (VALUES ('billie@email.com', 'Billie')) AS input(email, name)
ON people.email = input.email
WHEN NOT MATCHED THEN
INSERT (email, name) VALUES (input.email, input.name)
WHEN MATCHED THEN
UPDATE SET name = input.name
If we then use MERGE
to insert a new record:
MERGE INTO people
USING (VALUES ('charlie@email.com', 'Charlie')) AS input(email, name)
ON people.email = input.email
WHEN NOT MATCHED THEN
INSERT (email, name) VALUES (input.email, input.name)
WHEN MATCHED THEN
UPDATE SET name = input.name
And check all the records:
SELECT * FROM people
id | email | name
----+-------------------+---------
1 | alex@email.com | Alex
5 | billie@email.com | Billie
6 | charlie@email.com | Charlie
We now see that there’s no gap with the ID of the last two records, and that’s
because our sequence was not advanced when the MERGE
was used as an update
task. Granted, the MERGE
queries are more verbose than the INSERT
…ON
CONFLICT
equivalents, but I believe that verboseness also provides clarity
into the intentions of the query. I should also note that neither Ecto
nor ActiveRecord currently provide first class support for
MERGE
.