Inserting, updating, or both?

The UPSERT statement is a way of performing an UPDATE over an INSERT, allowing an INSERT statement to be automatically rewritten as an UPDATE statement under certain conditions. In particular, if the INSERT statement cannot be performed due to a conflict between data that already exists and data that is going to be inserted, the UPSERT statement enables you to override this conflict.

This technique is similar to what many Object Relational Mappers (ORMs) do when a generic persistence action is performed. The ORM examines the object it has to persist (the data), and if it finds out that an object has never been seen on the database, it executes an INSERT; otherwise, it performs an UPDATE. In this way, the client code (and therefore the developer) never has to consider which database actions have to be performed to persist the data. This is also the case when using the UPSERT statement.

It is important to note that in order for a database to decide if a regular INSERT statement can proceed, or if it has to be converted into an UPDATE, a conflict must exist—otherwise there is no chance of the database understanding if the data already exists. Typically, this conflict happens against a primary key or a unique constraint. This is, effectively, what many ORMs check: if an object already has a primary key, an UPDATE is required. Otherwise, a normal INSERT statement can be issued instead.

In order to better understand how an UPSERT statement works, insert the following data into the files table, as follows:

testdb=> INSERT INTO files( f_name, f_hash )
VALUES ( 'chapter1.org', 'f029d04a81c322f158c608596951c105' )
     , ( 'chapter2.org', '14b8f225d4e6462022657d7285bb77ba' );
Listing 1: Inserting two entries in the file table

Let's not say that another INSERT statement for the same file content has been issued. In this case, the database has to reject the incoming data because of the unique constraint defined over the f_hash column, as shown in the following listing:

testdb=> INSERT INTO files( f_name, f_hash ) 
VALUES ( 'chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba' );

ERROR: duplicate key value violates unique constraint "files_f_hash_key" DETAIL: Key (f_hash)=(14b8f225d4e6462022657d7285bb77ba) already exists.
Listing 2: Unique constraints in action

In order to allow the database to accept the incoming data and perform an UPDATE to resolve the conflict, we need to specify the ON CONFLICT predicate, as shown in the following:

testdb=> INSERT INTO files( f_name, f_hash ) 
VALUES ( 'chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba' ) ON CONFLICT ( f_hash ) DO UPDATE SET f_name = EXCLUDED.f_name;
Listing 3: UPSERT in a ction

When the query shown in the preceding Listing 3 is executed, the system finds a conflict over the unique constraint for the f_hash column. This time, however, there is an ON CONFLICT resolution strategy. The ON CONFLICT (f_hash) DO UPDATE predicate indicates that if a conflict arises from the f_hash column, the database should perform an UPDATE instead of INSERT. In the UPDATE part of the statement, the special alias EXCLUDED represents the tuple that has been rejected due to the conflict. In other words, in the code of Listing 3, the database will either perform a regular INSERT statement if the data is not already present, or it will update the f_name of the conflicting tuple.

It is worth noting that the UPDATE predicate can also include a WHERE condition in order to better filter the update to be performed. Since the excluded tuple is aliased as EXCLUDED, the current tuple must be referenced with the table name, as we can see in the following listing. In this case, only tuples with the f_type org can be updated when a conflicting INSERT statement is issued. Other types of files, such as f_type, will simply fail if a unique constraint violation occurs:

testdb=> INSERT INTO files( f_name, f_hash ) 
VALUES ( 'chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba' ) ON CONFLICT ( f_hash ) DO UPDATE SET f_name = EXCLUDED.f_name WHERE files.f_type = 'org';
Listing 4:  Filtering the update within an UPSERT

This is not the only strategy available, of course; it is also possible to simply do nothing at all. Note that doing nothing is different from the default behavior, which is failing on conflicts. When an INSERT statement is instructed to do nothing upon conflict, data is gracefully rejected, no error is reported, and our transaction will not abort. Everything works as if we had never executed the INSERT statement.

As shown in the following listing, all we need to do is to specify DO NOTHING within the ON CONFLICT part of the statement to ensure the conflicting tuples are gracefully thrown away:

testdb=> INSERT INTO files( f_name, f_hash )
VALUES ( 'chapter2-reviewed.org', '14b8f225d4e6462022657d7285bb77ba' ) ON CONFLICT ( f_hash ) DO NOTHING;
Listing 5:  Rejecting conflicting data without aborting the current transaction