Incremental models & read_csv
dbt has the notion of “Incremental Materializations”: models that follow a different build flow and require a more explicit definition, but in return can be built incrementally instead of being rebuilt from scratch on every run. These models usually define a unique_key; if no key is provided, the model is treated as “append only”.
Incremental models must also define which part of the query applies only to incremental builds, typically a filter wrapped in an {% if is_incremental() %} block.
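When run as part of a normal dbt build or dbt run, an incremental model with a unique_key does roughly the following (this is the delete+insert approach; the exact behavior depends on the adapter and the incremental_strategy config):
- Build the new rows selected by the incremental filter into a temporary table.
- Delete any rows from the existing table whose unique_key matches a row in the temporary table.
- Insert the rows from the temporary table into the existing table.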
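The three steps above can be sketched in Python, with the standard-library sqlite3 module standing in for the warehouse. This is a minimal simulation under stated assumptions, not dbt's actual implementation: the helper incremental_run, the __tmp suffix for the temporary table, and the prices table are all hypothetical. Passing unique_key=None skips the delete step, which gives the append-only behavior described earlier.

```python
import sqlite3


def incremental_run(conn, target, increment_sql, unique_key=None):
    """Simulate one incremental run of the table `target` (hypothetical helper).

    increment_sql is a SELECT producing the new rows. Identifiers are
    interpolated directly, so this sketch assumes trusted table/column names.
    """
    tmp = f"{target}__tmp"  # hypothetical temp-table name
    # Step 1: materialize the increment into a temporary table.
    conn.execute(f"DROP TABLE IF EXISTS temp.{tmp}")
    conn.execute(f"CREATE TEMP TABLE {tmp} AS {increment_sql}")
    if unique_key is not None:
        # Step 2: delete existing rows whose key appears in the increment.
        conn.execute(
            f"DELETE FROM {target} WHERE {unique_key} IN "
            f"(SELECT {unique_key} FROM temp.{tmp})"
        )
    # Step 3: insert every row from the temp table into the target.
    conn.execute(f"INSERT INTO {target} SELECT * FROM temp.{tmp}")
    conn.commit()


def demo(unique_key):
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE prices (id TEXT, price REAL)")
    conn.executemany(
        "INSERT INTO prices VALUES (?, ?)",
        [("AAPL-a.csv", 1.0), ("MSFT-a.csv", 2.0)],
    )
    # The increment re-delivers one existing key and adds one new key.
    increment = (
        "SELECT 'AAPL-a.csv' AS id, 1.5 AS price "
        "UNION ALL SELECT 'GOOG-b.csv', 3.0"
    )
    incremental_run(conn, "prices", increment, unique_key=unique_key)
    return sorted(conn.execute("SELECT id, price FROM prices").fetchall())


print(demo("id"))   # keyed: the AAPL-a.csv row is replaced, not duplicated
print(demo(None))   # append-only: AAPL-a.csv now appears twice
```

The delete-then-insert order is what makes a keyed run idempotent for re-delivered rows, while the append-only variant simply accumulates them.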
This means that changes to the schema of your model need careful consideration: new columns generally mean that the model must be rebuilt entirely. A rebuild of the model is called a “full refresh” in dbt and can be invoked with the --full-refresh flag in the CLI.
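Why a schema change calls for a rebuild can be shown with the same kind of sqlite3 simulation. In this sketch (the helpers columns, full_refresh and incremental_insert, and the two model versions, are hypothetical), the incremental path only fills the columns the existing table already has, so a new volume column is silently dropped until a full refresh recreates the table from the current query. This roughly mirrors dbt's default on_schema_change = 'ignore' setting; dbt can also be configured to fail or to add new columns instead.

```python
import sqlite3

# Two versions of a hypothetical model: v2 adds a `volume` column.
MODEL_V1 = "SELECT 'AAPL-a.csv' AS id, 1.0 AS price"
MODEL_V2 = "SELECT 'AAPL-b.csv' AS id, 2.0 AS price, 100 AS volume"


def columns(conn, table):
    """Return the column names of `table` in declaration order."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]


def full_refresh(conn, target, model_sql):
    """Drop and recreate `target` from the model's current query."""
    conn.execute(f"DROP TABLE IF EXISTS {target}")
    conn.execute(f"CREATE TABLE {target} AS {model_sql}")
    conn.commit()


def incremental_insert(conn, target, model_sql):
    """Append rows, keeping only the columns the existing table already has."""
    cols = ", ".join(columns(conn, target))
    conn.execute(
        f"INSERT INTO {target} ({cols}) "
        f"SELECT {cols} FROM ({model_sql}) AS new_rows"
    )
    conn.commit()


def demo():
    conn = sqlite3.connect(":memory:")
    full_refresh(conn, "prices", MODEL_V1)        # first build
    incremental_insert(conn, "prices", MODEL_V2)  # model now has a new column
    after_incremental = columns(conn, "prices")
    full_refresh(conn, "prices", MODEL_V2)        # the equivalent of --full-refresh
    after_refresh = columns(conn, "prices")
    return after_incremental, after_refresh


print(demo())
```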
As set up by the pre_hook, the variable my_list holds the list of files to process, and the config block carries the relevant settings for the materialization type and unique key.
-- one output row per CSV line; filename = true adds a filename column to each row
select
    info.symbol || '-' || info.filename as id,
    info.*,
    files.modified_ts,
    now() at time zone 'UTC' as updated_ts
from read_csv(getvariable('my_list'), filename = true, union_by_name = true) as info
left join {{ ref("files") }} as files on info.filename = files.file
{% if is_incremental() %}
-- on incremental runs, skip any file whose rows are already in this model
where not exists (select 1 from {{ this }} ck where ck.filename = info.filename)
{% endif %}
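This also introduces {{ this }}, a dbt relation that refers to the current model, i.e. the table built by previous runs. Inside the is_incremental() block it lets the query skip any file whose rows are already in that table.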
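The effect of the is_incremental() block and {{ this }} can be simulated the same way. In the sketch below, render_model is a hypothetical stand-in for dbt's Jinja rendering, and the raw_csv table stands in for read_csv(getvariable('my_list'), filename = true, ...); the files join and the updated_ts column are left out. In dbt, is_incremental() is true only when the model's table already exists and the run is not a full refresh, so the first build loads everything and later builds skip files already present in {{ this }}.

```python
import sqlite3


def render_model(is_incremental, this):
    """Hypothetical stand-in for dbt rendering the model's Jinja template."""
    sql = (
        "SELECT info.symbol || '-' || info.filename AS id, info.* "
        "FROM raw_csv AS info"
    )
    if is_incremental:
        # Same anti-join as the model: skip files already loaded into {{ this }}.
        sql += (
            f" WHERE NOT EXISTS (SELECT 1 FROM {this} AS ck"
            " WHERE ck.filename = info.filename)"
        )
    return sql


def demo():
    conn = sqlite3.connect(":memory:")
    # raw_csv plays the role of read_csv(..., filename = true).
    conn.execute("CREATE TABLE raw_csv (symbol TEXT, price REAL, filename TEXT)")
    conn.executemany(
        "INSERT INTO raw_csv VALUES (?, ?, ?)",
        [("AAPL", 1.0, "day1.csv"), ("MSFT", 2.0, "day1.csv")],
    )
    # First build: the table does not exist yet, so the filter is not rendered.
    conn.execute(f"CREATE TABLE prices AS {render_model(False, 'prices')}")

    # A new file shows up next to the one already loaded.
    conn.execute("INSERT INTO raw_csv VALUES ('AAPL', 1.5, 'day2.csv')")
    new_rows = conn.execute(render_model(True, "prices")).fetchall()
    conn.executemany("INSERT INTO prices VALUES (?, ?, ?, ?)", new_rows)

    # Running again finds nothing new: every file is already in the table.
    rerun = conn.execute(render_model(True, "prices")).fetchall()
    total = conn.execute("SELECT COUNT(*) FROM prices").fetchone()[0]
    return new_rows, rerun, total


print(demo())
```

Filtering on filename rather than on individual rows keeps the check cheap, at the cost of never picking up changes to a file that has already been loaded under the same name.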