Loading Methods: Scheduled vs. On-Demand
Learn how to choose between the two loading strategies: scheduled and on-demand loading.
Besides deciding how to host and deploy the repository, we should also consider how data will be loaded into it. More specifically, what should trigger the loading of data into the repository?
There are two main strategies for loading data into a repository:
Scheduled loading
On-demand loading
Scheduled loading
Scheduled loading is a data loading strategy in which data is transferred from its source to the destination repository at a predetermined interval. This interval can be hourly, daily, weekly, monthly, or even yearly, depending on the needs of the business. This approach is often used when there’s a need to maintain a consistent and up-to-date view of the data in the data warehouse or data lake.
With this approach, our ETL pipeline loads data predictably, according to a schedule, so we don’t need to implement custom logic to decide when the pipeline should run. The downside is that it can use more resources than necessary: the pipeline runs at every interval, whether or not enough new data has arrived to justify a run.
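A scheduled loader can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed implementation: `run_on_schedule` and the `job` callable (standing in for one full ETL run) are hypothetical names. The job runs on every tick, even when no new data has arrived, which is where the extra resource usage comes from.

```python
import time
from typing import Callable


def run_on_schedule(
    job: Callable[[], None],
    interval_seconds: float,
    max_runs: int,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Run `job` repeatedly, waiting `interval_seconds` between runs.

    The job runs on every tick, even if there is no new data to load.
    `sleep` is injectable so the loop can be exercised without waiting.
    Returns the number of runs performed.
    """
    runs = 0
    while runs < max_runs:
        job()  # hypothetical ETL run: extract, transform, load
        runs += 1
        if runs < max_runs:
            sleep(interval_seconds)  # wait before the next scheduled run
    return runs


if __name__ == "__main__":
    FOUR_HOURS = 4 * 60 * 60
    # Demo only: a no-op sleep so the example finishes instantly.
    run_on_schedule(lambda: print("ETL run"), FOUR_HOURS, max_runs=3, sleep=lambda s: None)
```

Sleeping for a fixed amount after each run lets start times drift by however long the job takes. In practice, scheduling is usually handed to a tool such as cron or a workflow orchestrator, which anchors runs to wall-clock times.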
For example, a company’s sales department requests that its analytical dashboard be updated with fresh data every four hours. The problem is that the company’s data warehouse is updated once daily.
To meet their needs, we can create an ETL pipeline that loads the latest batch of data relevant only to their department and stores it in a custom data mart designed to serve the sales department. The dashboard then pulls data from the data mart and displays the most recent data.
The pipeline will run every four hours, independent of the main ETL pipeline that loads data to the company’s data warehouse.
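The sales example can be sketched as an incremental load in Python, using the standard library’s `sqlite3` module as a stand-in for both the warehouse source and the data mart. The table names (`source_events`, `sales_mart`), their columns, the sample rows, and the function names are all hypothetical. Each run copies only sales rows newer than the last loaded timestamp and returns a new high-water mark; an external scheduler would call `load_sales_batch` once every `SALES_PIPELINE_INTERVAL`.

```python
import sqlite3
from datetime import timedelta

# Hypothetical run interval for the sales pipeline, independent of the
# warehouse's daily load. A scheduler (not shown) would call
# load_sales_batch() once per interval.
SALES_PIPELINE_INTERVAL = timedelta(hours=4)


def create_demo_db() -> sqlite3.Connection:
    """Build an in-memory database with a few illustrative source rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE source_events "
        "(id INTEGER, department TEXT, amount REAL, created_at TEXT)"
    )
    conn.execute("CREATE TABLE sales_mart (id INTEGER, amount REAL, created_at TEXT)")
    conn.executemany(
        "INSERT INTO source_events VALUES (?, ?, ?, ?)",
        [
            (1, "sales", 120.0, "2024-01-01T08:00:00"),
            (2, "hr", 0.0, "2024-01-01T09:00:00"),
            (3, "sales", 75.5, "2024-01-01T11:30:00"),
        ],
    )
    conn.commit()
    return conn


def load_sales_batch(conn: sqlite3.Connection, last_loaded_at: str) -> str:
    """Copy sales rows newer than `last_loaded_at` into the data mart.

    Returns the new high-water mark so the next run only picks up rows
    that arrived after this one.
    """
    rows = conn.execute(
        "SELECT id, amount, created_at FROM source_events "
        "WHERE department = 'sales' AND created_at > ? "
        "ORDER BY created_at",
        (last_loaded_at,),
    ).fetchall()
    conn.executemany(
        "INSERT INTO sales_mart (id, amount, created_at) VALUES (?, ?, ?)", rows
    )
    conn.commit()
    # Keep the old watermark if nothing new arrived.
    return rows[-1][2] if rows else last_loaded_at


if __name__ == "__main__":
    conn = create_demo_db()
    watermark = load_sales_batch(conn, "2024-01-01T00:00:00")
    print("loaded up to", watermark)
```

Comparing the timestamp strings works here only because every value uses the same ISO 8601 format and time zone, so lexicographic order matches chronological order. The sketch also assumes rows never arrive with a timestamp older than the current high-water mark; late-arriving data would need a different strategy.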
On-demand loading
On-demand loading is a method in which data is loaded and processed only when needed, rather than in advance. This approach is designed to save time and resources by avoiding unnecessary processing, and it can be especially useful for organizations that don’t require a consistent and up-to-date view of the data.
On-demand loading offers several benefits in ETL pipelines. By using loading logic other than a simple time interval, we can choose and tune the metrics that decide when data gets loaded. For example, we can implement an ETL pipeline that loads data only after a certain number of events have been collected, after the data reaches a certain size, or after a certain number of transactions have accumulated, and fine-tune those thresholds to our needs.
This can reduce the amount of data that needs to be processed and stored, improve performance, and lower the risk of data loss, corruption, and inconsistent data.
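The threshold idea can be sketched as a small buffer in Python. This is a minimal sketch, not a prescribed implementation: `ThresholdLoader`, `max_records`, and `max_bytes` are hypothetical names, and `load` stands in for whatever writes a batch to the repository. Records accumulate until either the record count or the total size crosses its threshold, and then the whole buffer is loaded as one batch.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ThresholdLoader:
    """Buffer incoming records and load them only when a threshold is hit.

    `max_records` and `max_bytes` are hypothetical tuning knobs; whichever
    threshold is reached first triggers a load of the whole buffer.
    """

    load: Callable[[List[bytes]], None]
    max_records: int = 1000
    max_bytes: int = 1_000_000
    _buffer: List[bytes] = field(default_factory=list, init=False, repr=False)
    _buffered_bytes: int = field(default=0, init=False, repr=False)

    def add(self, record: bytes) -> bool:
        """Add one record; return True if this call triggered a load."""
        self._buffer.append(record)
        self._buffered_bytes += len(record)
        if len(self._buffer) >= self.max_records or self._buffered_bytes >= self.max_bytes:
            self.flush()
            return True
        return False

    def flush(self) -> None:
        """Load whatever is buffered (if anything) and reset the counters."""
        if self._buffer:
            self.load(self._buffer)
            self._buffer = []  # start a fresh list; the loaded batch is left untouched
            self._buffered_bytes = 0


if __name__ == "__main__":
    loader = ThresholdLoader(load=lambda batch: print(f"loading {len(batch)} records"), max_records=2)
    for txn in [b"txn-1", b"txn-2", b"txn-3"]:
        loader.add(txn)
    loader.flush()  # load the remainder, e.g. on shutdown
```

Because this buffer lives in memory, unflushed records would be lost if the process crashed; a production pipeline would typically buffer in durable storage or a message queue instead.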
On-demand loading can come in various forms:
Trigger-based loading: A user manually triggers the ETL pipeline or its loading stage
Measure-based loading: The process is triggered when a measure, such as data size, crosses a threshold
Event-based loading: The process is triggered by the occurrence of a new event, such as a new transaction or order
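Two of these forms can be illustrated with a short Python sketch. The class `OnDemandPipeline`, its `receive` and `trigger` methods, the `watched_event_types` parameter, and the event dictionaries are hypothetical, and `load` stands in for the real loading stage. An incoming event whose type is being watched (for example, a new order) loads all pending data immediately, while `trigger()` lets a user start the load by hand. Measure-based loading follows the same shape, with a size or count check in place of the event-type check.

```python
from typing import Callable, Dict, Iterable, List

Event = Dict[str, object]


class OnDemandPipeline:
    """Run the load step on a manual trigger or when a watched event arrives."""

    def __init__(self, load: Callable[[List[Event]], None], watched_event_types: Iterable[str]) -> None:
        self._load = load  # hypothetical loading stage
        self._watched = set(watched_event_types)
        self.pending: List[Event] = []

    def receive(self, event: Event) -> bool:
        """Event-based: queue the event; a watched type triggers a load right away.

        Returns True if this event caused a load.
        """
        self.pending.append(event)
        if event["type"] in self._watched:
            self._run()
            return True
        return False

    def trigger(self) -> None:
        """Trigger-based: a user explicitly starts the load, e.g. from a CLI or UI."""
        self._run()

    def _run(self) -> None:
        # Load everything collected so far, then start a new pending list.
        if self.pending:
            self._load(self.pending)
            self.pending = []


if __name__ == "__main__":
    pipeline = OnDemandPipeline(load=lambda batch: print("loading", batch), watched_event_types={"order"})
    pipeline.receive({"type": "page_view", "id": 1})  # queued, no load yet
    pipeline.receive({"type": "order", "id": 2})      # new order triggers a load
    pipeline.receive({"type": "page_view", "id": 3})
    pipeline.trigger()                                # manual load of the remainder
```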