Data warehousing is the process of collecting data from different sources and integrating it into one comprehensive database. It is used mainly for data analysis and for running queries over the collected data.
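As a minimal sketch of this idea, the snippet below loads records from two sources into one shared table and then runs a single analytical query over both. The source names (`crm_rows`, `shop_rows`), the `sales` table, and its columns are hypothetical and chosen only for illustration; an in-memory SQLite database stands in for the warehouse.

```python
import sqlite3

# Two hypothetical source systems, each with its own record layout.
crm_rows = [("alice", 120.0), ("bob", 75.5)]
shop_rows = [
    {"customer": "alice", "total": 30.0},
    {"customer": "carol", "total": 12.25},
]

# An in-memory SQLite database stands in for the warehouse.
warehouse = sqlite3.connect(":memory:")
warehouse.execute("CREATE TABLE sales (source TEXT, customer TEXT, amount REAL)")

# Integrate: map each source's layout onto the shared warehouse schema.
warehouse.executemany("INSERT INTO sales VALUES ('crm', ?, ?)", crm_rows)
warehouse.executemany(
    "INSERT INTO sales VALUES ('shop', :customer, :total)", shop_rows
)

# Analyse: one query now spans data that originated in two systems.
totals = dict(
    warehouse.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()
)
print(totals)
```

Once the data shares one schema, the analysis no longer needs to know which system a row came from, although the `source` column keeps that information available.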
The architecture of a data warehouse is as follows:
Data warehousing does not follow a single fixed design. It passes through certain stages that help keep the warehouse well maintained.
The general stages of data warehousing are as follows:
Offline database
Offline data warehouse
Real-time data warehouse
Integrated data warehouse
In the offline database stage, data is copied from the day-to-day operational systems to an external server for backup. Because the copy lives on a separate server, work done on it, such as loading and reporting, does not disturb the ongoing operational processes.
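The offline database stage can be sketched with the backup method of Python's `sqlite3` module. This is only an illustration under simplifying assumptions: both databases are in-memory, the second connection stands in for the external server, and the `orders` table is hypothetical.

```python
import sqlite3

# Hypothetical operational database with a few rows.
operational = sqlite3.connect(":memory:")
operational.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")
operational.executemany(
    "INSERT INTO orders (item) VALUES (?)", [("book",), ("pen",)]
)
operational.commit()

# Stand-in for the external backup server (assumption: in-memory here).
offline_copy = sqlite3.connect(":memory:")
operational.backup(offline_copy)  # copy the whole database as a snapshot

# The operational system keeps working; the snapshot is unaffected.
operational.execute("INSERT INTO orders (item) VALUES ('lamp')")
operational.commit()

live_count = operational.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
copy_count = offline_copy.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(live_count, copy_count)
```

Reports can then run against `offline_copy` without placing any load on `operational`, at the cost of the copy going stale as soon as new transactions arrive.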
In the offline data warehouse stage, the warehouse is refreshed from the operational database on a regular schedule (weekly, monthly, etc.). Between refreshes, its data is not guaranteed to be up to date.
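A scheduled refresh of this kind might look like the sketch below. The function `refresh_warehouse` and the `orders` table are hypothetical, and the sketch assumes rows are only ever appended with increasing ids, so updates and deletes in the operational database are not handled.

```python
import sqlite3

def refresh_warehouse(operational, warehouse):
    """Copy rows added since the last refresh; return how many were copied."""
    last_id = warehouse.execute(
        "SELECT COALESCE(MAX(id), 0) FROM orders"
    ).fetchone()[0]
    new_rows = operational.execute(
        "SELECT id, item FROM orders WHERE id > ?", (last_id,)
    ).fetchall()
    warehouse.executemany("INSERT INTO orders (id, item) VALUES (?, ?)", new_rows)
    warehouse.commit()
    return len(new_rows)

operational = sqlite3.connect(":memory:")
warehouse = sqlite3.connect(":memory:")
for db in (operational, warehouse):
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, item TEXT)")

operational.executemany(
    "INSERT INTO orders (item) VALUES (?)", [("book",), ("pen",)]
)
first_batch = refresh_warehouse(operational, warehouse)  # scheduled run 1

# A new transaction arrives between scheduled runs: the warehouse is stale.
operational.execute("INSERT INTO orders (item) VALUES ('lamp')")
stale_count = warehouse.execute("SELECT COUNT(*) FROM orders").fetchone()[0]

second_batch = refresh_warehouse(operational, warehouse)  # scheduled run 2
fresh_count = warehouse.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first_batch, stale_count, second_batch, fresh_count)
```

In practice the two calls would be issued by a scheduler rather than inline; the gap between them is exactly the window in which the warehouse lags behind the operational data.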
In the real-time data warehouse stage, the warehouse is updated every time a transaction happens in the operational database. Event-based triggers capture each change and notify the warehouse to update its records. An example is an airline ticket reservation, where each booking is reflected in the warehouse as soon as it is made.
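The event-based trigger described here can be illustrated with an SQLite `AFTER INSERT` trigger. As a simplifying assumption, the operational table and the warehouse table live in the same in-memory database; in a real deployment the warehouse would usually be a separate system fed by a change notification. The table, column, and trigger names are hypothetical.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript(
    """
    -- Operational table: written by the reservation system.
    CREATE TABLE reservations (
        id INTEGER PRIMARY KEY, passenger TEXT, flight TEXT
    );
    -- Warehouse table: kept in sync by the trigger below.
    CREATE TABLE warehouse_reservations (
        reservation_id INTEGER, passenger TEXT, flight TEXT
    );
    -- Fires once per inserted reservation and copies it to the warehouse.
    CREATE TRIGGER push_reservation AFTER INSERT ON reservations
    BEGIN
        INSERT INTO warehouse_reservations
        VALUES (NEW.id, NEW.passenger, NEW.flight);
    END;
    """
)

# A single airline reservation is made in the operational table...
db.execute("INSERT INTO reservations (passenger, flight) VALUES ('alice', 'XY123')")

# ...and is immediately visible in the warehouse table.
rows = db.execute("SELECT * FROM warehouse_reservations").fetchall()
print(rows)
```

No scheduled job is involved: the insert itself is the event that updates the warehouse.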
In the integrated data warehouse stage, the warehouse is updated continuously as the operational systems perform transactions. It also passes data back to the operational systems, so they always work with the latest data and data collection is not disrupted. Data at this stage is the most secure and up to date, so this stage is considered the most reliable.
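The two-way flow of the integrated stage can be sketched in plain Python. The classes `IntegratedWarehouse` and `OperationalSystem`, and their methods, are hypothetical names used only to show the direction of the data: transactions flow into the warehouse, and consolidated figures flow back to every operational system.

```python
from collections import defaultdict

class IntegratedWarehouse:
    """Hypothetical warehouse that pushes consolidated data back out."""

    def __init__(self):
        self.totals = defaultdict(int)  # seats sold per flight
        self.subscribers = []           # operational systems to notify

    def record(self, flight, seats):
        # Update the warehouse on every transaction...
        self.totals[flight] += seats
        # ...then pass the latest figure back to all operational systems.
        for system in self.subscribers:
            system.receive(flight, self.totals[flight])

class OperationalSystem:
    """Hypothetical booking channel, e.g. a web site or a ticket desk."""

    def __init__(self, warehouse):
        self.warehouse = warehouse
        self.seats_sold = {}  # latest figures passed back by the warehouse
        warehouse.subscribers.append(self)

    def book(self, flight, seats):
        self.warehouse.record(flight, seats)

    def receive(self, flight, total):
        self.seats_sold[flight] = total

warehouse = IntegratedWarehouse()
web = OperationalSystem(warehouse)
desk = OperationalSystem(warehouse)

web.book("XY123", 2)
desk.book("XY123", 3)
print(web.seats_sold, desk.seats_sold)
```

Each channel sees the combined total across both channels, which neither could compute from its own transactions alone.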
Data warehousing provides speed and power for accessing data, which yields higher query performance without compromising data quality or security. It also gives corporate decision-makers a competitive edge in their market by informing their business strategies.