Basic concepts of data warehouse

Data Warehouse: A Data Warehouse is a data system used for storing, analyzing, and reporting. The purpose of a data warehouse is to build an integrated data environment for analysis, providing decision support for businesses.

The focus of a data warehouse is on analysis, and its results support decision-making in enterprises:

  • A data warehouse does not “produce” any data itself; it obtains data from various external systems.
  • Similarly, a data warehouse does not need to “consume” any data; its results are made available to external applications.
  • This is why it is called a “warehouse” rather than a “factory.”
  1. Theme Orientation: A theme is an abstract concept that represents the comprehensive classification, analysis, and utilization of data in enterprise information systems at a higher level. In logical terms, it corresponds to the analytical objects involved in a macro analysis domain within an enterprise.

  2. Integration: Data related to a theme is usually distributed across multiple operational systems, which are dispersed, independent, and heterogeneous. Therefore, before the data enters the data warehouse, it must undergo standardization and integration. This involves data extraction, cleansing, transformation, and aggregation. This step is the most critical and complex in data warehouse construction. The tasks to be completed include unifying all contradictions in the source data and performing data integration and calculations.

  3. Non-volatility and Non-volatility: A data warehouse is a platform for analyzing data, not for creating data. The data in the data warehouse reflects the content of historical data over a considerable period of time. While there are typically a large number of query operations in a data warehouse, modification and deletion operations are rare.

  4. Time-Variant: A data warehouse contains historical data at various levels of granularity, which may be associated with specific dates, weeks, months, quarters, or years. It loses its timeliness when business changes occur. Therefore, the data in the data warehouse needs to be updated over time to meet the needs of decision-making. From this perspective, data warehouse construction is not only a project but also a process.

Structured Query Language (SQL) is a database query and programming language used for accessing data, as well as querying, updating, and managing data.

Structured data, also known as row data, is data that is logically expressed and implemented using a two-dimensional table structure. It strictly follows data format and length specifications and is primarily stored and managed through relational databases.

SQL syntax can be primarily classified into two parts: Data Definition Language (DDL) and Data Manipulation Language (DML).

  • DDL:

    • CREATE DATABASE: Creates a new database.
    • CREATE TABLE: Creates a new table.
  • DML:

    • SELECT: Retrieves data from a database table.
    • UPDATE: Updates data in a database table.
    • DELETE: Deletes data from a database table.
    • INSERT: Inserts data into a database table.