Data Analysis

Types of data analysis

  • Batch processing: analyzing historical data that already exists. Processing runs in large batches over long time spans (for example, one analysis per day or one per week).
  • Real-time processing (streaming): analyzing data as it is generated, focusing on the present moment, at a granularity of milliseconds or even microseconds.
  • Predictive analytics (machine learning): predicting future events from historical and real-time data, focusing on the application of mathematical algorithms such as classification, clustering, correlation, and prediction.

Data analysis workflow

  • Ask a question
  • Obtain data: collecting data at its source, then transferring and moving it (business data, log data, crawler data, open internet data)
  • Data processing: data cleaning, data transformation, data extraction, and data calculation, producing clean and structured data
  • Data analysis: for example, PEST analysis (Political, Economic, Social, Technological)
  • Data presentation: data visualization
  • Report writing
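
The data processing step (cleaning, transformation, extraction, calculation) can be illustrated with a minimal sketch in plain Python. The records, field names, and function names below are all hypothetical, invented only for illustration; a real pipeline would likely use a dedicated tool rather than hand-written loops.

```python
from collections import defaultdict

# Hypothetical raw log records; fields and values are invented for illustration.
RAW_RECORDS = [
    {"user": " alice ", "amount": "12.50", "day": "2024-01-01"},
    {"user": "bob", "amount": "n/a", "day": "2024-01-01"},   # unparseable amount
    {"user": "alice", "amount": "7.50", "day": "2024-01-02"},
    {"user": "", "amount": "3.00", "day": "2024-01-02"},     # missing user
]


def clean(records):
    """Data cleaning: drop records with a missing user or an unparseable amount."""
    cleaned = []
    for record in records:
        user = record["user"].strip()
        try:
            amount = float(record["amount"])
        except ValueError:
            continue  # skip records whose amount is not a number
        if user:
            cleaned.append({"user": user, "amount": amount, "day": record["day"]})
    return cleaned


def transform(records):
    """Data transformation: add the amount as integer cents."""
    return [{**r, "cents": round(r["amount"] * 100)} for r in records]


def extract(records):
    """Data extraction: keep only the fields the calculation needs."""
    return [(r["user"], r["cents"]) for r in records]


def calculate(pairs):
    """Data calculation: total cents per user."""
    totals = defaultdict(int)
    for user, cents in pairs:
        totals[user] += cents
    return dict(totals)


def process(records):
    """Run the four processing stages in order."""
    return calculate(extract(transform(clean(records))))


if __name__ == "__main__":
    print(process(RAW_RECORDS))
```

Keeping each stage as a separate function makes it easy to inspect the intermediate data, which is where most cleaning problems show up.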

Big data is a collection of data that cannot be captured, managed, and processed within an acceptable timeframe using conventional software. It is a massive, high-growth, and diverse information asset that requires a new processing model in order to deliver stronger decision-making, insight discovery, and process optimization capabilities.

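  • Volume: very large amounts of data
  • Variety: diverse data types and sources
  • Value: low value density
  • Velocity: data is generated and processed at high speed
  • Veracity: high authenticity

Distributed versus cluster deployment

  • Distributed: multiple machines, each deploying different components (distributed storage, distributed computing)
  • Cluster: multiple machines, each deploying the same components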
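
As a rough illustration of the idea behind distributed computing, the sketch below splits a job into slices, lets several workers process them independently, and combines the partial results. It is only an analogy: threads stand in for machines, and the function names (`partition`, `partial_sum`, `distributed_sum`) are hypothetical. The workers all run the same code, as in a cluster, while the coordinator that splits and combines plays a different role.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(items, parts):
    """Split items into `parts` slices by taking every parts-th element."""
    return [items[i::parts] for i in range(parts)]


def partial_sum(chunk):
    """Work done independently by one worker on its own slice."""
    return sum(chunk)


def distributed_sum(items, workers=3):
    """Fan the slices out to workers, then combine the partial results."""
    chunks = partition(items, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    return sum(partials)


if __name__ == "__main__":
    print(distributed_sum(list(range(1, 101))))
```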

Data processing architecture

  • Transaction processing (OLTP) handles small amounts of data per operation with fast response times.
  • Analytical processing (OLAP) handles large volumes of data with slower response times.
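  • Lambda: two systems (a batch layer and a real-time layer) that together provide both low latency and accurate results. The results are good, but maintaining two systems makes it hard to iterate.
  • Kappa: a single system that provides both low latency and accurate results. It relies on a new-generation stream processor such as Flink.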
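
The difference between the Lambda and Kappa approaches can be sketched with a toy page-view counter. This is a simplified illustration under stated assumptions, not how Flink or any real system is implemented: the event log, the cutoff, and all function names are hypothetical.

```python
from collections import Counter

# Hypothetical event log of (timestamp, page) pairs, invented for illustration.
EVENTS = [(1, "home"), (2, "cart"), (3, "home"), (4, "home"), (5, "cart")]


def batch_view(events, up_to):
    """Batch layer: recompute counts from scratch over all events up to a cutoff."""
    return Counter(page for ts, page in events if ts <= up_to)


def speed_view(events, after):
    """Real-time layer: count only the recent events the batch run has not covered."""
    counts = Counter()
    for ts, page in events:
        if ts > after:
            counts[page] += 1  # updated incrementally, one event at a time
    return counts


def lambda_query(events, batch_cutoff):
    """Lambda: answer a query by merging the batch view and the real-time view."""
    return batch_view(events, batch_cutoff) + speed_view(events, batch_cutoff)


def kappa_query(events):
    """Kappa: one streaming path handles every event as it arrives."""
    counts = Counter()
    for _, page in events:
        counts[page] += 1
    return counts


if __name__ == "__main__":
    print(lambda_query(EVENTS, batch_cutoff=3))
    print(kappa_query(EVENTS))
```

Both queries return the same counts here. The Lambda version needs two code paths that must be kept consistent, which is the iteration cost noted above, while the Kappa version has only one.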