15-Big Data

  1. 3 Vs
    1. volume
    2. variety
    3. velocity
  1. Redshift
    1. data warehouse (analytics / OLAP)
    2. large relational DB
    3. based on PostgreSQL
    4. info
      1. Multi-AZ
      2. snapshots
      3. no conversions for AZ
    5. supports up to 16 PB of data
    6. supports S3 (e.g. COPY loads data from S3)
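Because Redshift integrates with S3, a common way to load it is the SQL COPY command. The helper below is a minimal sketch that only builds the statement string and never talks to a cluster. The table name, bucket path and IAM role ARN in the example are hypothetical placeholders, and Parquet is just one assumed input format.

```python
def build_copy_sql(table: str, s3_uri: str, iam_role_arn: str,
                   file_format: str = "PARQUET") -> str:
    """Build a Redshift COPY statement that loads the files under s3_uri into table.

    All arguments are illustrative; nothing is executed against a cluster here.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must be an s3:// URI")
    # IAM_ROLE lets the cluster read the S3 objects;
    # FORMAT AS tells COPY how to parse them (e.g. PARQUET, CSV).
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{iam_role_arn}' "
        f"FORMAT AS {file_format};"
    )


# Hypothetical names, for illustration only.
print(build_copy_sql(
    "sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
))
```

The resulting statement would be run from a SQL client connected to the cluster. Since the values are interpolated directly into SQL, they should come from trusted configuration, not user input.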
  1. EMR
    1. runs on EC2
    2. Elastic MapReduce
    3. supports Hive, Spark, ...
    4. ETL:
      1. extract
      2. transform
      3. load
    5. EMR storage
      1. HDFS
        1. Hadoop Distributed File System
      2. EMR File System
        1. EMRFS: read and write S3 directly from the cluster
      3. local file system (EC2 instance store, ephemeral)
    6. cluster and nodes
      1. primary node (manages the cluster)
      2. core node (runs tasks and stores data in HDFS)
      3. task node (runs tasks only, no HDFS storage)
    7. Architecture
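EMR is named after the MapReduce model, and a simple EMR ETL job follows the same shape: extract records, transform them through map, shuffle and reduce, then load the result. The sketch below runs that model in a single Python process purely to show the data flow; it is not EMR's API. On a real cluster a framework such as Hadoop or Spark spreads the map and reduce tasks across core and task nodes. The word-count job and its input text are hypothetical.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def extract(raw: str) -> list[str]:
    # Extract: split the raw input into records (one per non-empty line)
    return [line for line in raw.splitlines() if line.strip()]


def map_phase(records: Iterable[str]) -> Iterator[tuple[str, int]]:
    # Map: emit (word, 1) for every word in every record
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    # Shuffle: group all values by key, as the framework does between map and reduce
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    # Reduce: combine each key's values into a single result
    return {key: sum(values) for key, values in groups.items()}


def run_job(raw: str) -> dict[str, int]:
    # Transform = map + shuffle + reduce; "load" here is simply returning the dict
    return reduce_phase(shuffle(map_phase(extract(raw))))


counts = run_job("spark hive\nhive hdfs\n\nspark spark")
print(sorted(counts.items()))  # [('hdfs', 1), ('hive', 2), ('spark', 3)]
```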
  1. Kinesis
    1. real-time streaming data
    2. roles
      1. producers
      2. Kinesis stream
      3. consumers
    3. Kinesis Data Analytics (real-time analysis of streams)
    4. vs SQS
      1. real time ⇒ Kinesis
      2. SQS: simpler
      3. Kinesis: faster, and can store data for up to 1 year
    5. Kinesis Data Streams does not auto scale (provisioned mode, you manage shards), Kinesis Data Firehose does
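Producers send each record with a partition key (for example via the PartitionKey argument of boto3's put_record), and Kinesis Data Streams hashes that key with MD5 into a 128-bit integer that falls in exactly one shard's hash key range. That is why records sharing a key stay in order on one shard, and why shard count matters when a stream does not auto scale. The sketch below assumes the hash key space is split evenly across shards; real streams can end up with uneven ranges after resharding. The keys "user-1" and "user-2" are hypothetical.

```python
import hashlib

MAX_HASH_KEY = 2**128 - 1  # hash keys are unsigned 128-bit integers


def hash_key(partition_key: str) -> int:
    """MD5 of the UTF-8 partition key, read as a big-endian 128-bit integer."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def even_shard_ranges(shard_count: int) -> list[tuple[int, int]]:
    """Split the hash key space into shard_count contiguous, inclusive ranges.

    Assumption: ranges are equal-sized, with the last shard absorbing any remainder.
    """
    if shard_count < 1:
        raise ValueError("shard_count must be >= 1")
    size = (MAX_HASH_KEY + 1) // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        end = MAX_HASH_KEY if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, ranges: list[tuple[int, int]]) -> int:
    """Return the index of the shard whose range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key is outside every shard range")


ranges = even_shard_ranges(4)
for key in ["user-1", "user-2", "user-1"]:
    print(key, "-> shard", shard_for(key, ranges))
```

The repeated "user-1" lands on the same shard both times, which is the per-key ordering that consumers rely on.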
  1. Amazon Athena & AWS Glue
    1. Athena: serverless SQL solution
      1. query service that runs SQL directly on data in S3
    2. Glue: serverless data integration
      1. serverless ETL service
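Since Athena is serverless, querying it from code comes down to submitting SQL and polling until the query finishes. The sketch below uses the boto3 Athena client methods start_query_execution and get_query_execution. The client is passed in so the function can be exercised with a stand-in, and the database name and S3 output location are assumed placeholders; the tables in that database would typically be defined in the AWS Glue Data Catalog (for example by a Glue crawler).

```python
import time

# Athena reports QUEUED or RUNNING while a query is still in progress.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "CANCELLED"}


def run_athena_query(client, sql: str, database: str, output_location: str,
                     poll_seconds: float = 1.0, max_polls: int = 60) -> tuple[str, str]:
    """Submit sql to Athena, wait until it finishes, and return (query_id, final_state).

    `client` is expected to behave like boto3.client("athena").
    """
    response = client.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},  # results land in S3
    )
    query_id = response["QueryExecutionId"]
    for _ in range(max_polls):
        execution = client.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in TERMINAL_STATES:
            return query_id, state
        time.sleep(poll_seconds)
    raise TimeoutError(f"query {query_id} did not finish after {max_polls} polls")
```

With real credentials this might be called as run_athena_query(boto3.client("athena"), "SELECT COUNT(*) FROM sales", "example_db", "s3://example-bucket/athena-results/"), where the table, database and bucket names are hypothetical. After SUCCEEDED, the rows can be fetched with get_query_results.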
  1. Amazon QuickSight
    1. BI (business intelligence) data visualization service
    2. column level security
    3. SPICE: in-memory calculation engine
    4. create a dashboard
  1. AWS Data Pipeline
    1. Extract, Transform, Load (ETL) service
    2. automated workflows
    3. data-driven
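The idea of an automated, dependency-driven ETL workflow can be illustrated with a toy scheduler that runs each activity only after the activities it depends on have finished. This is a conceptual sketch using Python's standard graphlib (Python 3.9+), not the AWS Data Pipeline API; the activity names and the context dictionary passed between them are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable

Step = Callable[[dict], None]


def run_workflow(steps: dict[str, Step],
                 depends_on: dict[str, set[str]]) -> tuple[list[str], dict]:
    """Run every step after all of its dependencies; return (run order, shared context)."""
    # graphlib expects a mapping of node -> set of predecessor nodes
    graph = {name: depends_on.get(name, set()) for name in steps}
    context: dict = {}
    order: list[str] = []
    for name in TopologicalSorter(graph).static_order():
        steps[name](context)  # each step reads and writes the shared context
        order.append(name)
    return order, context


# Hypothetical three-step ETL workflow.
steps = {
    "extract": lambda ctx: ctx.update(rows=[" a ", "b", " c"]),
    "transform": lambda ctx: ctx.update(rows=[r.strip().upper() for r in ctx["rows"]]),
    "load": lambda ctx: ctx.update(loaded=list(ctx["rows"])),
}
depends_on = {"transform": {"extract"}, "load": {"transform"}}

order, ctx = run_workflow(steps, depends_on)
print(order)          # ['extract', 'transform', 'load']
print(ctx["loaded"])  # ['A', 'B', 'C']
```

If depends_on contains a cycle, graphlib raises CycleError before any step runs.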
  1. Amazon Managed Streaming for Apache Kafka
    1. Amazon MSK
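    2. AWS manages control-plane operations (creating, updating, deleting clusters); you use Kafka data-plane operations (producing and consuming data)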
  1. Amazon OpenSearch Service
    1. successor to Amazon Elasticsearch Service (OpenSearch is a fork of Elasticsearch)
16-Serverlessx-Exam Preparation