IBM: Big Data, Hadoop, and Spark Basics

https://www.edx.org/learn/big-data/ibm-big-data-hadoop-and-spark-basics



Learning Objectives
  • Explain the impact of Big Data, including use cases, tools, and processing methods.

  • Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce.
  • Apply Spark programming basics, including parallel programming basics for DataFrames, data sets, and Spark SQL.

  • Use Spark’s RDDs and data sets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.


Syllabus

Module 1: What is Big Data?

Module Introduction and Learning Objectives
  • What is Big Data?
  • Impact of Big Data
  • Parallel Processing, Scaling, and Data Parallelism
  • Big Data Tools and Ecosystem
  • Open Source and Big Data
  • Beyond the Hype
  • Big Data Use Cases
  • Summary & Highlights: Introduction to Big Data
  • Practice Quiz: Introduction to Big Data
  • Module 1 Glossary: What is Big Data? 
  • Graded Quiz: What is Big Data? 


Module 2: Introduction to the Hadoop Ecosystem

Module Introduction and Learning Objectives
  • Introduction to Hadoop
  • Intro to MapReduce
  • Hadoop Ecosystem
  • HDFS
  • Hive
  • Hands-on Lab: Getting Started with Hive
  • HBase
  • Hands-on Lab: Hadoop MapReduce
  • Summary & Highlights: Introduction to Hadoop
  • Practice Quiz: Introduction to Hadoop
  • Cheat Sheet: Introduction to the Hadoop Ecosystem 
  • Module 2 Glossary: Introduction to the Hadoop Ecosystem 
  • Graded Quiz: Introduction to the Hadoop Ecosystem 


Module 3: Apache Spark

Module Introduction and Learning Objectives
  • Why use Apache Spark?
  • Functional Programming Basics
  • Parallel Programming using Resilient Distributed Datasets
  • Scale out / Data Parallelism in Apache Spark
  • DataFrames and SparkSQL
  • Hands-on Lab: Getting Started with Spark using Python 
  • Summary & Highlights: Introduction to Apache Spark 
  • Practice Quiz: Introduction to Apache Spark
  • Cheat Sheet: Apache Spark 
  • Module 3 Glossary: Apache Spark  
  • Graded Quiz: Apache Spark  


Module 4: DataFrames and SparkSQL

Module Introduction and Learning Objectives
  • RDDs in Parallel Programming and Spark
  • DataFrames and Datasets
  • Catalyst and Tungsten
  • ETL with DataFrames
  • Hands-on Lab: Introduction to DataFrames
  • Real-world usage of SparkSQL
  • Common Transformations and Optimization Techniques in Spark
  • Hands-on Lab: Introduction to SparkSQL
  • Summary & Highlights: Introduction to DataFrames & SparkSQL
  • Practice Quiz: Introduction to DataFrames & SparkSQL
  • Cheat Sheet: DataFrames & SparkSQL
  • Module 4 Glossary: DataFrames & SparkSQL
  • Graded Quiz: DataFrames & SparkSQL


Module 5: Development and Runtime Environment Options

Module Introduction and Learning Objectives
  • Apache Spark Architecture
  • Overview of Apache Spark Cluster Modes
  • How to Run an Apache Spark Application
  • Hands-on Lab: Submit Apache Spark Applications 
  • Summary & Highlights: Spark Architecture 
  • Practice Quiz: Spark Architecture
  • Overview of Spark Environments - Options about Spark Environment
  • Using Apache Spark on IBM Cloud
  • How to set up your own Spark Environment (Optional)
  • Setting Apache Spark Configuration
  • Running Spark on Kubernetes
  • Hands-on Lab: Apache Spark on Kubernetes 
  • Summary & Highlights: Spark Runtime Environments 
  • Practice Quiz: Spark Runtime Environments
  • Cheat Sheet: Development and Runtime Environment Options
  • Module 5 Glossary: Development and Runtime Environment Options 
  • Graded Quiz: Development and Runtime Environment Options 


Module 6: Monitoring & Tuning

Module Introduction and Learning Objectives
  • The Apache Spark User Interface
  • Monitoring Application Progress
  • Debugging Apache Spark Application Issues
  • Understanding Memory resources
  • Understanding Processor resources
  • Hands-on Lab: Monitoring and Performance tuning
  • Summary and Highlights: Introduction to Monitoring & Tuning
  • Practice Quiz: Introduction to Monitoring & Tuning
  • Cheat Sheet: Monitoring & Tuning
  • Module 6 Glossary: Monitoring and Tuning
  • Graded Quiz: Monitoring & Tuning


Module 7: Final Project and Assessment

Module Introduction and Learning Objectives
  • Final Project: Data Processing using Spark
  • Final Exam Instructions
  • Final Exam
  • Course Rating
  • Badges Frequently Asked Questions
  • Claim badge here
  • Introduction to Big Data with Spark and Hadoop Glossary
  • Congratulations and Next Steps
  • Team and Acknowledgements
  • Copyrights and Trademarks

GRADING SCHEME
This section contains information for those earning a certificate. Those auditing the course can skip this section and click next.

  1. The course contains 6 Graded Quizzes and 1 final exam. 
  2. The minimum passing mark for the course is 70%.
  3. Permitted attempts are per question:
    One attempt - For True/False questions
    Two attempts - For any question other than True/False
  4. There are no penalties for incorrect attempts.
  5. Clicking the "Submit" button when it appears means your submission is FINAL. You will NOT be able to resubmit your answer to that question.
  6. Check your grades in the course at any time by clicking on the "Progress" tab.




IBM's Data Engineering Professional Certificate

