IBM: Big Data, Hadoop, and Spark Basics
https://www.edx.org/learn/big-data/ibm-big-data-hadoop-and-spark-basics
Learning Objectives
- Explain the impact of Big Data, including use cases, tools, and processing methods.
- Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce.
- Apply Spark programming basics, including parallel programming basics for DataFrames, Datasets, and Spark SQL.
- Use Spark's RDDs and Datasets, optimize Spark SQL using Catalyst and Tungsten, and use Spark's development and runtime environment options.
Syllabus
Module 1: What is Big Data?
Module Introduction and Learning Objectives
- What is Big Data?
- Impact of Big Data
- Parallel Processing, Scaling, and Data Parallelism
- Big Data Tools and Ecosystem
- Open Source and Big Data
- Beyond the Hype
- Big Data Use Cases
- Summary & Highlights: Introduction to Big Data
- Practice Quiz: Introduction to Big Data
- Module 1 Glossary: What is Big Data?
- Graded Quiz: What is Big Data?
Module 2: Introduction to the Hadoop Ecosystem
Module Introduction and Learning Objectives
- Introduction to Hadoop
- Intro to MapReduce
- Hadoop Ecosystem
- HDFS
- Hive
- Hands-on Lab: Getting Started with Hive
- HBase
- Hands-on Lab: Hadoop MapReduce
- Summary & Highlights: Introduction to Hadoop
- Practice Quiz: Introduction to Hadoop
- Cheat Sheet: Introduction to the Hadoop Ecosystem
- Module 2 Glossary: Introduction to the Hadoop Ecosystem
- Graded Quiz: Introduction to the Hadoop Ecosystem
Module 3: Apache Spark
Module Introduction and Learning Objectives
- Why use Apache Spark?
- Functional Programming Basics
- Parallel Programming using Resilient Distributed Datasets
- Scale out / Data Parallelism in Apache Spark
- DataFrames and SparkSQL
- Hands-on Lab: Getting Started with Spark using Python
- Summary & Highlights: Introduction to Apache Spark
- Practice Quiz: Introduction to Apache Spark
- Cheat Sheet: Apache Spark
- Module 3 Glossary: Apache Spark
- Graded Quiz: Apache Spark
Module 4: DataFrames and SparkSQL
Module Introduction and Learning Objectives
- RDDs in Parallel Programming and Spark
- DataFrames and Datasets
- Catalyst and Tungsten
- ETL with DataFrames
- Hands-on Lab: Introduction to DataFrames
- Real-world usage of SparkSQL
- Common Transformations and Optimization Techniques in Spark
- Hands-on Lab: Introduction to SparkSQL
- Summary & Highlights: Introduction to DataFrames & SparkSQL
- Practice Quiz: Introduction to DataFrames & SparkSQL
- Cheat Sheet: DataFrames & SparkSQL
- Module 4 Glossary: DataFrames & SparkSQL
- Graded Quiz: DataFrames & SparkSQL
Module 5: Development and Runtime Environment Options
Module Introduction and Learning Objectives
- Apache Spark Architecture
- Overview of Apache Spark Cluster Modes
- How to Run an Apache Spark Application
- Hands-on Lab: Submit Apache Spark Applications
- Summary & Highlights: Spark Architecture
- Practice Quiz: Spark Architecture
- Overview of Spark Environments: Spark Environment Options
- Using Apache Spark on IBM Cloud
- How to Set Up Your Own Spark Environment (Optional)
- Setting Apache Spark Configuration
- Running Spark on Kubernetes
- Hands-on Lab: Apache Spark on Kubernetes
- Summary & Highlights: Spark Runtime Environments
- Practice Quiz: Spark Runtime Environments
- Cheat Sheet: Development and Runtime Environment Options
- Module 5 Glossary: Development and Runtime Environment Options
- Graded Quiz: Development and Runtime Environment Options
Module 6: Monitoring & Tuning
Module Introduction and Learning Objectives
- The Apache Spark User Interface
- Monitoring Application Progress
- Debugging Apache Spark Application Issues
- Understanding Memory Resources
- Understanding Processor Resources
- Hands-on Lab: Monitoring and Performance tuning
- Summary and Highlights: Introduction to Monitoring & Tuning
- Practice Quiz: Introduction to Monitoring & Tuning
- Cheat Sheet: Monitoring & Tuning
- Module 6 Glossary: Monitoring and Tuning
- Graded Quiz: Monitoring & Tuning
Module 7: Final Project and Assessment
Module Introduction and Learning Objectives
- Final Project: Data Processing using Spark
- Final Exam Instructions
- Final Exam
- Course Rating
- Badges Frequently Asked Questions
- Claim badge here
- Introduction to Big Data with Spark and Hadoop Glossary
- Congratulations and Next Steps
- Team and Acknowledgements
- Copyrights and Trademarks
GRADING SCHEME
This section contains information for those earning a certificate. Those auditing the course can skip this section and click next.
- The course contains 6 Graded Quizzes and 1 final exam.
- The minimum passing mark for the course is 70%.
- Permitted attempts are per question:
  One attempt for True/False questions
  Two attempts for any question other than True/False
- There are no penalties for incorrect attempts.
- Clicking the "Submit" button, when it appears, makes your submission FINAL. You will NOT be able to resubmit your answer to that question.
- Check your grades in the course at any time by clicking on the "Progress" tab.
IBM's Data Engineering Professional Certificate