IBM: Big Data, Hadoop, and Spark Basics

https://www.edx.org/learn/big-data/ibm-big-data-hadoop-and-spark-basics

Learning Objectives

Explain the impact of Big Data, including use cases, tools, and processing methods.
Describe Apache Hadoop architecture, ecosystem, practices, and user-related applications, including Hive, HDFS, HBase, Spark, and MapReduce.
Apply Spark programming basics, including parallel programming basics for DataFrames, Datasets, and Spark SQL.
Use Spark’s RDDs and Datasets, optimize Spark SQL using Catalyst and Tungsten, and use Spark’s development and runtime environment options.
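The MapReduce model named in the objectives above can be illustrated with a word count. The following is a minimal single-process sketch in plain Python, assuming a word-count job as the example; it is not Hadoop's API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical. It only shows how the map, shuffle, and reduce phases hand data to each other.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(lines: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Map (hypothetical helper): emit a (word, 1) pair for every word in every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle (hypothetical helper): group all emitted values by key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(groups: Dict[str, List[int]]) -> Dict[str, int]:
    """Reduce (hypothetical helper): combine each key's values, here by summing them."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    """Chain the three phases into one word-count job."""
    return reduce_phase(shuffle_phase(map_phase(lines)))


if __name__ == "__main__":
    docs = ["big data and Hadoop", "Spark and big data"]
    print(word_count(docs))
    # prints {'big': 2, 'data': 2, 'and': 2, 'hadoop': 1, 'spark': 1}
```

In a real Hadoop job, the map and reduce functions run in parallel on many nodes and the shuffle moves intermediate pairs across the network. Spark expresses the same job on an RDD with transformations such as `flatMap` and `reduceByKey`.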

Syllabus

Module 1: What is Big Data?

Module Introduction and Learning Objectives

Module 2: Introduction to Hadoop

Module Introduction and Learning Objectives

Module 3: Apache Spark

Module Introduction and Learning Objectives

Module 4: DataFrames and Spark SQL

Module Introduction and Learning Objectives

Module 5: Development and Runtime Environment Options

Module Introduction and Learning Objectives

Apache Spark Architecture
Overview of Apache Spark Cluster Modes
How to Run an Apache Spark Application
Hands-on Lab: Submit Apache Spark Applications
Summary & Highlights: Spark Architecture
Practice Quiz: Spark Architecture
Overview of Spark Environments: Spark Environment Options
Using Apache Spark on IBM Cloud
How to Set Up Your Own Spark Environment (Optional)
Setting Apache Spark Configuration
Running Spark on Kubernetes
Hands-on Lab: Apache Spark on Kubernetes
Summary & Highlights: Spark Runtime Environments
Practice Quiz: Spark Runtime Environments
Cheat Sheet: Development and Runtime Environment Options
Module 5 Glossary: Development and Runtime Environment Options
Graded Quiz: Development and Runtime Environment Options

Module 6: Monitoring & Tuning

Module Introduction and Learning Objectives

Module 7: Final Project and Assessment

Module Introduction and Learning Objectives

GRADING SCHEME

This section contains information for those earning a certificate. Those auditing the course can skip this section and click next.

The course contains six graded quizzes and one final exam.
The minimum passing mark for the course is 70%.
The number of permitted attempts depends on the question type:
One attempt - for True/False questions
Two attempts - for all other questions
There are no penalties for incorrect attempts.
Clicking the "Submit" button, when it appears, makes your submission FINAL. You will NOT be able to resubmit your answer to that question.
Check your grades in the course at any time by clicking on the "Progress" tab.

IBM's Data Engineering Professional Certificate

第25個冬天