MQ Blog

Spark Basics

Bigdata, Spark

Architecture

[Figure: Spark architecture. Libraries: Spark SQL, Spark Streaming, Spark MLlib, Spark GraphX. Spark Core API, with Java, Scala, Python, R and SQL interfaces. Cluster managers: Standalone, Mesos, Yarn, Kubernetes.]
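Each cluster manager listed above is selected through the master URL passed to `spark-submit --master` (or `SparkSession.builder.master(...)` in PySpark). The sketch below is a hypothetical `master_url` helper in Python, one of the language APIs above, that returns the usual URL format for each manager. The default ports it fills in (7077 for Standalone, 5050 for Mesos, 443 for the Kubernetes API server) are assumptions, and your cluster may use different ones.

```python
from typing import Optional

# Hypothetical helper: maps a cluster manager name to the master URL format
# accepted by spark-submit --master. The default ports are assumptions.
DEFAULT_PORTS = {"standalone": 7077, "mesos": 5050, "kubernetes": 443}


def master_url(manager: str, host: str = "", port: Optional[int] = None) -> str:
    """Return the master URL for the given cluster manager."""
    manager = manager.lower()
    if manager == "k8s":
        manager = "kubernetes"
    if manager == "local":
        # No cluster manager: run locally with as many worker threads as logical cores
        return "local[*]"
    if manager == "yarn":
        # The ResourceManager address is read from HADOOP_CONF_DIR / YARN_CONF_DIR
        return "yarn"
    if manager not in DEFAULT_PORTS:
        raise ValueError(f"unknown cluster manager: {manager}")
    if not host:
        raise ValueError(f"{manager} needs a host")
    port = port or DEFAULT_PORTS[manager]
    if manager == "standalone":
        return f"spark://{host}:{port}"
    if manager == "mesos":
        return f"mesos://{host}:{port}"
    # Kubernetes: the URL points at the API server and always carries a port
    return f"k8s://https://{host}:{port}"
```

For example, `master_url("standalone", "node1")` gives `spark://node1:7077`, and `master_url("k8s", "api.example.com", 6443)` gives `k8s://https://api.example.com:6443`. Both host names are placeholders.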

Components

[Figure: Spark components. A Driver Program containing the SparkContext, a Cluster Manager, and two Worker Nodes. Each Worker Node runs an Executor that holds a Cache and runs Tasks.]
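The figure also determines how much work runs at once: each Executor runs several Tasks in parallel on its cores. The sketch below is simplified. It assumes a fixed number of executors and ignores data locality and speculative execution. The helper names are hypothetical. The parameters mirror the real settings `spark.executor.cores` and `spark.task.cpus`, where Spark's default for the latter is 1.

```python
import math


def task_slots(num_executors: int, executor_cores: int, task_cpus: int = 1) -> int:
    """Number of tasks the cluster can run at the same time.

    executor_cores mirrors spark.executor.cores; task_cpus mirrors
    spark.task.cpus (cores reserved per task, default 1).
    """
    if task_cpus < 1 or task_cpus > executor_cores:
        raise ValueError("task_cpus must be between 1 and executor_cores")
    # Each executor runs floor(cores / cores-per-task) tasks concurrently
    return num_executors * (executor_cores // task_cpus)


def waves(num_partitions: int, slots: int) -> int:
    """Rough number of scheduling rounds for one stage (one task per partition)."""
    return math.ceil(num_partitions / slots)


if __name__ == "__main__":
    # Two worker nodes as in the figure, one 4-core executor on each
    slots = task_slots(num_executors=2, executor_cores=4)
    print(slots, waves(20, slots))
```

With two 4-core executors there are 8 slots. A stage over 20 partitions therefore needs about 3 rounds of tasks.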

Job Structure

[Figure: Structure of a Spark application. An Application consists of Job 1 to Job N. A Job consists of Stage 1 to Stage N. Each Stage has a Task Set containing Task 1 to Task N.]
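A toy planner can reproduce this hierarchy. The sketch below is a simplification, not Spark's actual DAGScheduler. It assumes a linear lineage and starts a new stage at every shuffle (wide dependency). Each stage gets one task per partition, and those tasks form the stage's task set. Each action in an application becomes one job. `Op`, `Stage`, `split_into_stages` and `plan_application` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Op:
    """One transformation in a job's lineage (toy model)."""
    name: str
    shuffle: bool = False              # True for wide dependencies such as reduceByKey
    partitions: Optional[int] = None   # output partition count after a shuffle


@dataclass
class Stage:
    ops: List[str]
    num_tasks: int                     # one task per partition: the stage's task set


def split_into_stages(lineage: List[Op], input_partitions: int) -> List[Stage]:
    """Cut a linear lineage into stages at every shuffle boundary."""
    stages: List[Stage] = []
    current: List[str] = []
    parts = input_partitions
    for op in lineage:
        if op.shuffle:
            if current:
                # Narrow ops before the shuffle are pipelined into one stage
                stages.append(Stage(current, parts))
            current = []
            parts = op.partitions or parts
        current.append(op.name)
    stages.append(Stage(current, parts))
    return stages


def plan_application(jobs: List[List[Op]], input_partitions: int) -> List[List[Stage]]:
    """Each action triggers one job; every job is split into stages independently."""
    return [split_into_stages(lineage, input_partitions) for lineage in jobs]


if __name__ == "__main__":
    word_count = [Op("textFile"), Op("flatMap"), Op("map"),
                  Op("reduceByKey", shuffle=True, partitions=2)]
    for i, stage in enumerate(split_into_stages(word_count, input_partitions=4), start=1):
        print(f"Stage {i}: {stage.ops} -> {stage.num_tasks} tasks")
```

Consider a word count that reads 4 input partitions and shuffles into 2. The first stage pipelines `textFile`, `flatMap` and `map` with 4 tasks, and the second stage runs `reduceByKey` with 2 tasks.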

Terminology

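Application: A user program built on Spark.
Application jar: The jar that packages the application code together with its third-party dependencies. It does not include the Spark or Hadoop jars, which are added at runtime.
Driver Program: The process that runs the application's main function and creates the SparkContext.
Cluster Manager: The service that manages cluster resources. Spark typically runs on Standalone, Yarn, K8S or Mesos.
Deploy mode: Determines where the driver runs. In cluster mode, the driver runs inside the cluster. In client mode, it runs outside the cluster on the machine that submitted the application.
Worker node: A node that runs application code. On Yarn, this is a NodeManager.
Executor: A process launched on a worker node that runs the job's tasks.
Task: The unit of execution of a Spark job. Tasks run in executors.
Job: A computation triggered by an action operator (save, collect, etc.). A Spark application can contain one or more jobs.
Stage: A job consists of one or more stages, and each stage consists of one or more tasks.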
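Several of these terms come together in a single `spark-submit` call: the Application jar, the class whose `main` becomes the Driver Program, the Cluster Manager's master URL, and the deploy mode. The hypothetical `spark_submit_args` helper below only assembles the argument list and does not run anything. `--master`, `--deploy-mode` and `--class` are real `spark-submit` options. The jar name, class name and arguments in the example are placeholders.

```python
from typing import List, Sequence


def spark_submit_args(app_jar: str, main_class: str, master: str,
                      deploy_mode: str = "client",
                      app_args: Sequence[str] = ()) -> List[str]:
    """Build a spark-submit command line (hypothetical helper)."""
    if deploy_mode not in ("client", "cluster"):
        raise ValueError("deploy mode must be 'client' or 'cluster'")
    if deploy_mode == "cluster" and master.startswith("local"):
        # spark-submit rejects this combination: no cluster exists to host the driver
        raise ValueError("cluster deploy mode is not compatible with a local master")
    return ["spark-submit",
            "--master", master,            # which Cluster Manager to use
            "--deploy-mode", deploy_mode,  # where the Driver Program runs
            "--class", main_class,         # entry point containing main()
            app_jar,                       # Application jar (no Spark/Hadoop jars inside)
            *app_args]                     # arguments passed to the application


if __name__ == "__main__":
    print(" ".join(spark_submit_args("app.jar", "com.example.Main", "yarn",
                                     "cluster", ["in", "out"])))
```

In cluster mode on Yarn, the driver of this placeholder application would start inside the cluster. Switching to `"client"` keeps it on the submitting machine.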
© 2024 based on MQ Blog. All rights reserved.