1. Spark Overview

  • Spark is a fast, general-purpose, scalable, memory-based engine for big data analytics and computation
  • Spark Core provides Spark's most fundamental and core functionality
  • Spark SQL is the Spark component for working with structured data (see the sketch after this list)
  • Spark Streaming is the Spark API for stream processing of real-time data
  • Spark MLlib is Spark's machine learning algorithm library
  • Spark GraphX is Spark's framework and algorithm library for graph computation
  • Spark 3.0 is built against Scala 2.12 by default
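
The snippet below is a minimal sketch (assuming the spark-sql_2.12 dependency is on the classpath; the object, view, and column names are illustrative) showing Spark Core's RDD API and Spark SQL side by side:

import org.apache.spark.sql.SparkSession

object ComponentsDemo {
    def main(args: Array[String]): Unit = {
        // One entry point exposes both Spark Core and Spark SQL
        val spark = SparkSession.builder().master("local[*]").appName("ComponentsDemo").getOrCreate()
        import spark.implicits._

        // Spark Core: the low-level RDD API
        println(spark.sparkContext.parallelize(1 to 4).sum())

        // Spark SQL: structured data as DataFrames and SQL
        val df = Seq(("hadoop", 1), ("spark", 2)).toDF("word", "cnt")
        df.createOrReplaceTempView("words")
        spark.sql("SELECT word FROM words WHERE cnt > 1").show()

        spark.stop()
    }
}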

2. WordCount Example

  1. Create a Maven project in IDEA

  2. Configure the local Spark runtime environment; the project needs the Spark core dependency on its classpath (for Spark 3.0.x built against Scala 2.12 this is the org.apache.spark:spark-core_2.12 artifact)

  3. log4j.properties configuration file

log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.spark.repl.Main=WARN
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
  4. Sample data
hadoop spark hive
hadoop hive sqoop
flume hbase sqoop
hive flume spark
hadoop spark hive
hadoop hive sqoop
flume hbase sqoop
hive flume spark
  5. Implementation code
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Test {
    def main(args: Array[String]): Unit = {
        val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
        val context = new SparkContext(conf)
        // Read all files under the "data" directory
        val values: RDD[String] = context.textFile("data")
        val words: RDD[String] = values.flatMap(_.split(" "))
        // Group the words by their value
        val wordGroup: RDD[(String, Iterable[String])] = words.groupBy(word => word)
        val result: RDD[(String, Int)] = wordGroup.map {
            case (word, list) => {
                (word, list.size)
            }
        }
        // Print the result to the console
        result.foreach(println)
        // Stop the SparkContext
        context.stop()
    }
}
  6. Output
(spark,4)
(hadoop,4)
(flume,4)
(hbase,2)
(hive,6)
(sqoop,4)
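
An equivalent implementation, shown here as a sketch (the object name is illustrative), uses map plus reduceByKey instead of groupBy; reduceByKey combines counts on the map side before shuffling, so it moves less data for this kind of aggregation:

import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object WordCountReduceByKey {
    def main(args: Array[String]): Unit = {
        val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCountReduceByKey")
        val sc = new SparkContext(conf)
        // Map every word to (word, 1) and sum the counts per key
        val counts: RDD[(String, Int)] = sc.textFile("data")
            .flatMap(_.split(" "))
            .map(word => (word, 1))
            .reduceByKey(_ + _)
        counts.foreach(println)
        sc.stop()
    }
}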

3. Hadoop 2.7 Pseudo-Distributed Deployment

3.1 Install Java

3.1.1 Extract the JDK

[root@master ~]#tar -xzvf /opt/software/jdk-8u201-linux-x64.tar.gz -C /usr/local/src/

3.1.2 Rename the directory to java

[root@master ~]#mv /usr/local/src/jdk1.8.0_201 /usr/local/src/java

3.1.3 Configure environment variables

[root@master ~]# vi /etc/profile

Append the following at the end of the file:

export JAVA_HOME=/usr/local/src/java
export PATH=$PATH:$JAVA_HOME/bin
3.1.4 Apply the environment variables

[root@master ~]# source /etc/profile

3.1.5 Check Java processes with jps

[root@master ~]# jps

2714 Jps

3.2 Install Hadoop

3.2.1 Extract Hadoop

[root@master ~]# tar -xzvf /opt/software/hadoop-2.7.1.tar.gz -C /usr/local/src/

3.2.2 Rename the directory to hadoop

[root@master ~]# mv /usr/local/src/hadoop-2.7.1/ /usr/local/src/hadoop

Configure the Hadoop environment variables:

[root@master ~]# vi /etc/profile

Append the following at the end of the file:

export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3.2.3 Apply the environment variables

[root@master ~]# source /etc/profile

3.2.4 Check the Hadoop version

[root@master ~]# hadoop version

Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /usr/local/src/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
3.2.5 Configure core-site.xml

[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/core-site.xml

<configuration>
  <property>
    <!-- URI of the NameNode (required) -->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
  <property>
    <!-- Read/write buffer size used by SequenceFiles, in bytes; 131072 bytes = 128 KB (optional) -->
    <name>io.file.buffer.size</name>
    <value>131072</value>
  </property>
</configuration>
3.2.6 Configure hdfs-site.xml

[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml

<configuration>
  <property>
    <!-- HDFS replication factor; the default is 3 (required) -->
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <!-- Local filesystem path where the NameNode stores the namespace and transaction logs (required) -->
    <name>dfs.namenode.name.dir</name>
    <value>/usr/local/src/hadoop/dfs/name</value>
  </property>
  <property>
    <!-- Local filesystem path where the DataNode stores its blocks (required) -->
    <name>dfs.datanode.data.dir</name>
    <value>/usr/local/src/hadoop/dfs/data</value>
  </property>
</configuration>
3.2.7 Configure mapred-site.xml

[root@master ~]# cp /usr/local/src/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/mapred-site.xml

<configuration>
  <property>
    <!-- Run MapReduce on the YARN framework -->
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
  <property>
    <!-- Address of the job history server -->
    <name>mapreduce.jobhistory.address</name>
    <value>master:10020</value>
  </property>
  <property>
    <!-- Web UI address of the job history server -->
    <name>mapreduce.jobhistory.webapp.address</name>
    <value>master:19888</value>
  </property>
</configuration>
3.2.8 Configure yarn-site.xml

[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml

<configuration>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
  <property>  
    <name>yarn.resourcemanager.address</name>  
    <value>master:8032</value>  
  </property> 
  <property>
    <name>yarn.resourcemanager.scheduler.address</name>
    <value>master:8030</value>
  </property>
  <property>
    <name>yarn.resourcemanager.resource-tracker.address</name>
    <value>master:8031</value>
  </property>
</configuration>
3.2.9 Configure slaves and hadoop-env.sh

[root@master ~]#vi /usr/local/src/hadoop/etc/hadoop/slaves

master

[root@master ~]# vim /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh

export JAVA_HOME=/usr/local/src/java
3.2.10 Format the NameNode

Passwordless SSH must already be configured and the hostname must already be set.

[root@master ~]# ping master -c 2

PING master (192.168.222.133) 56(84) bytes of data.
64 bytes from master (192.168.222.133): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from master (192.168.222.133): icmp_seq=2 ttl=64 time=0.032 ms

--- master ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.018/0.025/0.032/0.007 ms

[root@master ~]# hdfs namenode -format

/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = master/192.168.222.133
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
STARTUP_MSG:   classpath = /usr.......
......
21/09/02 02:56:39 INFO namenode.FSImage: Allocated new BlockPoolId: BP-140091637-192.168.222.133-1630522599042
21/09/02 02:56:39 INFO common.Storage: Storage directory /usr/local/src/hadoop/dfs/name has been successfully formatted.
21/09/02 02:56:39 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/09/02 02:56:39 INFO util.ExitUtil: Exiting with status 0
21/09/02 02:56:39 INFO namenode.NameNode: SHUTDOWN_MSG: 
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.222.133
************************************************************/
3.2.11 Start the Hadoop cluster

[root@master ~]# start-all.sh

This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-root-namenode-master.out
master: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-root-resourcemanager-master.out
master: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-master.out

[root@master ~]# jps

5376 SecondaryNameNode
5219 DataNode
5651 NodeManager
5526 ResourceManager
5098 NameNode
5930 Jps
3.2.12 Access the Hadoop web UIs

If the pages cannot be reached, check whether the firewall has been disabled.

Open http://master:50070 in a browser (HDFS NameNode UI)
Open http://master:8088 in a browser (YARN ResourceManager UI)
Run the pi example to verify the cluster:

[root@master ~]# hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 10

Number of Maps  = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
21/09/02 03:10:25 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.222.133:8032
21/09/02 03:10:26 INFO input.FileInputFormat: Total input paths to process : 10
21/09/02 03:10:26 INFO mapreduce.JobSubmitter: number of splits:10
21/09/02 03:10:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1630523022106_0001
21/09/02 03:10:27 INFO impl.YarnClientImpl: Submitted application application_1630523022106_0001
21/09/02 03:10:27 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1630523022106_0001/
21/09/02 03:10:27 INFO mapreduce.Job: Running job: job_1630523022106_0001
21/09/02 03:10:34 INFO mapreduce.Job: Job job_1630523022106_0001 running in uber mode : false
21/09/02 03:10:34 INFO mapreduce.Job:  map 0% reduce 0%
21/09/02 03:10:53 INFO mapreduce.Job:  map 60% reduce 0%
21/09/02 03:11:06 INFO mapreduce.Job:  map 80% reduce 0%
21/09/02 03:11:07 INFO mapreduce.Job:  map 100% reduce 0%
21/09/02 03:11:08 INFO mapreduce.Job:  map 100% reduce 100%
21/09/02 03:11:08 INFO mapreduce.Job: Job job_1630523022106_0001 completed successfully
21/09/02 03:11:08 INFO mapreduce.Job: Counters: 49
	File System Counters
		FILE: Number of bytes read=226
		FILE: Number of bytes written=1272194
		FILE: Number of read operations=0
		FILE: Number of large read operations=0
		FILE: Number of write operations=0
		HDFS: Number of bytes read=2610
		HDFS: Number of bytes written=215
		HDFS: Number of read operations=43
		HDFS: Number of large read operations=0
		HDFS: Number of write operations=3
	Job Counters 
		Launched map tasks=10
		Launched reduce tasks=1
		Data-local map tasks=10
		Total time spent by all maps in occupied slots (ms)=148040
		Total time spent by all reduces in occupied slots (ms)=12304
		Total time spent by all map tasks (ms)=148040
		Total time spent by all reduce tasks (ms)=12304
		Total vcore-seconds taken by all map tasks=148040
		Total vcore-seconds taken by all reduce tasks=12304
		Total megabyte-seconds taken by all map tasks=151592960
		Total megabyte-seconds taken by all reduce tasks=12599296
	Map-Reduce Framework
		Map input records=10
		Map output records=20
		Map output bytes=180
		Map output materialized bytes=280
		Input split bytes=1430
		Combine input records=0
		Combine output records=0
		Reduce input groups=2
		Reduce shuffle bytes=280
		Reduce input records=20
		Reduce output records=0
		Spilled Records=40
		Shuffled Maps =10
		Failed Shuffles=0
		Merged Map outputs=10
		GC time elapsed (ms)=1712
		CPU time spent (ms)=3930
		Physical memory (bytes) snapshot=2006552576
		Virtual memory (bytes) snapshot=22826283008
		Total committed heap usage (bytes)=1383833600
	Shuffle Errors
		BAD_ID=0
		CONNECTION=0
		IO_ERROR=0
		WRONG_LENGTH=0
		WRONG_MAP=0
		WRONG_REDUCE=0
	File Input Format Counters 
		Bytes Read=1180
	File Output Format Counters 
		Bytes Written=97
Job Finished in 43.243 seconds
Estimated value of Pi is 3.20000000000000000000

3.3 Install Scala

3.3.1 Extract Scala

[root@master ~]#tar -xzvf /opt/software/scala-2.12.11.tgz -C /usr/local/src/

3.3.2 Rename the directory to scala

[root@master ~]# mv /usr/local/src/scala-2.12.11 /usr/local/src/scala

3.3.3 Configure the environment variables and apply them

[root@master ~]# vi /etc/profile

export SCALA_HOME=/usr/local/src/scala
export PATH=$PATH:$SCALA_HOME/bin

[root@master ~]# source /etc/profile

3.3.4 Enter the Scala shell

[root@master ~]# scala

Welcome to Scala 2.12.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201).
Type in expressions for evaluation. Or try :help.

scala> 

3.4 Spark Pseudo-Distributed Setup

3.4.1 Extract Spark and rename the directory

[root@master ~]# tar -xzvf /opt/software/spark-3.0.3-bin-hadoop2.7.tgz -C /usr/local/src/

[root@master ~]# mv /usr/local/src/spark-3.0.3-bin-hadoop2.7 /usr/local/src/spark

3.4.2 Configure the environment variables and apply them

[root@master ~]# vi /etc/profile

export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/bin

[root@master ~]# source /etc/profile

3.4.3 Enter the Spark shell

[root@master ~]# spark-shell

21/09/02 03:24:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = local[*], app id = local-1630524263370).
Spark session available as 'spark'.
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.0.3
      /_/
         
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.

scala> 
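
As a quick sanity check (a suggestion, not part of the original walkthrough), the pre-created SparkContext sc can be used directly in the shell; summing 1 to 100 should return 5050.0:

scala> sc.parallelize(1 to 100).sum()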

3.4.4 Configure spark-env.sh

[root@master ~]#cp /usr/local/src/spark/conf/spark-env.sh.template /usr/local/src/spark/conf/spark-env.sh

[root@master ~]# vi /usr/local/src/spark/conf/spark-env.sh

# Java installation path
export JAVA_HOME=/usr/local/src/java
# IP address or hostname of the master node
export SPARK_MASTER_IP=master
# Memory available to each worker
export SPARK_WORKER_MEMORY=1G
# Number of CPU cores per worker
export SPARK_WORKER_CORES=1
# Path to the Hadoop configuration directory
export HADOOP_CONF_DIR=/usr/local/src/hadoop/etc/hadoop
# History server options
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://master:9000/spark/log
-Dspark.history.retainedApplications=30
"
3.4.5 Configure slaves

[root@master ~]#cp /usr/local/src/spark/conf/slaves.template /usr/local/src/spark/conf/slaves

[root@master ~]# vi /usr/local/src/spark/conf/slaves

master
3.4.6 Configure the history server

[root@master ~]# cp /usr/local/src/spark/conf/spark-defaults.conf.template /usr/local/src/spark/conf/spark-defaults.conf

[root@master ~]# vi /usr/local/src/spark/conf/spark-defaults.conf

# Enable event logging and write the event log to HDFS
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/spark/log

The event log directory must exist in HDFS before the history server is started; if it is not there yet, create it first (for example with hdfs dfs -mkdir -p /spark/log).
3.4.7 Start Spark

Hadoop must already be running.

[root@master ~]# /usr/local/src/spark/sbin/start-all.sh

starting org.apache.spark.deploy.master.Master, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out

[root@master ~]#/usr/local/src/spark/sbin/start-history-server.sh

starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-master.out

[root@master ~]# jps

5376 SecondaryNameNode
7280 Jps
7234 Worker
5219 DataNode
5651 NodeManager
5526 ResourceManager
5098 NameNode
7167 Master
7998 HistoryServer
3.4.8 Access the Spark web UI

Open http://192.168.222.133:8080 in a browser (Spark Master UI)

3.4.9 Run the SparkPi example with spark-submit

[root@master ~]# /usr/local/src/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[*] /usr/local/src/spark/examples/jars/spark-examples_2.12-3.0.3.jar

21/09/02 03:36:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/09/02 03:36:57 INFO spark.SparkContext: Running Spark version 3.0.3
21/09/02 03:36:57 INFO resource.ResourceUtils: ==============================================================
21/09/02 03:36:57 INFO resource.ResourceUtils: Resources for spark.driver:

21/09/02 03:36:57 INFO resource.ResourceUtils: ==============================================================
21/09/02 03:36:57 INFO spark.SparkContext: Submitted application: Spark Pi
21/09/02 03:36:57 INFO spark.SecurityManager: Changing view acls to: root
21/09/02 03:36:57 INFO spark.SecurityManager: Changing modify acls to: root
21/09/02 03:36:57 INFO spark.SecurityManager: Changing view acls groups to: 
21/09/02 03:36:57 INFO spark.SecurityManager: Changing modify acls groups to: 
21/09/02 03:36:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users  with view permissions: Set(root); groups with view permissions: Set(); users  with modify permissions: Set(root); groups with modify permissions: Set()
21/09/02 03:36:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 34812.
21/09/02 03:36:57 INFO spark.SparkEnv: Registering MapOutputTracker
21/09/02 03:36:57 INFO spark.SparkEnv: Registering BlockManagerMaster
21/09/02 03:36:57 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/09/02 03:36:57 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/09/02 03:36:57 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
21/09/02 03:36:57 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-07780364-6c77-4f49-b955-dc8cd697abdf
21/09/02 03:36:57 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MiB
21/09/02 03:36:57 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/09/02 03:36:57 INFO util.log: Logging initialized @2485ms to org.sparkproject.jetty.util.log.Slf4jLog
21/09/02 03:36:58 INFO server.Server: jetty-9.4.40.v20210413; built: 2021-04-13T20:42:42.668Z; git: b881a572662e1943a14ae12e7e1207989f218b74; jvm 1.8.0_201-b09
21/09/02 03:36:58 INFO server.Server: Started @2620ms
21/09/02 03:36:58 INFO server.AbstractConnector: Started ServerConnector@e362c57{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
21/09/02 03:36:58 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4fdf8f12{/jobs,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26f3d90c{/jobs/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@515f4131{/jobs/job,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3ddbd9{/jobs/job/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6c2d4cc6{/stages,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6134ac4a{/stages/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71b1a49c{/stages/stage,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@589b028e{/stages/stage/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9fecdf1{/stages/pool,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b0f7d9d{/stages/pool/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c84624f{/storage,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@232024b9{/storage/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a415aa9{/storage/rdd,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71ea1fda{/storage/rdd/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@420745d7{/environment,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5fa47fea{/environment/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b43e173{/executors,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@545f80bf{/executors/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@22fa55b2{/executors/threadDump,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6594402a{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@405325cf{/static,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44d70181{/,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@23c650a3{/api,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5fa05212{/jobs/job/kill,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c09d180{/stages/stage/kill,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://master:4040
21/09/02 03:36:58 INFO spark.SparkContext: Added JAR file:/usr/local/src/spark/examples/jars/spark-examples_2.12-3.0.3.jar at spark://master:34812/jars/spark-examples_2.12-3.0.3.jar with timestamp 1630525017145
21/09/02 03:36:58 INFO executor.Executor: Starting executor ID driver on host master
21/09/02 03:36:58 INFO executor.Executor: Fetching spark://master:34812/jars/spark-examples_2.12-3.0.3.jar with timestamp 1630525017145
21/09/02 03:36:58 INFO client.TransportClientFactory: Successfully created connection to master/192.168.222.133:34812 after 58 ms (0 ms spent in bootstraps)
21/09/02 03:36:58 INFO util.Utils: Fetching spark://master:34812/jars/spark-examples_2.12-3.0.3.jar to /tmp/spark-25cf5769-586c-430b-9622-fee47d0f21e7/userFiles-f2964ecd-e2b5-494f-9468-1de07a5f4b31/fetchFileTemp192687052862215240.tmp
21/09/02 03:36:58 INFO executor.Executor: Adding file:/tmp/spark-25cf5769-586c-430b-9622-fee47d0f21e7/userFiles-f2964ecd-e2b5-494f-9468-1de07a5f4b31/spark-examples_2.12-3.0.3.jar to class loader
21/09/02 03:36:58 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56312.
21/09/02 03:36:58 INFO netty.NettyBlockTransferService: Server created on master:56312
21/09/02 03:36:58 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/09/02 03:36:58 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO storage.BlockManagerMasterEndpoint: Registering block manager master:56312 with 413.9 MiB RAM, BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b760460{/metrics/json,null,AVAILABLE,@Spark}
21/09/02 03:36:59 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Missing parents: List()
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
21/09/02 03:36:59 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.1 KiB, free 413.9 MiB)
21/09/02 03:36:59 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1816.0 B, free 413.9 MiB)
21/09/02 03:36:59 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on master:56312 (size: 1816.0 B, free: 413.9 MiB)
21/09/02 03:36:59 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1223
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
21/09/02 03:36:59 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/09/02 03:36:59 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, master, executor driver, partition 0, PROCESS_LOCAL, 7393 bytes)
21/09/02 03:36:59 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
21/09/02 03:37:00 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1000 bytes result sent to driver
21/09/02 03:37:00 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, master, executor driver, partition 1, PROCESS_LOCAL, 7393 bytes)
21/09/02 03:37:00 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
21/09/02 03:37:00 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 957 bytes result sent to driver
21/09/02 03:37:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 640 ms on master (executor driver) (1/2)
21/09/02 03:37:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 81 ms on master (executor driver) (2/2)
21/09/02 03:37:00 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.919 s
21/09/02 03:37:00 INFO scheduler.DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
21/09/02 03:37:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool 
21/09/02 03:37:00 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
21/09/02 03:37:00 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.993845 s
Pi is roughly 3.137475687378437
21/09/02 03:37:00 INFO server.AbstractConnector: Stopped Spark@e362c57{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
21/09/02 03:37:00 INFO ui.SparkUI: Stopped Spark web UI at http://master:4040
21/09/02 03:37:00 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/09/02 03:37:00 INFO memory.MemoryStore: MemoryStore cleared
21/09/02 03:37:00 INFO storage.BlockManager: BlockManager stopped
21/09/02 03:37:00 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/09/02 03:37:00 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/09/02 03:37:00 INFO spark.SparkContext: Successfully stopped SparkContext
21/09/02 03:37:00 INFO util.ShutdownHookManager: Shutdown hook called
21/09/02 03:37:00 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6780cb97-e642-4516-9b47-320836aa5890
21/09/02 03:37:00 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-25cf5769-586c-430b-9622-fee47d0f21e7

Parameter description:

Parameter                    Explanation
--class                      The class that contains the Spark application's main method
--master                     The master URL / mode the application runs against
application-jar              The packaged application jar, including its dependencies
--executor-memory 1G         Sets the memory available to each executor to 1 GB
--total-executor-cores 2     Sets the total number of CPU cores available across all executors
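
For reference, SparkPi estimates pi with a Monte Carlo simulation; the sketch below illustrates that approach (it is not the exact source of the bundled example, and the object name is made up):

import scala.math.random

import org.apache.spark.{SparkConf, SparkContext}

object PiSketch {
    def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("PiSketch")
        val sc = new SparkContext(conf)
        val slices = if (args.length > 0) args(0).toInt else 2
        val n = 100000 * slices
        // Sample n random points in the square [-1, 1] x [-1, 1];
        // the fraction that lands inside the unit circle approximates pi / 4
        val count = sc.parallelize(1 to n, slices).map { _ =>
            val x = random * 2 - 1
            val y = random * 2 - 1
            if (x * x + y * y <= 1) 1 else 0
        }.reduce(_ + _)
        println(s"Pi is roughly ${4.0 * count / n}")
        sc.stop()
    }
}

Submitted with --master local[*] (as in the command above), the driver runs the tasks locally and prints the estimate at the end.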