Spark 3.0.3 Pseudo-Distributed Setup
1、Spark Overview
- Spark is a fast, general-purpose, scalable, in-memory engine for large-scale data analytics.
- Spark Core provides Spark's most fundamental and core functionality.
- Spark SQL is the Spark component for working with structured data.
- Spark Streaming is Spark's API for stream processing of real-time data.
- Spark MLlib is Spark's machine learning algorithm library.
- Spark GraphX is Spark's framework and algorithm library for graph computation.
- Spark 3.0 is built against Scala 2.12 by default.
2、WordCount Example
- Create a Maven project in IDEA.
- Configure a local Spark runtime environment (see the pom.xml sketch after the log4j configuration below).
- log4j.properties configuration file:
log4j.rootCategory=ERROR, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.target=System.err
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
log4j.logger.org.apache.spark.repl.Main=WARN
log4j.logger.org.spark_project.jetty=WARN
log4j.logger.org.spark_project.jetty.util.component.AbstractLifeCycle=ERROR
log4j.logger.org.apache.spark.repl.SparkIMain$exprTyper=INFO
log4j.logger.org.apache.spark.repl.SparkILoop$SparkILoopInterpreter=INFO
log4j.logger.org.apache.parquet=ERROR
log4j.logger.parquet=ERROR
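For the program below to run locally in IDEA, the Maven project also needs the Spark dependency; a minimal pom.xml sketch (these are the official Spark coordinates for the Scala 2.12 build; other build settings are omitted and may need adjusting for your setup):
<dependencies>
    <!-- Spark core, Scala 2.12 build matching Spark 3.0.3 -->
    <dependency>
        <groupId>org.apache.spark</groupId>
        <artifactId>spark-core_2.12</artifactId>
        <version>3.0.3</version>
    </dependency>
</dependencies>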
- Sample data (contents of the data directory):
hadoop spark hive
hadoop hive sqoop
flume hbase sqoop
hive flume spark
hadoop spark hive
hadoop hive sqoop
flume hbase sqoop
hive flume spark
- Implementation:
import org.apache.spark.rdd.RDD
import org.apache.spark.{SparkConf, SparkContext}

object Test {
  def main(args: Array[String]): Unit = {
    val conf: SparkConf = new SparkConf().setMaster("local[*]").setAppName("WordCount")
    val context = new SparkContext(conf)
    // Read every file under the "data" directory
    val values: RDD[String] = context.textFile("data")
    val words: RDD[String] = values.flatMap(_.split(" "))
    // Group the occurrences by word
    val wordGroup: RDD[(String, Iterable[String])] = words.groupBy(word => word)
    val result: RDD[(String, Int)] = wordGroup.map {
      case (word, list) => (word, list.size)
    }
    // Print the result set to the console
    result.foreach(println)
    // Stop the SparkContext
    context.stop()
  }
}
- Output:
(spark,4)
(hadoop,4)
(flume,4)
(hbase,2)
(hive,6)
(sqoop,4)
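The groupBy approach above materializes every occurrence of each word before counting. A more idiomatic variant uses reduceByKey, which sums partial counts map-side before shuffling; a minimal sketch of the same job (the class name is illustrative):
import org.apache.spark.{SparkConf, SparkContext}

object WordCountReduceByKey {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setMaster("local[*]").setAppName("WordCountReduceByKey")
    val sc = new SparkContext(conf)
    sc.textFile("data")          // read all files under the data directory
      .flatMap(_.split(" "))     // split each line into words
      .map(word => (word, 1))    // pair each word with a count of 1
      .reduceByKey(_ + _)        // sum the counts per word
      .collect()                 // the result set is small, so collect it to the driver
      .foreach(println)
    sc.stop()
  }
}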
3、Hadoop 2.7 Pseudo-Distributed Deployment
3.1 Install Java
3.1.1 Extract the JDK
[root@master ~]#tar -xzvf /opt/software/jdk-8u201-linux-x64.tar.gz -C /usr/local/src/
3.1.2 Rename the directory to java
[root@master ~]#mv /usr/local/src/jdk1.8.0_201 /usr/local/src/java
3.1.3 Configure environment variables
[root@master ~]# vi /etc/profile
Append the following at the end of the file:
export JAVA_HOME=/usr/local/src/java
export PATH=$PATH:$JAVA_HOME/bin
3.1.4 Reload the environment variables
[root@master ~]# source /etc/profile
3.1.5 Check running JVM processes with jps
[root@master ~]# jps
2714 Jps
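Only the jps process itself is listed, which is expected at this point. Optionally, confirm that the JDK on the PATH is the one just installed (it should report version 1.8.0_201):
[root@master ~]# java -version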
3.2 Install Hadoop
3.2.1 Extract Hadoop
[root@master ~]# tar -xzvf /opt/software/hadoop-2.7.1.tar.gz -C /usr/local/src/
3.2.2 Rename the directory to hadoop
[root@master ~]# mv /usr/local/src/hadoop-2.7.1/ /usr/local/src/hadoop
Configure the Hadoop environment variables
[root@master ~]# vi /etc/profile
Append the following at the end of the file:
export HADOOP_HOME=/usr/local/src/hadoop
export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
3.2.3 Reload the environment variables
[root@master ~]# source /etc/profile
3.2.4 Check the Hadoop version
[root@master ~]# hadoop version
Hadoop 2.7.1
Subversion https://git-wip-us.apache.org/repos/asf/hadoop.git -r 15ecc87ccf4a0228f35af08fc56de536e6ce657a
Compiled by jenkins on 2015-06-29T06:04Z
Compiled with protoc 2.5.0
From source with checksum fc0a1a23fc1868e4d5ee7fa2b28a58a
This command was run using /usr/local/src/hadoop/share/hadoop/common/hadoop-common-2.7.1.jar
3.2.5 Configure core-site.xml
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
<!-- URL of the NameNode (required) -->
<name>fs.defaultFS</name>
<value>hdfs://master:9000</value>
</property>
<property>
<!-- Read/write buffer size used in SequenceFiles, in bytes; 131072 = 128 KB (optional) -->
<name>io.file.buffer.size</name>
<value>131072</value>
</property>
</configuration>
3.2.6 Configure hdfs-site.xml
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
<property>
<!-- HDFS replication factor, default 3 (required) -->
<name>dfs.replication</name>
<value>1</value>
</property>
<property>
<!-- Local filesystem path where the NameNode stores the namespace and edit logs (required) -->
<name>dfs.namenode.name.dir</name>
<value>/usr/local/src/hadoop/dfs/name</value>
</property>
<property>
<!-- Local filesystem path where the DataNode stores its blocks (required) -->
<name>dfs.datanode.data.dir</name>
<value>/usr/local/src/hadoop/dfs/data</value>
</property>
</configuration>
3.2.7 Configure mapred-site.xml
[root@master ~]# cp /usr/local/src/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
<!-- Run MapReduce on YARN -->
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<!-- Address of the MapReduce JobHistory server -->
<name>mapreduce.jobhistory.address</name>
<value>master:10020</value>
</property>
<property>
<!-- Web UI address of the JobHistory server -->
<name>mapreduce.jobhistory.webapp.address</name>
<value>master:19888</value>
</property>
</configuration>
3.2.8 Configure yarn-site.xml
[root@master ~]# vi /usr/local/src/hadoop/etc/hadoop/yarn-site.xml
<configuration>
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.address</name>
<value>master:8032</value>
</property>
<property>
<name>yarn.resourcemanager.scheduler.address</name>
<value>master:8030</value>
</property>
<property>
<name>yarn.resourcemanager.resource-tracker.address</name>
<value>master:8031</value>
</property>
</configuration>
3.2.9 Configure slaves and hadoop-env.sh
[root@master ~]#vi /usr/local/src/hadoop/etc/hadoop/slaves
master
[root@master ~]# vim /usr/local/src/hadoop/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/local/src/java
3.2.10 Format the NameNode
Passwordless SSH and the hostname must already be configured (a sketch for setting up SSH follows).
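If passwordless SSH is not yet in place, it would typically be set up along these lines (a minimal sketch assuming OpenSSH defaults; adjust user and host as needed):
[root@master ~]# ssh-keygen -t rsa          # accept the defaults; empty passphrase
[root@master ~]# ssh-copy-id root@master    # append the public key to authorized_keys
[root@master ~]# ssh master                 # should now log in without a password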
[root@master ~]# ping master -c 2
PING master (192.168.222.133) 56(84) bytes of data.
64 bytes from master (192.168.222.133): icmp_seq=1 ttl=64 time=0.018 ms
64 bytes from master (192.168.222.133): icmp_seq=2 ttl=64 time=0.032 ms
--- master ping statistics ---
2 packets transmitted, 2 received, 0% packet loss, time 1007ms
rtt min/avg/max/mdev = 0.018/0.025/0.032/0.007 ms
[root@master ~]# hdfs namenode -format
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG: host = master/192.168.222.133
STARTUP_MSG: args = [-format]
STARTUP_MSG: version = 2.7.1
STARTUP_MSG: classpath = /usr.......
......
21/09/02 02:56:39 INFO namenode.FSImage: Allocated new BlockPoolId: BP-140091637-192.168.222.133-1630522599042
21/09/02 02:56:39 INFO common.Storage: Storage directory /usr/local/src/hadoop/dfs/name has been successfully formatted.
21/09/02 02:56:39 INFO namenode.NNStorageRetentionManager: Going to retain 1 images with txid >= 0
21/09/02 02:56:39 INFO util.ExitUtil: Exiting with status 0
21/09/02 02:56:39 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at master/192.168.222.133
************************************************************/
3.2.11 Start the Hadoop cluster
[root@master ~]# start-all.sh
This script is Deprecated. Instead use start-dfs.sh and start-yarn.sh
Starting namenodes on [master]
master: starting namenode, logging to /usr/local/src/hadoop/logs/hadoop-root-namenode-master.out
master: starting datanode, logging to /usr/local/src/hadoop/logs/hadoop-root-datanode-master.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/local/src/hadoop/logs/hadoop-root-secondarynamenode-master.out
starting yarn daemons
starting resourcemanager, logging to /usr/local/src/hadoop/logs/yarn-root-resourcemanager-master.out
master: starting nodemanager, logging to /usr/local/src/hadoop/logs/yarn-root-nodemanager-master.out
[root@master ~]# jps
5376 SecondaryNameNode
5219 DataNode
5651 NodeManager
5526 ResourceManager
5098 NameNode
5930 Jps
3.2.12 Access the Hadoop web UIs
If the pages do not load, check whether the firewall is disabled (commands for this follow the URLs below).
Open in a browser: http://master:50070 (HDFS NameNode UI)
Open in a browser: http://master:8088 (YARN ResourceManager UI)
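If the firewall turns out to be the problem, it can be checked and disabled roughly as follows (a sketch assuming a CentOS 7-style system with firewalld; other distributions use different tools):
[root@master ~]# systemctl status firewalld     # check whether firewalld is running
[root@master ~]# systemctl stop firewalld       # stop it for this session
[root@master ~]# systemctl disable firewalld    # prevent it from starting on boot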
Run the pi example to verify the cluster:
[root@master ~]# hadoop jar /usr/local/src/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar pi 10 10
Number of Maps = 10
Samples per Map = 10
Wrote input for Map #0
Wrote input for Map #1
Wrote input for Map #2
Wrote input for Map #3
Wrote input for Map #4
Wrote input for Map #5
Wrote input for Map #6
Wrote input for Map #7
Wrote input for Map #8
Wrote input for Map #9
Starting Job
21/09/02 03:10:25 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.222.133:8032
21/09/02 03:10:26 INFO input.FileInputFormat: Total input paths to process : 10
21/09/02 03:10:26 INFO mapreduce.JobSubmitter: number of splits:10
21/09/02 03:10:26 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1630523022106_0001
21/09/02 03:10:27 INFO impl.YarnClientImpl: Submitted application application_1630523022106_0001
21/09/02 03:10:27 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1630523022106_0001/
21/09/02 03:10:27 INFO mapreduce.Job: Running job: job_1630523022106_0001
21/09/02 03:10:34 INFO mapreduce.Job: Job job_1630523022106_0001 running in uber mode : false
21/09/02 03:10:34 INFO mapreduce.Job: map 0% reduce 0%
21/09/02 03:10:53 INFO mapreduce.Job: map 60% reduce 0%
21/09/02 03:11:06 INFO mapreduce.Job: map 80% reduce 0%
21/09/02 03:11:07 INFO mapreduce.Job: map 100% reduce 0%
21/09/02 03:11:08 INFO mapreduce.Job: map 100% reduce 100%
21/09/02 03:11:08 INFO mapreduce.Job: Job job_1630523022106_0001 completed successfully
21/09/02 03:11:08 INFO mapreduce.Job: Counters: 49
File System Counters
FILE: Number of bytes read=226
FILE: Number of bytes written=1272194
FILE: Number of read operations=0
FILE: Number of large read operations=0
FILE: Number of write operations=0
HDFS: Number of bytes read=2610
HDFS: Number of bytes written=215
HDFS: Number of read operations=43
HDFS: Number of large read operations=0
HDFS: Number of write operations=3
Job Counters
Launched map tasks=10
Launched reduce tasks=1
Data-local map tasks=10
Total time spent by all maps in occupied slots (ms)=148040
Total time spent by all reduces in occupied slots (ms)=12304
Total time spent by all map tasks (ms)=148040
Total time spent by all reduce tasks (ms)=12304
Total vcore-seconds taken by all map tasks=148040
Total vcore-seconds taken by all reduce tasks=12304
Total megabyte-seconds taken by all map tasks=151592960
Total megabyte-seconds taken by all reduce tasks=12599296
Map-Reduce Framework
Map input records=10
Map output records=20
Map output bytes=180
Map output materialized bytes=280
Input split bytes=1430
Combine input records=0
Combine output records=0
Reduce input groups=2
Reduce shuffle bytes=280
Reduce input records=20
Reduce output records=0
Spilled Records=40
Shuffled Maps =10
Failed Shuffles=0
Merged Map outputs=10
GC time elapsed (ms)=1712
CPU time spent (ms)=3930
Physical memory (bytes) snapshot=2006552576
Virtual memory (bytes) snapshot=22826283008
Total committed heap usage (bytes)=1383833600
Shuffle Errors
BAD_ID=0
CONNECTION=0
IO_ERROR=0
WRONG_LENGTH=0
WRONG_MAP=0
WRONG_REDUCE=0
File Input Format Counters
Bytes Read=1180
File Output Format Counters
Bytes Written=97
Job Finished in 43.243 seconds
Estimated value of Pi is 3.20000000000000000000
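As an extra sanity check, HDFS can also be exercised directly from the command line (a minimal sketch; the listing output will vary):
[root@master ~]# hdfs dfs -mkdir -p /tmp/check
[root@master ~]# hdfs dfs -ls /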
3.3 Install Scala
3.3.1 Extract Scala
[root@master ~]#tar -xzvf /opt/software/scala-2.12.11.tgz -C /usr/local/src/
3.3.2 Rename the directory to scala
[root@master ~]# mv /usr/local/src/scala-2.12.11 /usr/local/src/scala
3.3.3 Configure environment variables and apply them
[root@master ~]# vi /etc/profile
export SCALA_HOME=/usr/local/src/scala
export PATH=$PATH:$SCALA_HOME/bin
[root@master ~]# source /etc/profile
3.3.4 Enter the Scala shell
[root@master ~]# scala
Welcome to Scala 2.12.11 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201).
Type in expressions for evaluation. Or try :help.
scala>
3.4 Spark Pseudo-Distributed Setup
3.4.1 Extract Spark and rename the directory
[root@master ~]# tar -xzvf /opt/software/spark-3.0.3-bin-hadoop2.7.tgz -C /usr/local/src/
[root@master ~]# mv /usr/local/src/spark-3.0.3-bin-hadoop2.7 /usr/local/src/spark
3.4.2 Configure environment variables and apply them
[root@master ~]# vi /etc/profile
export SPARK_HOME=/usr/local/src/spark
export PATH=$PATH:$SPARK_HOME/bin
[root@master ~]# source /etc/profile
3.4.3 Enter the Spark shell
[root@master ~]# spark-shell
21/09/02 03:24:15 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Spark context Web UI available at http://master:4040
Spark context available as 'sc' (master = local[*], app id = local-1630524263370).
Spark session available as 'spark'.
Welcome to
____ __
/ __/__ ___ _____/ /__
_\ \/ _ \/ _ `/ __/ '_/
/___/ .__/\_,_/_/ /_/\_\ version 3.0.3
/_/
Using Scala version 2.12.10 (Java HotSpot(TM) 64-Bit Server VM, Java 1.8.0_201)
Type in expressions to have them evaluated.
Type :help for more information.
scala>
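A quick sanity check inside the shell (a minimal sketch; the res index may differ):
scala> sc.parallelize(1 to 100).reduce(_ + _)
res0: Int = 5050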
3.4.4 Configure spark-env.sh
[root@master ~]#cp /usr/local/src/spark/conf/spark-env.sh.template /usr/local/src/spark/conf/spark-env.sh
[root@master ~]# vi /usr/local/src/spark/conf/spark-env.sh
# JDK location
export JAVA_HOME=/usr/local/src/java
# Master node IP address or hostname
export SPARK_MASTER_IP=master
# Memory per worker
export SPARK_WORKER_MEMORY=1G
# CPU cores per worker
export SPARK_WORKER_CORES=1
# Hadoop configuration directory
export HADOOP_CONF_DIR=/usr/local/src/hadoop/etc/hadoop
# History server options
export SPARK_HISTORY_OPTS="
-Dspark.history.ui.port=18080
-Dspark.history.fs.logDirectory=hdfs://master:9000/spark/log
-Dspark.history.retainedApplications=30
"
3.4.5 Configure slaves
[root@master ~]#cp /usr/local/src/spark/conf/slaves.template /usr/local/src/spark/conf/slaves
[root@master ~]# vi /usr/local/src/spark/conf/slaves
master
3.4.6 Configure the history server
[root@master ~]# cp /usr/local/src/spark/conf/spark-defaults.conf.template /usr/local/src/spark/conf/spark-defaults.conf
[root@master ~]# vi /usr/local/src/spark/conf/spark-defaults.conf
# Enable event logging so the history server has something to read
spark.eventLog.enabled true
spark.eventLog.dir hdfs://master:9000/spark/log
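The HDFS directory referenced above must exist before the history server is started; with HDFS up, it can be created like this (the path matches the configuration above):
[root@master ~]# hdfs dfs -mkdir -p /spark/log
[root@master ~]# hdfs dfs -ls /spark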
3.4.7 Start Spark
Hadoop must already be running.
[root@master ~]# /usr/local/src/spark/sbin/start-all.sh
starting org.apache.spark.deploy.master.Master, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.master.Master-1-master.out
master: starting org.apache.spark.deploy.worker.Worker, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.worker.Worker-1-master.out
[root@master ~]#/usr/local/src/spark/sbin/start-history-server.sh
starting org.apache.spark.deploy.history.HistoryServer, logging to /usr/local/src/spark/logs/spark-root-org.apache.spark.deploy.history.HistoryServer-1-master.out
[root@master ~]# jps
5376 SecondaryNameNode
7280 Jps
7234 Worker
5219 DataNode
5651 NodeManager
5526 ResourceManager
5098 NameNode
7167 Master
7998 HistoryServer
3.4.8 Access the Spark web UI
Open in a browser: http://192.168.222.133:8080 (standalone Master UI); the history server configured above listens on port 18080.
3.4.9 Test Spark with the SparkPi example
[root@master ~]# /usr/local/src/spark/bin/spark-submit --class org.apache.spark.examples.SparkPi --master local[*] /usr/local/src/spark/examples/jars/spark-examples_2.12-3.0.3.jar
21/09/02 03:36:56 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
21/09/02 03:36:57 INFO spark.SparkContext: Running Spark version 3.0.3
21/09/02 03:36:57 INFO resource.ResourceUtils: ==============================================================
21/09/02 03:36:57 INFO resource.ResourceUtils: Resources for spark.driver:
21/09/02 03:36:57 INFO resource.ResourceUtils: ==============================================================
21/09/02 03:36:57 INFO spark.SparkContext: Submitted application: Spark Pi
21/09/02 03:36:57 INFO spark.SecurityManager: Changing view acls to: root
21/09/02 03:36:57 INFO spark.SecurityManager: Changing modify acls to: root
21/09/02 03:36:57 INFO spark.SecurityManager: Changing view acls groups to:
21/09/02 03:36:57 INFO spark.SecurityManager: Changing modify acls groups to:
21/09/02 03:36:57 INFO spark.SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(root); groups with view permissions: Set(); users with modify permissions: Set(root); groups with modify permissions: Set()
21/09/02 03:36:57 INFO util.Utils: Successfully started service 'sparkDriver' on port 34812.
21/09/02 03:36:57 INFO spark.SparkEnv: Registering MapOutputTracker
21/09/02 03:36:57 INFO spark.SparkEnv: Registering BlockManagerMaster
21/09/02 03:36:57 INFO storage.BlockManagerMasterEndpoint: Using org.apache.spark.storage.DefaultTopologyMapper for getting topology information
21/09/02 03:36:57 INFO storage.BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
21/09/02 03:36:57 INFO spark.SparkEnv: Registering BlockManagerMasterHeartbeat
21/09/02 03:36:57 INFO storage.DiskBlockManager: Created local directory at /tmp/blockmgr-07780364-6c77-4f49-b955-dc8cd697abdf
21/09/02 03:36:57 INFO memory.MemoryStore: MemoryStore started with capacity 413.9 MiB
21/09/02 03:36:57 INFO spark.SparkEnv: Registering OutputCommitCoordinator
21/09/02 03:36:57 INFO util.log: Logging initialized @2485ms to org.sparkproject.jetty.util.log.Slf4jLog
21/09/02 03:36:58 INFO server.Server: jetty-9.4.40.v20210413; built: 2021-04-13T20:42:42.668Z; git: b881a572662e1943a14ae12e7e1207989f218b74; jvm 1.8.0_201-b09
21/09/02 03:36:58 INFO server.Server: Started @2620ms
21/09/02 03:36:58 INFO server.AbstractConnector: Started ServerConnector@e362c57{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
21/09/02 03:36:58 INFO util.Utils: Successfully started service 'SparkUI' on port 4040.
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@4fdf8f12{/jobs,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@26f3d90c{/jobs/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@515f4131{/jobs/job,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3f3ddbd9{/jobs/job/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6c2d4cc6{/stages,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6134ac4a{/stages/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71b1a49c{/stages/stage,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@589b028e{/stages/stage/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@9fecdf1{/stages/pool,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@3b0f7d9d{/stages/pool/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c84624f{/storage,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@232024b9{/storage/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@2a415aa9{/storage/rdd,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@71ea1fda{/storage/rdd/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@420745d7{/environment,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5fa47fea{/environment/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5b43e173{/executors,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@545f80bf{/executors/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@22fa55b2{/executors/threadDump,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6594402a{/executors/threadDump/json,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@405325cf{/static,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@44d70181{/,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@23c650a3{/api,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5fa05212{/jobs/job/kill,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@5c09d180{/stages/stage/kill,null,AVAILABLE,@Spark}
21/09/02 03:36:58 INFO ui.SparkUI: Bound SparkUI to 0.0.0.0, and started at http://master:4040
21/09/02 03:36:58 INFO spark.SparkContext: Added JAR file:/usr/local/src/spark/examples/jars/spark-examples_2.12-3.0.3.jar at spark://master:34812/jars/spark-examples_2.12-3.0.3.jar with timestamp 1630525017145
21/09/02 03:36:58 INFO executor.Executor: Starting executor ID driver on host master
21/09/02 03:36:58 INFO executor.Executor: Fetching spark://master:34812/jars/spark-examples_2.12-3.0.3.jar with timestamp 1630525017145
21/09/02 03:36:58 INFO client.TransportClientFactory: Successfully created connection to master/192.168.222.133:34812 after 58 ms (0 ms spent in bootstraps)
21/09/02 03:36:58 INFO util.Utils: Fetching spark://master:34812/jars/spark-examples_2.12-3.0.3.jar to /tmp/spark-25cf5769-586c-430b-9622-fee47d0f21e7/userFiles-f2964ecd-e2b5-494f-9468-1de07a5f4b31/fetchFileTemp192687052862215240.tmp
21/09/02 03:36:58 INFO executor.Executor: Adding file:/tmp/spark-25cf5769-586c-430b-9622-fee47d0f21e7/userFiles-f2964ecd-e2b5-494f-9468-1de07a5f4b31/spark-examples_2.12-3.0.3.jar to class loader
21/09/02 03:36:58 INFO util.Utils: Successfully started service 'org.apache.spark.network.netty.NettyBlockTransferService' on port 56312.
21/09/02 03:36:58 INFO netty.NettyBlockTransferService: Server created on master:56312
21/09/02 03:36:58 INFO storage.BlockManager: Using org.apache.spark.storage.RandomBlockReplicationPolicy for block replication policy
21/09/02 03:36:58 INFO storage.BlockManagerMaster: Registering BlockManager BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO storage.BlockManagerMasterEndpoint: Registering block manager master:56312 with 413.9 MiB RAM, BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO storage.BlockManagerMaster: Registered BlockManager BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO storage.BlockManager: Initialized BlockManager: BlockManagerId(driver, master, 56312, None)
21/09/02 03:36:58 INFO handler.ContextHandler: Started o.s.j.s.ServletContextHandler@6b760460{/metrics/json,null,AVAILABLE,@Spark}
21/09/02 03:36:59 INFO spark.SparkContext: Starting job: reduce at SparkPi.scala:38
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Got job 0 (reduce at SparkPi.scala:38) with 2 output partitions
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Final stage: ResultStage 0 (reduce at SparkPi.scala:38)
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Parents of final stage: List()
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Missing parents: List()
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Submitting ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34), which has no missing parents
21/09/02 03:36:59 INFO memory.MemoryStore: Block broadcast_0 stored as values in memory (estimated size 3.1 KiB, free 413.9 MiB)
21/09/02 03:36:59 INFO memory.MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 1816.0 B, free 413.9 MiB)
21/09/02 03:36:59 INFO storage.BlockManagerInfo: Added broadcast_0_piece0 in memory on master:56312 (size: 1816.0 B, free: 413.9 MiB)
21/09/02 03:36:59 INFO spark.SparkContext: Created broadcast 0 from broadcast at DAGScheduler.scala:1223
21/09/02 03:36:59 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from ResultStage 0 (MapPartitionsRDD[1] at map at SparkPi.scala:34) (first 15 tasks are for partitions Vector(0, 1))
21/09/02 03:36:59 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
21/09/02 03:36:59 INFO scheduler.TaskSetManager: Starting task 0.0 in stage 0.0 (TID 0, master, executor driver, partition 0, PROCESS_LOCAL, 7393 bytes)
21/09/02 03:36:59 INFO executor.Executor: Running task 0.0 in stage 0.0 (TID 0)
21/09/02 03:37:00 INFO executor.Executor: Finished task 0.0 in stage 0.0 (TID 0). 1000 bytes result sent to driver
21/09/02 03:37:00 INFO scheduler.TaskSetManager: Starting task 1.0 in stage 0.0 (TID 1, master, executor driver, partition 1, PROCESS_LOCAL, 7393 bytes)
21/09/02 03:37:00 INFO executor.Executor: Running task 1.0 in stage 0.0 (TID 1)
21/09/02 03:37:00 INFO executor.Executor: Finished task 1.0 in stage 0.0 (TID 1). 957 bytes result sent to driver
21/09/02 03:37:00 INFO scheduler.TaskSetManager: Finished task 0.0 in stage 0.0 (TID 0) in 640 ms on master (executor driver) (1/2)
21/09/02 03:37:00 INFO scheduler.TaskSetManager: Finished task 1.0 in stage 0.0 (TID 1) in 81 ms on master (executor driver) (2/2)
21/09/02 03:37:00 INFO scheduler.DAGScheduler: ResultStage 0 (reduce at SparkPi.scala:38) finished in 0.919 s
21/09/02 03:37:00 INFO scheduler.DAGScheduler: Job 0 is finished. Cancelling potential speculative or zombie tasks for this job
21/09/02 03:37:00 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
21/09/02 03:37:00 INFO scheduler.TaskSchedulerImpl: Killing all running tasks in stage 0: Stage finished
21/09/02 03:37:00 INFO scheduler.DAGScheduler: Job 0 finished: reduce at SparkPi.scala:38, took 0.993845 s
Pi is roughly 3.137475687378437
21/09/02 03:37:00 INFO server.AbstractConnector: Stopped Spark@e362c57{HTTP/1.1, (http/1.1)}{0.0.0.0:4040}
21/09/02 03:37:00 INFO ui.SparkUI: Stopped Spark web UI at http://master:4040
21/09/02 03:37:00 INFO spark.MapOutputTrackerMasterEndpoint: MapOutputTrackerMasterEndpoint stopped!
21/09/02 03:37:00 INFO memory.MemoryStore: MemoryStore cleared
21/09/02 03:37:00 INFO storage.BlockManager: BlockManager stopped
21/09/02 03:37:00 INFO storage.BlockManagerMaster: BlockManagerMaster stopped
21/09/02 03:37:00 INFO scheduler.OutputCommitCoordinator$OutputCommitCoordinatorEndpoint: OutputCommitCoordinator stopped!
21/09/02 03:37:00 INFO spark.SparkContext: Successfully stopped SparkContext
21/09/02 03:37:00 INFO util.ShutdownHookManager: Shutdown hook called
21/09/02 03:37:00 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-6780cb97-e642-4516-9b47-320836aa5890
21/09/02 03:37:00 INFO util.ShutdownHookManager: Deleting directory /tmp/spark-25cf5769-586c-430b-9622-fee47d0f21e7
Option reference for spark-submit (an example submission to the standalone master is sketched after the table):

| Option | Description |
| --- | --- |
| --class | Class containing the application's main method |
| --master | Master URL / run mode for the application |
| application-jar | Path to the packaged application jar, including its dependencies |
| --executor-memory 1G | Sets the memory available to each executor to 1 GB |
| --total-executor-cores 2 | Total number of CPU cores across all executors (standalone mode) |
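For reference, submitting the same SparkPi example to the standalone master started above would look roughly like this (a sketch; spark://master:7077 assumes the default standalone master port, and the trailing 100 is the number of slices SparkPi spreads its sampling over):
[root@master ~]# /usr/local/src/spark/bin/spark-submit \
  --class org.apache.spark.examples.SparkPi \
  --master spark://master:7077 \
  --executor-memory 1G \
  --total-executor-cores 1 \
  /usr/local/src/spark/examples/jars/spark-examples_2.12-3.0.3.jar 100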