Setting the Python version used by PySpark

ruth13156402807

2021-07-21 11:09:29

By default, the Python version PySpark uses is the same as the operating system's default Python. Here the goal is to switch it to Python 3. Steps:

1. Check the Python version of the operating system

[root@master local]# python
Python 2.7.5 (default, Oct 30 2018, 23:45:53) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>>
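Step 1 can also be scripted. The sketch below uses a hypothetical helper, interpreter_info (not part of Spark or of the original post), that resolves an interpreter name on PATH with shutil.which and then asks that interpreter for its version. A bare name such as python, which is what the unmodified launcher script falls back to (see step 5), is resolved through PATH in the same way, which is presumably why PySpark initially reports the same version as the system python.

```python
import shutil
import subprocess
from typing import Optional, Tuple


def interpreter_info(name: str) -> Optional[Tuple[str, Tuple[int, int, int]]]:
    """Return (resolved path, (major, minor, micro)) for an interpreter name.

    Hypothetical helper: returns None if `name` cannot be found on PATH.
    Raises subprocess.CalledProcessError if the interpreter fails to run.
    """
    path = shutil.which(name)  # same PATH lookup a shell does for a bare command
    if path is None:
        return None
    # This one-liner is valid on both Python 2.7 and Python 3.
    out = subprocess.run(
        [path, "-c", "import sys; print('%d.%d.%d' % tuple(sys.version_info[:3]))"],
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,  # decode stdout as text (works on Python 3.5+)
    ).stdout
    major, minor, micro = (int(part) for part in out.strip().split("."))
    return path, (major, minor, micro)
```

On the machine above, the version part of interpreter_info("python") would be expected to be (2, 7, 5), and interpreter_info("python3") should report a 3.x version once step 3 is done.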

2. Check the Python version pyspark starts with

[hadoop@master bin]$ pyspark 
Python 2.7.5 (default, Oct 30 2018, 23:45:53) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-36)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
2021-07-21 11:00:23 WARN  Utils:66 - Your hostname, master resolves to a loopback address: 127.0.0.1; using 192.168.1.100 instead (on interface ens33)
2021-07-21 11:00:23 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2021-07-21 11:00:24 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 2.7.5 (default, Oct 30 2018 23:45:53)
SparkSession available as 'spark'.
>>>

3. Install Python 3 on the Linux system and configure its environment variables

For the procedure, see:

https://blog.csdn.net/ruth13156402807/article/details/118962024

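4. Edit spark-env.sh in Spark's conf directory and append export PYSPARK_PYTHON=/usr/local/bin/python3 at the end (if the file does not exist yet, it can be created by copying spark-env.sh.template):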

[hadoop@master conf]$ vim spark-env.sh
export PYSPARK_PYTHON=/usr/local/bin/python3
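Appending the line by hand works once, but repeating the step (for example from a provisioning script) can leave several export PYSPARK_PYTHON lines behind. A minimal sketch with a hypothetical helper, set_pyspark_python (not a Spark API), that drops any existing uncommented export PYSPARK_PYTHON=... lines and appends a single one at the end, creating the file if needed. Both paths are supplied by the caller.

```python
import re
from pathlib import Path

# An uncommented "export PYSPARK_PYTHON=..." line, optionally indented.
# Commented-out lines and other variables (e.g. PYSPARK_DRIVER_PYTHON) do not match.
_EXPORT_LINE = re.compile(r"\s*export\s+PYSPARK_PYTHON=")


def set_pyspark_python(env_file: str, interpreter: str) -> None:
    """Leave exactly one 'export PYSPARK_PYTHON=<interpreter>' line at the end of env_file.

    Hypothetical helper: other lines are kept in their original order,
    and the file is created if it does not exist yet.
    """
    path = Path(env_file)
    text = path.read_text() if path.exists() else ""
    kept = [line for line in text.splitlines() if not _EXPORT_LINE.match(line)]
    kept.append("export PYSPARK_PYTHON=%s" % interpreter)
    path.write_text("\n".join(kept) + "\n")
```

For example, set_pyspark_python("conf/spark-env.sh", "/usr/local/bin/python3") run from the Spark installation directory. In a cluster, the interpreter at that path presumably also has to exist on every worker node, since the executors start it locally.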

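5. Edit the pyspark script in the bin directory of the Spark installation: in the block below, change the original PYSPARK_PYTHON=python to PYSPARK_PYTHON=python3. This default is only used when PYSPARK_PYTHON has not been set elsewhere (for example in spark-env.sh), so it acts as a fallback.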

[hadoop@master bin]$ vim pyspark

if [[ -z "$PYSPARK_PYTHON" ]]; then
  if [[ $PYSPARK_DRIVER_PYTHON == *ipython* && ! $WORKS_WITH_IPYTHON ]]; then
    echo "IPython requires Python 2.7+; please install python2.7 or set PYSPARK_PYTHON" 1>&2
    exit 1
  else
    PYSPARK_PYTHON=python3   # originally: PYSPARK_PYTHON=python
  fi
fi
export PYSPARK_PYTHON
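To make the precedence explicit, here is a simplified Python model of just the block above (the IPython branch is omitted, and this is not Spark code). A non-empty PYSPARK_PYTHON, such as one exported from spark-env.sh provided that file has been sourced before this block runs, wins over the script's hard-coded default.

```python
from typing import Mapping


def effective_pyspark_python(env: Mapping[str, str], script_default: str = "python3") -> str:
    """Simplified model of the launcher block above (IPython branch omitted).

    [[ -z "$PYSPARK_PYTHON" ]] is true for an unset *or* empty variable,
    so both cases fall back to the script's hard-coded default.
    """
    value = env.get("PYSPARK_PYTHON", "")
    return value if value else script_default


# Illustrative calls:
print(effective_pyspark_python({}))                                            # python3
print(effective_pyspark_python({"PYSPARK_PYTHON": "/usr/local/bin/python3"}))  # /usr/local/bin/python3
```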

6. Start pyspark again; it now reports Python 3.6.4:

[hadoop@master bin]$ pyspark
Python 3.6.4 (default, Jul 21 2021, 09:52:44) 
[GCC 4.8.5 20150623 (Red Hat 4.8.5-44)] on linux
Type "help", "copyright", "credits" or "license" for more information.
2021-07-21 11:04:29 WARN  Utils:66 - Your hostname, master resolves to a loopback address: 127.0.0.1; using 192.168.1.100 instead (on interface ens33)
2021-07-21 11:04:29 WARN  Utils:66 - Set SPARK_LOCAL_IP if you need to bind to another address
2021-07-21 11:04:30 WARN  NativeCodeLoader:62 - Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /__ / .__/\_,_/_/ /_/\_\   version 2.3.0
      /_/

Using Python version 3.6.4 (default, Jul 21 2021 09:52:44)
SparkSession available as 'spark'.
>>>