Installing Apache Airflow
Environment:
Ubuntu 14.04
root: the root user
darren: a regular user (with sudo privileges)
Installation:
Step 1: Install Python
sudo apt-get update
sudo apt-get install python3
If python3 is already installed, skip this step.
Step 2: Create a symbolic link
sudo ln -s /usr/bin/python3.4 /usr/bin/python
The default package source installs Python 3.4 as python3, so the command above links it to python. Then check the version with python -V:
darren@ubuntu:~$ python -V
Python 3.4.3
Step 3: Install pip
sudo apt-get install python3-pip
This creates the pip3 executable under /usr/bin. Check its version with pip3 -V:
darren@ubuntu:~$ /usr/bin/pip3 -V
pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)
This version is too old and causes many problems when installing airflow, so pip has to be upgraded.
Step 4: Upgrade pip
sudo /usr/bin/pip3 install --upgrade pip
After the upgrade finishes, you will find:
darren@ubuntu:~$ /usr/bin/pip3 -V
pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)
The version has not changed, even though the upgrade reported success. Why? Because the upgraded pip3 was placed under /usr/local/bin:
darren@ubuntu:~$ /usr/local/bin/pip3 -V
pip 18.1 from /usr/local/lib/python3.4/dist-packages/pip (python 3.4)
This copy is version 18.1, and airflow can now be installed with it.
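The mismatch above comes from two pip3 executables living in different directories. As a small sketch (the function name `pip_version_of` is made up for illustration and is not part of pip), the following pulls the version number out of a `pip -V` line so the two copies can be compared at a glance:

```shell
#!/bin/sh
# Hypothetical helper: extract the version number from a `pip -V` line.
# `pip -V` prints e.g. "pip 18.1 from /usr/local/lib/... (python 3.4)",
# so the version is the second whitespace-separated field.
pip_version_of() {
    printf '%s\n' "$1" | awk '{print $2}'
}

# The two output lines shown above, passed in as plain strings.
pip_version_of "pip 1.5.4 from /usr/lib/python3/dist-packages (python 3.4)"
pip_version_of "pip 18.1 from /usr/local/lib/python3.4/dist-packages/pip (python 3.4)"
```

On a real machine the argument would come from the command itself, for example `pip_version_of "$(/usr/local/bin/pip3 -V)"`. In bash, `type -a pip3` lists every pip3 found on the PATH in lookup order, which shows which copy a bare `pip3` would run.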
Step 5: Install airflow
# switch to the root user
su root
# airflow needs a home, ~/airflow is the default,
# but you can lay foundation somewhere else if you prefer
# (optional)
export AIRFLOW_HOME=/home/darren/airflow
# avoid pulling in the GPL-licensed unidecode dependency
export SLUGIFY_USES_TEXT_UNIDECODE=yes
# install from pypi using pip
/usr/local/bin/pip3 install apache-airflow
The following error may occur during installation:
Unknown distribution option: 'python_requires'
In that case, upgrade setuptools:
/usr/local/bin/pip3 install --upgrade setuptools
Once the upgrade succeeds, run the install command again:
/usr/local/bin/pip3 install apache-airflow
A new problem may then appear:
thrift 0.11.0 has requirement six>=1.7.2, but you'll have six 1.5.2 which is incompatible.
tenacity 4.8.0 has requirement six>=1.9.0, but you'll have six 1.5.2 which is incompatible.
html5lib 1.0.1 has requirement six>=1.9, but you'll have six 1.5.2 which is incompatible.
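This means the installed version of six is too old. That is easy to fix: upgrade six.
/usr/local/bin/pip3 install --upgrade --ignore-installed six
Retrying the installation may then produce yet another error: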
Cannot uninstall 'colorama'. It is a distutils installed project and thus we cannot accurately determine
which files belong to it which would lead to only a partial uninstall.
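The fix is as follows:
find / -name 'colorama*'
# The search returned these paths on my machine:
# /usr/lib/python3/dist-packages/colorama
# /usr/lib/python3/dist-packages/colorama-0.2.5.egg-info
# The first is a directory and the second is a file; delete both
rm -r /usr/lib/python3/dist-packages/colorama
rm /usr/lib/python3/dist-packages/colorama-0.2.5.egg-info
This resolves the problem. Retry the installation; if a similar error appears for another package, fix it the same way.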
Once installation is complete, the AIRFLOW_HOME directory contains the following:
-rw-rw-r-- 1 darren darren 20738 Jan 14 09:34 airflow.cfg
-rw-r--r-- 1 darren darren 105472 Jan 14 15:05 airflow.db
drwxrwxr-x 5 darren darren 4096 Jan 14 10:04 logs/
-rw-rw-r-- 1 darren darren 2304 Jan 14 09:34 unittests.cfg
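From top to bottom: the configuration file, the database file, the log directory, and the unit-test configuration file.
Step 6: Initialize the database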
# initialize the database
airflow initdb
Airflow uses an SQLite database by default; the next step describes how to use MySQL instead.
Step 7: Install MySQL
sudo apt-get install mysql-server mysql-client
Configure a user and create a database
Create the user:
CREATE USER airflow;
Create the database:
CREATE DATABASE airflow;
Grant privileges:
GRANT all privileges on airflow.* TO 'airflow'@'%' IDENTIFIED BY 'airflow';
GRANT all privileges on airflow.* TO 'airflow'@'localhost' IDENTIFIED BY 'airflow';
GRANT all privileges on airflow.* TO 'airflow'@'127.0.0.1' IDENTIFIED BY 'airflow';
Flush privileges:
flush privileges;
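The three GRANT statements above differ only in the host part. As a rough sketch, a small shell function can generate them for any list of hosts. The name `grant_statements` is hypothetical, and the `GRANT ... IDENTIFIED BY` form it prints is the same one used above, which MySQL 5.x accepts but MySQL 8.0 no longer does:

```shell
#!/bin/sh
# Hypothetical helper: print one GRANT statement per host, then FLUSH PRIVILEGES.
# Arguments: database user password host [host ...]
grant_statements() {
    db=$1; user=$2; pass=$3
    shift 3
    for host in "$@"; do
        printf "GRANT ALL PRIVILEGES ON %s.* TO '%s'@'%s' IDENTIFIED BY '%s';\n" \
            "$db" "$user" "$host" "$pass"
    done
    printf 'FLUSH PRIVILEGES;\n'
}

# Same database, user, password and hosts as in the statements above.
grant_statements airflow airflow airflow '%' localhost 127.0.0.1
```

The output could then be reviewed and piped into the client, for example `grant_statements ... | mysql -u root -p`, assuming the root account is allowed to grant privileges.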
Because my system is Ubuntu 14.04, the bundled MySQL version is 5.5.62.
Edit airflow.cfg, which points at the SQLite database by default:
#sql_alchemy_conn = sqlite:////home/darren/airflow/airflow.db
sql_alchemy_conn = mysql://airflow:airflow@192.168.137.1:3306/airflow
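The MySQL value of sql_alchemy_conn follows the pattern mysql://user:password@host:port/database. A minimal sketch that assembles it from its parts (the function name `build_mysql_conn` is made up for this example):

```shell
#!/bin/sh
# Hypothetical helper: assemble a SQLAlchemy-style MySQL URL.
# Arguments: user password host port database
build_mysql_conn() {
    printf 'mysql://%s:%s@%s:%s/%s\n' "$1" "$2" "$3" "$4" "$5"
}

# Values taken from the airflow.cfg line above.
build_mysql_conn airflow airflow 192.168.137.1 3306 airflow
```

This sketch does no escaping, so a password containing characters such as @ or / would need to be URL-encoded before being passed in.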
Then run the initialization again:
# initialize the database
airflow initdb
The following error may occur:
ModuleNotFoundError: No module named 'MySQLdb'
This is because only the MySQL database was installed, without a Python driver for accessing it. Install the following as well:
sudo /usr/local/bin/pip3 install pymysql
sudo apt-get install libmysqlclient-dev
sudo /usr/local/bin/pip3 install mysqlclient
Running the database initialization again may produce the following:
sqlalchemy.exc.ProgrammingError: (MySQLdb._exceptions.ProgrammingError) (1064, "You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(6) NULL' at line 1") [SQL: 'ALTER TABLE dag MODIFY last_scheduler_run DATETIME(6) NULL'] (Background on this error at: http://sqlalche.me/e/f405)
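This one is more troublesome. According to what I found, the MySQL version is too old and needs to be upgraded to 5.7 or later; see (https://blog.csdn.net/u013525058/article/details/81188175).
It is therefore advisable to install on Ubuntu 16.04 or Ubuntu 18.04, which saves a lot of time.
There are several ways forward: upgrading MySQL in place, which takes considerable time; installing a newer MySQL on the host machine, which works; or setting up a new virtual machine with a newer release and installing MySQL there, which also works. Choose whichever suits you.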
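Before choosing one of these options, it can help to check a version string against the 5.7 minimum mentioned above. The following is a sketch only: `version_ge` is a hypothetical helper, and it assumes `sort -V` (version sort, as provided by GNU coreutils) is available:

```shell
#!/bin/sh
# Hypothetical helper: succeed if version $1 is greater than or equal to $2.
# Sorts the two versions and checks that the smaller one is $2.
version_ge() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# 5.5.62 is the version bundled with Ubuntu 14.04, as noted above.
if version_ge 5.5.62 5.7; then
    echo "MySQL version is new enough"
else
    echo "MySQL version is too old"
fi
```

On a live server the first argument could be taken from the output of `mysql --version` or `SELECT VERSION();` instead of being typed by hand.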
With the version problem solved, another initialization attempt may hit the following:
run_migrations_online()
File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/airflow/migrations/env.py", line 86, in run_migrations_online
context.run_migrations()
File "<string>", line 8, in run_migrations
File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/alembic/runtime/environment.py", line 807, in run_migrations
self.get_context().run_migrations(**kw)
File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/alembic/runtime/migration.py", line 321, in run_migrations
step.migration_fn(**kw)
File "/home/darren/program/python3.6.5/lib/python3.6/site-packages/airflow/migrations/versions/0e2a74e0fc9f_add_time_zone_awareness.py", line 46, in upgrade
raise Exception("Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql")
Exception: Global variable explicit_defaults_for_timestamp needs to be on (1) for mysql
Add the following to the MySQL configuration file:
[mysqld]
explicit_defaults_for_timestamp = 1
See (https://blog.csdn.net/qq_29719097/article/details/83577021).
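A quick way to confirm the setting landed in the right section is to scan the configuration file. This is a sketch under stated assumptions: `has_explicit_defaults` is a hypothetical helper, and it only recognizes the underscore spelling with a value of 1, ON, or on inside [mysqld]:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given my.cnf-style file enables
# explicit_defaults_for_timestamp inside its [mysqld] section.
has_explicit_defaults() {
    awk '
        # A line starting with "[" opens a new section; remember if it is [mysqld].
        /^\[/ { in_mysqld = ($0 ~ /^\[mysqld\]/) }
        in_mysqld && /^[ \t]*explicit_defaults_for_timestamp[ \t]*=[ \t]*(1|ON|on)[ \t]*$/ { found = 1 }
        END { exit !found }
    ' "$1"
}

# Demo on a throwaway file holding the two lines shown above.
demo_cnf=$(mktemp)
printf '[mysqld]\nexplicit_defaults_for_timestamp = 1\n' > "$demo_cnf"
if has_explicit_defaults "$demo_cnf"; then
    echo "explicit_defaults_for_timestamp is enabled"
fi
rm -f "$demo_cnf"
```

On Ubuntu the real file is typically /etc/mysql/my.cnf (an assumption; the path varies by installation), and MySQL has to be restarted before the change takes effect.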
Initialize once more; this time it succeeds.
Step 8: Start the service
# start the web server, default port is 8080
airflow webserver -p 8080
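Open a browser and go to http://host:8080, where host is the address of the machine on which airflow is installed.
When the Airflow web interface appears, the installation is complete.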
Note: to install with Anaconda instead, see https://blog.csdn.net/zpf336/article/details/104480732
References:
https://www.jianshu.com/p/16b5aa09b67c
https://www.jianshu.com/p/28e2ae6fbd75