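Why go to all this trouble? It is a frustrating story. While testing ELL, I ran into an Open MPI package dependency problem: it needs the Open MPI version that provides the shared library libmpi.so.12, but on Ubuntu 14.04, apt-get install libopenmpi-dev installs a much older version. The only option was to upgrade the Open MPI already installed on the system.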
『1』The error caused by the old Open MPI version
When running ELL's demo program cntkDemo.py, the old Open MPI version causes the following error:
OpenBLAS : Your OS does not support AVX instructions. OpenBLAS is using Nehalem kernels as a fallback, which may give poorer performance.
Traceback (most recent call last):
  File "/home/codelast/.miniconda3/envs/py36/lib/python3.6/site-packages/cntk/cntk_py.py", line 18, in swig_import_helper
    return importlib.import_module(mname)
  File "/home/codelast/.miniconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
  File "<frozen importlib._bootstrap>", line 978, in _gcd_import
  File "<frozen importlib._bootstrap>", line 961, in _find_and_load
  File "<frozen importlib._bootstrap>", line 948, in _find_and_load_unlocked
ModuleNotFoundError: No module named 'cntk._cntk_py'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "cntkDemo.py", line 7, in <module>
    import cntk_to_ell
  File "./../../../../build/tools/importers/CNTK/cntk_to_ell.py", line 17, in <module>
    from cntk.layers import Convolution, MaxPooling, AveragePooling, Dropout, BatchNormalization, Dense
  File "/home/codelast/.miniconda3/envs/py36/lib/python3.6/site-packages/cntk/__init__.py", line 10, in <module>
    from . import cntk_py
  File "/home/codelast/.miniconda3/envs/py36/lib/python3.6/site-packages/cntk/cntk_py.py", line 21, in <module>
    _cntk_py = swig_import_helper()
  File "/home/codelast/.miniconda3/envs/py36/lib/python3.6/site-packages/cntk/cntk_py.py", line 20, in swig_import_helper
    return importlib.import_module('_cntk_py')
  File "/home/codelast/.miniconda3/envs/py36/lib/python3.6/importlib/__init__.py", line 126, in import_module
    return _bootstrap._gcd_import(name[level:], package, level)
ImportError: libmpi.so.12: cannot open shared object file: No such file or directory
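The key information is the final ImportError (shown in red in the original post): libmpi.so.12 cannot be found.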
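When the traceback is long, it can help to pull out just the name of the missing shared library. The following is a minimal sketch, not something from the original setup: `missing_lib` is a hypothetical helper name, and it assumes the error line has exactly the "ImportError: NAME: cannot open shared object file" shape seen above.

```shell
# missing_lib: read program output on stdin and print the name of the shared
# library reported as missing by a Python ImportError line.
# (Hypothetical helper; assumes the message format shown in the traceback.)
missing_lib() {
    sed -n 's/^ImportError: \(.*\): cannot open shared object file.*/\1/p'
}

# Example (not run here): capture stderr of the demo and extract the name.
#   python cntkDemo.py 2>&1 | missing_lib
```

The extracted name can then be searched for on the system, for example with `ldconfig -p | grep libmpi`.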
Source: https://www.codelast.com/
『2』Checking the Open MPI already installed on Ubuntu 14.04
[codelast@ ~]$ ll /usr/lib/libmpi.so*
lrwxrwxrwx 1 root root 27 7月 12 00:03 /usr/lib/libmpi.so -> /etc/alternatives/libmpi.so
lrwxrwxrwx 1 root root 15 12月 29 2013 /usr/lib/libmpi.so.1 -> libmpi.so.1.0.8
lrwxrwxrwx 1 root root 27 12月 29 2013 /usr/lib/libmpi.so.1.0.8 -> openmpi/lib/libmpi.so.1.0.8
[codelast@ ~]$ apt-cache show libopenmpi-dev
Version: 1.6.5-8
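We need to uninstall the old Open MPI from the Ubuntu 14.04 system.
First, look for the installed packages:
sudo dpkg -l | grep openmpi
Then remove whatever applies in your case. I removed the following:
sudo apt-get remove libmpich-dev mpi-default-dev libopenmpi-dev libopenmpi1.6 openmpi-common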
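Before removing anything, it may be worth listing exactly which MPI-related packages are installed. This is only a sketch: `installed_mpi_pkgs` is a hypothetical helper, and the name pattern it uses (openmpi, mpich, or a name starting with "mpi-") is my assumption based on the packages removed above, so review its output before acting on it.

```shell
# installed_mpi_pkgs: read `dpkg -l` output on stdin and print the names of
# installed packages (status "ii") that look MPI-related.
# (Hypothetical helper; the name pattern is an assumption, adjust as needed.)
installed_mpi_pkgs() {
    awk '$1 == "ii" && $2 ~ /openmpi|mpich|^mpi-/ { print $2 }'
}

# Example (not run here): list candidates, then simulate the removal first.
#   dpkg -l | installed_mpi_pkgs
#   sudo apt-get remove --dry-run libopenmpi-dev
```

The `--dry-run` option makes apt-get print what it would remove without changing the system, which helps catch unexpected dependent packages.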
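『3』Determining which Open MPI version provides libmpi.so.12
To use a newer Open MPI, the only option is to compile and install it from source, so we first need to determine which version to install. The shared library name libmpi.so.12 alone is not enough.
A Google search suggested that libmpi.so.12 is what apt-get install libopenmpi-dev installs on Ubuntu 16.04, so I tried installing it on an Ubuntu 16.04 machine, and that was indeed the case. Checking after the installation: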
[codelast@ ~]$ ll /usr/lib/libmpi.so*
lrwxrwxrwx 1 root root 27 7月 12 00:08 /usr/lib/libmpi.so -> /etc/alternatives/libmpi.so
lrwxrwxrwx 1 root root 16 2月 26 2016 /usr/lib/libmpi.so.12 -> libmpi.so.12.0.2
lrwxrwxrwx 1 root root 28 2月 26 2016 /usr/lib/libmpi.so.12.0.2 -> openmpi/lib/libmpi.so.12.0.2
[codelast@ ~]$ apt-cache show libopenmpi-dev
Version: 1.10.2-8ubuntu1
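The package version 1.10.2-8ubuntu1 combines the upstream Open MPI version (1.10.2) with a distribution revision (8ubuntu1); only the upstream part matters when picking a source tarball. A small sketch of extracting it, assuming the version has no epoch prefix (`upstream_version` is a hypothetical helper name):

```shell
# upstream_version: read `apt-cache show` output on stdin and print the
# upstream part of the first Version: field, i.e. everything before the
# last hyphen. (Hypothetical helper; does not handle an "epoch:" prefix.)
upstream_version() {
    awk '/^Version:/ { sub(/-[^-]*$/, "", $2); print $2; exit }'
}

# Example (not run here):
#   apt-cache show libopenmpi-dev | upstream_version
```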
『4』Compiling and installing the newer Open MPI on Ubuntu 14.04
From the download page on the Open MPI website, download the 1.10.2 package (openmpi-1.10.2.tar.gz), then compile and install it:
tar zxf openmpi-1.10.2.tar.gz
cd openmpi-1.10.2/
./configure
make
sudo make install
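For scripting the download step, the tarball URL can be derived from the version number. This is a sketch under an assumption: the URL layout below (download.open-mpi.org, grouped by release series such as v1.10) is not given in the post, so check it against the Open MPI download page before relying on it. `openmpi_url` is a hypothetical helper name.

```shell
# openmpi_url: print the download URL for an Open MPI release tarball.
# ASSUMPTION: the URL layout is not from the original post; verify it
# against the Open MPI download page.
openmpi_url() {
    version="$1"               # e.g. 1.10.2
    series="v${version%.*}"    # release series: 1.10.2 -> v1.10
    echo "https://download.open-mpi.org/release/open-mpi/${series}/openmpi-${version}.tar.gz"
}

# Example (not run here):
#   wget "$(openmpi_url 1.10.2)"
```

After `sudo make install`, it may also be necessary to run `sudo ldconfig` so that the dynamic linker cache picks up the libraries newly placed under /usr/local/lib.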
If all goes well, the newer Open MPI is now installed.
To be safe, check:
[codelast@ ~]$ sudo find ./ -name libmpi.so.12
./usr/local/lib/libmpi.so.12
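Finding the file on disk does not by itself prove that the dynamic linker will resolve it. One way to check is to run `ldd` on the extension module that failed to import and look for entries marked "not found". A minimal sketch (`unresolved_libs` is a hypothetical helper, and the module path in the example is a placeholder, not a path from the post):

```shell
# unresolved_libs: read `ldd` output on stdin and print the names of
# shared libraries the dynamic linker could not resolve.
# (Hypothetical helper; relies on ldd's "NAME => not found" lines.)
unresolved_libs() {
    awk '/=> not found/ { print $1 }'
}

# Example (not run here; replace the placeholder with the real module file):
#   ldd /path/to/site-packages/cntk/_cntk_py.so | unresolved_libs
```

An empty result suggests every dependency, including libmpi.so.12, is now resolvable.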