Compiling numpy with OpenBLAS integration
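I am trying to install numpy with OpenBLAS, but I don't know how to write the site.cfg file.

When I follow the installation procedure, the installation completes without errors, but increasing the number of threads OpenBLAS uses beyond 1 (controlled by the environment variable OMP_NUM_THREADS) decreases performance.

I am not sure whether the OpenBLAS integration is correct. Could anyone provide a site.cfg file that achieves this?

PS: On the same machine, integrating OpenBLAS into other toolkits, such as the Python-based Theano, greatly improves performance as the number of threads increases.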
I just compiled numpy with OpenBLAS integration inside a virtualenv, and it seems to be working fine.

This was my process:
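- Compile OpenBLAS:

      $ git clone https://github.com/xianyi/OpenBLAS
      $ cd OpenBLAS && make FC=gfortran
      $ sudo make PREFIX=/opt/OpenBLAS install

  If you don't have admin rights, you can set PREFIX= to a directory where you have write permission (just adjust the corresponding steps below accordingly).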
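- Make sure that the directory containing libopenblas.so is in your shared library search path.

  - To do this locally, you can edit your ~/.bashrc file to contain the line

        export LD_LIBRARY_PATH=/opt/OpenBLAS/lib:$LD_LIBRARY_PATH

    The LD_LIBRARY_PATH environment variable will be updated when you start a new terminal session (use $ source ~/.bashrc to force an update within the same session).

  - Another option, which works for multiple users, is to create a .conf file in /etc/ld.so.conf.d/ containing the line /opt/OpenBLAS/lib, e.g.:

        $ sudo sh -c "echo '/opt/OpenBLAS/lib' > /etc/ld.so.conf.d/openblas.conf"

  Once you are done with either option, run

      $ sudo ldconfig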
- Grab the numpy source code:

      $ git clone https://github.com/numpy/numpy
      $ cd numpy
- Copy site.cfg.example to site.cfg and edit the copy:

      $ cp site.cfg.example site.cfg
      $ nano site.cfg

  Uncomment these lines:

      ....
      [openblas]
      libraries = openblas
      library_dirs = /opt/OpenBLAS/lib
      include_dirs = /opt/OpenBLAS/include
      ....
- Check the configuration, build, and install (optionally inside a virtualenv):

      $ python setup.py config

  The output should look something like this:

      ...
      openblas_info:
        FOUND:
          libraries = ['openblas', 'openblas']
          library_dirs = ['/opt/OpenBLAS/lib']
          language = c
          define_macros = [('HAVE_CBLAS', None)]

        FOUND:
          libraries = ['openblas', 'openblas']
          library_dirs = ['/opt/OpenBLAS/lib']
          language = c
          define_macros = [('HAVE_CBLAS', None)]
      ...
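  Installing with pip is preferable to using python setup.py install, since pip keeps track of the package metadata and lets you easily uninstall or upgrade numpy in the future.

      $ pip install .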
- Optional: you can use a benchmark script (run below as build/test_numpy.py) to test performance with different thread counts.

      $ OMP_NUM_THREADS=1 python build/test_numpy.py

      version: 1.10.0.dev0+8e026a2
      maxint: 9223372036854775807

      BLAS info:
       * libraries ['openblas', 'openblas']
       * library_dirs ['/opt/OpenBLAS/lib']
       * define_macros [('HAVE_CBLAS', None)]
       * language c

      dot: 0.099796795845 sec

      $ OMP_NUM_THREADS=8 python build/test_numpy.py

      version: 1.10.0.dev0+8e026a2
      maxint: 9223372036854775807

      BLAS info:
       * libraries ['openblas', 'openblas']
       * library_dirs ['/opt/OpenBLAS/lib']
       * define_macros [('HAVE_CBLAS', None)]
       * language c

      dot: 0.0439578056335 sec
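There seems to be a noticeable improvement at higher thread counts: with 8 threads, dot ran roughly 2.3x faster than with 1 (0.044 s vs 0.100 s). However, I haven't tested this very systematically, and for smaller matrices the additional overhead may well outweigh the benefit of a higher thread count.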
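For a more systematic comparison than the two runs above, a small sweep over thread counts can help. The sketch below is a stand-in, not the test script used above: it times a 2000x2000 np.dot in a fresh interpreter for each setting (OpenBLAS reads its thread count when the library is loaded), and it sets OPENBLAS_NUM_THREADS alongside OMP_NUM_THREADS because, according to the OpenBLAS README, the former takes priority in builds without OpenMP. The names `BENCH`, `run_with_threads` and `sweep` and the matrix size are hypothetical choices.

```python
import os
import subprocess
import sys

# Hypothetical benchmark: best-of-3 wall-clock time (seconds) for one
# 2000x2000 matrix product. It runs in a child interpreter, so numpy must be
# importable there.
BENCH = """
import time
import numpy as np
a = np.random.rand(2000, 2000)
b = np.random.rand(2000, 2000)
np.dot(a, b)  # warm-up, so thread start-up is not included in the timing
best = float("inf")
for _ in range(3):
    t = time.perf_counter()
    np.dot(a, b)
    best = min(best, time.perf_counter() - t)
print(best)
"""


def run_with_threads(n_threads, code=BENCH):
    """Run `code` in a fresh interpreter with the BLAS thread count set.

    A new process is needed per setting because OpenBLAS reads the variables
    when it is loaded. Both variables are set: OpenMP builds honour
    OMP_NUM_THREADS, other builds give OPENBLAS_NUM_THREADS priority.
    Returns the child's stdout parsed as a float.
    """
    env = dict(os.environ,
               OMP_NUM_THREADS=str(n_threads),
               OPENBLAS_NUM_THREADS=str(n_threads))
    result = subprocess.run([sys.executable, "-c", code], env=env,
                            capture_output=True, text=True, check=True)
    return float(result.stdout.strip())


def sweep(thread_counts=(1, 2, 4, 8), code=BENCH):
    """Map each thread count to the time reported by `code`."""
    return {n: run_with_threads(n, code) for n in thread_counts}
```

For example, `for n, secs in sweep().items(): print(n, secs)`. Shrinking the matrices in `BENCH` is an easy way to see where extra threads stop paying off, in line with the overhead caveat above.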
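To confirm that the shared-library search path step worked, you can also load libopenblas from Python directly with ctypes. This is a sketch assuming a Linux system where the library is installed as libopenblas.so or libopenblas.so.0 (the names usually provided by `make install` and the Debian package, though that may vary); `load_openblas` and `openblas_summary` are hypothetical helpers, while `openblas_get_config` and `openblas_get_num_threads` are functions OpenBLAS declares in its cblas.h.

```python
import ctypes
import ctypes.util


def load_openblas(names=("libopenblas.so", "libopenblas.so.0"), lookup="openblas"):
    """Try to dlopen OpenBLAS; return a ctypes.CDLL, or None if it can't be found.

    A bare file name is resolved by the dynamic linker itself, so it honours
    LD_LIBRARY_PATH (read at process start-up) as well as the ldconfig cache.
    The path from ctypes.util.find_library(lookup), if any, is tried last.
    """
    candidates = list(names)
    if lookup:
        found = ctypes.util.find_library(lookup)
        if found:
            candidates.append(found)
    for candidate in candidates:
        try:
            return ctypes.CDLL(candidate)
        except OSError:
            continue
    return None


def openblas_summary(lib):
    """Call two OpenBLAS exports: the build-config string and the thread count."""
    lib.openblas_get_config.restype = ctypes.c_char_p
    lib.openblas_get_num_threads.restype = ctypes.c_int
    return lib.openblas_get_config().decode(), lib.openblas_get_num_threads()
```

Usage: `lib = load_openblas(); print(openblas_summary(lib) if lib else "libopenblas not found")`. Running this with different OMP_NUM_THREADS values shows whether OpenBLAS actually picks the setting up, which is worth checking when more threads make things slower.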
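Since the question was specifically about writing site.cfg, here is a sketch that produces the same [openblas] section as in the steps above for an arbitrary install prefix (for example a user-writable PREFIX= when you lack admin rights). It uses the standard-library configparser, and `openblas_site_cfg` is a hypothetical helper. It assumes the distutils-based numpy build used in this answer; as far as I know, newer numpy releases build with Meson and no longer read site.cfg.

```python
import configparser
import io


def openblas_site_cfg(prefix="/opt/OpenBLAS"):
    """Return the text of an [openblas] section for numpy's site.cfg.

    `prefix` is whatever PREFIX= was passed to OpenBLAS's `make install`.
    """
    prefix = prefix.rstrip("/")
    cfg = configparser.ConfigParser()
    cfg["openblas"] = {
        "libraries": "openblas",
        "library_dirs": prefix + "/lib",
        "include_dirs": prefix + "/include",
    }
    buf = io.StringIO()
    cfg.write(buf)  # writes "key = value" lines under the [openblas] header
    return buf.getvalue()
```

To use it, write the result into the numpy checkout (for example `with open("site.cfg", "w") as f: f.write(openblas_site_cfg())`, which replaces the copied example file) and rerun `python setup.py config`.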
If you happen to be using Ubuntu or Mint, you can easily get numpy linked against OpenBLAS by installing both through apt-get:

    sudo apt-get install python3-numpy libopenblas-dev
On a fresh Ubuntu Docker container, I tested the following script, copied from the blog post "Installing Numpy and OpenBLAS":
    import numpy as np
    import numpy.random as npr
    import time

    # --- Test 1
    N = 1
    n = 1000

    A = npr.randn(n, n)
    B = npr.randn(n, n)

    t = time.time()
    for i in range(N):
        C = np.dot(A, B)
    td = time.time() - t
    print("dotted two (%d,%d) matrices in %0.1f ms" % (n, n, 1e3*td/N))

    # --- Test 2
    N = 100
    n = 4000

    A = npr.randn(n)
    B = npr.randn(n)

    t = time.time()
    for i in range(N):
        C = np.dot(A, B)
    td = time.time() - t
    print("dotted two (%d) vectors in %0.2f us" % (n, 1e6*td/N))

    # --- Test 3
    m, n = (2000, 1000)

    A = npr.randn(m, n)

    t = time.time()
    [U, s, V] = np.linalg.svd(A, full_matrices=False)
    td = time.time() - t
    print("SVD of (%d,%d) matrix in %0.3fs" % (m, n, td))

    # --- Test 4
    n = 1500
    A = npr.randn(n, n)

    t = time.time()
    w, v = np.linalg.eig(A)
    td = time.time() - t
    print("Eigendecomp of (%d,%d) matrix in %0.3fs" % (n, n, td))
The results without OpenBLAS are:

    dotted two (1000,1000) matrices in 563.8 ms
    dotted two (4000) vectors in 5.16 us
    SVD of (2000,1000) matrix in 6.084 s
    Eigendecomp of (1500,1500) matrix in 14.605 s
I then installed OpenBLAS with apt install libopenblas-dev and checked numpy's linkage with

    import numpy as np
    np.__config__.show()

and it printed:
    atlas_threads_info:
      NOT AVAILABLE
    openblas_info:
      NOT AVAILABLE
    atlas_blas_info:
      NOT AVAILABLE
    atlas_3_10_threads_info:
      NOT AVAILABLE
    blas_info:
        library_dirs = ['/usr/lib']
        libraries = ['blas', 'blas']
        language = c
        define_macros = [('HAVE_CBLAS', None)]
    mkl_info:
      NOT AVAILABLE
    atlas_3_10_blas_threads_info:
      NOT AVAILABLE
    atlas_3_10_blas_info:
      NOT AVAILABLE
    openblas_lapack_info:
      NOT AVAILABLE
    lapack_opt_info:
        library_dirs = ['/usr/lib']
        libraries = ['lapack', 'lapack', 'blas', 'blas']
        language = c
        define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    blas_opt_info:
        library_dirs = ['/usr/lib']
        libraries = ['blas', 'blas']
        language = c
        define_macros = [('NO_ATLAS_INFO', 1), ('HAVE_CBLAS', None)]
    atlas_info:
      NOT AVAILABLE
    blas_mkl_info:
      NOT AVAILABLE
    lapack_mkl_info:
      NOT AVAILABLE
    atlas_3_10_info:
      NOT AVAILABLE
    lapack_info:
        library_dirs = ['/usr/lib']
        libraries = ['lapack', 'lapack']
        language = f77
    atlas_blas_threads_info:
      NOT AVAILABLE
It doesn't show any link to OpenBLAS. However, the new results from the script show that numpy must be using OpenBLAS:

    dotted two (1000,1000) matrices in 15.2 ms
    dotted two (4000) vectors in 2.64 us
    SVD of (2000,1000) matrix in 0.469 s
    Eigendecomp of (1500,1500) matrix in 2.794 s
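np.__config__.show() reports what numpy was built against, not which library the dynamic linker actually loads at run time. On Debian/Ubuntu the packaged numpy links against the generic libblas.so.3/liblapack.so.3, and installing OpenBLAS points those names at OpenBLAS through the alternatives system, which most likely explains the mismatch above. Below is a Linux-only sketch that lists the BLAS/LAPACK files actually mapped into the running process; `loaded_libraries` is a hypothetical helper, not a numpy API.

```python
def loaded_libraries(keywords=("blas", "lapack")):
    """Return paths of files mapped into this process whose path contains a keyword.

    Linux-only: parses /proc/self/maps, whose sixth column (when present) is the
    path of the mapped file, shown with symlinks such as libblas.so.3 resolved.
    Keywords are matched case-insensitively.
    """
    keywords = [k.lower() for k in keywords]
    found = set()
    with open("/proc/self/maps") as maps:
        for line in maps:
            fields = line.split(maxsplit=5)
            if len(fields) == 6:
                path = fields[5].strip()
                if any(k in path.lower() for k in keywords):
                    found.add(path)
    return sorted(found)
```

Import numpy first (so its BLAS is loaded into the process), then call `loaded_libraries()`. With OpenBLAS selected through alternatives, I would expect the listed paths to point into an OpenBLAS directory, though the exact paths vary between releases.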