在Haskell中优化数组的性能
我正在研究类似MineCraft的世界的地形生成algorithm。 目前,我正在使用Simplex Noise Demystified(PDF)中的实现方法来使用单工噪声,因为单纯的噪声应该比Perlin噪声更快并且产生更less的伪像。 这看起来相当不错(见图片),但到目前为止,它也很慢。
对于块(16x16x128块)中的每个块运行噪声函数10次(对于地形高度,温度,树木位置等,我需要具有不同波长的噪声),每个块有3个八度的噪声,或者大约100万次调用噪声function总共大约需要700-800毫秒。 尽pipe事实上在algorithm中没有明显的昂贵的操作(至less对我来说),但对于产生具有任何体面的速度的地形而言,这至less是一个数量级太慢的速度。 只是楼,模,一些数组查找和基本的算术。 下面列出了algorithm(用Haskell编写)。 SCC的意见是为了分析。 我省略了二维噪声function,因为它们的工作方式是一样的。
g3 :: (Floating a, RealFrac a) => a g3 = 1/6 {-# INLINE int #-} int :: (Integral a, Num b) => a -> b int = fromIntegral grad3 :: (Floating a, RealFrac a) => V.Vector (a,a,a) grad3 = V.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)] {-# INLINE dot3 #-} dot3 :: Num a => (a, a, a) -> a -> a -> a -> a dot3 (a,b,c) xyz = a * x + b * y + c * z {-# INLINE fastFloor #-} fastFloor :: RealFrac a => a -> Int fastFloor x = truncate (if x > 0 then x else x - 1) --Generate a random permutation for use in the noise functions perm :: Int -> Permutation perm seed = V.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed --Generate 3D noise between -0.5 and 0.5 simplex3D :: (Floating a, RealFrac a) => Permutation -> a -> a -> a -> a simplex3D pxyz = {-# SCC "out" #-} 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = {-# SCC "ijk" #-} (sx, sy, sz) where sa = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = {-# SCC "x0-z0" #-} (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = {-# SCC "i1-k2" #-} if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = {-# SCC "xyz1" #-} (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = {-# SCC "xyz2" #-} (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = {-# SCC "xyz3" #-} (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = {-# SCC "iijjkk" #-} (i .&. 255, j .&. 255, k .&. 255) gi0 = {-# SCC "gi0" #-} mod (p V.! (ii + p V.! (jj + p V.! kk ))) 12 gi1 = {-# SCC "gi1" #-} mod (p V.! (ii + i1 + p V.! (jj + j1 + p V.! (kk + k1)))) 12 gi2 = {-# SCC "gi2" #-} mod (p V.! (ii + i2 + p V.! (jj + j2 + p V.! (kk + k2)))) 12 gi3 = {-# SCC "gi3" #-} mod (p V.! (ii + 1 + p V.! (jj + 1 + p V.! (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = {-# SCC "n" #-} (\a -> if a < 0 then 0 else a*a*a*a*dot3 (grad3 V.! gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z' harmonic :: (Num a, Fractional a) => Int -> (a -> a) -> a harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 fo = let r = int $ 2 ^ (o - 1) in noise r / r + f (o - 1) --Generate harmonic 3D noise between -0.5 and 0.5 harmonicNoise3D :: (RealFrac a, Floating a) => Permutation -> Int -> a -> a -> a -> a -> a harmonicNoise3D p octaves lxyz = harmonic octaves (\f -> simplex3D p (x * f / l) (y * f / l) (z * f / l))
为了分析,我使用了下面的代码,
q _ = let p = perm 0 in sum [harmonicNoise3D p 3 lxyz :: Float | l <- [1..10], y <- [0..127], x <- [0..15], z <- [0..15]] main = do start <- getCurrentTime print $ q () end <- getCurrentTime print $ diffUTCTime end start
它产生以下信息:
COST CENTRE MODULE %time %alloc simplex3D Main 18.8 21.0 n Main 18.0 19.6 out Main 10.1 9.2 harmonicNoise3D Main 9.8 4.5 harmonic Main 6.4 5.8 int Main 4.0 2.9 gi3 Main 4.0 3.0 xyz2 Main 3.5 5.9 gi1 Main 3.4 3.4 gi0 Main 3.4 2.7 fastFloor Main 3.2 0.6 xyz1 Main 2.9 5.9 ijk Main 2.7 3.5 gi2 Main 2.7 3.3 xyz3 Main 2.6 4.1 iijjkk Main 1.6 2.5 dot3 Main 1.6 0.7
为了比较,我还将algorithm移植到C#中。 那里的performance快了大概3到4倍,所以我想我一定是做错了。 但即使如此,它还不如我想要的那么快。 所以我的问题是:任何人都可以告诉我是否有任何方法来加速我的实现和/或一般algorithm,或者有人知道一个不同的噪声algorithm,有更好的性能特点,但有相似的外观?
更新:
在遵循下面提供的一些build议之后,现在的代码如下所示:
module Noise ( Permutation, perm , noise3D, simplex3D ) where import Data.Bits import qualified Data.Vector.Unboxed as UV import System.Random import System.Random.Shuffle type Permutation = UV.Vector Int g3 :: Double g3 = 1/6 {-# INLINE int #-} int :: Int -> Double int = fromIntegral grad3 :: UV.Vector (Double, Double, Double) grad3 = UV.fromList $ [(1,1,0),(-1, 1,0),(1,-1, 0),(-1,-1, 0), (1,0,1),(-1, 0,1),(1, 0,-1),(-1, 0,-1), (0,1,1),( 0,-1,1),(0, 1,-1),( 0,-1,-1)] {-# INLINE dot3 #-} dot3 :: (Double, Double, Double) -> Double -> Double -> Double -> Double dot3 (a,b,c) xyz = a * x + b * y + c * z {-# INLINE fastFloor #-} fastFloor :: Double -> Int fastFloor x = truncate (if x > 0 then x else x - 1) --Generate a random permutation for use in the noise functions perm :: Int -> Permutation perm seed = UV.fromList . concat . replicate 2 . shuffle' [0..255] 256 $ mkStdGen seed --Generate 3D noise between -0.5 and 0.5 noise3D :: Permutation -> Double -> Double -> Double -> Double noise3D pxyz = 16 * (n gi0 (x0,y0,z0) + n gi1 xyz1 + n gi2 xyz2 + n gi3 xyz3) where (i,j,k) = (sx, sy, sz) where sa = fastFloor (a + (x + y + z) / 3) (x0,y0,z0) = (x - int i + t, y - int j + t, z - int k + t) where t = int (i + j + k) * g3 (i1,j1,k1,i2,j2,k2) = if x0 >= y0 then if y0 >= z0 then (1,0,0,1,1,0) else if x0 >= z0 then (1,0,0,1,0,1) else (0,0,1,1,0,1) else if y0 < z0 then (0,0,1,0,1,1) else if x0 < z0 then (0,1,0,0,1,1) else (0,1,0,1,1,0) xyz1 = (x0 - int i1 + g3, y0 - int j1 + g3, z0 - int k1 + g3) xyz2 = (x0 - int i2 + 2*g3, y0 - int j2 + 2*g3, z0 - int k2 + 2*g3) xyz3 = (x0 - 1 + 3*g3, y0 - 1 + 3*g3, z0 - 1 + 3*g3) (ii,jj,kk) = (i .&. 255, j .&. 255, k .&. 255) gi0 = rem (UV.unsafeIndex p (ii + UV.unsafeIndex p (jj + UV.unsafeIndex p kk ))) 12 gi1 = rem (UV.unsafeIndex p (ii + i1 + UV.unsafeIndex p (jj + j1 + UV.unsafeIndex p (kk + k1)))) 12 gi2 = rem (UV.unsafeIndex p (ii + i2 + UV.unsafeIndex p (jj + j2 + UV.unsafeIndex p (kk + k2)))) 12 gi3 = rem (UV.unsafeIndex p (ii + 1 + UV.unsafeIndex p (jj + 1 + UV.unsafeIndex p (kk + 1 )))) 12 {-# INLINE n #-} n gi (x',y',z') = (\a -> if a < 0 then 0 else a*a*a*a*dot3 (UV.unsafeIndex grad3 gi) x' y' z') $ 0.6 - x'*x' - y'*y' - z'*z' harmonic :: Int -> (Double -> Double) -> Double harmonic octaves noise = f octaves / (2 - 1 / int (2 ^ (octaves - 1))) where f 0 = 0 fo = let r = 2 ^^ (o - 1) in noise r / r + f (o - 1) --3D simplex noise --syntax: simplex3D permutation number_of_octaves wavelength xyz simplex3D :: Permutation -> Int -> Double -> Double -> Double -> Double -> Double simplex3D p octaves lxyz = harmonic octaves (\f -> noise3D p (x * f / l) (y * f / l) (z * f / l))
与减less我的块大小为8x8x128一起,产生新的地形块现在发生在约10-20fps,这意味着移动现在几乎不像以前那样有问题。 当然,任何其他性能改进仍然是受欢迎的。
最初突出的是你的代码是高度多态的。 您应该将浮点types专门化为Double
,所以GHC(和LLVM)有机会应用更积极的优化。
请注意,对于那些试图重现,这段代码导入:
import qualified Data.Vector as V import Data.Bits import Data.Time.Clock import System.Random import System.Random.Shuffle type Permutation = V.Vector Int
好。 有很多事情可以尝试改进这个代码。
改进
数据表示
- 专注于具体的浮点types,而不是多态的浮点函数
- 用unboxed triple
T !Double !Double !Double
replace元组(a,a,a)
T !Double !Double !Double
- 从
Data.Array
切换到Data.Array.Unboxed
Permutations
- 用
repa
包中的多维unboxed数组replace使用三元组的盒装数组
编译器标志
- 使用
-O2 -fvia-C -optc-O3 -fexcess-precision -optc-march=native
(或等同于-fllvm
)-fllvm
- 增加spec
-fspec-constr-count=16
阈值 –-fspec-constr-count=16
更高效的库函数
- 用mersenne-random代替StdGen生成随机数
- 用
rem
replacemod
- replace
V.!
使用未检查的索引VU.unsafeIndex
索引VU.unsafeIndex
(在移动到Data.Vector.Unboxed
运行时设置
- 增加默认分配区域:
-A20M
或-H
此外,请检查您的algorithm是否与C#相同,并使用相同的数据结构。