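When a deep learning compiler compiles deep learning code, its backend applies a set of optimizations to the IR, and loop optimization is one of them. These backend optimizations improve the runtime efficiency of the generated code. The compilation flow of a deep learning compiler is shown in the figure below:
Loop optimization techniques: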
Loop fusion

    def sayHello():
        print("hello")

    def sayBye():
        print("bye")

    # Before fusion
    for i in range(1000000):
        sayHello()
    for i in range(1000000):
        sayBye()

    # After fusion (the two loops are merged into one; the calls now interleave)
    for i in range(1000000):
        sayHello()
        sayBye()

Loop reordering (loop reorder)
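
    # Before reordering
    for i in range(1000000):
        for j in range(100):
            pass

    # After reordering (the loop with fewer iterations drives the loop with
    # more iterations, so the inner loop is set up 100 times instead of
    # 1,000,000 times, which reduces loop overhead)
    for i in range(100):
        for j in range(1000000):
            pass

Loop unrolling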

    # Before unrolling
    for i in range(10):
        sayHello()

    # After unrolling (the loop body is repeated once per iteration)
    sayHello()
    sayHello()
    sayHello()
    sayHello()
    sayHello()
    sayHello()
    sayHello()
    sayHello()
    sayHello()
    sayHello()

Loop tiling

    # Before tiling
    def sum_2d_array(n, A):
        total = 0
        for i in range(n):
            for j in range(n):
                total += A[i][j]
        return total

    # After tiling (the n x n array is visited in block_size x block_size tiles)
    def sum_2d_array_tiled(n, A):
        total = 0
        block_size = 8
        for i in range(0, n, block_size):
            for j in range(0, n, block_size):
                # min(...) keeps the last tile in bounds when n is not a
                # multiple of block_size
                for bi in range(i, min(i + block_size, n)):
                    for bj in range(j, min(j + block_size, n)):
                        total += A[bi][bj]
        return total
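
Summing touches each element only once, so the example above shows the loop structure of tiling but not the data reuse that usually motivates it. As a hedged illustration, the sketch below applies the same blocking to matrix multiplication, where a compiled kernel can reuse each tile of the inputs while it is still in cache. The function names, the default `block_size` of 4, and the test matrices are assumptions made for this example and do not come from the original. Pure Python will not show a speedup, so the sketch only demonstrates the loop structure and checks that the result is unchanged.

```python
def matmul_naive(A, B):
    """Reference triple loop: C = A @ B for lists of lists."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_tiled(A, B, block_size=4):
    """Same computation, with all three loops split into tiles."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0] * p for _ in range(n)]
    # Outer loops step from tile to tile.
    for i0 in range(0, n, block_size):
        for j0 in range(0, p, block_size):
            for k0 in range(0, m, block_size):
                # Inner loops stay inside one tile; min(...) clamps the
                # last, possibly partial, tile to the matrix bounds.
                for i in range(i0, min(i0 + block_size, n)):
                    for j in range(j0, min(j0 + block_size, p)):
                        acc = C[i][j]
                        for k in range(k0, min(k0 + block_size, m)):
                            acc += A[i][k] * B[k][j]
                        C[i][j] = acc
    return C


# Sizes are deliberately not multiples of block_size (assumed test data).
A = [[i + j for j in range(5)] for i in range(6)]  # 6 x 5
B = [[i * j for j in range(7)] for i in range(5)]  # 5 x 7
assert matmul_tiled(A, B, block_size=4) == matmul_naive(A, B)
```

The test matrices use integers so that the tiled and untiled results can be compared exactly; with floating-point inputs the different summation order could introduce small rounding differences.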