C release build

Operation (resolution H*W: 1280 x 720) | Time (ms) | Basic operations included |
---|---|---|
3*3 avg filter, naive | 1.02 | read/write, 8 add, 1 div |
5*5 avg filter, naive | 2.54 | read/write, 24 add, 1 div |
3*3 winsum avg filter | 0.93 | read/write, 2 add, 1 div, plus window setup |
5*5 winsum avg filter | 0.94 | read/write, 2 add, 1 div, plus window setup |
2-to-1 downsample | 0.23 | read/write |
1/4 H*W upsample (video filter) | 1.52 | read/write, 4 add, 5 div |
1/16 H*W upsample (video filter) | 1.51 | read/write, 4 add, 5 div |
1/4 H*W upsample (specialRatio_bilinear up) | 0.60 | read/write, 4 add, 1 div |
1/16 H*W upsample (specialRatio_bilinear up) | 0.44 | read/write, 4 add, 1 div |
3*3 avg filter at 1/4 H*W, naive | 0.26 | |
5*5 avg filter at 1/4 H*W, winsum | 0.24 | |
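
The near-identical timings of the 3*3 and 5*5 winsum rows can be illustrated with a minimal sketch of one common sliding-window-sum box filter, shown next to a naive version. This is an assumption about what "winsum" refers to, not the code that was measured. The names `box_filter_naive` and `box_filter_winsum`, the 8-bit single-channel layout, and the copy-through border handling are hypothetical choices made for the example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Naive box filter (hypothetical helper): sums all k*k pixels for every output.
   Pixels closer than r to an edge are copied through unchanged. */
void box_filter_naive(const uint8_t *src, uint8_t *dst, int w, int h, int r)
{
    assert(r >= 0 && w > 2 * r && h > 2 * r);
    const int area = (2 * r + 1) * (2 * r + 1);
    memcpy(dst, src, (size_t)w * h);
    for (int y = r; y < h - r; ++y) {
        for (int x = r; x < w - r; ++x) {
            int sum = 0;
            for (int dy = -r; dy <= r; ++dy)
                for (int dx = -r; dx <= r; ++dx)
                    sum += src[(y + dy) * w + (x + dx)];
            dst[y * w + x] = (uint8_t)(sum / area);
        }
    }
}

/* Window-sum box filter (hypothetical helper): the per-pixel work does not
   depend on r. Returns 0 on success, -1 if the scratch buffer cannot be
   allocated. Same border handling as the naive version. */
int box_filter_winsum(const uint8_t *src, uint8_t *dst, int w, int h, int r)
{
    assert(r >= 0 && w > 2 * r && h > 2 * r);
    const int k = 2 * r + 1;
    const int area = k * k;
    int *colsum = calloc((size_t)w, sizeof *colsum);
    if (!colsum)
        return -1;
    memcpy(dst, src, (size_t)w * h);

    /* Window setup: per-column sums over the first k rows. */
    for (int y = 0; y < k; ++y)
        for (int x = 0; x < w; ++x)
            colsum[x] += src[y * w + x];

    for (int y = r; y < h - r; ++y) {
        if (y > r) {
            /* Slide the window down: add the entering row, drop the leaving row. */
            const uint8_t *in = src + (y + r) * w;
            const uint8_t *out = src + (y - r - 1) * w;
            for (int x = 0; x < w; ++x)
                colsum[x] += in[x] - out[x];
        }
        /* Horizontal setup for this row: sum of the first k column sums. */
        int sum = 0;
        for (int x = 0; x < k; ++x)
            sum += colsum[x];
        dst[y * w + r] = (uint8_t)(sum / area);
        /* Slide right: one add and one subtract per output pixel. */
        for (int x = r + 1; x < w - r; ++x) {
            sum += colsum[x + r] - colsum[x - r - 1];
            dst[y * w + x] = (uint8_t)(sum / area);
        }
    }
    free(colsum);
    return 0;
}
```

In this sketch only the setup loops depend on the kernel size, which is consistent with the winsum timings staying flat between 3*3 and 5*5 while the naive timings grow.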

Speedup from hand-written SIMD on ARMv8

Operation (resolution H*W: 1280 x 720) | Time (ms) | Basic operations included |
---|---|---|
3*3 avg filter, naive | | |
5*5 avg filter, naive | | |
3*3 winsum avg filter | 0.13 | |
5*5 winsum avg filter | 0.14 | |
2-to-1 downsample | | |
1/4 H*W upsample (video filter) | | |
1/16 H*W upsample (video filter) | | |
1/4 H*W upsample (specialRatio_bilinear up) | | |
1/16 H*W upsample (specialRatio_bilinear up) | | |
3*3 avg filter at 256x144, winsum | 0.01 | |
5*5 avg filter at 256x144, winsum | 0.01 | |
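
One part of a window-sum filter that maps naturally onto SIMD is the vertical update of the per-column sums. The sketch below is an illustration under stated assumptions, not the measured implementation: the function name `colsum_slide` and the 16-bit accumulator layout are hypothetical, and the scalar loop handles the tail as well as builds without NEON.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* colsum[x] += row_in[x] - row_out[x] for x in [0, n).
   Hypothetical helper: a uint16_t accumulator can hold the sum of up to
   257 rows of 8-bit pixels, and unsigned wrap-around keeps the
   add-then-subtract order safe. */
void colsum_slide(uint16_t *colsum, const uint8_t *row_in,
                  const uint8_t *row_out, size_t n)
{
    assert(colsum && row_in && row_out);
    size_t x = 0;
#if defined(__ARM_NEON)
    /* 8 columns per iteration using widening add / widening subtract. */
    for (; x + 8 <= n; x += 8) {
        uint16x8_t acc = vld1q_u16(colsum + x);
        acc = vaddw_u8(acc, vld1_u8(row_in + x));
        acc = vsubw_u8(acc, vld1_u8(row_out + x));
        vst1q_u16(colsum + x, acc);
    }
#endif
    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; x < n; ++x)
        colsum[x] = (uint16_t)(colsum[x] + row_in[x] - row_out[x]);
}
```

The widening intrinsics `vaddw_u8` and `vsubw_u8` avoid a separate 8-bit to 16-bit conversion step before the arithmetic.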
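
Common SIMD operations (intrinsics)

mla: a combined multiply-add or multiply-subtract is faster than issuing the multiply and the add/subtract separately.

Loops are time-consuming.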
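
Both notes above can be sketched in one small routine: a widening multiply-accumulate intrinsic replaces a separate multiply and add, and processing two vectors per iteration halves the number of loop iterations. This is a minimal sketch under assumptions, not the author's code: the function name `weighted_accumulate`, the choice of `vmlal_u8`, and the unroll factor of 2 are all hypothetical. The multiply-subtract counterpart would be `vmlsl_u8`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* acc[i] += src[i] * weight for i in [0, n), modulo 2^16.
   Hypothetical helper for illustration only. */
void weighted_accumulate(uint16_t *acc, const uint8_t *src,
                         uint8_t weight, size_t n)
{
    assert(acc && src);
    size_t i = 0;
#if defined(__ARM_NEON)
    const uint8x8_t w = vdup_n_u8(weight);
    /* Unrolled by 2: 16 pixels per iteration, so fewer counter updates
       and branches for the same amount of work. */
    for (; i + 16 <= n; i += 16) {
        uint16x8_t a0 = vld1q_u16(acc + i);
        uint16x8_t a1 = vld1q_u16(acc + i + 8);
        a0 = vmlal_u8(a0, vld1_u8(src + i), w);     /* a0 += src * w */
        a1 = vmlal_u8(a1, vld1_u8(src + i + 8), w); /* a1 += src * w */
        vst1q_u16(acc + i, a0);
        vst1q_u16(acc + i + 8, a1);
    }
#endif
    /* Scalar tail, and the whole loop when NEON is not available. */
    for (; i < n; ++i)
        acc[i] = (uint16_t)(acc[i] + src[i] * weight);
}
```

Whether unrolling helps in practice depends on the compiler and the core, so any gain should be confirmed by measurement on the target device.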