Video Merging with AVFoundation and OpenGL ES on iOS: Optimization With Instruments
Jan 10, 2015
Now that we have pretty much all the functionality working, next we need to do some performance optimization.
But first, let’s do some testing. We change the source video from 640x640 to 1280x720. The reason for choosing 1280x720 is that Action Movie FX, the app we mentioned before, uses this resolution for video it gets from the camera. For the other videos, FX and Alpha, we’re gonna use 960x720, which is the high-resolution video the app uses. All movies are available on GitHub. Each movie is about 5.52 seconds long.
Before we did any optimization, our project kept throwing “Low memory warning” and sometimes even crashed. If you wonder how GPUImage performs, well, it just crashes immediately. Obviously this is not acceptable.
After the optimization we’re gonna do in this article, our project works quite well, with no crashes, and we got some comparison results:
- iPod5: 5.7 seconds; Action Movie FX: 10.6 seconds
- iPhone4S: 6.7 ~ 7.38 seconds; Action Movie FX: 17.28 seconds
- iPhone4: 12.0 ~ 13.0 seconds; Action Movie FX: 36 seconds
Device | AVFoundation + OpenGL ES | Action Movie FX |
---|---|---|
iPhone4 | 12.0s ~ 13.0s | 36.1s |
iPhone4S | 6.7s ~ 7.38s | 17.28s |
iPod5 | 5.7s | 10.6s |
As you can see, the difference is quite big: our version is roughly two to three times faster on every device.
The complete code is on GitHub: ThreeVidoes-Final. You’re free to download and try it.
OpenGL ES Analyzer
Apple has great documentation covering this topic: Tuning Your OpenGL ES App.
Open Xcode, choose “Product -> Profile”, select OpenGL ES Analyzer in the Instruments dialog, then click the run button in the top-left part of Instruments.
Here is the result we got:
We can list all the points it reported:
1. Redundant Call
2. Recommend Using VBO
3. Uninitialized Texture Data
4. Logical Buffer Store
5. CPU wait on GPU for Finish
6. Draw Call Accessed Vertex Attributes
7. Recommend Using VAO
8. Logical Buffer Load
As far as I know, Recommend Using VBO and Recommend Using VAO are the easiest to fix, so I will start with those. Another reason is that other points, like Uninitialized Texture Data and CPU wait on GPU for Finish, are either not very important or deliberately left that way. As for Logical Buffer Load and Logical Buffer Store, to be honest, I’m not quite sure what they are about, and I hope someone could shed some light on them :)
Recommend Using VBO
In VideoWriter, here is the current implementation in the kickoffRecording method:
Well, what is a VBO, and how do we use one?
According to Vertex Buffer Objects:
A Vertex Buffer Object (VBO) is an OpenGL feature that provides methods for uploading vertex data (position, normal vector, color, etc.) to the video device for non-immediate-mode rendering. VBOs offer substantial performance gains over immediate mode rendering primarily because the data resides in the video device memory rather than the system memory and so it can be rendered directly by the video device.
Also, Apple has a great article on it: Best Practices for Working with Vertex Data, which covers both VBOs and the VAOs that we’re gonna talk about later.
If it still feels a little hard to understand and you would like some demo code to go along with it, RayWenderlich has a great blog post on it: OpenGL Tutorial for iOS: OpenGL ES 2.0.
Okay, let’s start coding by declaring some constants and structs:
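As a rough idea of what such declarations can look like, here is a minimal sketch in plain C. It assumes a hypothetical interleaved `Vertex` struct (2D position plus texture coordinate) and a full-screen quad; the names `Vertex`, `kQuadVertices` and the helper functions are made up for illustration and are not taken from the project.

```c
#include <stddef.h>  /* offsetof, size_t */

/* Hypothetical interleaved vertex: clip-space position + texture coordinate. */
typedef struct {
    float position[2];
    float texCoord[2];
} Vertex;

/* Full-screen quad, ordered for drawing as a triangle strip. */
const Vertex kQuadVertices[4] = {
    {{-1.0f, -1.0f}, {0.0f, 0.0f}},
    {{ 1.0f, -1.0f}, {1.0f, 0.0f}},
    {{-1.0f,  1.0f}, {0.0f, 1.0f}},
    {{ 1.0f,  1.0f}, {1.0f, 1.0f}},
};

/* Distance in bytes from one vertex to the next in the buffer. */
size_t vertex_stride(void)   { return sizeof(Vertex); }

/* Byte offsets of each attribute inside one vertex. */
size_t position_offset(void) { return offsetof(Vertex, position); }
size_t texcoord_offset(void) { return offsetof(Vertex, texCoord); }

/* Total number of bytes the quad occupies. */
size_t quad_byte_size(void)  { return sizeof(kQuadVertices); }
```

With a buffer bound to GL_ARRAY_BUFFER, these are the numbers the GL calls need: the total byte size goes to glBufferData, while the stride and the per-attribute byte offset (cast to a pointer) go to glVertexAttribPointer.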
Then we create a setupVBO method in the initial setup code:
Finally, in the method that kicks off video writing:
That’s all it takes to use a VBO to improve vertex performance. Nothing fancy.
Run the code to make sure everything still works.
Recommend Using VAO
Cool, we have already implemented vertex buffer objects, but that’s not the end. We can keep going by introducing VAOs (Vertex Array Objects).
Remove the setupVBO method and add a new method, setupVAO:
Then we need to slightly modify the rendering code:
There you go. Run the code to see if there are any differences. Now you have finished the vertex optimization: these warnings no longer show up in Instruments, and it got much faster.
Conclusion
This is the last article in this series.
I have done some other cleanup as well. But for the other points Instruments listed, I don’t quite understand how to improve them. I hope someone could share some experience and make it better.
Besides that, you could use kCVPixelFormatType_420YpCbCr8BiPlanarFullRange instead of kCVPixelFormatType_32BGRA for better performance. Since I don’t have much time for it, I’m just gonna leave it for someone who is interested in this :)
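One likely reason the bi-planar format helps (an assumption on my side, not something measured here) is that each frame is simply much smaller. The sketch below does the arithmetic for tightly packed frames; the function names are made up for illustration, and real pixel buffers may pad each row, so treat the numbers as lower bounds.

```c
#include <stddef.h>  /* size_t */

/* One tightly packed 32-bit BGRA frame: 4 bytes per pixel. */
size_t bgra_frame_bytes(size_t width, size_t height) {
    return width * height * 4;
}

/* One tightly packed 8-bit 4:2:0 bi-planar frame:
   a full-resolution Y plane plus a half-width, half-height plane
   of interleaved Cb/Cr pairs (2 bytes per chroma sample). */
size_t yuv420_biplanar_frame_bytes(size_t width, size_t height) {
    size_t luma   = width * height;
    size_t chroma = ((width + 1) / 2) * ((height + 1) / 2) * 2;
    return luma + chroma;
}
```

For a 1280x720 frame this gives 3,686,400 bytes for BGRA versus 1,382,400 bytes for 4:2:0 bi-planar, so about 1.5 bytes per pixel instead of 4. The trade-off is that the fragment shader then has to sample the two planes and convert YCbCr to RGB itself.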