This is a follow-up to my previous post, where I ported a handwritten-digit recognition network to Vulkan. This will be a short, non-technical article with a few thoughts and observations.
Intro
Now that I have a working application, I wanted to try running it on my mobile phone. The beauty of C++ and Vulkan is that porting is very straightforward. I quickly vibe-coded an Android wrapper, and the main application code didn’t require any changes - it just worked.
What surprised me the most was that I could use my old Xiaomi Mi 9. It’s so old that it’s stuck on Android 10 with no possibility of upgrading; its last security patch is dated January 1, 2021. Fortunately, I decided early on to target Vulkan 1.0 without relying on any extensions, so the application runs perfectly even on such an old device.
I did have to fix a couple of bugs. For example, I forgot to reset command buffers and instead allocated a new one for every training batch. On a desktop GPU with gigabytes of memory, I never hit any limitations. On the phone, however, I quickly ran out of memory inside the command pool. Fortunately, the fix was trivial.
Everything else (dispatch, synchronization, pipelines, and shaders) worked exactly as expected. So there’s not much more to say about the port itself.
Performance
Of course, the compute power of my phone is only a tiny fraction of a desktop NVIDIA monster, so there are no miracles here. Training starts automatically when the application launches, and it takes roughly nine minutes to train the network.
Inference performance is also nowhere near the desktop version. However, the network is small enough that the difference is largely irrelevant in practice, as predictions still appear instantaneous to the user.
Conclusion
The point of this experiment is simple: it is absolutely possible to train and run a neural network on a mobile device.
With C++ and Vulkan, it really feels like “write once, run everywhere.” When you think about the combined compute power of billions of mobile devices around the world, it’s fascinating to imagine what could be achieved if even a small fraction of that power could be used for distributed neural network computations.
I am using a static site generator (Hugo) to build this site, so there is no comment section here. As a personal experiment, I published a short post on LinkedIn pointing to this article. If you have a question, you can ask it there, and you can also follow me there for updates.