INSPIRA

Energy-Efficient Machine Learning for Edge Devices: A Practical Guide to Sustainable On-Device Inference

Dr. Rajani Kumari Gora & Dr. Reena Sangwan

The rapid growth of on-device artificial intelligence has transformed how we use smart devices. Running machine learning models locally on edge systems such as smartphones, smart cameras, and home automation nodes enables low-latency decisions, saves network bandwidth, and protects user privacy. However, modern deep learning models demand large amounts of memory and computation, which makes them difficult to deploy on small edge hardware with limited battery capacity and restricted memory. To address this problem, this paper presents a practical guide to 'Green Machine Learning' techniques for running neural networks efficiently on low-power devices. We investigate three key optimization methods: reducing numerical precision (quantization), removing redundant model parameters (pruning), and training compact models under the guidance of larger ones (knowledge distillation). Using these methods, we evaluate an optimized image recognition model on two physical systems representing different edge tiers: a high-performance single-board computer and a resource-constrained microcontroller. Our findings show that combining pruning with 8-bit quantization reduces model size by up to 84.5% and speeds up inference by a factor of 4.3, with less than a 1.2% drop in accuracy. On microcontrollers, these compression techniques reduced energy consumption by over 90% and allowed the model to fit and execute within very tight memory limits. Finally, we discuss the trade-offs between performance and accuracy, and outline future directions such as dynamic security and runtime adaptations.
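As a rough illustration of two of the techniques named in the abstract, the following minimal sketch applies magnitude pruning followed by symmetric 8-bit quantization to a list of random weights. It is not the authors' pipeline: the function names, the 50% sparsity level, the per-tensor scaling scheme, and the random weights are all assumptions chosen for illustration, and knowledge distillation is not covered.

```python
import random

def magnitude_prune(weights, sparsity):
    """Zero out the smallest-magnitude fraction `sparsity` of the weights."""
    k = int(len(weights) * sparsity)  # number of weights to remove
    if k == 0:
        return list(weights)
    # Indices sorted by absolute value; the first k are the least important.
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    drop = set(order[:k])
    return [0.0 if i in drop else w for i, w in enumerate(weights)]

def quantize_int8(weights):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = [max(-127, min(127, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    """Map the 8-bit integers back to approximate float values."""
    return [v * scale for v in q]

# Hypothetical stand-in for one layer's weights (assumption, not paper data).
random.seed(0)
weights = [random.gauss(0.0, 0.1) for _ in range(1000)]

pruned = magnitude_prune(weights, sparsity=0.5)
q, scale = quantize_int8(pruned)
restored = dequantize(q, scale)

zeros = sum(1 for v in q if v == 0)
max_err = max(abs(a - b) for a, b in zip(pruned, restored))
fp32_bytes = 4 * len(weights)  # 32-bit floats, dense storage
int8_bytes = 1 * len(weights)  # 8-bit integers, dense storage

print(f"zeroed weights: {zeros} of {len(q)}")
print(f"dense storage: {fp32_bytes} bytes -> {int8_bytes} bytes")
print(f"max quantization error: {max_err:.6f} (step size {scale:.6f})")
```

In this sketch, quantization alone shrinks dense storage by a factor of four. Any further reduction from pruning depends on how the zeroed weights are stored or removed, which the sketch does not model.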

Keywords: Green AI, Edge Inference, Quantization, Pruning, Low-Power Computing, TinyML

Gora, R. & Sangwan, R. (2026). Energy-Efficient Machine Learning for Edge Devices: A Practical Guide to Sustainable On-Device Inference. International Journal of Education, Modern Management, Applied Science & Social Science, 08(02(I)), 151–156. https://doi.org/10.62823/IJEMMASSS/8.2(I).9019


INTERNATIONAL JOURNAL OF EDUCATION, MODERN MANAGEMENT, APPLIED SCIENCE & SOCIAL SCIENCE (IJEMMASSS) [ Vol. 8 | No. 2 (I) | April - June, 2026 ]
