HASSAN, Akbar, NAWAZ, Tahir, ASADUZZAMAN, Md, HASAN, Mohammad, QURESHI, Waqar S and SHAFAIT, Faisal (2025) Lightweight Distilled Transformer-Based Vision Framework for Detection of Forest Fire and Smoke in Real-World Scenes. The Journal of Electronic Imaging (JEI), 34 (3). 033035. ISSN 1560-229X
Forest_Fire_Manuscript_Accepted_Version.pdf - Author's Accepted Version
Available under a Creative Commons Attribution 4.0 International (CC BY 4.0) license.
Abstract
Forest fires have become a ravaging threat, with incidents growing rapidly across the globe. Although several approaches to forest fire detection have been presented over the years, the need remains for an effective, computationally efficient, and unified vision-based solution that can easily be deployed on edge devices for real-world applications. To this end, we present a lightweight model based on a distilled vision transformer (D-ViT) that classifies forest imagery into fire, smoke, and normal scenarios. We used ResNet50 as a teacher model trained on the target dataset and a compressed D-ViT as a student model trained using the knowledge distillation (KD) approach. Unlike existing approaches, the proposed D-ViT framework is computationally efficient, with fewer trainable parameters, and is unified in that it detects both fire and smoke (whichever is dominant) at longer ranges in visible imagery. For experimental validation, we deployed the model on a Jetson Nano board and performed an extensive evaluation and analysis of the proposed framework on data collected from public online sources, which we have made available on request for use by the research community. The proposed D-ViT model achieves encouraging performance, with a processing speed of 18.84 frames per second (FPS) and 94% accuracy using soft distillation, an improvement over the 90% accuracy obtained with the ViT without distillation. A comparison with several other standard deep classification models also shows encouraging results, with a better trade-off between accuracy and computational efficiency.
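The soft-distillation objective mentioned in the abstract can be sketched in plain Python as a weighted sum of hard-label cross-entropy and a temperature-softened KL term between teacher and student outputs. This is a minimal illustrative sketch of the standard soft-distillation loss, not the paper's actual implementation; the temperature, weighting factor, and three-way class ordering (fire, smoke, normal) are assumptions.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature-scaled softmax over a list of raw logits.
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def soft_distillation_loss(student_logits, teacher_logits, true_label,
                           temperature=3.0, alpha=0.5):
    """Standard soft-distillation loss (illustrative values only):
    alpha * cross-entropy(student, hard label)
    + (1 - alpha) * T^2 * KL(softened teacher || softened student).
    Assumed class order: [fire, smoke, normal]."""
    p_student = softmax(student_logits)
    ce = -math.log(p_student[true_label])        # hard-label cross-entropy
    p_t = softmax(teacher_logits, temperature)   # softened teacher targets
    p_s = softmax(student_logits, temperature)   # softened student outputs
    kl = sum(pt * math.log(pt / ps) for pt, ps in zip(p_t, p_s))
    # T^2 scaling keeps the soft-target gradients on a comparable scale.
    return alpha * ce + (1 - alpha) * (temperature ** 2) * kl
```

When the student's logits match the teacher's exactly, the KL term vanishes and only the weighted cross-entropy remains, which is the intuition behind the student "inheriting" the teacher's output distribution.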
| Field | Value |
|---|---|
| Item Type | Article |
| Additional Information | © The Authors. Published by SPIE under a Creative Commons Attribution 4.0 International License. Distribution or reproduction of this work in whole or in part requires full attribution of the original publication, including its DOI. [DOI: 10.1117/1.JEI.34.3.033035] |
| Faculty | School of Digital, Technologies and Arts > Engineering |
| Depositing User | Md ASADUZZAMAN |
| Date Deposited | 12 Jun 2025 13:11 |
| Last Modified | 13 Jun 2025 04:30 |
| URI | https://eprints.staffs.ac.uk/id/eprint/9077 |