Lightweight Multimodal LLM-Enabled Cost-Effective Defect Grading of Power Transmission Equipment

arXiv cs.CL·Tao Wang, Lipeng Zhu, Jiayong Li, Feng Gao, Siwen Liang

5/29/2026

Quick Take

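This paper presents a multimodal large language model (MLLM) framework for defect grading of power transmission equipment. A commercial MLLM, guided by in-context learning, generates a small set of interpretable chain-of-thought question-answer (Q&A) pairs, which reduces the cost of manual annotation. These pairs are then used to fine-tune Qwen3-VL-8B with low-rank adaptation (LoRA), which achieves state-of-the-art performance on three defect grading tasks. The framework is designed to address the difficulty of integrating expert experience and the class imbalance that hamper existing defect grading methods.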

Key Points

Introduces a lightweight MLLM framework for defect grading of power transmission equipment.
Achieves state-of-the-art performance on three defect grading tasks with Qwen3-VL-8B.
Reduces manual annotation costs by generating high-quality Q&A pairs.
Demonstrates feasibility of multi-task joint fine-tuning within a single model.
Addresses class imbalance issues prevalent in existing defect grading methods.

Paper Resources

Read Paper (arxiv.org)
View PDF (arxiv.org)

Article Content

From source RSS / original summary

arXiv:2605.28822v1 (Announce Type: new)

Abstract: Defect grading of power transmission equipment (DGPTE) is crucial to the stability of electric energy transmission. Although existing machine learning methods are strong at defect detection, they struggle to integrate expert experience and suffer from class imbalance in the more fine-grained task of defect grading. To address these issues, this paper introduces a novel defect grading framework based on a multimodal large language model (MLLM).
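To make the class-imbalance difficulty concrete, here is a small sketch that compares plain accuracy with macro-averaged F1 on a made-up, heavily skewed set of defect grades. The grade names and counts are hypothetical, and the abstract does not say which metrics the paper reports; the point is only that a model which always predicts the majority grade can look accurate while failing on the rare, more severe grades.

```python
def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted grade matches the true grade."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1, so rare grades count as much as common ones."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for c in labels:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    return sum(scores) / len(scores)


# Hypothetical, heavily skewed grade labels (not data from the paper).
y_true = ["general"] * 90 + ["serious"] * 8 + ["critical"] * 2
# A degenerate model that always predicts the majority grade.
y_pred = ["general"] * 100

print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
print(f"macro-F1 = {macro_f1(y_true, y_pred):.2f}")
```

Here accuracy is 0.90 while macro-F1 is about 0.32, because the two minority grades each contribute an F1 of zero.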

Specifically, the approach uses in-context learning to get the most out of commercial MLLMs on DGPTE, yielding the state-of-the-art (SOTA) model. A secondary request to this model generates a small number of chain-of-thought-based question-answer pairs (Q&As), which effectively reduces the cost of manual annotation. These high-quality, interpretable Q&As are then used to train Qwen3-VL-8B through supervised fine-tuning (SFT) based on Low-Rank Adaptation (LoRA).
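The LoRA-based SFT step can be illustrated with a minimal, dependency-free sketch of the low-rank update from the original LoRA formulation, assuming the usual setup: a frozen weight W, trainable factors A and B of rank r, and a scaling factor alpha / r. The class and function names are hypothetical, the toy matrix is invented, and nothing here reflects the paper's actual rank, scaling, or target layers.

```python
import random


def matvec(M, x):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * xi for m, xi in zip(row, x)) for row in M]


class LoRALinear:
    """Linear layer with a low-rank update: y = W x + (alpha / r) * B (A x).

    W is the frozen pretrained weight (d_out x d_in); only A (r x d_in)
    and B (d_out x r) would be updated during fine-tuning.
    """

    def __init__(self, W, r, alpha, seed=0):
        rng = random.Random(seed)
        self.W = W
        self.d_out, self.d_in = len(W), len(W[0])
        self.r = r
        self.scale = alpha / r
        # Small random A and all-zero B, so the adapted layer starts out
        # computing exactly the same output as the pretrained layer.
        self.A = [[rng.gauss(0.0, 0.01) for _ in range(self.d_in)] for _ in range(r)]
        self.B = [[0.0] * r for _ in range(self.d_out)]

    def forward(self, x):
        base = matvec(self.W, x)                   # frozen path
        delta = matvec(self.B, matvec(self.A, x))  # low-rank trainable path
        return [b + self.scale * d for b, d in zip(base, delta)]

    def trainable_params(self):
        return self.r * (self.d_out + self.d_in)


def lora_param_count(d_out, d_in, r):
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix."""
    return r * (d_out + d_in)


W = [[1.0, 2.0, 0.0, -1.0],
     [0.5, 0.0, 1.0, 1.0],
     [0.0, -1.0, 2.0, 0.0]]
layer = LoRALinear(W, r=2, alpha=4)
print(layer.forward([1.0, 1.0, 1.0, 1.0]))  # same as matvec(W, x) before training

# Hypothetical 4096 x 4096 projection adapted at rank 16.
full = 4096 * 4096
lora = lora_param_count(4096, 4096, 16)
print(f"{lora} of {full} parameters ({100 * lora / full:.2f}%)")
```

Because B starts at zero, the adapted layer initially reproduces the pretrained output ([2.0, 2.5, 1.0] for the toy input). For the hypothetical 4096 x 4096 projection at rank 16, the adapter trains r(d_out + d_in) = 131,072 parameters instead of about 16.8 million, roughly 0.78%. This saving is what makes fine-tuning an 8B-parameter model comparatively lightweight.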

Experimental results on three DGPTE tasks show that fine-tuning only the language model layers yields SOTA performance. Furthermore, multi-task joint fine-tuning verifies that multiple grading tasks can be handled by a single lightweight MLLM.
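Both findings in this paragraph can be sketched on the data and configuration side: restricting LoRA adapters to language-model modules while the vision encoder stays frozen, and merging the samples of several grading tasks into one training set for a single model. The module names, the `language_model.` prefix, the choice of `q_proj`/`v_proj` targets, and the task and sample fields are all hypothetical placeholders; real module names depend on the model implementation, and the abstract does not describe the paper's exact configuration.

```python
import random
from collections import Counter

# Hypothetical module names for illustration only; real names depend on
# how a given vision-language model is implemented.
MODULE_NAMES = [
    "visual.blocks.0.attn.qkv",
    "visual.merger.mlp.0",
    "language_model.layers.0.self_attn.q_proj",
    "language_model.layers.0.self_attn.v_proj",
    "language_model.layers.0.mlp.up_proj",
]


def select_lm_targets(names, lm_prefix="language_model.", suffixes=("q_proj", "v_proj")):
    """Return LoRA target modules inside the language model only.

    Vision-encoder modules, and language-model modules that do not match
    one of the suffixes, are left without adapters (i.e. stay frozen).
    """
    return [n for n in names if n.startswith(lm_prefix) and n.endswith(suffixes)]


def build_joint_dataset(task_datasets, seed=0):
    """Merge samples from several grading tasks into one shuffled SFT set.

    Each sample keeps a "task" field so the prompt can tell the single
    model which grading task it is answering.
    """
    merged = [{"task": task, **sample}
              for task, samples in task_datasets.items()
              for sample in samples]
    random.Random(seed).shuffle(merged)
    return merged


print(select_lm_targets(MODULE_NAMES))

# Placeholder tasks and samples; the abstract does not name the three tasks.
tasks = {
    "task_1": [{"image": f"t1_{i}.jpg", "answer": "<rationale + grade>"} for i in range(3)],
    "task_2": [{"image": f"t2_{i}.jpg", "answer": "<rationale + grade>"} for i in range(2)],
    "task_3": [{"image": f"t3_{i}.jpg", "answer": "<rationale + grade>"} for i in range(1)],
}
joint = build_joint_dataset(tasks)
print(len(joint), Counter(s["task"] for s in joint))
```

Tagging each sample with its task lets one prompt template cover every grading task, so a single set of adapter weights is trained instead of one model per task.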

