Improving Quantized Model Performance in Qualitative Analysis with Multi-Pass Prompt Verification

arXiv cs.CL·Aisvarya Adeseye, Jouni Isoaho, Adeyemi Adeseye


Quick Take

This study enhances quantized LLM performance in qualitative analysis using multi-pass prompt verification.

Key Points

Examines quantization effects on LLaMA-3.1 performance.
Proposes a method to reduce hallucinations in low-bit models.
8-bit models yield closest results to gold-standard ground truth.


[Submitted on 4 Apr 2026]


Abstract: Quantized Large Language Models (LLMs) are increasingly used in qualitative analysis because they run fast and need fewer computing resources. This study examines how low-bit quantization levels (8-bit, 4-bit, 3-bit, and 2-bit) and quantization types affect the performance of LLaMA-3.1 (8B) on qualitative analysis. The study uses expert and non-expert responses from 82 interview transcripts. Low-bit models often hallucinate more and produce unstable results, especially when reading non-expert language with unclear terms. To improve performance, we propose a quantization-aware multi-pass prompt verification method. This method guides the model through controlled steps that reduce hallucinations. It removes unreliable content and, after verification, passes the results on to the next transcript, improving accuracy. To validate performance, human coders analyzed the transcripts using NVivo and BF16 LLaMA-3.1. The BF16 model produced high-precision output but showed semantic drift and hallucination; these errors were corrected manually. The corrected BF16 output and the NVivo human coding were combined to create a gold-standard ground truth (GSGT) for thematic extraction and frequency analysis. The results show that 8-bit models stay closest to the GSGT. The 4-bit models lose accuracy but become stable when the proposed method is applied. The 3-bit and 2-bit models drop in performance because of heavy compression, but they improve with the proposed prompt design and verification. The study also finds that models at the same bit level behave differently depending on quantization type. Overall, the method helps low-resource LLMs become more stable, accurate, and suitable for qualitative research at lower cost.
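
The abstract does not spell out the prompts or the verification criteria, so the following is only a minimal sketch of the general shape of a multi-pass loop: extract candidate themes, drop the ones that fail a check, and carry only the verified themes forward to the next transcript. Every name here (`toy_extract`, `verify_themes`, `multi_pass`) is hypothetical, the word-overlap check is an assumed stand-in for the paper's verification step, and `toy_extract` stands in for a call to a quantized model.

```python
from typing import Callable, Dict, List


def verify_themes(candidates: List[str], transcript: str) -> List[str]:
    """Assumed verification pass: keep a theme only if every one of its
    words occurs in the transcript (a crude substring grounding check)."""
    text = transcript.lower()
    return [t for t in candidates if all(w in text for w in t.lower().split())]


def multi_pass(
    transcripts: List[str],
    extract: Callable[[str, List[str]], List[str]],
    verify: Callable[[List[str], str], List[str]] = verify_themes,
) -> Dict[str, int]:
    """Run extract-then-verify on each transcript and count verified themes."""
    carried: List[str] = []      # verified themes passed on to the next transcript
    counts: Dict[str, int] = {}  # theme -> number of transcripts it was verified in
    for transcript in transcripts:
        # Pass 1: candidate extraction, given previously verified themes as context.
        candidates = extract(transcript, carried)
        # Pass 2: verification removes content the transcript does not support.
        kept = verify(candidates, transcript)
        for theme in kept:
            counts[theme] = counts.get(theme, 0) + 1
            if theme not in carried:
                carried.append(theme)
    return counts


def toy_extract(transcript: str, context: List[str]) -> List[str]:
    """Hypothetical stand-in for a quantized LLM call. It ignores `context`
    and always appends one unsupported theme to mimic a hallucination."""
    found = [kw for kw in ("cost", "privacy", "trust") if kw in transcript.lower()]
    return found + ["blockchain"]


if __name__ == "__main__":
    demo = [
        "Cost is my main worry, and privacy too.",
        "I do not trust it; cost matters.",
    ]
    print(multi_pass(demo, toy_extract))
```

In this toy run the unsupported theme never reaches the counts or the carried context, which is the property the abstract attributes to the verification step. A real setup would replace `toy_extract` with a model call and would likely need a stronger check than substring matching.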

Comments:	Accepted for publication at the 12th Intelligent Systems Conference 2026; 3-4 September 2026 in Amsterdam, The Netherlands
Subjects:	Computation and Language (cs.CL); Artificial Intelligence (cs.AI); Machine Learning (cs.LG)
Cite as:	arXiv:2605.20193 [cs.CL]
	(or arXiv:2605.20193v1 [cs.CL] for this version)
	https://doi.org/10.48550/arXiv.2605.20193 arXiv-issued DOI via DataCite

Submission history

From: Aisvarya Adeseye Mrs
[v1] Sat, 4 Apr 2026 04:50:03 UTC (8,180 KB)

— Originally published at arxiv.org
