Instruction Finetuning DeepSeek-R1-8B Model Using LoRA and NEFTune
Quick Answer
DeepSeek-R1-8B, fine-tuned with LoRA and NEFTune, reaches a micro-F1 score of 0.912 on financial NER, outperforming Llama3-8B and the other baselines tested.
Quick Take
General-purpose LLMs often misclassify financial entities or miss domain-specific patterns in unstructured reports and news. Instruction fine-tuning DeepSeek-R1-8B with LoRA and NEFTune targets that problem, which is relevant to financial analysts and researchers who build structured knowledge graphs from such text.
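The headline metric, micro-F1, pools true positives, false positives and false negatives across all entity types before computing precision and recall, so frequent types weigh more than rare ones. The sketch below shows that arithmetic only; the function name and the toy counts are hypothetical and are not taken from the paper.

```python
def micro_f1(counts):
    """Micro-averaged F1 from per-type counts.

    counts maps an entity type to a (true_pos, false_pos, false_neg) tuple.
    """
    tp = sum(c[0] for c in counts.values())
    fp = sum(c[1] for c in counts.values())
    fn = sum(c[2] for c in counts.values())
    if tp == 0:
        return 0.0  # no correct predictions at all
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical counts for illustration only, NOT results from the paper.
toy_counts = {"Company": (8, 2, 1), "Money": (4, 0, 3)}
score = micro_f1(toy_counts)
print(round(score, 3))
```

With these toy counts the pooled totals are 12 true positives, 2 false positives and 4 false negatives, so precision is 12/14 and recall is 12/16.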
Key Points
- DeepSeek-R1-8B is instruction fine-tuned with LoRA and NEFTune for financial NER.
- LoRA alone reaches a micro-F1 of 0.901 across seven entity types; adding NEFTune raises it to 0.912.
- The tuned model outperforms Llama3-8B, Qwen3-8B, Baichuan2-7B, T5, and BERT-Base.
- The study uses a corpus of 1693 annotated samples, each converted into an instruction-input-output triple.
- Addresses misclassification of financial entities in unstructured texts.
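As a rough illustration of the LoRA idea named in the key points, the sketch below adds a trainable low-rank update B·A to a frozen weight matrix W, scaled by alpha / r. This is the general LoRA formulation, not the paper's training code; the dimensions, rank, and alpha value are assumptions chosen for the example, and pure-Python lists stand in for real tensors.

```python
import random


def matvec(matrix, vec):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def lora_forward(x, W, A, B, alpha):
    """Compute W x + (alpha / r) * B (A x), where r is the rank (rows of A)."""
    r = len(A)
    base = matvec(W, x)                 # frozen pretrained path
    low_rank = matvec(B, matvec(A, x))  # trainable low-rank path
    scale = alpha / r
    return [b + scale * u for b, u in zip(base, low_rank)]


random.seed(0)
d_in, d_out, r = 64, 64, 4  # assumed toy sizes, not the model's real ones

W = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(d_out)]
A = [[random.gauss(0, 0.01) for _ in range(d_in)] for _ in range(r)]  # r x d_in
B = [[0.0] * r for _ in range(d_out)]  # d_out x r, zero-initialized
x = [random.gauss(0, 1) for _ in range(d_in)]

y0 = lora_forward(x, W, A, B, alpha=16)

trainable = r * d_in + d_out * r  # parameters in A and B
full = d_in * d_out               # parameters in W
print(trainable, full)
```

Because B starts at zero, the adapted layer initially reproduces the frozen layer exactly, and only the much smaller A and B matrices need gradients during fine-tuning.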
Article Excerpt
From source RSS / original summary: arXiv:2606.10392v1. Abstract: Financial named-entity recognition (NER) is essential for translating unstructured financial reports and news into structured knowledge graphs. However, general-purpose large language models (LLMs) often misclassify financial entities or ignore domain-specific patterns. This paper investigates the use of DeepSeek-R1-8B, a recent open-source large language model, combined with Low-Rank Adaptation (LoRA) and Noisy Embedding Fine-Tuning (NEFTune) for financial NER.
Each annotated sentence in our corpus of 1693 samples is converted into an instruction-input-output triple. We insert lightweight LoRA matrices into the Transformer layers and apply NEFTune to improve generalisation by adding uniform noise to embedding vectors during training. Experiments show that the LoRA-adapted DeepSeek-R1-8B achieves a micro-F1 of 0.901 on seven entity types (Company, Date, Location, Money, Person, Product and Quantity), and adding NEFTune further boosts the micro-F1 to 0.912, outperforming Llama3-8B, Qwen3-8B, Baichuan2-7B, T5 and BERT-Base baselines.
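The two data-side steps in the abstract, building instruction-input-output triples and adding uniform noise to embeddings, can be sketched as follows. The prompt wording, the output format, the example sentence, and the alpha value are all assumptions made for illustration, since the excerpt does not specify them. The noise scale alpha / sqrt(L · d), for sequence length L and embedding dimension d, follows the usual NEFTune formulation rather than anything stated in this summary.

```python
import math
import random

ENTITY_TYPES = ["Company", "Date", "Location", "Money",
                "Person", "Product", "Quantity"]


def to_triple(sentence, entities):
    """Turn one annotated sentence into an instruction-input-output dict.

    entities is a list of (text, label) pairs. The prompt template and the
    output format are hypothetical.
    """
    instruction = ("Extract the named entities from the sentence and label "
                   "each as one of: " + ", ".join(ENTITY_TYPES) + ".")
    output = "; ".join(f"{text} -> {label}" for text, label in entities)
    return {"instruction": instruction, "input": sentence, "output": output}


def neftune_noise(embeddings, alpha, rng):
    """Add Uniform(-1, 1) noise scaled by alpha / sqrt(L * d) to embeddings.

    embeddings is an L x d list of lists. Apply during training only.
    """
    seq_len, dim = len(embeddings), len(embeddings[0])
    scale = alpha / math.sqrt(seq_len * dim)
    return [[v + scale * rng.uniform(-1.0, 1.0) for v in row]
            for row in embeddings]


# Hypothetical annotated sentence, not from the paper's corpus.
triple = to_triple(
    "Acme Corp paid $5 million to Jane Doe in Paris.",
    [("Acme Corp", "Company"), ("$5 million", "Money"),
     ("Jane Doe", "Person"), ("Paris", "Location")],
)

rng = random.Random(0)
clean = [[0.1 * i + 0.01 * j for j in range(8)] for i in range(5)]  # L=5, d=8
noisy = neftune_noise(clean, alpha=5.0, rng=rng)
max_shift = max(abs(n - c) for nr, cr in zip(noisy, clean)
                for n, c in zip(nr, cr))
print(triple["output"])
```

Dividing by sqrt(L · d) keeps the overall noise magnitude roughly independent of sequence length and embedding width, so a single alpha can be tuned per dataset.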