VectraYX-Nano: A 42M-Parameter Spanish Cybersecurity Language Model with Curriculum Learning and Native Tool Use
Quick Answer
VectraYX-Nano is a 41.95M-parameter Spanish cybersecurity language model that utilizes curriculum learning and native tool invocation, achieving a conversational gate of 0.78 after extensive training on a diverse 170M-token corpus.
Quick Take
VectraYX-Nano is a 41.95M-parameter Spanish cybersecurity language model that utilizes curriculum learning and native tool invocation, achieving a conversational gate of 0.78 after extensive training on a diverse 170M-token corpus. This model is the first of its kind with end-to-end integration and operates efficiently on commodity hardware.
Key Points
- Trained on a 170M-token corpus focused on cybersecurity and conversational data.
- Utilizes advanced architecture features like GQA, QK-Norm, and RMSNorm.
- Achieved a loss reduction from 9.80 to 2.16 through continual pre-training.
- First Spanish-native cybersecurity LLM with Model Context Protocol integration.
- GGUF artifact is 81 MB and runs efficiently on standard hardware.
Paper Resources
Article Content
From source RSS / original summaryarXiv:2605. 13989v1 Announce Type: new Abstract: We present VectraYX-Nano, a 41. 95M-parameter decoder-only language model trained from scratch in Spanish for cybersecurity, with a Latin-American focus and native tool invocation via the (MCP).
Four contributions: (i) Corpus: VectraYX-Sec-ES, a 170M-token Spanish corpus from an eight-VM pipeline (~$25 USD) partitioned into conversational (42M tokens, OpenSubtitles-ES, OASST1), cybersecurity (118M tokens, NVD, Wikipedia-ES, CVE mirror, security blogs), and offensive-security tooling (10M tokens, ExploitDB, HackTricks, OWASP) phases. (ii) Architecture: 42M-parameter Transformer decoder with GQA, QK-Norm, RMSNorm, SwiGLU, RoPE, z-loss, and a 16,384-token byte-fallback BPE.
(iii) Curriculum with replay: continual pre-training with a replay buffer yields monotonic loss descent (9. 80->3. 17->3. 00->2. 16); after SFT on OASST-ES, Alpaca-ES, CVE Q&A, and 6,327 traces, the model attains a conversational gate of 0. 78+-0. 05 (N=4 seeds). (iv) Two findings: a bootstrap-corpus ablation reveals a loss-vs-register inversion at nano scale; a LoRA study shows the B4 tool-selection floor of 0.
000 is a corpus-density artifact, not a capacity gate -- a tool-dense corpus (2,801 examples) raises B4 to 0. 145+-0. 046 on Nano 42M and 0. 445+-0. 201 on a 260M mid-tier. The GGUF artifact is 81 MB (F16), runs at sub-second TTFT on commodity hardware under llama. cpp, and is to our knowledge the first Spanish-native cybersecurity LLM with end-to-end MCP integration. Corpus recipe, training scripts, GGUF weights, and B1-B5 benchmark are released.
Want this in your inbox every morning?
Daily brief at your local 8am — bilingual EN/中文, free.
More from arXiv cs.CL
See more →Quantifying Prior Dominance in Systems
The study introduces the Normalized Context Utilization (NCU) metric to evaluate Retrieval-Augmented Generation (RAG) systems, revealing that Small Language Models (SLMs) outperform larger models in factual extraction. The findings indicate that traditional scaling laws yield diminishing returns, with a commercial API frequently failing against adversarial evidence due to systemic confidence collapse.


