GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction
Quick Answer
GRID is an end-to-end framework for constructing security knowledge graphs from cyber threat intelligence (CTI). Its primary extractor, a Task-bank Reward model built on Qwen3-4B-Instruct-2507, reaches 84.62% source-averaged precision and 64.91% source-averaged recall on 249 CTI articles.
Quick Take
Rather than repeatedly scoring full graph outputs with an LLM judge, GRID trains its extractor against a scripted task bank whose rewards are built once offline. The resulting Task-bank Reward model achieves the best source-averaged recall and near-top Avg F1 reported in the paper, with lower token usage and deployment cost.
Key Points
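- GRID builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision.
- The Task-bank Reward model reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1 on 249 CTI articles from GRID, CASIE, CTINexus, MalKG, and SecureNLP.
- A scripted task bank (four-option multi-select questions plus triple-level regex matching targets) yields more stable task-specific rewards than scoring full graph outputs with an LLM judge.
- The secondary End2End Reward model, trained with LLM-as-judge precision/recall rewards, reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1.
- Task-bank rewards can be built once offline and reused across later post-training runs.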
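The paper's abstract reports these figures as source-averaged. A minimal sketch of what that could mean is given here, assuming per-source triple counts are available and that each metric is computed per source and then averaged with equal weight. The `SourceCounts` type, the function names, and the counts in the usage note are hypothetical; the abstract does not spell out the exact computation.

```python
from dataclasses import dataclass


@dataclass
class SourceCounts:
    """Triple-level counts for one data source (hypothetical structure)."""
    true_positive: int  # predicted triples that match the reference graph
    predicted: int      # all triples the extractor emitted
    gold: int           # all triples in the reference graph


def precision_recall_f1(c):
    """Return (precision, recall, F1) for a single source."""
    p = c.true_positive / c.predicted if c.predicted else 0.0
    r = c.true_positive / c.gold if c.gold else 0.0
    f1 = 2 * p * r / (p + r) if (p + r) else 0.0
    return p, r, f1


def source_averaged(counts_by_source):
    """Average each metric over sources, giving every source equal weight.

    Assumption: this macro-style average is what "source-averaged" refers to.
    """
    scores = [precision_recall_f1(c) for c in counts_by_source.values()]
    n = len(scores)
    return tuple(sum(s[i] for s in scores) / n for i in range(3))
```

With two made-up sources, `source_averaged({"a": SourceCounts(8, 10, 10), "b": SourceCounts(2, 2, 10)})` gives an averaged F1 that is lower than the harmonic mean of the averaged precision and recall. The reported numbers show the same pattern: the harmonic mean of 84.62% and 64.91% is about 73.47%, while the reported Avg F1 is 68.53%, which would be consistent with averaging per-source F1 scores.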
Abstract: Security knowledge graphs can provide computable external memory for security agents, but constructing them from long-form cyber threat intelligence (CTI) remains difficult: LLMs often lack grounded security-domain knowledge, and end-to-end document-to-graph training is hard to supervise with cheap, stable rewards. We present GRID (Graph Representation of Intelligence Data), an end-to-end framework for security text knowledge graph construction. GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. Using this supervision pipeline, we train two Qwen3-4B-Instruct-2507-based 4B extractors: a primary Task-bank Reward model and a secondary End2End Reward model with LLM-as-judge precision/recall rewards. On 249 CTI articles from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1, achieving the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost. The End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Further analyses show that task-bank rewards can be built once offline and reused across later post-training runs, outperforming online End2End LLM-as-judge reward and weaker alternatives such as Choice-only Reward and End2End SFT without RL.
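To make the scripted task bank more concrete, the following is a rough sketch of how the two task types named in the abstract could be scored without an LLM judge. Everything here is an assumption for illustration: the class names, the exact-match rule for multi-select answers, the "fraction of targets matched" rule for triples, and the example entity names are hypothetical and are not taken from the paper.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MultiSelectTask:
    """Four-option question; gold is the set of correct option letters."""
    gold: frozenset


@dataclass(frozen=True)
class TripleTarget:
    """One regex per field of a (subject, relation, object) triple."""
    subject: str
    relation: str
    obj: str


def multi_select_reward(answer, task):
    """Return 1.0 only if the chosen letters equal the gold set (assumed rule)."""
    chosen = frozenset(re.findall(r"\b[ABCD]\b", answer.upper()))
    return 1.0 if chosen == task.gold else 0.0


def triple_reward(predicted, targets):
    """Return the fraction of targets matched by at least one predicted triple."""
    if not targets:
        return 0.0

    def hit(target):
        # A target counts as matched when all three regexes find a match
        # in the corresponding fields of the same predicted triple.
        return any(
            re.search(target.subject, s, re.IGNORECASE)
            and re.search(target.relation, r, re.IGNORECASE)
            and re.search(target.obj, o, re.IGNORECASE)
            for s, r, o in predicted
        )

    return sum(hit(t) for t in targets) / len(targets)


# Usage with made-up entities: one of the two targets is recovered.
targets = [
    TripleTarget(r"example\s?group", r"uses?", r"example\s+loader"),
    TripleTarget(r"example\s?worm", r"targets?", r"router"),
]
predicted = [("ExampleGroup", "uses", "Example Loader")]
score = triple_reward(predicted, targets)
```

Because both functions are deterministic scripts, the same task bank returns the same reward for the same output on every run, which is one plausible reading of why the abstract describes these rewards as more stable and reusable than online LLM-as-judge scoring.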
| Subjects: | Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR) |
| Cite as: | arXiv:2605.16714 [cs.AI] (or arXiv:2605.16714v1 [cs.AI] for this version) |
| DOI: | https://doi.org/10.48550/arXiv.2605.16714 (arXiv-issued DOI via DataCite, pending registration) |
Submission history
From: Liangyi Huang
[v1] Fri, 15 May 2026 23:54:01 UTC (1,825 KB)
Originally published at arxiv.org