GRID: Graph Representation of Intelligence Data for Security Text Knowledge Graph Construction

arXiv cs.AI·Liangyi Huang, Zichen Liu, Fei Shao, Shang Ma, Mengshi Zhang, Zihao Chen, Yanfang Ye, Xusheng Xiao

5/19/2026

·~2 min·5/19/2026·en·4

Quick Answer

Quick Take

GRID introduces an end-to-end framework for constructing security knowledge graphs from cyber threat intelligence (CTI), achieving 84.62% precision and 64.91% recall using Qwen3-4B-Instruct-2507-based extractors. This method outperforms traditional LLM approaches by leveraging a task-bank reward system, reducing costs and improving efficiency in graph construction.

Key Points

GRID constructs knowledge graphs from CTI articles using traceable article-graph alignments.
Achieved 84.62% precision and 64.91% recall on 249 CTI articles.
Utilizes a scripted task bank for more stable task-specific rewards.
End2End Reward model reached 76.91% precision and 53.85% recall.
Task-bank rewards can be reused across post-training runs, enhancing efficiency.

Paper Resources

Read Paperarxiv.org View PDFarxiv.org

📖 Reader Mode

~2 min read

[Submitted on 15 May 2026]

View PDF HTML (experimental)

Abstract:Security knowledge graphs can provide computable external memory for security agents, but constructing them from long-form cyber threat intelligence (CTI) remains difficult: LLMs often lack grounded security-domain knowledge, and end-to-end document-to-graph training is hard to supervise with cheap, stable rewards. We present GRID (Graph Representation of Intelligence Data), an end-to-end framework for security text knowledge graph construction. GRID first builds security-domain supervision from CTI articles by creating traceable article-graph alignments through graph extraction and knowledge-graph-conditioned text revision. It then turns document-to-graph learning into a scripted task bank combining four-option multi-select questions with triple-level regex matching targets, yielding more stable task-specific rewards than repeatedly scoring full graph outputs with an LLM judge. Using this supervision pipeline, we train two Qwen3-4B-Instruct-2507-based 4B extractors: a primary Task-bank Reward model and a secondary End2End Reward model with LLM-as-judge precision/recall rewards. On 249 CTI articles from GRID, CASIE, CTINexus, MalKG, and SecureNLP, the Task-bank Reward model with the ontology-guided GRID extraction pipeline reaches 84.62% source-averaged precision, 64.91% source-averaged recall, and 68.53% Avg F1, achieving the best source-averaged recall and near-top Avg F1 with lower token usage and deployment cost. The End2End Reward model reaches 76.91% precision, 53.85% recall, and 58.06% Avg F1. Further analyses show that task-bank rewards can be built once offline and reused across later post-training runs, outperforming online End2End LLM-as-judge reward and weaker alternatives such as Choice-only Reward and End2End SFT without RL.

Subjects:	Artificial Intelligence (cs.AI); Cryptography and Security (cs.CR)
Cite as:	arXiv:2605.16714 [cs.AI]
	(or arXiv:2605.16714v1 [cs.AI] for this version)
	https://doi.org/10.48550/arXiv.2605.16714 arXiv-issued DOI via DataCite (pending registration)

Submission history

From: Liangyi Huang [view email]
[v1] Fri, 15 May 2026 23:54:01 UTC (1,825 KB)

— Originally published at arxiv.org

Continue reading on arxiv.org

Want this in your inbox every morning?

Daily brief at your local 8am — bilingual EN/中文, free.

Subscribe — it's free

More from arXiv cs.AI

See more →

arXiv cs.AI·Ye Liu, Srijan Bansal, Bo Pang, Yang Li, Zeyu Leo Liu, Yifei Ming, Zixuan Ke, Shafiq Joty, Semih Yavuz

1d ago

FeaturedOriginal

Procedural Memory Distillation: Online Reflection for Self-Improving Language Models

AI Summary

Procedural Memory Distillation (PMD) enhances reinforcement learning by converting cross-episode signals into reusable memory, improving Qwen3-8B and OLMo3-Instruct-7B models by 3.8-5.5% on SCIKNOWEVAL and 7.9-13.6% on . The co-evolution of policy and memory allows for more effective self-supervision, demonstrating significant performance gains when both components are active.

#LLM #AI Coding #Inference #Policy