Literature Review: Exploring the Role of Large Language Models in Cybersecurity: A Systematic Survey

Summary

  • Provides a systematic survey on the applications of Large Language Models (LLMs) in cybersecurity, organized by the cyber attack lifecycle: reconnaissance, foothold establishment, lateral movement, data exfiltration, and post-exfiltration.
  • Highlights LLMs’ strengths in:
    • Detecting covert reconnaissance (system-based and human-based, e.g., phishing, malware) using advanced pattern recognition, reasoning, and few-shot learning.
    • Automating vulnerability detection, analysis, and patching in the foothold phase, leveraging code property graphs, fine-tuning, and prompt engineering.
    • Enhancing intrusion detection, anomaly detection, and endpoint response during lateral movement through log, traffic, and endpoint data analysis.
  • Identifies a research gap: limited LLM application for post-intrusion scenarios (data exfiltration, post-exfiltration) and calls for more research in these areas.
  • Details LLMs’ roles in Cyber Threat Intelligence (CTI):
    • Automating CTI collection and processing from unstructured sources (forums, reports).
    • Structuring intelligence into knowledge graphs and extracting actionable insights.
    • Providing strategic reasoning and defense recommendations.
  • Discusses deployment strategies:
    • Traditional networks: centralized/cloud-based LLM deployment.
    • Next-generation networks (IoT, 6G, SAGIN): resource constraints addressed via model pruning, federated/distributed/split learning, and hybrid architectures.
  • Reviews security risks:
    • External: prompt injection and data poisoning attacks, with mitigation via structured input, fine-tuning, and output validation.
    • Inherent: hallucinations/misinformation, mitigated by RAG, factuality-enhanced training, and mediator components.
  • Outlines open problems and future directions:
    • Need for high-quality, domain-specific datasets; overcoming input length limitations.
    • Defending against targeted and pollution attacks.
    • Improving real-time inference, interpretability, and deployment in next-gen networks.
    • Expanding LLM application coverage and ensuring transparency/reproducibility.

Key Insights

1. LLMs Across the Cyber Attack Lifecycle

  • Reconnaissance Phase: LLMs excel at detecting both system-based (i.e. network scans, log analysis) and human-based (i.e. phishing, social engineering) reconnaissance through advanced pattern recognition, reasoning, and few-shot learning. LLM-powered honeypots (i.e. shelLM) can convincingly simulate real systems to deceive attackers.
  • Foothold Establishment: LLMs are effective in vulnerability detection, analysis, and patching. Techniques like code property graphs, multitask fine-tuning, and prompt engineering allow LLMs to identify and remediate vulnerabilities, generate patches, and validate repairs. However, input length limitations remain a bottleneck for large-scale or cross-file vulnerabilities.
  • Lateral Movement: LLMs enhance intrusion detection systems (IDS), anomaly detection, and endpoint detection and response (EDR) by parsing complex logs, network traffic, and endpoint data. Their adaptability allows for improved detection of novel or sophisticated attacks compared to rule-based systems.
  • Post-Intrusion Gaps: The survey highlights a significant research gap in LLM applications for post-intrusion scenarios (data exfiltration, post-exfiltration), calling for more work on defense strategies in these critical phases.

2. LLMs in Cyber Threat Intelligence (CTI)

  • LLMs automate CTI collection, processing, and analysis, extracting actionable intelligence from unstructured sources like cybercrime forums and threat reports.
  • They facilitate the conversion of CTI into structured knowledge graphs, generate organization-specific intelligence, and support strategic reasoning for defense recommendations.
  • LLMs reduce manual effort and improve the timeliness and quality of CTI, but face risks from data source contamination and adversarial manipulation.

3. Deployment in Network Scenarios

  • Traditional Networks: Centralized deployment in cloud or local servers is common, leveraging LLMs for detection, analysis, and policy generation.
  • Next-Generation Networks: Resource constraints and heterogeneity in IoT, 6G, and integrated networks present deployment challenges. Solutions include model pruning, federated learning, distributed and split learning, and hybrid microservice architectures. LLMs are poised to become integral to intelligent, adaptive security management in these environments.

4. Security Risks of LLMs

  • External Threats: Prompt injection and data poisoning attacks threaten LLM integrity. Defense strategies include structured input separation, task-specific fine-tuning, and output validation.
  • Inherent Risks: Hallucinations and misinformation can lead to incorrect or unsafe outputs, potentially undermining security systems. Techniques like RAG, factuality-enhanced training, and mediator components are proposed to mitigate these risks.

5. Open Problems and Research Directions

  • Scarcity of high-quality, domain-specific datasets and input length limitations hinder LLM performance in complex cybersecurity tasks.
  • Targeted attacks, slow inference in real-time systems, and over-reliance on black-box models raise concerns about reliability and reproducibility.
  • Future research should focus on open datasets, breaking input length barriers, robust defense against adversarial contamination, semantic data transformation, interpretability, and expanding LLM coverage to next-generation networks.

Example

LLM-Driven Phishing Detection:

  • ChatSpamDetector preprocesses emails for LLM analysis, assigns a spam-detection role via prompt engineering, and uses chain-of-thought prompting to decompose the detection task. This approach enables stepwise, explainable phishing detection, outperforming traditional methods in adaptability and detection of novel attack vectors.

Ratings

Category Score Rationale
Novelty N/A Survey Paper
Technical Contribution N/A Survey Paper
Readability 4 Well-structured, with clear explanations, comparative tables, and practical examples. Some technical depth may challenge non-experts.




Enjoy Reading This Article?

Here are some more articles you might like to read next:

  • Literature Review: Attack and Defense Techniques in Large Language Models: A Survey and New Perspectives
  • Literature Review: Large Language Models are Autonomous Cyber Defenders
  • Literature Review: Sugar-Coated Poison: Benign Generation Unlocks LLM Jailbreaking
  • Literature Review: Auto-Patching: Enhancing Multi-Hop Reasoning in Language Models
  • Literature Review: RedCode: Risky Code Execution and Generation Benchmark for Code Agents