Spooky Times: LLM vs. Knowledge Graph vs. Machine Learning — Who Reigns Supreme?

AIRRIVED
6 min read · Oct 31, 2024


For years, Machine Learning (ML) has been heralded as the “savior” of cybersecurity, bringing automation, data-driven insights, and proactive threat detection to the forefront of cyber defense. ML allowed us to spot patterns and anomalies in mountains of data, taking cybersecurity from reactive to proactive. However, as adversaries evolved their tactics, ML started hitting its limits, especially when dealing with complex, real-time threats. Enter Knowledge Graphs (KGs) — structured, interconnected networks that promised to augment ML’s capabilities by mapping known relationships between entities like indicators of compromise (IOCs), tactics, techniques, and threat actors.

But just as Knowledge Graphs seemed poised to redefine threat intelligence, a revolutionary technology emerged: Large Language Models (LLMs), powered by advances in natural language processing and fine-tuning capabilities. Unlike anything before them, fine-tuned LLMs have the power to dynamically interpret and contextualize unstructured data, adapting in real time as new threats unfold. This shift from Machine Learning to Knowledge Graphs to Large Language Models is not merely an upgrade; it’s a transformation. The game of cybersecurity is about to change.

Let’s dive into this journey, exploring how each technology has shaped cybersecurity and how LLMs are redefining what’s possible.

Machine Learning: The First Revolution in Cybersecurity

When ML came onto the scene in cybersecurity, it was a game-changer. Algorithms trained on large datasets could recognize suspicious patterns, enabling anomaly detection, predictive analytics, and automated responses. For the first time, we could identify threats before they fully emerged.

However, ML has its limitations:

Lack of Contextual Understanding: ML models often rely on statistical patterns rather than an understanding of domain-specific language, leading to high false positives in complex environments.

Static Learning: Traditional ML models are usually trained on historical data. When attackers adapt or innovate, ML can struggle to detect new patterns unless it’s frequently retrained, a resource-intensive process.

Difficulty Handling Unstructured Data: Cyber threats often arrive in unstructured, real-time formats (e.g., threat advisories, social media intelligence). Machine Learning models were never designed to make sense of these ambiguous, nuanced information streams.

In short, while ML took cybersecurity a significant step forward, its limitations became increasingly clear as attackers grew more sophisticated.
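The pattern-matching strength (and the context blindness) described above can be illustrated with a minimal statistical anomaly detector. The sketch below flags hosts whose outbound connection counts deviate sharply from the mean; the hostnames, counts, and threshold are illustrative assumptions, not a production design.

```python
from statistics import mean, stdev

def flag_anomalies(conn_counts, threshold=2.0):
    """Flag hosts whose outbound connection count deviates more than
    `threshold` sample standard deviations from the mean.
    Purely statistical: it has no idea *why* a host is noisy, which is
    exactly the contextual-understanding gap noted above."""
    mu = mean(conn_counts.values())
    sigma = stdev(conn_counts.values())
    return {host: count for host, count in conn_counts.items()
            if sigma > 0 and abs(count - mu) / sigma > threshold}

# Hypothetical per-host outbound connection counts for one hour
counts = {"ws-01": 42, "ws-02": 38, "ws-03": 45, "ws-04": 40,
          "ws-05": 37, "ws-06": 41, "db-01": 980}
print(flag_anomalies(counts))  # {'db-01': 980}
```

Note what the detector cannot do: it flags `db-01` as unusual, but it cannot say whether the spike is a backup job or C2 beaconing, and retraining the baseline is the only way it adapts, mirroring the static-learning limitation above.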

Knowledge Graphs: The Savior of Machine Learning

Knowledge Graphs emerged as a solution to some of these limitations. By structuring data into interconnected entities — like linking threat actors with specific TTPs (tactics, techniques, and procedures) and IOCs — Knowledge Graphs brought an understanding of relationships and context that Machine Learning alone couldn’t provide.

For example, in a Knowledge Graph, APT29 might be linked to:

• T1059 (Command and Scripting Interpreter) using PowerShell

• Known IP addresses and domains associated with its campaigns

• Targeted industries like finance and government

This structured mapping made it possible to identify patterns based on known relationships, helping analysts spot connections between seemingly disparate indicators. Knowledge Graphs were especially valuable in threat attribution, where linking entities to known threat actors could lead to faster, more accurate responses.
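The APT29 mapping above can be modeled as a tiny graph of labeled edges. The sketch below stores (subject, relation, object) triples in a plain list; the indicators mirror the example above, and the query helper is an illustrative assumption rather than any particular graph database's API.

```python
# Minimal knowledge-graph sketch: (subject, relation, object) triples.
# The indicators mirror the APT29 example above; the helper is illustrative.
TRIPLES = [
    ("APT29", "uses_technique", "T1059 (Command and Scripting Interpreter)"),
    ("APT29", "uses_tool", "PowerShell"),
    ("APT29", "associated_infrastructure", "known C2 IPs and domains"),
    ("APT29", "targets_industry", "finance"),
    ("APT29", "targets_industry", "government"),
]

def neighbors(entity, relation=None):
    """Return objects linked from `entity`, optionally filtered by relation."""
    return [o for s, r, o in TRIPLES
            if s == entity and (relation is None or r == relation)]

print(neighbors("APT29", "targets_industry"))  # ['finance', 'government']
```

The strength and the weakness are both visible here: queries over known relationships are instant, but every edge had to be written down in advance, which is the static-relationship limitation discussed next.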

However, even KGs had limitations:

Static and Predefined Relationships: While KGs are great at representing known relationships, they struggle with the dynamic and evolving nature of cybersecurity threats.

Difficulty Interpreting Unstructured Data: Like ML, KGs lack the flexibility to interpret unstructured or ambiguous threat intelligence, such as a vague report suggesting “increased usage of cloud infrastructure for data exfiltration.”

Inability to Hypothesize or Infer in Real Time: KGs require manual updating and predefined rules. They can’t generate hypotheses or recommend actions when presented with novel or ambiguous data.

KGs brought cybersecurity to a new level by adding structure and context to ML models, but in a rapidly evolving threat landscape, they too began to show limitations.

Enter Large Language Models: The Next Frontier

Just as KGs were beginning to feel limited in the face of increasingly sophisticated attacks, a groundbreaking development arrived: Large Language Models (LLMs). Trained on vast amounts of data and fine-tuned to understand specific domains, LLMs have an unprecedented capacity to interpret, contextualize, and act on cybersecurity data in real time.

Unlike ML or KGs, a fine-tuned LLM doesn’t just identify keywords or rely on predefined connections. Instead, it reads, interprets, and generates insights from unstructured data with context-sensitive reasoning, even when presented with novel or ambiguous information.

Real-World Example: Incident Response with APT29

Let’s put this into context. Imagine a cybersecurity team receiving an OSINT alert with the following message:

“Recent observations indicate APT29 using stealthy lateral movement in financial networks, relying on encrypted C2 communication over unusual ports and increased usage of cloud infrastructure for data exfiltration. Initial access was reportedly gained through phishing emails with embedded macro scripts.”

Here’s how each technology would handle this intelligence.

Machine Learning:

• A general ML model might pick out keywords like “C2 communication” and “phishing.”

• It could flag the alert as high-risk, but it lacks the nuanced understanding to connect specific tactics (like “encrypted C2 over unusual ports”) with real-time actionable steps (e.g., monitoring non-standard HTTPS ports).

• The outcome? A basic alert without clear, prioritized actions.

Knowledge Graph:

• A KG would map this alert to known entities associated with APT29, linking the phishing tactic with initial access methods and recognizing C2 activity as a potential threat.

• However, it cannot interpret new tactics (like “increased usage of cloud infrastructure”) or dynamically prioritize responses. It would require manual updating to include new connections, limiting its real-time effectiveness.

• The result? A structured but static mapping without adaptive recommendations.

Fine-Tuned LLM:

• The LLM interprets the OSINT alert in real time, recognizing that “increased usage of cloud infrastructure” likely involves new data exfiltration methods.

• It dynamically suggests that the team monitor outbound traffic to cloud providers and inspect HTTPS traffic on unusual ports like 8443 or 8080, providing actionable, specific recommendations based on the observed tactics.

• Furthermore, the LLM hypothesizes that APT29 may be shifting to a low-and-slow exfiltration tactic to evade detection, informing the team’s threat-hunting strategy and enabling a more proactive response.

In this scenario, the LLM not only reads and interprets the alert but understands the tactical implications and provides real-time, adaptive responses, something neither ML nor KGs alone could achieve.
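The LLM workflow sketched above ultimately reduces to feeding the raw alert, plus whatever structured context already exists, into a fine-tuned model. The sketch below only assembles the triage prompt; the system framing, field names, and the surrounding model call are all assumptions, since any chat-completion endpoint could be substituted.

```python
def build_triage_prompt(alert_text, known_actor_context):
    """Assemble a triage prompt for a (hypothetical) fine-tuned security
    LLM. The framing and requested output fields are illustrative
    assumptions, not a specific vendor's API."""
    return (
        "You are a threat-intelligence analyst.\n"
        "Known actor context:\n"
        f"{known_actor_context}\n\n"
        "New OSINT alert:\n"
        f"{alert_text}\n\n"
        "Respond with: (1) likely tactics, (2) concrete detections to "
        "deploy now, (3) one hypothesis worth threat-hunting."
    )

alert = ("APT29 using stealthy lateral movement in financial networks, "
         "encrypted C2 over unusual ports, cloud exfiltration.")
context = "APT29: T1059/PowerShell, phishing initial access, finance/gov targets."
prompt = build_triage_prompt(alert, context)
print("cloud exfiltration" in prompt)  # True
```

The point of the structure is that the model receives both the unstructured alert and the structured actor context in one request, which is what lets it connect "cloud exfiltration" to concrete, prioritized detections rather than a bare keyword flag.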

The Game is About to Change

The evolution from Machine Learning to Knowledge Graphs to Large Language Models represents a fundamental shift in cybersecurity. While ML and KGs laid the groundwork for today’s defenses, they lack the real-time, adaptive intelligence needed in a world where threat actors constantly evolve.

With fine-tuned LLMs, we’re entering a new era where AI doesn’t just react to cyber threats but anticipates, contextualizes, and responds to them in real time. The ability of LLMs to interpret complex, unstructured data, generate insights from ambiguous language, and make adaptive recommendations marks a transformative leap in cybersecurity capabilities.

The game is about to change — and organizations that harness the power of LLMs will be poised to lead the next frontier of cyber defense, outpacing adversaries with intelligence that’s not only automated but dynamic, contextual, and proactive. The cybersecurity landscape is evolving, and those who adapt will set the pace for a safer digital world.

Final Thoughts

While Machine Learning and Knowledge Graphs will remain essential tools, Large Language Models have unlocked a new level of intelligence that goes beyond detection to offer contextual, adaptive insights. This evolution in AI doesn’t just represent another technology shift; it’s the dawn of a new era in cybersecurity.

So, are you ready to leave the past behind and embrace the future? The answer will define the resilience of your cybersecurity posture in a world where the stakes have never been higher.
