Introduction

Every day, cyber security teams face a chicken-and-egg problem when detecting and containing bad things. As I’ll discuss later in this article, it’s my belief that cyber teams should shift their allocation of resources toward engineering unsupervised anomaly detections to hunt for broader and deeper unknown threats.
Threat intelligence teams and red teams alike must assist in creating and discovering indicators of compromise. Ultimately, a team must have some knowledge of an attack chain: the indicators in malware profiles, network telemetry, system memory, system logs or audit logs that form some known pattern of what “bad” looks like.
Let’s not forget, the adversary is doing the same: researching your detections, reverse engineering EDR solutions and inventing new ways to hide in plain sight using well-known good patterns.
Today, I’ll keep to practical theory based on what I’m seeing throughout the general cyber security industry and from emerging vendors. Although we won’t dive deeply into how to build the ETL, which features to train the models with, or which hyperparameters to tune, we will cover the business theory and practical options to assist you if and when you purchase an AI cyber solution, or if you want to build out the capabilities yourself.
Business Theory
To simplify an otherwise techy cyber security problem into a simple business problem, I’d like to use the old-school Know What You Don’t Know (KWYDK) device taught back in business school.

Knowledge Matrix. (Source)
When I think of where Blue/Purple Teams live, it’s typically, but not always, in a world of:
- Known Knowns
- Known Unknowns
- Unknown Knowns
So What?
Well, it’s my view that cyber security teams are most often reactive in nature (known knowns) while trying to be proactive by covering known unknowns and unknown knowns.
To me, most teams search for indicators of compromise (IoCs) using threat intelligence that can be matched against patterns of a possible attack or bad behavior.
Therein lies the problem: to create a detection you must know what to detect, or at the very least be aware of some common indicators of a behavior you don’t fully understand. Either way, you need to be aware of something, and know something, to build a detection for it.
However, emerging threats and zero days may have few or no indicators of compromise. It’s this gap in awareness and knowledge that an attacker will benefit from.
Boiling it down further, I’d say, very generically, that you might bucket cyber security threats and detections using this framework. Although it’s not a perfect science, I find it a useful thought experiment to illustrate the concepts later in this article.
| | Known | Unknown |
|---|---|---|
| **Known** | Known TTPs with public exploits on ExploitDB; known TTPs with public exploits in Metasploit or similar tooling; other TTPs or malware where threat intel is publicly shared | Attacks on rapidly emerging technology such as AI/ML; attacks on quantum computing and emerging supercomputing; attacks in VR environments and meta worlds; zero-day exploits on new languages and new products |
| **Unknown** | Attack TTPs known in the private sector but with intel not made public (actors, regional sources, time signatures, code signatures, domains, source networks); attack TTPs known in the public sector but with pertinent intel not made public | Threats combining supercomputing + AI LLM use; threats combining supercomputing + AI social-engineering audio and video; threats combining AI + AGI + zero-day exploits on new languages and products; data exfiltration using supercomputing, VR and meta worlds |
While we may never truly be able to define an “Unknown Unknown,” the thought exercise leads me to my personal belief that chasing unknown knowns and known unknowns will inevitably be a reactive game in which the attacker always has the first-mover advantage.
It’s for this reason that I wanted to share my design philosophy and some general knowledge with those who may be interested in building out AI/ML capabilities within the cyber field.
The Theory
I believe this business problem can, to some extent, be solved by shifting from building reactive detections based on the known to allocating resources toward proactive anomaly detection of the unknown.
Once you build an anomaly detection capability with a reasonable error rate, you might use these signals to research emerging TTPs that were previously unknown.
Meaning the “hunt” should start with large-scale data processing of either human or non-human anomalies.
Each AI algorithm generates an inference with an anomaly score, and those scores can be chained together with other signals to create a holistic, collective measurement of anomalies across the full cyber attack chain.

For example, let’s say an AD service account attempts to update an AD group that falls outside of its ACLs, while the same service account is normally used once every minute simply to query and read group membership, never to insert or update. What is normally predictable, programmatic non-human identity behavior now behaves unpredictably.
By itself, this could be a serious issue or simply a false positive; maybe an ops engineer is trying something manually as a test.
Now imagine the same host system and IP address egressing some N bytes of packets from internal networks to the public internet. What is otherwise predictable machine behavior for internal communication now seems unpredictable.
Around the same time, we observe memory corruption issues being emitted from the host, indicating potential overflows, when historically the system has had a low error rate.
Finally, we observe multiple reads of all secrets from the vault that stores the service account’s credentials, while normally we only see these secrets fetched once every so often.
Individually, any one of these events might be nothing more than noise in a sea of billions of event logs, but observed collectively, chained together from a shared host or IP, they warrant serious root-cause investigation.
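The chaining described in this scenario can be sketched in a few lines. The entity keys, model names, score threshold and minimum number of agreeing models below are all illustrative assumptions, not a prescribed schema:

```python
from collections import defaultdict

# Hypothetical per-model anomaly inferences. Each record is
# (entity_key, model_name, anomaly_score); the entity key is a
# composite of hostname and source IP. All values are illustrative.
inferences = [
    (("host-42", "10.0.0.7"), "ad_group_update", 0.91),
    (("host-42", "10.0.0.7"), "egress_bytes", 0.84),
    (("host-42", "10.0.0.7"), "memory_corruption", 0.77),
    (("host-42", "10.0.0.7"), "vault_secret_reads", 0.88),
    (("host-13", "10.0.0.9"), "egress_bytes", 0.81),  # lone signal, likely noise
]

def chain_anomalies(inferences, min_models=3, min_score=0.7):
    """Group inferences by shared entity key and flag entities where
    several independent models agree the behavior is unusual."""
    by_entity = defaultdict(list)
    for key, model, score in inferences:
        if score >= min_score:
            by_entity[key].append((model, score))
    # An entity flagged by multiple distinct models forms an anomaly chain.
    return {key: models for key, models in by_entity.items()
            if len(models) >= min_models}

chains = chain_anomalies(inferences)
for key, models in chains.items():
    print(key, "->", [name for name, _ in models])
# Only host-42 forms a chain; host-13's single signal stays below the bar.
```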
The anomaly chain is neither bad nor good, right nor wrong; it simply is: an unusual pattern of behavior that appears to break from some historical pattern. Typically, workflows behave predictably over time. For example, a custom automation script uses the same conditional logic and the same API calls in batch, and, generally speaking, humans tend to log in from the same devices and the same regions around the same hours, with slightly more volatility and unpredictability than the non-human use cases.
But trying to build custom detections for every user, every network host and IP, every program and service account, and every log stream is not a scalable or practical solution for most.
The volume of data and the unique combinations and permutations of features ultimately mean IoCs are often written either very generically or only for the highest-risk applications, in my view potentially leaving more room for unknown threats.
However, I see the emerging commercialization of low-code/no-code ETL and AI/ML algorithms on PaaS hosting platforms, which has arguably made it feasible to build custom AI/ML inference models much like you would build custom SIEM detections.
Models such as Random Cut Forest (RCF), LSTM, GNNs and others are likely best suited for cyber security anomaly detection.
Additionally, low-code solutions like AWS Glue and Amazon SageMaker can assist with the full transformation and training process while avoiding the heavy lifting of managing infrastructure.
Compared to a few years back, the combination of low-code/no-code services and commercially available algorithms and training solutions now reduces the barrier to entry for teams with fewer resources.
If you’re considering building or buying a solution, what are some algorithms you should investigate and why?
Examples of Commercial Algorithm Options
| Model | Pros | Cons | Best For |
|---|---|---|---|
| RCF | Unsupervised, efficient, multivariate | No context, limited temporal insights | Multivariate outlier detection (e.g., logs, IoT data). |
| LSTM | Sequential modeling, temporal insights | Complex, resource-intensive | Time-series anomalies (e.g., spikes, trends in logs). |
| S-H-ESD | Simple, handles seasonality | Limited flexibility | Seasonal anomalies (e.g., periodic system logs). |
| Isolation Forest | Fast, handles high-dimensional data | Poor for time-series | Batch outliers (e.g., transaction or traffic anomalies). |
| XGBoost | High accuracy, interpretable features | Needs labels | Fraud detection, labeled anomaly classification. |
| GNNs | Models relationships, scalable | Complex setup | Graph-based anomalies (e.g., user or device interactions). |
Although this article was meant to be an introduction, it feels important to highlight that not all models are created equal.
Some models favor time-series data, such as dramatic spikes in request rates signaling DDoS, while other models support multivariate analysis in batch, and still others support multivariate analysis over time, storing time-series data in memory to correlate events between variables across the time dimension.
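As a minimal illustration of the time-series case, a rolling z-score over a trailing window can flag a sudden spike in request rates. The window size, threshold and sample data are assumptions for the sketch; real deployments would use a proper model such as S-H-ESD or an LSTM:

```python
import statistics

def spike_anomalies(series, window=10, threshold=3.0):
    """Flag points deviating more than `threshold` standard deviations
    from the trailing window's mean (a rolling z-score). The window size
    and threshold here are illustrative, not tuned values."""
    flagged = []
    for i in range(window, len(series)):
        past = series[i - window:i]
        mean = statistics.fmean(past)
        stdev = statistics.stdev(past)
        if stdev > 0 and abs(series[i] - mean) / stdev > threshold:
            flagged.append(i)
    return flagged

# A steady request rate with one DDoS-like burst at index 15.
rates = [100, 102, 98, 101, 99, 103, 100, 97, 101, 100,
         99, 102, 100, 98, 101, 950, 100, 99]
print(spike_anomalies(rates))  # → [15]
```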
You may also notice that in many of these cases I am illustrating unsupervised learning models, which quite often ingest a set of features (data attributes) and use some form of node mapping, forest mapping, branching, etc. to determine outliers.
The algorithms differ, and their practical applications differ as well. For a cyber security expert new to this domain, it’s important to understand that some models focus on a single variable while others focus on many, some focus on relationships between variables and some do not, and not all algorithms can store relationships and correlations over time.
However, at their core most are trying to slice the data based on some parameters you preconfigure to find common outliers. AWS provides a great example illustration of the RCF algorithm below.

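To make the slicing intuition concrete, here is a toy, pure-Python version of the isolation idea behind forest-based methods: repeatedly split the data on a random feature at a random value, and count how many splits it takes to separate a point from the rest. Outliers isolate in far fewer splits than points inside a dense cluster. This is a teaching sketch, not RCF itself, and all data and parameters are made up:

```python
import random

def path_length(point, data, depth=0, max_depth=8):
    """Follow random splits, keeping only the side containing `point`;
    return the depth at which the point is isolated."""
    if depth >= max_depth or len(data) <= 1:
        return depth
    feature = random.randrange(len(point))
    values = [row[feature] for row in data]
    lo, hi = min(values), max(values)
    if lo == hi:
        return depth
    split = random.uniform(lo, hi)
    side = [row for row in data
            if (row[feature] < split) == (point[feature] < split)]
    return path_length(point, side, depth + 1, max_depth)

def anomaly_score(point, data, n_trees=100):
    """Average path length over many random trees; shorter = more anomalous."""
    return sum(path_length(point, data) for _ in range(n_trees)) / n_trees

random.seed(7)
# A tight cluster of "normal" feature vectors (e.g. bytes out, request count)
normal = [(random.gauss(100, 5), random.gauss(50, 3)) for _ in range(200)]
outlier = (900, 400)
data = normal + [outlier]
print(anomaly_score(outlier, data) < anomaly_score(normal[0], data))  # → True
```

In practice you would reach for a hardened library implementation (for example scikit-learn’s IsolationForest or SageMaker’s built-in RCF) rather than rolling your own.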
It’s also important to understand that some type of composite key must be used in order to chain the AI inferences (anomaly detections) together. For example, suppose two algorithms each found an outlier, as below.
Algorithm 1 – Inference

Algorithm 2 – Inference

Then, much like a security operations analyst, you will likely want to investigate whether the anomalies are coming from the same network device, the same computer, the same account or some other common denominator, to determine a relationship between the actions.
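That composite-key chaining amounts to an inner join of the two models’ outlier lists on a shared (host, IP) key. The record fields and scores below are hypothetical:

```python
# Hypothetical outlier records from two independent models; the composite
# key (host, ip) is what lets us chain their inferences together.
algo1 = [  # e.g. AD group-update anomalies
    {"host": "host-42", "ip": "10.0.0.7", "score": 0.91},
    {"host": "host-07", "ip": "10.0.0.3", "score": 0.72},
]
algo2 = [  # e.g. egress-volume anomalies
    {"host": "host-42", "ip": "10.0.0.7", "score": 0.84},
    {"host": "host-99", "ip": "10.0.0.5", "score": 0.80},
]

def join_on_composite_key(a, b):
    """Inner-join two inference lists on the (host, ip) composite key."""
    index = {(r["host"], r["ip"]): r for r in a}
    return [{"key": (r["host"], r["ip"]),
             "scores": (index[(r["host"], r["ip"])]["score"], r["score"])}
            for r in b if (r["host"], r["ip"]) in index]

matches = join_on_composite_key(algo1, algo2)
print(matches)  # only host-42 was flagged by both models
```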
In cyber security this is important because we often think in terms of cyber attack chains, where various components of an overall attack surface can be exploited in a novel way to create a chain of events that, when put together, ultimately results in compromise.
A single event does not mean compromise. Therefore, whether purchasing or building an AI/ML capability, you should ensure you have the proper model types, the proper chaining of models, and the proper features (data attributes/dimensions) to create a reasonable statistical representation of a possible attack chain.
Finally, unsupervised models may provide breadth for the most generic unknowns, while future insights from unsupervised models might be labeled and used to train very specific supervised models.
However, it’s my opinion that diving straight into a supervised model, which requires a labeled dataset of known anomalies, falls victim to the chicken-and-egg problem again. That is why I believe the supervised model should be an output of the unsupervised model’s datasets: leveraging the unknown to further refine the known, instead of using the known to try to find the unknown.
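That handoff can be sketched end to end: an unsupervised stage surfaces candidates, an analyst labels them, and those labels train a deliberately tiny supervised classifier. The z-score detector, nearest-centroid classifier, data and labels are all simplifying assumptions:

```python
# Stage 1: an unsupervised detector (a crude global z-score here)
# surfaces candidate anomalies. Stage 2: an analyst labels them.
# Stage 3: the labels train a minimal supervised classifier
# (nearest centroid). All values below are illustrative.

def unsupervised_stage(events, threshold=2.0):
    """Return events more than `threshold` standard deviations from the mean."""
    mean = sum(events) / len(events)
    std = (sum((e - mean) ** 2 for e in events) / len(events)) ** 0.5
    return [e for e in events if abs(e - mean) / std > threshold]

def train_supervised(labeled):
    """Learn one centroid per analyst-assigned label."""
    sums, counts = {}, {}
    for value, label in labeled:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def classify(value, centroids):
    """Assign the label whose centroid is nearest."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))

events = [100, 101, 99, 102, 98, 100, 101, 99, 100, 100, 300, 320]
candidates = unsupervised_stage(events)            # Stage 1: find unknowns
labeled = [(300, "benign"), (320, "malicious")]    # Stage 2: analyst triage
centroids = train_supervised(labeled)              # Stage 3: refine the known
print(candidates, classify(318, centroids))  # → [300, 320] malicious
```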
If I were to suggest a generic lifecycle, it would be as follows.

The Summary
In this article we just scratched the surface: I covered the business drivers for shifting cyber security blue teams left toward AI/ML unsupervised models that proactively detect unknowns, instead of building attack IoC signatures based on reactive threat research of known patterns.
We did not cover the engineering and implementation necessary to build the data lake, secure the data lake, build the ETL into a standard format acceptable for each unique algorithm, host the inference, etc.
That scope would be a separate series of articles, and quite frankly, you’d have to hire me first (-8. Why buy the cow when you can get the milk for free?
Hopefully, this article helped provide a cyber and business perspective on what is otherwise an often overly complex and lingo-filled domain. These algorithms, training and hosting solutions are becoming more commercially available to both build and buy compared to years past, and it’s my personal opinion that it is becoming more feasible to shift the allocation of cyber security resources toward AI/ML engineering, or at least SaaS/COTS product operations.
It’s my personal belief that this strategy will help detect bad things, minimize harm done to innocent people, and bring visibility and a more scalable approach to handling billions of events, ultimately finding that needle in a haystack.
— Happy Hunting


