Tommy Skaug

For some years now, cyber security professionals have been
working on optimising the sharing of information and
knowledge. Much of the recent effort has focused on
intelligence- and data-driven teams. Today, many of these
discussions end up revolving around something related to the
STIX format.

> Don't use a lot where a little will do
> – Unknown origin

This post offers a perspective on the potential of today's
standard-oriented approach to documenting indicator sets
related to cyber security threat actors and incidents. It
turns out we have a longer way to go than expected.

For the purpose of this article, an indicator is a
characteristic or evidence of something unwanted, or hostile
if you'd like. I like to refer to the military term
"Indicators & Warnings" in this regard. In other words, an
indicator isn't necessarily limited to the cyber domain.
Physical security could be in an even worse condition than
cyber security when it comes to expressing threat
indicators. I'll leave the cross-domain discussion for
another time.

## Up Until Today

Multiple standards have evolved and disappeared. One that I
have previously been in favor of is the OpenIOC 1.1
standard. However, times are changing, and so are the
terminology and the breadth of how we are able to express
intrusion sets.

Even though OpenIOC was a very good start, and still is as
far as I am concerned, it has been far surpassed by Cybox
and ultimately STIX [1] in popularity.

STIX is a container, a quite verbose XML format (which is
turning into JSON in 2.0). Cybox is the artefact format [2],
for malware you have MAEC [3], and so on. Basically it's a
set of collaborating projects.

This all sounds good, right? Not quite. Have a look at the
OpenIOC to STIX repository on GitHub [4] and you will find
that ``stuxnet.stix.xml`` is 202 lines of XML code for 18
atomic indicators. The OpenIOC equivalent, on the other
hand, is 91 lines, and that is a verbose format as well. In
fact the overhead ratio of the STIX file is about 10:1,
while OpenIOC is about 5:1.

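As a rough check of those ratios, here is a minimal Python sketch that treats overhead as lines of markup per atomic indicator. The line and indicator counts are the ones quoted above; the function name `overhead_ratio` is only illustrative, and counting whole lines is an assumption about how the ratio is meant to be read.

```python
def overhead_ratio(lines: int, indicators: int) -> float:
    """Lines of markup spent per atomic indicator."""
    if indicators <= 0:
        raise ValueError("need at least one indicator")
    return lines / indicators


stix = overhead_ratio(202, 18)    # stuxnet.stix.xml
openioc = overhead_ratio(91, 18)  # the OpenIOC equivalent

print(f"STIX:    {stix:.1f}:1")
print(f"OpenIOC: {openioc:.1f}:1")
```

That works out to roughly 11 and 5 lines per indicator, in line with the figures above.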

To add to the mind-blowing inefficiency, I have yet to see
complex and nested expressions of an actor or a campaign in
the STIX format on a regular basis.

Before you continue, do a simple Google search for "STIX
editor" and "cybox editor". Do it now, and while you are at
it, google for "openioc editor" as well. Hello guys, these
standards have been around for many years. So, how should we
interpret that there aren't any user-friendly approaches to
using them? The closest I've come is through MISP, and
generally speaking MISP does not use these standards for its
internal workings either. This issue on the MISP GitHub
tracker says it all: STIX 2.x support (MISP) [5].

I'm sure that some may disagree with the above statements,
pointing to the infancy of these formats. However, they
can't be said to be new standards anymore. They are just too
complex. One example is the graph-oriented relations
implemented in the formats. Why not just let a graph
database take care of these instead?

This is not just a post to establish the current state. What
would a better approach look like?

## What Is The Problem to Be Solved?

Back to where things have gone since the OpenIOC 1.1/atomic
indicator days. The most promising addition, in my opinion,
is the MITRE PRE-ATT&CK and ATT&CK frameworks. The two
frameworks build on a less structured approach than seen
for atomic indicators (Lockheed's Kill-Chain). The latter
can for instance be viewed in the form of the Intelligence
Pyramid.

The Intelligence Pyramid's abstraction levels can be mapped
to what each level is supposed to support when it comes to
indicators, as follows:

| Level of abstraction | Supports    |
|----------------------|-------------|
| Behavior             | Knowledge   |
| Derived              | Information |
| Atomic               | Data        |

The purpose of the abstraction layer is in this case to
support assessments and measures at the corresponding
contextual level. For instance, a technical report tailored
to an Incident Response Team (IRT) generally concerns
Derived and Atomic indicators, while an intelligence report
would usually be based on the Behavior level.

Having covered the abstraction layers, we can recognize that
OpenIOC (or Cybox and MAEC) covers the bottom layers of
abstraction, while MITRE (PRE-)ATT&CK in its current form is
mostly about the Behavior level.

For Derived indicators there are primarily two
well-established, seasoned and successful formats that have
become standards through their widespread usage. This is
caused, among other things, by the indicators and rules
being effective, rapid, easy and pleasing to write.

First we have Snort/Suricata rules and Lua scripts, which
were designed for network detection. For Snort/Suricata I'd
say that most of the metadata detected today is probably
expressible in OpenIOC (except for the magic that can be
done with Lua). Second, there is the Yara format, which has
become known for its applicability against malicious
files. The simplicity of both formats is obviously due to
their power of expression. Thus, I'd say that the Yara and
Snort/Suricata formats are the ones to look to when it comes
to content and pattern detection.

> Indicators should be easy and pleasing to write.

To summarize the above, each of the formats can be mapped to
an abstraction level:

| Level of abstraction | Formats            |
|----------------------|--------------------|
| Behavior             | MITRE (PRE-)ATT&CK |
| Derived              | Suricata+Lua, Yara |
| Atomic               | OpenIOC 1.1        |

Going through my notes on how I document my own indicators,
I also found that I use the CVE database, datetimes,
confidence, analyst comments for context, and classification
as well (the latter being irrelevant for detection).

One of the major problems is that everything currently out
there breaks the analyst workflow. You either need to log in
to some fancy web interface, edit XML files (god forbid), or
just jot down everything in a text file. The text file seems
to be the natural fallback in almost any instance. I have
even attempted to use the very good initiative by Yahoo,
PyIOCe, and Mandiant's long-forgotten IOC Editor. Both
projects have lost traction, like almost every other
initiative in this space. So that is right folks, the text
editor is still the preferred tool in 2018, and let's face
it: indicators should be pleasing to design and create -
like putting your signature on an incident or a job well
done.

> an indicator set should be for humans and machines by
> humans

After all, the human is the one that is going to have to
deal with the indicator sets at some point, and we are the
slowest link. So let us not slow ourselves down more than
necessary. At this point I would like to propose the golden
rule of creating golden rules: an indicator set should be
for humans and machines by humans.

You may also have noticed that when all these standards
suddenly are combined into one standard, they become less
user-friendly. In other words, let us rather find our way
back to our common \*NIX roots, where each tool had a
limited set of tasks.

Graphs are essential when writing indicators. Almost
everything in the world around us can be modelled as a
network, and infiltration and persistence in cyberspace are
no exception. Thus, an indicator format needs to be
representable in a graph, and guess what? Almost everything
is, as long as it maintains some kind of structure.

For graphs there are two ways of going about the problem:

1) Implement the graph in the format

2) Make sure that you have a good graph backend and an
automatable and traversable format available

For option 1, the graph in the format will increase the
complexity significantly. Option 2 results in the opposite,
but that does not mean that the format can't be converted to
a graph. To make an elaborate discussion short, this is what
we have graph databases for, such as JanusGraph [6].

## A Conceptual View

Summarizing the above, I'd like to propose the following
requirements for indicator formats:

1) Indicator sets should be easy and inviting to create

2) You should be able to start writing at any time, when you
need it

3) Unnecessary complexity should be avoided

4) The format should be human readable and editable

5) A machine should be able to interpret the format

6) Indicator sets should be graph compatible

Based on this article, I believe that the best approach is
to provide a basic plain text format specification that
inherits from the OpenIOC 1.1 and MITRE frameworks and
references other formats where necessary.

Let us imagine that we found an IP address in one
situation. The IP address was connected to a domain that we
found using passive DNS. Further, it was found that a
specific file was associated with that domain through a
Twitter comment. Representing the given information in its
purest (readable) form looks like the following:

    // a test file
    class tlp:white
    date 2018/02/18
    ipv4 low 188.226.130.166
    domain med secdiary.com
    technique PRE-T1146
    filename med some_filename.docx
    comment found in open sources

To recap some of the previous points: the above format is
simple, and it can be written at any time based on knowledge
of well-known standards. Best of all, if you are heavily
invested in specific formats, it can be converted to them
all using a simple interpreter traversing the format.

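To give a feel for how small such an interpreter could be, here is a minimal Python sketch of a line parser. Several things in it are assumptions rather than part of the format as described: that nesting is expressed by leading indentation (four spaces per level, which the listing above does not show), that `//` starts a comment line, and that a confidence is present whenever the second token is `low`, `med` or `high`. The names `Indicator` and `parse_indicators` are illustrative.

```python
from typing import List, NamedTuple, Optional

CONFIDENCE_LABELS = {"low", "med", "high"}
INDENT_WIDTH = 4  # assumption: four spaces per nesting level


class Indicator(NamedTuple):
    depth: int
    kind: str
    confidence: Optional[str]
    value: str


def parse_indicators(text: str) -> List[Indicator]:
    """Turn the plain text format into a flat list of Indicator tuples."""
    parsed = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("//"):
            continue  # skip blank lines and comments
        indent = len(raw) - len(raw.lstrip(" "))
        depth = indent // INDENT_WIDTH
        kind, _, rest = line.partition(" ")
        first, _, remainder = rest.partition(" ")
        if first in CONFIDENCE_LABELS and remainder:
            parsed.append(Indicator(depth, kind, first, remainder.strip()))
        else:
            # No confidence label: everything after the type is the value.
            parsed.append(Indicator(depth, kind, None, rest.strip()))
    return parsed


# Hypothetical nesting, chosen only to illustrate the idea.
SAMPLE = """\
// a test file
class tlp:white
date 2018/02/18
ipv4 low 188.226.130.166
    domain med secdiary.com
        filename med some_filename.docx
            comment found in open sources
"""

for indicator in parse_indicators(SAMPLE):
    print(indicator)
```

From a flat list like this, emitting another format is a matter of walking the tuples and writing out whichever fields the target needs.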

Further, such a format is easily converted into a tree and
can be loaded into a graph for traversal and automated
assessments. Each confidence value can be quantified
(``low=0.33``, ``med=0.66``, ``high=1.0``). That said,
simplicity in this case equals actionable indicators.

    | v: 188.226.130.166 (0.33)    | match    |
    | e                            |          |
    | v: secdiary.com (0.66)       | no match | (0.33+0.66)/2=0.5
    | e                            |          |
    | v: some_filename.docx (0.66) | match    |

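The arithmetic in that example can be sketched in a few lines of Python. This assumes, as one reading of the listing, that the score for a chain is the mean confidence of the vertices that matched; the names `CONFIDENCE` and `chain_score` are illustrative.

```python
from statistics import mean

# Numeric values for the confidence labels, as given in the text.
CONFIDENCE = {"low": 0.33, "med": 0.66, "high": 1.0}


def chain_score(observations):
    """Mean confidence of the vertices that matched, or 0.0 if none did."""
    matched = [CONFIDENCE[label] for _, label, hit in observations if hit]
    return mean(matched) if matched else 0.0


# (value, confidence label, matched?) for each vertex in the chain
chain = [
    ("188.226.130.166", "low", True),
    ("secdiary.com", "med", False),
    ("some_filename.docx", "med", True),
]

print(f"{chain_score(chain):.3f}")
```

The two matching vertices average to 0.495, which is the 0.5 shown above after rounding.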

On networks versus hierarchies: a drawback of the latter, as
mentioned in the previous section, is that they cannot
directly express e.g. multiple domains being connected to
different other vertices. A practical solution goes as
follows:

    ipv4 low 188.226.130.166
    domain med secdiary.com
    domain low secdiary.com
    ipv4 low 128.199.56.232

The graph receiving the above indicator file should identify
the domain as a unique entity and link the two IP addresses
to the same domain:

    | v: 188.226.130.166 (0.33)
    | e: 0.5
    | v: secdiary.com (0.5)
    | e: 0.33
    | v: 128.199.56.232 (0.33)

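One way to reproduce those numbers is sketched below in Python. It rests on assumptions that are a reading of the example rather than something stated in the text: a vertex that appears more than once gets the mean of its stated confidences, and each edge gets the mean confidence of its two endpoints as written on that pair of lines. The pairing of lines into parent and child, and all names used, are likewise illustrative.

```python
from collections import defaultdict
from statistics import mean

CONFIDENCE = {"low": 0.33, "med": 0.66, "high": 1.0}

# (parent, child) pairs; each end is (type, confidence label, value).
pairs = [
    (("ipv4", "low", "188.226.130.166"), ("domain", "med", "secdiary.com")),
    (("domain", "low", "secdiary.com"), ("ipv4", "low", "128.199.56.232")),
]

seen = defaultdict(list)  # vertex value -> every confidence stated for it
edges = []                # (parent value, child value, edge weight)

for parent, child in pairs:
    p_conf, c_conf = CONFIDENCE[parent[1]], CONFIDENCE[child[1]]
    seen[parent[2]].append(p_conf)
    seen[child[2]].append(c_conf)
    edges.append((parent[2], child[2], mean([p_conf, c_conf])))

# A value seen several times collapses into a single vertex.
vertices = {value: mean(confs) for value, confs in seen.items()}

for value, conf in vertices.items():
    print(f"v: {value} ({conf:.3f})")
for src, dst, weight in edges:
    print(f"e: {src} -> {dst} ({weight:.3f})")
```

The domain ends up as one vertex with a confidence of about 0.5, and the two edges come out at about 0.5 and 0.33, matching the listing above.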

As for structuring the indicator format for machines in
practice, consider the following pseudocode:

    # each indicator is (depth, type, confidence, value)
    indicators = [(0,'ipv4','low','188.226.130.166'),...]
    _tree = tree(root_node)
    for indicator in indicators:
        depth = indicator[0]
        _tree.insert(indicator,depth)

Now that we have the tree represented in code, it is
trivially traversable when loading it into some graph:

    def load_indicators(node,depth):
        # link the node to its parent, then recurse into its children
        graph.insert(node.parent,edge_label,node)
        for child in node.children:
            load_indicators(child,depth+1)

    load_indicators(_tree,0)

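For something that actually runs, the two fragments above might be fleshed out as in the following Python sketch. The `Node` class, the stack-based insertion, the `contains` edge label and the plain edge list standing in for a graph backend are all assumptions made for illustration; a real setup would hand the edges to a graph database instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    indicator: Optional[Tuple[int, str, Optional[str], str]]
    parent: Optional["Node"] = field(default=None, repr=False, compare=False)
    children: List["Node"] = field(default_factory=list)


def build_tree(indicators) -> Node:
    """Build a tree from (depth, type, confidence, value) tuples."""
    root = Node(indicator=None)
    stack = [root]  # stack[d] is the most recent node at depth d - 1
    for indicator in indicators:
        depth = indicator[0]
        if depth + 1 > len(stack):
            raise ValueError(f"depth jumps too far at {indicator!r}")
        del stack[depth + 1:]  # drop the deeper levels we have left
        node = Node(indicator, parent=stack[-1])
        stack[-1].children.append(node)
        stack.append(node)
    return root


def load_indicators(node: Node, edges: list, edge_label: str = "contains"):
    """Depth-first walk that records one edge per parent/child pair."""
    for child in node.children:
        if node.indicator is not None:  # the synthetic root is not a vertex
            edges.append((node.indicator[3], edge_label, child.indicator[3]))
        load_indicators(child, edges, edge_label)


indicators = [
    (0, "ipv4", "low", "188.226.130.166"),
    (1, "domain", "med", "secdiary.com"),
    (2, "filename", "med", "some_filename.docx"),
]

edges = []
load_indicators(build_tree(indicators), edges)
for edge in edges:
    print(edge)
```

Keeping the edges as plain tuples keeps the format itself free of graph logic, which is the point of option 2 above.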

## Summary

Hopefully I did not kill too many kittens with this
post. You may or may not agree, but I do believe that most
analysts share at least parts of my purist views on the
matter.

We are currently too focused on supporting standards and
having everyone use as few of them as possible. I believe
that energy is better spent on getting more consistent in
the way we document, and on actually exchanging more
developed indicator sets than the MD5 hash and domain lists
that are typically shared today ("not looking at these kinds
of files at all" - even though it's not the worst I've seen:
``MAR-10135536-F_WHITE_stix.xml`` [7]).

In the conceptual part of this article I propose a simple
yet effective way of representing indicators in a practical
manner. Frankly, it is even too simple to be novel. It is
just consistent and intuitive.

PS! For the STIX example above, have a look at the following
to get a feel for the actual content of the file (I used one
of the mentioned specimens to make the point):

    class tlp:white
    date 2018/02/05

    sha1 high 4efb9c09d7bffb2f64fc6fe2519ea85378756195
    comment NCCIC:Observable-724f9bfe-1392-456e-8d9b-c143af15f8d4
    comment did not convert all attributes
    compiler Microsoft Visual C++ 6.0
    md5 high 3dae0dc356c2b217a452b477c4b1db06
    date 2016-01-29T09:21:46Z
    entropy med 6.65226708818
    #sections low 5
    intname med ProxyDll.dll
    detection med symantec:Heur.AdvML.B

The original document states those same indicators in no fewer than 119 lines
with an overhead ratio of about 1:5 (it looks completely insane):

    <stix:Observables cybox_major_version="2" cybox_minor_version="1" cybox_update_version="0">
      <cybox:Observable id="NCCIC:Observable-724f9bfe-1392-456e-8d9b-c143af15f8d4">
        <cybox:Object id="NCCIC:WinExecutableFile-bb9e38d1-d91c-4727-ab6a-514ecc0c02a2">
          <cybox:Properties xsi:type="WinExecutableFileObj:WindowsExecutableFileObjectType">
            <FileObj:File_Name>3DAE0DC356C2B217A452B477C4B1DB06</FileObj:File_Name>
            <FileObj:Size_In_Bytes>336073</FileObj:Size_In_Bytes>
            <FileObj:File_Format>PE32 executable (DLL) (console) Intel 80386, for MS Windows</FileObj:File_Format>
            <FileObj:Hashes>
              <cyboxCommon:Hash>
                <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">MD5</cyboxCommon:Type>
                <cyboxCommon:Simple_Hash_Value>3dae0dc356c2b217a452b477c4b1db06</cyboxCommon:Simple_Hash_Value>
              </cyboxCommon:Hash>
              <cyboxCommon:Hash>
                <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">SHA1</cyboxCommon:Type>
                <cyboxCommon:Simple_Hash_Value>4efb9c09d7bffb2f64fc6fe2519ea85378756195</cyboxCommon:Simple_Hash_Value>
              </cyboxCommon:Hash>
              <cyboxCommon:Hash>
                <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">SHA256</cyboxCommon:Type>
                <cyboxCommon:Simple_Hash_Value>8acfe8ba294ebb81402f37aa094cca8f914792b9171bc62e758a3bbefafb6e02</cyboxCommon:Simple_Hash_Value>
              </cyboxCommon:Hash>
              <cyboxCommon:Hash>
                <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">SHA512</cyboxCommon:Type>
                <cyboxCommon:Simple_Hash_Value>e52b8878bd8c3bdd28d696470cba8a18dcc5a6d234169e26a2fbd9862b10ec1d40196fac981bc3c5a67e661cd60c10036321388e5e5c1f60a7e9937dd71fadb1</cyboxCommon:Simple_Hash_Value>
              </cyboxCommon:Hash>
              <cyboxCommon:Hash>
                <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">SSDEEP</cyboxCommon:Type>
                <cyboxCommon:Simple_Hash_Value>3072:jUdidTaC07zIQt9xSx1pYxHvQY06emquSYttxlxep0xnC:jyi1XCzcbpYdvQ2e9g3kp01C</cyboxCommon:Simple_Hash_Value>
              </cyboxCommon:Hash>
            </FileObj:Hashes>
            <FileObj:Packer_List>
              <FileObj:Packer>
                <FileObj:Name>Microsoft Visual C++ 6.0</FileObj:Name>
              </FileObj:Packer>
              <FileObj:Packer>
                <FileObj:Name>Microsoft Visual C++ 6.0 DLL (Debug)</FileObj:Name>
              </FileObj:Packer>
            </FileObj:Packer_List>
            <FileObj:Peak_Entropy>6.65226708818</FileObj:Peak_Entropy>
            <WinExecutableFileObj:Headers>
              <WinExecutableFileObj:File_Header>
                <WinExecutableFileObj:Number_Of_Sections>5</WinExecutableFileObj:Number_Of_Sections>
                <WinExecutableFileObj:Time_Date_Stamp>2016-01-29T09:21:46Z</WinExecutableFileObj:Time_Date_Stamp>
                <WinExecutableFileObj:Size_Of_Optional_Header>4096</WinExecutableFileObj:Size_Of_Optional_Header>
                <WinExecutableFileObj:Hashes>
                  <cyboxCommon:Hash>
                    <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">MD5</cyboxCommon:Type>
                    <cyboxCommon:Simple_Hash_Value>e14dca360e273ca75c52a4446cd39897</cyboxCommon:Simple_Hash_Value>
                  </cyboxCommon:Hash>
                </WinExecutableFileObj:Hashes>
              </WinExecutableFileObj:File_Header>
              <WinExecutableFileObj:Entropy>
                <WinExecutableFileObj:Value>0.672591739631</WinExecutableFileObj:Value>
              </WinExecutableFileObj:Entropy>
            </WinExecutableFileObj:Headers>
            <WinExecutableFileObj:Sections>
              <WinExecutableFileObj:Section>
                <WinExecutableFileObj:Section_Header>
                  <WinExecutableFileObj:Name>.text</WinExecutableFileObj:Name>
                  <WinExecutableFileObj:Size_Of_Raw_Data>49152</WinExecutableFileObj:Size_Of_Raw_Data>
                </WinExecutableFileObj:Section_Header>
                <WinExecutableFileObj:Entropy>
                  <WinExecutableFileObj:Value>6.41338619924</WinExecutableFileObj:Value>
                </WinExecutableFileObj:Entropy>
                <WinExecutableFileObj:Header_Hashes>
                  <cyboxCommon:Hash>
                    <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">MD5</cyboxCommon:Type>
                    <cyboxCommon:Simple_Hash_Value>076cdf2a2c0b721f0259de10578505a1</cyboxCommon:Simple_Hash_Value>
                  </cyboxCommon:Hash>
                </WinExecutableFileObj:Header_Hashes>
              </WinExecutableFileObj:Section>
              <WinExecutableFileObj:Section>
                <WinExecutableFileObj:Section_Header>
                  <WinExecutableFileObj:Name>.rdata</WinExecutableFileObj:Name>
                  <WinExecutableFileObj:Size_Of_Raw_Data>8192</WinExecutableFileObj:Size_Of_Raw_Data>
                </WinExecutableFileObj:Section_Header>
                <WinExecutableFileObj:Entropy>
                  <WinExecutableFileObj:Value>3.293891672</WinExecutableFileObj:Value>
                </WinExecutableFileObj:Entropy>
                <WinExecutableFileObj:Header_Hashes>
                  <cyboxCommon:Hash>
                    <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">MD5</cyboxCommon:Type>
                    <cyboxCommon:Simple_Hash_Value>4a6af2b49d08dd42374deda5564c24ef</cyboxCommon:Simple_Hash_Value>
                  </cyboxCommon:Hash>
                </WinExecutableFileObj:Header_Hashes>
              </WinExecutableFileObj:Section>
              <WinExecutableFileObj:Section>
                <WinExecutableFileObj:Section_Header>
                  <WinExecutableFileObj:Name>.data</WinExecutableFileObj:Name>
                  <WinExecutableFileObj:Size_Of_Raw_Data>110592</WinExecutableFileObj:Size_Of_Raw_Data>
                </WinExecutableFileObj:Section_Header>
                <WinExecutableFileObj:Entropy>
                  <WinExecutableFileObj:Value>6.78785911234</WinExecutableFileObj:Value>
                </WinExecutableFileObj:Entropy>
                <WinExecutableFileObj:Header_Hashes>
                  <cyboxCommon:Hash>
                    <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">MD5</cyboxCommon:Type>
                    <cyboxCommon:Simple_Hash_Value>c797dda9277ee1d5469683527955d77a</cyboxCommon:Simple_Hash_Value>
                  </cyboxCommon:Hash>
                </WinExecutableFileObj:Header_Hashes>
              </WinExecutableFileObj:Section>
              <WinExecutableFileObj:Section>
                <WinExecutableFileObj:Section_Header>
                  <WinExecutableFileObj:Name>.reloc</WinExecutableFileObj:Name>
                  <WinExecutableFileObj:Size_Of_Raw_Data>8192</WinExecutableFileObj:Size_Of_Raw_Data>
                </WinExecutableFileObj:Section_Header>
                <WinExecutableFileObj:Entropy>
                  <WinExecutableFileObj:Value>3.46819043887</WinExecutableFileObj:Value>
                </WinExecutableFileObj:Entropy>
                <WinExecutableFileObj:Header_Hashes>
                  <cyboxCommon:Hash>
                    <cyboxCommon:Type xsi:type="cyboxVocabs:HashNameVocab-1.0">MD5</cyboxCommon:Type>
                    <cyboxCommon:Simple_Hash_Value>fbefbe53b3d0ca62b2134f249d249774</cyboxCommon:Simple_Hash_Value>
                  </cyboxCommon:Hash>
                </WinExecutableFileObj:Header_Hashes>
              </WinExecutableFileObj:Section>
            </WinExecutableFileObj:Sections>
          </cybox:Properties>
        </cybox:Object>
      </cybox:Observable>

[1] STIX: https://oasis-open.github.io/cti-documentation/
[2] Cybox example: https://github.com/CybOXProject/schemas/blob/master/samples/CybOX_IPv4Address_Instance.xml
[3] MAEC: https://maec.mitre.org/
[4] OpenIOC to STIX repository on GitHub: https://github.com/STIXProject/openioc-to-stix
[5] STIX 2.x support (MISP): https://github.com/MISP/MISP/issues/2046
[6] JanusGraph: http://janusgraph.org/
[7] MAR-10135536-F_WHITE_stix.xml: https://www.us-cert.gov/sites/default/files/publications/MAR-10135536-F_WHITE_stix.xml