thoughts/data/telemetry.md

Telemetry for cyber security is currently at a
crossroads. While past methods have been efficient by being
based on network monitoring, the current revolution in
encryption and the distributed workspace makes it
insufficient to rely solely on network monitoring. In
this post we are going to focus on the current challenges.

> Telemetry is an electrical apparatus for measuring a
> quantity (such as pressure, speed, or temperature) and
> transmitting the result especially by radio to a distant
> station  
> – Merriam-Webster

Telemetry, a term mostly used by AV vendors, has become
broadly applied as services change from centralised to
decentralised and geographically spread. Yesterday an employee
would work at their desk from 9-5 and then go home, while
today's modern worker moves around the office area and can
basically work from anywhere in the world when they feel
like it.

In cyber security, telemetry can generally be categorised
as 1) network-centric and 2) endpoint-based. A complete
telemetry profile is essential for being able to monitor
security events and to execute retrospective
analysis. In my recent article on indicators [1] I
proposed a structure for indicators organised in three
levels of abstraction. In this article a telemetry profile
means something that covers a degree of these three levels.

    | Level of abstraction  |    | Formats
    |-----------------------|----|-------------
    | Behavior              |    | MITRE (PRE-)ATT&CK
    |-----------------------|--->|-------------
    | Derived               |    | Suricata+Lua, Yara
    |-----------------------|--->|-------------
    | Atomic                |    | OpenIOC 1.1
    
    
## The Challenges

There are generally two problems that need to be fully
solved when collecting data for cyber security:

* The use of encryption from end-to-end
* Workers and thereby the defended environment are or will be distributed

As of February 2017 the web was 50% encrypted [2]. Today
that number [3] is growing close to 70%.

For defense purposes, it is possible to identify malicious
traffic, such as beaconing, through metadata analysis. There
have been some developments on detecting anomalies in
encrypted content lately - namely the fingerprinting of
programs using SSL/TLS. In the future I believe this will be
the primary role of network-based detection. This is
actually a flashback to a pre-2010 monitoring environment
when full content was rarely stored and inspected by
security teams.
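
As a rough illustration of metadata-only detection, the sketch below scores how regular the connection intervals to each destination are. It is a minimal example of one possible heuristic, not a production detector: the function names, the `(destination, timestamp)` flow format and the scoring formula are my own assumptions, and real beacons often add jitter that needs more robust statistics.

```python
from collections import defaultdict
from statistics import mean, pstdev


def beacon_score(timestamps):
    """Return a regularity score between 0 and 1 for connection start times.

    Timestamps are in seconds. A score near 1 means near-constant
    intervals, which is one possible sign of beaconing.
    """
    if len(timestamps) < 4:
        return 0.0  # too few connections to judge periodicity
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average_gap = mean(gaps)
    if average_gap <= 0:
        return 0.0
    # Coefficient of variation: spread of the gaps relative to their mean.
    variation = pstdev(gaps) / average_gap
    return max(0.0, 1.0 - variation)


def score_destinations(flows):
    """Group hypothetical (destination, timestamp) records and score each."""
    by_destination = defaultdict(list)
    for destination, timestamp in flows:
        by_destination[destination].append(timestamp)
    return {dest: beacon_score(ts) for dest, ts in by_destination.items()}
```

Note that the score only needs flow start times, so it works the same whether or not the payload is encrypted.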

An additional element to consider is the previous debate
about public key pinning, which has now evolved into
Expect-CT [4]. This means that man-in-the-middle (MitM)
techniques are going to be a no-no at some point. Yes, that
includes your corporate proxy as well.

There is one drawback and dealbreaker with the above for
security teams: it requires access to the datastream used by
the endpoints to be fully effective.

VPNs are going away as more resilient and modern network
architectures become dominant. The most promising
challenger at the moment is the BeyondCorp [5] architecture
(based on zero trust), proposed by Google more than six
years ago. A zero trust architecture means that clients will
only check in to the corporate environment at the points
where _they_ need to, or when they are in the vicinity of
corporate resources. Other activity, such as browsing
external websites, no longer goes via the corporate
infrastructure or its monitored links. Additionally, the
endpoint is easily the most common infiltration vector.

To be honest, the BeyondCorp model reflects to a larger
extent how humans actually interact with computers. Humans
have never been confined to the perimeter of the enterprise
network. This may be part of the reason why organisations
are currently in a defeatable state as well. The only ones
to confine themselves to the enterprise network are,
ironically, the network defenders.

> The only ones to confine themselves to the enterprise network are,
> ironically, the network defenders.

The battle of controlling the technology evolution is not
completely lost, though. It is a matter of changing the
mindset of where data or telemetry is collected. Yesterday
it was at the corporate proxy or in the corporate
environment - today it is on the endpoint and during the
connections to valuable resources.

For endpoints, the primary challenges currently faced are:

* Maintaining the integrity of locally stored and buffered data
* The availability and transport of data to a centralised logging instance
* Confidentiality of the data in transport or at rest
* Data source consistency for central correlation of information from several
  host sources
* Raising the stakes on operational security in a cat and mouse
  chase between intruders and defenders
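
To make the first challenge a little more concrete, here is a minimal sketch of one way to make a locally buffered log tamper-evident: each record carries an HMAC tag that also covers the previous record's tag. The record layout and function names are hypothetical, and the approach rests on a big assumption: that the key is not readable by the intruder (for example because it is rotated or held outside the host).

```python
import hashlib
import hmac
import json

GENESIS = b"\x00" * 32  # fixed starting value for the first record


def append_event(chain, key, event):
    """Append an event whose tag covers the event and the previous tag."""
    previous_tag = bytes.fromhex(chain[-1]["tag"]) if chain else GENESIS
    payload = json.dumps(event, sort_keys=True).encode("utf-8")
    tag = hmac.new(key, previous_tag + payload, hashlib.sha256).hexdigest()
    chain.append({"event": event, "tag": tag})


def verify_chain(chain, key):
    """Return the index of the first record that fails, or None if intact."""
    previous_tag = GENESIS
    for index, record in enumerate(chain):
        payload = json.dumps(record["event"], sort_keys=True).encode("utf-8")
        expected = hmac.new(key, previous_tag + payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["tag"]):
            return index
        previous_tag = bytes.fromhex(record["tag"])
    return None
```

A central collector holding the same key can run `verify_chain` on what it receives. Modified or removed records in the middle of the buffer are detected, but truncation of the newest records is not, so this complements fast transport rather than replacing it.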
  
Remote logging is a subject that has received much attention
already, so we are not going into depth about it here.
  
### Existing Tooling For Endpoints

This section was not originally a part of the scope of this
article, but I'd like to establish a baseline of parts of
the available tooling to handle the above issues. I also
believe it touches some of the endpoint challenges.

For the purpose of this article, we define the following
well-known computer abstraction stack:

1. Hardware
2. Operating System
3. Application

Hardware verification and logging is currently a more or
less unexplored field, with only one tool available to my
knowledge. That tool is Chipsec [6], which has been of
interest to, and integrated into, the Google Rapid
Response (GRR) [7] project for some time.

Operating system logs are well understood today, and many
organisations manage logging from the host operating system
properly.

There are increasingly good event streaming and agent-based
systems available, such as LimaCharlie [8], Sysmon [9] and
Carbon Black [10]. The media focus of these platforms is on
the more trendy term "hunting", but their real purpose is
OS-level logging and pattern matching.

Further, distributed forensic platforms are available from
FireEye (HX) and an open source equivalent from Google named
GRR. GRR has been featured extensively on this site
previously. Common to these is that they do not stream
events, but rather store information on the endpoint.

Application layer logging is extremely challenging. The
logging mechanism in this regard needs to be connected to
the structure of the application itself, and there are a lot
of applications. Further, many application developers do
not focus on logging.

Application logging is important and could be seen as the
technical contextual information provided by the
endpoint. Exposed applications that are important in terms
of coverage:

* Browsers
* Email Readers
* Application Firewalls (if you have one)
* Instant Messaging Clients
* Rich Document editors, such as Excel, Word, PowerPoint

These applications are important since they are the first
point of contact for almost any technical threat. Done
right, application logs will be at a central location before
the intruder manages to get a foothold on the client. Thus,
the risk of data being misrepresented in the central system
is greatly reduced (integrity).

Taking browsers and Microsoft Office as an example, there
are some options readily available:

* Firefox HTTP and DNS logging: mozilla.org [11]
* Office Telemetry logging: Office Telemetry Log [12]

The above examples are not security focused as far as I
could tell; more often they are debug oriented. However, the
same data is often what we are after as well (such as: did
the document have a macro? or what is the HTTP header?).
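
For the Firefox option, the logging described in [11] is switched on through the `MOZ_LOG` and `MOZ_LOG_FILE` environment variables. The sketch below only builds such an environment. The helper name, the default module selection and the log path in the comment are my own assumptions, so check the Mozilla documentation for the modules and levels that fit your needs.

```python
import os


def firefox_logging_env(log_file, modules=("nsHttp", "nsHostResolver"),
                        level=5, base_env=None):
    """Return a copy of the environment with Firefox HTTP/DNS logging enabled.

    `modules` are Firefox log module names and `level` is the verbosity
    applied to each of them. `base_env` defaults to the current environment.
    """
    env = dict(os.environ if base_env is None else base_env)
    entries = ["timestamp"] + [f"{module}:{level}" for module in modules]
    env["MOZ_LOG"] = ",".join(entries)
    env["MOZ_LOG_FILE"] = log_file
    return env


# Example use (assumes a `firefox` binary on PATH and a writable path):
#   import subprocess
#   subprocess.Popen(["firefox"], env=firefox_logging_env("/tmp/firefox-http.log"))
```

The resulting file is verbose debug output, so in practice it would need parsing and filtering before being shipped to a central log.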

The dependency on the application developers to create
logging mechanisms is quite a challenge in this
arena. However, I believe the solution in cases where
applications do not log sufficiently is to take advantage
of plugins. Most modern applications support plugins to
some extent.

To summarise the tooling discussion, we can populate the
computer abstraction layers with the mentioned tools.

    | Level of abstraction  |    | Tools
    |-----------------------|----|-------------
    | Application           |    | Browser, Email and so on
    |-----------------------|--->|-------------
    | Operating System      |    | LC, CB, Sysmon
    |-----------------------|--->|-------------
    | Hardware              |    | Chipsec

## Conclusions: How Do We Defend in the Future?

In this article we have defined a structure and discussed in
short one of the most prominent challenges faced by
enterprise defenders today: how do we defend in the future?

Technology. This is the point where technology is no
longer the sole solution to defending a network. Modern
network architectures mean that defenders need to be able
to fully comprehend and use human nature as a sensor. It
is also about building intuitive systems which make the
necessary data and information available to the
defenders. In my mind technology has never been the sole
solution either, so the technology evolution is for the
greater good.

It seems obvious and unavoidable to me that network
defenders must start looking outside the perimeter, just as
intruders have done for many years already. This means
adapting the toolsets available and lobbying for an
architecture that reflects how humans actually use
technology resources. Most people have owned private
equipment for many years (surprise), and the line between
employee and enterprise is blurred and confusing now that
reality sinks in.

This means, in the technology aspect, that an emphasis must
be put on the endpoints - and that network monitoring must
again be about the metadata of the activity. In short:
collect metadata from networks and content from endpoints.

Only this way will we, in the future, be able to create a
full telemetry profile from each device under our
responsibility.


[1] Article on indicators: /indicators/  
[2] 50% encrypted: https://www.eff.org/deeplinks/2017/02/were-halfway-encrypting-entire-web  
[3] that number: https://letsencrypt.org/stats/  
[4] Expect-CT: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Expect-CT  
[5] BeyondCorp: https://cloud.google.com/beyondcorp/  
[6] Chipsec: https://github.com/chipsec/chipsec  
[7] Google Rapid Response (GRR): https://github.com/google/grr-doc/blob/master/publications.adoc  
[8] LimaCharlie: https://github.com/refractionPOINT/lce_doc/blob/master/README.md  
[9] Sysmon: https://www.rsaconference.com/writable/presentations/file_upload/hta-w05-tracking_hackers_on_your_network_with_sysinternals_sysmon.pdf  
[10] Carbon Black: http://the.report/assets/Advanced-Threat-Hunting-with-Carbon-Black.pdf  
[11] mozilla.org: https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging  
[12] Office Telemetry Log: https://msdn.microsoft.com/en-us/library/office/jj230106.aspx