the finer things of cyber

M-21–31: Ye Shall Log, No Matter the Log

GAO Audit

“Three agencies had met tier 3. These agencies were the Department of Agriculture (Agriculture), the National Science Foundation (NSF), and the Small Business Administration (SBA)” (pg. 26)

“…official stated their agencies were not expected to meet the tiers soon.” (pg. 26)

“One agency official stated that its agency estimated that it would require more than 9 years or sufficient additional funding for contractors to account for the new workload needed to meet the event logging tiers.” (pg. 27)

“agency officials cited the “all or nothing” nature of the requirements, meaning even if a majority of systems had reached the tier 1 requirements, if all systems had not reached tier 1, the agency overall would be at tier 0.” (pg. 26-27)

Small note: In February 2023, CISA published a short guidance document on M-21–31 that provides a second layer of prioritization for the logs within the existing Tier system (in Tier 0 logs, prioritize XYZ first). However, the CISA guidance does not change the M-21–31 requirements. It states,

This guidance intends to complement and clarify the requirements within M-21–31 and does not supersede or conflict with the policy.

Analyzing the Tiers

What must be logged?

Question for the reader: Which line here represents a discrete event type? Account Creation is straightforward, but what is Manage Credential Type? What is Track Usage of Credentials? Is that referring to a sign-in log? It’s really not clear what the event type is. This issue occurs throughout the memo.

In addition to the unclear phrasing, there are other problems.

Some ‘atomic’ event types are grouped:

OS: – Start-Up and Shutdown of the System (pg. 18)

and others are not:

– Account Creation
– Account Deletion

Some mention an action you should take, instead of an event:

Monitor, Alert, and Respond to Anomalous Behaviors/Activities (pg. 14)

Some are possibly duplicates [4]:

Network Device Infrastructure (General Logging) — DNS Query/Response Logs (pg. 16)

Some are supersets of others (note the *):

System Log Folder: /Var/Log/* (pg. 23)

System Log: /Var/Log/System.Log (pg. 23)

Some are closer to inventory data and not events:

Device Name
Device Manufacturer and Model
Serial #
Phone #
IMEI, IMSI, OS Version, OS Build
Firmware Version

Some are so broad they are almost immeasurable:

OS – System Events (pg. 20)

All of these inconsistencies make it impossible to calculate the true number of required event types (and therefore impossible to measure compliance consistently). But let’s try anyway. Below is my very rough calculation:

  • Event Logging Tier 1 | at least 215 log types (~60%)
  • Event Logging Tier 2 | at least 130 log types (~36%)
  • Event Logging Tier 3 | at least 12 log types (~3%)

That’s at least 357 different events. There are probably more, because I did some generous groupings. Some are also double counted, because the memo separates by device type (e.g. Linux logon events and Windows logon events are separate).
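The tier shares can be reproduced with a few lines of Python. This is only a sketch of the arithmetic: the counts are the rough per-tier estimates from the list above, and the `tier_shares` helper is a name made up for illustration.

```python
# Rough per-tier counts taken from the estimate above.
tier_counts = {
    "Event Logging Tier 1": 215,
    "Event Logging Tier 2": 130,
    "Event Logging Tier 3": 12,
}

def tier_shares(counts):
    """Return (total, {tier: percent of total}) for a mapping of tier -> count."""
    total = sum(counts.values())
    shares = {tier: 100 * n / total for tier, n in counts.items()}
    return total, shares

total, shares = tier_shares(tier_counts)
print(f"total: {total}")
for tier, pct in shares.items():
    print(f"{tier}: {tier_counts[tier]} log types ({pct:.1f}%)")
```

Tier 3 works out to roughly 3% of the total, so almost all of the workload sits in Tiers 1 and 2.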

Finally, there is also no discussion in the memo regarding which systems the logs must be collected from. Is it only production systems? Development? Every system in the agency? I doubt every agency is using the same criteria.

Retention

Log retention discussions usually focus on what to store and for how long, based on your threat model. These discussions are necessary because most SIEM/data lake costs scale with data volume (e.g. pay per GB of ingested data or pay per GB of data scanned in a query).

Unfortunately, M-21–31 is not a fan of this discussion.

Every event type, except two [5], must be stored for:

  • 12 months Active Storage, and then
  • 18 months Cold Storage

This includes extremely high volume events like:

OS – Registry Access (pg. 21)

OS – File and Object Access (pg. 20)

Web Applications – HTTP Request and Response with Body of Data (pg. 33)

There is no consideration for the value, or ROI, of a log type based on its volume and provided insight.
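To get a feel for what a flat 12 + 18 month window means for any one source, here is a minimal sketch of the steady-state storage footprint. The 10 GB/day figure is a made-up example, and the average month length is my own simplifying assumption, not something the memo specifies.

```python
ACTIVE_MONTHS = 12          # active storage period stated above
COLD_MONTHS = 18            # cold storage period stated above
DAYS_PER_MONTH = 365 / 12   # averaging assumption, not from the memo

def steady_state_footprint(gb_per_day):
    """GB held in each storage class once the full retention window has filled."""
    active = gb_per_day * ACTIVE_MONTHS * DAYS_PER_MONTH
    cold = gb_per_day * COLD_MONTHS * DAYS_PER_MONTH
    return active, cold

# Hypothetical log source producing 10 GB/day.
active, cold = steady_state_footprint(10)
print(f"active: {active:,.0f} GB, cold: {cold:,.0f} GB, total: {active + cold:,.0f} GB")
```

Under these assumptions, every GB/day of ingest turns into roughly 900 GB held at any given time, regardless of how useful the log type is.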

Filtering

The memo also does not mention filtering of any kind, another very common cost management approach in logging discussions. See SwiftOnSecurity’s Sysmon config for a popular attempt, or Olaf Hartong’s presentation on how Microsoft Defender for Endpoint implements event filtering.

In the industry, the question is not “should we filter out events?”, it is “what should we filter out?” and “how should we balance visibility requirements with cost?” However, M-21–31 does not discuss this and therefore implies agencies must log every single event in these categories, which will dramatically increase the cost.
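For readers who have not worked with event filtering, the idea can be sketched as a list of exclusion rules applied before ingest. Everything in this example is hypothetical: the rule format, field names, process names, and sample events are invented for illustration and do not come from the memo, Sysmon, or Defender for Endpoint.

```python
import json

# Hypothetical exclusion rules: drop an event when every listed field matches.
EXCLUDE_RULES = [
    {"event_type": "registry_access", "process": "svchost.exe"},
    {"event_type": "file_access", "process": "backup_agent.exe"},
]

def is_excluded(event, rules=EXCLUDE_RULES):
    """True if the event matches all key/value pairs of at least one rule."""
    return any(all(event.get(k) == v for k, v in rule.items()) for rule in rules)

def filter_events(events, rules=EXCLUDE_RULES):
    """Return the kept events and the number of bytes dropped by filtering."""
    kept, dropped_bytes = [], 0
    for event in events:
        if is_excluded(event, rules):
            dropped_bytes += len(json.dumps(event).encode("utf-8"))
        else:
            kept.append(event)
    return kept, dropped_bytes

# Made-up sample events.
sample = [
    {"event_type": "registry_access", "process": "svchost.exe", "key": "HKLM\\Software\\Example"},
    {"event_type": "registry_access", "process": "unknown.exe", "key": "HKLM\\Software\\Example"},
    {"event_type": "logon", "user": "alice"},
]
kept, saved = filter_events(sample)
print(len(kept), "events kept;", saved, "bytes filtered out")
```

The hard part is not the mechanism but choosing the rules, which is exactly the visibility-versus-cost discussion the memo skips.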

Costs

When agencies say they have funding issues, I think they’re right. But let’s try to calculate some numbers. For simplicity, I’ll focus on just one category: Windows event logs.

To calculate an estimate, I’ll use a small agency like FRTIB, which manages the federal government’s 401(k) and has ~300 employees.

Scenario: Windows Events

To calculate the number of Windows devices:

I estimate ~1 employee = 1 device, so ~300 employee devices.

To calculate the size per Windows device:

  • I ran Procmon with no filters for 10 minutes on my laptop [6]. The resulting file in .PML format is ~500MB.
  • 500MB / 10min = 50MB/min. Then 50MB/min * 60min/hour * 8 hours/workday = 24GB/day/device.
  • 24GB/day/device * 300 devices = 7,200GB/day (or 7.2TB/day)

That is a lot of data. Even if you can find some good compression savings, it is still a lot of data.
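The extrapolation above is easy to rerun with different inputs. The sketch below uses the Procmon numbers from this section; the compression ratios are hypothetical values I picked to show sensitivity, not measurements.

```python
def daily_volume_gb(sample_mb, sample_minutes, hours_per_day, devices, compression_ratio=1.0):
    """Extrapolate a short capture to a fleet-wide daily volume in GB (1 GB = 1,000 MB)."""
    mb_per_minute = sample_mb / sample_minutes
    mb_per_device_day = mb_per_minute * 60 * hours_per_day
    gb_per_device_day = mb_per_device_day / 1000
    return gb_per_device_day * devices / compression_ratio

raw = daily_volume_gb(sample_mb=500, sample_minutes=10, hours_per_day=8, devices=300)
print(f"uncompressed: {raw:,.0f} GB/day")

# Hypothetical compression ratios, to show the volume stays large either way.
for ratio in (2, 5, 10):
    print(f"{ratio}:1 compression: {daily_volume_gb(500, 10, 8, 300, ratio):,.0f} GB/day")
```

Even at an optimistic 10:1 ratio, a 300-person agency would still be looking at hundreds of GB per day from this one category.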

If FRTIB ingested this into Microsoft Sentinel, it would cost at least ~$16,600/day or ~$6mil/year [7]. FRTIB’s FY2023 budget was ~$400 million.

Edit 12.12.2023: The Sentinel cost only includes 90 days of Active storage, and the memo requires 12 months. So you’re not even at the minimum retention yet!

That’s 1.5% of the Agency budget dedicated to Windows event logs.

And there’s still the remaining 350 event types!
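For anyone who wants to plug in their own agency's numbers, the cost arithmetic in this section can be sketched as follows. The inputs are the figures quoted above (7,200GB/day, the $2.31/GB rate from footnote 7, and a ~$400 million budget); the `ingest_cost` helper is a name invented for this example, and the result covers ingest only.

```python
GB_PER_DAY = 7_200            # daily Windows event volume estimated above
PRICE_PER_GB = 2.31           # USD, the Sentinel rate quoted in footnote 7
AGENCY_BUDGET = 400_000_000   # USD, approximate FY2023 FRTIB budget

def ingest_cost(gb_per_day, price_per_gb, days=365):
    """Return (daily cost, cost over `days`) for a flat per-GB ingest price."""
    daily = gb_per_day * price_per_gb
    return daily, daily * days

daily, annual = ingest_cost(GB_PER_DAY, PRICE_PER_GB)
share = 100 * annual / AGENCY_BUDGET
print(f"${daily:,.0f}/day, ${annual:,.0f}/year, {share:.1f}% of budget")
```

Swapping in a different device count or price makes it quick to see how fast the total moves.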

Recommendations

If you were a federal CISO, could you truly attest that your agency is meeting the requirements?

Considering the unclear event log requirements, unclear system scope, inability to filter events, and resulting costs, the memo needs to change. Especially if GAO or OMB expect agencies to report compliance.

  1. Reissue the memo — this seems necessary. It just has too many problems.
  2. Move the requirements to CISA’s authorities — the requirements seem too low-level to be managed by OMB.
  3. Move the requirements to an updatable format — this enables OMB (or CISA) to update the exact event type requirements at their discretion. This allows for new event types to be added and times adjusted if necessary.
  4. Standardize how event types are described — the current requirements are inconsistently structured, such as when they decide to include required fields (e.g. Source Port) or how they group events. Pick a lane. Preferably the more explicit one.
  5. Fix the presentation of the event type requirements— the current table is difficult to follow and creates confusion.
  6. Create ROI-informed retention times — compare the volume and usefulness of events and build ROI-aligned retention times. This is not an easy task, but the alternative is worse.
  7. Decide a position on filtering — the new memo should provide agencies the flexibility to filter events. Or don’t allow it and find a way to defend that position.
  8. Make the requirements more accessible and known outside of federal government— everyone cares about logging. If you’re doing the work to decide what is important to the government, why not tell the world about it?

Footnotes

  1. 20 federal agencies miss deadline for implementing cyber incident tracking requirements, watchdog says (Nextgov/FCW)
  2. Only 3 agencies have hit deadline for cyber event logging standards, GAO finds (FedScoop)
  3. Section 8, subsections (a)-(c)
  4. There are many more duplicates. Look at Linux event logs and PowerShell too.
  5. PCAP [72 hours], Cloud GCP logs [6 months + 18 months]
  6. You might wonder why and I direct you back to the requirement for File/Object/Registry access. 👀 Procmon also doesn’t capture every event required for Windows, so even this number represents a subset.
  7. East US region, 5TB commitment tier, 7,200GB/day * $2.31/GB * 365 days