CATrustData

Analysis & results from the SURGe Certificate Authority Trustworthiness project

What is this repo?

Security relies on trust, especially when it comes to Certificate Authorities. Browsers ship with many root CAs built in, but are they all equally trustworthy? Splunk's SURGe research team examined over 5 billion recent TLS certificates used for secure web sites (WebPKI) to try to find the answer to this question.

This repo is a companion to David Bianco's 2023 RSA talk, "Trust Unearned? Evaluating CA Trustworthiness Across 5 Billion Certificates" [slides]. It contains the raw risk rankings for all root CAs and the top 10k issuing CAs we encountered in our research.

How did we compile this data?

It's simple:

We downloaded about 5 billion TLS certificates used for WebPKI (think HTTPS websites) from 15 separate Certificate Transparency Logs (CTLs). This gave us the bulk of all the TLS certificates issued during 2021 and 2022, plus some older ones, a few going back as far as 2015.
We partnered with some threat intelligence companies and private CTI providers to get a very large list of ~185m malicious domains and FQDNs.
We threw all this data into Splunk and analyzed it!

Ok, it wasn't actually quite so straightforward, but this was the general idea.

What does the data look like?

The risk rankings are distributed in the form of two CSV files:

ca-trust-root-risk-scores.csv ("the root file", 497 entries)
ca-trust-issuer-risk-scores-10k.csv ("the issuer file", 10k entries)

Each file has a consistent set of fields:

The name of the CA (the root_ca or the issuer field, depending on which file you're looking at)
total_certs: The total number of certificates issued by this CA. For roots, this is the total number of certificates whose trust chain is anchored to the root CA.
risky_certs: The number of certificates associated with this CA that our threat intel partners observed participating in malicious activity
risk_percent: The proportion of this CA's certificates that are risky (risky_certs / total_certs)
tier: To ensure fair comparisons, we divided the CAs into tiers based on their total_certs values. This allows an apples-to-apples comparison of CAs that are (roughly) the same size. The values are strings and will be one of: "tier1", "tier2", "tier3" or "tier4".
- Root CAs
  - Tier 1 Roots: >= 10m certificates
  - Tier 2 Roots: 1m - 10m certificates
  - Tier 3 Roots: 100k - 1m certificates
  - Tier 4 Roots: <= 100k certificates
- Issuing CAs
  - Tier 1 Issuers: >= 100k certificates
  - Tier 2 Issuers: 10k - 100k certificates
  - Tier 3 Issuers: 1k - 10k certificates
  - Tier 4 Issuers: <= 1k certificates
zscore: The number of standard deviations away from the mean for this CA's risk_percent, compared to the rest of its tier. Larger numbers are more risky, smaller numbers are more trustworthy. In our research, we used an outlier threshold of +/- 3.0 to identify risky or trusty outliers, but here we just provide the zscore. You will need to impose your own interpretation.
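To make the relationship between the fields concrete, here is a minimal sketch of how a per-tier z-score could be derived from the risk proportion and then compared against the +/- 3.0 threshold. It is an illustration under stated assumptions, not the exact procedure used in the research: the sample rows are invented, the helper names `add_zscores` and `classify` are hypothetical, the population standard deviation is assumed, and treating exactly 3.0 as an outlier is a guess.

```python
import statistics
from collections import defaultdict

# Hypothetical sample rows; the CA names and counts are invented for illustration.
SAMPLE_ROWS = [
    {"root_ca": "Example Root A", "total_certs": 100, "risky_certs": 1, "tier": "tier4"},
    {"root_ca": "Example Root B", "total_certs": 100, "risky_certs": 1, "tier": "tier4"},
    {"root_ca": "Example Root C", "total_certs": 100, "risky_certs": 1, "tier": "tier4"},
    {"root_ca": "Example Root D", "total_certs": 100, "risky_certs": 5, "tier": "tier4"},
    {"root_ca": "Example Root E", "total_certs": 500000, "risky_certs": 50, "tier": "tier3"},
]


def add_zscores(rows):
    """Attach risk_percent and a per-tier zscore to each row, in place."""
    by_tier = defaultdict(list)
    for row in rows:
        # Same formula as the field description: risky_certs / total_certs.
        row["risk_percent"] = row["risky_certs"] / row["total_certs"]
        by_tier[row["tier"]].append(row)

    for members in by_tier.values():
        values = [m["risk_percent"] for m in members]
        mean = statistics.fmean(values)
        # Assumption: population standard deviation within the tier.
        spread = statistics.pstdev(values)
        for m in members:
            # A tier with no variation gets a zscore of 0.0 for every member.
            m["zscore"] = (m["risk_percent"] - mean) / spread if spread else 0.0
    return rows


def classify(zscore, threshold=3.0):
    """Label a zscore using a symmetric outlier threshold (inclusive, by assumption)."""
    if zscore >= threshold:
        return "risky outlier"
    if zscore <= -threshold:
        return "trusty outlier"
    return "within normal range"


for row in add_zscores(SAMPLE_ROWS):
    print(row["root_ca"], round(row["zscore"], 3), classify(row["zscore"]))
```

Because each z-score is computed only against CAs in the same tier, a small CA is never compared directly with a very large one. The published files already include the zscore column, so in practice only something like `classify` would be needed.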

Each file contains one record per root/issuing CA we encountered in our dataset. The root file is comprehensive (i.e., it contains data for all root CAs we discovered), while the issuer file only contains the top 10,000 issuing CAs by risk score. Although we counted ~78,000 distinct issuers, there was an extremely long tail of issuers with 0% risk. Every issuer with any risk rating at all is present in the issuer file. If an issuer is not present, assume that we either didn't find it in our data or (more likely) the risk percentage was 0%.
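The "assume 0%" rule for missing issuers can be sketched as a lookup with a default. This is only an illustration: the sample rows, the column order, and the helper names `load_risk_table` and `issuer_risk_percent` are assumptions, and the example reads from an in-memory string rather than the real CSV file.

```python
import csv
import io

# Hypothetical excerpt in the shape of the issuer file; all values are invented.
SAMPLE_CSV = """issuer,total_certs,risky_certs,risk_percent,tier,zscore
"CN=Example Issuing CA 1,O=Example Org",1200,6,0.5,tier3,1.2
"CN=Example Issuing CA 2,O=Example Org",50000,10,0.02,tier2,-0.1
"""


def load_risk_table(handle, name_field="issuer"):
    """Map each CA name to its risk_percent from an open CSV file object."""
    return {
        row[name_field]: float(row["risk_percent"])
        for row in csv.DictReader(handle)
    }


def issuer_risk_percent(table, issuer_name):
    """Return the issuer's risk_percent, defaulting to 0.0 when it is absent."""
    return table.get(issuer_name, 0.0)


table = load_risk_table(io.StringIO(SAMPLE_CSV))
print(issuer_risk_percent(table, "CN=Example Issuing CA 1,O=Example Org"))
print(issuer_risk_percent(table, "CN=Not In The File,O=Example Org"))
```

To read the real data, pass an open file object for ca-trust-issuer-risk-scores-10k.csv instead of the `io.StringIO` wrapper. Keep in mind that a default of 0.0 cannot distinguish an issuer that was never observed from one that was observed with no risky certificates.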

How should I use this data?

That's up to you! We think this would be a great start to a risk-based approach to alerting or hunting (RBA or RBH). It might also work well as a source of data enrichment for web-focused hunts. Our findings don't support directly alerting on certificates just because they happen to be issued or anchored by any of these CAs, but if used in combination with other factors or as part of an RBA/RBH strategy, these scores can be very helpful.
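As a rough illustration of using the scores in combination with other factors, the sketch below adds a CA z-score signal to two other signals and alerts only when the combined score crosses a threshold. Everything here is hypothetical: the `Observation` fields, the point values, and the alert threshold are invented for the example and would need tuning for a real environment.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """One observed TLS connection, with hypothetical enrichment fields."""
    issuer_zscore: float       # zscore looked up for the certificate's issuer
    domain_age_days: int       # hypothetical signal from another data source
    seen_in_threat_feed: bool  # hypothetical signal from another data source


# Hypothetical point values; none of these come from the research.
CA_OUTLIER_POINTS = 20
NEW_DOMAIN_POINTS = 30
THREAT_FEED_POINTS = 60
ALERT_THRESHOLD = 50


def risk_score(obs, zscore_threshold=3.0):
    """Sum the points for each risk factor present in the observation."""
    score = 0
    if obs.issuer_zscore >= zscore_threshold:
        score += CA_OUTLIER_POINTS
    if obs.domain_age_days < 30:
        score += NEW_DOMAIN_POINTS
    if obs.seen_in_threat_feed:
        score += THREAT_FEED_POINTS
    return score


def should_alert(obs):
    """Alert only when the combined evidence reaches the threshold."""
    return risk_score(obs) >= ALERT_THRESHOLD


ca_only = Observation(issuer_zscore=4.0, domain_age_days=400, seen_in_threat_feed=False)
ca_and_new_domain = Observation(issuer_zscore=4.0, domain_age_days=10, seen_in_threat_feed=False)
print(risk_score(ca_only), should_alert(ca_only))
print(risk_score(ca_and_new_domain), should_alert(ca_and_new_domain))
```

The point values are chosen so that a risky CA on its own never reaches the threshold, which mirrors the guidance above against alerting on the issuing or anchoring CA alone.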

Please note, however, that this is a point-in-time snapshot and will become increasingly unreliable as time goes on. We feel that this is good data right now (Spring 2023), but downloadeat emptor. USE THIS FOR PRODUCTION SECURITY OPERATIONS AT YOUR OWN RISK.
