METHODOLOGY · OPEN SCIENCE · REPRODUCIBLE

How we built the Public ISP Complaint Database

1. What's in this dataset (honestly)

We publish two intersecting datasets under one CC-BY 4.0 release:

  1. FCC consumer telecom complaints, state + issue level. Sourced from the FCC Consumer Help Center Open Data set (Socrata dataset 3xyp-aqkj). This dataset does not include carrier names — the FCC strips them before publishing the bulk file — but it does include state, ZIP3, issue type, issue category, complaint method, and date. Useful for state-level + trend analysis.
  2. Carrier-level documented issues, editorial. Sourced from our 509-ISP master database. Each carrier record has counts of: data breaches in the last 24 months, active class-action lawsuits, state Public Service Commission flags, BEAD grant awards. This is what gives the dataset carrier attribution.

We don't pretend the FCC bulk dump has carrier names when it doesn't. Carrier attribution for FCC complaints requires either an individual FOIA request or scraping state PSC dockets. v2.0 (Q3 2026) will add state-PSC carrier-level data from CA PUC, NY DPS, IL ICC, TX PUC, and PA PUC — the five state PSCs that publish carrier attribution.

Original FCC dataset license: Public domain (US federal government, 17 U.S.C. § 105).
Editorial carrier data + this derivative work license: Creative Commons Attribution 4.0 International (CC-BY 4.0).

2. Pull cadence

We refresh the dataset quarterly, within 30 days of the FCC publishing each new quarter. Each refresh bumps the dataset version (v1.0, v1.1, ...) and updates the Zenodo DOI. The fetch script is open-source:

$ git clone https://github.com/untangledstreaming/isp-complaint-database
$ cd isp-complaint-database
$ node build/fetch-fcc-complaints.js
$ node build/match-fcc-complaints.js
$ node build/generate-complaint-database-pages.js
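
A minimal sketch of how a paginated pull from the Socrata API could look. This is not the contents of build/fetch-fcc-complaints.js. The endpoint host, the page size, and the names buildPageUrl, fetchAllRows, and fetchPage are assumptions for illustration; only the dataset ID 3xyp-aqkj comes from the text above.

```javascript
// Illustrative sketch only: paginated SODA request URLs for dataset 3xyp-aqkj.
// ASSUMPTION: the dataset is served from this host and path.
const DATASET_ID = '3xyp-aqkj';
const BASE_URL = `https://opendata.fcc.gov/resource/${DATASET_ID}.json`;

// Build the URL for one page of results using SoQL paging parameters.
function buildPageUrl(pageIndex, pageSize = 50000) {
  const url = new URL(BASE_URL);
  url.searchParams.set('$limit', String(pageSize));
  url.searchParams.set('$offset', String(pageIndex * pageSize));
  // Order by the system row id so consecutive pages do not overlap.
  url.searchParams.set('$order', ':id');
  return url.toString();
}

// Pull pages until a short page signals the end of the dataset.
// fetchPage is injected (hypothetical) so the loop can run without a network.
async function fetchAllRows(fetchPage, pageSize = 50000) {
  const rows = [];
  for (let page = 0; ; page += 1) {
    const batch = await fetchPage(buildPageUrl(page, pageSize));
    for (const row of batch) rows.push(row);
    if (batch.length < pageSize) break;
  }
  return rows;
}
```

On Node 18 or later, one way to call it would be fetchAllRows(async (url) => (await fetch(url)).json()), relying on the built-in fetch.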

3. Name resolution (built but mostly unused in v1)

We built a two-stage carrier name resolver — alias map + fuzzy fallback against the 509-ISP master database — for when carrier names are present. In the FCC 3xyp-aqkj bulk dump, they almost never are (the field exists but is empty for >99% of rows). The resolver is ready for v2.0 state-PSC data, which does include carrier names.
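
A rough sketch of the two-stage idea, alias map first and fuzzy fallback second. The text above does not say which similarity measure the resolver uses, so the Levenshtein-based score, the 0.85 threshold, the alias entries, and the three-name master list here are all assumptions chosen for illustration.

```javascript
// Illustrative sketch only. ASSUMPTION: these alias entries and master names
// are placeholders, not the real alias map or the 509-ISP master database.
const ALIASES = new Map([
  ['xfinity', 'Comcast'],
  ['spectrum', 'Charter Communications'],
]);
const MASTER_NAMES = ['Comcast', 'Charter Communications', 'Verizon'];

// Lowercase and collapse punctuation so lookups ignore formatting noise.
function normalize(name) {
  return name.toLowerCase().replace(/[^a-z0-9]+/g, ' ').trim();
}

// Classic edit distance, computed one row at a time.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Stage 1: exact alias lookup. Stage 2: best fuzzy match above the threshold.
// Returns null when the name is empty or nothing scores high enough.
function resolveCarrier(rawName, threshold = 0.85) {
  const key = normalize(rawName || '');
  if (!key) return null;
  if (ALIASES.has(key)) return { name: ALIASES.get(key), stage: 'alias' };
  let best = null;
  for (const candidate of MASTER_NAMES) {
    const target = normalize(candidate);
    const score = 1 - levenshtein(key, target) / Math.max(key.length, target.length);
    if (score >= threshold && (!best || score > best.score)) {
      best = { name: candidate, stage: 'fuzzy', score };
    }
  }
  return best;
}
```

Returning null for empty or unmatched names keeps the mostly blank FCC rows out of the carrier tables instead of forcing a guess.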

4. Aggregation

For the FCC state-level data we aggregate per state × quarter × issue type × method. For the carrier editorial data we keep raw counts (breaches, lawsuits, PSC flags, BEAD grants) and let users sort by any of them. We do NOT normalize FCC complaint counts per capita because the FCC dataset is too noisy at the state level to support that comparison meaningfully: differences in complaint volume between states reflect filing behavior more than carrier behavior.
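
The grouping and sorting described above can be sketched roughly as follows. The record field names (state, date, issueType, method, breaches) and the function names are hypothetical, and the real build scripts may structure this differently.

```javascript
// Illustrative sketch only. ASSUMPTION: rows carry state, date (ISO string),
// issueType, and method fields; the real normalized schema may differ.

// Map an ISO date such as "2026-05-14" to a calendar quarter label.
function quarterOf(isoDate) {
  const [year, month] = isoDate.slice(0, 7).split('-').map(Number);
  return `${year}-Q${Math.ceil(month / 3)}`;
}

// Count complaints per state x quarter x issue type x method.
// Counts stay raw: no per-capita normalization is applied.
function aggregateComplaints(rows) {
  const counts = new Map();
  for (const row of rows) {
    const key = [row.state, quarterOf(row.date), row.issueType, row.method].join('|');
    counts.set(key, (counts.get(key) || 0) + 1);
  }
  return counts;
}

// Sort carrier records by any raw count field, highest first.
// Copies the array so the caller's data is left untouched.
function sortCarriers(records, field) {
  return [...records].sort((a, b) => b[field] - a[field]);
}
```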

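5. What's NOT in the data (yet)

  1. Carrier names on FCC complaints. The bulk file does not carry them, so attribution requires an individual FOIA request or scraping state PSC dockets.
  2. State-PSC carrier-level data from CA PUC, NY DPS, IL ICC, TX PUC, and PA PUC. Planned for v2.0 (Q3 2026).
  3. Per-capita normalization of FCC complaint counts, for the reasons given in section 4.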

6. Citation formats

APA

Baron, R. (2026). UntangledStreaming Public ISP Complaint Database (Version 1.0) [Data set]. Untangled Streaming. https://doi.org/10.5281/zenodo.PENDING

BibTeX

@dataset{baron2026untangled,
  author = {Baron, Rick},
  title  = {UntangledStreaming Public ISP Complaint Database},
  year   = {2026},
  version= {1.0},
  publisher = {Untangled Streaming},
  doi    = {10.5281/zenodo.PENDING},
  url    = {https://untangledstreaming.com/data/isp-complaint-database/}
}

RIS

TY  - DATA
AU  - Baron, Rick
PY  - 2026
DA  - 2026/06/07/
TI  - UntangledStreaming Public ISP Complaint Database
PB  - Untangled Streaming
DO  - 10.5281/zenodo.PENDING
UR  - https://untangledstreaming.com/data/isp-complaint-database/
ER  -

7. Reproducibility

Everything — fetch script, alias map, generation logic, raw FCC dump, normalized output — is open and downloadable. If you find a name we mis-resolved, file an issue on the repo (coming) or email [email protected] and we'll fix it in the next quarterly refresh.

8. Why we built this

Cord-cutting and ISP-shopping are major consumer-finance decisions, but the data behind ISP quality is locked in messy government datasets that journalists, researchers, and regulators have to clean themselves every single time. By doing the work once and publishing it free, we make it easier for everyone in the cord-cutting ecosystem to make data-driven arguments. We don't charge for this. We don't gatekeep it. Use it.

— Rick Baron & Bear, UntangledStreaming editorial team