Methodology
How this dataset is built, refreshed, cited, and verified.
Sourcing rule
Every series in the dataset is sourced from a primary public publisher. No third-party aggregator sits between the publisher and the CSV, with one exception: FRED, which is used as a mirror as described below. Concretely:
- Mortgage rates → Freddie Mac (via FRED for convenience)
- Treasury yields → US Treasury (via FRED)
- Federal Funds Rate and prime rate → Federal Reserve Board (via FRED)
- CPI, PCE, employment, earnings → BLS and BEA (via FRED)
- FX rates → Federal Reserve H.10 (via FRED)
- Oil and gas → EIA (via FRED)
- Commodity prices → World Bank Pink Sheet (via FRED)
- Deposit and credit-card rates → FDIC and the Federal Reserve G.19
- GDP per capita and global indicators → World Bank Open Data
FRED is used as the canonical mirror for series whose original publisher does not maintain a stable machine-readable URL. Where the publisher does (World Bank, EIA, Treasury direct), the publisher URL is preferred.
No-imputation rule
Values pass through verbatim from the primary source. No interpolation, no smoothing, no seasonal adjustment beyond what the publisher already applies. Where a publisher offers more than one version of a series (for example, BLS publishes CPI both with and without seasonal adjustment), the dataset uses the most-cited public series (CPI-U in the case of CPI) and states which version it is in the per-series description.
CSV format
Every CSV starts with comment lines carrying provenance, then the header row, then data:
    # 30-Year Fixed Mortgage Rate
    # Source: Freddie Mac via FRED (MORTGAGE30US)
    # Primary URL: https://fred.stlouisfed.org/series/MORTGAGE30US
    # Canonical: https://calcfi.app/data/mortgage-rates/30-year-fixed
    # Retrieved: 2026-05-19T17:04:21.544Z
    # License: CC-BY-4.0 (attribute to CalcFi + primary source when citing)
    date,value,unit
    1976-06-04,8.78,percent
    ...
In pandas, load with pd.read_csv(url, comment="#"). In d3-fetch, strip comment lines manually with text.split("\n").filter(l => !l.startsWith("#")).join("\n") then d3.csvParse. The same idiom works in R (read.csv(text=..., comment.char="#")) and in shell (grep -v '^#' file.csv).
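If the provenance lines are needed alongside the data, they can be read rather than discarded. The following is a minimal sketch using only the Python standard library. It assumes the comment header follows the "Key: value" layout shown in the sample above, with the first comment line acting as the series title. The function name split_provenance is hypothetical and is not part of any CalcFi tooling.

```python
import csv
import io

# Sample text copied from the CSV format example above (one data row only).
SAMPLE = """\
# 30-Year Fixed Mortgage Rate
# Source: Freddie Mac via FRED (MORTGAGE30US)
# Primary URL: https://fred.stlouisfed.org/series/MORTGAGE30US
# Canonical: https://calcfi.app/data/mortgage-rates/30-year-fixed
# Retrieved: 2026-05-19T17:04:21.544Z
# License: CC-BY-4.0 (attribute to CalcFi + primary source when citing)
date,value,unit
1976-06-04,8.78,percent
"""


def split_provenance(text):
    """Return (title, metadata dict, data rows) from a commented CSV.

    Assumption: comment lines look like "# Key: value", and the first
    comment line without a "Key: " prefix is the series title.
    """
    title = None
    meta = {}
    data_lines = []
    for line in text.splitlines():
        if line.startswith("#"):
            body = line[1:].strip()
            # Split on the first ": " so that URLs (which contain "://") stay intact.
            key, sep, value = body.partition(": ")
            if sep:
                meta[key] = value
            elif title is None and body:
                title = body
        elif line.strip():
            data_lines.append(line)
    # Values are kept as strings so nothing is altered on the way in.
    rows = list(csv.DictReader(io.StringIO("\n".join(data_lines))))
    return title, meta, rows


title, meta, rows = split_provenance(SAMPLE)
print(title)
print(meta["Source"])
print(rows[0])
```

Keeping the values as strings is a deliberate choice in this sketch: conversion to numbers is left to the caller, so the parsed rows match the file character for character.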
Refresh cadence
Each series has its own natural cadence (daily for Treasuries, weekly for Freddie Mac PMMS, monthly for CPI, etc.). The dataset is refreshed at the publisher's cadence — usually within 24 hours of the publisher's release. The Retrieved: comment line in each CSV documents the exact retrieval timestamp.
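A consumer who wants to know how old a downloaded file is can read the Retrieved: line directly. The sketch below assumes the timestamp is ISO 8601 in UTC with a trailing "Z", as in the sample header. The helper names retrieved_at and age_hours are hypothetical.

```python
from datetime import datetime, timezone


def retrieved_at(csv_text):
    """Return the Retrieved: timestamp from the comment header, or None."""
    for line in csv_text.splitlines():
        if not line.startswith("#"):
            break  # comment header is over; stop at the first non-comment line
        body = line[1:].strip()
        if body.startswith("Retrieved:"):
            stamp = body[len("Retrieved:"):].strip()
            # Older Python versions reject a trailing "Z" in fromisoformat,
            # so it is rewritten as an explicit UTC offset.
            return datetime.fromisoformat(stamp.replace("Z", "+00:00"))
    return None


def age_hours(retrieved, now=None):
    """Hours elapsed between the retrieval timestamp and now (UTC)."""
    now = now or datetime.now(timezone.utc)
    return (now - retrieved).total_seconds() / 3600


sample = "# Retrieved: 2026-05-19T17:04:21.544Z\ndate,value,unit\n"
stamp = retrieved_at(sample)
print(stamp.isoformat())
```

What counts as stale depends on the series: a threshold that suits a daily Treasury series would be far too strict for a monthly or annual one.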
Reproducibility
Every series has a stable URL on the live CalcFi API (calcfi.app/developers). The full ETL script that builds this dataset is in scripts/refresh.mjs and pulls from those URLs. Given the same script and the same publisher data, anyone can regenerate the dataset locally.
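When comparing a locally regenerated CSV against a published one, a byte-for-byte comparison is likely to fail even when the data is identical, because the Retrieved: comment line records a different timestamp on each run. One way around this, sketched here as an assumption rather than a description of the project's own tooling, is to hash only the header row and data rows. The function name data_digest is hypothetical.

```python
import hashlib


def data_digest(csv_text):
    """SHA-256 over the header row and data rows, skipping '#' comment lines.

    splitlines() normalises \n and \r\n line endings, so files that differ
    only in line endings or in their comment header produce the same digest.
    """
    kept = [
        line
        for line in csv_text.splitlines()
        if line.strip() and not line.startswith("#")
    ]
    return hashlib.sha256("\n".join(kept).encode("utf-8")).hexdigest()


published = (
    "# Retrieved: 2026-05-19T17:04:21.544Z\n"
    "date,value,unit\n"
    "1976-06-04,8.78,percent\n"
)
# Same data, different retrieval time and Windows line endings (illustrative).
regenerated = (
    "# Retrieved: 2026-05-20T09:00:00.000Z\r\n"
    "date,value,unit\r\n"
    "1976-06-04,8.78,percent\r\n"
)
print(data_digest(published) == data_digest(regenerated))
```

A mismatch between the two digests means the data rows differ. That can legitimately happen if the publisher has released or revised data between the two retrievals.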
Versioning
Permanent DOI snapshots are minted at major checkpoints. The Figshare, Zenodo, OSF, Kaggle, and Mendeley DOIs all resolve to the same versioned snapshot. The Hugging Face, GitHub, and GitLab mirrors carry the rolling latest. See the citation page for how to pin to a specific snapshot.
Known limitations
- Some crypto series (Bitcoin, Ethereum, Solana) are currently snapshot-only — historical price ingestion is queued for a future release.
- Some FDIC deposit-rate series are snapshot-only for the same reason.
- World Bank GDP and unemployment series are annual cadence — no monthly granularity is available from the publisher.
License
All data is licensed under CC BY 4.0. Code and scripts are released under CC0 1.0. When citing, attribute CalcFi and the named primary source.