The SEC EDGAR API Is the Best Free Dataset in Finance
Every public company in the United States is required to file electronically with the SEC. All of that data — 10-K annual reports, 13F institutional holdings, Form 4 insider transactions, 8-K material events — is freely accessible through EDGAR's public APIs. Many quant PMs pay thousands of dollars per month for data they could pull for free.
This tutorial shows you exactly how to extract structured financial data from SEC EDGAR using Python — no API key required, no Bloomberg terminal, no third-party data vendor.
EDGAR API Basics: Three Endpoints You Need to Know
The EDGAR system exposes several machine-readable endpoints. The three most useful for quant research are:
- Full-Text Search — https://efts.sec.gov/LATEST/search-index?q=&dateRange=custom&startdt=&enddt=&forms=10-K
- XBRL Financial Data — https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json
- Submissions Feed — https://data.sec.gov/submissions/CIK{cik}.json
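The {cik} placeholder in the last two URLs is a CIK number zero-padded to 10 digits. A quick sketch of building the company-facts URL for Apple (CIK 320193, used again later in this tutorial):

```python
# CIKs must be zero-padded to exactly 10 digits in data.sec.gov URLs.
cik = str(320193).zfill(10)
facts_url = f"https://data.sec.gov/api/xbrl/companyfacts/CIK{cik}.json"
print(facts_url)
# https://data.sec.gov/api/xbrl/companyfacts/CIK0000320193.json
```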
All three return JSON. The SEC limits automated traffic to 10 requests per second and requires a descriptive User-Agent header on every request — omit it and your requests will be blocked.
Step 1: Set Up Your Python Environment
You need three libraries: requests for HTTP calls, pandas for data handling, and optionally beautifulsoup4 for parsing raw filing HTML.
pip install requests pandas beautifulsoup4
Always set a descriptive User-Agent to avoid rate limiting:
HEADERS = {
    "User-Agent": "YourFundName research@yourfund.com",
    "Accept-Encoding": "gzip, deflate",
}
Step 2: Pull 13F Institutional Holdings
13F filings disclose the long equity positions of institutional investment managers with $100M+ in 13(f) securities. Filed quarterly, within 45 days of quarter end, they are one of the most actionable datasets for understanding institutional positioning.
import requests
import pandas as pd

BASE = "https://data.sec.gov"
HEADERS = {"User-Agent": "QuantResearch research@example.com"}

def get_cik(ticker: str) -> str | None:
    """Map a ticker symbol to its zero-padded SEC CIK number."""
    # EDGAR publishes a ticker-to-CIK mapping as a single JSON file
    tickers_url = "https://www.sec.gov/files/company_tickers.json"
    resp = requests.get(tickers_url, headers=HEADERS)
    for item in resp.json().values():
        if item["ticker"].upper() == ticker.upper():
            return str(item["cik_str"]).zfill(10)
    return None
def get_submissions(cik: str) -> dict:
    """Fetch all recent filings for a given CIK."""
    url = f"{BASE}/submissions/CIK{cik}.json"
    return requests.get(url, headers=HEADERS).json()

def find_13f_filings(cik: str, limit: int = 4) -> list:
    """Return the most recent 13F-HR filing accession numbers."""
    data = get_submissions(cik)
    recent = data.get("filings", {}).get("recent", {})
    forms = recent.get("form", [])
    accessions = recent.get("accessionNumber", [])
    dates = recent.get("filingDate", [])
    results = []
    for form, acc, date in zip(forms, accessions, dates):
        if "13F" in form:
            results.append({"accession": acc, "date": date})
            if len(results) == limit:
                break
    return results
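Each accession number maps to a directory on the EDGAR archive where the filing's documents live. A minimal sketch of building the document-index URL, assuming the standard Archives layout (dashes stripped from the directory name, leading zeros dropped from the CIK):

```python
def filing_index_url(cik: str, accession: str) -> str:
    """Build the EDGAR archive URL for a filing's document index."""
    cik_num = str(int(cik))           # drop leading zeros
    acc = accession.replace("-", "")  # directory names omit dashes
    return (f"https://www.sec.gov/Archives/edgar/data/"
            f"{cik_num}/{acc}/{accession}-index.htm")

url = filing_index_url("0000320193", "0000320193-23-000077")
print(url)
```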
Step 3: Extract XBRL Financial Data (10-K / 10-Q)
The XBRL financial data endpoint is the fastest way to pull structured fundamentals. It returns every reported financial concept for a company — revenue, net income, earnings per share, total assets — going back to the first electronic filing.
def get_xbrl_facts(cik: str) -> dict:
    """Pull all reported XBRL financial facts for a company."""
    url = f"{BASE}/api/xbrl/companyfacts/CIK{cik}.json"
    resp = requests.get(url, headers=HEADERS)
    return resp.json()

def extract_revenue(cik: str) -> pd.DataFrame:
    """Return an annual revenue time series."""
    facts = get_xbrl_facts(cik)
    us_gaap = facts.get("facts", {}).get("us-gaap", {})
    # Revenue can be reported under multiple XBRL tags
    for tag in ["Revenues", "RevenueFromContractWithCustomerExcludingAssessedTax", "SalesRevenueNet"]:
        if tag in us_gaap:
            units = us_gaap[tag].get("units", {}).get("USD", [])
            df = pd.DataFrame(units)
            # Keep annual figures from 10-K filings only
            df = df[df["form"] == "10-K"].copy()
            df["end"] = pd.to_datetime(df["end"])
            return df[["end", "val"]].rename(columns={"val": "revenue_usd"}).sort_values("end")
    return pd.DataFrame()
# Example: Apple Inc. CIK
apple_cik = "0000320193"
revenue = extract_revenue(apple_cik)
print(revenue.tail(5))
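Once the series is in a DataFrame, derived columns like year-over-year growth follow directly from pct_change. Illustrated here on a small hand-built frame (Apple's fiscal 2021-2023 annual revenues) rather than a live API response:

```python
import pandas as pd

# Sample annual revenue rows, shaped like extract_revenue's output.
revenue = pd.DataFrame({
    "end": pd.to_datetime(["2021-09-25", "2022-09-24", "2023-09-30"]),
    "revenue_usd": [365_817_000_000, 394_328_000_000, 383_285_000_000],
})
# Year-over-year growth: NaN for the first row, then ~+7.8%, ~-2.8%
revenue["yoy_growth"] = revenue["revenue_usd"].pct_change()
print(revenue)
```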
Step 4: Track Form 4 Insider Transactions
Form 4 must be filed within two business days of an insider transaction, which makes it one of the highest-frequency signals you can monitor. Academic studies suggest that insider buying, when filtered for high-conviction setups (cluster buying across multiple insiders, purchases above $500K), has historically generated excess returns in the 3-7% annual range.
from datetime import datetime, timedelta

def get_form4_filings(cik: str, days_back: int = 90) -> list:
    """Return recent Form 4 filings for a company."""
    cutoff = (datetime.now() - timedelta(days=days_back)).strftime("%Y-%m-%d")
    data = get_submissions(cik)
    recent = data.get("filings", {}).get("recent", {})
    forms = recent.get("form", [])
    accessions = recent.get("accessionNumber", [])
    dates = recent.get("filingDate", [])
    results = []
    for form, acc, date in zip(forms, accessions, dates):
        if form == "4" and date >= cutoff:
            results.append({"form": form, "accession": acc, "date": date})
    return results
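The function above lists the filings; extracting who bought and for how much requires parsing each filing's XML, which is beyond this tutorial. Assuming that parsing step is done, a sketch of the cluster-buying filter mentioned above, with hypothetical records and thresholds:

```python
from datetime import date

# Hypothetical parsed Form 4 purchase records for one company.
purchases = [
    {"insider": "CEO",  "date": date(2024, 3, 1),  "value_usd": 750_000},
    {"insider": "CFO",  "date": date(2024, 3, 3),  "value_usd": 600_000},
    {"insider": "CEO",  "date": date(2024, 3, 5),  "value_usd": 550_000},
    {"insider": "DIR1", "date": date(2024, 1, 10), "value_usd": 200_000},
]

def is_cluster_buy(purchases, window_days=30, min_insiders=2, min_value=500_000):
    """Flag a cluster: several distinct insiders each buying at least
    min_value within the most recent window."""
    if not purchases:
        return False
    latest = max(p["date"] for p in purchases)
    recent = [p for p in purchases
              if (latest - p["date"]).days <= window_days
              and p["value_usd"] >= min_value]
    return len({p["insider"] for p in recent}) >= min_insiders

print(is_cluster_buy(purchases))  # True: CEO and CFO both bought >$500K in March
```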
Step 5: Rate Limiting and Production Patterns
The SEC enforces a 10 req/s rate limit. For production workflows scanning hundreds of tickers, implement exponential backoff and a request queue:
import time
from functools import wraps

def rate_limited(max_per_second):
    """Decorator enforcing a minimum interval between calls."""
    min_interval = 1.0 / max_per_second
    last_called = [0.0]
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            elapsed = time.time() - last_called[0]
            wait = min_interval - elapsed
            if wait > 0:
                time.sleep(wait)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

@rate_limited(8)  # Stay comfortably under 10/s
def safe_get(url: str) -> dict:
    return requests.get(url, headers=HEADERS).json()
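To see the throttle in action without touching the network, decorate a trivial function and time a few calls (the decorator is repeated here so the snippet runs standalone):

```python
import time
from functools import wraps

def rate_limited(max_per_second):
    """Same decorator as above, repeated so this snippet is self-contained."""
    min_interval = 1.0 / max_per_second
    last_called = [0.0]
    def decorator(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            wait = min_interval - (time.time() - last_called[0])
            if wait > 0:
                time.sleep(wait)
            result = func(*args, **kwargs)
            last_called[0] = time.time()
            return result
        return wrapper
    return decorator

@rate_limited(4)  # at most 4 calls per second
def fake_fetch(i):
    return i

start = time.time()
results = [fake_fetch(i) for i in range(3)]
elapsed = time.time() - start
# The second and third calls each wait ~0.25s, so ~0.5s total.
print(f"3 calls took {elapsed:.2f}s")
```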
What to Do With This Data
Raw EDGAR data is only valuable when transformed into factors. Once you have the extraction pipeline running, the next step is building a scoring model. For most RIAs, three signals drive the majority of the alpha:
- Insider Cluster Score — Weighted sum of insider purchases across 3/6/12 month windows
- Filing Sentiment Delta — Year-over-year change in risk factor keyword density
- Earnings Surprise Predictor — MD&A tone versus analyst consensus
Quantscope runs this entire pipeline automatically — EDGAR ingestion, XBRL parsing, insider scoring, and daily delivery to your inbox. If you want the output without the Python overhead, sign up below for a free daily brief.
Related Research
- Autonomous AI Quant Research Workflow: From EDGAR Filing to Factor Signal
- SEC EDGAR Data for Quant Funds: Automating 13F and 10-K Analysis
- XBRL Filing Analysis: What Every RIA Should Know
- Backtesting Factor Strategies: A Step-by-Step Guide for Independent RIAs
Want this data without writing a single line of Python? Get your free Quantscope watchlist →