SEC EDGAR Data for Quant Funds: Automating 13F and 10-K Analysis

Introduction

SEC EDGAR filings are among the most comprehensive free public datasets available for quant fund research. For most systematic managers, though, the challenge is not accessing the data; it is extracting meaningful signals from the noise.

The Securities and Exchange Commission requires institutional investment managers with at least $100 million in Section 13(f) securities to file Form 13F quarterly, disclosing their long positions in those securities. Public companies file Form 10-K annually and Form 10-Q quarterly, providing fundamental data that feeds valuation models.

The opportunity: automate extraction, build systematic signals.

Understanding 13F Filings

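Form 13F is filed by institutional investment managers that exercise discretion over at least $100 million in Section 13(f) securities, mostly US-listed equities along with certain options and convertible debt. The threshold applies to the manager's aggregate holdings, not to individual positions. Filings are due within 45 days of quarter-end and show what large managers held on the last day of the quarter. Short positions are not reported.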

Key data points:

Manager name and filing history
Complete position list (issuer name, CUSIP, share count, market value)
Changes from prior quarter (buys, sells, position changes), derived by comparing consecutive filings

For quant funds, 13F data enables:

Crowding Analysis: Which stocks appear across multiple top-tier managers? High crowding increases vulnerability to collective exits.

Signal Mining: Follow the leaders—not as blind copy, but as factor inputs. Which managers consistently outperform? What positions do they add before strong performance?

Sector Rotation: Aggregate manager behavior by sector. When smart money collectively reduces tech exposure, the signal merits attention.
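
As a rough illustration of the crowding idea above, a crowding score can be defined as the fraction of tracked managers that hold a given security. This is only one possible definition. The manager names and identifiers below are hypothetical placeholders, not real filings data.

```python
from collections import defaultdict

def crowding_scores(holdings_by_manager):
    """Return, per security, the fraction of tracked managers holding it."""
    n_managers = len(holdings_by_manager)
    if n_managers == 0:
        return {}
    counts = defaultdict(int)
    for positions in holdings_by_manager.values():
        # set() guards against a security listed twice by one manager
        for security_id in set(positions):
            counts[security_id] += 1
    return {sid: count / n_managers for sid, count in counts.items()}

# Hypothetical managers and placeholder identifiers, for illustration only
holdings = {
    "Manager A": {"CUSIP_X", "CUSIP_Y"},
    "Manager B": {"CUSIP_X", "CUSIP_Z"},
    "Manager C": {"CUSIP_X", "CUSIP_Y"},
    "Manager D": {"CUSIP_Y"},
}
scores = crowding_scores(holdings)
# Most crowded first; ties broken alphabetically for stable output
most_crowded = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
```

A count-based score ignores position size. Weighting by each manager's portfolio share is a natural refinement, at the cost of needing clean market values.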

Automating 13F Data Extraction

Manual 13F analysis is unsustainable. Each quarter, thousands of managers file, with portfolios ranging from a handful of positions to several thousand. Here is the automation approach:

Data Sources

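EDGAR provides free index files and filing documents:

Quarterly and daily index files listing every filing by form type, company, and date
Individual filing documents (XML information tables for 13F filings since 2013, plain text before that)
Master index (master.idx), from which a filer's history can be reconstructed across quarters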
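
A minimal sketch of working with the quarterly master index, assuming the pipe-delimited layout EDGAR uses for master.idx (CIK, company name, form type, date filed, file name). The function names and the sample rows are hypothetical. Fetching is left out on purpose: the SEC asks automated clients to identify themselves with a descriptive User-Agent header and to respect its request rate limits.

```python
from typing import NamedTuple

EDGAR_ARCHIVES = "https://www.sec.gov/Archives/"

class IndexEntry(NamedTuple):
    cik: str
    company: str
    form_type: str
    date_filed: str
    url: str

def master_index_url(year: int, quarter: int) -> str:
    """URL of the quarterly master index listing every filing."""
    return f"{EDGAR_ARCHIVES}edgar/full-index/{year}/QTR{quarter}/master.idx"

def parse_master_index(text: str, form_prefix: str = "13F-HR"):
    """Return index entries whose form type starts with form_prefix."""
    entries = []
    for line in text.splitlines():
        parts = line.split("|")
        # Data rows have exactly five fields and a numeric CIK; this skips
        # the descriptive header, the column-name row, and the dashed rule.
        if len(parts) != 5 or not parts[0].strip().isdigit():
            continue
        cik, company, form_type, date_filed, filename = (p.strip() for p in parts)
        if form_type.startswith(form_prefix):
            entries.append(
                IndexEntry(cik, company, form_type, date_filed, EDGAR_ARCHIVES + filename)
            )
    return entries

# Hypothetical rows in the master.idx layout, for illustration only
sample = """CIK|Company Name|Form Type|Date Filed|Filename
--------------------------------------------------------------------------------
1000001|EXAMPLE CAPITAL LP|13F-HR|2024-02-14|edgar/data/1000001/0001000001-24-000001.txt
1000002|EXAMPLE CORP|10-K|2024-02-15|edgar/data/1000002/0001000002-24-000007.txt
1000001|EXAMPLE CAPITAL LP|13F-HR/A|2024-03-01|edgar/data/1000001/0001000001-24-000002.txt
"""
entries = parse_master_index(sample)
```

Matching on the prefix picks up both original holdings reports (13F-HR) and their amendments (13F-HR/A) while leaving out other form types.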

Extraction Pipeline

Daily scan: Query EDGAR for new 13F filings
Parser: Extract positions from the XML information table (or from text tables in older filings)
Normalize: Map CUSIPs to tickers, calculate changes from prior quarter
Store: Historical database enabling time-series analysis
Analyze: Calculate crowding scores, generate signals
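
The quarter-over-quarter comparison in the Normalize step can be sketched as a simple diff between two snapshots of one manager's holdings. The function name, labels, and sample numbers are illustrative assumptions; inputs map a security identifier to a share count.

```python
def position_changes(prior, current):
    """Classify each position by comparing share counts across two quarters."""
    changes = {}
    for security_id in sorted(set(prior) | set(current)):
        before = prior.get(security_id, 0)
        after = current.get(security_id, 0)
        if before == 0 and after > 0:
            action = "new"
        elif before > 0 and after == 0:
            action = "exited"
        elif after > before:
            action = "increased"
        elif after < before:
            action = "decreased"
        else:
            action = "unchanged"
        changes[security_id] = {"action": action, "share_delta": after - before}
    return changes

# Hypothetical share counts for one manager in consecutive quarters
prior_quarter = {"A": 100, "B": 50, "C": 10}
current_quarter = {"A": 150, "C": 10, "D": 20}
changes = position_changes(prior_quarter, current_quarter)
```

Raw share counts are not comparable across stock splits, so a production version would adjust for corporate actions before diffing.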

The critical challenge: parsing quality. EDGAR documents vary in formatting. A robust parser handles inconsistent table structures, footnotes, and embedded notes.
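
One way to make the parser tolerant of formatting differences is to match XML elements by local name, so that filings using a default namespace, a prefix, or none at all are handled the same way. The sketch below assumes the element names used in the 13F XML information table (infoTable, nameOfIssuer, cusip, value, sshPrnamt, sshPrnamtType); the sample document is made up.

```python
import xml.etree.ElementTree as ET

def local_name(tag):
    """Strip any '{namespace}' prefix from an ElementTree tag."""
    return tag.rsplit("}", 1)[-1]

def to_int(text):
    """Parse an integer field, returning None if it is missing or malformed."""
    try:
        return int(text.replace(",", ""))
    except (AttributeError, ValueError):
        return None

def parse_information_table(xml_text):
    """Extract one dict per <infoTable> row, ignoring namespaces."""
    root = ET.fromstring(xml_text)
    positions = []
    for node in root.iter():
        if local_name(node.tag) != "infoTable":
            continue
        # Flatten every descendant that carries text into a name -> text map
        fields = {
            local_name(el.tag): el.text.strip()
            for el in node.iter()
            if el is not node and el.text and el.text.strip()
        }
        positions.append({
            "issuer": fields.get("nameOfIssuer"),
            "cusip": fields.get("cusip"),
            "value": to_int(fields.get("value")),
            "shares": to_int(fields.get("sshPrnamt")),
            "share_type": fields.get("sshPrnamtType"),
        })
    return positions

# Made-up sample in the information table layout
SAMPLE_XML = """<informationTable xmlns="http://www.sec.gov/edgar/document/thirteenf/informationtable">
  <infoTable>
    <nameOfIssuer>EXAMPLE CORP</nameOfIssuer>
    <titleOfClass>COM</titleOfClass>
    <cusip>000000000</cusip>
    <value>1500</value>
    <shrsOrPrnAmt>
      <sshPrnamt>10000</sshPrnamt>
      <sshPrnamtType>SH</sshPrnamtType>
    </shrsOrPrnAmt>
  </infoTable>
</informationTable>"""
positions = parse_information_table(SAMPLE_XML)
```

The unit of the value field should be checked against the form instructions for the filing period; as far as we are aware, it changed from thousands of dollars to whole dollars in 2023.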

10-K and Fundamental Data

Form 10-K is the annual report that US public companies file with the SEC. For quant strategies, key sections include:

Financial Statements: Balance sheet, income statement, cash flow—standardized and comparable across companies.

Management Discussion (MD&A): Qualitative context on performance, strategy, and risks. Natural language processing extracts sentiment signals.

Risk Factors: Companies disclose operational, legal, and market risks. Change detection identifies new risk emergence.
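
Change detection on risk factors can be approximated by comparing paragraphs across two years and flagging those with no close match in the prior filing. This sketch uses the standard library's difflib; the 0.8 similarity threshold and the sample text are assumptions chosen for illustration, not tuned values.

```python
import difflib

def split_paragraphs(text):
    """Split on blank lines and drop empty paragraphs."""
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def new_risk_paragraphs(prior_text, current_text, threshold=0.8):
    """Return current paragraphs with no sufficiently similar prior paragraph."""
    prior = split_paragraphs(prior_text)
    added = []
    for para in split_paragraphs(current_text):
        best = max(
            (difflib.SequenceMatcher(None, para, old).ratio() for old in prior),
            default=0.0,
        )
        if best < threshold:
            added.append(para)
    return added

# Made-up risk factor text for two consecutive years
prior_year = (
    "We depend on a small number of suppliers for key components.\n\n"
    "Changes in interest rates could increase our borrowing costs."
)
current_year = (
    "We depend on a limited number of suppliers for key components.\n\n"
    "Changes in interest rates could increase our borrowing costs.\n\n"
    "A cybersecurity incident could disrupt our operations and harm our reputation."
)
added = new_risk_paragraphs(prior_year, current_year)
```

The lightly reworded supplier paragraph is treated as unchanged, while the cybersecurity paragraph is flagged as new. Pairwise comparison is quadratic in the number of paragraphs, which is acceptable for a single filing but would need indexing at scale.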

XBRL: Structured Fundamental Data

XBRL (eXtensible Business Reporting Language) tagging requirements were phased in starting in 2009 with the largest filers, and public companies now tag the financial data in their filings. This structured format enables systematic extraction.

Key metrics available:

Earnings per share (basic, diluted)
Revenue, operating income, net income
Total assets, liabilities, equity
Book value per share
Dividend per share

XBRL allows bulk fundamental data pipelines. Joined with market prices, which XBRL itself does not supply, the data lets you screen for all companies with P/E < 15 and positive earnings in seconds, not hours of manual searching.
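
A minimal sketch of such a screen, assuming the JSON layout returned by the SEC's company facts endpoint (data.sec.gov/api/xbrl/companyfacts/CIK##########.json), where each concept holds a list of observations per unit. The helper names, the sample payload, and the prices are hypothetical; prices must come from a separate market data source.

```python
def latest_annual_value(company_facts, concept, unit, taxonomy="us-gaap"):
    """Return the most recent value reported in a 10-K for one concept."""
    try:
        observations = company_facts["facts"][taxonomy][concept]["units"][unit]
    except KeyError:
        return None
    annual = [
        o for o in observations
        if o.get("form") == "10-K" and o.get("fp") == "FY"
    ]
    if not annual:
        return None
    # A 10-K repeats prior-year comparatives, so pick the latest period end
    latest = max(annual, key=lambda o: (o["end"], o.get("filed", "")))
    return latest["val"]

def passes_pe_screen(eps, price, max_pe=15.0):
    """True when earnings are positive and price / EPS is below max_pe."""
    return eps is not None and eps > 0 and price / eps < max_pe

# Made-up payload in the company facts layout, for illustration only
sample_facts = {
    "facts": {"us-gaap": {"EarningsPerShareDiluted": {"units": {"USD/shares": [
        {"end": "2022-12-31", "val": 3.0, "fy": 2023, "fp": "FY",
         "form": "10-K", "filed": "2024-02-20"},
        {"end": "2023-12-31", "val": 4.0, "fy": 2023, "fp": "FY",
         "form": "10-K", "filed": "2024-02-20"},
        {"end": "2024-03-31", "val": 1.2, "fy": 2024, "fp": "Q1",
         "form": "10-Q", "filed": "2024-05-02"},
    ]}}}}
}
eps = latest_annual_value(sample_facts, "EarningsPerShareDiluted", "USD/shares")
cheap = passes_pe_screen(eps, price=50.0)
```

A production version would also check the span between each observation's start and end dates, since annual reports can carry shorter-period values under the same concept.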

Building the Pipeline

For a systematic approach:

Infrastructure Requirements

ETL system: Daily job pulling new EDGAR filings
Parser engine: Handle HTML, XML variations across thousands of filers
Historical database: Store filings enabling quarter-over-quarter analysis
NLP layer: Extract sentiment from MD&A and risk factors

Signal Generation Ideas

From 13F data:

Follow the leaders: Track top-performing managers, extract their new positions
Crowding indicators: Percentage of top managers holding same stock
Flow signals: Aggregate buying/selling by manager tier

From 10-K data:

Earnings surprise prediction: NLP on MD&A versus consensus
Risk emergence detection: New risk factors signal operational challenges
Accounting quality: Footnote analysis identifying potential issues

Practical Considerations

Data Quality Challenges

EDGAR is not perfect:

Amendments require reprocessing
Restatements change historical data
Filing delays (up to 45 days post-quarter for 13F) limit timeliness

Build version control: store exact filing content with timestamp, track amendments separately.
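
One possible shape for that store, sketched with the standard library's sqlite3: each filing is kept verbatim with a content hash and a fetch timestamp, and an amendment is a separate row that points at the filing it amends. The table layout, column names, and accession numbers are illustrative assumptions.

```python
import hashlib
import sqlite3
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS filings (
    accession   TEXT PRIMARY KEY,  -- EDGAR accession number
    cik         TEXT NOT NULL,
    form_type   TEXT NOT NULL,
    period      TEXT NOT NULL,     -- report period, e.g. 2023-12-31
    amends      TEXT,              -- accession of the amended filing, if any
    sha256      TEXT NOT NULL,     -- hash of the exact bytes stored
    fetched_at  TEXT NOT NULL,     -- UTC timestamp of retrieval
    content     BLOB NOT NULL
);
"""

def store_filing(conn, accession, cik, form_type, period, content, amends=None):
    """Insert a filing once; return False if the accession is already stored."""
    digest = hashlib.sha256(content).hexdigest()
    fetched_at = datetime.now(timezone.utc).isoformat()
    cur = conn.execute(
        "INSERT OR IGNORE INTO filings VALUES (?, ?, ?, ?, ?, ?, ?, ?)",
        (accession, cik, form_type, period, amends, digest, fetched_at, content),
    )
    conn.commit()
    return cur.rowcount == 1

def filings_for_period(conn, cik, period):
    """All stored versions (original plus amendments) for one report period."""
    rows = conn.execute(
        "SELECT accession, form_type, amends FROM filings "
        "WHERE cik = ? AND period = ?",
        (cik, period),
    )
    return rows.fetchall()

# Hypothetical accession numbers and content, for illustration only
conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
first = store_filing(conn, "0000000000-24-000001", "1000001", "13F-HR",
                     "2023-12-31", b"original filing bytes")
again = store_filing(conn, "0000000000-24-000001", "1000001", "13F-HR",
                     "2023-12-31", b"original filing bytes")
amended = store_filing(conn, "0000000000-24-000002", "1000001", "13F-HR/A",
                       "2023-12-31", b"amended filing bytes",
                       amends="0000000000-24-000001")
versions = filings_for_period(conn, "1000001", "2023-12-31")
```

Keeping originals and amendments as separate rows means signals can be recomputed as of any date using only the filings that were available at that time.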

Regulatory Compliance

13F data is public information—free to use and distribute. However:

Do not present as trading recommendations
Understand the 45-day disclosure lag
Note that positions may have changed significantly since filing

Cost-Effective Implementation

You do not need expensive data vendors for EDGAR. Government-provided feeds are free. The investment is:

Engineering time to build extraction pipelines
Ongoing maintenance as EDGAR formats evolve
Compute infrastructure for large-scale parsing

For smaller funds, consider third-party providers that handle the extraction complexity and deliver clean, normalized datasets.

Conclusion

SEC EDGAR automation represents a core infrastructure investment for quant funds. The data is comprehensive, structured (mostly), and free.

The question is not whether to build systematic EDGAR analysis—it is how quickly you can operationalize the pipeline. Funds with automated 13F and 10-K processing:

Identify crowding before it becomes a risk
Extract fundamental signals at scale
Build research notebooks that compound over time

Start with 13F filing tracking—it is the highest-signal, lowest-complexity entry point. Expand to fundamental analysis as your infrastructure matures.

