Why Your AI Agent Needs Real SEC Data

Most financial AI agents run on processed data (news summaries, analyst recaps, sentiment scores), not the original SEC filings sitting in EDGAR. That distance from the source is not a minor inconvenience. It is a structural constraint that limits what those agents can reliably do, and it compounds at every layer downstream. This piece covers what SEC filings actually contain, which datasets carry the most signal, and why the data quality problem starts at the source and gets worse from there.

The Problem With Building on Derivative Data

News-based pipelines are the path of least resistance. There are feeds for them, libraries that parse them, and enough volume that it feels like comprehensive coverage. The issue is not the format; it is the position in the information chain.

Financial news describes things that have already happened. A Form 4 insider transaction filing lands at the SEC within two business days of the trade. Journalists write about that trade sometime after. Markets move on the filing date. By the time an AI agent retrieves the article and processes it, the relevant price action has often already occurred. The agent is not analyzing an opportunity; it is reconstructing history.

Incompleteness is the second problem. Take a 13F filing. A single institutional manager's quarterly disclosure can contain hundreds of individual positions, listing every reportable equity holding with its share count and market value. Set against the previous quarter's filing, it also shows what changed and how concentrated the overall portfolio is. A news article covering that same filing will mention the headline item: the largest new position, or a significant exit. The rest of the data does not make it into the article. For an agent doing real analytical work, the rest of the data is where the signal lives.

The third problem is structural noise. Regulatory filings follow standardized schemas enforced by the SEC. Every Form 4, whichever company it concerns, carries the same fields in the same structure. News copy does not. It carries framing, editorial judgment, inconsistent terminology, and factual errors that aggregators sometimes propagate for days before corrections appear. Building a machine consumption layer on top of that is harder than it looks, and the downstream reasoning suffers for it.

What EDGAR Actually Contains

EDGAR is the SEC's public database for regulatory submissions. Every publicly traded company in the United States files through it. Every institutional money manager above a certain threshold files through it. The data is timestamped, organized by filer and form type, and technically accessible to anyone.

The challenge is that raw EDGAR data is not ready for consumption at scale. Filing formats vary across years and filer types. Schemas shift. Building a reliable, historically consistent data layer on top of raw EDGAR takes significant engineering effort, which is why most developers default to derivative sources instead.
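
To give a feel for what "raw" means here, the sketch below works against the JSON shape returned by the SEC's public submissions endpoint (`https://data.sec.gov/submissions/CIK##########.json`), where recent filings arrive as parallel arrays rather than one record per filing. The helper names and the sample payload are illustrative assumptions, the accession numbers are made up, and no network call is made; a real client would also need to follow the SEC's fair-access guidance on request headers and rate limits.

```python
from typing import Any


def submissions_url(cik: int) -> str:
    """Build the submissions URL; EDGAR expects a 10-digit, zero-padded CIK."""
    return f"https://data.sec.gov/submissions/CIK{cik:010d}.json"


def recent_filings(payload: dict[str, Any], form_types: set[str]) -> list[dict[str, str]]:
    """Convert the columnar `filings.recent` arrays into one dict per filing."""
    recent = payload["filings"]["recent"]
    rows = zip(recent["form"], recent["filingDate"], recent["accessionNumber"])
    return [
        {"form": form, "filed": filed, "accession": accession}
        for form, filed, accession in rows
        if form in form_types
    ]


# Made-up payload in the same columnar layout (values are not real filings).
sample_payload = {
    "filings": {
        "recent": {
            "form": ["4", "10-Q", "144", "4"],
            "filingDate": ["2024-05-03", "2024-05-02", "2024-04-30", "2024-04-18"],
            "accessionNumber": [
                "0000000000-24-000004",
                "0000000000-24-000003",
                "0000000000-24-000002",
                "0000000000-24-000001",
            ],
        }
    }
}

insider_related = recent_filings(sample_payload, {"4", "144"})
for filing in insider_related:
    print(filing["form"], filing["filed"], filing["accession"])
```

Even this small step only yields an index of filings. Each document still has to be fetched and parsed, which is where format drift across years starts to matter.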

The filing types most relevant to financial AI applications are:

Form 4: The insider trading disclosure form required when officers, directors, or beneficial owners of more than ten percent of a company's stock transact in that company's securities. Filed within two business days of the trade, it records the exact date, share count, price, and transaction type for every reported transaction.

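Form 13F: The quarterly institutional holdings disclosure required from investment managers that exercise discretion over at least $100 million in qualifying (Section 13(f)) securities. It lists the long positions in those securities held at the end of the reporting period, with share counts and market values. Quarter-over-quarter changes are not stated in the filing itself; they are derived by comparing consecutive filings.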

Form 144: Filed before a planned sale of restricted or control securities. Unlike Form 4, which is retrospective, this filing is predictive. It discloses the intended volume and timing of a sale before it executes.

S-1 and IPO Filings: The registration statements companies submit ahead of initial public offerings. They contain the proposed price range, offering size, underwriter relationships, lockup structures, and the company's full financial history up to the point of filing.

Supply Chain Disclosures: Material customer, supplier, and partner relationships disclosed across SEC filings. Companies must disclose when a single customer accounts for ten percent or more of revenue, and many identify that customer by name, which makes it possible to assemble a structured dependency map across the public equity universe.

None of these data types exist in equivalent form anywhere else. They are not replicated in news feeds. They are not summarized with sufficient completeness in analyst reports. They are the original record, and every downstream source is a derivative of them.

Insider Transactions: When the People Closest to the Business Put Capital Down

Executives and directors know their companies in ways that no external analyst can match. They see pipeline data, margin trends, hiring trajectories, product timelines, and competitive dynamics long before those dynamics show up in reported results. When those same people choose to buy shares on the open market, using their own capital, that decision carries a specific kind of weight.

The research on insider buying patterns is consistent over long time horizons. Single transactions are often noise. An executive selling shares might be managing a tax position, funding a home purchase, or diversifying a concentrated position. But cluster buying, multiple insiders buying within a short window at similar prices, has historically preceded above-average returns. The logic is simple: that level of coordination, even informal, suggests shared conviction about something the market has not yet priced.
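
One way to make "cluster buying" concrete is a sliding-window count of distinct insiders per issuer. The sketch below is a minimal illustration under stated assumptions: the record fields (`ticker`, `insider`, `date`, `code`), the 14-day window, and the three-insider threshold are all hypothetical choices, not values taken from any particular dataset or study. The transaction codes `P` (purchase) and `S` (sale) follow the Form 4 convention.

```python
from collections import defaultdict
from datetime import date, timedelta


def find_buy_clusters(transactions, window_days=14, min_insiders=3):
    """Return {ticker: (window_start, insiders)} for the first window per ticker
    in which at least `min_insiders` distinct insiders reported purchases."""
    buys_by_ticker = defaultdict(list)
    for t in transactions:
        if t["code"] == "P":  # keep purchases only
            buys_by_ticker[t["ticker"]].append(t)

    clusters = {}
    for ticker, buys in buys_by_ticker.items():
        buys.sort(key=lambda t: t["date"])
        for i, first in enumerate(buys):
            window_end = first["date"] + timedelta(days=window_days)
            insiders = {b["insider"] for b in buys[i:] if b["date"] <= window_end}
            if len(insiders) >= min_insiders:
                clusters[ticker] = (first["date"], sorted(insiders))
                break
    return clusters


# Made-up records for illustration.
sample_transactions = [
    {"ticker": "ACME", "insider": "CEO", "date": date(2024, 3, 1), "code": "P"},
    {"ticker": "ACME", "insider": "CFO", "date": date(2024, 3, 5), "code": "P"},
    {"ticker": "ACME", "insider": "Director A", "date": date(2024, 3, 11), "code": "P"},
    {"ticker": "ACME", "insider": "CEO", "date": date(2024, 3, 12), "code": "S"},
    {"ticker": "BETA", "insider": "CEO", "date": date(2024, 3, 1), "code": "P"},
    {"ticker": "BETA", "insider": "CFO", "date": date(2024, 4, 20), "code": "P"},
]

clusters = find_buy_clusters(sample_transactions)
print(clusters)
```

Counting distinct insiders rather than transactions keeps one executive's repeated purchases from registering as a cluster. A production version would likely also filter out small or plan-driven trades.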

Selling clusters are more complicated to interpret but no less informative. Broad insider selling ahead of a disappointing earnings period is a pattern that appears in the data with enough regularity to be worth tracking systematically. An AI agent with access to real-time Form 4 data can surface these patterns as they develop, not after the fact when they become obvious in retrospect.

RyxelData's Insider Transactions dataset sources directly from Form 4 filings and delivers executive buys, sells, and option exercises in normalized JSON, structured for programmatic querying without any EDGAR preprocessing.

13F Holdings: A Quarterly Map of Institutional Conviction

Every three months, hundreds of the largest investment managers in the world submit a document to the SEC that lists every long equity position they hold. The combined AUM behind these filings runs into the tens of trillions of dollars. 13F data is, in aggregate, a direct view into how the most sophisticated pools of capital in the world are positioned.

The analytical applications are extensive. Quarter-over-quarter position changes reveal which companies are attracting new institutional interest versus which are being systematically reduced. Concentration analysis across managers can surface emerging consensus around specific sectors or themes. Tracking a single fund's composition over multiple quarters reveals whether conviction is building or dissolving on particular names.
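
Because a 13F reports a point-in-time snapshot, quarter-over-quarter changes have to be computed by comparing two filings from the same manager. The sketch below shows one simple way to do that, assuming each snapshot has already been reduced to a mapping of security identifier to positive share count. The function name and sample data are hypothetical, and tickers are used for readability even though 13F filings identify securities by CUSIP.

```python
def position_changes(prev, curr):
    """Compare two quarter-end snapshots ({identifier: shares}) and return
    {identifier: (label, share_delta)} for every security in either one."""
    changes = {}
    for security in sorted(set(prev) | set(curr)):
        before = prev.get(security, 0)
        after = curr.get(security, 0)
        if before == 0:
            label = "new"
        elif after == 0:
            label = "exited"
        elif after > before:
            label = "increased"
        elif after < before:
            label = "reduced"
        else:
            label = "unchanged"
        changes[security] = (label, after - before)
    return changes


# Made-up snapshots for one manager in two consecutive quarters.
q1 = {"AAA": 100, "BBB": 50, "CCC": 75}
q2 = {"AAA": 150, "CCC": 75, "DDD": 20}

changes = position_changes(q1, q2)
for security, (label, delta) in changes.items():
    print(f"{security}: {label} ({delta:+d} shares)")
```

Running the same comparison across many managers and counting the "new" and "increased" labels per security is one rough proxy for building institutional interest.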

For AI agents running competitive intelligence, portfolio construction, or sector research workflows, 13F data provides an input that has no equivalent. News coverage of institutional activity is selective and incomplete. The filings themselves are comprehensive by legal requirement.

There is a caveat worth acknowledging: 13F filings are delayed by up to 45 days after the quarter ends, and they disclose long positions only, not shorts or derivatives. Sophisticated use of this data accounts for those limitations. Even with the delay, the directional information about where large capital was positioned and how it moved remains one of the more valuable inputs available to financial AI systems.
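
Accounting for the delay starts with knowing when a quarter's filings can be expected. The sketch below computes the 45-day deadline for a given quarter end and rolls it forward past weekends. It is an approximation: federal holidays are not handled, and the function name is an illustrative assumption.

```python
from datetime import date, timedelta


def thirteen_f_deadline(quarter_end: date) -> date:
    """Approximate 13F due date: 45 calendar days after quarter end,
    moved to the next weekday if it lands on a Saturday or Sunday.
    Holidays are ignored in this sketch."""
    deadline = quarter_end + timedelta(days=45)
    while deadline.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        deadline += timedelta(days=1)
    return deadline


def max_staleness_days(quarter_end: date, as_of: date) -> int:
    """Days elapsed since the snapshot date the filing describes."""
    return (as_of - quarter_end).days


print(thirteen_f_deadline(date(2023, 12, 31)))
print(thirteen_f_deadline(date(2020, 9, 30)))
print(max_staleness_days(date(2023, 12, 31), date(2024, 2, 14)))
```

An agent can attach the staleness figure to every 13F-derived statement it makes, so that downstream reasoning treats the holdings as a dated snapshot and not as a current position.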

RyxelData's Institutional Holdings dataset aggregates 13F filings into a structured, queryable format, making that information accessible without requiring custom EDGAR parsing infrastructure.

Proposed Sales and Supply Chain: The Datasets Most Agents Have Never Seen

Proposed Sales (Form 144)

Form 144 is largely absent from financial AI workflows, which is a notable gap given what it contains.

When an insider plans to sell restricted or control securities, they are required to file Form 144 before the sale occurs. The filing discloses the proposed number of shares and the approximate timing. That sequence matters: the disclosure comes before the transaction, not after it. For agents designed to anticipate insider supply dynamics rather than just observe completed transactions, Form 144 data provides a forward-looking dimension that Form 4 simply cannot offer.

The signal is most relevant when proposed sales cluster across multiple insiders in a compressed timeframe. A single planned sale reflects individual circumstances. Several executives filing Form 144 within days of each other, ahead of a period where the stock subsequently underperforms, is a different kind of data point. RyxelData's Proposed Sales dataset structures these filings for systematic monitoring.

Supply Chain Relationships

Among the datasets that most financial AI systems lack entirely, supply chain disclosures stand out.

Disclosure rules require companies to report when a single customer accounts for ten percent or more of revenue, and many filings identify that customer by name. Supplier and partner relationships with material financial significance also appear in filings. When that disclosure data is aggregated across thousands of companies and structured into a queryable graph, it becomes something genuinely useful: a map of economic dependencies between public companies, sourced from the filings themselves rather than inferred from news or estimated from industry research.

The analytical applications are not hypothetical. When a major customer relationship changes, the impact on the supplier is financially significant. When a company that represents a large share of another company's revenue runs into its own problems, the downstream effects follow a path that is traceable through declared supply chain relationships. An AI agent that can reason over those relationships systematically has an analytical capability that is almost entirely absent from news-based pipelines.
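
The tracing described above amounts to a graph traversal. The sketch below is a minimal version, assuming the relationships have already been extracted as `(supplier, customer, revenue_share)` tuples. That edge format, the company names, and the ten percent cutoff applied here are illustrative assumptions. It walks upstream from a troubled customer with a breadth-first search and records how many hops away each exposed supplier sits.

```python
from collections import deque


def exposed_suppliers(edges, troubled_customer, min_share=0.10):
    """Return {supplier: hops} for every supplier that depends, directly or
    indirectly, on `troubled_customer` through material relationships."""
    suppliers_of = {}
    for supplier, customer, share in edges:
        if share >= min_share:  # ignore relationships below the cutoff
            suppliers_of.setdefault(customer, []).append(supplier)

    exposure = {}
    queue = deque([(troubled_customer, 0)])
    while queue:
        company, depth = queue.popleft()
        for supplier in suppliers_of.get(company, []):
            if supplier != troubled_customer and supplier not in exposure:
                exposure[supplier] = depth + 1
                queue.append((supplier, depth + 1))
    return exposure


# Made-up relationships: (supplier, customer, share of supplier revenue).
sample_edges = [
    ("ChipCo", "PhoneCo", 0.45),
    ("GlassCo", "PhoneCo", 0.12),
    ("BoxCo", "PhoneCo", 0.04),
    ("WaferCo", "ChipCo", 0.30),
    ("WaferCo", "ServerCo", 0.20),
]

exposure = exposed_suppliers(sample_edges, "PhoneCo")
print(exposure)
```

Hop count is a crude measure. Weighting each path by the revenue shares along it would give a better estimate of how much of a shock actually propagates.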

RyxelData's Supply Chain dataset structures customer-supplier-partner relationships from SEC filings into a clean, accessible API layer.

The Data Quality Problem Compounds Downstream

Raw EDGAR access is technically free. That fact leads some developers to conclude that the data layer is not a real engineering challenge, just a matter of pointing at the right government endpoint. That conclusion does not hold at scale.

EDGAR filing formats have evolved over decades. Schema consistency varies by form type, filing year, and filer category. Historical data requires non-trivial preprocessing before it can be queried reliably. For a production AI agent that needs to reason over thousands of filings with consistent structure, building that normalization layer in-house is a substantial undertaking that precedes any actual analytical work.

The compounding problem is that data quality issues do not stay contained at the ingestion layer. An agent reasoning over inconsistently structured inputs will produce inconsistently reliable outputs, regardless of how sophisticated the model is. The quality ceiling of the output is set by the quality floor of the input.
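
A small example makes the normalization problem tangible. The sketch below maps two differently shaped raw records onto one schema. The flat record shapes, the alias table, and the `InsiderTrade` type are invented for illustration (a few keys borrow element names from the Form 4 XML format); this is not a description of how any particular provider, RyxelData included, implements its pipeline.

```python
from dataclasses import dataclass
from datetime import date, datetime


@dataclass(frozen=True)
class InsiderTrade:
    ticker: str
    insider: str
    trade_date: date
    shares: float
    price: float


# Hypothetical field aliases seen across record vintages.
_ALIASES = {
    "ticker": ("ticker", "issuerTradingSymbol", "symbol"),
    "insider": ("insider", "rptOwnerName", "owner"),
    "trade_date": ("trade_date", "transactionDate", "date"),
    "shares": ("shares", "transactionShares"),
    "price": ("price", "transactionPricePerShare"),
}
_DATE_FORMATS = ("%Y-%m-%d", "%m/%d/%Y")


def _pick(raw, field):
    """Return the first alias present in the raw record."""
    for key in _ALIASES[field]:
        if key in raw:
            return raw[key]
    raise KeyError(f"record has no value for {field!r}")


def _parse_date(text):
    """Try each known date format in turn."""
    for fmt in _DATE_FORMATS:
        try:
            return datetime.strptime(text, fmt).date()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date format: {text!r}")


def normalize(raw):
    """Map one raw record onto the common InsiderTrade schema."""
    return InsiderTrade(
        ticker=str(_pick(raw, "ticker")).strip().upper(),
        insider=" ".join(str(_pick(raw, "insider")).split()),
        trade_date=_parse_date(str(_pick(raw, "trade_date"))),
        shares=float(str(_pick(raw, "shares")).replace(",", "")),
        price=float(str(_pick(raw, "price")).replace("$", "").replace(",", "")),
    )


older = {"symbol": " acme ", "owner": "Jane  Doe", "date": "03/05/2024",
         "shares": "1,500", "price": "$12.50"}
newer = {"issuerTradingSymbol": "ACME", "rptOwnerName": "Jane Doe",
         "transactionDate": "2024-03-05", "transactionShares": 1500,
         "transactionPricePerShare": 12.5}

print(normalize(older) == normalize(newer))
```

The point is less the code than the maintenance burden. Every new quirk in the source adds another alias or format, and any quirk that goes unhandled surfaces later as a reasoning error in the agent.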

RyxelData normalizes SEC-sourced data into consistent JSON schemas delivered through a REST API with an OpenAPI 3.1 specification. A TypeScript SDK is available for direct integration, and a Model Context Protocol server is designed specifically for AI agent workflows, allowing agents to query insider trades, institutional holdings, proposed sales, supply chain relationships, and IPO data through a single API key with no preprocessing overhead.

Dataset subscriptions are available individually, starting at $19 per month for institutional holdings and $49 per month for insider transactions. The all-in-one bundle at $149 per month covers every current and future dataset under a single subscription. Month-to-month billing, cancel anytime.

Pro Tip: The data layer is not a commodity choice. An agent built on normalized, SEC-sourced inputs operates from a fundamentally different informational foundation than one built on scraped news, and that difference shows in every output it produces.

Institutional-Grade Financial Data, Built for Developers

RyxelData surfaces insider transactions, 13F holdings, proposed sales, supply chain relationships, fund profiles, and IPO filings through a clean REST API, designed for the teams building the next generation of financial AI applications. Live sample responses are available for every endpoint in the documentation at ryxel.io. No enterprise sales process. No long-term commitments.

Frequently Asked Questions

What makes SEC data more reliable than financial news for AI agents?

SEC filings are the legal record. Companies and institutional managers submit them under regulatory obligation, following standardized schemas. Financial news is a downstream product of that same data, filtered through editorial judgment, often incomplete, and always delayed. For agents where output reliability matters, the informational difference between a primary filing and a news summary is not marginal.

What is Form 4 and why does it matter for financial AI?

Form 4 is the SEC disclosure required when corporate insiders transact in their own company's securities. It is due within two business days and records the exact date, share count, price, and transaction type. Because insiders operate with privileged knowledge of their company's direction, their trading patterns have long been tracked by institutional investors as a leading signal. For financial AI agents, Form 4 data provides a legally mandated, consistently structured input with genuine informational content.

How does 13F data help AI agents analyze institutional positioning?

Form 13F requires institutional managers that exercise discretion over at least $100 million in qualifying (Section 13(f)) securities to disclose their long positions in those securities at the end of each quarter. Aggregated across managers and compared from one quarter to the next, that disclosure creates a comprehensive view of where institutional capital is concentrated and how it is moving. AI agents processing 13F data can identify emerging patterns in institutional conviction that are not visible through any other public channel.

What is the difference between insider transactions and proposed sales data?

Insider transaction data from Form 4 documents trades after they have executed. Proposed sales data from Form 144 documents the intention to sell before the transaction occurs. For agents focused on anticipating insider supply dynamics rather than observing completed trades, Form 144 provides a forward-looking signal that Form 4 cannot replicate.

Why do AI agents need normalized financial data rather than raw EDGAR feeds?

Raw EDGAR data is publicly available but not structured for reliable programmatic consumption at scale. Filing formats have shifted over decades, schema consistency varies, and building a historically stable data layer on top of raw EDGAR requires significant preprocessing before any analytical work begins. Normalized API layers standardize that data into consistent schemas, removing the infrastructure overhead and letting agents query clean, structured inputs directly.

Which RyxelData datasets are currently available?

Six datasets are live: Insider Transactions, Institutional Holdings (13F), Proposed Sales, Funds, Supply Chain, and IPOs. Earnings, earnings calendar, dividends, dividends calendar, splits, and financial statements are in development. Individual datasets start at $19 per month, with an all-in-one bundle at $149 per month covering every current and future dataset on a single subscription.