This week we are finalizing our market data providers for our Alpha launch next month. After talking to 6 different vendors, I am truly amazed at the complexity that has been added to a seemingly simple problem: getting market price data streamed from a service provider (which collects the data from the different exchanges) to our data server.
Connectivity
Every service provider offers a different way to connect. While this was not totally unexpected, I had hoped that some vendors would have integrated a standard market data format, for example FIX. This would have allowed us to use open source libraries for connection handling and data parsing. We are looking for an Internet connection - a dedicated line is not an option, as we are planning to run our whole infrastructure on Amazon Web Services (AWS). The variety we get offered, from a local API (one vendor only offered a Windows DLL) to an open socket to a VPN tunnel, is quite amazing. Unfortunately this increases switching costs and makes it difficult and expensive to evaluate multiple vendors at the same time.
Data
The market data is collected from different exchanges, and sometimes not even directly from an exchange but from another third party that aggregates data. As a result, each vendor structures and normalizes the data differently. Some, for example, always send consolidated volume with each price tick; others sometimes send incremental volume (which means we have to add it up ourselves) and sometimes an updated aggregated volume. So we have to look at each vendor's data structure in detail.
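To make the volume problem concrete, here is a minimal sketch of how we might normalize such a feed so that every tick carries the running total. The `Tick` class and its `volume_is_incremental` flag are hypothetical names made up for illustration; a real vendor feed signals this in its own way (or only in its documentation).

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Iterator


@dataclass(frozen=True)
class Tick:
    """Hypothetical tick record; real vendor formats will differ."""
    symbol: str
    price: float
    volume: int
    # Assumed flag: True if `volume` is the size of this trade only,
    # False if it is already the aggregated volume so far.
    volume_is_incremental: bool


def with_cumulative_volume(ticks: Iterable[Tick]) -> Iterator[Tick]:
    """Yield ticks whose volume is always the running total per symbol."""
    totals: Dict[str, int] = {}
    for tick in ticks:
        if tick.volume_is_incremental:
            # Incremental volume: we have to add it up ourselves.
            totals[tick.symbol] = totals.get(tick.symbol, 0) + tick.volume
        else:
            # Aggregated volume: the vendor's figure replaces our total.
            totals[tick.symbol] = tick.volume
        yield Tick(tick.symbol, tick.price, totals[tick.symbol], False)
```

Treating an aggregated update as authoritative (instead of ignoring it) is a design choice: it lets the vendor's figure correct any drift in our own running sum.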
Pricing (or what is it with delayed vs. non-delayed data?)
This is the most confusing part. First we have to pay our data providers a fee to get their data pipe connected to us. Most of them charge a flat monthly fee for this; some try to take a share of our monthly user fees. So far so good.
But then we have to register directly with each exchange (we will start with NYSE, NASDAQ and AMEX) to be licensed to show market data to our customers (and even report personal data to the exchanges). While we do not have to pay a fee if we show or use delayed data, we have to pay a per-user fee for each customer that we provide with real-time data. You would think that in a competitive environment, prices would tend toward the marginal cost of providing the service. Here the exchanges have $0 marginal cost for each additional user we sign up. The data is there anyway and it is already broadcast to the data providers. So why can the exchanges still impose a tax on their real-time data (while they give it away for free if it is delayed by 15 minutes)?
It looks like competition will finally get to this: Since Monday, BATS Trading, an ECN, has been providing its real-time data to Yahoo! Finance (Press Release). They probably don’t do it for free, but they do it for a fixed fee that is not user based. That is a start. A little later that day, NASDAQ announced that it would provide a way to allow free access to its real-time data as well. Google Finance, for example, uses this service. I heard that they charge a flat fee of $100,000 per month. Not really interesting for a startup, but again a step in the right direction.
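To see why a flat fee of that size is out of reach for a startup, a quick break-even calculation helps. This is only a sketch: the per-user fee is a parameter because it depends on each exchange's fee schedule, and the $1 used in the example is a made-up number, not a real exchange price.

```python
import math


def break_even_users(flat_fee_per_month: float, per_user_fee_per_month: float) -> int:
    """Smallest user count at which per-user licensing costs at least the flat fee."""
    if per_user_fee_per_month <= 0:
        raise ValueError("per-user fee must be positive")
    return math.ceil(flat_fee_per_month / per_user_fee_per_month)


# Hypothetical per-user fee of $1 per month against the $100,000 flat fee.
users_needed = break_even_users(100_000, 1.0)
```

Under that assumed $1 fee, the flat fee would only pay off once we had 100,000 real-time users.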
I predict that in 2 years, real-time market data will be free of any exchange license fees - of course data providers will (and should) charge for their service of collecting, normalizing and delivering market data. Hopefully they will standardize their interfaces a little bit.