
At NYSE, The Data Deluge Overwhelms Traditional Databases


NYSE and NYSE Technologies, its technology subsidiary, found that the continuing growth of stock market data, the demand for more analytics and, thanks to regulators, a much heavier reporting burden were too much for the existing database.

NYSE Technologies receives four to five terabytes of data a day and uses it for complex analytics, market surveillance, capacity planning and monitoring.

The company had been using a traditional database, said Emile Werr, head of product development, NYSE Big Data Group, and global head of Enterprise Data Architecture and Identity Access Management for NYSE Euronext. The existing system couldn’t handle the workload -- it took hours to load and had poor query speed.

NYSE turned to the IBM Netezza platform because it couldn’t accomplish its goals with traditional database technology, Werr said.

“We started five years ago and now we are more mature in the industry with using MPP (massively parallel processing) systems, and we have shown significant ROI in being able to do complex analytics while managing the footprint,” said Werr.

“NYSE needs to store and analyze seven years of historical data and be able to search through approximately one terabyte of data per day, which amounts to hundreds of terabytes in total,” added Werr. “The PureData System for Analytics powered by Netezza provides the scalability, simplicity and performance critical in being able to analyze our big data to deliver results eight hours faster than on the previous solution, which in our world is a game changer.”

NYSE’s initial focus was on trading surveillance of market makers and broker-dealers’ trading platforms. A second concern was capacity planning.

“The New York Stock Exchange SLAs (service level agreements) are stringent,” said Werr. “The system needs to be 100 percent fault tolerant. When systems cross capacity thresholds, additional capacity would be automatically engaged and trading would continue to flow without interruption.”
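To make that mechanism concrete, here is a minimal Python sketch of the kind of threshold-driven capacity management Werr describes. The CapacityPool class, the 80 percent threshold and the per-node throughput figure are illustrative assumptions, not details of NYSE’s actual system.

THRESHOLD = 0.80  # assumed trigger point: engage standby capacity at 80% utilization

class CapacityPool:
    def __init__(self, active_nodes, standby_nodes):
        self.active = list(active_nodes)
        self.standby = list(standby_nodes)

    def utilization(self, load):
        # load: messages/sec currently flowing; each node is assumed
        # (for illustration only) to handle 10,000 msgs/sec
        return load / (len(self.active) * 10_000)

    def rebalance(self, load):
        # When a capacity threshold is crossed, engage standby nodes
        # automatically so trading flow continues without interruption.
        while self.standby and self.utilization(load) > THRESHOLD:
            node = self.standby.pop()
            self.active.append(node)
            print(f"engaged standby node {node}; "
                  f"utilization now {self.utilization(load):.0%}")

pool = CapacityPool(active_nodes=["n1", "n2"], standby_nodes=["n3", "n4"])
pool.rebalance(load=19_000)  # crosses 80% of 20,000 msgs/sec, pulls in a standby node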

Werr said it became clear that traditional database technology would not do what NYSE needed.

“Extremely large data volumes, data integration complexities, market surveillance and ad hoc analytics requirements took a large number of IT resources to babysit the environment and constantly tune it. The systems became too complex and slow,” Werr added.

To run analytics, data had to be extracted from the database into applications such as SAS and proprietary NYSE apps to perform the necessary analysis.
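The contrast is easy to see in miniature. The following sketch uses Python’s built-in sqlite3 as a stand-in for the production database, with an invented table and data, to show the difference between the old extract-then-analyze pattern and pushing the computation to where the data lives, which is the core idea behind an MPP appliance.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (symbol TEXT, qty INTEGER, price REAL)")
conn.executemany("INSERT INTO trades VALUES (?, ?, ?)",
                 [("IBM", 100, 182.5), ("IBM", 200, 182.7), ("GE", 50, 14.1)])

# Old pattern: extract all raw rows, then analyze in an external tool.
rows = conn.execute("SELECT symbol, qty, price FROM trades").fetchall()
volume = {}
for symbol, qty, _ in rows:           # the analysis happens outside the database
    volume[symbol] = volume.get(symbol, 0) + qty

# MPP-style pattern: push the aggregation into the database itself.
in_db = dict(conn.execute(
    "SELECT symbol, SUM(qty) FROM trades GROUP BY symbol").fetchall())

assert volume == in_db  # same answer; only the raw-row movement differs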

Werr said NYSE Technologies has figured out how to use all its data assets in an efficient and cost-effective manner. The firm has extended its data warehouse with a distributed file store, he added.

“Big data for us is augmentation between systems like Netezza and a set of technologies -- Hadoop, a distributed file system, and identifier tiers that orchestrate data access. NYSE big data is all about taking that to the next level and packaging it so it can be dropped into an organization and leveraged to keep supporting innovation in big data.”
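In rough terms, the orchestration tier Werr describes decides where a query should run based on where the data lives. Here is a hedged sketch of that routing idea, assuming a one-year warehouse window (consistent with Francisco’s comment below that Netezza typically holds less than a year of data) and invented tier names.

from datetime import date, timedelta

WAREHOUSE_WINDOW = timedelta(days=365)  # assumed cutoff: under a year stays in the warehouse

def route(query_date, today=None):
    # Send recent, hot data to the warehouse; older data to the archive tier.
    today = today or date.today()
    if today - query_date <= WAREHOUSE_WINDOW:
        return "warehouse"   # e.g. Netezza: fast ad hoc SQL on recent data
    return "file_store"      # e.g. Hadoop/HDFS: cheap, deep archival storage

print(route(date.today() - timedelta(days=30)))    # -> warehouse
print(route(date.today() - timedelta(days=1200)))  # -> file_store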

Phil Francisco, vice president of big data product management at IBM, said Werr had developed some interesting ways to load archival data into Netezza very quickly, so NYSE can run surveillance analytics against records from a few months back or a few years back.

“Typically they will have less than a year’s worth of data in Netezza, but they can always load data from an archive,” Francisco said. With the methods Werr developed, NYSE can look for long-running patterns. “Emile was the architect for that -- how to use a high performance data warehouse around data retention.”
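A simplified sketch of that archive-reload pattern, again using sqlite3 as a stand-in: recent data stays resident, an archived slice is pulled back on demand, and the same surveillance query runs over it. The table layout, the load_archive_day helper and the volume threshold are all hypothetical.

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE trades (trade_date TEXT, symbol TEXT, qty INTEGER)")

def load_archive_day(conn, day, archived_rows):
    # In production this would bulk-load the day's records from the
    # distributed file store; here we simply insert the rows.
    conn.executemany("INSERT INTO trades VALUES (?, ?, ?)",
                     [(day, s, q) for s, q in archived_rows])

# Reload an archived day (months or years back) on demand...
load_archive_day(conn, "2010-05-06", [("ACME", 500), ("ACME", 700)])

# ...then run the same surveillance query used against current data.
suspicious = conn.execute(
    "SELECT symbol, SUM(qty) FROM trades WHERE trade_date = ? "
    "GROUP BY symbol HAVING SUM(qty) > 1000", ("2010-05-06",)).fetchall()
print(suspicious)  # -> [('ACME', 1200)]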

“NYSE continues to push the envelope for high performance, scalability and reliability,” Werr said. “NYSE has implemented large network pipes across data centers and trading systems. We can move data around very quickly. Data needs to move in and out of analytics systems (like Netezza) fast.”

NYSE Technologies makes its systems available for purchase and installation behind a firewall or as a service. The system is fast in analytics terms, but it is not designed for high-frequency trading: it refreshes at one-minute intervals, which is near real-time in the analytics world.
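A one-minute refresh cadence amounts to appending a micro-batch of new data on a fixed interval. The sketch below shows the shape of such a loop; fetch_new_ticks and append_batch are hypothetical placeholders, and the loop is bounded so the example terminates.

import time

def fetch_new_ticks():
    return []   # placeholder: would read new records from the market-data feed

def append_batch(batch):
    pass        # placeholder: would bulk-append the batch to the analytics store

def refresh_loop(interval_seconds=60, cycles=3):
    for _ in range(cycles):              # bounded for the example's sake
        append_batch(fetch_new_ticks())  # one micro-batch per interval
        time.sleep(interval_seconds)

refresh_loop(interval_seconds=1, cycles=2)  # shortened interval for a quick run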

Some broker-dealers ask for data from a specific point in time, such as the Flash Crash, so they can test their algorithms against it. Moving that data to a firm can be expensive, so NYSE Technologies leaves it in its own data center, where firms can test against it without moving the day’s data.

“A lot of firms want to get data on demand while leaving it in our enterprise,” Werr explained. The data can be offered in raw form or customized to make it more user-friendly.
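The on-demand model amounts to shipping the computation to the data rather than the data to the computation. A minimal sketch, with a hypothetical HostedDataService class and invented Flash Crash-era tick values:

class HostedDataService:
    def __init__(self, ticks):
        self._ticks = ticks  # the data stays inside the provider's data center

    def run(self, start, end, algorithm):
        # Only the algorithm's result leaves the enterprise, not the data.
        window = [t for t in self._ticks if start <= t["time"] <= end]
        return algorithm(window)

service = HostedDataService(ticks=[
    {"time": "2010-05-06T14:45", "price": 9_870.0},
    {"time": "2010-05-06T14:47", "price": 9_200.0},
])

# A broker-dealer's test: lowest print across the Flash Crash window.
result = service.run("2010-05-06T14:40", "2010-05-06T15:00",
                     lambda w: min(t["price"] for t in w))
print(result)  # -> 9200.0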