This is part five of a six-part series by Mike O’Hara, looking at the usage of FPGAs and other types of hardware acceleration in the financial trading ecosystem.
Previous articles in this series have focused mainly on Field Programmable Gate Array (FPGA) technology. However, a number of firms are now looking at how the Intel Sandy Bridge architecture can help them accelerate their trading infrastructures, rather than following the FPGA route.
I spoke with David Barrand, EMEA Head of Financial Services and David O’Shea, Relationship Manager at Intel, to gain a better understanding of how Intel’s latest technology is helping firms to compete in ever-faster trading environments.
HFT Review: What trends are you currently seeing in the way trading technology is evolving?
David Barrand: If you look back a year or 18 months, there was a lot of talk about GPGPUs and FPGAs cutting a swathe through the businesses that we deal with. But what we’ve actually seen is that Intel’s technology is progressing at such a rate that the advantages those alternative technologies once offered are now falling away.
Intel can afford to invest and maintain that pace of advancement because the volume at which we sell our technologies is so large. What that means to the market – in terms of being able to conduct high frequency trading – is lower costs for all. It’s almost a democratisation that lowers the barriers to entry and allows far smaller firms, or firms that don’t have such a broad cost base, to compete.
Companies for whom HFT technology is their core business are now selecting Intel to run that business across the board. So not only has the gap to FPGAs closed, but firms can see it progressing such that this is the right technology to bet on in the future.
HFTR: What kind of technological advancements are we talking about here?
David O’Shea: There’s an old saying, “With time, truth and technology change”. What was true in 2005 and what was true in 2012 are two different things.
In 2005, when direct feeds started hitting the market and people started using FPGAs for algo trading, we were struggling with the physics of a 150W single-core chip. We also had some big manufacturing challenges at that time.
But with the Intel “tick-tock” model, approximately every two years we bring a new architecture to market, which in the next year we shrink in order to get more performance and more cores.
When we introduced the Core architecture with Nehalem in 2008, it made a big difference to the number of applications that people would consider putting on the CPU rather than the FPGA.
HFTR: In what way?
DOS: FPGAs did a good job of bringing in data, normalising it and then distributing it to the CPU for higher-level functions. But you have to consider the entire process. At one time people envisioned doing the entire process on an FPGA, but that just has not happened, for many reasons.
FPGA-based feed handlers are very common, but when people want to do more, they end up sending the data to a CPU socket. There were visions at one time of doing a lot of the calculation on the FPGA, but programming FPGAs is fairly challenging. It’s certainly not as easy as writing performant code in higher-level languages.
We found that people started looking at CUDA as an alternative for the math side of things, but that too can be a challenging development environment. So people take applications that were targeted at an FPGA for math and attempt to run those computations on GPGPUs. But then they lose the benefit of doing everything on the FPGA card and sending the data straight out, because once you leave the card, you incur a latency hit going across the bus.
Fragmentation of the code base is a bad thing, and this type of GPU + FPGA + CPU solution makes it worse.
HFTR: What sort of latency hit are we talking about here?
DOS: It depends on how frequently people refresh these cards; some of the FPGA cards are still PCIe Gen1. But once you have to leave the FPGA card, you have to handle the throughput and actually process the work that allows you to make a trade.
All of that work tends to happen today on the CPU, and our latest CPU platform, Sandy Bridge, is really another game changer.
HFTR: What are the key elements of Sandy Bridge from a trading architecture perspective?
DOS: With Sandy Bridge, we’ve introduced an embedded I/O controller on the socket, and Data Direct I/O. These two enhancements mean we can move data directly from the NIC to the socket and into the L3 cache, and on into the L2 cache, where it can be used by the application. This eliminates an I/O controller hop and the round-trip latency of accessing memory. That provides a big improvement in latency and is why Solarflare and Mellanox have been able to publish incredible reductions in their latency by using this technology. Mellanox claims sub-one-microsecond times.
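The receive pattern those kernel-bypass NICs accelerate looks broadly like the busy-polling loop sketched below. This is a minimal illustration using ordinary POSIX sockets; a real deployment would swap in a vendor kernel-bypass API such as Solarflare’s OpenOnload/ef_vi or Mellanox’s VMA, and the port number and buffer size here are purely illustrative assumptions.

```cpp
// Minimal sketch of a busy-polling market-data receive loop.
// Real deployments replace the plain POSIX socket with a kernel-bypass
// API; the port number and buffer size are illustrative assumptions.
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>
#include <unistd.h>
#include <cerrno>
#include <cstdio>

int main() {
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0) { perror("socket"); return 1; }

    sockaddr_in addr{};
    addr.sin_family      = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port        = htons(14310);        // illustrative feed port

    if (bind(fd, reinterpret_cast<sockaddr*>(&addr), sizeof(addr)) < 0) {
        perror("bind"); return 1;
    }

    char pkt[2048];
    for (;;) {
        // Non-blocking receive in a tight loop: the core spins instead of
        // sleeping in the kernel, trading CPU time for lower wake-up latency.
        ssize_t n = recv(fd, pkt, sizeof(pkt), MSG_DONTWAIT);
        if (n > 0) {
            // Hand the datagram to the feed handler / strategy here.
        } else if (n < 0 && errno != EAGAIN && errno != EWOULDBLOCK) {
            perror("recv");
            break;
        }
    }
    close(fd);
    return 0;
}
```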
Then you have PCIe Gen3, which will let you support a 40Gig connection. You can meet the data throughput requirements and reduce latency. Most of the 40Gig cards are quad-port 10Gig cards, but the fact is you’re able to support that level of bandwidth, which is a significant amount of throughput.
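A quick back-of-the-envelope check shows why the slot can carry a fully loaded 40 Gbps link. The sketch below assumes the NIC sits in a x8 PCIe Gen3 slot, which is an assumption rather than anything stated in the interview, though it is typical for such cards.

```cpp
// Back-of-the-envelope check: can a PCIe Gen3 x8 slot carry a 40 Gbps feed?
// The per-lane rate and encoding are standard Gen3 parameters; the x8 slot
// width is an assumption about the card in question.
#include <cstdio>

int main() {
    const double gbits_per_lane = 8.0;            // 8 GT/s = 8 Gb/s raw per Gen3 lane
    const double encoding       = 128.0 / 130.0;  // 128b/130b line encoding
    const int    lanes          = 8;              // assumed x8 slot

    double slot_gbytes_per_s = gbits_per_lane * encoding * lanes / 8.0;  // ~7.88 GB/s
    double link_gbytes_per_s = 40.0 / 8.0;                               // 40 Gbps = 5 GB/s

    std::printf("PCIe Gen3 x8 raw capacity: %.2f GB/s\n", slot_gbytes_per_s);
    std::printf("40 GbE line rate:          %.2f GB/s\n", link_gbytes_per_s);
    return 0;
}
```

Protocol overhead (TLP headers and flow control) eats into the raw figure, but the headroom over the roughly 5 GB/s line rate remains comfortable.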
So now you have data being taken directly from the NIC card into L3, you have kernel bypass, you have incredible bandwidth to the sockets, you have very low latency into L2 being used by the application, and you’ve got 8 very powerful cores per socket. And that number is only going to go up.
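One common way of exploiting those cores is to pin each latency-critical thread to its own physical core, so it keeps its cache working set and never migrates. The minimal Linux sketch below illustrates the idea; the core index is arbitrary, and real configurations typically reserve cores from the scheduler (for example via isolcpus) before pinning one hot thread to each.

```cpp
// Minimal sketch: pin the calling (hot-path) thread to one physical core.
// Build with: g++ -O2 -pthread pin.cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE              // needed for pthread_setaffinity_np on glibc
#endif
#include <pthread.h>
#include <sched.h>               // cpu_set_t, CPU_ZERO, CPU_SET
#include <cstdio>
#include <cstring>

// Pin the calling thread to a single core. Returns true on success.
static bool pin_current_thread_to_core(int core) {
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    int rc = pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    if (rc != 0) {
        std::fprintf(stderr, "pthread_setaffinity_np: %s\n", std::strerror(rc));
        return false;
    }
    return true;
}

int main() {
    // Core 2 is chosen arbitrarily for illustration.
    if (!pin_current_thread_to_core(2)) return 1;
    std::puts("hot-path thread pinned; enter the busy-poll loop here");
    return 0;
}
```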
HFTR: How does all of this stack up against an FPGA-based trading architecture?
DOS: We’ve been running tests at various HFT firms and at a number of companies that write software to run on our platform, and one of these firms has already found that a Sandy Bridge server with the necessary kernel-bypass NICs outperforms an FPGA. And the platform was only released in March.
The important thing is that you have to think about the entire process. If the FPGA accelerates one piece, and you write that in Verilog or VHDL, and then you take your maths section and write it in CUDA or OpenCL, and then you write all of your transaction-type code for the x86 side in C or C++, you’ll need expertise in at least three different languages before you’ve done anything else.
DB: Just to re-emphasise, Intel knows how expensive it is to maintain an architecture and continue to develop it. We’ve run many architectures over the years; it is hard work and it takes a lot of effort and money to sustain this level of development. If you’re trying to do that with FPGAs and GPGPUs, which have a far more limited market, then the commercials actually limit what you’re able to develop profitably. So people may look back in a few years, point at this period we’re in at the moment and say, “that was the era of the GPGPUs and FPGAs, but now we’re on general-purpose CPUs again”.
HFTR: Gentlemen, thank you for your insight.
The final part of this series, which will be published at www.hftreview.com/pg/fpga in early September, will have some predictions from various experts on the future of FPGA and hardware-accelerated trading.