Peter Lawrey


    This article was originally published at the Vanilla #Java blog, and is reproduced here with the author’s permission.


    There are conflicting views on the best solution for high frequency trading. Part of the problem is that what counts as high frequency trading varies more than you might expect; another part is what is meant by faster.

    My View

    If you have a typical Java programmer and a typical C++ programmer, each with a few years' experience writing a typical Object Oriented Program, and you give them the same amount of time, the Java programmer is likely to have a working program earlier and will have more time to tweak the application. In this situation, the Java application is likely to be faster, IMHO.

    In my experience, Java is better than C++ at detecting code which doesn’t need to be run, especially in micro-benchmarks which don’t do anything useful. 😉 If you tune Java and C++ as far as they can go, given any amount of expertise and time, the C++ program will be faster. However, given limited resources and a changing environment, i.e. in real-world applications, a dynamic language will outperform.
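
A minimal sketch of the micro-benchmark trap mentioned above. The class and method names are made up for illustration. The first loop discards the result of a side-effect-free method, so the JIT compiler is free to remove the work; the second loop consumes the result. Whether the first loop is actually eliminated depends on the JVM, its version and its flags, so no particular timing is promised here.

```java
public class DeadCodeSketch {
    // The "work" being timed: a sum of squares with no side effects.
    public static long work(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            sum += (long) i * i;
        }
        return sum;
    }

    public static void main(String[] args) {
        int n = 10_000;
        int runs = 20_000;

        // Variant 1: the result is discarded, so the JIT may drop the call entirely.
        long start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            work(n);
        }
        long discardedNanos = System.nanoTime() - start;

        // Variant 2: the result feeds a checksum that is printed, so the work must be done.
        long checksum = 0;
        start = System.nanoTime();
        for (int r = 0; r < runs; r++) {
            checksum += work(n + (r & 1));
        }
        long consumedNanos = System.nanoTime() - start;

        System.out.printf("result discarded: %.2f us per call%n", discardedNanos / 1e3 / runs);
        System.out.printf("result consumed:  %.2f us per call (checksum %d)%n",
                consumedNanos / 1e3 / runs, checksum);
    }
}
```

If the first figure comes out far lower than the second, the benchmark was measuring the optimiser rather than the code.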

    In the equities space, you need latencies of sub-10 us to be seriously high frequency. Java, and even standard OOP C++, on commodity hardware is not an option. You need C or a cut-down version of C++ and specialist hardware like FPGAs or GPUs.

    In FX, high frequency means latencies of sub-100 us. In this space, C++ or a cut-down Java (low GC) with a kernel bypass network adapter is an option, and using one language or another will have pluses and minuses. Personally, I think Java gives more flexibility as the exchanges are constantly changing, assuming you believe you can use IT for competitive advantage.
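
One common reading of "cut-down Java (low GC)" is code that allocates its buffers once and reuses them, so the hot path creates no garbage. The sketch below illustrates that idea only; the class name and the 20-byte message layout are hypothetical and not taken from any real exchange protocol.

```java
import java.nio.ByteBuffer;

public class LowGcEncoderSketch {
    // Allocated once and reused for every message, so encoding creates no new objects.
    private final ByteBuffer buffer = ByteBuffer.allocateDirect(64);

    // Hypothetical layout: 8-byte instrument id, 8-byte price in ticks, 4-byte quantity.
    public ByteBuffer encode(long instrumentId, long priceTicks, int quantity) {
        buffer.clear();
        buffer.putLong(instrumentId).putLong(priceTicks).putInt(quantity);
        buffer.flip(); // make the written bytes readable by the caller
        return buffer;
    }

    public static void main(String[] args) {
        LowGcEncoderSketch encoder = new LowGcEncoderSketch();
        long totalBytes = 0;
        for (int i = 0; i < 1_000_000; i++) {
            // Prices are kept as long tick counts to avoid boxing or BigDecimal allocation.
            totalBytes += encoder.encode(42L, 1_234_500L + i, 100).remaining();
        }
        System.out.println("Encoded bytes: " + totalBytes);
    }
}
```

The trade-off is that the returned buffer is only valid until the next call to encode, which is the kind of restriction a low-GC style tends to impose.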

    In many cases, when people (especially banks) talk about high frequency, they are talking about sub-1 ms or single-digit ms. In this space, I would say the flexibility/dynamic programming of Java, Scala or C# etc. would give you time to market, maintainability and reliability advantages over C/C++ or FPGA.

    The problem Java faces

    The problem is not in the language as such, but a lack of control over caches, context switches and interrupts. If you copy a block of memory, something which is done in native code, but use a different delay between runs, that copy gets slower depending on what has happened between copies.

    The problem is not GC or Java, as neither of these plays much of a part. The problem is that part of the cache has been evicted between copies, so the copy itself takes longer. This is the same for any operation which accesses memory, e.g. accessing plain objects will also be slower.

    // Requires: import java.util.Arrays;
    private void doTest(Pauser delay) throws InterruptedException {
        int[] times = new int[1000 * 1000];
        byte[] bytes = new byte[32 * 1024];
        byte[] bytes2 = new byte[32 * 1024];
        long end = System.nanoTime() + (long) 5e9; // run each test for at most 5 seconds
        int i;
        for (i = 0; i < times.length; i++) {
            // Time a single 32 KB copy.
            long start = System.nanoTime();
            System.arraycopy(bytes, 0, bytes2, 0, bytes.length);
            long time = System.nanoTime() - start;
            times[i] = (int) time;
            delay.pause(); // wait between copies (method name assumed; Pauser is not shown)
            if (start > end) break;
        }
        // Sort the recorded samples and report the 1st, 50th and 99th percentiles in microseconds.
        Arrays.sort(times, 0, i);
        System.out.printf(delay + ": Copy memory latency 1/50/99%%tile %.1f/%.1f/%.1f us%n",
                times[i / 100] / 1e3,
                times[i / 2] / 1e3,
                times[i - i / 100 - 1] / 1e3);
    }

    The test does the same thing many times, with a different delay between each run. It spends most of its time in native methods, and no objects are created or discarded during the test.
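
The Pauser type used by the test is not shown in this article. The following is a guess at what it might look like, based only on the labels in the results and the statement that the waits are 1 to 10 ms; the method names pause and busyWait are assumptions.

```java
public class PauserSketch {
    // Hypothetical reconstruction: one constant per label seen in the results.
    public enum Pauser {
        NO_WAIT { @Override public void pause() { } },
        YIELD { @Override public void pause() { Thread.yield(); } },
        BUSY_WAIT_1 { @Override public void pause() { busyWait(1); } },
        BUSY_WAIT_3 { @Override public void pause() { busyWait(3); } },
        BUSY_WAIT_10 { @Override public void pause() { busyWait(10); } },
        SLEEP_1 { @Override public void pause() throws InterruptedException { Thread.sleep(1); } },
        SLEEP_3 { @Override public void pause() throws InterruptedException { Thread.sleep(3); } },
        SLEEP_10 { @Override public void pause() throws InterruptedException { Thread.sleep(10); } };

        public abstract void pause() throws InterruptedException;

        // Spin on the clock without giving up the CPU (delay assumed to be in milliseconds).
        public static void busyWait(long millis) {
            long end = System.nanoTime() + millis * 1_000_000L;
            while (System.nanoTime() < end) {
                // keep spinning
            }
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (Pauser p : Pauser.values()) {
            long start = System.nanoTime();
            p.pause();
            System.out.printf("%s paused for %.3f ms%n", p, (System.nanoTime() - start) / 1e6);
        }
    }
}
```

A busy wait keeps the thread on its core while a sleep hands the core back to the operating system, which is why the two are worth testing separately.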

    YIELD: Copy memory latency 1/50/99%tile 1.6/1.6/2.3 us
    NO_WAIT: Copy memory latency 1/50/99%tile 1.6/1.6/1.6 us
    BUSY_WAIT_10: Copy memory latency 1/50/99%tile 2.8/3.5/4.4 us
    BUSY_WAIT_3: Copy memory latency 1/50/99%tile 2.7/3.0/4.0 us
    BUSY_WAIT_1: Copy memory latency 1/50/99%tile 1.6/1.6/2.5 us
    SLEEP_10: Copy memory latency 1/50/99%tile 2.2/3.4/5.1 us
    SLEEP_3: Copy memory latency 1/50/99%tile 2.2/3.4/4.4 us
    SLEEP_1: Copy memory latency 1/50/99%tile 1.8/3.4/4.2 us

    With -XX:+UseLargePages on Java 7

    YIELD: Copy memory latency 1/50/99%tile 1.6/1.6/2.7 us
    NO_WAIT: Copy memory latency 1/50/99%tile 1.6/1.6/1.8 us
    BUSY_WAIT_10: Copy memory latency 1/50/99%tile 2.7/3.6/6.6 us
    BUSY_WAIT_3: Copy memory latency 1/50/99%tile 2.7/2.8/5.0 us
    BUSY_WAIT_1: Copy memory latency 1/50/99%tile 1.7/1.8/2.6 us
    SLEEP_10: Copy memory latency 1/50/99%tile 2.4/4.0/5.2 us
    SLEEP_3: Copy memory latency 1/50/99%tile 2.3/3.9/4.8 us
    SLEEP_1: Copy memory latency 1/50/99%tile 2.1/3.3/3.7 us

    The best of three runs was used.

    The typical time (the middle value) it takes to perform the memory copy varies between 1.6 and 4.0 us in the results above, depending on whether there was a busy wait or sleep of 1 to 10 ms between copies. This is a ratio of about 2.5x, which has nothing to do with Java, but with something it has no real control over. Even the best times vary by nearly 2x.
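
To check that the cache, rather than the wait itself, is what matters, one could disturb the cache deliberately between copies. The sketch below does this by reading through a 16 MB array, a size assumed to be larger than the caches closest to the core, before each timed copy. The class name, the 64-byte cache line and the buffer sizes are assumptions, and the figures will differ from machine to machine.

```java
import java.util.Arrays;

public class CacheDisturbSketch {
    static final byte[] SRC = new byte[32 * 1024];
    static final byte[] DST = new byte[32 * 1024];
    // Assumed to be big enough to push SRC and DST out of the nearest caches.
    static final byte[] EVICT = new byte[16 * 1024 * 1024];
    static long sink; // keeps the eviction reads from being optimised away

    // Middle value of the samples; sorts a copy so the caller's array is untouched.
    public static int median(int[] times) {
        int[] sorted = times.clone();
        Arrays.sort(sorted);
        return sorted[sorted.length / 2];
    }

    public static int medianCopyNanos(int samples, boolean disturb) {
        int[] times = new int[samples];
        for (int i = 0; i < samples; i++) {
            if (disturb) {
                long sum = 0;
                for (int j = 0; j < EVICT.length; j += 64) { // one read per assumed 64-byte line
                    sum += EVICT[j];
                }
                sink += sum;
            }
            long start = System.nanoTime();
            System.arraycopy(SRC, 0, DST, 0, SRC.length);
            times[i] = (int) (System.nanoTime() - start);
        }
        return median(times);
    }

    public static void main(String[] args) {
        // Warm up both paths so the JIT has compiled them before measuring.
        medianCopyNanos(2000, false);
        medianCopyNanos(50, true);
        System.out.printf("cache undisturbed: median %.1f us%n", medianCopyNanos(2000, false) / 1e3);
        System.out.printf("cache disturbed:   median %.1f us%n", medianCopyNanos(300, true) / 1e3);
    }
}
```

The copy is identical in both cases; only what happened between copies changes.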

    In ultra-high frequency trading, the core engine will be more C, assembly and custom hardware than OOP C++ or Java. In markets where the latency requirements of the engine are less tight, C++ and low-GC Java become an option. As latency requirements become less tight still, Java and other dynamic languages can be more productive. In this situation, Java is faster to market, so you can take advantage of changes in the market or requirements.
