FPGA-accelerated financial analytics get real

Automated trading and near-real-time financial analytics have been hot topics for some years now. Large organizations such as Bank of America deploy massive compute clusters to do such things as calculate the present value of options or model credit derivatives, in a virtual arms race to make trades with ever-higher levels of accuracy and ever-lower latencies. The banks and hedge funds that win this race each day have the potential to make millions or billions of dollars in extra profits. Vast amounts of power are consumed to drive the world’s most advanced supercomputers in a constant quest to produce, well, nothing at all… just information used to move wealth from one global pants-pocket to another.

And all in the pursuit of market efficiency, of course. Hopefully all this money-shuffling is good for my meager retirement portfolio.

Editorializing aside, there has been a lot of buzz about the role of accelerators, including FPGAs, in financial applications. XtremeData this week generated some press regarding their new accelerated database for analytics. Their solution is attractive because it combines an FPGA module with an industry-standard HP ProLiant server to accelerate specific algorithms (in this case SQL queries) by 15X over software-only equivalents.

As an industry, we need more turnkey solutions that highlight the benefits of FPGA acceleration. With enough such applications out there, the demand for programming and hardware platform solutions for other, possibly unrelated applications will increase.

Assuming, of course, all this financial alchemy doesn’t once again turn gold into lead.

Filed under News Shmews, Reconshmiguration

An open letter to DeepChip readers

DeepChip? What’s that?

DeepChip is the website of John Cooley and home of the perennially unofficial and irreverent ESNUG (East Coast Synopsys Users Group). DeepChip has evolved over the years – almost two decades now – into a highly popular site for discussing design methods and tools of all flavors, with a particular focus on ASIC and, to a lesser extent, FPGA hardware design.

In recent weeks and months there has been a flurry of comments on DeepChip from vendors of C-to-FPGA tools, and from users of those tools, culminating in a long-overdue acknowledgement from John that such tools really are gaining traction. See “I Sense A Tremor In The Force”.

I found the dialogue somewhat heartening; it’s good to see the conservative world of ASIC design finally starting to embrace these tools. But I also found the theme of the whole debate – that these tools are new, untested and exotic – a little amusing. And so, during the week of the Design Automation Conference, here is my open letter to the DeepChip community:

Subject: C-to-FPGA Users: Who Are These People?

Here are some perspectives on the recent C-to-hardware debates….

Are there actual users of this stuff? Absolutely. The real question is, who are these people?

C-to-hardware tools are not yet common among traditional hardware designers. Yes, there are successful ASIC tapeouts and some scattered successes in, for example, consumer video processing on FPGAs. We are hearing more of these successes every year. But for the vast majority of hardware designers, RTL methods using VHDL and Verilog are still the preferred route. Particularly in a downturn, what project leader wants to risk their career on a new design method?

The move to higher level methods of design will happen; it’s just a matter of time, and of getting a critical mass of success stories with clearly stated benefits. We’ve seen this before, by the way… VHDL and Verilog did not take over from schematics overnight, and that move was less of a leap of abstraction than the current push from RTL into ESL.

So where is the action in C-to-hardware?

It’s on the software side of the world. It’s in embedded systems for defense and aerospace, it’s in reconfigurable computing research groups, it’s in financial computing and life sciences. It’s in places that do not have significant hardware development expertise. It’s in places where DeepChip.com is not widely read, and where “EDA” and “ESL” have little or no meaning.

I can state emphatically that C-to-FPGA tools really do work. Impulse C, for example, has users worldwide who are applying their C programming skills to create hardware coprocessors for embedded systems-on-FPGA, or to move processing-intensive algorithms into dedicated FPGA logic, using the newest FPGA-accelerated computing platforms.

Are these tools perfect? By no means. Any user of Impulse C would report frustrations similar to those we’ve seen with other C-to-hardware tools – but also similar productivity benefits. All of these tools have their peculiarities, and all require a certain amount of C-language refactoring to achieve acceptable performance. All of these tools require “best practices” training. However, I believe all of our tools have now matured to the point where that level of refactoring can be performed by a skilled software programmer with little or no prior knowledge of RTL.
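
To make that concrete, here is a minimal sketch of the stream-oriented style these tools encourage. The co_stream API calls and the CO PIPELINE pragma are real Impulse C; the process itself – its name and its trivial gain computation – is a hypothetical example, not production code:

    #include "co.h"  /* Impulse C process and stream library */

    /* A hypothetical hardware process: read 32-bit samples from an
       input stream, apply a simple gain, and write the results to an
       output stream. The pipeline pragma asks the compiler to build
       hardware that produces one result per clock once the pipeline
       is full. */
    void gain_proc(co_stream input, co_stream output)
    {
        co_int32 sample;

        co_stream_open(input, O_RDONLY, INT_TYPE(32));
        co_stream_open(output, O_WRONLY, INT_TYPE(32));

        while (co_stream_read(input, &sample, sizeof(co_int32)) == co_err_none) {
            #pragma CO PIPELINE
            sample = sample * 3;  /* placeholder computation */
            co_stream_write(output, &sample, sizeof(co_int32));
        }

        co_stream_close(input);
        co_stream_close(output);
    }

The refactoring mentioned above largely consists of restructuring an algorithm into processes like this one, connected by streams, so that the compiler can extract pipeline-level and process-level parallelism.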

To summarize… I believe we are nearing a point at which traditional hardware engineers will begin moving en masse to higher-level tools, including C-to-hardware. There will finally be a payoff for the ESL vendors that have been pushing these technologies forward, and a bigger payoff in productivity for the development teams that take the leap and use ESL for complex systems. But I also believe the bigger, unreported story is that a new generation of FPGA programmers is emerging, blurring the distinction between hardware and software for embedded and high-performance computing systems.

David Pellerin, CEO
Impulse Accelerated Technologies

Filed under News Shmews, Reconshmiguration

Loring Wirbel on “Loose threads and blank slates”

Loring Wirbel (FPGA Gurus blog) provides good perspective on recent developments and setbacks in reconfigurable architectures, and the risks faced by FPGA startups in the current environment:

Loose Threads and Blank Slates

Filed under Reconshmiguration

CSwitch closes its doors

On the heels of similar shutdowns last year of MathStar and Ambric, the news broke earlier this month that reconfigurable device startup CSwitch has now shut down, the apparent victim of still-frozen capital markets.

This is unfortunate, and a setback for reconfigurable computing overall. But it’s not surprising given the history of new and exotic reconfigurable devices. I still believe that, somewhere out there, there is a reconfigurable device architecture that can find its market. But as I’ve opined before, such a device needs to be programmable using higher-level methods based on established software or hardware programming languages. If I were starting a programmable device company – which I wouldn’t, by the way, because I’m not rich enough, smart enough, or maybe not dumb enough – I would start with a target application first, then decide how to program that application. And then… only then… I would have some very smart people design a programmable hardware device and a set of design tools and libraries that are optimized for that application, and that support that programming method.

There is a very good series of articles covering this topic in EE Times: See FPGA Startup Crunch.

Filed under News Shmews, Reconshmiguration

University of Florida preps Novo-G FPGA cluster

The CHREC team at the University of Florida has announced a new reconfigurable computing cluster.

The Novo-G system is being built using FPGA accelerator cards provided by GiDEL and Altera. The system will have 96 Altera Stratix III FPGA devices, installed in 24 networked servers with 576GB of memory and a 20Gb/s InfiniBand interconnect.

According to Professor Alan George, the purpose of Novo-G is to “advance and prove reconfigurable computing technologies at a level of scale, performance, and productivity unprecedented in this field, for applications from satellites to supercomputers”.

Novo-G is based on PCI Express FPGA cards provided by GiDEL and populated with FPGAs provided by Altera. Support for C programming of these cards has been enabled with an Impulse C Platform Support Package (PSP) developed by Rafael Garcia at the CHREC lab.

More information about this project can be found here.

Filed under News Shmews

Medical imaging gets an FPGA boost

FPGAs are finding increased use in medical electronics. Frost & Sullivan reported in 2007 that FPGAs in medical imaging, including X-ray, CT, PET, MRI, and ultrasound, already represented as much as $138M in revenue to FPGA companies, with CT alone representing $10M or more of that amount. Steady growth in these applications was forecast through 2011.

FPGAs assist medical imaging in two areas: detection and image reconstruction. The detection part of medical imaging is an embedded systems application, with real-time performance requirements and significant hardware interface challenges. Image reconstruction, on the other hand, is more like a high-performance computing problem.

Image capture and display in computed tomography involves synchronizing large numbers of detectors arranged in a ring around the patient, in the large doughnut structure that we associate with CT, MRI and PET scanning. These detectors are often implemented using FPGAs, many hundreds of them, and already represent a large and profitable market for programmable logic devices.

While FPGAs are well-established in the detector part of the imaging problem, they can also help solve a significant problem in image reconstruction. They do this by serving as computing engines – as dedicated software/hardware application accelerators.

Tomographic reconstruction is a compute-intensive problem; the process of creating cross-sectional images from data acquired by a scanner requires a large number of CPU cycles. The primary computational bottleneck after data capture by the scanner is the back-projection of the acquired data into image space to reconstruct the internal structure of the scanned object.
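
For readers who haven’t seen it, here is a simplified, unfiltered back-projection kernel in plain C. The names and sizes are my own assumptions, and real reconstruction code adds filtering and interpolation, but the triply nested loop below is the structure that consumes the cycles:

    #include <math.h>

    #define NANGLES 512  /* number of projection angles (assumed) */
    #define NBINS   512  /* detector bins per projection (assumed) */
    #define NPIX    512  /* reconstructed image is NPIX x NPIX */

    /* Simplified back-projection: for every image pixel, accumulate
       the sinogram value seen along each projection angle, using a
       nearest-bin lookup. */
    void backproject(const float sino[NANGLES][NBINS],
                     float image[NPIX][NPIX])
    {
        for (int a = 0; a < NANGLES; a++) {
            float theta = (float)a * 3.14159265f / (float)NANGLES;
            float c = cosf(theta), s = sinf(theta);
            for (int y = 0; y < NPIX; y++) {
                for (int x = 0; x < NPIX; x++) {
                    /* signed distance of pixel (x,y) from the image
                       center, measured along the projection direction */
                    float t = (x - NPIX / 2) * c + (y - NPIX / 2) * s;
                    int bin = (int)(t + NBINS / 2);
                    if (bin >= 0 && bin < NBINS)
                        image[y][x] += sino[a][bin];
                }
            }
        }
    }

Every pixel-angle pair costs at least one multiply-accumulate – on the order of 10^8 operations even for a modest 512 x 512 image – which is exactly the kind of regular, data-parallel work an FPGA pipeline handles well.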

University of Washington graduate researchers Nikhil Subramanian and Jimmy Xu, working under the direction of Dr. Scott Hauck, recently completed a project evaluating the use of higher-level programming methods for FPGAs, using back-projection as a benchmark algorithm. Nikhil and Jimmy achieved well over 100X speedup of the algorithm over a software-only equivalent. The target hardware for this evaluation was an XtremeData coprocessor module. This module, the XD1000, is based on Altera FPGA devices and serves as a coprocessor to an AMD Opteron processor running Linux, via a HyperTransport socket interface.

This project, which was funded in part by a $100,000 Research and Technology Development grant from Washington Technology Center, was intended to determine the tradeoffs of using higher-level FPGA programming methods for medical imaging, radar and other applications requiring high throughput image reconstruction.

The key to accelerating back-projection is to exploit parallelism in the computation. Working in cooperation with Dr. Adam Alessio of the UW Department of Radiology, the two researchers converted and refactored an existing back-projection algorithm, using both C-to-FPGA (the Impulse C tools) and Verilog HDL, to evaluate design efficiency and overall performance.
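
To illustrate what such refactoring can look like, here is a hypothetical fragment (not the UW design, which is described in the thesis linked below) that reuses the constants from the sketch above. The angle loop moves outward so that one projection row can be staged in local on-chip memory, and the pixel loop is marked for pipelining so it can sustain one accumulate per clock. The CO PIPELINE pragma is a real Impulse C directive; everything else is illustrative:

    /* Hypothetical refactored kernel: process one projection angle
       at a time. sino_row[] holds that angle's projection data in
       local memory; c and s are the angle's precomputed cosine and
       sine, supplied by the caller. */
    void backproject_angle(const float sino_row[NBINS],
                           float c, float s,
                           float image[NPIX][NPIX])
    {
        for (int y = 0; y < NPIX; y++) {
            for (int x = 0; x < NPIX; x++) {
                #pragma CO PIPELINE
                float t = (x - NPIX / 2) * c + (y - NPIX / 2) * s;
                int bin = (int)(t + NBINS / 2);
                if (bin >= 0 && bin < NBINS)
                    image[y][x] += sino_row[bin];
            }
        }
    }

It was this kind of restructuring, in both the C and Verilog versions, that the study timed and compared.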

The overall conversion, which included refactoring the algorithm for parallel execution in both C and Verilog, took about two-thirds as much time in C as in Verilog. Perhaps more importantly, the two researchers found that later design revisions and iterations were much faster in C, with as little as one-seventh the time required to make algorithm modifications when compared to Verilog.

The quick success of this project showed that even first-time users of C-to-FPGA methods can rival the results achieved by hand-coding in HDL, with surprisingly little performance penalty and faster time-to-deployment.

The results of this study have been published as Nikhil’s Master’s Thesis, which is available here: A C-to-FPGA Solution for Accelerating Tomographic Reconstruction.

Filed under News Shmews, Reconshmiguration

Xilinx earns VDC “Best of Show” at Embedded Systems Conference

VDC Press Release: Best of Show for ESC 2009

I like to fantasize that our two Impulse C demonstrations running in the Xilinx booth had something to do with this. Or perhaps it was the video processing workshop featuring C-to-FPGA methods that impressed the distinguished panel of judges.

More likely it was the timing of announcements from Xilinx of their new Virtex-6 and Spartan-6 devices.

Filed under Embed This!, News Shmews