Andy Greenberg at Forbes Magazine has featured FPGA cluster computing in an article titled A Compact Code-Breaking Powerhouse. It’s a good article; check it out.
Altera today announced that its next-generation, 28nm devices will support partial reconfiguration. This means that Altera FPGA users will now be able to update small portions of an FPGA device, rather than having to re-synthesize, re-map and reprogram the entire device every time the application changes.
Xilinx has supported this feature for quite a few years, but has never made it a mainstream, supported feature of their ISE Design Suite development tools. However, in a recent email update to its customers, Xilinx announced with almost no fanfare that, in version 12 of the Xilinx tools, this important capability would finally be given its proper status as a supported feature. Perhaps the marketing department at Xilinx heard rumors coming from the other side of San Jose?
Corporate intrigue aside, this is an important piece of news. Why? Because software application developers first investigating FPGAs are surprised – even shocked – to learn how long the iteration times are for programming and debugging FPGA applications. And the larger the FPGA is, the worse the problem becomes.
For most software developers, being faced with the equivalent of a two-, three-, or even eight-hour iteration time for compile-link-test is completely unacceptable, no matter how much potential increase in performance there may be.
And if, at the end of that long iteration time, all the programmer has is a non-working, difficult-to-debug bitstream and a thousand-line report filled with hardware-esque warning messages? Then forget it. They might as well go use GPUs and CUDA.
So, why has it taken so long for this seemingly obvious feature to become mainstream? One can only assume that Xilinx and Altera management, their chip architects, and perhaps even their tools developers are hardware engineers first, and software engineers second. Perhaps they can only think of their devices as the poor-man’s ASIC. And their tools as a poor-man’s EDA.
Reliable, vendor-supported partial reconfiguration, including dynamic run-time reconfiguration, has the potential to solve so many problems for software application developers, and to broaden the market for FPGAs.
Partial reconfiguration allows developers to iteratively recompile, re-synthesize, re-map and re-test a specific portion of the FPGA in just a few minutes, rather than a few hours. This in itself is a huge benefit. Instead of having to set up elaborate, hardware-oriented simulations to verify correct behavior before hitting that compile button, you can just try it out, again and again, in the same way you would when developing software. Indeed, this method of design was predominant for FPGAs in the 1980s and early 1990s. But as device densities have grown, iterative design-and-test methods have become impractical. That was good for the hardware simulator business, but bad for software developers.
Run-time reconfiguration allows certain parts of the FPGA to remain intact and functioning, for example the I/O processing, while other parts are being updated on-the-fly. Need to change the filtering of a video signal to respond to a change in resolution? Wham, it’s done, and so fast the viewer doesn’t even see a flicker. Want to change the FPGA’s function without causing a system reboot? How about leaving the PCIe endpoint alone and just changing the core algorithm?
Dynamic reconfiguration can increase the effective size of an FPGA. Loading of partial bitstreams at run-time allows the capacity of these devices to grow in the dimension of time. If you don’t need all the hardware capabilities you have programmed into your FPGA all the time, then use dynamic reconfiguration to swap hardware modules in and out as needed. Use reconfiguration to do more, with less.
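To make the "dimension of time" idea concrete, here is a toy Python model of a single reconfigurable region that is smaller than the sum of the modules it hosts. It is only a sketch: the module names, the LUT counts and the `ReconfigurableRegion` class are invented for illustration, and no vendor tool or API is being modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Module:
    name: str
    luts: int  # logic the module needs (made-up numbers)


@dataclass
class ReconfigurableRegion:
    capacity_luts: int
    loaded: Optional[Module] = None
    swaps: int = 0  # number of partial reconfigurations performed

    def run(self, module: Module) -> None:
        """Load the module if it is not already resident, then 'run' it."""
        if module.luts > self.capacity_luts:
            raise ValueError(f"{module.name} does not fit in the region")
        if self.loaded != module:
            self.loaded = module  # stands in for loading a partial bitstream
            self.swaps += 1


# Hypothetical video-processing modules, each small enough for the region.
scaler = Module("scaler", 6000)
noise_filter = Module("noise_filter", 5000)
deinterlacer = Module("deinterlacer", 7000)

region = ReconfigurableRegion(capacity_luts=8000)
for step in [scaler, scaler, noise_filter, deinterlacer]:
    region.run(step)

static_luts = scaler.luts + noise_filter.luts + deinterlacer.luts
print(f"all modules resident at once: {static_luts} LUTs")
print(f"time-shared region: {region.capacity_luts} LUTs, {region.swaps} swaps")
```

The trade is plain in the numbers: the region needs less than half the logic of the all-resident design, and pays for it with a reconfiguration each time the workload changes.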
Partial reconfiguration opens up new FPGA markets. Using partial reconfiguration, the vendors of FPGA-based platforms for specific types of applications – for financial transaction processing and automated trading, for example – could provide a “minimally programmable embedded system” in which most of the FPGA logic is pre-designed, pre-optimized and locked down, while a small portion in the middle is left available for end-user customization. This potentially opens up whole new application domains that were previously not available for FPGAs; applications in which the platform vendor and the end user both have unique domain knowledge, and have their own critical IP to protect.
Again, these capabilities are not really new; the Xilinx partial reconfiguration features have been used successfully, for years, in domains such as software-defined radio. What’s changing is that partial reconfiguration is finally becoming officially supported. With uncertainties about its future and its supportability reduced, software and platform vendors will now begin using these features to build new programming, debugging and operating capabilities into their own products.
It’s about time.
This week I was in San Diego attending the International Plant and Animal Genome Conference. PAG is a conference that brings together academic and commercial researchers and product vendors, with a particular emphasis on agricultural applications. (One of the first things I saw when entering the lobby was a sign directing attendees to a “Sheep and Cattle” workshop. I wondered who would be cleaning the hotel carpets.)
In an earlier post, Please pass the dot plots, I described how Greg Edvenson at Pico Computing used an FPGA cluster and C-to-FPGA methods to demonstrate acceleration of a DNA sequence comparison algorithm. The quick success of that project was reason enough for us to attend PAG and learn more about the computing problems in genomics. Where are acceleration solutions needed?
It’s clear there are problems aplenty to be solved. As one researcher said to us, “The amount of raw data being generated by DNA sequencers each month is outpacing Moore’s Law by a wide margin.” He went on to describe how his group routinely undocks and hand-carries their hard drives down the hall because the time required to move the generated sequencing data across their network is too long. Solutions are needed for accelerating data storage throughput, and for the actual computations to do such things as assemble whole genomes from the small chunks of scrambled DNA that currently emerge from sequencing machines.
Why all the data? The human genome is about 2.91 billion base pairs in length*, and it’s not the longest genome out there, not even close. We have more base pairs than a pufferfish (365 million base pairs) but far fewer than a lungfish (130 billion base pairs).
Evolution is a curious crucible.
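For a rough sense of scale, the arithmetic can be sketched in a few lines of Python. The genome lengths are the ones quoted above; the 2-bits-per-base packing, the 30x coverage figure and the two bytes per sequenced base (one for the base call, one for its quality score) are my own illustrative assumptions, not numbers from any sequencing vendor.

```python
# Genome lengths in base pairs, as quoted in the text.
GENOME_BASE_PAIRS = {
    "pufferfish": 365_000_000,
    "human": 2_910_000_000,
    "lungfish": 130_000_000_000,
}

BITS_PER_BASE = 2          # assumption: A, C, G, T packed into 2 bits each
ASSUMED_COVERAGE = 30      # assumption: each base is sequenced ~30 times
BYTES_PER_RAW_BASE = 2     # assumption: base call plus quality score


def packed_bytes(base_pairs: int) -> int:
    """Size of one finished genome at 2 bits per base."""
    return base_pairs * BITS_PER_BASE // 8


def raw_read_bytes(base_pairs: int) -> int:
    """Rough size of the raw reads under the assumptions above."""
    return base_pairs * ASSUMED_COVERAGE * BYTES_PER_RAW_BASE


for species, bp in GENOME_BASE_PAIRS.items():
    print(f"{species:>10}: {packed_bytes(bp) / 1e9:8.3f} GB packed, "
          f"{raw_read_bytes(bp) / 1e9:9.1f} GB of raw reads")
```

Under those assumptions the finished genome is the small part. It is the pile of redundant raw reads, produced before any assembly happens, that fills the hard drives.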
Sequencing technologies have advanced quickly. Machines and software offered by Illumina, Life Technologies, Roche and others can generate enormous amounts of genetic data. The bottleneck at present is in assembling all that data – like a billion-part jigsaw puzzle thrown to the floor – into a meaningful, searchable DNA sequence. The methods of doing this assembly, using algorithms such as ABySS and Velvet, may require parallelizing the problem across many CPUs, and using large amounts of intermediate memory – potentially terabytes of it.
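ABySS and Velvet are both built around de Bruijn graphs, and the core idea fits in a toy Python example. This is a deliberately tiny sketch, not how either tool is implemented: the reads and the 13-base "genome" are made up, the reads are error-free, and the walk only works when the graph is a single unambiguous path. Real assemblers spend most of their effort (and memory) on exactly the cases this ignores.

```python
from collections import defaultdict


def build_de_bruijn(reads, k):
    """Map each (k-1)-mer to the set of (k-1)-mers that follow it in some read."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return graph


def assemble_unambiguous(graph):
    """Walk the graph from its only source node while the path never branches."""
    targets = {t for successors in graph.values() for t in successors}
    starts = [node for node in graph if node not in targets]
    if len(starts) != 1:
        raise ValueError("expected exactly one start node")
    node = starts[0]
    contig = node
    seen = {node}
    while len(graph.get(node, ())) == 1:
        (node,) = graph[node]
        if node in seen:  # guard against cycles (repeats) in the graph
            raise ValueError("graph contains a cycle")
        seen.add(node)
        contig += node[-1]
    return contig


genome = "ATGGCGTGCAATC"                       # made-up sequence
reads = ["GCGTGCAA", "TGCAATC", "ATGGCGT"]     # overlapping, shuffled fragments
contig = assemble_unambiguous(build_de_bruijn(reads, k=4))
print(contig)
```

Every k-mer in every read becomes an edge in that graph, which is why the intermediate memory balloons once you go from three reads to billions of them.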
If you are a researcher trying to figure out, for example, how to increase crop yields in sub-Saharan Africa, then you might be very interested in knowing how to breed a more pest-resistant and productive variety of barley (5 billion base pairs) or wheat (over 16 billion base pairs).
And if you’re DuPont or Monsanto, you may want to actually create and patent such a grain to have a competitive advantage.
To figure out such things, you may want to perform sequence comparisons of other species that appear to have the characteristics you are interested in, and find the relevant genetic variances. You won’t have a chance of doing this unless you can sequence many varieties and perform detailed analysis of what you see. This takes lots of computing time and bags of money.
And so the genomics industry looks for faster solutions for cracking the codes of life. The solutions involve cluster and cloud computing, GPUs and FPGAs, and perhaps exotic hybrid computing platforms to come.
*A “base pair” is two complementary nucleotides on opposite strands of the DNA double helix, joined by hydrogen bonds. There are four kinds of nucleotides that make up these base pairs: adenine, thymine, guanine and cytosine. In the human genome only a small fraction of these base pairs actually represent genes. It seems our bodies are mostly “junk DNA”, perhaps proving that we are what we eat.
…I call programs that don’t take into consideration legacy systems and that are obscenely difficult to integrate, “pornographic” programs — you can’t always describe them exactly, but you know them when you see them. In 1984, I converted a FORTRAN program from CDC to ANSI FORTRAN to see what they were doing and it was awful. In the contemporary world, CUDA is the new pornographic programming language.
Saving the world, one $#&^! FPGA at a time…
In their literature, Algolith is promoting the concept of One Card, One Price, More Choices. This is fundamentally what reconfigurable hardware is all about, and it makes sense for both the customer and for Algolith.
Why? Because video filtering and other broadcast video applications require hardware solutions, but don’t have static requirements. What’s needed for broadcasting, say, the Super Bowl with 50 or more HD cameras, chaotic real-time action and thousands of opportunities for verbal and visual naughtiness, is probably quite different than the requirements for broadcasting a healthcare town meeting with a few unruly seniors.
How does reconfigurable hardware come in? A vendor such as Algolith can design and produce a hardware-based solution, such as an HD-compatible video card, that has at its core one or more FPGAs. Some of the logic in these FPGAs is fixed and rarely updated, handling those parts of video processing that don’t change. Interfacing with video and network I/O devices, for example. Other parts of the video processing, however, are reconfigurable, allowing new types of clever delay filtering and other video gymnastics to be performed by the customer. Want to fuzz out someone’s unfortunate wardrobe problems with one mouse click, and track the offending [insert noun here] even as it moves around the scene? How about hiding all those annoying non-sponsor brand logos? Hey, somebody’s got to keep the viewers safe.
I don’t know if Algolith offers such high-zoot features in its video products, but these sorts of capabilities are certainly possible to implement in FPGAs today.
From a marketing perspective, the really attractive thing about reconfigurable hardware is the ability for a company like Algolith to grow from a hardware vendor – which is not a particularly scalable business – into a vendor of IP and services that use their reconfigurable hardware product as a platform for value-added options and reconfigurable firmware upgrades. And that’s a way $&@%*%! better business to be in.
Automated trading and near-real-time financial analytics have been hot topics for some years now. Large organizations such as Bank of America deploy massive compute clusters to do such things as calculate the present value of options, or to model credit derivatives, in a virtual arms race to make trades with ever-higher levels of accuracy and ever-lower latencies. The banks and hedge funds that win this race each day have the potential to make millions or billions of dollars in extra profits. Vast amounts of power are consumed to drive the world’s most advanced supercomputers in a constant quest to produce, well, nothing at all… Just information used to move wealth from one global pants-pocket to another.
And all in the pursuit of market efficiency, of course. Hopefully all this money-shuffling is good for my meager retirement portfolio.
Editorializing aside, there has been a lot of buzz about the role of accelerators, including FPGAs, in financial applications. XtremeData this week generated some press regarding their new accelerated database for analytics. Their solution is attractive because it combines an FPGA module with an industry-standard HP ProLiant server to accelerate specific algorithms (in this case SQL queries) by 15X over software-only equivalents.
As an industry, we need more turnkey solutions that highlight the benefits of FPGA acceleration. With enough such applications out there, the demand for programming and hardware platform solutions for other, possibly unrelated applications will increase.
Assuming, of course, all this financial alchemy doesn’t once again turn gold into lead.
DeepChip? What’s that?
DeepChip is the website of John Cooley and home of the perennially unofficial and irreverent ESNUG (E-mail Synopsys Users Group). DeepChip has evolved over the years – almost two decades now – into a highly popular site for discussing design methods and tools of all flavors, with a particular focus on ASIC and, to a lesser extent, FPGA hardware design.
In recent weeks and months there has been a flurry of comments on DeepChip from vendors of C-to-FPGA tools, and from users of those tools, culminating with a long-overdue acknowledgement from John that such tools really are gaining traction. See “I Sense A Tremor In The Force”.
I found the dialogue somewhat heartening; it’s good to see the conservative world of ASIC design finally starting to embrace these tools. But I also found the theme of the whole debate – that these tools are new, untested and exotic – a little amusing. And so, during the week of the Design Automation Conference, here is my open letter to the DeepChip community:
Subject: C-to-FPGA Users: Who Are These People?
Here are some perspectives on the recent C-to-hardware debates….
Are there actual users of this stuff? Absolutely. The real question is, who are these people?
C-to-hardware tools are not yet common among traditional hardware designers. Yes, there are successful ASIC tapeouts and some scattered successes in, for example, consumer video processing on FPGAs. We are hearing more of these successes every year. But for the vast majority of hardware designers, RTL methods using VHDL and Verilog are still the preferred route. Particularly in a downturn, what project leader wants to risk their career on a new design method?
The move to higher level methods of design will happen; it’s just a matter of time, and of getting a critical mass of success stories with clearly stated benefits. We’ve seen this before, by the way… VHDL and Verilog did not take over from schematics overnight, and that move was less of a leap of abstraction than the current push from RTL into ESL.
So where is the action in C-to-hardware?
It’s on the software side of the world. It’s in embedded systems for defense and aerospace, it’s in reconfigurable computing research groups. It’s in financial computing and life sciences. It’s in places that do not have significant hardware development expertise. It’s in places where DeepChip.com is not widely read, and where “EDA” and “ESL” have little or no meaning.
I can state emphatically that C-to-FPGA tools really do work. Impulse C, for example, has users worldwide who are applying their C programming skills to create hardware coprocessors for embedded systems-on-FPGA, or to move processing intensive algorithms into dedicated FPGA logic, using the newest FPGA-accelerated computing platforms.
Are these tools perfect? By no means. Any user of Impulse C would report similar frustrations – but also the productivity benefits – that we’ve seen regarding other C-to-hardware tools. All of these tools have their peculiarities, and all require a certain amount of C-language refactoring in order to achieve acceptable performance. All of these tools require “best practices” training. However, I believe all of our tools have now matured to the point where that level of refactoring can be performed by a skilled software programmer, with little or no prior knowledge of RTL.
To summarize… I believe we are nearing a point at which traditional hardware engineers will begin moving en masse to higher-level tools, including C-to-hardware. There will finally be a payoff for the ESL vendors that have been pushing these technologies forward, and a bigger payoff in productivity for the development teams that take the leap and use ESL for complex systems. But I also believe the bigger, unreported story is that a new generation of FPGA programmers is emerging, blurring the distinction between hardware and software for embedded and high performance computing systems.
David Pellerin, CEO
Impulse Accelerated Technologies