For the past couple of years, the big data debates have largely involved discussions of Hadoop versus Spark, Spark versus Storm, and MapReduce versus, well, anything but MapReduce. Now a whole new bird has entered the ring: Heron. Heron was born and bred in the inner sanctum of Twitter, and last year it completely replaced Storm for production work there. Yet it remained secluded in the deep, dark bowels of the company until just a few weeks ago, when Twitter released it under an open source license. Now that it’s available to the masses, expect a new debate to erupt: do we stick with Storm or migrate to Heron? What are the advantages, disadvantages, and other considerations? Relax, here are your answers.
Heron Was Born to Solve Some of Twitter’s Storm Problems
Twitter’s Storm deployment was one of the largest deployments of the distributed real-time computation system, and one of the most challenging. Twitter had problems scaling Storm at this level, primarily due to the static way Storm allocates resources. Most organizations would simply have migrated to Spark or Flink, but that would have meant rewriting all of Twitter’s existing code. For the single longest-running user of Storm, that would have been quite an undertaking. Twitter’s engineers reckoned it would be just as easy to develop a brand new stream processing framework with a Storm-compatible API, so that they could continue using their existing code.
The Benefits of Heron
From an operational standpoint, Heron runs on top of Mesos (another Apache open source project). Heron allowed Twitter to significantly reduce the hardware resources dedicated to its topologies while increasing throughput and reducing processing latency. One of the most significant differences with Heron, however, is its mix of languages: topologies are written in either Java or Scala, and the web-based components of the user interface are written in Python, but the code that manages the topologies and network communications is written in C++.
Another of Heron’s selling points is its stability. For over a year it has been handling all of Twitter’s processing requirements, which are almost certainly many times greater than yours, so it has a proven track record for reliability. Other top tech companies and businesses, including numerous Fortune 500 companies, have also put Heron to the test. This means it’s almost certain that Heron will be around for a while, so any investments you make in migration will likely be worthwhile.
The Drawbacks of Heron
Though the increase in throughput and reduction in latency are attractive selling points, Heron is dependent on Mesos. So, unless you already have a Mesos infrastructure in place, you will need to establish one before you can leverage the advantages of Heron. This, unfortunately, is not such an easy task. Additionally, if you are currently using Storm’s DRPC features, you won’t be able to get as much out of those when using Heron.
Heron Versus Storm: You Decide
The bottom line? Unless you have the demands of Twitter or a Fortune 500 company, it probably isn’t worth the time, money, and not insubstantial effort it would take to abandon Storm for Heron. If you’ve got Storm, and it’s doing the job, stick with it. Storm’s developers are already working on some appealing new features to whet your appetite for improvements.