With Apache Hadoop YARN as its architectural center, Apache Hadoop continues to attract new engines to run within the data platform, as organizations want to efficiently store their data in a single repository and interact with it for batch, interactive and real-time streaming use cases. Apache Storm brings real-time data processing capabilities to help capture new business opportunities by powering low-latency dashboards, security alerts, and operational enhancements integrated with other applications running in the Hadoop cluster.
The community recently announced the release of Apache Storm 0.9.3. With this release, the team closed 100 JIRA tickets and delivered many new features, fixes and enhancements, including three important improvements: HDFS integration, HBase integration, and the ability to write data to Kafka.
This post gives a brief overview of these new features in Apache Storm 0.9.3 and looks ahead to future plans for the project.
Apache Storm’s HDFS integration consists of several bolt and Trident state implementations that allow topology developers to easily write data to HDFS from any Storm topology. Many stream processing use cases involve storing data in HDFS for later batch processing and analysis of historical trends.
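To make the write path a little more concrete, here is a minimal sketch of one step such a bolt has to perform: turning a tuple's values into a text record before it is appended to a file. It assumes records are stored as delimited text lines, which the release notes do not specify. The class `DelimitedRecordSketch` is hypothetical and uses only the JDK; it is not the storm-hdfs API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.StringJoiner;

// Hypothetical helper, not part of the storm-hdfs API: it only illustrates
// turning one tuple's values into a single delimited text record.
final class DelimitedRecordSketch {
    private final String fieldDelimiter;
    private final String recordDelimiter;

    DelimitedRecordSketch(String fieldDelimiter, String recordDelimiter) {
        this.fieldDelimiter = fieldDelimiter;
        this.recordDelimiter = recordDelimiter;
    }

    // Joins the tuple values with the field delimiter and terminates the record.
    String format(List<?> tupleValues) {
        StringJoiner joiner = new StringJoiner(fieldDelimiter, "", recordDelimiter);
        for (Object value : tupleValues) {
            joiner.add(String.valueOf(value));
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        DelimitedRecordSketch format = new DelimitedRecordSketch("|", "\n");
        // Prints: user-42|login|1417000000
        System.out.print(format.format(Arrays.asList("user-42", "login", 1417000000L)));
    }
}
```

Keeping the record format separate from the code that writes files means the same topology could switch delimiters, or move to a different encoding, without touching the rest of the bolt.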
Apache Storm’s HBase integration includes a number of components that allow Storm topologies to both write to and query HBase in real-time.
Many organizations use Apache HBase as part of their big data strategy for batch, interactive, and real-time workflows. Storm’s HBase integration allows users to leverage HBase data assets for streaming queries, and also use HBase as a destination for streaming computation results.
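The two directions described above, writing results to HBase and querying it from a stream, can be sketched roughly as follows. This is an illustration only: an in-memory map stands in for an HBase table, and the class `HBaseMappingSketch`, its field names, and the `family:qualifier` column naming are assumptions made for the example rather than the storm-hbase API.

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch, not the storm-hbase API. An in-memory map stands in
// for an HBase table: row key -> ("family:qualifier" -> value).
final class HBaseMappingSketch {
    private final Map<String, Map<String, String>> table = new TreeMap<>();
    private final String rowKeyField;
    private final String columnFamily;
    private final List<String> columnFields;

    HBaseMappingSketch(String rowKeyField, String columnFamily, List<String> columnFields) {
        this.rowKeyField = rowKeyField;
        this.columnFamily = columnFamily;
        this.columnFields = columnFields;
    }

    // Write path: one tuple field becomes the row key, the others become columns.
    void write(Map<String, String> tuple) {
        String rowKey = tuple.get(rowKeyField);
        if (rowKey == null) {
            throw new IllegalArgumentException("tuple has no row key field: " + rowKeyField);
        }
        Map<String, String> row = table.computeIfAbsent(rowKey, k -> new TreeMap<>());
        for (String field : columnFields) {
            String value = tuple.get(field);
            if (value != null) {
                row.put(columnFamily + ":" + field, value);
            }
        }
    }

    // Query path: a streaming lookup by row key; returns an empty map if the row is absent.
    Map<String, String> lookup(String rowKey) {
        Map<String, String> row = table.get(rowKey);
        return row == null
                ? Collections.<String, String>emptyMap()
                : Collections.unmodifiableMap(row);
    }
}
```

For example, writing a tuple with `word=storm` and `count=3` through a mapper configured with row key field `word` and column family `cf` would make `lookup("storm")` return a single column, `cf:count`, with the value `3`.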
Apache Storm has supported Kafka as a streaming data source since version 0.9.2-incubating. Now Storm 0.9.3 brings a number of improvements to the Kafka integration and also adds the ability to write data to one or more Kafka clusters and topics.
The ability to both read from and write to Kafka unlocks additional potential in the already powerful combination of Storm and Kafka. Storm users can now use Kafka as both a source of and a destination for streaming data. This allows topologies to communicate with one another, so that spout- and bolt-based topologies can be combined with Trident-based data flows. It also enables integration with any external system that supports data ingest from Kafka.
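The inter-topology hand-off can be pictured with a small sketch in which an in-memory queue stands in for a Kafka topic. The class `TopicHandoffSketch` and its methods are hypothetical and use only the JDK. A real deployment would use Storm's Kafka bolt and spout against an actual broker, and would get durability and partitioning that this sketch does not model.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

// Hypothetical sketch: an in-memory queue stands in for a Kafka topic that
// sits between an upstream topology and a downstream one.
final class TopicHandoffSketch {
    private final Queue<Map.Entry<String, String>> topic = new ConcurrentLinkedQueue<>();

    // Role of the upstream topology's final bolt: publish a key/message pair.
    void publish(String key, String message) {
        topic.add(new SimpleImmutableEntry<>(key, message));
    }

    // Role of the downstream topology's spout: take the next pair, or null if none is waiting.
    Map.Entry<String, String> poll() {
        return topic.poll();
    }
}
```

Because the two sides share only the topic, the upstream and downstream topologies can be deployed, scaled, and restarted independently of each other.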
In upcoming releases of Apache Storm, the community will be focusing on enhanced security, high availability, and deeper integration with YARN.
The Apache Storm PMC would like to thank the community of volunteers who made the many new features and fixes in this release a reality.