The Hortonworks Blog

Posts categorized by: Data Analyst & Scientist

Last week, Apache Tez graduated to become a top-level project within the Apache Software Foundation (ASF). This is a major step forward for the project and reflects the momentum built by a broad community of developers, not only from Hortonworks but also from Cloudera, Facebook, LinkedIn, Microsoft, NASA JPL, Twitter, and Yahoo.

What is Apache Tez and why is it useful?

Apache™ Tez is an extensible framework for building YARN-based, high-performance batch and interactive data processing applications in Hadoop that need to handle TB- to PB-scale datasets.…

The Apache Pig community released Pig 0.13 earlier this month. Pig uses a simple scripting language to perform complex transformations on data stored in Apache Hadoop. The Pig community has been working diligently to prepare Pig to take advantage of the DAG processing capabilities in Apache Tez. We also improved usability and performance.

This blog post summarizes the progress we’ve made.

Support for Backends Other Than MapReduce

We made the Pig 0.13 architecture more general to support multiple backends beyond just MapReduce, while maintaining backward compatibility.…

As part of our YARN Ready program, we are hosting a series of technical webinars highlighting the technologies and resources available to developers for creating YARN applications. In our first webinar, “Introduction to YARN Ready,” we presented an overview of the YARN Ready program.

To extend your technical knowledge, please join us for our first in-depth YARN Ready technology webinar, “Integrating Applications Natively to YARN,” on Thursday, July 24, at 9 am Pacific Time.…

Incremental Updates

Hadoop and Hive are quickly evolving to outgrow previous limitations for integration and data access.
On the near-term development roadmap, we expect to see Hive supporting full CRUD operations (Insert, Select, Update, Delete). As we wait for these advancements, there is still a need to work with the current options (OVERWRITE or APPEND) for Hive table integration.

The OVERWRITE option requires moving the complete record set from source to Hadoop.…

Tresata, a Hortonworks Certified Technology Partner, is a next-generation predictive analytics software company that helps enterprises monetize the big data they have moved to Hadoop. In this blog, Tresata’s Director of Marketing, Katie Levans (@katie_levans), describes the value of HDP 2.1 certification and the benefit of their solution.

Last month Tresata announced the release of the third generation of their hugely successful software application TREE 3.3 and its subsequent certification on HDP 2.1.…

Today we are delighted to announce the formal partnership between Accenture and Hortonworks, the next step in a collaboration between the two companies that started in 2012. With this formal agreement, Accenture and Hortonworks will collaborate on making large structured and unstructured datasets – including operational, video and sensor data – more accessible to organizations for insight-driven decision-making. Together, the two companies will continue to work on joint horizontal and vertical solutions to speed the adoption of Apache Hadoop.…

Apache Cassandra is an open source NoSQL distributed database management system designed to handle large amounts of data. It offers a scalable, real-time solution that allows users to create online applications that are “always-on, no matter what.” DataStax is the company behind Cassandra, and a new Technology Partner of Hortonworks.

Lynn Walitch leads Partner Management for DataStax and is our guest blogger today. Lynn discusses the importance of the partnership and certification with Hortonworks.…

The Apache Storm community recently announced the release of Apache Storm 0.9.2, which includes improvements to Storm’s user interface and an overhaul of its Netty-based transport.

We thank all who have contributed to Storm – whether through direct code contributions, documentation, bug reports, or helping other users on the mailing lists. Together, we resolved 112 JIRA issues.

Here are summaries of this version’s important fixes and improvements.

New Feature Highlights
Netty Transport Overhaul

Storm’s Netty-based transport has been overhauled to significantly improve performance through better utilization of thread, CPU, and network resources, particularly in cases where message sizes are small.…

We recently hosted the sixth of our seven Discover HDP 2.1 webinars, entitled Apache Storm for Stream Data Processing in Hadoop. Over 200 people attended the webinar and joined in the conversation.

Thanks to our presenters: Justin Sears (Hortonworks’ Product Marketing Manager), Himanshu Bari (Hortonworks’ Senior Product Manager for Storm), and Taylor Goetz (Hortonworks’ Software Engineer and Apache Storm Committer). The speakers covered:

  • Why use Apache Storm?

Big Data In Healthcare

Electronic data is the heartbeat in a healthcare provider’s office. ZirMed is a Hortonworks customer and a leading provider of healthcare information management solutions. Healthcare providers, including physicians, hospitals and large health systems, use the company’s cloud-based revenue cycle management offerings to manage the complex process of billing and collecting revenue from patients and payers.

ZirMed’s Analytics solution aggregates healthcare data and makes it available to its customers, so they get a clearer view of their financial and operational performance.…

We recently hosted the fifth of our seven Discover HDP 2.1 webinars, entitled Apache Solr for Hadoop Search. Over 200 people attended the webinar, which prompted an informative discussion.

The speakers gave an overview of Apache Solr and its features, followed by a practical demo of how to process, index, search, and visualize server log data.

Thanks to our presenters: Justin Sears (Hortonworks’ Product Marketing Manager), Rohit Bakhshi (Hortonworks’ Senior Product Manager), and Paul Codding (Hortonworks’ Solution Engineer).…

Data Analytics Virtual Event

Hortonworks and Teradata have partnered to provide a clear path to Big Data Analytics via stable and reliable Hadoop for the enterprise. We are excited to support their upcoming Big Data Analytics virtual event, “Data Discovery in Action.” We will have experts standing by to answer questions and help ensure you have the right strategy in place for all of your big data.

At this event on July 2nd, you will learn more about how Teradata’s Unified Big Data Architecture™ provides a quick path to data discovery.…

We’re finally catching our breath after a phenomenal Hadoop Summit event last week in San Jose. Thank you to everyone who came to participate in the celebration of Hadoop advances and adoption: the many organizations that shared how their Hadoop journey fundamentally transformed their businesses, those just getting started, and the huge ecosystem of vendors. It is amazing to be part of such a broad and deep community that is contributing to making the market for everyone.…

Apache YARN, Apache Slider, and Docker

Join us June 19 at 6 pm at the Hilton Fort Worth, Texas, for an educational workshop hosted by Hortonworks and Sendero Business Services. The topic is “The Key To Success is Consistently Making Good Decisions & The Key To Good Decisions is Good Information.” The speaker is Don Hilborn, Solutions Engineer at Hortonworks.

Don will introduce the paradigm of

  • Efficiency – double processing in Hadoop on the same hardware while providing predictable performance and quality of service; and
  • Resource sharing – providing a stable common set of shared resources across multiple, coordinated workloads in Hadoop.

Informatica is a Hortonworks Certified Technology Partner. This partnership makes it possible for organizations to use all the data internal and external to an enterprise to achieve the full predictive power that drives the success of modern data-driven businesses. 

That is why we’re excited to have John Haddad, Senior Director at Informatica, as our guest blogger. In this blog, John explores the benefits of certification on HDP 2.1.

When I was in high school, one of my best friends had a water ski boat we often took out on California lakes (what are friends for?).…
