The Hortonworks Blog

Posts categorized by: Big Data

Today we are delighted to announce a formal partnership between Accenture and Hortonworks, continuing the collaboration between the two companies that started in 2012. With this formal agreement, Accenture and Hortonworks will collaborate on making large structured and unstructured datasets – including operational, video and sensor data – more accessible to organizations for insight-driven decision-making. Together, the two companies will continue to develop joint horizontal and vertical solutions to speed the adoption of Apache Hadoop.…

Apache Cassandra is an open source NoSQL distributed database management system designed to handle large amounts of data. It offers a scalable, real-time solution that allows users to create online applications that are “always-on, no matter what.” DataStax is the company behind Cassandra, and a new Technology Partner of Hortonworks.

Lynn Walitch leads Partner Management for DataStax and is our guest blogger today. Lynn discusses the importance of the partnership and certification with Hortonworks.…

Oscar Padilla, Vice President of Strategy at Luminar, is our guest blogger. He shares his thoughts and insights about Apache Hadoop, Hortonworks Data Platform, and Luminar’s journey to the Data Lake.

Luminar is the first big data analytics provider focused specifically on U.S. Latino consumers. Our company offers analysis based on empirical insights, rather than on a sample-based approach. Apache Hadoop and Hortonworks Data Platform (HDP) make this empirical approach work at scale.…

Data Analytics Virtual Event

Hortonworks and Teradata have partnered to provide a clear path to Big Data Analytics via stable and reliable Hadoop for the enterprise. We are excited to support their upcoming Big Data Analytics virtual event, “Data Discovery in Action.” We will have experts standing by to answer questions and help ensure you have the right strategy in place for all of your big data.

At this event on July 2nd, you will learn more about how Teradata’s Unified Big Data Architecture™ provides a quick path to data discovery.…

We recently hosted the fourth of our seven Discover HDP 2.1 webinars, entitled Apache Hadoop 2.4.0, HDFS and YARN. It was well attended and informative. The speakers outlined the new features in YARN and HDFS in HDP 2.1, including:

  • HDFS Extended ACLs
  • HTTPS support for WebHDFS and for the Hadoop web UIs
  • HDFS Coordinated DataNode Caching
  • YARN Resource Manager High Availability
  • Application Monitoring through the YARN Timeline Server
  • Capacity Scheduler Preemption

Many thanks to our presenters, Rohit Bakhshi (Hortonworks’ senior product manager), Vinod Kumar Vavilapalli (co-author of the YARN Book, PMC, Hadoop YARN Project Lead at Apache and Hortonworks), and Justin Sears (Hortonworks’ Product Marketing Manager).…

Trifacta is a Hortonworks Technology Partner and a pioneer in data transformation, recently certified with HDP 2.1. Here, Trifacta’s CTO and Co-founder Sean Kandel talks about their Predictive Interaction™ solution with Hortonworks Data Platform.

“I spend more than half my time integrating, cleansing and transforming data without doing any actual analysis. Most of the time I’m lucky if I get to do any analysis.” – Data Scientist [1]

The most commonly reported use of Hadoop today is data transformation. …

Customers want to make more rapid, data-driven decisions but historically this has been challenging in the era of Big Data. Predictive analytics, machine learning and statistical algorithms are at the leading edge of where enterprises can unlock the value hidden in their data to deliver timely insights for intelligent decisions.

Zementis is a new Hortonworks Technology Partner offering a standards-based predictive analytics scoring engine for Hortonworks Data Platform (HDP) and existing data repositories as part of the Modern Data Architecture (MDA).…

In this blog, Paul Phillips, EMEA Sales Director at Hortonworks, discusses the importance of extending big data science courses to PhD students and scientists. This joint venture with KPMG provides an opportunity to “bring excellent basic skills that are useful in data science and this programme aims to commercialize these skills and ease the path to a data science profession.”

At Hortonworks, we encourage our team members to innovate, and as the Open Source community grows, it is also vital that we play our part to ensure the community is continually reinvigorated with new ideas and innovation. …

On Wednesday, May 21, Himanshu Bari (Hortonworks’ senior product manager), Venkatesh Seetharam (committer to Apache Falcon), and Justin Sears (Hortonworks’ Product Marketing Manager) hosted the third of our seven Discover HDP 2.1 webinars. Himanshu and Venkatesh discussed data governance in Hadoop through Apache Falcon, which is included in HDP 2.1. As most of you know, ingesting data into Hadoop is one thing; having data governed, by dictating and defining data-pipeline policies, is another thing, and a necessity in the enterprise.…

According to the New York Observer, there were a couple of major social reasons that spurred the genesis and growth of Meetup.com. The first was Robert Putnam’s book Bowling Alone, in which he talks about the collapse of communities in America. The second was an event that changed not only the world but New York itself: the aftermath of September 11, when strangers cared about greeting, meeting, and talking.…

For years, experts in the healthcare industry have been searching for ways to detect (and possibly cure) Alzheimer’s disease, the most common form of dementia. Current estimates indicate that 35.6 million people are living with dementia, projected to jump to 135 million by 2050, according to the Global CEO Initiative on Alzheimer’s Disease. At a projected cost of over $600 billion each year, it’s a looming global health and fiscal crisis.…

Fino Consulting is a new Consulting and Systems Integration Partner of Hortonworks serving Fortune 1000 companies with winning business solutions through data science. Fino is an early mover in cloud computing, challenging clients to “Re-think what they know about cloud-computing” to build high-performance sustainable applications and stretch the boundaries of enterprise data. Fino uses HDInsight from Microsoft for client solutions because of its versatile, cloud-based data platform that manages data of any type, while leveraging all the features and functionality of Microsoft’s resources.…

As enterprises build new applications with the data they cost-effectively capture and process with Apache Hadoop, it is important for the platform to facilitate the application development process. That’s why we are excited to announce that we’ve expanded our partnership with Concurrent, Inc. to simplify and accelerate application development on Hadoop.

There are two components to this expanded partnership.

In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is very popular with developers as it lets one mirror the production environment in a VM while staying with all the IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.…

Microsoft and Hortonworks have been working together for over two years now with the goal of bringing the power of Big Data to a billion people. As a result of that work, today we announced the General Availability of HDP 2.0 for Windows with the full power of YARN.

There are already over half a billion Excel users on this planet.

So, we have put together a short tutorial on the Hortonworks Sandbox where we walk through the end-to-end data pipeline using HDP and Microsoft Excel in the shoes of a data analyst at a financial services firm where she:

  • Cleans and aggregates 10 years of raw stock tick data from NYSE
  • Enriches the data model by looking up additional attributes from Wikipedia
  • Creates an interactive visualization on the model

You can find the tutorial here.…
