The Hortonworks Blog

Posts categorized by: Apache Hadoop

HDP 1.3 release delivers on community-driven innovation in Hadoop with SQL-IN-Hadoop, and continued ease of enterprise integration and business continuity features.

Almost one year ago (50 weeks to be exact) we released Hortonworks Data Platform 1.0, the first 100% open source Hadoop platform, into the marketplace.  The past year has been dynamic to say the least!  However, one thing has remained constant: the steady, predictable cadence of HDP releases.  In September 2012 we released 1.1, this February we released 1.2, and today we’re delighted to release HDP 1.3.…

One of the goals of the Hortonworks Sandbox is to showcase end-to-end use cases for Hadoop. In the most current release of Hadoop tutorials, you’ll find two use cases highlighted, both built around clickstream data. There are 6 new tutorials for you to walk through: Tutorials 6-11.

(Update: if your version of Sandbox does not have “Enable Ambari” on the introductory page, you will need to download the latest version of the Sandbox in order to have access to these tutorials.)

Clickstream Analysis – Website User Behavior

Tutorials 6-10 are extensive, step-by-step lessons that walk you through connecting the Sandbox to Excel 2013 via the Hortonworks ODBC driver to access and analyze semi-structured data (like Omniture logs).…

We are excited to release the Hortonworks Data Platform 1.1 for Windows as a Generally Available product. In this blog post, I’m going to outline how to get started with HDP 1.1 for Windows.

With HDP for Windows, you can deploy Apache Hadoop and the HDP stack of components natively on a Windows Server cluster. The HDP for Windows download includes an MSI and remote installation scripts. With these artifacts, you can set up a multi-node Hadoop cluster in either a Workgroup or Active Directory Domain networking configuration.…

Smartphones have transformed our daily lives. A key indicator of this trend is our increased spend on data plans versus voice. We are a new generation of people who are in a constant state of activity, communication, and community building wherever we go ─ including the couch in front of the television where we can multi-screen and multi-task!

What does this mean for the Mobile Telecom industry?  For one of the top five mobile phone service providers in the world, responsible for developing and managing advanced data services for European countries with data services including mobile internet access for various devices, mobile email, instant messaging, news, weather updates and traffic reports ─ it means as mobile data services grow in revenue, so does the need to monitor that contribution easily and accurately.…

Today we announced a strategic alliance with operational intelligence leader Splunk. We are excited to be strengthening our relationship with Splunk and expanding the Apache Hadoop ecosystem, and we expect this to further drive open source innovation. Additionally, this alliance is further proof of Hadoop’s maturation as a key component of the next generation enterprise architecture.

One of the key benefits of the partnership is that it enables organizations to take advantage of the massive scale-out storage and processing capabilities of Apache Hadoop with Splunk Enterprise via Splunk Hadoop Connect, which easily and reliably moves data between Splunk Enterprise and Hadoop.…

Today we are very excited to announce that Hortonworks Data Platform for Windows (HDP for Windows) is now generally available and ready to support the most demanding production workloads.

We have been blown away by the number and size of organizations that have downloaded the beta bits of this 100% open source, native-to-Windows distribution of Hadoop and engaged Hortonworks and Microsoft around evolving their data architecture to respond to the challenges of enterprise big data.…

The release of Hive 0.11 is exciting and represents a big step forward in the delivery of Project Stinger and SQL-IN-Hadoop.  There is still some work to be done, however.  We look forward to the delivery of Hadoop 2 with YARN and the Apache Tez project, which should bring huge increases in Hive performance, but performance is not the only goal of Stinger.

SQL-In-Hadoop simply can’t be SQL without SQL compatibility

Today, HiveQL provides a fairly good set of SQL data types and semantics and while this (or a subset thereof) may be good enough for some of the “on” Hadoop solutions, we feel there needs to be more, especially if Hadoop and Hive are to meet the stringent requirements of enterprise class business analytics.…

Or as it’s more commonly being called: Week-ish in Review. Let’s recap on the latest – there’s some juicy technology goodness here.

Delivering on Stinger: Phase 1. Just this week, Hive 0.11 has been released. Owen (@owen_omalley) brought us the news that 55 – yes, fifty-five – developers from across the community have addressed 386 JIRA tickets and have delivered significant improvements to Hive along with an awesome demonstration of the power of community open-source development.…

In February, we announced the Stinger Initiative, which outlined an approach to bring interactive SQL query to Hadoop.  Simply put, our choice was to double down on Hive and extend it so that it could address human-time use cases (i.e., queries in the 5-30 second range). So, with input and participation from the broader community, we established a fairly audacious goal of 100X performance improvement and SQL compatibility.

Introducing Apache Hive 0.11 – 386 JIRA tickets closed

As representatives of this open, community-led effort, we are very proud to announce the first release of the new and improved Apache Hive, version 0.11. …

Retailers today are faced with addressing the new behaviors of an evolving customer base by leveraging the changing landscape and its new dynamics.  Retail consumers online are sharing, friend validating, researching, learning and developing a point of view ─ offline they are touching, brand comparing and brand associating.  Retailers now more than ever before have to think in terms of “integrated commerce” and leverage Big Data for big results in the marketplace.…

Apache Hadoop 2.0 continues to make its way through the open source community process at the Apache Software Foundation and is getting closer to being declared “ready” from a community development perspective.  Once ready, our team at Hortonworks will apply our usual enterprise rigor in providing a tested and integrated distribution that includes Hadoop 2.0 along with the other enterprise-focused services our customers and partners require.

In my roles both at Hortonworks and in the open-source Apache Hadoop community, I’m asked a lot of questions regarding the key aspects and motivations behind Hadoop 2.0.…

Microsoft has begun to treat its developer community to a number of Hadoop-y releases related to its HDInsight (Hadoop in the cloud) service, and it’s worth rounding up the material. It’s all Alpha and Preview so YMMV but looks like fun:

  • Microsoft .NET SDK for Hadoop. This kit provides .NET API access to aspects of HDInsight including HDFS, HCatalog, Oozie and Ambari, and also some PowerShell scripts for cluster management. There are also libraries for MapReduce and LINQ to Hive.

We are excited that another critical Enterprise Hadoop integration requirement – NFS Gateway access to HDFS – is making progress through the main Apache Hadoop trunk.  This effort is architected and designed by Brandon Li and Suresh Srinivas, and is being delivered by the community. You can track progress in Apache JIRA HDFS-4750.

With NFS access to HDFS, you can mount the HDFS cluster as a volume on client machines and have native command line, scripts or file explorer UI to view HDFS files and load data into HDFS.  …
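As a rough illustration of what "native" access could look like from a script: once HDFS is mounted over NFS, ordinary file operations should be enough to load and list data. The sketch below is a minimal example under stated assumptions, not the gateway's own tooling. The mount point (for example /mnt/hdfs), the directory names, and both helper functions are hypothetical and do not come from the post.

```python
import shutil
from pathlib import Path
from typing import List


def load_into_hdfs(local_file: str, mount_point: str, hdfs_dir: str) -> Path:
    """Copy a local file into a directory under the (assumed) NFS mount of HDFS.

    mount_point is a hypothetical path such as /mnt/hdfs; hdfs_dir is the
    HDFS directory as seen from the root of that mount.
    """
    target_dir = Path(mount_point) / hdfs_dir.lstrip("/")
    target_dir.mkdir(parents=True, exist_ok=True)
    destination = target_dir / Path(local_file).name
    # copyfile copies contents only; it does not try to copy permission bits.
    shutil.copyfile(local_file, destination)
    return destination


def list_hdfs_files(mount_point: str, hdfs_dir: str) -> List[str]:
    """Return the sorted names of regular files in a directory under the mount."""
    target_dir = Path(mount_point) / hdfs_dir.lstrip("/")
    return sorted(p.name for p in target_dir.iterdir() if p.is_file())
```

With a real mount in place, a call such as load_into_hdfs("clicks.log", "/mnt/hdfs", "/data/raw") would be the scripted equivalent of dragging a file into the mounted volume in a file explorer.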

The following post is from Nicolas Liochon and Devaraj Das with thanks to all members of the HBase team.

HBase is an always-available service and remains available in the face of machine failures and rack failures. Machines in the cluster run RegionServer daemons. When a RegionServer crashes or the machine goes offline, the regions it was hosting go offline as well. The focus of the MTTR work in HBase is to detect abnormalities and restore access to (failed) offlined regions as early as possible.…

And we are just about done with this week. But not quite – dig into the conversation from the past few days.

Hadoop Summit. We published the vast majority of sessions (70 so far) for the Hadoop Summit in San Jose, 26-27 June. The sessions stretch across 7 tracks from Architecture to Economics and we hope you can join us for THE Hadoop community event of the year. You can register here, and the schedule is here.…
