
Viewing posts by: Wei Wang

We’re cooking up some new tutorials for you to play with in your Hortonworks Sandbox to help you learn more about the Hortonworks Data Platform, Apache Hadoop, Hive, Pig and HCatalog, with maybe a smattering of Mahout in there as well. More about Sandbox » While you’re anxiously awaiting, we thought we’d give you some […]

More of a 2 weeks in review this time around owing to the Easter break. So what’s been happening? Falcon bringing Data Lifecycle Management for Hadoop. The big news this week was the newly approved Apache Software Foundation incubator project – Falcon. The project was initiated by the team at InMobi and engineers from Hortonworks towers with the […]

Big Data Defined

‘Big Data’ has become a hot buzzword, but a poorly defined one. Here we will define it. Wikipedia defines Big Data in terms of the problems posed by the awkwardness of legacy tools in supporting massive datasets: In information technology, big data[1][2] is a collection of data sets so large and complex that it becomes […]

And the voting is over and the results are in for the Community Choice program of the Hadoop Summit San Jose 2013. With over 300 sessions, and around 6000 users casting more than 15000 votes there was a lot of excitement to participate and influence the results – thanks to everyone for your contribution. At the end of the process, the selectees are: Application […]

We want to take a moment to thank everyone who attended the Hadoop Summit in Amsterdam – THANK YOU! With nearly 500 people registered for the event we think we can safely say it was a big success. We’ve had overwhelming support to do it again next year – so watch this space. The awesome Beurs Van Berlage venue […]

There have been many Apache Hadoop-related announcements the past few weeks, making it difficult to separate the signal from the marketing noise. One thing is crystal clear however… there is a large and growing appetite for Enterprise Hadoop because it helps unlock new insights and business opportunities in a way that was not previously technologically […]

  In Derrick Harris’ article on GigaOM entitled “EMC to Hadoop competition: See ya, wouldn’t wanna be ya.”, EMC unveiled their new Pivotal HD offering which effectively re-architects the Greenplum analytic database so it sits on top of the Hadoop Distributed File System (HDFS). Scott Yara, Greenplum cofounder, is excited about the new product. Since […]

Last week, the HBase community released 0.94.5, which is the most stable release of HBase so far. The release includes 76 JIRA issues resolved, with 61 bug fixes, 8 improvements, and 2 new features. Most of the bug fixes went against the REST server, replication, region assignment, secure client, flaky unit tests, 0.92 compatibility and […]

YARN is part of the next generation Hadoop cluster compute environment. It creates a generic and flexible resource management framework to administer the compute resources in a Hadoop cluster. The YARN application framework allows multiple applications to negotiate resources for themselves and perform their application specific computations on a shared cluster. Thus, resource allocation lies […]

Last week, we outlined our approach for delivering an enterprise-viable Apache Hadoop distribution in the open. Simply put: we believe the fastest way to innovate is to do our work within the open source community, introduce enterprise feature requirements into that public domain, and to work diligently to progress existing open source projects […]

  MapReduce has served us well.  For years it has been THE processing engine for Hadoop and has been the backbone upon which a huge amount of value has been created.  While it is here to stay, new paradigms are also needed in order to enable Hadoop to serve an even greater number of usage […]

UPDATE: Since this article was posted, the Stinger initiative has continued to drive to the goal of 100x Faster Hive. You can read the latest information at https://hortonworks.com/stinger Introduced by Facebook in 2007, Apache Hive and its HiveQL interface have become the de facto SQL interface for Hadoop. Today, companies of all types and sizes […]

  Back in the day, in order to secure a Hadoop cluster all you needed was a firewall that restricted network access to only authorized users. This eventually evolved into a more robust security layer in Hadoop… a layer that could augment firewall access with strong authentication. Enter Kerberos.  Around 2008, Owen O’Malley and a […]

As the Release Manager for hadoop-2.x, I’m very pleased to announce the next major milestone for the Apache Hadoop community, the release of hadoop-2.0.3-alpha! 2.0 Enhancements in this Alpha Release: This release delivers major enhancements and improved stability over previous releases in the hadoop-2.x series. Notably, it includes: QJM for HDFS HA for NameNode (HDFS-3077) […]

  At Hortonworks, our strategy is founded on the unwavering belief in the power of community driven open source software. In the spirit of openness, we think it’s important to share our perspectives around the broader context of how Apache Hadoop and Hortonworks came to be, what we are doing now, and why we believe […]