The Hortonworks Blog

Posts categorized by : Apache Hadoop

A few weeks back we posted a definition of “big data”.  There was definitely some internal conversation about the term and if this definition had captured what the term means.  Sum finding: it is a loaded term.  It means a lot of different things to a lot of different people.

When I first joined Hortonworks, I bought in to the three V’s (volume velocity and variety) definition of big data. …

Almost time to spend a relaxing weekend in the garden, or crushing some data in your garage-based homebrew Hadoop cluster – whichever you prefer. But before we choose our path, let’s take a look at the last two weeks of happenings (I was lost in Oregon last week).

Hadoop is the perfect app for OpenStack. While I was struggling with driving directions, Red Hat, Marantis and Hortonworks were announcing plans for Project Savanna which aims to automate the deployment of Hadoop on enterprise-class OpenStack-powered clouds. …

To deploy, configure, manage and scale Hadoop clusters in a way that optimizes performance and resource utilization there is a lot to consider. Here are  6 key things to think about as part of your planning:

  • Operating system:  Using a 64-bit operating system helps to avoid constraining the amount of memory that can be used on worker nodes. For example, 64-bit Red Hat Enterprise Linux 6.1 or greater is often preferred, due to better ecosystem support, more comprehensive functionality for components such as RAID controllers.…
  • As a preview to the April 30th webinar: Hadoop & the EDW: When to Use Which, Chad Meley, Global Director of Marketing at Teradata, interviewed the two luminary speakers, Eric Baldeschwieler (aka “eric14”) and Stephen Brobst, about the purpose of their presentation and what you can expect to take away from their shared experiences.

    Chad:  “Eric, in this webinar you’re going to talk about the strategic role of relational big data technologies, which have come under fire in some circles with the rise of Hadoop. …

    In a recent blog post I mentioned the 4 reasons for using Hadoop for data science. In this blog post I would like to dive deeper into the last of these reasons: data agility.

    In most existing data architectures, based on relational database systems, the data schema is of central importance, and needs to be designed and maintained carefully over the lifetime of the project. Furthermore, whatever data fits into the schema will be stored, and everything else typically gets ignored and lost.…

    On April 2nd, Hortonworks was excited to host the very first Apache Ambari Meetup. Thanks to all those who came along in person and virtually for a lot of vibrant discussion. If you would like to get involved in future Ambari Meetups, please visit this link. We are well on the way to making Hadoop management ‘dead simple’.

    We have embedded the sessions below with some notes:

    Overview and Demo of Ambari, Yusaku Sako, Hortonworks

    • This session covered Apache Ambari’s mission to “Make Hadoop management dead simple”, Ambari’s 4 major roles: 1) Provision, 2) Manage, 3) Monitor, and 4) Integrate, emphasized that everything that Ambari’s Web Client does is done thru Ambari’s REST API (100% REST), presented high-level architecture, and a live demo on how to provision, manage, and monitor a Hadoop cluster using the latest Ambari 1.2.2 release.

    The end of another action-packed week and just before we all head off for the weekend then let’s have a recap on the conversations from this week – we hope you’re enjoying them.

    We’re delighted by the response to our Hadoop Patterns of Use whitepaper and presentation - that really seems to have struck a chord with everyone thinking about what Hadoop can really do for their business. You can see that content just below here – an excellent read for the journey home.…

    While we are quite a far way away from hearing “Houston, tranquility base here… the eagle has landed”, the HP moonshot is definitely pushing us all toward a new class of infrastructure to run more efficient workloads, like Apache Hadoop. Hortonworks applauds the development of flexible Big Data appliances like Moonshot. We are excited about this development as it signals alignment across development, operations and infrastructure within organizations.  For quite some time, our team has been accustomed to a natural balance required across these three constituents and now the server the market is joining in on the game.…

    Over the last 10 years or so, large web companies such as Google, Yahoo!, Amazon and Facebook have successfully applied large scale machine learning algorithms over big data sets, creating innovative data products such as online advertising systems and recommendation engines.

    Apache Hadoop is quickly becoming a central store for big data in the enterprise, and thus is a natural platform with which enterprise IT can now apply data science to a variety of business problems such as product recommendation, fraud detection, and sentiment analysis.…

    Check out our new knowledgebase article on Ambari on EC2. With these instructions, you can boot an EC2 Apache Hadoop cluster in minutes using Ambari.

    Unstructured data, semi-structured data, structured data… it is all very interesting and we are in conversations about big and small versions of each of these data types every day. We love it…  we are data geeks at Hortonworks. We passionately understand that if you want to use any piece of data for some computation, there needs to be some layer of metadata and structure to interact with it.  Within Hadoop, this critical metadata service is provided by HCatalog.…

    “OK, Hadoop is pretty cool, but exactly where does it fit and how are other people using it?”  Here at Hortonworks, this has got to be the most common question we get from the community… well that and “what is the airspeed velocity of an unladen swallow?”

    We think about this (where Hadoop fits) a lot and have gathered a fair amount of expertise on the topic.  The core team at Hortonworks includes the original architects, developers and operators of Apache Hadoop and its use at Yahoo, and through this experience and working within the larger community they have been privileged to see Hadoop emerge as the technological underpinning for so many big data projects.…

    With any enterprise software implementation, the challenge is often the integration of a chosen system with existing enterprise systems architecture. One such existing investment may be an ERP (and related) systems such as those provided by SAP. In this real-world instance, SAP partnered with Hortonworks to enable integration of Apache Hadoop into SAP Real-Time Data Platforms using Hortonworks Data Platform to facilitate business intelligence and analysis of Big Data.

    The business challenges at hand will be familiar to everyone and are a great fit for a Hadoop solution.…

    Today we are excited to see another example of the power of community at work as we highlight the newly approved Apache Software Foundation incubator project named Falcon. This incubation project was initiated by the team at InMobi together with engineers from Hortonworks. Falcon is useful to anyone building apps on Hadoop as it simplifies data management through the introduction of a data lifecycle management framework.

    All About Falcon and Data Lifecycle Management

    Falcon is a data lifecycle management framework for Apache Hadoop that enables users to configure, manage and orchestrate data motion, disaster recovery, and data retention workflows in support of business continuity and data governance use cases.…

    In this post, we’ll explain the difference between Hadoop 1.0 and 2.0. After all, what is Hadoop 2.0? What is YARN?

    For starters – what is Hadoop and what is 1.0? The Apache Hadoop project is the core of an entire ecosystem of projects. It consists of four modules (see here):

    • Hadoop Common: The common utilities that support the other Hadoop modules.
    • Hadoop Distributed File System (HDFS™): A distributed file system that provides high-throughput access to application data.
    Go to page:« First...10...1415161718...Last »