The Hortonworks Blog

Posts categorized by: Big Data

In this post, we will explore how to quickly and easily spin up our own VM with Vagrant and Apache Ambari. Vagrant is very popular with developers as it lets one mirror the production environment in a VM while staying with all the IDEs and tools in the comfort of the host OS.

If you’re just looking to get started with Hadoop in a VM, then you can simply download the Hortonworks Sandbox.…

Microsoft and Hortonworks have been working together for over two years now with the goal of bringing the power of Big Data to a billion people. As a result of that work, today we announced the General Availability of HDP 2.0 for Windows with the full power of YARN.

There are already over half a billion Excel users on this planet.

So, we have put together a short tutorial on the Hortonworks Sandbox that walks through the end-to-end data pipeline using HDP and Microsoft Excel. It follows a data analyst at a financial services firm as she:

  • Cleans and aggregates 10 years of raw stock tick data from NYSE
  • Enriches the data model by looking up additional attributes from Wikipedia
  • Creates an interactive visualization on the model

You can find the tutorial here.…

In God we trust, all others must bring data.
Dr. W. Edwards Deming
Dr. W. Edwards Deming was a statistician and manufacturing consultant who worked on Japanese reconstruction after WWII. His quality control methods influenced innovative Japanese manufacturing processes that simultaneously increased volume, reduced cost, and improved quality. Near the end of his career, Deming taught the same lessons to U.S. automakers.

To this day, the “Deming Prize” is one of the highest awards for Total Quality Management in the world.…

2013 was certainly a revealing year for the Enterprise Hadoop market. We witnessed the emergence of the YARN-based architecture of Hadoop 2 and strong ecosystem support that will fuel its next big wave of innovation. The analyst community accurately predicted that Hadoop’s market momentum would greatly accelerate, but none predicted that a pure-play vendor would publicly declare its intent to pivot away from the Enterprise Hadoop market. Interesting times indeed!

Join us on Tuesday, January 21st, when we’ll be covering the Enterprise Hadoop State of the Union in more detail.…

This is a guest post from our partner Revelytix, which recently created a step-by-step tutorial on using Loom with the Hortonworks Sandbox.

Enterprises are excited about the Hortonworks Data Platform (HDP) for many reasons, such as low cost, scalability, and flexibility. Flexibility in particular holds out new possibilities for data science. The Hadoop Distributed File System (HDFS) accepts files of any type and format, unlike traditional data warehouses, which require a schema up front.…

Recently, SAP and Hortonworks announced the next step in their relationship, in which SAP resells and provides enterprise support for the Hortonworks Data Platform. Since then, we’ve been working together to showcase how SAP HANA + Hortonworks Data Platform provide “Instant Insight and Infinite Scale”. The combination of HANA and the Hortonworks Data Platform is a perfect match. SAP HANA uniformly amplifies the value of Big Data across this data fabric, including large data sets that are stored in Hadoop.…

A consequence of living in a globalized, connected world is the unfortunate presence of online fraud. Fraud applies to all industries and affects businesses of all sizes. Given that we’re coming up on the holidays, and specifically with North America’s love of Black Friday and Cyber Monday, this week we partnered with Datameer on a very topical discussion about best practices on how to fight fraud using Hortonworks Data Platform to integrate Hadoop and Datameer.…

We have heard plenty in the news lately about healthcare challenges and the difficult choices faced by hospital administrators, technology and pharmaceutical providers, researchers, and clinicians. At the same time, consumers are experiencing increased costs without a corresponding increase in health security or in the reliability of clinical outcomes.

One key obstacle in the healthcare market is data liquidity (for patients, practitioners and payers), and some are using Apache Hadoop to overcome this challenge as part of a modern data architecture.…

We had a lot of fun in NYC and hope you did too. Thanks to the hundreds of you who dropped by the booth and attended dinners, parties, meetups and sessions.

As we have known for some time, Hortonworks customers are already building a modern data architecture with Hadoop as the technology of choice for handling the data they have streaming in from all directions. They care that it matches their needs, integrates with their existing infrastructure and solves real problems with flexibility.…

You’re a Java developer, you use Spring and you’re just itching to get your arms around some big data. Well, now you can do that even more easily than before, as we announced this morning that Spring is now certified for Hortonworks Data Platform.

To celebrate this development, we have a community tutorial for Sandbox (1.3 currently) that shows you how to use Spring XD to collect data streamed from Twitter, load it into HDFS and then run simple sentiment analysis with Apache Hive.…

Today our partner Rackspace announced their Big Data solution for dedicated and cloud environments, powered by Hortonworks Data Platform. This collaboration between Hortonworks and Rackspace provides customers a flexible choice of deployment offerings of Apache Hadoop from one of the most trusted vendors in the cloud computing market.

Enterprise adoption of Apache Hadoop

This expanded collaboration is a strong indicator of the ecosystem rallying around Hortonworks Data Platform and our goal at Hortonworks of making Apache Hadoop a core component of the modern data architecture, whether on premise, in a VM, as an appliance, or in the cloud.…

Designed for senior IT executives, IT architects, technology planners, and business technologists, Knowledgent’s three-day facilitated Big Data Immersion workshop, recently held in New York City, gave participants an intensive deep dive into the big data questions:

  • Why Big Data? What are the issues that brought it all about?
  • Demystifying Big Data: How can Hadoop help with big data issues?
  • Implementation: How do I operationalize big data? How is big data analytics different?

We’ve been invited to join our partner SAP on October 16 to talk about Big Data and how the integrated SAP HANA + Hadoop approach can solve your big data challenges. This chat will be a live Google Hangout with:

  • Irfan Khan, SVP & GM SAP Global Big Data at SAP (@i_kHANA)
  • Ari Zilka, CTO at Hortonworks (@ikarzali)
  • Timo Elliott, Innovation Evangelist at SAP (@timoelliott)

When: Wednesday, October 16, 8am PT / 11am ET / 5pm CET…

A lot of people ask me: how do I become a data scientist? I think the short answer is: as with any technical role, it isn’t necessarily easy or quick, but if you’re smart, committed and willing to invest in learning and experimentation, then of course you can do it.

In a previous post, I described my view on “What is a data scientist?”: it’s a hybrid role that combines the “applied scientist” with the “data engineer”. …

‘The world is being digitized’, proclaimed Geoffrey Moore in his keynote at Hadoop Summit 2012 over a year ago. His belief is that we are moving away from an analog society, where we collect only casual recordings of events, to one that is digital, where everything is captured. It is our belief that Hadoop is one of the key technologies powering this shift to a digital society.

There is almost an expectation that we capture the pics, vids and conversations that run before us. …
