Hadoop Ecosystem

Industry news, partner stories, buzz and happenings

Apache Spark’s momentum continues to grow and throughout 2015 we saw customers across all industries get real value from using it with the Hortonworks Data Platform (HDP). Examples include:

  • Insurance: Optimize their claims reimbursement process by using Spark’s machine learning capabilities to process and analyze all claims.
  • Healthcare: Build a Patient Care System using Spark Core, Streaming and SQL.
  • Retail: Use Spark to analyze point-of-sale data and coupon usage.
  • Internet: Use Spark’s ML capability to identify fake profiles and enhance the product matches that they show their customers.…

Is a Lake Big Enough to House Your Ocean of Data?

Contrary to popular belief, Hadoop was not the elephant-in-the-china-shop that marauded and disrupted the data center. The real culprit is data and how it has exploded in volume. The past two or three years have seen a rise in the number of successful Hadoop projects in enterprises to tackle this explosion of big data. These large volumes of data, the emergence of the Hadoop technology and the need to store all the siloed data in one place have prompted the phenomenon called the Data Lake among enterprises.…

Our guest blogger today, Rob Rosen, Senior Director Partner Solutions at Platfora, describes how to help customers achieve strategic advantage through data discovery.

While many people have heard the notion of “known unknowns” and “unknown unknowns,” it may surprise you to discover that the concept was first popularized by a NASA scientist. In a presentation given at TEDx GeorgeMasonU, Dr. Kirk Borne described how he used the concept of “known unknowns” (things that we knew might exist, but hadn’t seen evidence of) and “unknown unknowns” (things that we could discover and knew nothing about, but would truly surprise us), and how they relate to the concept of Big Data.…

Metro Transit of St. Louis (MTL) operates the public transportation system for the St. Louis metropolitan region. The organization’s mission is “Meeting the region’s transit needs by providing safe, reliable, accessible, customer-focused service in a fiscally responsible manner.”

Meeting the Challenge to Provide Safe, Reliable Public Transport

To ensure the safety of passengers and the proper use of public funds, MTL has always performed regular maintenance on its bus fleet. But lacking detailed data on how bus components were actually performing, the agency maintained vehicles retroactively.…

The Personalized Medicine Initiative (PMI), based out of the Life Sciences Institute of the University of BC, has deployed HDP and PHEMI Central Big Data Warehouse to collect, store and manage genomic and clinical data for Molecular You (MY). 

PHEMI is a Hortonworks Technology Partner, and in this blog Richard Proctor, General Manager, Global Healthcare at Hortonworks, interviews PHEMI’s Roy Wilds, Dir. of Product Management, along with PMI’s Chief Operating Officer and Co-founder of Molecular You, Rob Fraser, to discuss this groundbreaking work.…

Our guest blogger today, Don Brown, COO and Founder of Rocana, a Hortonworks Technology Partner, talks about our partnership, mainstream Hadoop adoption and the importance of global IT Operations management.

Our partnership with Hortonworks is another exciting step on the path to mainstream adoption of Hadoop as the critical platform for modern, global-scale IT Operations management. Hortonworks’ emphasis on a platform that scales with the demands of big data applications is a great fit for the IT Operations market and for customers looking for more reliable, extensible, analytics, and limitless solutions.…

Today Microsoft announced the General Availability of Azure HDInsight, with Apache Hadoop 2.6, on Ubuntu Linux clusters. Azure HDInsight is a managed Hadoop service in the cloud and uses the Hortonworks Data Platform (HDP).

This release is a direct result of the commitment that Microsoft has to Open Source. Microsoft has worked along with Hortonworks® in the community to contribute towards Apache Hadoop and related projects, including Apache Ambari.…

Yahoo! JAPAN needed a data platform that could scale to generate 100,000 reports per day and process large amounts of data. It needed to keep the last 13 months’ worth of data, which is approximately 500 billion rows, organized and easily accessible. Relational Database Management Systems (RDBMS) cannot scale to these levels from a cost and processing power perspective. Yahoo! JAPAN explored Hadoop to achieve this and evaluated two platforms based on its requirements: Hortonworks Hive and Tez on YARN, and Cloudera Impala.…

On September 22nd at 10:00 am PST, Vincent Lam, Director of Product Marketing at Protegrity, and Syed Mahmood, Sr. Product Marketing Manager at Hortonworks, will be talking about how to secure sensitive data in Hadoop Data Lakes.

In this blog, they provide answers to some of the most frequently asked questions they have heard on the topic.

  • What’s the best approach for the security of Hadoop Data Lakes?
  • As enterprises continue to harness the power of Hadoop to store large amounts of data, security becomes an even more important part of the ecosystem.…

    Symantec helps consumers and organizations secure and manage their information-driven world by protecting digital information and online transactions.

    The Symantec Cloud Platform team turned to Hortonworks to ingest an enormous volume of security logs, analyze that security metadata and then use that insight to protect its customers. Symantec now analyzes threat data much more quickly because it optimized its data architecture using the storage and processing power of HDP—for both historical and real-time analysis.…

    Guest blogger David Hill, Business Development Director at Open Energi, explains the challenges of building a virtual power station, and why data is the fuel. Follow Open Energi at @openenergi.

    Open Energi is working with businesses in the UK to harness the flexible energy demand from their equipment and aggregate it to create a virtual power station. We’re turning the whole system on its head so that instead of energy supply adjusting to meet demand, our demand for energy adjusts to meet supply – in real-time.…

    Today’s guest blogger is from Hortonworks Technology Partner, WANdisco. Peter Scott, SVP of Business Development and OEM Sales at WANdisco, talks about how to easily migrate from one Hadoop distribution to Hortonworks Data Platform (HDP).

    Migration between Hadoop versions and distributions can be difficult, often causing extended downtime and disruption, unless you use the right tools. DistCp (distributed copy) is a tool available from Apache™ Hadoop® used for large inter/intra-cluster copying.…

    Our guest blogger today comes from our partner Talend, which has been working with us for many years to help organizations transition from data chaos to a modern data architecture. In this blog, Talend’s Ashley Stirrup, CMO, talks about helping organizations support a dynamic data supply chain.

    In order to remain viable in increasingly competitive markets, companies must create ever-more detailed models of the business that incorporate all data – regardless of source or volume.…

    Are you still learning about the Data Lake? Wondering how it can help your organization manage and leverage massive amounts of data? On September 8th, VHA, the largest member-owned health care company delivering supply chain management services and clinical services to its members, will share their experience and explain how they simplified data management and enabled faster data discovery with Hadoop and data virtualization.

    At VHA, product, supplier and member information, among other data, was siloed across multiple sources.…

    This blog is jointly submitted by Alexander Gray, Ph.D., chief technology officer, Skytree, a Hortonworks Technology Partner, and Eric Thorsen, general manager, consumer products and retail, Hortonworks.

    As consumers increasingly reveal their shopping habits online, retailers can access social media, purchase history, consumer demand and market trends to better understand their customers, maximize spending and encourage repeat purchases. Retailers are considered early adopters of big data technology, integrating it into every imaginable business process to achieve a deeper understanding of consumers and associated buying trends.…