<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Hortonworks</title>
	<atom:link href="http://hortonworks.com/feed/" rel="self" type="application/rss+xml" />
	<link>http://hortonworks.com</link>
	<description>Architecting the future of big data</description>
	<lastBuildDate>Wed, 16 May 2012 16:06:41 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.3.1</generator>
		<item>
		<title>Big Data Refinery Fuels Next-Generation Data Architecture</title>
		<link>http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/</link>
		<comments>http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/#comments</comments>
		<pubDate>Wed, 16 May 2012 15:55:38 +0000</pubDate>
		<dc:creator>Shaun Connolly</dc:creator>
				<category><![CDATA[Apache Hadoop]]></category>
		<category><![CDATA[Hadoop Ecosystem]]></category>
		<category><![CDATA[Hortonworks Topics]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4465</guid>
		<description><![CDATA[Since joining Hortonworks at the beginning of the year, a question I’ve heard over and over again is “What is Apache Hadoop and what is it used for?” There’s clearly a lot of hype [and confusion] in this emerging Big Data market, and it feels as if each [...]]]></description>
			<content:encoded><![CDATA[<p>Since joining Hortonworks at the beginning of the year, a question I’ve heard over and over again is <strong><em>“What is Apache Hadoop and what is it used for?”</em></strong></p>
<p>There’s clearly a lot of hype [and <em>confusion</em>] in this emerging Big Data market, and it feels as if every technology, new and existing alike, is pushing the meme of <em>“<a title="Apache Hadoop Hortonworks" href="http://en.wikipedia.org/wiki/All_your_base_are_belong_to_us" target="_blank">all your data are belong to us</a>”</em>. It is great to see the wave of innovation occurring across the landscape of SQL, NoSQL, NewSQL, EDW, MPP DBMS, Data Marts, and Apache Hadoop (to name just a few), but enterprises and the market in general could use a healthy dose of clarity on how to use and interconnect these technologies in ways that benefit the business.</p>
<p>In my post entitled <a title="Apache Hadoop Hortonworks" href="http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/">7 Key Drivers for the Big Data Market</a>, I asserted that the Big Data movement is not only about the classic world of <em>transactions</em>, but it factors in the new(er) worlds of <em>interactions</em> and <em>observations</em>. This new world brings with it a wide range of multi-structured data sources that are forcing a new way of looking at things.</p>
<p><span id="more-4465"></span>In order to make sense of this emerging space, I’ve created two graphics designed to walk through a vision of a next-generation data architecture. At the highest level, I describe three broad areas of data processing and outline how these areas interconnect.</p>
<p>The three areas are:</p>
<ol>
<li>Business Transactions &amp; Interactions</li>
<li>Business Intelligence &amp; Analytics</li>
<li>Big Data Refinery</li>
</ol>
<p>The graphic below illustrates a vision for how these three types of systems can interconnect in ways aimed at deriving maximum value from all forms of data.</p>
<p style="text-align: center;"><a href="http://hortonworks.com/wp-content/uploads/2012/05/bigdatarefinery.png"><img class="wp-image-4468 aligncenter" title="bigdatarefinery" src="http://hortonworks.com/wp-content/uploads/2012/05/bigdatarefinery.png" alt="Apache Hadoop: Big Data Refinery" width="506" height="360" /></a></p>
<p>Enterprise IT has been connecting systems via classic <a title="Apache Hadoop Hortonworks" href="http://en.wikipedia.org/wiki/Extract,_transform,_load" target="_blank">ETL processing</a>, as illustrated in <strong>Step 1</strong> above, for many years in order to deliver structured and repeatable analysis. In this step, the business determines the questions to ask and IT collects and structures the data needed to answer those questions.</p>
<p>The “Big Data Refinery”, as highlighted in <strong>Step 2</strong>, is a new system capable of storing, aggregating, and transforming a wide range of multi-structured raw data sources into usable formats that help fuel new insights for the business. The Big Data Refinery provides a cost-effective platform for unlocking the potential value within data and discovering the business questions worth answering with this data. A popular example of big data refining is processing Web logs, clickstreams, social interactions, social feeds, and other user generated data sources into more accurate assessments of customer churn or more effective creation of personalized offers.</p>
<p>More interestingly, there are businesses deriving value from processing large video, audio, and image files. Retail stores, for example, are leveraging in-store video feeds to help them better understand how customers navigate the aisles as they find and purchase products. Retailers that provide optimized shopping paths and intelligent product placement within their stores are able to drive more revenue for the business. In this case, while the video files may be big in size, the refined output of the analysis is typically small in size but potentially big in value.</p>
<p>The Big Data Refinery platform provides fertile ground for new types of tools and data processing workloads to emerge in support of rich multi-level data refinement solutions.</p>
<p>With that as backdrop, <strong>Step 3</strong> takes the model further by showing how the Big Data Refinery interacts with the systems powering Business Transactions &amp; Interactions and Business Intelligence &amp; Analytics. Interacting in this way opens up the ability for businesses to get a richer and more informed 360° view of customers, for example.</p>
<p>By directly integrating the Big Data Refinery with existing Business Intelligence &amp; Analytics solutions that contain much of the transactional information for the business, companies can enhance their ability to <em>more accurately understand the customer behaviors that lead to the transactions</em>.</p>
<p>Moreover, systems focused on Business Transactions &amp; Interactions can also benefit from connecting with the Big Data Refinery. Complex analytics and calculations of key parameters can be performed in the refinery and flow downstream to fuel runtime models powering business applications with the goal of more accurately targeting customers with the best and most relevant offers, for example.</p>
<p>Since the Big Data Refinery is great at retaining large volumes of data for long periods of time, the model is completed with the feedback loops illustrated in <strong>Steps 4 and 5</strong>. Retaining the past 10 years of historical “Black Friday” retail data, for example, can benefit the business, especially if it’s blended with other data sources such as 10 years of weather data accessed from a third party data provider. The point here is that the opportunities for creating value from multi-structured data sources available inside and outside the enterprise are virtually endless if you have a platform that can do it cost effectively and at scale.</p>
<p>Let me conclude by describing how the various data processing technologies fit within this next-generation data architecture.</p>
<p style="text-align: center;"><a href="http://hortonworks.com/wp-content/uploads/2012/05/nextgendataarchitecture.png"><img class="wp-image-4469 aligncenter" title="nextgendataarchitecture" src="http://hortonworks.com/wp-content/uploads/2012/05/nextgendataarchitecture.png" alt="Next Generation Enterprise Data Architecture - Hortonworks" width="474" height="360" /></a></p>
<p>In the graphic above, Apache Hadoop acts as the Big Data Refinery. It’s great at storing, aggregating, and transforming multi-structured data into more useful and valuable formats.</p>
<p>Apache Hive is a Hadoop-related component that fits within the Business Intelligence &amp; Analytics category since it is commonly used for querying and analyzing data within Hadoop in a SQL-like manner. Apache Hadoop can also be integrated with other EDW, MPP, and NewSQL components such as Teradata, Aster Data, HP Vertica, IBM Netezza, EMC Greenplum, SAP Hana, Microsoft SQL Server PDW and many others.</p>
<p>Apache HBase is a Hadoop-related NoSQL Key/Value store that is commonly used for building highly responsive next-generation applications. Apache Hadoop can also be integrated with other SQL, NoSQL, and NewSQL technologies such as Oracle, MySQL, PostgreSQL, Microsoft SQL Server, IBM DB2, MongoDB, DynamoDB, MarkLogic, Riak, Redis, Neo4J, Terracotta, GemFire, SQLFire, VoltDB and many others.</p>
<h3>Key Takeaway</h3>
<p>A next-generation data architecture is emerging that connects the classic systems powering Business Transactions &amp; Interactions and Business Intelligence &amp; Analytics with Apache Hadoop, a “Big Data Refinery” capable of storing, aggregating, and transforming multi-structured raw data sources into usable formats that help fuel new insights for the business.</p>
<p>Enterprises that can maximize the value from all of their data (i.e. transactions, interactions, and observations) will put themselves in a position to drive more business, enhance productivity, or discover new and lucrative business opportunities.</p>
<p>~ Shaun Connolly</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/big-data-refinery-fuels-next-generation-data-architecture/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>7 Key Drivers for the Big Data Market</title>
		<link>http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/</link>
		<comments>http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/#comments</comments>
		<pubDate>Mon, 14 May 2012 17:30:06 +0000</pubDate>
		<dc:creator>Shaun Connolly</dc:creator>
				<category><![CDATA[Apache Hadoop]]></category>
		<category><![CDATA[Hortonworks Topics]]></category>
		<category><![CDATA[Industry Happenings]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4421</guid>
		<description><![CDATA[I attended the Goldman Sachs Cloud Conference and participated on a panel focused on “Data: The New Competitive Advantage”. The panel covered a wide range of questions, but kicked off covering two basic questions: “What is Big Data?” and “What are the drivers behind the Big Data market?” [...]]]></description>
			<content:encoded><![CDATA[<p>I attended the <a title="Goldman Sachs Cloud Conference Hortonworks" href="http://hortonworks.com/about-us/news/hortonworks-vice-president-of-corporate-strategy-to-present-at-goldman-sachs-cloud-computing-conference" target="_blank">Goldman Sachs Cloud Conference</a> and participated on a panel focused on “Data: The New Competitive Advantage”. The panel covered a wide range of questions, but kicked off covering two basic questions:</p>
<p><em>“What is Big Data?”</em> and <em>“What are the drivers behind the Big Data market?”</em></p>
<p>While most definitions of Big Data focus on the new forms of unstructured data flowing through businesses with new levels of “volume, velocity, variety, and complexity”, I tend to answer the question using a simple equation:</p>
<p><strong><em>Big Data = Transactions + Interactions + Observations</em></strong></p>
<p>The following graphic illustrates what I mean:</p>
<p><a href="http://hortonworks.com/wp-content/uploads/2012/05/bigdata_diagram.png"><span id="more-4421"></span><img class="alignleft  wp-image-4423" title="bigdata_diagram" src="http://hortonworks.com/wp-content/uploads/2012/05/bigdata_diagram.png" alt="Big Data Diagram" width="647" height="458" /></a></p>
<p>ERP, SCM, CRM, and transactional Web applications are classic examples of systems processing Transactions. Highly structured data in these systems is typically stored in SQL databases.</p>
<p>Interactions are about how people and things interact with each other or with your business. Web Logs, User Click Streams, Social Interactions &amp; Feeds, and User-Generated Content are classic places to find Interaction data.</p>
<p>Observational data tends to come from the “<a title="Internet of Things Hortonworks" href="http://en.wikipedia.org/wiki/Internet_of_Things" target="_blank">Internet of Things</a>”. Sensors for heat, motion, and pressure, along with RFID and GPS chips embedded in such things as mobile devices, ATMs, and even aircraft engines, are just some examples of “things” that output Observation data.</p>
<p>With that basic definition of Big Data as background, let’s answer the question:</p>
<h3>What are the 7 Key Drivers Behind the Big Data Market?</h3>
<p><em>Business</em></p>
<ol>
<li>Opportunity to enable innovative new business models</li>
<li>Potential for new insights that drive competitive advantage</li>
</ol>
<p><em>Technical</em></p>
<ol start="3">
<li>Data collected and stored continues to grow exponentially</li>
<li>Data is increasingly everywhere and in many formats</li>
<li>Traditional solutions are failing under new requirements</li>
</ol>
<p><em>Financial</em></p>
<ol start="6">
<li>Cost of data systems, as a percentage of IT spend, continues to grow</li>
<li>Cost advantages of commodity hardware &amp; open source software</li>
</ol>
<p>There’s a new generation of data management technologies, such as Apache Hadoop, that are providing an innovative and cost effective foundation for the emerging landscape of Big Data processing and analytics solutions. Needless to say, I’m excited to see how this market will mature and grow over the coming years.</p>
<h3>Key Takeaway</h3>
<p>Being able to dovetail the classic world of Transactions with the new(er) worlds of Interactions and Observations in ways that drive more business, enhance productivity, or uncover new and lucrative business opportunities is why Big Data is important.</p>
<p>One promise of Big Data is that companies that get good at collecting, aggregating, refining, analyzing, and maximizing the value derived from Transactions, Interactions, and Observations will put themselves in a position to answer such questions as:</p>
<p><strong><em>What are the behaviors that lead to the transaction?</em></strong></p>
<p>And even more interestingly:</p>
<p><strong><em>How can I better encourage those behaviors and grow my business?</em></strong></p>
<p>So ask yourself, what’s your Big Data strategy?</p>
<p>~ Shaun Connolly</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/7-key-drivers-for-the-big-data-market/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Executive Video Series: Introduction to HCatalog</title>
		<link>http://hortonworks.com/blog/executive-video-series-introduction-to-hcatalog/</link>
		<comments>http://hortonworks.com/blog/executive-video-series-introduction-to-hcatalog/#comments</comments>
		<pubDate>Wed, 09 May 2012 20:08:37 +0000</pubDate>
		<dc:creator>John Kreisa </dc:creator>
				<category><![CDATA[Apache Hadoop]]></category>
		<category><![CDATA[HCatalog]]></category>
		<category><![CDATA[Hortonworks Topics]]></category>
		<category><![CDATA[Pig]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4394</guid>
		<description><![CDATA[We just added a video to the Hortonworks Executive Video library that features Alan Gates, Hortonworks co-founder and Apache PMC member. In this video, Alan discusses HCatalog, one of the most compelling projects in the Apache Hadoop ecosystem. HCatalog is a metadata and table management system that provides [...]]]></description>
			<content:encoded><![CDATA[<p>We just added a video to the Hortonworks Executive Video library that features Alan Gates, Hortonworks co-founder and Apache PMC member. In this video, Alan discusses HCatalog, one of the most compelling projects in the Apache Hadoop ecosystem.</p>
<p>HCatalog is a metadata and table management system that provides a consistent data model and schema for users of tools such as MapReduce, Hive and Pig. When you consider that there are often users accessing Hadoop clusters using different tools that independently don&#8217;t agree on schema, data types, how and where data is stored, etc., then you can understand the value of having a tool such as HCatalog.</p>
<p>In this video, Alan does a good job of not only explaining the role of HCatalog, but also laying out the future direction of the project. He talks about improving the integration with HBase, improving information lifecycle management and expanding the HCatalog data model to address the challenges of unstructured data.</p>
<div class="video_cell">
<p><a title="Interview with Alan Gates, Hortonworks co-founder, on HCatalog. HCatalog is a metadata and table management system for Apache Hadoop. Includes an overview of HCatalog and a look into new features planned." href="http://player.vimeo.com/video/41808314?portrait=0&amp;color=81e62e&amp;autoplay=1" rel="shadowbox;width=640;height=360"><span id="more-4394"></span><img class=" wp-image-3348" title="Overview and Future of HCatalog" src="http://hortonworks.com/wp-content/uploads/2012/02/alan-gates.jpg" alt="" width="160" height="121" /></a></p>
</div>
<p>If you would like to learn more about HCatalog or any of the Apache Hadoop projects, I strongly suggest that you attend <a title="Apache Hadoop Summit Hortonworks" href="http://hadoopsummit.org" target="_blank">Hadoop Summit</a> next month. There will be a number of compelling sessions, including a <a title="Apache Hadoop Summit Hortonworks" href="http://hadoopsummit.org/program/#session57" target="_blank">presentation on HCatalog</a> hosted by Alan Gates himself.</p>
<p>~ John Kreisa</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/executive-video-series-introduction-to-hcatalog/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Record Support for Hadoop Summit</title>
		<link>http://hortonworks.com/blog/record-support-for-hadoop-summit/</link>
		<comments>http://hortonworks.com/blog/record-support-for-hadoop-summit/#comments</comments>
		<pubDate>Tue, 08 May 2012 20:19:02 +0000</pubDate>
		<dc:creator>John Kreisa </dc:creator>
				<category><![CDATA[Hadoop Ecosystem]]></category>
		<category><![CDATA[Industry Happenings]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4364</guid>
		<description><![CDATA[In case you didn&#8217;t see the news today, Hadoop Summit announced record ecosystem support for this year&#8217;s conference. The original and world&#8217;s largest Apache Hadoop conference, now in its fifth year, is being sponsored this year by more than 40 traditional and open source software and services companies. [...]]]></description>
			<content:encoded><![CDATA[<p>In case you didn&#8217;t see the news today, Hadoop Summit <a title="Apache Hadoop Summit Hortonworks" href="http://hortonworks.com/about-us/news/hadoop-summit-2012-announces-record-ecosystem-support">announced record ecosystem support</a> for this year&#8217;s conference. The original and world&#8217;s largest Apache Hadoop conference, now in its fifth year, is being sponsored this year by more than 40 traditional and open source software and services companies.</p>
<p>Hortonworks and our co-host Yahoo! would like to thank the following companies for helping to make Hadoop Summit possible:</p>
<p><strong><em><span id="more-4364"></span>Platinum Sponsors</em></strong></p>
<ul>
<li>Cisco Systems</li>
<li>Datameer</li>
<li>IBM</li>
<li>Karmasphere</li>
<li>MarkLogic</li>
<li>Microsoft</li>
<li>Savvis</li>
<li>Splunk</li>
<li>StackIQ</li>
<li>Teradata Aster</li>
<li>Vertica, An HP Company</li>
<li>VMware</li>
</ul>
<p><strong><em>Gold Sponsors</em></strong></p>
<ul>
<li>Cloudera</li>
<li>Dropbox</li>
<li>Intel</li>
<li>Lucid Imagination</li>
<li>MapR Technologies</li>
<li>Pentaho</li>
<li>Syncsort</li>
</ul>
<p><strong><em>Silver Sponsors</em></strong></p>
<ul>
<li>Amazon Web Services</li>
<li>Cloudwick Technologies</li>
<li>Dataguise</li>
<li>DataStax</li>
<li>Dell</li>
<li>Drawn to Scale</li>
<li>Facebook</li>
<li>Hadapt</li>
<li>HStreaming</li>
<li>Jive Software</li>
<li>Lilien</li>
<li>Mellanox Technologies</li>
<li>NetApp</li>
<li>Pervasive Big Data</li>
<li>Quest Software</li>
<li>Qubole</li>
<li>SoftNet Solutions</li>
<li>Super Micro Computer</li>
<li>Tableau Software</li>
<li>Talend</li>
<li>Think Big Analytics</li>
<li>Zettaset</li>
</ul>
<p>Hadoop Summit would also like to thank its 2012 media sponsors: <a href="http://www.datanami.com/">Datanami</a>, <a href="http://dbta.com/">DBTA</a>, <a href="http://www.nosqlweekly.com/">NoSQL Weekly</a> and <a href="http://siliconangle.com/">SiliconANGLE</a>.</p>
<p>Registration for Hadoop Summit 2012 remains open, but the conference is filling up fast. Don’t miss the opportunity to attend by <a title="Apache Hadoop Summit Hortonworks" href="http://hadoopsummit.org/register/" target="_blank">registering today</a>. For more information on the conference, please visit the Hadoop Summit <a title="Apache Hadoop Summit Hortonworks Program" href="http://hadoopsummit.org/program/" target="_blank">session guide</a> and <a title="Apache Hadoop Summit Hortonworks" href="http://hadoopsummit.org/schedule/" target="_blank">schedule</a>.</p>
<p>~ John Kreisa</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/record-support-for-hadoop-summit/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Executive Video Series: Apache Hadoop and Next Generation MapReduce</title>
		<link>http://hortonworks.com/blog/executive-video-series-apache-hadoop-and-next-generation-mapreduce/</link>
		<comments>http://hortonworks.com/blog/executive-video-series-apache-hadoop-and-next-generation-mapreduce/#comments</comments>
		<pubDate>Tue, 01 May 2012 15:29:44 +0000</pubDate>
		<dc:creator>John Kreisa </dc:creator>
				<category><![CDATA[Apache Hadoop]]></category>
		<category><![CDATA[Hortonworks Topics]]></category>
		<category><![CDATA[MapReduce]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4299</guid>
		<description><![CDATA[The third installment of the Hortonworks executive video series features Arun C. Murthy, co-founder of Hortonworks and VP of Apache Hadoop for the Apache Software Foundation. In this video, Arun shares his view of the power of Apache Hadoop and provides some insight into the future direction of [...]]]></description>
			<content:encoded><![CDATA[<p>The third installment of the Hortonworks executive video series features Arun C. Murthy, co-founder of Hortonworks and VP of Apache Hadoop for the Apache Software Foundation. In this video, Arun shares his view of the power of Apache Hadoop and provides some insight into the future direction of MapReduce, including the ability to support alternate programming paradigms.</p>
<p><a title="Interview with Arun C. Murthy, co-founder of Hortonworks and VP of Apache Hadoop for the Apache Software Foundation." href="http://player.vimeo.com/video/40962547?portrait=0&amp;color=81e62e&amp;autoplay=1" rel="shadowbox;width=640;height=360"><img class=" wp-image-3348" title="Apache Hadoop &amp; NextGen MapReduce" src="http://hortonworks.com/wp-content/uploads/2012/02/arun.jpg" alt="" width="160" height="121" /></a></p>
<p><span id="more-4299"></span>If you&#8217;re not already doing so, I strongly suggest that you follow both Arun (<a title="Arun Murthy Apache Hadoop Hortonworks" href="https://twitter.com/#!/acmurthy" target="_blank">@acmurthy</a>) and Hortonworks (<a title="Hortonworks Apache Hadoop" href="https://twitter.com/#!/hortonworks" target="_blank">@hortonworks</a>) on Twitter. You might also want to check out various presentations given by Arun and many of the engineers and executives at Hortonworks now available on <a title="Hortonworks on Slideshare.net" href="http://www.slideshare.net/hortonworks" target="_blank">Slideshare.net</a>.</p>
<p>If you&#8217;re interested in Apache Hadoop training, please note every <a title="Apache Hadoop Training from Hortonworks" href="http://hortonworks.com/training/developing-solutions-for-apache-hadoop/">Developing Solutions using Apache Hadoop</a> class between now and the end of June will include a lunch and learn session with Arun. Of course, you can also attend <a title="Apache Hadoop Summit Hortonworks" href="http://hadoopsummit.org" target="_blank">Hadoop Summit</a> in June to meet the Hortonworks technical and executive team in person.</p>
<p>~ John Kreisa</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/executive-video-series-apache-hadoop-and-next-generation-mapreduce/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Hadoop Observations from the U.K.</title>
		<link>http://hortonworks.com/blog/hadoop-observations-from-the-u-k/</link>
		<comments>http://hortonworks.com/blog/hadoop-observations-from-the-u-k/#comments</comments>
		<pubDate>Mon, 30 Apr 2012 19:00:08 +0000</pubDate>
		<dc:creator>Steve Loughran</dc:creator>
				<category><![CDATA[Hadoop Ecosystem]]></category>
		<category><![CDATA[Industry Happenings]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4279</guid>
		<description><![CDATA[As part of Big Data Week, Dan Harvey of the London Hadoop User Group organised an afternoon session for the usergroup, which we were glad to sponsor, along with Canonical and Facegroup. I had the pleasure of presenting my view of the current and future status of Apache [...]]]></description>
			<content:encoded><![CDATA[<p>As part of Big Data Week, Dan Harvey of the London Hadoop User Group organised an afternoon session for the usergroup, which we were glad to sponsor, along with Canonical and Facegroup. I had the pleasure of <a title="Apache Hadoop Hortonworks" href="http://www.slideshare.net/steve_l/2012-04hadoop-london-hug-v4" target="_blank">presenting my view</a> of the current and future status of Apache Hadoop to an audience that ranged from those curious about Hadoop to heavy users.</p>
<p>Every talk of the day was excellent, from the use cases by DataSift, Mendeley and MusicMetric, to the talk by Francine Bennett of MastodonC on the CO2 footprint of different cloud computing infrastructures, including a <a href="http://www.mastodonc.com/dashboard" target="_blank">live dashboard</a> on the current CO2/hour of many cloud infrastructure sites.</p>
<p>In my discussions with attendees, I was impressed by how broadly Hadoop is starting to be adopted in the U.K. There is adoption from &#8220;pure data&#8221; companies like Mendeley, DataSift, MusicMetric and Last.fm, as well as media companies and financial organisations. London is a centre of finance and data and as such, from a Hadoop perspective, it is a source of data waiting to be stored and mined.</p>
<p><span id="more-4279"></span>The combination of enterprises, web companies and infrastructure developers such as Canonical and (as of this week) Hortonworks means that London and the South of England have a good opportunity to grow their big data community centred around Hadoop.</p>
<p>Steve Loughran<br />
Hortonworks<br />
Bristol, England</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/hadoop-observations-from-the-u-k/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Features in Apache Pig 0.10</title>
		<link>http://hortonworks.com/blog/new-features-in-apache-pig-0-10/</link>
		<comments>http://hortonworks.com/blog/new-features-in-apache-pig-0-10/#comments</comments>
		<pubDate>Fri, 27 Apr 2012 19:02:43 +0000</pubDate>
		<dc:creator>Daniel Dai</dc:creator>
				<category><![CDATA[Apache Hadoop]]></category>
		<category><![CDATA[Pig]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4192</guid>
		<description><![CDATA[Another important milestone for Apache Pig was reached this week with the release of Pig 0.10. The purpose of this blog is to summarize the new features in Pig 0.10. Boolean Data Type Pig 0.10 introduces boolean data type as a first-class Pig data type. Users can use the [...]]]></description>
			<content:encoded><![CDATA[<p>Another important milestone for Apache Pig was reached this week with the <a title="Apache Pig Hortonworks" href="http://pig.apache.org/releases.html" target="_blank">release</a> of Pig 0.10. The purpose of this post is to summarize the new features in Pig 0.10.</p>
<h4>Boolean Data Type</h4>
<p>Pig 0.10 introduces boolean as a first-class Pig data type. You can use the keyword &#8220;boolean&#8221; anywhere a data type is expected, such as in a load-as clause or a type cast.</p>
<p>Here are some sample use cases:</p>
<pre>a = load 'input' as (a0:boolean, a1:tuple(a10:boolean, a11:int), a2);
b = foreach a generate a0, a1, (boolean)a2;
c = group b by a2; -- group by a boolean field</pre>
<p>When loading boolean data using PigStorage, Pig expects the text &#8220;true&#8221; (case-insensitive) for a true value and &#8220;false&#8221; (case-insensitive) for a false value; any other value maps to null. When storing boolean data using PigStorage, a true value is written as the text &#8220;true&#8221; and a false value as the text &#8220;false&#8221;.<br />
<span id="more-4192"></span></p>
<h4>Nested Cross/Foreach</h4>
<p>In Pig 0.10, you can use nested cross and nested foreach statements inside the nested block of a foreach. Here is one example:</p>
<pre>C = cogroup user by uid, session by uid;
D = foreach C {
    crossed = cross user, session;
    filtered = filter crossed by user::region == session::region;
    result = foreach filtered generate processSession(user::age, user::gender, session::ip); -- processSession is a UDF
    generate result;
}</pre>
<p>Note that the maximum nesting depth is 2; that is, a nested foreach statement cannot itself contain a nested plan.</p>
<p>For more information, please refer to the <a title="Apache Pig Foreach Hortonworks" href="http://pig.apache.org/docs/r0.10.0/basic.html#foreach" target="_blank">Foreach section of the Pig documentation</a>.</p>
<h4>JRuby UDF</h4>
<p>In addition to Python and JavaScript, Pig 0.10 lets you write UDFs in JRuby.</p>
<p>To write a JRuby UDF, create a JRuby class that extends PigUdf and add your UDFs as methods of that class. Here is one example:</p>
<pre>require 'pigudf'
class Myudfs &lt; PigUdf
    def concat *input
        input.inject(:+)
    end
end</pre>
<p>There are two ways to define the output schema for a UDF: an annotation or a schema function. Either works for the previous sample. With an annotation:</p>
<pre>class Myudfs &lt; PigUdf
    outputSchema "word:chararray"
    def concat *input
        input.inject(:+)
    end
end</pre>
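<p>or, using a schema function:</p>
<pre>class Myudfs &lt; PigUdf
    # Name the method that computes the output schema for concat.
    outputSchemaFunction :concatSchema
    def concat *input
        input.inject(:+)
    end
    # Receives the input schema and returns the output schema (here, unchanged).
    def concatSchema input
        input
    end
end</pre>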
<p>You can also write algebraic and accumulative UDFs in JRuby, which is not yet the case for other scripting languages. For more information, please refer to the <a title="Apache Hadoop Pig Hortonworks" href="http://pig.apache.org/docs/r0.10.0/udf.html#jruby-udfs" target="_blank">Pig documentation for Writing Ruby UDFs</a>.</p>
<h4>Hadoop 0.23 (a.k.a. Hadoop 2.0) Support</h4>
<p>Pig 0.10.0 supports Hadoop 0.23.X. All unit and end-to-end tests pass with hadoop-0.23. To run Pig with hadoop-0.23, you need to recompile Pig with the hadoopversion flag set to 23:</p>
<pre>ant -Dhadoopversion=23</pre>
<p>You also need to set up all of the environment variables necessary to run the hadoop-0.23 client. In addition, point HADOOP_HOME to HADOOP_COMMON_HOME and make sure $HADOOP_HOME/bin/hadoop exists.</p>
<h4>Performance Improvements</h4>
<p><strong>Map Aggregation</strong></p>
<p>Map aggregation aggregates records in the map task before sending them to the combiner. Because fewer records reach the combiner, less time is spent serializing and deserializing them. It is especially useful for group-by statements with very few group keys. In our experiments, map aggregation reduced the runtime of a map task for a group-by clause by up to 50%.</p>
<p>Map aggregation is turned off by default. To turn it on, set the &#8220;pig.exec.mapPartAgg&#8221; property to true.</p>
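<p>One way to set the property is with a set statement at the top of the script. The sketch below assumes that approach; the input path and field names are made up for illustration:</p>
<pre>-- enable map-side aggregation for this script
set pig.exec.mapPartAgg true;

-- 'visits' and its fields are placeholders
A = load 'visits' as (country:chararray, bytes:long);

-- few distinct group keys: the case where map aggregation helps most
B = group A by country;

C = foreach B generate group, SUM(A.bytes);

dump C;</pre>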
<p>For more information about map aggregation, read the <a title="Apache Pig Hortonworks" href="https://issues.apache.org/jira/browse/PIG-2228" target="_blank">PIG-2228</a> JIRA.</p>
<p><strong>Push Limit into Loader</strong></p>
<p>Pig optimizes limit queries by automatically pushing the limit down to the loader, so only a fraction of the input needs to be scanned.</p>
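<p>No script change is needed to benefit from this. As an illustration, a query of the following shape is the kind that can stop reading early; the input path and schema are placeholders:</p>
<pre>-- 'big_input' is a placeholder for a large data set
A = load 'big_input' as (id:int, name:chararray);

-- the limit directly follows the load, so it can be pushed into the loader
B = limit A 10;

dump B;</pre>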
<h4>Language Enhancements</h4>
<p><strong>Re-aliasing</strong></p>
<p>In a Pig script, you can assign an existing alias to a new name and then refer to the new name:</p>
<pre>A = LOAD 'student' USING PigStorage() AS (name:chararray, age:int, gpa:float);

B = A;

DUMP B;</pre>
<p><strong>Limit/Sample by Expression</strong></p>
<p>The limit and sample statements now accept an expression in addition to a constant. For example:</p>
<pre>a = load 'a.txt';

b = group a all;

c = foreach b generate COUNT(a) as sum;

d = order a by $0;

e = limit d c.sum/100;</pre>
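<p>The sample statement can use a scalar expression in the same way. The following sketch aims to sample roughly 1000 records; it assumes the input has well over 1000 rows, and it uses a floating-point constant so that the division is not truncated to an integer. The alias names are illustrative:</p>
<pre>a = load 'a.txt';

b = group a all;

c = foreach b generate COUNT(a) as num_rows;

-- sampling rate = desired records / total records
d = sample a 1000.0/c.num_rows;</pre>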
<p><strong>Default Split Destination</strong></p>
<p>You can specify an &#8220;otherwise&#8221; destination for the split statement. Split automatically identifies records that don&#8217;t belong to any of the other branches and directs them to the &#8220;otherwise&#8221; destination:</p>
<pre>split a into b if id &gt; 3, c if id &lt; 5, d otherwise;</pre>
<p><strong>TOMAP/TOTUPLE/TOBAG Syntax Support</strong></p>
<p>You can compose a map/tuple/bag within a Pig script:</p>
<pre>B = foreach A generate (name, age);  -- generate tuple

B = foreach A generate [name, age];  -- generate map

B = foreach A generate {name, age};  -- generate bag of single item tuples</pre>
<p><strong>Globbing in Register</strong></p>
<p>Pig now supports globbing in register statements:</p>
<pre>register lib/*.jar</pre>
<h4>UDF Enhancements</h4>
<p><strong>Improvements to PigStorage</strong></p>
<p>We added a couple of options to PigStorage:</p>
<p>* -schema</p>
<p>This option stores a .pig_schema file alongside the data files when storing with PigStorage. When loading data with PigStorage, Pig checks for the existence of .pig_schema and uses it automatically:</p>
<pre>store a into 'output_dir' using PigStorage('\t', '-schema');</pre>
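<p>A later script can then load the same directory without an AS clause. This sketch assumes output_dir was written with the -schema option as shown above:</p>
<pre>-- no schema given here; Pig picks it up from output_dir/.pig_schema
b = load 'output_dir' using PigStorage('\t');

-- prints the schema that was stored with the data
describe b;</pre>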
<p>* -tagsource</p>
<p>With this option, PigStorage adds a new column, INPUT_FILE_NAME, which contains the name of the input file each record was read from.</p>
<pre>a = load 'input_dir' using PigStorage('\t', '-tagsource');</pre>
<p>The first column of the output will be INPUT_FILE_NAME.</p>
<p><strong>Turn off the Write Ahead Log for HBaseStorage</strong></p>
<p>You can now use the &#8220;-noWAL&#8221; option in HBaseStorage to turn off the write-ahead log while doing bulk loads into HBase:</p>
<pre>STORE myalias INTO 'MyTable' USING org.apache.pig.backend.hadoop.hbase.HBaseStorage('mycolumnfamily:field1 mycolumnfamily:field2','-noWAL');</pre>
<p><strong>JsonLoader/JsonStorage</strong></p>
<p>We added a new pair of UDFs to load and store data in JSON format. Note that JsonLoader does not auto-detect the schema of your input data, so you still need to tell JsonLoader the schema. For example:</p>
<pre>a = load 'input.json' using JsonLoader('a0:int,a1:{(a10:int,a11:chararray)},a2:(a20:double,a21:bytearray),a3:[chararray]');</pre>
<p>However, if you stored the data using JsonStorage, a schema file is stored along with the data. In this scenario, you don&#8217;t have to specify the schema for JsonLoader. JsonLoader will detect the schema file and use it.</p>
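<p>As a sketch of that round trip, where json_dir is a placeholder path and a is any relation with a known schema:</p>
<pre>-- writes the data as JSON, along with a schema file
store a into 'json_dir' using JsonStorage();

-- in a later script: no schema argument needed, it is read from the schema file
b = load 'json_dir' using JsonLoader();

describe b;</pre>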
<p><strong>Bloom Filters</strong></p>
<p>Bloom filters are a common way to select a limited set of records before moving data for a join or other heavyweight operation. Pig includes two UDFs: BuildBloom to build a bloom filter and Bloom to use the bloom filter in a filter statement. At present, users need to call both UDFs explicitly to get the full benefit of bloom filters. In the future, we will include them in the optimizer so that large join queries can use bloom filters automatically.</p>
<p>Please read the <a title="Apache Pig Hortonworks" href="https://issues.apache.org/jira/browse/PIG-2328" target="_blank">PIG-2328</a> JIRA for more information about bloom filters.</p>
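<p>The following sketch shows how the two UDFs might be combined to shrink the large side of a join. The input paths, field names, and the BuildBloom arguments (hash type, expected number of elements, desired false positive rate) are illustrative assumptions; check the BuildBloom and Bloom Javadoc for the exact constructor options:</p>
<pre>-- step 1: build a bloom filter on the join key of the small input
define bb BuildBloom('jenkins', '1000', '0.01');
small = load 'small_input' as (x:chararray, y:int);
grpd = group small all;
fltr = foreach grpd generate bb(small.x);
store fltr into 'mybloom';

-- force the statements above to run before the filter is used
exec;

-- step 2: filter the large input with the bloom filter, then join
define bloom Bloom('mybloom');
small = load 'small_input' as (x:chararray, y:int);
large = load 'large_input' as (a:chararray, b:int);
flarge = filter large by bloom(a);
joined = join small by x, flarge by a;
store joined into 'results';</pre>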
<p><strong>Implement UDF by Simulation</strong></p>
<p>In the chain of EvalFunc -&gt; Accumulator -&gt; Algebraic, if you implement a more complex UDF (right-hand side), you can use simulation to get a simpler UDF (left-hand side) for free. You can achieve this by using AlgebraicEvalFunc or AccumulatorEvalFunc. Check <a href="http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/AlgebraicEvalFunc.html">AlgebraicEvalFunc.html</a> and <a href="http://pig.apache.org/docs/r0.10.0/api/org/apache/pig/AccumulatorEvalFunc.html">AccumulatorEvalFunc.html</a> for details.</p>
<h4>Other Improvements</h4>
<p><strong>Sparse Joins</strong></p>
<p>Pig 0.10 introduces a new join type: &#8216;merge-sparse&#8217;. This is useful when both joined tables are pre-sorted and indexed, and the right-hand table has few (less than 1% of its total) matching keys. Further detail on sparse joins is available in the <a title="Apache Pig Hortonworks" href="http://pig.apache.org/docs/r0.10.0/perf.html#merge-sparse-joins" target="_blank">Pig documentation</a>.</p>
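<p>A sparse merge join is requested with a using clause on the join. The sketch below assumes both inputs are sorted on the first field, that the Piggybank jar is available at the path shown, and that its IndexedStorage loader is used to provide the index; the paths are placeholders:</p>
<pre>-- the jar location is an assumption; adjust it to your installation
register piggybank.jar;

a = load 'sorted_input1' using org.apache.pig.piggybank.storage.IndexedStorage('\t', '0');

b = load 'sorted_input2' using org.apache.pig.piggybank.storage.IndexedStorage('\t', '0');

-- b is the right-hand table with few matching keys
c = join a by $0, b by $0 using 'merge-sparse';

store c into 'results';</pre>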
<p><strong>Complete S3 Support</strong></p>
<p>In Pig 0.10, every component of a Pig script can be in HDFS or in Amazon Web Services&#8217; S3. This includes the Pig script file, dependent jars, parameter files, macros, scripting UDFs, etc.</p>
<p><strong>Kill Hadoop Job</strong></p>
<p>If you kill a Pig job using Ctrl-C or &#8220;kill&#8221;, Pig will now kill all associated Hadoop jobs currently running. This is applicable to both grunt mode and non-interactive mode.</p>
<h4>Conclusion</h4>
<p>Remember to check out <a title="Apache Pig Hortonworks" href="http://pig.apache.org/releases.html" target="_blank">Pig 0.10</a> when you get a chance. Also, don&#8217;t forget to register for <a title="Apache Hadoop Summit Hortonworks" href="http://hadoopsummit.org" target="_blank">Hadoop Summit 2012</a>. There will be some useful Pig presentations including my session with Thejas Nair, <a title="Apache Hadoop Pig Hortonworks" href="http://hadoopsummit.org/program/#session20" target="_blank">Pig Programming is More Fun: New Features in Pig</a>.</p>
<p>~ Daniel Dai</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/new-features-in-apache-pig-0-10/feed/</wfw:commentRss>
		<slash:comments>4</slash:comments>
		</item>
		<item>
		<title>Snapshots for HDFS</title>
		<link>http://hortonworks.com/blog/snapshots-for-hdfs/</link>
		<comments>http://hortonworks.com/blog/snapshots-for-hdfs/#comments</comments>
		<pubDate>Wed, 25 Apr 2012 18:08:37 +0000</pubDate>
		<dc:creator>Hari Mankude</dc:creator>
				<category><![CDATA[Apache Hadoop]]></category>
		<category><![CDATA[HDFS]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=4165</guid>
		<description><![CDATA[This blog covers our on-going work on Snapshots in Apache Hadoop HDFS. In this blog, I will cover the motivations for the work, a high level design and some of the design choices we made. Having seen snapshots in use with various filesystems, I believe that adding snapshots [...]]]></description>
			<content:encoded><![CDATA[<p>This post covers our ongoing work on snapshots in Apache Hadoop HDFS: the motivations for the work, a high-level design, and some of the design choices we made. Having seen snapshots in use with various filesystems, I believe that adding snapshots to Apache Hadoop will be hugely valuable to the Hadoop community. With luck this work will be available to Hadoop users in late 2012 or 2013.</p>
<p>A <strong><em>snapshot</em></strong> is a point-in-time image of the entire filesystem or a subtree of a filesystem. Some of the scenarios where snapshots are very useful:</p>
<ol>
<li><strong>Protection against user errors:</strong> An admin sets up a process to take read-only (RO) snapshots periodically in a rolling manner so that there are always x RO snapshots on HDFS. If a user accidentally deletes a file, the file can be restored from the latest RO snapshot that contains it.</li>
<li><strong>Backup:</strong> An admin wants to back up the entire file system, a subtree of the file system, or just a file. Depending on the requirements, the admin takes an RO snapshot and uses it as the starting point of a full backup. Incremental backups are then taken by doing a diff between two snapshots.</li>
<li><strong>Experimental/Test setups:</strong> A user wants to test an application against the main dataset. Normally, without a full copy of the dataset, this is a very risky proposition because the test setup can corrupt or overwrite production data. The admin creates a read-write (henceforth referred to as RW) snapshot of the production dataset and assigns it to the user for the experiment. Changes made to the RW snapshot are not reflected in the production dataset.</li>
<li><strong>Disaster Recovery:</strong>  RO Snapshots can be used to create a consistent point in time image for replication and this can be copied over to remote site for Disaster Recovery.</li>
</ol>
<h4>High Level Requirements</h4>
<ol>
<li>Read-only (RO) snapshots: These are immutable copies of underlying elements of the file system.</li>
<li>Read-write (RW) snapshots: RW snaps can be modified by a user.</li>
<li>Support for taking snapshots of the entire namespace, or a subtree.</li>
<li>Support for a reasonable number of snapshots in a single namenode.</li>
<li>Snapshots should be easy to browse using standard commands and tools, and copying of data from a snapshot should work with standard Hadoop commands and API.</li>
</ol>
<h4>High Level approaches</h4>
<p>We considered two options for snapshots.</p>
<p><strong>Option #1:</strong> Both the datanodes and the namenode are aware of snapshots and save snapshot state internally. A datanode knows that some of its blocks belong to snapshot files.</p>
<p><strong>Option #2:</strong> Only the namenode is aware of snapshots. A datanode does not know that some of its blocks are owned by snapshots of the original file.</p>
<p>We selected option #2 to keep the design simple. Additionally, taking snapshots is very fast with option #2. The datanode does not know anything about snapshots and is not aware of block ownership issues between the root file system and snapshots. Restricting the changes to the namenode simplifies the design immensely because it eliminates the need for distributed coordination.</p>
<h4>Creating and Deleting Snapshots</h4>
<p>A key requirement is that snapshots be very easy to create and delete. Snapshot creation and deletion are admin-only capabilities. To create a snapshot, one specifies a snapshot name, a path to the root of the subtree whose snapshot is to be taken, and whether the snapshot is read-only or read-write. Deleting a snapshot requires just the snapshot name. A command to list all the snapshots in the filesystem will be provided.</p>
<h4>Accessing Directories and Files in a Snapshot</h4>
<p>Snapshots can be referenced with regular HDFS path names with a reserved string .snapshot_&lt;name&gt;:</p>
<p><strong><em> hdfs://host:port/pathOfSnapshot/.snapshot_&lt;name&gt;/restOfPathInSnapshot</em></strong></p>
<p>This has the benefit that snapshots can be referenced with all existing Hadoop commands and APIs that take a pathname by adding a reserved snapshot string to the pathname.</p>
<p>Example: consider a directory structure of /a/b/c/foo.txt, where an admin has created a snapshot named hdfs1 at /a/b. To list the file as it exists in snapshot hdfs1, the command would be:</p>
<p><em>hadoop dfs -ls /a/b/<strong>.snapshot_hdfs1</strong>/c/foo.txt</em></p>
<p>To copy foo.txt from the snapshot to /foobar, the command would be:</p>
<p><em>hadoop dfs -cp /a/b/<strong>.snapshot_hdfs1</strong>/c/foo.txt /foobar/.</em></p>
<p>One caveat: an RO snapshot is immutable, so operations such as creating a new file, deleting a file, creating a new directory, or renaming a file or directory will fail when executed on the snapshot branch.</p>
<h4>Conclusion</h4>
<p>Snapshots are a very useful feature to have in a mature filesystem. This is a work in progress, and we have implemented a functional prototype. The first version of this feature will support RO snapshots only. Support for RW snapshots will be added in subsequent releases. Several further features can be built on snapshots, such as a time to live with automatic deletion, schedule-based snapshot creation, marking specific directories as snapshot-worthy, quota-based restrictions on the space used by RW snapshots, and delegating the authority to create and delete snapshots at specific locations to users.</p>
<p>To track the development of the snapshots feature in HDFS, please follow the JIRA <a title="Apache Hadoop Hortonworks" href="https://issues.apache.org/jira/browse/HDFS-2802" target="_blank">HDFS-2802</a>.</p>
<p>~ Hari Mankude</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/snapshots-for-hdfs/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Executive Video Series: Overview of Hortonworks Data Platform</title>
		<link>http://hortonworks.com/blog/executive-video-series-overview-of-hortonworks-data-platform/</link>
		<comments>http://hortonworks.com/blog/executive-video-series-overview-of-hortonworks-data-platform/#comments</comments>
		<pubDate>Mon, 16 Apr 2012 12:48:25 +0000</pubDate>
		<dc:creator>John Kreisa </dc:creator>
				<category><![CDATA[Ambari]]></category>
		<category><![CDATA[Hortonworks Topics]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=3990</guid>
		<description><![CDATA[We just released the second video in the Hortonworks Executive Series. This one features Matt Foley, Test and Release Engineering Manager for Hortonworks. In this video, Matt provides an overview of Hortonworks Data Platform (HDP), including a summary of the Apache Hadoop components included in the distribution and [...]]]></description>
			<content:encoded><![CDATA[<p>We just released the second video in the Hortonworks Executive Series. This one features Matt Foley, Test and Release Engineering Manager for Hortonworks.</p>
<p>In this video, Matt provides an overview of <a title="Hortonworks Data Platform " href="http://hortonworks.com/technology/hortonworksdataplatform/">Hortonworks Data Platform</a> (HDP), including a summary of the Apache Hadoop components included in the distribution and the testing involved in the release process. Matt also provides an overview of <a title="Apache Ambari Hadoop Monitoring Hortonworks" href="http://incubator.apache.org/ambari/" target="_blank">Apache Ambari</a>, an open source project that is adding monitoring and management capabilities to Apache Hadoop.</p>
<p><span id="more-3990"></span></p>
<div class="video_cell"><a title="Matt Foley talks about Hortonworks Data Platform and the testing processes involved in its release. Also discussed is Apache Ambari, an open source project aimed at adding monitoring and management functionality for Apache Hadoop." href="http://player.vimeo.com/video/40236403?portrait=0&amp;color=81e62e&amp;autoplay=1" rel="shadowbox;width=640;height=360"><img class=" wp-image-3348" title="Overview of Hortonworks Data Platform" src="http://hortonworks.com/wp-content/uploads/2012/02/overview-thumb.jpg" alt="" width="160" height="121" /></a></div>
<p>If you are interested in learning more about Hortonworks technology, please visit the <a title="Hortonworks Data Platform" href="http://hortonworks.com/technology/hortonworksdataplatform/">HDP page</a> or attend one of the live or on-demand <a title="Apache Hadoop Webinars Hortonworks" href="http://hortonworks.com/webinars/">Webinars</a>. Even better, consider attending the <a title="Apache Hadoop Summit Hortonworks" href="http://hadoopsummit.org" target="_blank">Hadoop Summit</a> conference in June, where you will have a chance to learn from many of the core Apache Hadoop developers and other community leaders. You should also consider attending a Hortonworks <a title="Apache Hadoop Training Hortonworks" href="http://hortonworks.com/training/">training</a> class, either in conjunction with Hadoop Summit or at another time and location more convenient to your schedule.</p>
<p>~John Kreisa</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/executive-video-series-overview-of-hortonworks-data-platform/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Hortonworks Welcomes Citrix and CloudStack to the Apache Community</title>
		<link>http://hortonworks.com/blog/hortonworks-welcomes-citrix-and-cloudstack-to-the-apache-community/</link>
		<comments>http://hortonworks.com/blog/hortonworks-welcomes-citrix-and-cloudstack-to-the-apache-community/#comments</comments>
		<pubDate>Tue, 03 Apr 2012 12:04:39 +0000</pubDate>
		<dc:creator>Eric Baldeschwieler</dc:creator>
				<category><![CDATA[Hortonworks Topics]]></category>
		<category><![CDATA[Industry Happenings]]></category>

		<guid isPermaLink="false">http://hortonworks.com/?p=3743</guid>
		<description><![CDATA[We are pleased to support today&#8217;s announcement from Citrix that they have contributed CloudStack to the Apache community. For those new to CloudStack, it is an open source cloud computing software that helps organizations build and manage cloud infrastructures. It is similar to Amazon Web Services EC2 environment [...]]]></description>
			<content:encoded><![CDATA[<p>We are pleased to support today&#8217;s announcement from <a title="Citrix Hortonworks" href="http://www.citrix.com" target="_blank">Citrix</a> that they have contributed <a title="CloudStack Hortonworks" href="http://www.cloudstack.org/" target="_blank">CloudStack</a> to the Apache community. For those new to CloudStack, it is open source cloud computing software that helps organizations build and manage cloud infrastructures. It is similar to the Amazon Web Services EC2 environment, except that it enables organizations to build public, private or hybrid cloud environments using their own pooled computing resources.</p>
<p>Citrix <a title="Citrix CloudStack Apache" href="http://www.citrix.com/English/NE/news/news.asp?newsID=2323072" target="_blank">announced</a> today that they were reaffirming their commitment to open source by working with the Apache Software Foundation to make CloudStack 3 an Apache project, released under Apache Software License 2.0. This is yet further acknowledgement that Apache is the logical home for open source projects that are transforming the enterprise software industry. As a Gold Sponsor of the ASF and major contributor to Apache projects, Hortonworks is pleased that leading vendors such as Citrix are recognizing the value that Apache can provide in terms of accelerating development and innovation and driving adoption as the preferred destination for enterprise-class open source software.</p>
<p><span id="more-3743"></span>Today&#8217;s announcement also highlights the great synergies between CloudStack and Apache Hadoop. As the first cloud platform in the industry to join the ASF, CloudStack becomes the logical cloud choice for organizations that prefer an open source option for their cloud and big data infrastructure. Hortonworks is excited to work with the CloudStack project team to identify opportunities where Hadoop components can be used to back Cloud APIs and also where Cloud APIs can be used to deploy Hadoop.</p>
<p>Today marks a win for the Apache Software Foundation, CloudStack, Apache Hadoop and Hortonworks. The leading open source cloud community with more than 30,000 active members today joins forces with the Apache Software Foundation. This announcement is also further validation that Apache projects are the right place for developing software for enterprise grade platforms to meet the demanding cloud and Big Data needs of the enterprise.</p>
<p>~E14</p>
]]></content:encoded>
			<wfw:commentRss>http://hortonworks.com/blog/hortonworks-welcomes-citrix-and-cloudstack-to-the-apache-community/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
	</channel>
</rss>

