Representatives from Twitter, Yahoo, LinkedIn, Hortonworks and IBM met at Twitter HQ on Thursday to talk HCatalog. Committers from HCatalog, Pig and Hive were on hand to discuss the state of HCatalog and its future.
Apache HCatalog is a table and storage management service for data created using Apache Hadoop.
A central theme was using HCatalog to share legacy data and data in varied formats such as TSV, JSON, RCFile, Protobuf, Thrift and Avro across diverse tools like Pig, Hive, Cascading, SQL-H and JAQL.
A key issue discussed was the mechanics of HCatalog’s integration with Hive as the project develops and matures. Some HCatalog users use Hive and some do not, but HCatalog relies on the Hive metastore regardless. As usual in open source, each organization has its own set of problems, perspectives and priorities, and the discussion centered on finding common ground and a shared path forward.
One thing was clear: HCatalog is HOT! An increasing number of organizations are adopting HCatalog for managing data and systems integration around Hadoop.