HCatalog Introduction

Summary -

This topic introduces HCatalog in detail.

HCatalog is a storage and table management layer for Hadoop. It lets users of different data processing tools, such as Pig and MapReduce, read and write data on the grid more easily.

Users can load tables directly from Pig or MapReduce without re-defining the input schemas. HCatalog exposes the tabular data in the Hive metastore to other Hadoop applications. Apache HCatalog is a project that enables non-Hive scripts to access Hive tables.

Users need not worry about where or in what format their data is stored. HCatalog's table abstraction presents users with a relational view of data in the Hadoop Distributed File System (HDFS).

HCatalog can display data stored as RCFile, text files, or sequence files in a tabular view. It also provides APIs that let external systems access the metadata of these tables.
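One way an external system can reach table metadata is over HTTP through the WebHCat (formerly Templeton) REST server that ships with HCatalog. The sketch below only builds the request URLs and sends nothing. The host, the port 50111, the table name web_logs, and the user analyst are assumptions chosen for illustration, and the helper function names are hypothetical, not part of any HCatalog library.

```python
from urllib.parse import quote, urlencode

# Assumed endpoint: WebHCat commonly listens on port 50111 under /templeton/v1.
WEBHCAT_BASE = "http://localhost:50111/templeton/v1"


def table_metadata_url(database, table, user, base=WEBHCAT_BASE):
    """Build the URL that describes one table (to be sent as an HTTP GET)."""
    path = f"{base}/ddl/database/{quote(database, safe='')}/table/{quote(table, safe='')}"
    return f"{path}?{urlencode({'user.name': user})}"


def list_tables_url(database, user, base=WEBHCAT_BASE):
    """Build the URL that lists the tables in a database (HTTP GET)."""
    path = f"{base}/ddl/database/{quote(database, safe='')}/table"
    return f"{path}?{urlencode({'user.name': user})}"


if __name__ == "__main__":
    # Hypothetical database, table, and user names.
    print(table_metadata_url("default", "web_logs", "analyst"))
    print(list_tables_url("default", "analyst"))
```

An HTTP client would issue a GET against either URL and receive a JSON description in response. Whether authentication beyond user.name is needed depends on how the cluster is configured.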

Features -

HCatalog functions that provide a table abstraction to MapReduce users.
HCatalog functions that provide a table abstraction to Pig users.
A REST interface to the metadata server that allows new data management frameworks to create, update, delete, and explore HCatalog tables.
HCatalog interfaces that allow parallel reads and writes of records in and out of tables.
HCatalog functions that allow readers to push down partition-pruning predicates and column projections.
HCatalog support for storing data in binary format without translating it.
HCatalog support for adding columns to partitions without restating existing stored data.
HCatalog support for presenting HBase tables as HCatalog tables.
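
To make the partition pruning and column projection feature concrete, here is a toy model in plain Python. It does not use HCatalog's API. The partition keys, column names, and the scan function are all invented for illustration. It only shows the idea that a reader passes its filter and its column list down to the storage layer, so that unneeded partitions are never opened and unneeded columns are never returned.

```python
# Toy in-memory "table": each partition key maps to the rows stored under it.
# Partition keys, column names, and values are invented for illustration.
PARTITIONS = {
    "2024-01-01": [{"user": "ann", "url": "/home", "bytes": 512}],
    "2024-01-02": [
        {"user": "bob", "url": "/cart", "bytes": 2048},
        {"user": "ann", "url": "/pay", "bytes": 128},
    ],
}


def scan(partitions, partition_filter, columns):
    """Yield projected rows, reading only partitions that pass the filter."""
    for key, rows in partitions.items():
        if not partition_filter(key):
            continue  # pruned: the rows of this partition are never read
        for row in rows:
            # Projection: keep only the requested columns.
            yield {name: row[name] for name in columns}


if __name__ == "__main__":
    wanted = scan(PARTITIONS, lambda date: date == "2024-01-02", ["user", "bytes"])
    for row in wanted:
        print(row)
```

In this sketch the predicate is evaluated on the partition key alone, which is what lets whole partitions be skipped. A filter on an ordinary column would still have to read the rows.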