HBase - Executive Summary

HBase - Executive Summary

By  digitalART2

Hadoop is the de facto standard for Big Data.  HBase is the Hadoop database, it is designed to host very large tables atop clusters of industrial standard servers with effective random access. 


Key Advantages of Hadoop and HBase

Hadoop is the popular data storage and big data analysis platform. Large and successful companies are using it to do powerful analysis. Hadoop offers two important services: It can store any kind of data from any source, inexpensively and at very large scale, and it can do very sophisticated analysis of that data easily and quickly.  Hadoop and HBase deliver several key advantages:


Store anything and no information is lost  
Hadoop stores data in its native format without forcing the transformation when data arrives, therefore no information is lost. Downstream analyses run with no loss of fidelity.  Hadoop allows the data analyst to choose how and when to digest, analyze and transform data.

Extremely cost effective to handle Big Data 
Hadoop runs on industrial standard hardware. It means that the cost per terabyte, for both storage and processing, is much lower than on older systems. HBase makes efficient use of disk space by support pluggable compression algorithms.  Adding or removing storage capacity is simple. You can dedicate new hardware to a cluster incrementally.

Use with confidence 
The user community of Hadoop and HBase is global, active and diverse. Companies across many industries participate, including social networking, media, financial services, telecommunications, retail, health care and others (for more information, please read: Who uses HBase and Hadoop).

Proven at scale
You may not have petabytes of data that you need to analyze today, nevertheless, you can deploy Hadoop with confidence because companies like Facebook, Yahoo! and others run very large Hadoop instances managing enormous amounts of data. When you adopt a platform for data management and analysis, you are making a commitment that you will have to live with for years. The success of the biggest Web companies in the world demonstrates that Hadoop can grow as your business does.


Key features of HBase:

HBase is the Hadoop database, "a distributed, persistent, strictly consistent storage system with near-optimal write and excellent read performance, and it makes efficient use of disk space by supporting pluggable compression algorithms that can be selected based on the nature of the data in specific column families ...... and remove the wait for deadlocks-related pauses experienced with other systems"   - Lars George

  • Full Hadoop integration:  fully supports HDFS and Hadoop MapReduce
  • Highly fault tolerance !
  • Built-in scalability !
  • Built-in load-balancing !
  • Strongly consistent !
  • Automatic versioning !
  • Automatic RegionServer failover
  • Flexible secondary index solutions
  • Automatic sharding: automatically split big tables and re-distribute them as your data grows.
  • Supports massively parallelized processing
  • Provides Push-Down predicates
  • Java Client API: HBase supports an easy to use Java API for programmatic access.
  • Thrift/REST API: HBase also supports Thrift and REST for non-Java front-ends.
  • Block Cache and Bloom Filters: HBase supports a Block Cache and Bloom Filters for high volume query optimization.
  • Operational Management: HBase provides build-in web-pages for operational insight as well as JMX metrics.

One of the most important features is the Strongly consistent, by using multiversioning it can help you to avoid edit conflicts caused by concurrent decoupled processes, provides near-optimal write and excellent read performance (for more information about consistent, please read "All Things Distributed" written by Mr. Werner Vogels CTO - Amazon.com).


Use cases of HBase and Hadoop

Simple numerical summaries – average, minimum, sum – were sufficient for the business problems of the 1980s and 1990s. Large amounts of complex data, though, require new techniques. Recognizing customer preferences requires analysis of purchase history, but also a close examination of browsing behavior and products viewed, comments and reviews logged on a web site, and even complaints and issues raised with customer support staff. Predicting behavior demands that customers be grouped by their preferences, so that behavior of one individual in the group can be used to predict the behavior of others. The algorithms involved include natural language processing, pattern recognition, machine learning and more. These techniques run very well on Hadoop.

Use case of HBase and Hadoop, as follows but not limited to:

  • Recommendation Engine - How can companies predict customer preferences? Click-stream analysis, log analysis at web scale
  • Customer Churn Analysis - How to win more customers and avoid really losing customers?  Sophisticated data mining 
  • AD Targeting - How can companies increase campaign efficiency? Marketing automation, business intelligence
  • Point-of-sales Transaction Analysis - How do retailers target promotions guaranteed to make you buy?
  • Analyzing Network Data to Predict - How can organizations use machine generated data to identify potential trouble?
  • Threat Analysis - How can companies detect threats and fraudulent activity? Crawling, text processing
  • Trade Surveillance - How can a bank spot the rogue trader?
  • Search Quality - What’s in your search?
  • Data Sandbox - What can you do with new data? Big data archiving and sandbox, including of relational/tabular data
  • GIS - 3D maps, spatial applications
  • Real-time Customer Segmentation - Marketing analytics 

When you need random, realtime read/write access to your Big Data, you may consider HBase now.

Please feel free to contact us if you have any queries.

PostgreSQL, Open Source, database, Oracle, SQLServer, MYSQL