Hadoop is the de facto standard for Big Data. HBase is the Hadoop database, it is designed to host very large tables atop clusters of industrial standard servers with effective random access.
Key Advantages of Hadoop and HBase
Hadoop is the popular data storage and big data analysis platform. Large and successful companies are using it to do powerful analysis. Hadoop offers two important services: It can store any kind of data from any source, inexpensively and at very large scale, and it can do very sophisticated analysis of that data easily and quickly. Hadoop and HBase deliver several key advantages:
• Store anything and no information is lost
Hadoop stores data in its native format without forcing the transformation when data arrives, therefore no information is lost. Downstream analyses run with no loss of fidelity. Hadoop allows the data analyst to choose how and when to digest, analyze and transform data.
• Extremely cost effective to handle Big Data
Hadoop runs on industrial standard hardware. It means that the cost per terabyte, for both storage and processing, is much lower than on older systems. HBase makes efficient use of disk space by support pluggable compression algorithms. Adding or removing storage capacity is simple. You can dedicate new hardware to a cluster incrementally.
• Use with confidence
The user community of Hadoop and HBase is global, active and diverse. Companies across many industries participate, including social networking, media, financial services, telecommunications, retail, health care and others (for more information, please read: Who uses HBase and Hadoop).
• Proven at scale
You may not have petabytes of data that you need to analyze today, nevertheless, you can deploy Hadoop with confidence because companies like Facebook, Yahoo! and others run very large Hadoop instances managing enormous amounts of data. When you adopt a platform for data management and analysis, you are making a commitment that you will have to live with for years. The success of the biggest Web companies in the world demonstrates that Hadoop can grow as your business does.
Key features of HBase:
HBase is the Hadoop database, "a distributed, persistent, strictly consistent storage system with near-optimal write and excellent read performance, and it makes efficient use of disk space by supporting pluggable compression algorithms that can be selected based on the nature of the data in specific column families ...... and remove the wait for deadlocks-related pauses experienced with other systems" - Lars George
One of the most important features is the Strongly consistent, by using multiversioning it can help you to avoid edit conflicts caused by concurrent decoupled processes, provides near-optimal write and excellent read performance (for more information about consistent, please read "All Things Distributed" written by Mr. Werner Vogels CTO - Amazon.com).
Use cases of HBase and Hadoop
Simple numerical summaries – average, minimum, sum – were sufficient for the business problems of the 1980s and 1990s. Large amounts of complex data, though, require new techniques. Recognizing customer preferences requires analysis of purchase history, but also a close examination of browsing behavior and products viewed, comments and reviews logged on a web site, and even complaints and issues raised with customer support staff. Predicting behavior demands that customers be grouped by their preferences, so that behavior of one individual in the group can be used to predict the behavior of others. The algorithms involved include natural language processing, pattern recognition, machine learning and more. These techniques run very well on Hadoop.
Use case of HBase and Hadoop, as follows but not limited to:
When you need random, realtime read/write access to your Big Data, you may consider HBase now.
Please feel free to contact us if you have any queries.