Comments on Lineland, Lars George's blog (feed last updated 2023-06-22).

Unknown (2015-12-22):
Hi Lars,
Are you still doing HBase consulting? Just curious to know: how successful were you in your endeavor?

Anonymous (2014-03-27):
Very informative. I searched the net for this level of detail but found only sparse data about the internals. This article helped me connect the various dots. Thanks, Lars, for such an amazing article.

Pi (2014-01-08):
Yes, merge is using B+tree
in its own RegionServer.
Yes.

Pi (2014-01-08):
In the HMaster.
From ZooKeeper.

Pi (2014-01-08):
HFile.
You mean the memstore? Yes.
HBase will split it into another region on the same RegionServer.

Pi (2014-01-08):
Per RegionServer.

hussain jamali (2013-09-21):
Nice comparison/discussion.
Regarding metadata, HCatalog (merged with Hive as of v11) can be used to access the metadata defined in the Hive metastore from Pig. It helps manage your metadata across Pig, Hive, and MapReduce.

Wallis Dudhnath (2013-08-14):
Very good analysis and summary. I look at Hive as being an open-sourced DWS for the Hadoop framework. Pig/Pig Latin is a data-flow language geared toward big-data information streams.
VBR/ Wallis Dudhnath

Anonymous (2013-06-05):
Yes, you are right.

Anonymous (2013-06-05):
This article describes HFile v1. In v2 this has changed to multi-level indexing with a B+tree-like data structure for efficient processing, and the index has moved to the block level.

Abhishek (2013-04-30):
Lars, great write-up. It's quite informative. I have a problem that is actually counter-intuitive. I have an HBase/Hadoop cluster with 8 nodes. All the tables I store fit into a single region, so any time I scan a table it hits a single node. The cluster is absolutely idle and the tables get no writes at all. If I run an HBase shell on one of the nodes and scan one of the tables, the performance varies 6x depending on where I scan it. If I scan from any machine in the cluster that is not the RegionServer hosting the region for the table, it runs much faster than if I scan from the machine that actually hosts that region. All other nodes are fine, and this happens regardless of which table I choose to scan. These are table scans from the HBase shell, not even a MapReduce job. Any ideas on why this might be?
I have HBase 0.92 installed. Thanks.

Anonymous (2013-03-29):
One question: in the first big picture, shouldn't the HLog be per RegionServer, not per region?

Oak (2013-03-08):
Hi Lars,
Thank you for this article. I am building a Hadoop system that will include HBase. We plan to serve multiple purposes with the same data: there will be HBase, but MapReduce as well. You mention the use of HFiles — does that mean I have to plan space for both HFiles and MapFiles for the same data (i.e. doubling the storage volume) if I use HBase and MapReduce on the same Hadoop cluster?
Thank you,
Oner

Anonymous (2012-12-07):
I am working on HBase, and I have a question about how HBase stores data in sorted order via an LSM tree. As I understand it, HBase uses an LSM tree for large-scale data processing: when data comes from a client, it is first stored sequentially in memory, then sorted and written out as a B-tree-like store file, which is later merged with the on-disk B-tree (of keys). Is that correct? Am I missing something? If yes, then in a cluster environment there are multiple RegionServers taking client requests. In that case, how do all the HLogs (one per RegionServer) get merged with the on-disk B-tree? Or does an HLog only merge its data with HFiles of the same RegionServer?

Anonymous (2012-11-27):
Excellent and informative post.

א. פ. (2012-11-12):
Great article, thanks! I was actually struggling to understand how HBase deals with data locality. This has been helpful.

tsuna (2012-09-25):
I find it ridiculous that we need to configure Hadoop to allow this many threads. And I would also argue that HBase shouldn't die just because a DataNode was in a bad mood and decided to run out of "xcievers" (sic).

Anonymous (2012-07-12):
Hi,
I have a few questions.

Q1: In an RDBMS we have multiple DB schemas / Oracle user instances. Similarly, can we have multiple DB schemas in HBase? If yes, can we have multiple schemas on one Hadoop/HBase cluster? If multiple schemas are possible, how do we define them — through configuration or programmatically?

Q2: Can we use the same column-family name in multiple tables? If yes, does having the same column-family name in multiple tables affect performance?

Q3: Sequential keys improve read performance and random keys improve write performance. Which way should one go?

Q4: What are the best practices for improving Hadoop + HBase performance?

Q5: If one program is deleting a table while another program is accessing a row of that table, what is the impact? Can we take some sort of lock while reading or while deleting a table?

Q6: Since everything in the application is in byte form, what happens if the HBase database and the application use different character sets? Can we sync both to a particular character set through configuration or programmatically?

Regards,
Rashmi

Anonymous (2012-06-26):
Thanks a lot! It's a really great blog! Any details on the magic part of the header?

Nantacoben (2012-06-14):
@anti neutrino: the default block size for HFiles is 64 KB. As George mentioned, smaller block sizes are better for random read access and larger block sizes are better for sequential reads such as scans. I've seen some people set it as high as 1 GB for their purposes, but your configuration may vary.

anti neutrino (2012-06-07):
Hi,
This is really insightful. I have one query though: what is the ideal or recommended HDFS block size for storing HBase files? My guess is that the smaller the HDFS block size, the better for HBase performance.

Yash Ganthe (2012-05-18):
If a record is like {
Name: "ABC"
Address: "XYZ"
Number: "123"
}
does HBase store the name of each column with every row, or does it store it once for the table and reference it from the individual rows?
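The comment above about HFile v2 moving to multi-level, B+tree-like indexing can be illustrated with a toy two-level block index: a root index of first keys points to leaf index blocks, which in turn point to data blocks, and lookup does one binary search per level. This is a sketch of the general idea only, not HBase's actual index encoding; all names here are made up.

```python
from bisect import bisect_right

def build_index(data_blocks, fanout):
    """data_blocks: sorted list of (first_key, block) pairs."""
    # Group data-block entries into leaf index blocks of `fanout` entries each.
    leaves = [data_blocks[i:i + fanout] for i in range(0, len(data_blocks), fanout)]
    # The root index holds the first key of each leaf index block.
    root = [leaf[0][0] for leaf in leaves]
    return root, leaves

def locate(root, leaves, key):
    """Return the data block that may contain `key` (binary search per level)."""
    li = max(bisect_right(root, key) - 1, 0)        # pick the leaf index block
    leaf = leaves[li]
    first_keys = [fk for fk, _ in leaf]
    bi = max(bisect_right(first_keys, key) - 1, 0)  # pick the data block
    return leaf[bi][1]

# Ten data blocks with first keys k000, k010, ..., k090.
blocks = [(f"k{i:03d}", f"block-{i // 10}") for i in range(0, 100, 10)]
root, leaves = build_index(blocks, fanout=4)
print(locate(root, leaves, "k037"))  # → block-3 (the block starting at k030)
```

The win over v1's flat index is that only the small root index must stay in memory; leaf index blocks can be loaded on demand.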
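The LSM-tree question above (in-memory store, flush, merge) can be sketched with a toy write path: writes go to a sorted in-memory buffer (the "memstore"), which is flushed to immutable sorted files, and a compaction merge-sorts those files back into one, with the newest value winning per key. This is an illustration of the mechanism only, not HBase's code; the class and its parameters are invented for the example.

```python
import heapq

class ToyLSM:
    def __init__(self, flush_threshold=3):
        self.memstore = {}        # key -> (seq, value), kept small and in memory
        self.files = []           # newest-first list of immutable sorted files
        self.seq = 0              # monotonically increasing write sequence number
        self.flush_threshold = flush_threshold

    def put(self, key, value):
        self.seq += 1
        self.memstore[key] = (self.seq, value)
        if len(self.memstore) >= self.flush_threshold:
            self.flush()

    def flush(self):
        # Write the memstore out as a new immutable, sorted file.
        self.files.insert(0, sorted(self.memstore.items()))
        self.memstore = {}

    def compact(self):
        # Merge-sort all files into one; the highest seq wins per key.
        merged = {}
        for key, (seq, value) in heapq.merge(*self.files):
            if key not in merged or seq > merged[key][0]:
                merged[key] = (seq, value)
        self.files = [sorted(merged.items())]

    def get(self, key):
        if key in self.memstore:
            return self.memstore[key][1]
        for f in self.files:      # newest file first
            d = dict(f)
            if key in d:
                return d[key][1]
        return None

lsm = ToyLSM(flush_threshold=2)
for k, v in [("row1", "a"), ("row2", "b"), ("row1", "A"), ("row3", "c")]:
    lsm.put(k, v)
lsm.compact()
print(lsm.get("row1"))  # → "A", the newer value survives compaction
```

This also answers the second half of the question in spirit: each RegionServer only ever merges its own flushed files; there is no cross-server merge of logs into one global tree.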
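The block-size exchange above boils down to a simple trade-off, which can be put in back-of-the-envelope numbers (illustrative figures, not measured HBase behavior): a random get must read and decode one whole block, while the block index needs roughly one entry per block, so smaller blocks cost more index memory. The 50-byte index-entry size below is an assumption for the example.

```python
def block_size_tradeoff(file_size, block_size, index_entry_bytes=50):
    # One index entry per block; one full block read per random get.
    n_blocks = file_size // block_size
    return {
        "bytes_read_per_random_get": block_size,
        "index_size_bytes": n_blocks * index_entry_bytes,
    }

gb = 1024 ** 3
for bs in (8 * 1024, 64 * 1024, 1024 * 1024):
    print(bs, block_size_tradeoff(10 * gb, bs))
```

For a 10 GB file, 8 KB blocks mean small random reads but a much larger index than 64 KB blocks, while 1 MB blocks shrink the index and favor sequential scans at the cost of each random get reading more data. (Note this HFile block size is a per-column-family setting and is distinct from the HDFS block size asked about above.)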
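On the last question above: HBase stores each cell as a full KeyValue — row key, column family, qualifier, timestamp, and value — so the column name is repeated with every row rather than stored once per table (which is why short qualifier names are commonly advised). The field layout in this sketch is illustrative, not the exact on-disk HFile encoding.

```python
def encode_cell(row, family, qualifier, ts, value):
    # Toy serialization: every cell carries all of its coordinates.
    parts = [row, family, qualifier, str(ts), value]
    return b"\x00".join(p.encode() for p in parts)

cells = [
    encode_cell("row1", "cf", "Name", 1, "ABC"),
    encode_cell("row2", "cf", "Name", 1, "DEF"),
]
# The qualifier "Name" appears in every encoded cell, once per row:
print(all(b"Name" in c for c in cells))  # → True
```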