

Heritrix is a web crawler written by the Internet Archive, and HBase-Writer enables Heritrix to store crawled content directly into HBase tables running on the Hadoop Distributed FileSystem. By default, HBase-Writer writes crawled URL content into an HBase table as individual records, or 'rowkeys'.

I want to completely nuke my HBase installation and start over fresh and clean. However, I do not want to do an uninstall/reinstall, because we use Cloudera, and I don't want to mess with their whole configuration and setup.

On the cleanup side, the sizing formula is as follows: number of concurrent archive tasks = total size of log files to be archived / size of … . On startup, the master then starts the HFile archive cleaner thread (LOG.info("INIT HFILE…").
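The formula above is cut off, but it reads as a ceiling division of total log size by a per-task size. As a minimal sketch, assuming the missing denominator is the amount of data each archive task handles (an assumption; the original text is truncated), the arithmetic looks like this:

```java
public class ArchiveTaskSizing {
    // Hypothetical helper: number of concurrent archive tasks =
    // total size of log files to be archived / size handled per task,
    // rounded up. The per-task denominator is an assumption, since the
    // formula in the source text is truncated.
    static long concurrentArchiveTasks(long totalLogBytes, long bytesPerTask) {
        if (bytesPerTask <= 0) {
            throw new IllegalArgumentException("bytesPerTask must be > 0");
        }
        return (totalLogBytes + bytesPerTask - 1) / bytesPerTask; // ceiling division
    }

    public static void main(String[] args) {
        // e.g. 10 GiB of logs split into 512 MiB tasks -> 20 tasks
        System.out.println(concurrentArchiveTasks(10L << 30, 512L << 20));
    }
}
```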
HBase archive cleaner code
In the master startup path, startServiceThreads() executes code that initializes the HFileCleaner and the LogCleaner.

If inconsistencies still remain after these steps, you most likely have table integrity problems related to orphaned or overlapping regions. Basically, I have no interest in digging in and trying to fix this; I would rather reset. Navigate to the Services -> All Services screen in Cloudera Manager.
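HBase's real HFileCleaner and LogCleaner are chore threads that periodically scan the archive directory and delete files their configured cleaner delegates no longer need. A rough, simplified sketch of that pattern (not HBase's actual code; this class and its fixed-TTL policy are invented for illustration, whereas HBase uses pluggable delegate classes):

```java
import java.io.File;
import java.io.IOException;

// Simplified sketch of a cleaner chore: scan a directory and delete
// files older than a TTL. Illustration only, not HBase's implementation.
public class SimpleArchiveCleaner {
    private final File archiveDir;
    private final long ttlMillis;

    SimpleArchiveCleaner(File archiveDir, long ttlMillis) {
        this.archiveDir = archiveDir;
        this.ttlMillis = ttlMillis;
    }

    // One pass of the chore: delete expired files, return how many were removed.
    int runOnce() {
        File[] files = archiveDir.listFiles(File::isFile);
        if (files == null) {
            return 0;
        }
        long now = System.currentTimeMillis();
        int deleted = 0;
        for (File f : files) {
            if (now - f.lastModified() > ttlMillis && f.delete()) {
                deleted++;
            }
        }
        return deleted;
    }

    public static void main(String[] args) throws IOException {
        File dir = new File(System.getProperty("java.io.tmpdir"), "archive-demo");
        dir.mkdirs();
        File old = new File(dir, "old.hfile");
        old.createNewFile();
        old.setLastModified(System.currentTimeMillis() - 60_000); // pretend it is a minute old
        File fresh = new File(dir, "fresh.hfile");
        fresh.createNewFile();
        int removed = new SimpleArchiveCleaner(dir, 30_000).runOnce();
        System.out.println("removed=" + removed + " freshExists=" + fresh.exists());
    }
}
```

In the real master, startServiceThreads() wires cleaners like these onto a schedule rather than running a single pass.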

Running HBase in pseudo-distributed mode on my dev box, my installation has somehow gotten totally corrupted. I ran this command, and the readout ended with this:

Summary:
Deployed on: localhost.localdomain,60020,1340917622717

To clean your Splice Machine database on a Cloudera-managed cluster, first shut down HBase and HDFS.
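The "Deployed on" value in that readout is an HBase server name, which takes the form host,port,startcode. A small helper to split it apart (the class here is invented for illustration; HBase itself ships an org.apache.hadoop.hbase.ServerName class for this):

```java
public class ServerNameParser {
    // HBase server names have the form host,port,startcode,
    // e.g. localhost.localdomain,60020,1340917622717.
    // Illustration only; HBase's own ServerName class handles this.
    static String[] parse(String serverName) {
        String[] parts = serverName.split(",");
        if (parts.length != 3) {
            throw new IllegalArgumentException("expected host,port,startcode: " + serverName);
        }
        return parts;
    }

    public static void main(String[] args) {
        String[] p = parse("localhost.localdomain,60020,1340917622717");
        System.out.println("host=" + p[0] + " port=" + p[1] + " startcode=" + p[2]);
    }
}
```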
