I wanted to find a table in a Redshift cluster and was looking for something similar to Oracle's DBA_ or V$ views to locate it.
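A minimal sketch of the lookup, assuming Redshift's `SVV_TABLES` system view (columns `table_schema`, `table_name`, `table_type`) as the closest analogue to Oracle's `DBA_TABLES`. The function names are hypothetical. It accepts any DB-API 2.0 cursor that uses `%s` placeholders (for example, one obtained from `redshift_connector` or `psycopg2`), so the connection details stay outside the helper.

```python
from typing import Any, List, Tuple

# Assumption: SVV_TABLES lists tables from local and external catalogs,
# playing the role that DBA_TABLES plays in Oracle.
FIND_TABLE_SQL = (
    "SELECT table_schema, table_name, table_type "
    "FROM svv_tables "
    "WHERE table_name ILIKE %s "
    "ORDER BY table_schema, table_name"
)


def find_tables(cursor: Any, pattern: str) -> List[Tuple[Any, ...]]:
    """Return (schema, table, type) rows whose name matches a case-insensitive LIKE pattern."""
    # Pass the pattern as a bind parameter rather than formatting it into the SQL.
    cursor.execute(FIND_TABLE_SQL, (pattern,))
    return [tuple(row) for row in cursor.fetchall()]
```

For example, `find_tables(conn.cursor(), "%sales%")` would list every table whose name contains "sales", along with the schema it lives in.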
How to remove the license from a Cloudera Cluster
Process to remove the license from a Cloudera cluster
Recently we wanted to remove the license from one of our clusters, and we used the following approach. We are on CDH 5.11, which is very old and out of support.
How to disable Kerberos in a CDH cluster
Sharing my experience and the steps I followed to de-kerberize a Hadoop cluster
I am sure the first question in any sane person’s mind is: why are we doing this? Normally everyone goes the other way; we want to make our cluster more secure, so we kerberize it. And here I am, going the other way. It is due to change...
Application Template for Spark Scala with Gradle
Sharing my tips for developing an Apache Spark application in Scala on a local desktop and automating the complete build process
Apache Spark is a unified analytics engine for large-scale data processing. Although it supports writing applications in Java, Scala, Python, R, and SQL, many developers (me especially) prefer Scala because Spark itself is written in Scala. Furthermore, developing Spark applications on a local desktop, automating build, testing and...
How to monitor network packet drops in a Cluster
Monitoring network packet drops in a Cloudera cluster
Hadoop clusters, or for that matter any cluster that performs data processing, put a lot of pressure on the network infrastructure. So we need to understand, monitor, and tune the network performance of the cluster's nodes.
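A minimal sketch of one way to watch for drops on a Linux node, assuming the standard `/proc/net/dev` layout (two header lines, then `iface:` followed by 8 receive and 8 transmit counters, with `drop` as the fourth field of each group). The helper names are hypothetical and are not taken from any Cloudera tooling. Because the counters are cumulative since boot, the useful signal is the difference between two snapshots taken some seconds apart.

```python
from typing import Dict, NamedTuple


class DropCounters(NamedTuple):
    rx_drop: int
    tx_drop: int


def parse_proc_net_dev(text: str) -> Dict[str, DropCounters]:
    """Extract per-interface receive/transmit drop counters from /proc/net/dev text."""
    counters: Dict[str, DropCounters] = {}
    # Skip the two header lines; each remaining line is "iface: <16 numbers>".
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        iface, data = line.split(":", 1)
        fields = data.split()
        # Receive: bytes packets errs drop fifo frame compressed multicast
        # Transmit: bytes packets errs drop fifo colls carrier compressed
        counters[iface.strip()] = DropCounters(int(fields[3]), int(fields[11]))
    return counters


def drop_deltas(before: Dict[str, DropCounters],
                after: Dict[str, DropCounters]) -> Dict[str, DropCounters]:
    """Drops accumulated between two snapshots, for interfaces present in both."""
    return {
        iface: DropCounters(after[iface].rx_drop - before[iface].rx_drop,
                            after[iface].tx_drop - before[iface].tx_drop)
        for iface in after
        if iface in before
    }


def snapshot(path: str = "/proc/net/dev") -> Dict[str, DropCounters]:
    """Read the current counters from the local node (Linux only)."""
    with open(path) as f:
        return parse_proc_net_dev(f.read())
```

On a node, calling `snapshot()`, sleeping with `time.sleep(10)`, calling `snapshot()` again, and passing both results to `drop_deltas` shows which interfaces dropped packets during that window; running it on every node is one way to spot the hosts that need tuning.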