Most frequently asked Cloudera Admin Interview Questions

What is Cloudera?
What is Cloudera used for?
Name some individual services supported by Cloudera?
What are Cloudera's Products?
What is the architecture of Cloudera Manager Deployment?
What is Hadoop?
Explain Hadoop Architecture?
What is MapReduce?
Explain Hadoop YARN architecture?
How to check Spark Version?
Explain the architecture of Cloudera Navigator Auditing?
How can we find the CDH version of Hadoop?
How to run HBase shell against a remote cluster?
How to set configuration in hive-site.xml file for Hive metastore connection?
Explain Canonical Stream processing architecture?

What is Cloudera?

Cloudera is a data management company that offers a unified platform for big data and enterprise data. It gives enterprises one place to store, process, and analyze all of their data, which lets them extend the value of existing investments and enables fundamentally new ways of deriving value from that data.
It also offers software for business-critical data challenges, including storage, access, management, analysis, security, and search.

What is Cloudera used for?

Cloudera is used to run long-term deployments for hundreds of customers with petabytes of data under management across diverse industries. It also provides an enterprise data cloud built on open source technology, and the Cloudera platform applies analytics and machine learning to yield insights from data over a secure connection.

What are some individual services supported by Cloudera?

Services supported by Cloudera are as follows:

Cloudera Data Platform
Cloudera DataFlow
Cloudera Data Engineering
Cloudera Data Warehouse
Cloudera Operational Database
Cloudera Machine Learning
Cloudera Data Hub
Cloudera Data Visualization
Cloudera Workload Manager
Cloudera SDX
Cloudera Fast Forward Labs

What are Cloudera's Products?

Cloudera's products address customers' opportunities and challenges in Big Data. They are available as unsupported or supported enterprise-class software under an annual subscription. The integration work is done for the customer, and the entire solution is tested against enterprise requirements and fully documented.

What is the architecture of Cloudera Manager Deployment?

What is Hadoop?

Hadoop is an open source framework that allows distributed processing of large datasets across clusters of computers using simple programming models. It provides distributed storage and computation across those clusters. It is designed to scale from a single server to thousands of machines, each offering local computation and storage.

Explain Hadoop Architecture?

Hadoop contains two layers:
Processing/Computation layer (MapReduce)
Storage layer (Hadoop Distributed File System)


What is MapReduce?

MapReduce is a Java-based programming model for distributed computing built around two tasks, Map and Reduce. The Map task takes a set of data and converts it into another set of data in which the individual elements are broken down into tuples (key/value pairs). The Reduce task takes the output of a map as its input and combines those tuples into a smaller set of tuples.
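The Map and Reduce steps described above can be sketched in plain Java without any Hadoop dependency. This is only an illustration of the idea, assuming a word-count task. The class `WordCountSketch` and its `map` and `reduce` methods are hypothetical names, not part of the Hadoop API, and a real job would run many mappers and reducers in parallel across the cluster.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {

    // Map step: break one input line into (word, 1) tuples.
    static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> tuples = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\s+")) {
            if (!word.isEmpty()) {
                tuples.add(new AbstractMap.SimpleEntry<>(word, 1));
            }
        }
        return tuples;
    }

    // Reduce step: combine tuples that share a key into one (word, total) tuple.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> tuples) {
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, Integer> tuple : tuples) {
            totals.merge(tuple.getKey(), tuple.getValue(), Integer::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Integer>> mapped = new ArrayList<>();
        for (String line : Arrays.asList("big data big cluster", "data node")) {
            mapped.addAll(map(line));
        }
        // prints {big=2, cluster=1, data=2, node=1}
        System.out.println(reduce(mapped));
    }
}
```

The reduce output has fewer entries than the map output (four instead of six), which is the "smaller set of tuples" the answer refers to.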

Explain Hadoop YARN architecture?

How to check Spark Version?

spark-submit --version

Explain the architecture of Cloudera Navigator Auditing?

How can we find the CDH version of Hadoop?

$ hadoop version
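
On CDH builds the version string reported by this command typically carries the CDH release as a suffix, for example `2.6.0-cdh5.16.2`. The following is a minimal Java sketch for pulling that suffix out of a captured line, assuming that suffix format. The class `CdhVersionSketch` and method `cdhVersion` are hypothetical, and the sample string is illustrative rather than output captured from a real cluster.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CdhVersionSketch {

    // Matches a "cdh" marker followed by a dotted release number.
    private static final Pattern CDH = Pattern.compile("cdh(\\d+(?:\\.\\d+)*)");

    // Returns the CDH release embedded in a version line, if there is one.
    static Optional<String> cdhVersion(String hadoopVersionLine) {
        Matcher m = CDH.matcher(hadoopVersionLine);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    public static void main(String[] args) {
        // Illustrative input; pass the first line printed by `hadoop version` instead.
        System.out.println(
            cdhVersion("Hadoop 2.6.0-cdh5.16.2").orElse("not a CDH build"));
    }
}
```

A plain Apache Hadoop build has no such suffix, in which case the method returns an empty `Optional`.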

How can we run HBase shell against a remote cluster?

<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
  <property>
    <name>hbase.zookeeper.quorum</name>
    <value>zk1,zk2,zk3</value>
  </property>
  <property>
    <name>zookeeper.znode.parent</name>
    <!--or /hbase-->
    <value>/hbase-unsecure</value>
  </property>
</configuration>

How can we set configuration in hive-site.xml file for Hive metastore connection?

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.conf.HiveConf.ConfVars;

public class HiveMetastoreJDBCTest {

    public static void main(String[] args) throws Exception {

        Connection conn = null;
        try {
            // Load the metastore connection settings from hive-site.xml.
            HiveConf conf = new HiveConf();
            conf.addResource(new Path("file:///path/to/hive-site.xml"));
            Class.forName(conf.getVar(ConfVars.METASTORE_CONNECTION_DRIVER));
            conn = DriverManager.getConnection(
                    conf.getVar(ConfVars.METASTORECONNECTURLKEY),
                    conf.getVar(ConfVars.METASTORE_CONNECTION_USER_NAME),
                    conf.getVar(ConfVars.METASTOREPWD));

            // List each table with the storage location recorded for it.
            Statement st = conn.createStatement();
            ResultSet rs = st.executeQuery(
                "select t.tbl_name, s.location from tbls t " +
                "join sds s on t.sd_id = s.sd_id");
            while (rs.next()) {
                System.out.println(rs.getString(1) + " : " + rs.getString(2));
            }
        } finally {
            // Closing the connection also releases the statement and result set.
            if (conn != null) {
                conn.close();
            }
        }

    }
}
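
The four `ConfVars` constants read above correspond to the `javax.jdo.option.*` properties in hive-site.xml, so those are the entries to set for the metastore connection. The sketch below embeds a sample configuration as a string and parses it with the JDK's XML parser to show which property names are involved. The class `HiveSiteSketch`, the host name, the database name, and the credentials are all hypothetical placeholders. Only the property names and the MySQL driver class name are real.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class HiveSiteSketch {

    // Sample hive-site.xml content; every value here is a placeholder.
    static final String SAMPLE =
        "<configuration>"
      + "<property><name>javax.jdo.option.ConnectionURL</name>"
      + "<value>jdbc:mysql://metastore-db.example.com:3306/hive</value></property>"
      + "<property><name>javax.jdo.option.ConnectionDriverName</name>"
      + "<value>com.mysql.jdbc.Driver</value></property>"
      + "<property><name>javax.jdo.option.ConnectionUserName</name>"
      + "<value>hive</value></property>"
      + "<property><name>javax.jdo.option.ConnectionPassword</name>"
      + "<value>change-me</value></property>"
      + "</configuration>";

    // Collects every <property> element into a name -> value map.
    static Map<String, String> parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
            .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Map<String, String> props = new LinkedHashMap<>();
        NodeList nodes = doc.getElementsByTagName("property");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            props.put(name, value);
        }
        return props;
    }

    public static void main(String[] args) throws Exception {
        parse(SAMPLE).forEach((name, value) -> System.out.println(name + " = " + value));
    }
}
```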