Top Apache Flink Interview Questions (2024) | CodeUsingJava

Most frequently asked Apache Flink Interview Questions


  1. What is Apache Flink?
  2. What are the features of Apache Flink?
  3. Explain Apache Flink Architecture?
  4. Explain the Apache Flink Job Execution Architecture?
  5. What is Storage / Streaming in Apache Flink?
  6. What is DataSet API?
  7. What is the difference between Apache Hadoop and Apache Flink?
  8. What are the DSL tools in Flink?
  9. Explain Complex Event Processing in Flink?
  10. What is the meaning of flow?


What is Apache Flink?

Apache Flink is a distributed stream-processing and dataflow engine written in Java and Scala. It is an open-source framework that reduces the complexity faced by other distributed data-processing engines, and it runs every dataflow program in a data-parallel and pipelined fashion.
Apache Flink is shipped by vendors such as Cloudera, MapR, Oracle, and Amazon. It is a framework for high-performance, scalable, and accurate real-time applications. Flink grew out of the Stratosphere research project, was commercially backed by data Artisans (now Ververica), and is developed under the Apache License by the Apache Flink community.

What are the features of Apache Flink?

Flink uses a common runtime for data-streaming applications and batch-processing applications.
  • It can run both batch and stream programs (see the word-count sketch after this list).
  • It provides APIs in Java, Scala, and Python.
  • It is extremely fast.
  • It processes data with low latency.
  • It integrates easily with big-data tools such as Apache Hadoop, Apache MapReduce, etc.
  • It is highly scalable.
  • It is fault-tolerant.
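
To illustrate the common runtime, here is a minimal streaming word-count sketch using the DataStream API; the class name and the sample sentences are illustrative, not from the original article:

import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class StreamingWordCount {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("flink runs batch and stream programs", "flink is fast")
            .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
                for (String word : line.split(" ")) {
                    out.collect(Tuple2.of(word, 1));
                }
            })
            .returns(Types.TUPLE(Types.STRING, Types.INT)) // lambdas need an explicit result type
            .keyBy(t -> t.f0) // group by the word
            .sum(1)           // running count per word
            .print();

        env.execute("streaming-wordcount");
    }
}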

Explain Apache Flink Architecture?

Apache Flink is built around the Kappa architecture, which treats all input as a stream; the streaming engine processes the data in real time.
The following diagram shows the architecture of Apache Flink.

[Diagram: Apache Flink architecture]



Explain the Apache Flink Job Execution Architecture?

Apache Flink can also support a Lambda architecture, which separates processing for batch and streaming data, with distinct codebases for the batch and stream views.
Here is a diagram showing the process.

[Diagram: Apache Flink job execution architecture]

Client - takes the program code, constructs the job dataflow graph, and submits it to the JobManager (a small client-side sketch follows this list).
JobManager - creates the execution graph from the dataflow graph and schedules the tasks.
TaskManager - runs the tasks in separate task slots with the specified parallelism.
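
As a rough client-side sketch of these roles (the job name and parallelism value are arbitrary): calling execute() is the point where the client builds the dataflow graph and hands it to the JobManager, which then schedules the subtasks onto TaskManager slots.

import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class SubmissionSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.setParallelism(4); // each operator runs as 4 subtasks in TaskManager slots

        env.fromElements(1, 2, 3)
            .map(x -> x * 2)
            .print();

        // The client constructs the job dataflow graph here and passes it to the JobManager.
        env.execute("submission-sketch");
    }
}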


What is Storage / Streaming in Apache Flink?

Flink integrates with various storage systems and can consume data from streaming systems. Flink can read data from, and write data to, systems such as the following (a Kafka source sketch follows this list):
Flume - acts as a data aggregation tool
HBase - acts as a NoSQL database in the Hadoop ecosystem
HDFS - the Hadoop Distributed File System
Kafka - a distributed messaging queue
RabbitMQ - a messaging queue
S3 - the Simple Storage Service from Amazon
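
For example, here is a minimal sketch of reading from Kafka with the KafkaSource connector; the broker address, topic, and group id are placeholder values, and the flink-connector-kafka dependency is assumed:

import org.apache.flink.api.common.eventtime.WatermarkStrategy;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.kafka.source.KafkaSource;
import org.apache.flink.connector.kafka.source.enumerator.initializer.OffsetsInitializer;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

public class KafkaSourceSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        KafkaSource<String> source = KafkaSource.<String>builder()
            .setBootstrapServers("localhost:9092") // placeholder broker
            .setTopics("events")                   // placeholder topic
            .setGroupId("flink-demo")              // placeholder group id
            .setStartingOffsets(OffsetsInitializer.earliest())
            .setValueOnlyDeserializer(new SimpleStringSchema())
            .build();

        env.fromSource(source, WatermarkStrategy.noWatermarks(), "kafka-source")
            .print();

        env.execute("kafka-source-sketch");
    }
}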


What is DataSet API?

The DataSet API enables the user to implement operations such as map, filter, group-by, and so on. It is used for distributed processing and is a special case of stream processing where the data source is bounded (note that recent Flink releases have deprecated the DataSet API in favor of the unified DataStream API). DataSet programs are regular programs that apply transformations to data sets, such as filtering and mapping.
Data sets are created from sources such as files or local collections. Results are returned through sinks, and execution can happen in a local JVM or on clusters of many machines.
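
A minimal DataSet sketch of such transformations (the values and class name are illustrative); note that print() acts as the sink and triggers execution for DataSet programs:

import org.apache.flink.api.java.DataSet;
import org.apache.flink.api.java.ExecutionEnvironment;

public class DataSetSketch {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        DataSet<Integer> numbers = env.fromElements(1, 2, 3, 4, 5, 6); // bounded source

        numbers.filter(n -> n % 2 == 0) // keep even numbers
            .map(n -> n * 10)           // scale them
            .print();                   // sink; also triggers execution
    }
}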


What is the difference between Apache Hadoop and Apache Flink?

Apache Hadoop
  • Processes data in batches.
  • Data processing speed is slow.
  • Programming model is MapReduce.
  • API level is low.
  • Graph processing support: none.
  • Machine learning support: none.
Apache Flink
  • Processes data as streams.
  • Data processing speed is fast.
  • Programming model is cyclic dataflows.
  • API level is high.
  • Graph processing support: Gelly.
  • Machine learning support: FlinkML.

What are the DSL tools in Flink?

There are three DSL tools (a Table API sketch follows this list):
  • Table API - embedded in the DataSet and DataStream APIs; it supports ad-hoc analysis using a SQL-like expression language for processing batch and relational streams.
  • Gelly - lets users run a set of operations and transformations over graphs.
  • FlinkML - written in Scala; it provides intuitive APIs and efficient algorithms for machine-learning applications.
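
Here is a minimal Table API sketch (the flink-table dependencies are assumed, and the table contents are illustrative):

import org.apache.flink.table.api.DataTypes;
import org.apache.flink.table.api.EnvironmentSettings;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.TableEnvironment;

import static org.apache.flink.table.api.Expressions.$;
import static org.apache.flink.table.api.Expressions.row;

public class TableApiSketch {
    public static void main(String[] args) {
        TableEnvironment tEnv = TableEnvironment.create(
            EnvironmentSettings.newInstance().inStreamingMode().build());

        // An in-memory table with illustrative rows.
        Table orders = tEnv.fromValues(
            DataTypes.ROW(
                DataTypes.FIELD("user", DataTypes.STRING()),
                DataTypes.FIELD("amount", DataTypes.INT())),
            row("alice", 10), row("bob", 25), row("alice", 5));

        // SQL-like expression language: group and aggregate.
        Table totals = orders.groupBy($("user"))
            .select($("user"), $("amount").sum().as("total"));

        totals.execute().print();
    }
}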

Explain Complex Event Processing in Flink?

Flink CEP detects and analyses patterns over continuous streaming data in real time, with high throughput and low latency. It is commonly used on sensor data, which is otherwise hard to process, and it can produce real-time notifications and alerts when a complex event pattern occurs.
This is what a sample architecture with CEP looks like:

[Diagram: sample Flink CEP architecture]
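
As a minimal sketch of the CEP API (the flink-cep dependency is assumed; TemperatureEvent, its getTemperature() accessor, the 26.0 threshold, and the input stream are all hypothetical):

import java.util.List;
import java.util.Map;

import org.apache.flink.cep.CEP;
import org.apache.flink.cep.PatternSelectFunction;
import org.apache.flink.cep.PatternStream;
import org.apache.flink.cep.pattern.Pattern;
import org.apache.flink.cep.pattern.conditions.SimpleCondition;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.windowing.time.Time;

public class CepSketch {

    // Hypothetical event type for this sketch.
    public static class TemperatureEvent {
        private final double temperature;
        public TemperatureEvent(double temperature) { this.temperature = temperature; }
        public double getTemperature() { return temperature; }
    }

    // Matches two consecutive readings above a threshold within ten seconds.
    public static DataStream<String> alerts(DataStream<TemperatureEvent> readings) {
        Pattern<TemperatureEvent, ?> pattern = Pattern.<TemperatureEvent>begin("first")
            .where(new SimpleCondition<TemperatureEvent>() {
                @Override
                public boolean filter(TemperatureEvent e) {
                    return e.getTemperature() > 26.0; // hypothetical threshold
                }
            })
            .next("second")
            .where(new SimpleCondition<TemperatureEvent>() {
                @Override
                public boolean filter(TemperatureEvent e) {
                    return e.getTemperature() > 26.0;
                }
            })
            .within(Time.seconds(10));

        PatternStream<TemperatureEvent> matches = CEP.pattern(readings, pattern);
        return matches.select(new PatternSelectFunction<TemperatureEvent, String>() {
            @Override
            public String select(Map<String, List<TemperatureEvent>> match) {
                return "Two readings above threshold within 10 seconds";
            }
        });
    }
}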


What is the meaning of flow?

We can use a synchronous Apache HttpClient in a sink (SessionItem below is the user's POJO carrying session fields). Note that because invoke() blocks on each request, a slow endpoint throttles the flow of records and creates backpressure upstream:

import java.util.ArrayList;
import java.util.List;

import org.apache.flink.api.common.accumulators.Histogram;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;
import org.apache.http.Consts;
import org.apache.http.NameValuePair;
import org.apache.http.client.entity.UrlEncodedFormEntity;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpPost;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.DefaultConnectionKeepAliveStrategy;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.message.BasicNameValuePair;

public class SyncHttpSink extends RichSinkFunction<SessionItem> {
    private static final String URL = "http://httpbin.org/post";

    private CloseableHttpClient httpClient;

    private Histogram httpStatusesAccumulator;

    @Override
    public void open(Configuration parameters) throws Exception {
        // Create one HTTP client per parallel sink instance.
        httpClient = HttpClients.custom()
            .setKeepAliveStrategy(DefaultConnectionKeepAliveStrategy.INSTANCE)
            .build();

        // Accumulator tracking the distribution of HTTP status codes.
        httpStatusesAccumulator = getRuntimeContext().getHistogram("http_statuses");
    }

    @Override
    public void close() throws Exception {
        httpClient.close();

        httpStatusesAccumulator.resetLocal();
    }

    @Override
    public void invoke(SessionItem value) throws Exception {
        // Build a form-encoded POST body from the session fields.
        List<NameValuePair> params = new ArrayList<>();
        params.add(new BasicNameValuePair("session_uid", value.getSessionUid()));
        params.add(new BasicNameValuePair("traffic", String.valueOf(value.getTraffic())));
        params.add(new BasicNameValuePair("duration", String.valueOf(value.getDuration())));

        UrlEncodedFormEntity entity = new UrlEncodedFormEntity(params, Consts.UTF_8);

        HttpPost httpPost = new HttpPost(URL);
        httpPost.setEntity(entity);

        // The blocking call: the sink subtask waits for the response,
        // which is what throttles the flow of incoming records.
        try (CloseableHttpResponse response = httpClient.execute(httpPost)) {
            int httpStatusCode = response.getStatusLine().getStatusCode();

            httpStatusesAccumulator.add(httpStatusCode);
        }
    }
}
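
In practice, a blocking sink limits throughput to one in-flight request per sink subtask; when the external service offers an asynchronous client, Flink's Async I/O API (AsyncDataStream) is the usual alternative.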