Spark Streaming
Spark Structured Streaming is internally a micro-batch engine. At every trigger interval, the data that has arrived so far is grouped into a small micro-batch, which gives the illusion of real-time streaming; under the hood, however, the data is processed as a series of micro-batches. To answer the following questions:

- What is the size of a micro-batch?
- When is a micro-batch triggered?

it is necessary to understand the different types of triggers.

Types of Triggers

1. Unspecified (Default)

In this case, once the first micro-batch completes (say it contained two files, File1 and File2), the next micro-batch is triggered, provided there is some data waiting to be processed. A subsequent micro-batch is triggered only when there are files that still need to be processed; for example, the second micro-batch is triggered when File3 arrives. Suppose File4 arrives while File3 is still being processed; then the third micro-batch is triggered soon after the second micro-batch completes.
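The default-trigger behavior described above can be sketched as a small simulation. This is plain Python, not Spark itself; the file names, the tick-based clock, and the `run_default_trigger` helper are illustrative assumptions, meant only to show that a new micro-batch starts as soon as the previous one finishes and data is pending:

```python
from collections import deque

def run_default_trigger(arrivals):
    """Simulate Structured Streaming's default (unspecified) trigger:
    a new micro-batch starts as soon as the previous one finishes,
    but only if there is pending data; otherwise the engine waits.

    `arrivals` maps a tick number to the files arriving at that tick.
    Each tick represents the time one micro-batch takes to process.
    Returns the list of micro-batches, each a list of file names.
    """
    pending = deque()
    batches = []
    for tick in range(max(arrivals) + 1):
        pending.extend(arrivals.get(tick, []))   # new data lands
        if pending:                              # previous batch done and data waiting
            batches.append(list(pending))        # all pending data forms one batch
            pending.clear()
    return batches

# File1 and File2 arrive together, File3 later, File4 while File3's batch runs.
batches = run_default_trigger({0: ["File1", "File2"], 2: ["File3"], 3: ["File4"]})
print(batches)  # [['File1', 'File2'], ['File3'], ['File4']]
```

Note how File4 is not merged into File3's batch: because it arrived while that batch was running, it waits and becomes the third micro-batch, triggered right after the second one completes.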