Apache Flink DataStream API: Join Operation Example
This code snippet showcases a basic join operation in Apache Flink's DataStream API. Let's break it down:
-
DataSets Creation:
val input1: DataSet[(Int,String)] = env.fromElements((1,'spark'),(2,'flink'))
creates a DataSet namedinput1
containing two tuples. Each tuple consists of an integer and a string.val input2: DataSet[(String,Int)] = env.fromElements(('spark',1),('flink',2))
creates another DataSet namedinput2
with two tuples, each containing a string and an integer.
-
Join Operation:
val result = input1.join(input2).where(0).equalTo(1)
performs a join betweeninput1
andinput2
. Thewhere(0)
clause specifies that the first element (index 0) ofinput1
should be used for comparison. Similarly,equalTo(1)
indicates that the second element (index 1) ofinput2
will be compared. The join operation combines tuples from both DataSets where the specified elements match.
-
Result Display:
result.print()
prints the joined tuples to the console.
Output: The output of this code will be:
(1,spark,spark,1) (2,flink,flink,2)
Explanation of Output:
Each output tuple combines a tuple from input1
and a tuple from input2
based on the join condition. For example, the tuple (1, 'spark')
from input1
matches the tuple ('spark', 1)
from input2
because the 'spark' string in both tuples is at the respective positions defined by the join condition. The output tuple then combines both tuples, resulting in (1, 'spark', 'spark', 1).
This example demonstrates the fundamental concept of joining data in Apache Flink, enabling you to combine data from different sources based on specific conditions.
原文地址: http://www.cveoy.top/t/topic/oXdH 著作权归作者所有。请勿转载和采集!