Getting Started with Flink (2) -- Common Flink APIs
Common Flink APIs
DataStream API
source
- File-based: env.readTextFile reads a file line by line
- Socket-based: env.socketTextStream
- Collection-based: env.fromCollection(collection)
- Connector-based (e.g. Kafka): env.addSource(new FlinkKafkaConsumer<>(topic, deserializationSchema, props))
- Custom source: implement SourceFunction / ParallelSourceFunction / RichSourceFunction and override methods such as open, run, close, and cancel
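The custom-source pattern above can be sketched as follows. This is a minimal, illustrative example (the counter logic and class name are made up, and it assumes the Flink DataStream dependencies are on the classpath):

```java
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.api.functions.source.RichSourceFunction;

// Illustrative custom source: emits an increasing counter until cancelled.
public class CounterSource extends RichSourceFunction<Long> {
    private volatile boolean running = true;

    @Override
    public void run(SourceContext<Long> ctx) throws Exception {
        long count = 0L;
        while (running) {
            // Emit records under the checkpoint lock so state stays consistent.
            synchronized (ctx.getCheckpointLock()) {
                ctx.collect(count++);
            }
            Thread.sleep(100);
        }
    }

    @Override
    public void cancel() {
        // cancel() is invoked when the job is stopped; run() should exit soon after.
        running = false;
    }

    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        env.addSource(new CounterSource()).print();
        env.execute("counter-source-demo");
    }
}
```

Extending RichSourceFunction (rather than implementing SourceFunction directly) additionally gives access to open/close lifecycle methods and the runtime context.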
transformations
- Map: DataStream -> DataStream
- FlatMap: DataStream -> DataStream
- Filter: DataStream -> DataStream
- KeyBy: DataStream -> KeyedStream; logically partitions the stream by the given key, so records with the same key are dispatched to the same task
- Reduce: KeyedStream -> DataStream
- Fold: KeyedStream -> DataStream; a reduce with an initial value (deprecated in newer Flink versions)
- Aggregations: sum, min, max, and similar
- Window: KeyedStream -> WindowedStream
- WindowAll: all records are gathered into a single task
- Window Apply/Reduce/Fold/Aggregations: WindowedStream -> DataStream
- Union: DataStream -> DataStream
- Window Join
- Interval Join
- Window CoGroup, Connect, CoMap, CoFlatMap
- Split: DataStream -> SplitStream; splits a stream (deprecated in favor of side outputs)
- Select: SplitStream -> DataStream
- Iterate: DataStream -> IterativeStream
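A few of these transformations chained together can be sketched as a word count. This is illustrative only (the input strings and job name are made up; Flink DataStream dependencies are assumed):

```java
import org.apache.flink.api.common.typeinfo.Types;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

// Illustrative transformation chain: flatMap -> keyBy -> sum.
public class TransformationDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        env.fromElements("to be or not to be")
           // FlatMap: DataStream<String> -> DataStream<Tuple2<String, Integer>>
           .flatMap((String line, Collector<Tuple2<String, Integer>> out) -> {
               for (String word : line.split(" ")) {
                   out.collect(Tuple2.of(word, 1));
               }
           })
           .returns(Types.TUPLE(Types.STRING, Types.INT)) // lambdas need explicit type info
           // KeyBy: records with the same word go to the same task
           .keyBy(t -> t.f0)
           // Aggregation: running sum of the count field
           .sum(1)
           .print();

        env.execute("transformation-demo");
    }
}
```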
sink
- writeAsText()
- print()
- Custom sink: extend RichSinkFunction and override the open, invoke, and close methods
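The custom-sink pattern can be sketched like this (a minimal, illustrative skeleton; the class name and stdout output are made up, and Flink DataStream dependencies are assumed):

```java
import org.apache.flink.configuration.Configuration;
import org.apache.flink.streaming.api.functions.sink.RichSinkFunction;

// Illustrative custom sink: writes each record to stdout with a prefix.
public class StdoutSink extends RichSinkFunction<String> {
    @Override
    public void open(Configuration parameters) {
        // open(): acquire resources here (e.g. a JDBC connection), called once per task.
    }

    @Override
    public void invoke(String value, Context context) {
        // invoke(): called once per incoming record.
        System.out.println("sink> " + value);
    }

    @Override
    public void close() {
        // close(): release any resources acquired in open().
    }
}
```

It would be attached with stream.addSink(new StdoutSink()).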
DataSet API (batch)
source
- Collection-based: fromCollection
- File-based: readTextFile
- Custom sources are also supported
transformations
- Map
- FlatMap
- MapPartition
- Filter
- Reduce
- ReduceGroup
- Aggregate
- Distinct
- Join
- OuterJoin
- CoGroup
- Cross
- Union
- Rebalance
- Hash-Partition (partitionByHash)
- Range-Partition
- CustomPartitioning
- First-N
- SortPartition
sink
writeAsText(), writeAsCsv(), print(), etc.
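A small batch job tying the DataSet source, transformations, and sink together might look like this (illustrative data; Flink DataSet dependencies are assumed):

```java
import org.apache.flink.api.java.ExecutionEnvironment;
import org.apache.flink.api.java.tuple.Tuple2;

// Illustrative DataSet (batch) job: fromElements -> groupBy -> sum -> print.
public class BatchDemo {
    public static void main(String[] args) throws Exception {
        ExecutionEnvironment env = ExecutionEnvironment.getExecutionEnvironment();

        env.fromElements(
                Tuple2.of("a", 1),
                Tuple2.of("b", 2),
                Tuple2.of("a", 3))
           .groupBy(0)   // group on the first tuple field
           .sum(1)       // sum the second field within each group
           .print();     // for batch jobs, print() also triggers execution
    }
}
```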
Flink Table API & SQL
1. Overview: the Table API and SQL provide a unified way to process both batch and streaming data
2. Table API programming
Get the table execution environment:
StreamTableEnvironment tenv = StreamTableEnvironment.create(env);
Convert a DataStream into a Table:
Table table = tenv.fromDataStream(data, $("name"), $("age"));
Run a query:
table.select($("name"));
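The steps above can be combined into one runnable sketch. The field names and sample rows are made up for illustration, and it assumes a recent Flink version (roughly 1.13+, where toDataStream is available) with the Table API bridge on the classpath:

```java
import static org.apache.flink.table.api.Expressions.$;

import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.table.api.Table;
import org.apache.flink.table.api.bridge.java.StreamTableEnvironment;

// Illustrative end-to-end Table API job over a small in-memory stream.
public class TableDemo {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        StreamTableEnvironment tenv = StreamTableEnvironment.create(env);

        DataStream<Tuple2<String, Integer>> data =
                env.fromElements(Tuple2.of("Alice", 25), Tuple2.of("Bob", 30));

        // Convert the stream into a Table, naming its fields
        Table table = tenv.fromDataStream(data, $("name"), $("age"));

        // Query with the Table API, then convert back to a stream and print
        Table result = table.select($("name"));
        tenv.toDataStream(result).print();

        env.execute("table-api-demo");
    }
}
```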