TensorFlow 2.0 Guide: Official Tutorial Study Notes 2 - Keras Functional API

黄水生 · Published 2019-10-07 22:11:01

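These notes follow the official TensorFlow tutorial. They mainly translate and reorganize the content of the 'Keras functional API' guide (abbreviated as KFA below).
Original: Keras Functional API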

1. Importing TensorFlow

from __future__ import absolute_import, division, print_function, unicode_literals

try:
  # %tensorflow_version only exists in Colab.
  %tensorflow_version 2.x
except Exception:
  pass
import tensorflow as tf

tf.keras.backend.clear_session()  # For easy reset of notebook state.

2. Introduction

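The Keras functional API is a way to create models that is more flexible than the Sequential API: it can handle models with non-linear topology, models with shared layers, and models with multiple inputs or outputs. It is based on the idea that a deep learning model is usually a directed acyclic graph (DAG) of layers. The functional API is a set of tools for building such graphs of layers.
DAG (Directed Acyclic Graph): see the Baidu Baike entry on DAG.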
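
To make the DAG idea concrete, here is a minimal sketch in plain Python, independent of Keras. The node names and the `layer_graph` dictionary are made up for illustration, and `graphlib` requires Python 3.9 or later. The point is only that a graph without cycles always admits an order in which every layer runs after the layers it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical layer graph: each key lists the nodes it takes input from.
# "add" has two predecessors, so this is not a simple linear stack.
layer_graph = {
    "input": [],
    "dense_1": ["input"],
    "dense_2": ["dense_1"],
    "add": ["dense_1", "dense_2"],  # skip connection around dense_2
    "output": ["add"],
}

# static_order() raises CycleError if the graph contains a cycle.
order = list(TopologicalSorter(layer_graph).static_order())
print(order)  # ['input', 'dense_1', 'dense_2', 'add', 'output']
```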
Example: consider the following graph of 3 layers:

(input: 784-dimensional vectors)
       ↧
[Dense (64 units, relu activation)]
       ↧
[Dense (64 units, relu activation)]
       ↧
[Dense (10 units, softmax activation)]
       ↧
(output: probability distribution over 10 classes)

To build this model with the KFA, start by creating an input node:

from tensorflow import keras

inputs = keras.Input(shape=(784,))

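Here we specify only the shape of the data: 784-dimensional vectors. Note that the batch size is always omitted; only the shape of each sample is given. For an input meant for images of shape (32, 32, 3), we would write: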

img_inputs = keras.Input(shape=(32, 32, 3))

The returned object, inputs, carries information about the shape and the dtype of the data that the model expects. Both can be inspected:

inputs.shape
inputs.dtype

You create a new node in the graph of layers by calling a layer on this inputs object:

from tensorflow.keras import layers

dense = layers.Dense(64, activation='relu')
x = dense(inputs)

The "layer call" action is like drawing an arrow from "inputs" to the layer we created. We are "passing" the inputs to the dense layer, and we get x out.

Let's add the remaining layers to the graph:

x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

At this point, we can create a model by specifying its inputs and outputs in the graph of layers:

model = keras.Model(inputs=inputs, outputs=outputs)

To recap, here is the complete model definition:

from tensorflow.keras import layers
inputs = keras.Input(shape=(784,), name='img')
x = layers.Dense(64, activation='relu')(inputs)
x = layers.Dense(64, activation='relu')(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs=inputs, outputs=outputs, name='mnist_model')

Let's check what the model we just created looks like:

model.summary()
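
The parameter counts that the summary reports for this model can be checked by hand. The sketch below is plain Python (no TensorFlow needed) and assumes the usual Dense layout of one weight per input/output pair plus one bias per unit; `dense_params` is a helper name introduced here for illustration.

```python
def dense_params(n_in, n_out):
    """Weights (n_in * n_out) plus one bias per output unit."""
    return n_in * n_out + n_out

# Input width followed by the widths of the three Dense layers.
layer_sizes = [784, 64, 64, 10]
counts = [dense_params(a, b) for a, b in zip(layer_sizes, layer_sizes[1:])]

print(counts)       # [50240, 4160, 650]
print(sum(counts))  # 55050
```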

The model can also be plotted as a graph:

keras.utils.plot_model(model,'my_first_model.png')

Optionally, the plot can also show the input and output shapes of each layer:

keras.utils.plot_model(model,'my_first_model_with_shape_info.png',show_shapes=True)

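This figure and the code we wrote are virtually identical. In the code version, the connection arrows are simply replaced by call operations. A "graph of layers" is a very intuitive mental image for a deep learning model, and the KFA is a way to create models that closely mirror this mental image.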

3. Training, Evaluation, and Inference

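Training, evaluation, and inference work exactly the same way for models built with the KFA as for Sequential models. Here is a quick demo: we load the MNIST image data, reshape it into vectors, fit the model on the data (while monitoring performance on a validation split), and finally evaluate the model on the test data: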

(x_train, y_train), (x_test, y_test) = keras.datasets.mnist.load_data()
x_train = x_train.reshape(60000, 784).astype('float32') / 255
x_test = x_test.reshape(10000, 784).astype('float32') / 255

model.compile(loss='sparse_categorical_crossentropy',
              optimizer=keras.optimizers.RMSprop(),
              metrics=['accuracy'])
history = model.fit(x_train, y_train,
                    batch_size=64,
                    epochs=5,
                    validation_split=0.2)
test_scores = model.evaluate(x_test, y_test, verbose=2)
print('Test loss:', test_scores[0])
print('Test accuracy:', test_scores[1])
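
As a rough check on what the fit call above does with the data, the arithmetic below works out the split sizes and the number of batches per epoch. It is plain Python and only mirrors the numbers used in the demo. Keras takes the validation samples from the end of the arrays, before any shuffling.

```python
import math

num_samples = 60000       # size of the MNIST training set
validation_split = 0.2
batch_size = 64

# The last 20% of the samples are held out for validation.
num_val = round(num_samples * validation_split)
num_train = num_samples - num_val

# One epoch walks over the training part in batches of 64.
steps_per_epoch = math.ceil(num_train / batch_size)
print(num_train, num_val, steps_per_epoch)  # 48000 12000 750
```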

4. Saving and Serialization

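Saving and serialization also work exactly the same way for models built with the KFA as for Sequential models. The standard way to save a functional model is to call model.save(), which writes the whole model to a single file. You can later recreate the same model from this file, even if you no longer have access to the code that created it.
The file includes:
(1) the model's architecture
(2) the model's weight values (learned during training)
(3) the model's training config (what was passed to compile), if any
(4) the optimizer and its state, if any (this lets you resume training where you left off)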

model.save('path_to_my_model.h5')
del model
# Recreate the exact same model purely from the file:
model = keras.models.load_model('path_to_my_model.h5')
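
The first two items in that list, architecture and weights, can also be handled separately without writing a file. The following is a minimal sketch on a small stand-in model (`small_model` and `clone` are names chosen here, not part of the tutorial): the architecture travels as a config dictionary and the weights as a list of NumPy arrays.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

inputs = keras.Input(shape=(8,))
outputs = layers.Dense(2, activation='softmax')(inputs)
small_model = keras.Model(inputs, outputs)

# Architecture only: a plain Python dict describing the graph of layers.
config = small_model.get_config()
clone = keras.Model.from_config(config)  # same structure, freshly initialized weights

# Weights only: copy the arrays from the original model into the clone.
clone.set_weights(small_model.get_weights())

x = np.random.random((4, 8)).astype('float32')
same = np.allclose(small_model.predict(x, verbose=0),
                   clone.predict(x, verbose=0))
print(same)
```

Neither the training config nor the optimizer state is carried over this way, which is why model.save() remains the more convenient choice when training should be resumed.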

5. Using the Same Graph of Layers to Define Multiple Models

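In the KFA, models are created by specifying their inputs and outputs in a graph of layers. This means that a single graph of layers can be used to generate multiple models. In the example below, we use the same stack of layers to instantiate two models: an 'encoder' model that turns image inputs into 16-dimensional vectors, and an end-to-end 'autoencoder' model for training.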

encoder_input = keras.Input(shape=(28, 28, 1), name='img')
x = layers.Conv2D(16, 3, activation='relu')(encoder_input)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.Conv2D(16, 3, activation='relu')(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

encoder = keras.Model(encoder_input, encoder_output, name='encoder')
encoder.summary()

x = layers.Reshape((4, 4, 1))(encoder_output)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, activation='relu')(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)

autoencoder = keras.Model(encoder_input, decoder_output, name='autoencoder')
autoencoder.summary()

Model: "autoencoder"

Layer (type)                  Output Shape              Param #
=================================================================
img (InputLayer)              [(None, 28, 28, 1)]       0
conv2d_28 (Conv2D)            (None, 26, 26, 16)        160
conv2d_29 (Conv2D)            (None, 24, 24, 32)        4640
max_pooling2d_7 (MaxPooling2  (None, 8, 8, 32)          0
conv2d_30 (Conv2D)            (None, 6, 6, 32)          9248
conv2d_31 (Conv2D)            (None, 4, 4, 16)          4624
global_max_pooling2d_7 (Glob  (None, 16)                0
reshape_7 (Reshape)           (None, 4, 4, 1)           0
conv2d_transpose_28 (Conv2DT  (None, 6, 6, 16)          160
conv2d_transpose_29 (Conv2DT  (None, 8, 8, 32)          4640
up_sampling2d_7 (UpSampling2  (None, 24, 24, 32)        0
conv2d_transpose_30 (Conv2DT  (None, 26, 26, 16)        4624
conv2d_transpose_31 (Conv2DT  (None, 28, 28, 1)         145
=================================================================
Total params: 28,241
Trainable params: 28,241
Non-trainable params: 0

keras.utils.plot_model(encoder,'encoder.png',show_shapes=True)

[Figure: plot of the encoder model]

keras.utils.plot_model(autoencoder,'autoencoder.png',show_shapes=True)

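[Figure: plot of the autoencoder model]
Note that the decoding architecture is made strictly symmetrical to the encoding architecture, so that the output has the same shape as the input, (28, 28, 1). The reverse of a Conv2D layer is a Conv2DTranspose layer, and the reverse of a MaxPooling2D layer is an UpSampling2D layer. The next section relies on this.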
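
The symmetry can be verified with a little shape arithmetic. The sketch below is plain Python and tracks only the spatial side length, assuming stride 1 and 'valid' padding for the convolutions, as in the code above. The helper names are introduced here for illustration.

```python
def conv_valid(n, k=3):
    return n - k + 1   # Conv2D with padding='valid', stride 1

def conv_transpose_valid(n, k=3):
    return n + k - 1   # Conv2DTranspose with padding='valid', stride 1

def max_pool(n, p=3):
    return n // p      # MaxPooling2D(p): stride defaults to the pool size

def upsample(n, s=3):
    return n * s       # UpSampling2D(s)

n = 28
encoder_sizes = []
for op in (conv_valid, conv_valid, max_pool, conv_valid, conv_valid):
    n = op(n)
    encoder_sizes.append(n)

# GlobalMaxPooling2D collapses 4x4x16 to 16 values, and Reshape turns
# them back into a 4x4x1 map, so the decoder starts again from side 4.
decoder_sizes = []
for op in (conv_transpose_valid, conv_transpose_valid, upsample,
           conv_transpose_valid, conv_transpose_valid):
    n = op(n)
    decoder_sizes.append(n)

print(encoder_sizes)  # [26, 24, 8, 6, 4]
print(decoder_sizes)  # [6, 8, 24, 26, 28]
```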

6. All Models Are Callable, Just Like Layers

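You can treat any model as if it were a layer, by calling it on an Input or on the output of another layer. Note that by calling a model you are not just reusing its architecture, you are also reusing its weights.
Let's see this in action. Here is a different take on the autoencoder example: it creates an encoder model and a decoder model, then chains them in two calls to obtain the autoencoder model: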

encoder_input = keras.Input(shape=(28, 28, 1), name='original_img')
x = layers.Conv2D(16, 3, activation='relu')(encoder_input)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.MaxPooling2D(3)(x)
x = layers.Conv2D(32, 3, activation='relu')(x)
x = layers.Conv2D(16, 3, activation='relu')(x)
encoder_output = layers.GlobalMaxPooling2D()(x)

encoder = keras.Model(encoder_input, encoder_output, name='encoder')
encoder.summary()

decoder_input = keras.Input(shape=(16,), name='encoded_img')
x = layers.Reshape((4, 4, 1))(decoder_input)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
x = layers.Conv2DTranspose(32, 3, activation='relu')(x)
x = layers.UpSampling2D(3)(x)
x = layers.Conv2DTranspose(16, 3, activation='relu')(x)
decoder_output = layers.Conv2DTranspose(1, 3, activation='relu')(x)

decoder = keras.Model(decoder_input, decoder_output, name='decoder')
decoder.summary()

autoencoder_input = keras.Input(shape=(28, 28, 1), name='img')
encoded_img = encoder(autoencoder_input)
decoded_img = decoder(encoded_img)
autoencoder = keras.Model(autoencoder_input, decoded_img, name='autoencoder')
autoencoder.summary()

Output:
Model: "encoder"

Layer (type)                  Output Shape              Param #
=================================================================
original_img (InputLayer)     [(None, 28, 28, 1)]       0
conv2d_72 (Conv2D)            (None, 26, 26, 16)        160
conv2d_73 (Conv2D)            (None, 24, 24, 32)        4640
max_pooling2d_18 (MaxPooling  (None, 8, 8, 32)          0
conv2d_74 (Conv2D)            (None, 6, 6, 32)          9248
conv2d_75 (Conv2D)            (None, 4, 4, 16)          4624
global_max_pooling2d_18 (Glo  (None, 16)                0
=================================================================
Total params: 18,672
Trainable params: 18,672
Non-trainable params: 0

Model: "decoder"

Layer (type)                  Output Shape              Param #
=================================================================
encoded_img (InputLayer)      [(None, 16)]              0
reshape_17 (Reshape)          (None, 4, 4, 1)           0
conv2d_transpose_68 (Conv2DT  (None, 6, 6, 16)          160
conv2d_transpose_69 (Conv2DT  (None, 8, 8, 32)          4640
up_sampling2d_17 (UpSampling  (None, 24, 24, 32)        0
conv2d_transpose_70 (Conv2DT  (None, 26, 26, 16)        4624
conv2d_transpose_71 (Conv2DT  (None, 28, 28, 1)         145
=================================================================
Total params: 9,569
Trainable params: 9,569
Non-trainable params: 0

Model: "autoencoder"

Layer (type)                  Output Shape              Param #
=================================================================
img (InputLayer)              [(None, 28, 28, 1)]       0
encoder (Model)               (None, 16)                18672
decoder (Model)               (None, 28, 28, 1)         9569
=================================================================
Total params: 28,241
Trainable params: 28,241
Non-trainable params: 0

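As you can see, models can be nested: because a model can be treated as a layer, a model can contain sub-models. A common use case for model nesting is ensembling. Here is an example of combining a set of models into a single model that averages their predictions: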

def get_model():
  inputs = keras.Input(shape=(128,))
  outputs = layers.Dense(1, activation='sigmoid')(inputs)
  return keras.Model(inputs, outputs)

model1 = get_model()
model2 = get_model()
model3 = get_model()

inputs = keras.Input(shape=(128,))
y1 = model1(inputs)
y2 = model2(inputs)
y3 = model3(inputs)
outputs = layers.average([y1, y2, y3])
ensemble_model = keras.Model(inputs=inputs, outputs=outputs)
keras.utils.plot_model(ensemble_model,'ensemble_model.png',show_shapes=True)
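
One way to convince yourself that the nested model really averages its members is to compare it against a manual average. This is a sketch under the same setup as above. The names `members` and `ensemble` are chosen here for illustration, and the members keep their untrained random weights.

```python
import numpy as np
from tensorflow import keras
from tensorflow.keras import layers

def get_model():
    inputs = keras.Input(shape=(128,))
    outputs = layers.Dense(1, activation='sigmoid')(inputs)
    return keras.Model(inputs, outputs)

members = [get_model() for _ in range(3)]

inputs = keras.Input(shape=(128,))
outputs = layers.average([m(inputs) for m in members])
ensemble = keras.Model(inputs, outputs)

x = np.random.random((5, 128)).astype('float32')

# Average the three member predictions by hand and compare.
member_preds = [m.predict(x, verbose=0) for m in members]
expected = np.mean(member_preds, axis=0)
matches = np.allclose(ensemble.predict(x, verbose=0), expected, atol=1e-6)
print(matches)
```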

7. Manipulating Complex Graph Topologies

7.1 Models with multiple inputs and outputs

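The KFA makes it easy to handle models with multiple inputs and outputs, which the Sequential API cannot do. Here is a simple example: suppose we are building a system that ranks custom issue tickets by priority and routes them to the right department.
The model will have 3 inputs:
(1) the title of the ticket (text input)
(2) the text body of the ticket (text input)
(3) any tags added by the user (categorical input)
The model will have 2 outputs:
(1) a priority score between 0 and 1 (scalar sigmoid output)
(2) the department that should handle the ticket (softmax output over the set of departments)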

num_tags = 12  # Number of unique issue tags
num_words = 10000  # Size of vocabulary obtained when preprocessing text data
num_departments = 4  # Number of departments for predictions

title_input = keras.Input(shape=(None,), name='title')  # Variable-length sequence of ints
body_input = keras.Input(shape=(None,), name='body')  # Variable-length sequence of ints
tags_input = keras.Input(shape=(num_tags,), name='tags')  # Binary vectors of size `num_tags`

# Embed each word in the title into a 64-dimensional vector
title_features = layers.Embedding(num_words, 64)(title_input)
# Embed each word in the text into a 64-dimensional vector
body_features = layers.Embedding(num_words, 64)(body_input)

# Reduce sequence of embedded words in the title into a single 128-dimensional vector
title_features = layers.LSTM(128)(title_features)
# Reduce sequence of embedded words in the body into a single 32-dimensional vector
body_features = layers.LSTM(32)(body_features)

# Merge all available features into a single large vector via concatenation
x = layers.concatenate([title_features, body_features, tags_input])

# Stick a logistic regression for priority prediction on top of the features
priority_pred = layers.Dense(1, activation='sigmoid', name='priority')(x)
# Stick a department classifier on top of the features
department_pred = layers.Dense(num_departments, activation='softmax', name='department')(x)

# Instantiate an end-to-end model predicting both priority and department
model = keras.Model(inputs=[title_input, body_input, tags_input],
                    outputs=[priority_pred, department_pred])

Let's plot the model:

keras.utils.plot_model(model, 'multi_input_and_output_model.png', show_shapes=True)

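When compiling this model, we can assign a different loss to each output. We can even assign a different weight to each loss, to adjust its contribution to the total training loss.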

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss=['binary_crossentropy', 'categorical_crossentropy'],
              loss_weights=[1., 0.2])
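
The effect of loss_weights is a weighted sum of the per-output losses. The numbers below are invented purely to show the arithmetic; they are not results from this model.

```python
# Hypothetical per-output loss values for one batch.
losses = {'priority': 0.60, 'department': 1.50}
loss_weights = {'priority': 1.0, 'department': 0.2}

# The total loss minimized during training is the weighted sum.
total_loss = sum(loss_weights[name] * value for name, value in losses.items())
print(round(total_loss, 4))  # 0.9
```

With these weights, the department loss contributes only a fifth of its raw value, so the priority output dominates the gradient signal.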

Since we gave names to the output layers, we can also specify the losses by name:

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss={'priority': 'binary_crossentropy',
                    'department': 'categorical_crossentropy'},
              loss_weights={'priority': 1., 'department': 0.2})

We can now train the model by passing lists (or dictionaries) of NumPy arrays for the inputs and targets:

import numpy as np

# Dummy input data
title_data = np.random.randint(num_words, size=(1280, 10))
body_data = np.random.randint(num_words, size=(1280, 100))
tags_data = np.random.randint(2, size=(1280, num_tags)).astype('float32')
# Dummy target data
priority_targets = np.random.random(size=(1280, 1))
dept_targets = np.random.randint(2, size=(1280, num_departments))

model.fit({'title': title_data, 'body': body_data, 'tags': tags_data},
          {'priority': priority_targets, 'department': dept_targets},
          epochs=2,
          batch_size=32)

When calling fit with a Dataset object, it should yield either a tuple of lists such as ([title_data, body_data, tags_data], [priority_targets, dept_targets]) or a tuple of dictionaries such as ({'title': title_data, 'body': body_data, 'tags': tags_data}, {'priority': priority_targets, 'department': dept_targets}).

7.2 A toy ResNet model

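In addition to models with multiple inputs and outputs, the KFA makes it easy to manipulate non-linear connectivity topologies, that is, models in which layers are not connected sequentially. This also cannot be handled with the Sequential API. A common use case is residual connections. Let's build a toy ResNet model for CIFAR10 to demonstrate this: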

inputs = keras.Input(shape=(32, 32, 3), name='img')
x = layers.Conv2D(32, 3, activation='relu')(inputs)
x = layers.Conv2D(64, 3, activation='relu')(x)
block_1_output = layers.MaxPooling2D(3)(x)

x = layers.Conv2D(64, 3, activation='relu', padding='same')(block_1_output)
x = layers.Conv2D(64, 3, activation='relu', padding='same')(x)
block_2_output = layers.add([x, block_1_output])

x = layers.Conv2D(64, 3, activation='relu', padding='same')(block_2_output)
x = layers.Conv2D(64, 3, activation='relu', padding='same')(x)
block_3_output = layers.add([x, block_2_output])

x = layers.Conv2D(64, 3, activation='relu')(block_3_output)
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(256, activation='relu')(x)
x = layers.Dropout(0.5)(x)
outputs = layers.Dense(10, activation='softmax')(x)

model = keras.Model(inputs, outputs, name='toy_resnet')
model.summary()

Output:
Model: "toy_resnet"

Layer (type)                     Output Shape         Param #  Connected to
==================================================================================================
img (InputLayer)                 [(None, 32, 32, 3)]  0
conv2d_76 (Conv2D)               (None, 30, 30, 32)   896      img[0][0]
conv2d_77 (Conv2D)               (None, 28, 28, 64)   18496    conv2d_76[0][0]
max_pooling2d_19 (MaxPooling2D)  (None, 9, 9, 64)     0        conv2d_77[0][0]
conv2d_78 (Conv2D)               (None, 9, 9, 64)     36928    max_pooling2d_19[0][0]
conv2d_79 (Conv2D)               (None, 9, 9, 64)     36928    conv2d_78[0][0]
add (Add)                        (None, 9, 9, 64)     0        conv2d_79[0][0]
                                                               max_pooling2d_19[0][0]
conv2d_80 (Conv2D)               (None, 9, 9, 64)     36928    add[0][0]
conv2d_81 (Conv2D)               (None, 9, 9, 64)     36928    conv2d_80[0][0]
add_1 (Add)                      (None, 9, 9, 64)     0        conv2d_81[0][0]
                                                               add[0][0]
conv2d_82 (Conv2D)               (None, 7, 7, 64)     36928    add_1[0][0]
global_average_pooling2d (Globa  (None, 64)           0        conv2d_82[0][0]
dense_10 (Dense)                 (None, 256)          16640    global_average_pooling2d[0][0]
dropout (Dropout)                (None, 256)          0        dense_10[0][0]
dense_11 (Dense)                 (None, 10)           2570     dropout[0][0]
==================================================================================================
Total params: 223,242
Trainable params: 223,242
Non-trainable params: 0
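
The parameter column of this summary can be reproduced by hand. The sketch below is plain Python; `conv2d_params` and `dense_params` are helper names introduced here, and they assume the standard layouts with one bias per filter or unit. The Add, pooling, and Dropout layers have no parameters, so they do not appear.

```python
def conv2d_params(k, c_in, c_out):
    """k*k*c_in weights per filter, c_out filters, plus one bias per filter."""
    return k * k * c_in * c_out + c_out

def dense_params(n_in, n_out):
    """One weight per input/output pair plus one bias per output unit."""
    return n_in * n_out + n_out

counts = [
    conv2d_params(3, 3, 32),          # first conv on the RGB input
    conv2d_params(3, 32, 64),         # second conv
    *[conv2d_params(3, 64, 64)] * 5,  # four residual convs and the final conv
    dense_params(64, 256),
    dense_params(256, 10),
]

print(counts[:3])   # [896, 18496, 36928]
print(sum(counts))  # 223242
```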

Let's plot the model:

keras.utils.plot_model(model, 'mini_resnet.png', show_shapes=True)

Now let's train it:

(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train = x_train.astype('float32') / 255.
x_test = x_test.astype('float32') / 255.
y_train = keras.utils.to_categorical(y_train, 10)
y_test = keras.utils.to_categorical(y_test, 10)

model.compile(optimizer=keras.optimizers.RMSprop(1e-3),
              loss='categorical_crossentropy',
              metrics=['acc'])
model.fit(x_train, y_train,
          batch_size=64,
          epochs=1,
          validation_split=0.2)

8. Shared Layers

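Another good use for the KFA is models that use shared layers. Shared layers are layer instances that get reused multiple times in the same model: they learn features that correspond to multiple paths in the graph of layers. Shared layers are often used to encode inputs that come from similar spaces (say, two different pieces of text with similar vocabulary), since they enable sharing of information across these inputs and make it possible to train such a model on less data. If a given word is seen in one of the inputs, that benefits the processing of all inputs that go through the shared layer. To share a layer in the functional API, just call the same layer instance multiple times. For instance, here is an Embedding layer shared across two different text inputs: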

# Embedding for 1000 unique words mapped to 128-dimensional vectors
shared_embedding = layers.Embedding(1000, 128)

# Variable-length sequence of integers
text_input_a = keras.Input(shape=(None,), dtype='int32')

# Variable-length sequence of integers
text_input_b = keras.Input(shape=(None,), dtype='int32')

# We reuse the same layer to encode both inputs
encoded_input_a = shared_embedding(text_input_a)
encoded_input_b = shared_embedding(text_input_b)
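
To see that the two branches really share one set of weights, the layers can be wrapped in a model and its variables counted. This is a small sketch built on the same setup. The model name `two_branch_model` is made up here, and exposing both encodings as outputs is just a convenient way to close the graph.

```python
from tensorflow import keras
from tensorflow.keras import layers

shared_embedding = layers.Embedding(1000, 128)

text_input_a = keras.Input(shape=(None,), dtype='int32')
text_input_b = keras.Input(shape=(None,), dtype='int32')

encoded_a = shared_embedding(text_input_a)
encoded_b = shared_embedding(text_input_b)

two_branch_model = keras.Model([text_input_a, text_input_b],
                               [encoded_a, encoded_b])

# A single 1000 x 128 embedding matrix serves both branches.
print(len(two_branch_model.trainable_weights))  # 1
print(two_branch_model.count_params())          # 128000
```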

9. Extracting and Reusing Nodes in the Graph of Layers

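Because the graph of layers manipulated in the KFA is a static data structure, it can be accessed and inspected. This is how functional models can be plotted as images, for instance. It also means that we can access the activations of intermediate layers ("nodes" in the graph) and reuse them elsewhere, which is very useful for feature extraction. Here is a VGG19 model with weights pre-trained on ImageNet: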

from tensorflow.keras.applications import VGG19

vgg19 = VGG19()

And these are the intermediate activations of the model, obtained by querying the graph data structure:

features_list = [layer.output for layer in vgg19.layers]

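We can use these features to create a new feature-extraction model that returns the values of the intermediate layer activations, and all of this takes only 3 lines: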

feat_extraction_model = keras.Model(inputs=vgg19.input, outputs=features_list)

img = np.random.random((1, 224, 224, 3)).astype('float32')
extracted_features = feat_extraction_model(img)

10. Extending the API by Writing Custom Layers

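tf.keras has a wide range of built-in layers, for example:
(1) Convolutional layers: Conv1D, Conv2D, Conv3D, Conv2DTranspose, etc.
(2) Pooling layers: MaxPooling1D, MaxPooling2D, MaxPooling3D, AveragePooling1D, etc.
(3) RNN layers: GRU, LSTM, ConvLSTM2D, etc.
(4) BatchNormalization, Dropout, Embedding, etc.
If you don't find what you need, it is easy to extend the API by creating your own layers. All layers subclass the Layer class and implement:
(1) a call method, which specifies the computation done by the layer;
(2) a build method, which creates the weights of the layer (note that this is just a style convention; you could also create the weights in __init__).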

class CustomDense(layers.Layer):

  def __init__(self, units=32):
    super(CustomDense, self).__init__()
    self.units = units

  def build(self, input_shape):
    self.w = self.add_weight(shape=(input_shape[-1], self.units),
                             initializer='random_normal',
                             trainable=True)
    self.b = self.add_weight(shape=(self.units,),
                             initializer='random_normal',
                             trainable=True)

  def call(self, inputs):
    return tf.matmul(inputs, self.w) + self.b

inputs = keras.Input((4,))
outputs = CustomDense(10)(inputs)

model = keras.Model(inputs, outputs)

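If you want your custom layer to support serialization, you should also define a get_config method that returns the constructor arguments of the layer instance: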

class CustomDense(layers.Layer):

  def __init__(self, units=32):
    super(CustomDense, self).__init__()
    self.units = units

  def build(self, input_shape):
    self.w = self.add_weight(shape=(input_shape[-1], self.units),
                             initializer='random_normal',
                             trainable=True)
    self.b = self.add_weight(shape=(self.units,),
                             initializer='random_normal',
                             trainable=True)

  def call(self, inputs):
    return tf.matmul(inputs, self.w) + self.b

  def get_config(self):
    return {'units': self.units}


inputs = keras.Input((4,))
outputs = CustomDense(10)(inputs)

model = keras.Model(inputs, outputs)
config = model.get_config()

new_model = keras.Model.from_config(
    config, custom_objects={'CustomDense': CustomDense})

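Optionally, you can also implement the class method from_config(cls, config), which is in charge of recreating a layer instance from its config dictionary. The default implementation of from_config is: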

def from_config(cls, config):
  return cls(**config)
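
The round trip between get_config and from_config can be illustrated without Keras at all. `ToyLayer` below is a hypothetical stand-in class, not a real Keras layer; it only shows why the config dictionary must contain exactly the arguments that __init__ accepts.

```python
class ToyLayer:
    """Hypothetical stand-in for a Keras layer; only the config logic is shown."""

    def __init__(self, units=32):
        self.units = units

    def get_config(self):
        # Everything needed to call __init__ again.
        return {'units': self.units}

    @classmethod
    def from_config(cls, config):
        # The dict keys are unpacked as keyword arguments of __init__.
        return cls(**config)


original = ToyLayer(units=10)
rebuilt = ToyLayer.from_config(original.get_config())
print(rebuilt.units)  # 10
```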

11. When to Use the KFA?

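How do you decide whether to use the KFA to create a new model or to subclass the Model class directly? In general, the functional API is higher-level, easier and safer to use, and has a number of features that subclassed models do not support. However, model subclassing gives you greater flexibility when creating models that are not easily expressible as directed acyclic graphs of layers (for instance, you could not implement a Tree-RNN with the functional API; you would have to subclass Model directly).
Strengths of the KFA:
(1) It is less verbose.
(2) It validates the model while you are defining it.
(3) A model defined with the KFA can be plotted and inspected.
(4) A model defined with the KFA can be serialized or cloned.
Weaknesses of the KFA:
(1) It does not support dynamic architectures.
(2) Sometimes you simply need to write everything from scratch.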
12. Mix-and-Matching Different API Styles

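Importantly, choosing between the functional API and Model subclassing is not a binary decision that restricts you to one category of models. All models in the tf.keras API can interact with each other, whether they are Sequential models, functional models, or subclassed models/layers written from scratch. You can always use a functional model or a Sequential model as part of a subclassed model/layer: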

units = 32
timesteps = 10
input_dim = 5

# Define a Functional model
inputs = keras.Input((None, units))
x = layers.GlobalAveragePooling1D()(inputs)
outputs = layers.Dense(1, activation='sigmoid')(x)
model = keras.Model(inputs, outputs)


class CustomRNN(layers.Layer):

  def __init__(self):
    super(CustomRNN, self).__init__()
    self.units = units
    self.projection_1 = layers.Dense(units=units, activation='tanh')
    self.projection_2 = layers.Dense(units=units, activation='tanh')
    # Our previously-defined Functional model
    self.classifier = model

  def call(self, inputs):
    outputs = []
    state = tf.zeros(shape=(inputs.shape[0], self.units))
    for t in range(inputs.shape[1]):
      x = inputs[:, t, :]
      h = self.projection_1(x)
      y = h + self.projection_2(state)
      state = y
      outputs.append(y)
    features = tf.stack(outputs, axis=1)
    print(features.shape)
    return self.classifier(features)

rnn_model = CustomRNN()
_ = rnn_model(tf.zeros((1, timesteps, input_dim)))

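Conversely, you can use any subclassed layer or model in the KFA as long as it implements a call method that follows one of these patterns:
(1) call(self, inputs, **kwargs), where inputs is a tensor or a nested structure of tensors (for example, a list of tensors), and **kwargs are non-tensor arguments (non-inputs).
(2) call(self, inputs, training=None, **kwargs), where training is a boolean indicating whether the layer should behave in training mode or in inference mode.
(3) call(self, inputs, mask=None, **kwargs), where mask is a boolean mask tensor (useful for RNNs, for instance).
(4) call(self, inputs, training=None, mask=None, **kwargs), since a layer can of course have both masking and training-specific behavior at the same time.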
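
As an illustration of pattern (2), here is a minimal sketch of a subclassed layer whose behavior depends on the training flag, used inside a functional model. `NoisyIdentity` is a hypothetical layer written for this example; it is not part of tf.keras.

```python
import numpy as np
import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers


class NoisyIdentity(layers.Layer):
    """Hypothetical layer: adds Gaussian noise in training mode only."""

    def call(self, inputs, training=None):
        if training:
            return inputs + tf.random.normal(tf.shape(inputs), stddev=0.1)
        return inputs


inputs = keras.Input(shape=(4,))
outputs = NoisyIdentity()(inputs)
noisy_model = keras.Model(inputs, outputs)

x = np.ones((2, 4), dtype='float32')

# In inference mode the layer passes its input through unchanged.
inference_out = noisy_model(x, training=False).numpy()
print(np.array_equal(inference_out, x))  # True
```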
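In addition, if you implement the get_config method on your custom layer or model, the functional models you create with it will still be serializable and cloneable. Here is a quick example in which a custom RNN written from scratch is used in a functional model: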

units = 32
timesteps = 10
input_dim = 5
batch_size = 16


class CustomRNN(layers.Layer):

  def __init__(self):
    super(CustomRNN, self).__init__()
    self.units = units
    self.projection_1 = layers.Dense(units=units, activation='tanh')
    self.projection_2 = layers.Dense(units=units, activation='tanh')
    self.classifier = layers.Dense(1, activation='sigmoid')

  def call(self, inputs):
    outputs = []
    state = tf.zeros(shape=(inputs.shape[0], self.units))
    for t in range(inputs.shape[1]):
      x = inputs[:, t, :]
      h = self.projection_1(x)
      y = h + self.projection_2(state)
      state = y
      outputs.append(y)
    features = tf.stack(outputs, axis=1)
    return self.classifier(features)

# Note that we specify a static batch size for the inputs with the `batch_shape`
# arg, because the inner computation of `CustomRNN` requires a static batch size
# (when we create the `state` zeros tensor).
inputs = keras.Input(batch_shape=(batch_size, timesteps, input_dim))
x = layers.Conv1D(32, 3)(inputs)
outputs = CustomRNN()(x)

model = keras.Model(inputs, outputs)

rnn_model = CustomRNN()
_ = rnn_model(tf.zeros((1, 10, 5)))