
Using a trained model to predict classes for data with a different input_shape

Asked by a Stack Overflow user on 2019-11-15 17:10:54
1 answer · 764 views · 0 followers · 0 votes

I have a saved model that I trained on a small corpus of text (message) data, and I am trying to use the same model to predict positive or negative sentiment (i.e., binary classification) on another corpus. I built the NLP model following the Google guide, which you can view here (in case it's useful -- I used option A).

I keep getting an input-shape error, and I understand it means I have to reshape the input to match the expected shape. However, the data I want to predict on just isn't that large. The error states:

ValueError: Error when checking input: expected dropout_8_input to have shape (519,) but got array with shape (184,)

The reason the model expects shape (519,) is that during training, the corpus fed into the first Dropout layer (in TF-IDF-vectorized form) had print(x_train.shape) # (454, 519)

I'm not new to ML, but I assumed that after optimizing the model, all the data I try to predict on would need to have the same shape as the data used to train it. Has anyone run into a similar problem? Am I missing something about how to train the model so it can predict inputs of different sizes? Or am I misunderstanding how to use the model for class prediction?
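This kind of mismatch typically appears when the new corpus is vectorized with a vectorizer fitted on the new texts themselves, rather than with the one fitted on the training texts. A minimal pure-Python sketch (a toy bag-of-words stand-in for TF-IDF; all names here are illustrative, not from the question's code) shows why the feature width is tied to the vocabulary the vectorizer was fitted on:

```python
def fit_vocab(texts):
    """Build a fixed vocabulary from a corpus: one column per unique token."""
    vocab = {}
    for text in texts:
        for token in text.lower().split():
            if token not in vocab:
                vocab[token] = len(vocab)
    return vocab

def transform(texts, vocab):
    """Map texts into count vectors whose width equals len(vocab).
    Tokens unseen at fit time are simply ignored."""
    rows = []
    for text in texts:
        row = [0] * len(vocab)
        for token in text.lower().split():
            if token in vocab:
                row[vocab[token]] += 1
        rows.append(row)
    return rows

train_texts = ["great product love it", "terrible waste of money"]
new_texts = ["love this great value"]

train_vocab = fit_vocab(train_texts)          # width fixed by training corpus
x_train = transform(train_texts, train_vocab)
x_new_ok = transform(new_texts, train_vocab)  # same width: model-compatible
x_new_bad = transform(new_texts, fit_vocab(new_texts))  # refit: width differs

print(len(x_train[0]), len(x_new_ok[0]), len(x_new_bad[0]))  # 8 8 4
```

The same principle applies to the real TF-IDF pipeline: the 519 columns correspond to the n-gram vocabulary of the training corpus, so a vectorizer refit on the smaller new corpus produces 184 columns instead.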

I trained the model using the following functions:

import tensorflow as tf  # needed below for optimizers and callbacks
from tensorflow.python.keras import models
from tensorflow.python.keras.layers import Dense, Dropout

def mlp_model(layers, units, dropout_rate, input_shape, num_classes):
    """Creates an instance of a multi-layer perceptron model.

    # Arguments
        layers: int, number of `Dense` layers in the model.
        units: int, output dimension of the layers.
        dropout_rate: float, percentage of input to drop at Dropout layers.
        input_shape: tuple, shape of input to the model.
        num_classes: int, number of output classes.

    # Returns
        An MLP model instance.
    """
    op_units, op_activation = _get_last_layer_units_and_activation(num_classes)
    model = models.Sequential()
    model.add(Dropout(rate=dropout_rate, input_shape=input_shape))

#     print(input_shape)

    for _ in range(layers-1):
        model.add(Dense(units=units, activation='relu'))
        model.add(Dropout(rate=dropout_rate))

    model.add(Dense(units=op_units, activation=op_activation))
    return model





def train_ngram_model(data,
                      learning_rate=1e-3,
                      epochs=1000,
                      batch_size=128,
                      layers=2,
                      units=64,
                      dropout_rate=0.2):
    """Trains n-gram model on the given dataset.

    # Arguments
        data: tuples of training and test texts and labels.
        learning_rate: float, learning rate for training model.
        epochs: int, number of epochs.
        batch_size: int, number of samples per batch.
        layers: int, number of `Dense` layers in the model.
        units: int, output dimension of Dense layers in the model.
        dropout_rate: float: percentage of input to drop at Dropout layers.

    # Raises
        ValueError: If validation data has label values which were not seen
            in the training data.

    # Reference
        For tuning hyperparameters, please visit the following page for
        further explanation of each argument:
        https://developers.google.com/machine-learning/guides/text-classification/step-5
    """
    # Get the data.
    (train_texts, train_labels), (val_texts, val_labels) = data

    # Verify that validation labels are in the same range as training labels.
    num_classes = get_num_classes(train_labels)
    unexpected_labels = [v for v in val_labels if v not in range(num_classes)]
    if len(unexpected_labels):
        raise ValueError('Unexpected label values found in the validation set:'
                         ' {unexpected_labels}. Please make sure that the '
                         'labels in the validation set are in the same range '
                         'as training labels.'.format(
                             unexpected_labels=unexpected_labels))

    # Vectorize texts.
    x_train, x_val = ngram_vectorize(
        train_texts, train_labels, val_texts)

    # Create model instance.
    model = mlp_model(layers=layers,
                      units=units,
                      dropout_rate=dropout_rate,
                      input_shape=x_train.shape[1:],
                      num_classes=num_classes)
    # num_classes determines which activation fn to use

    # Compile model with learning parameters.
    if num_classes == 2:
        loss = 'binary_crossentropy'
    else:
        loss = 'sparse_categorical_crossentropy'
    optimizer = tf.keras.optimizers.Adam(lr=learning_rate)
    model.compile(optimizer=optimizer, loss=loss, metrics=['acc'])

    # Create callback for early stopping on validation loss. If the loss does
    # not decrease in two consecutive tries, stop training.
    callbacks = [tf.keras.callbacks.EarlyStopping(
        monitor='val_loss', patience=2)]

    # Train and validate model.
    history = model.fit(
            x_train,
            train_labels,
            epochs=epochs,
            callbacks=callbacks,
            validation_data=(x_val, val_labels),
            verbose=2,  # Logs once per epoch.
            batch_size=batch_size)

    # Print results.
    history = history.history
    print('Validation accuracy: {acc}, loss: {loss}'.format(
            acc=history['val_acc'][-1], loss=history['val_loss'][-1]))

    # Save model.
    model.save('MCTR2.h5')
    return history['val_acc'][-1], history['val_loss'][-1]

From this, the model architecture comes out as follows:

_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
dropout (Dropout)            (None, 519)               0         
_________________________________________________________________
dense (Dense)                (None, 64)                33280     
_________________________________________________________________
dropout_1 (Dropout)          (None, 64)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 1)                 65        
=================================================================
Total params: 33,345
Trainable params: 33,345
Non-trainable params: 0
_________________________________________________________________

1 Answer

Answered by a Stack Overflow user on 2019-11-15 17:24:52

To make a dimension variable in tensorflow, you need to specify it as None.

The first dimension is the batch_size, which is why it is usually always None, but typically a batch of sequence data will have shape (batch_size, sequence_length, num_features). So a single sequence is usually 2D with a variable length, while the number of features per "token" is fixed.

It looks like you are feeding your model 1D vectors, and Dense layers have a fixed input shape. If you want to model variable-length sequences, you have to build the model with layers that can accommodate this (e.g., convolutions, LSTMs).
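For the fixed-width Dense setup in the question, though, the usual remedy is not to change the architecture but to reuse the TF-IDF vectorizer that was fitted on the training corpus when transforming the new corpus. A hedged sketch using scikit-learn (the guide's `ngram_vectorize` wraps `TfidfVectorizer`; the texts and variable names here are illustrative):

```python
from sklearn.feature_extraction.text import TfidfVectorizer

train_texts = ["great product love it", "terrible waste of money",
               "works as advertised", "broke after one day"]
new_texts = ["love it, works great", "complete waste"]

# Fit ONCE, on the training corpus; this freezes the feature width
# (the equivalent of the question's 519 columns).
vectorizer = TfidfVectorizer()
x_train = vectorizer.fit_transform(train_texts)

# For new data, call transform() -- NOT fit_transform() -- so the
# feature width matches what the model was trained on.
x_new = vectorizer.transform(new_texts)

print(x_train.shape[1] == x_new.shape[1])  # True: widths match
```

In practice this means persisting the fitted vectorizer (e.g., with pickle or joblib) alongside the saved .h5 model, since the model alone cannot reconstruct the vocabulary mapping.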

Score: 1
Original content provided by Stack Overflow; translation supported by Tencent Cloud Xiaowei's IT-domain engine.
Original link: https://stackoverflow.com/questions/58881719
