
CNN converges to the same accuracy regardless of hyperparameters, what does this mean?

Stack Overflow user

Asked on 2017-09-19 12:11:55
1 answer · 380 views · 0 followers · 0 votes

I wrote TensorFlow code based on:

http://www.wildml.com/2015/12/implementing-a-cnn-for-text-classification-in-tensorflow/

but using pre-trained word embeddings from the 300-dimensional GoogleNews word2vec model.

I created my own data from the UCI ML News Aggregator dataset, where I parsed the content of the news articles and created my own labels.

Because of the size of the articles, I use TF-IDF to take the top 120 words of each article and embed those into 300 dimensions.
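
For context, a minimal sketch of that preprocessing step, assuming gensim's KeyedVectors and a hypothetical top_tfidf_words helper that returns the highest-scoring words of an article:

# Sketch only: load the pre-trained GoogleNews vectors and build a
# [maxWords, embedding_size] matrix for one article from its top TF-IDF words.
# top_tfidf_words is a hypothetical helper standing in for the TF-IDF step.
import numpy as np
from gensim.models import KeyedVectors

word_vectors = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True)

def embed_article(article_tokens, max_words=120, embedding_size=300):
    top_words = top_tfidf_words(article_tokens, n=max_words)  # hypothetical helper
    matrix = np.zeros([max_words, embedding_size])
    for i, word in enumerate(top_words):
        if word in word_vectors:
            matrix[i, :] = word_vectors[word]
    return matrix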

When I run the CNN, regardless of the hyperparameters it converges to the same low overall accuracy of around 38%.

Hyperparameters changed (a sweep sketch follows this list):

Various filter sizes:

I tried combinations of filter sizes 1,2,3; 3,4,5; and 1,3,4.

Learning rate:

I varied it from very low to very high; a very low rate does not converge to 38%, but anything between 0.0001 and 0.4 does.

Batch size:

Tried many values between 5 and 100.

Weight and bias initialization:

Set the weights between 0.4 and 0.01. Set the initial bias values between 0 and 0.1. Also tried the Xavier initializer for the conv2d weights.

Dataset size:

I have only tried two partial datasets, one with 15,000 training examples and the other with 5,000 test examples. In total I have 263,000 examples available for training. There is no accuracy difference whether I train and evaluate on the 15,000 training examples or use the 5,000 test examples as training data (to save evaluation time).
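
The sweep above was done by hand; a sketch of the equivalent grid sweep, where train_and_eval is a hypothetical wrapper around the training loop in the full listing below that builds a TextCNN with the given settings and returns the eval accuracy:

# Sketch of the grid sweep described above. train_and_eval is hypothetical.
import itertools

filter_size_options = [[1, 2, 3], [3, 4, 5], [1, 3, 4]]
learning_rates = [0.0001, 0.001, 0.01, 0.1, 0.4]
batch_sizes = [5, 20, 50, 100]

results = {}
for filters, lr, bs in itertools.product(filter_size_options, learning_rates, batch_sizes):
    acc = train_and_eval(filter_sizes=filters, learning_rate=lr, batch_size=bs)  # hypothetical
    results[(tuple(filters), lr, bs)] = acc
    print("filters=%s lr=%g batch=%d -> accuracy=%.3f" % (filters, lr, bs, acc))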

I have successfully classified on the 15,000/5,000 split using a feed-forward network with bag-of-words input (93% accuracy), an SVM on TF-IDF features (92%), and Naive Bayes on TF-IDF (91.5%), so I don't think it is the data.
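
For reference, the TF-IDF + SVM baseline mentioned above can be reproduced in a few lines of scikit-learn; a sketch, assuming texts and labels are parallel lists of raw article strings and class ids (both assumptions, not names from my code):

# Sketch of the TF-IDF + linear SVM baseline on a 15,000/5,000-style split.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVC
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split

X_train, X_test, y_train, y_test = train_test_split(texts, labels, test_size=5000)

clf = make_pipeline(TfidfVectorizer(), LinearSVC())
clf.fit(X_train, y_train)
print("SVM accuracy: %.3f" % clf.score(X_test, y_test))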

What does this mean? Is this model simply a poor fit for the task, or is there a bug in my work?

I suspect my do_eval function may be incorrect in how it evaluates the accuracy/loss over an epoch of data:

Code language: python
        def do_eval(data_set,
                label_set,
                batch_size):
            """
            Runs one evaluation against the full epoch of data.
            data_set: The set of embeddings to eval
            label_set: the set of labels to eval
            """
            # And run one epoch of eval.

            true_count = 0  # Counts the number of correct predictions.
            steps_per_epoch = len(label_set) // batch_size
            num_examples = steps_per_epoch * batch_size
            totalLoss = 0
            # Need to compute eval accuracy
            for evalStep in xrange(steps_per_epoch):
                input_batch, label_batch = nextBatch(data_set, labels_set, batchSize)
                evalAcc, evalLoss = eval_step(input_batch, label_batch)
                true_count += evalAcc * batchSize
                totalLoss += evalLoss
            precision = float(true_count) / num_examples
            print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' % (num_examples, true_count, precision))
            print("Eval Loss: " + str(totalLoss))

The entire model is as follows:

Code language: python
import numpy as np
import tensorflow as tf


class TextCNN(object):
    """
    A CNN for text classification.
    Uses a convolutional, max-pooling and softmax layer.
    """

    def __init__(
            self, batchSize, numWords, num_classes,
            embedding_size, filter_sizes, num_filters):

        # Set place holders
        self.input_placeholder = tf.placeholder(tf.float32,[batchSize,numWords,embedding_size,1])
        self.labels = tf.placeholder(tf.int32, [batchSize,num_classes])
        self.pKeep = tf.placeholder(tf.float32)

        # Inference
        '''
        Ready to build conv layers followed by max pooling layers
        Each conv layer produces a different shaped output so need to loop over
        them and create a layer for each and then merge the results
        '''
        pooled_outputs = []
        for i, filter_size in enumerate(filter_sizes):
            with tf.name_scope("conv-maxpool-%s" % filter_size):
                # Convolution Layer
                filter_shape = [filter_size, embedding_size, 1, num_filters]

                # W: Filter matrix
                W = tf.Variable(tf.truncated_normal(filter_shape,stddev=0.01), name='W')
                b = tf.Variable(tf.constant(0.0,shape=[num_filters]),name="b")


                # Valid padding: Narrow convolution (no edge padded so filter slides over everything)
                # Output size = (input_size (numWords in this case) + 2 * padding (0 in this case) - filter_size) + 1
                conv = tf.nn.conv2d(
                    self.input_placeholder,
                    W,
                    strides=[1, 1, 1, 1],
                    padding="VALID",
                    name="conv")

                # Apply the nonlinearity: add the bias b to the conv output Wx,
                # then run the result through the ReLU activation function
                h = tf.nn.relu(tf.nn.bias_add(conv, b),name='relu')

                # Max-pooling over the outputs
                # Max-pool to control the output size
                # By taking only the best features determined by the filter
                # Ksize is the size of the window of the input tensor
                pooled = tf.nn.max_pool(
                    h,
                    ksize=[1, numWords - filter_size + 1, 1, 1],
                    strides=[1, 1, 1, 1],
                    padding='VALID',
                    name="pool")

                # Each pooled outputs a tensor of size
                # [batchSize, 1, 1, num_filters] where num_filters represents the
                # Number of features we wanted pooled
                pooled_outputs.append(pooled)

        # Combine all pooled features
        num_filters_total = num_filters * len(filter_sizes)
        # Concat the pooled outputs along the last (num_filters) dimension
        self.h_pool = tf.concat(pooled_outputs, 3)
        # Flatten
        self.h_pool_flat = tf.reshape(self.h_pool, [-1, num_filters_total])

        # Add drop out to regularize the learning curve / accuracy
        with tf.name_scope("dropout"):
            self.h_drop = tf.nn.dropout(self.h_pool_flat,self.pKeep)

        # Fully connected output layer
        with tf.name_scope("output"):
            W = tf.Variable(tf.truncated_normal([num_filters_total,num_classes],stddev=0.01),name="W")
            b = tf.Variable(tf.constant(0.0,shape=[num_classes]), name='b')
            self.logits = tf.nn.xw_plus_b(self.h_drop, W, b, name='logits')
            self.predictions = tf.argmax(self.logits, 1, name='predictions')

        # Loss
        with tf.name_scope("loss"):
            losses = tf.nn.softmax_cross_entropy_with_logits(labels=self.labels,logits=self.logits, name="xentropy")
            self.loss = tf.reduce_mean(losses)

        # Accuracy
        with tf.name_scope("accuracy"):
            correct_predictions = tf.equal(self.predictions, tf.argmax(self.labels,1))
            self.accuracy = tf.reduce_mean(tf.cast(correct_predictions, "float"), name="accuracy")

     ##################################################################################################################
# Running the training
# Define various parameters for network

batchSize = 100
numWords = 120
embedding_size = 300
num_classes = 4
filter_sizes = [3,4,5] # number of words each filter slides over, i.e. 3 words, 4 words, etc.
num_filters = 126
maxSteps = 5000
initial_learning_rate = 0.001
dropoutRate = 1


data_set = np.load("/home/kevin/Documents/NSERC_2017/articles/classifyDataSet/TestSmaller_CNN_inputMat_0.npy")
labels_set = np.load("Test_NN_target_smaller.npy")


with tf.Graph().as_default():

    sess = tf.Session()

    with sess.as_default():
        cnn = TextCNN(batchSize=batchSize,
                      numWords=numWords,
                      num_classes=num_classes,
                      num_filters=num_filters,
                      embedding_size=embedding_size,
                      filter_sizes=filter_sizes)

        # Define training operation
        # Pick an optimizer, set its learning rate, and tell it what to minimize

        global_step = tf.Variable(0,name='global_step', trainable=False)
        optimizer = tf.train.AdamOptimizer(initial_learning_rate)
        grads_and_vars = optimizer.compute_gradients(cnn.loss)
        train_op = optimizer.apply_gradients(grads_and_vars, global_step=global_step)

        # Summaries to save for tensor board

        # Set directory
        out_dir = "/home/kevin/Documents/NSERC_2017/articles/classifyDataSet/tf_logs/CNN_Embedding/"

        # Loss and accuracy summaries
        loss_summary = tf.summary.scalar("loss",cnn.loss)
        acc_summary = tf.summary.scalar("accuracy", cnn.accuracy)

        # Train summaries
        train_summary_op = tf.summary.merge([loss_summary,acc_summary])
        train_summary_dir = out_dir + "train/"
        train_summary_writer = tf.summary.FileWriter(train_summary_dir, sess.graph)

        # Test summaries
        test_summary_op = tf.summary.merge([loss_summary, acc_summary])
        test_summary_dir = out_dir + "test/"
        test_summary_write = tf.summary.FileWriter(test_summary_dir, sess.graph)

        # Init all variables

        init = tf.global_variables_initializer()
        sess.run(init)

    ############################################################################################

        def train_step(input_data, labels_data):
            '''
            Single training step
            :param input_data: input
            :param labels_data: labels to train to
            '''
            feed_dict = {
                cnn.input_placeholder: input_data,
                cnn.labels: labels_data,
                cnn.pKeep: dropoutRate
            }
            _, step, summaries, loss, accuracy = sess.run(
                [train_op, global_step, train_summary_op, cnn.loss, cnn.accuracy],
            feed_dict=feed_dict)
            train_summary_writer.add_summary(summaries, step)


    ###############################################################################################

        def eval_step(input_data, labels_data, writer=None):
            """
            Evaluates model on a test set
            Single step
            """
            feed_dict = {
                cnn.input_placeholder: input_data,
                cnn.labels: labels_data,
                cnn.pKeep: 1.0
            }

            step, summaries, loss, accuracy = sess.run(
                [global_step, test_summary_op, cnn.loss, cnn.accuracy],
                feed_dict)
            if writer:
                writer.add_summary(summaries, step)
            return accuracy, loss

    ###############################################################################

        def nextBatch(data_set, labels_set, batchSize):
            '''
            Get the next batch of data
            :param data_set: entire training or test data set
            :param labels_set: entire training or test label set
            :param batchSize: batch size
            :return: a batch of the data and its corresponding labels
            '''
            # Generate random row indices for the documents
            rand_index = np.random.choice(data_set.shape[0], size=batchSize)

            # Grab the data to give to the feed dicts
            data_batch, labels_batch = data_set[rand_index, :, :], labels_set[rand_index, :]

            # Resize for tensorflow
            data_batch = data_batch.reshape([data_batch.shape[0],data_batch.shape[1],data_batch.shape[2],1])
            return data_batch, labels_batch
 ################################################################################

        def do_eval(data_set,
                label_set,
                batch_size):
            """
            Runs one evaluation against the full epoch of data.
            data_set: The set of embeddings to eval
            label_set: the set of labels to eval
            """
            # And run one epoch of eval.

            true_count = 0  # Counts the number of correct predictions.
            steps_per_epoch = len(label_set) // batch_size
            num_examples = steps_per_epoch * batch_size
            totalLoss = 0
            # Need to compute eval accuracy
            for evalStep in xrange(steps_per_epoch):
                input_batch, label_batch = nextBatch(data_set, labels_set, batchSize)
                evalAcc, evalLoss = eval_step(input_batch, label_batch)
                true_count += evalAcc * batchSize
                totalLoss += evalLoss
            precision = float(true_count) / num_examples
            print('  Num examples: %d  Num correct: %d  Precision @ 1: %0.04f' % (num_examples, true_count, precision))
            print("Eval Loss: " + str(totalLoss))

    ######################################################################################################
        # Training Loop

        for step in range(maxSteps):
            input_batch, label_batch = nextBatch(data_set,labels_set,batchSize)
            train_step(input_batch,label_batch)

            # Every 100 steps, run an evaluation pass over an epoch's worth of data
            if step % 100 == 0:
                print "On Step : " + str(step) + " of " + str(maxSteps)
                do_eval(data_set, labels_set, batchSize)

The embedding matrix is created before running the model:

Code language: python
import numpy as np
import gensim
from gensim import corpora


def createInputEmbeddedMatrix(corpusPath, maxWords, svName):
    # Create a [docNum, Words per Art, Embedding Size] matrix to fill

    genDocsPath = "gen_docs_classifyData_smallerTest_TFIDF.npy"
    # corpus = "newsCorpus_word2vec_All_Corpus.mm"
    dictPath = 'news_word2vec_smallerDict.dict'
    tf_idf_path = "news_tfIdf_word2vec_All.tfidf_model"

    gen_docs = np.load(genDocsPath)
    dictionary = gensim.corpora.dictionary.Dictionary.load(dictPath)
    tf_idf = gensim.models.tfidfmodel.TfidfModel.load(tf_idf_path)

    corpus = corpora.MmCorpus(corpusPath)
    numOfDocs = len(corpus)
    embedding_size = 300

    id2embedding = np.load("smallerID2embedding.npy").item()

    # Need to process in batches as takes up a ton of memory

    step = 5000
    totalSteps = int(np.ceil(numOfDocs / step))

    for i in range(totalSteps):
        # inputMatrix = scipy.sparse.csr_matrix([step,maxWords,embedding_size])
        inputMatrix = np.zeros([step, maxWords, embedding_size])
        start = i * step
        end = start + step
        for docNum in range(start, end):
            print "On docNum " + str(docNum) + " of " + str(numOfDocs)
            # Extract the top N words
            topWords, wordVal = tf_idfTopWords(docNum, gen_docs, dictionary, tf_idf, maxWords)
            # doc = corpus[docNum]
            # Need to track word dex and doc dex seperate
            # Doc dex because of the batch processing
            wordDex = 0
            docDex = 0
            for wordID in wordVal:
                inputMatrix[docDex, wordDex, :] = id2embedding[wordID]
                wordDex += 1
            docDex += 1

        # Save the batch of input data
        # scipy.sparse.save_npz(svName + "_%d"  % i, inputMatrix)
        np.save(svName + "_%d.npy" % i, inputMatrix)


#####################################################################################
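
A quick sanity check on the saved batches (a sketch, assuming the svName + "_%d.npy" naming used above) shows how many document rows actually contain non-zero embeddings:

# Sketch: count how many rows (documents) of one saved input matrix are non-zero.
# svName stands for the same prefix passed to createInputEmbeddedMatrix above.
import numpy as np

batch = np.load(svName + "_0.npy")  # shape [step, maxWords, embedding_size]
nonzero_docs = int(np.count_nonzero(np.abs(batch).sum(axis=(1, 2))))
print("non-zero document rows: %d of %d" % (nonzero_docs, batch.shape[0]))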

1 Answer

Stack Overflow user

Accepted answer

Posted on 2017-09-20 00:19:20

It turns out my error was in the creation of the input matrix.

Code language: python
for i in range(totalSteps):
    # inputMatrix = scipy.sparse.csr_matrix([step,maxWords,embedding_size])
    inputMatrix = np.zeros([step, maxWords, embedding_size])
    start = i * step
    end = start + step
    for docNum in range(start, end):
        print "On docNum " + str(docNum) + " of " + str(numOfDocs)
        # Extract the top N words
        topWords, wordVal = tf_idfTopWords(docNum, gen_docs, dictionary, tf_idf, maxWords)
        # doc = corpus[docNum]
        # Need to track word dex and doc dex seperate
        # Doc dex because of the batch processing
        wordDex = 0
        docDex = 0
        for wordID in wordVal:
            inputMatrix[docDex, wordDex, :] = id2embedding[wordID]
            wordDex += 1
        docDex += 1

docDex should not be reset to 0 on every iteration of the inner loop; I was effectively overwriting only the first row of the input matrix, so the rest stayed all zeros.
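
One way to apply that fix (a sketch of the same loop): initialize docDex once per batch, before the document loop, so each document gets its own row:

for i in range(totalSteps):
    inputMatrix = np.zeros([step, maxWords, embedding_size])
    start = i * step
    end = start + step
    docDex = 0  # reset once per batch, not once per document
    for docNum in range(start, end):
        # Extract the top N words
        topWords, wordVal = tf_idfTopWords(docNum, gen_docs, dictionary, tf_idf, maxWords)
        wordDex = 0
        for wordID in wordVal:
            inputMatrix[docDex, wordDex, :] = id2embedding[wordID]
            wordDex += 1
        docDex += 1
    np.save(svName + "_%d.npy" % i, inputMatrix)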

Votes: 0

Original page content provided by Stack Overflow.
Original link: https://stackoverflow.com/questions/46300155
