
cs.AI (Artificial Intelligence), 37 papers in total
【1】 Communication-Efficient Federated Learning via Robust Distributed Mean Estimation Link: https://arxiv.org/abs/2108.08842
Authors: Shay Vargaftik, Ran Ben Basat, Amit Portnoy, Gal Mendelson, Yaniv Ben-Itzhak, Michael Mitzenmacher Affiliations: VMware Research, University College London, Ben-Gurion University, Stanford University, Harvard University Comments: A technical report that extends arXiv:2105.08339 Abstract: Federated learning commonly relies on algorithms such as distributed (mini-batch) SGD, where multiple clients compute their gradients and send them to a central coordinator for averaging and updating the model. To optimize the transmission time and the scalability of the training process, clients often use lossy compression to reduce the message sizes. DRIVE is a recent state-of-the-art algorithm that compresses gradients using one bit per coordinate (with some lower-order overhead). In this technical report, we generalize DRIVE to support any bandwidth constraint, extend it to support heterogeneous client resources, and make it robust to packet loss.
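To make the "one bit per coordinate" budget in the abstract above concrete, here is a minimal sketch of a generic sign-based quantizer. It is not DRIVE and is not taken from the report; the function names `compress_one_bit` and `decompress` are hypothetical, and the choice of scale (the mean absolute value, which minimizes squared reconstruction error for a sign code) is an assumption made for illustration.

```python
def compress_one_bit(grad):
    """Encode a gradient as one sign per coordinate plus a single shared scale.

    Illustrative assumption: the scale is the mean absolute value, which
    minimizes the squared error of reconstructing grad as scale * sign(grad).
    """
    scale = sum(abs(g) for g in grad) / len(grad)
    signs = [1 if g >= 0 else -1 for g in grad]  # 1 bit of information each
    return scale, signs


def decompress(scale, signs):
    """Reconstruct the lossy estimate of the gradient on the receiver side."""
    return [scale * s for s in signs]


if __name__ == "__main__":
    scale, signs = compress_one_bit([0.5, -1.5, 1.0, -1.0])
    print(scale, signs, decompress(scale, signs))
```

The single float `scale` plays the role of the lower-order overhead: it is sent once per vector, while the per-coordinate cost stays at one bit.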
【2】 PoinTr: Diverse Point Cloud Completion with Geometry-Aware Transformers Link: https://arxiv.org/abs/2108.08839
Authors: Xumin Yu, Yongming Rao, Ziyi Wang, Zuyan Liu, Jiwen Lu, Jie Zhou Affiliations: Department of Automation, Tsinghua University, China, State Key Lab of Intelligent Technologies and Systems, China, Beijing National Research Center for Information Science and Technology, China Comments: Accepted to ICCV 2021 (Oral Presentation) Abstract: Point clouds captured in real-world applications are often incomplete due to limited sensor resolution, single viewpoint, and occlusion. Therefore, recovering complete point clouds from partial ones becomes an indispensable task in many practical applications. In this paper, we present a new method that reformulates point cloud completion as a set-to-set translation problem and design a new model, called PoinTr, that adopts a transformer encoder-decoder architecture for point cloud completion. By representing the point cloud as a set of unordered groups of points with position embeddings, we convert the point cloud to a sequence of point proxies and employ transformers for point cloud generation. To help transformers better leverage the inductive bias about 3D geometric structures of point clouds, we further devise a geometry-aware block that models the local geometric relationships explicitly. The migration of transformers enables our model to better learn structural knowledge and preserve detailed information for point cloud completion. Furthermore, we propose two more challenging benchmarks with more diverse incomplete point clouds that better reflect real-world scenarios to promote future research. Experimental results show that our method outperforms state-of-the-art methods by a large margin on both the new benchmarks and the existing ones. Code is available at https://github.com/yuxumin/PoinTr
【3】 Click to Move: Controlling Video Generation with Sparse Motion Link: https://arxiv.org/abs/2108.08815
Authors: Pierfrancesco Ardino, Marco De Nadai, Bruno Lepri, Elisa Ricci, Stéphane Lathuilière Affiliations: University of Trento, Fondazione Bruno Kessler, LTCI, Télécom Paris, Institut Polytechnique de Paris Comments: Accepted by International Conference on Computer Vision (ICCV 2021) Abstract: This paper introduces Click to Move (C2M), a novel framework for video generation where the user can control the motion of the synthesized video through mouse clicks specifying simple object trajectories of the key objects in the scene. Our model receives as input an initial frame, its corresponding segmentation map, and the sparse motion vectors encoding the input provided by the user. It outputs a plausible video sequence starting from the given frame, with motion that is consistent with the user input. Notably, our proposed deep architecture incorporates a Graph Convolution Network (GCN) that models the movements of all the objects in the scene in a holistic manner and effectively combines the sparse user motion information and image features. Experimental results show that C2M outperforms existing methods on two publicly available datasets, thus demonstrating the effectiveness of our GCN framework at modelling object interactions. The source code is publicly available at https://github.com/PierfrancescoArdino/C2M.
【4】 Do Vision Transformers See Like Convolutional Neural Networks? Link: https://arxiv.org/abs/2108.08810
Authors: Maithra Raghu, Thomas Unterthiner, Simon Kornblith, Chiyuan Zhang, Alexey Dosovitskiy Affiliations: Google Research, Brain Team Abstract: Convolutional neural networks (CNNs) have so far been the de facto model for visual data. Recent work has shown that (Vision) Transformer models (ViT) can achieve comparable or even superior performance on image classification tasks. This raises a central question: how are Vision Transformers solving these tasks? Are they acting like convolutional networks, or learning entirely different visual representations? Analyzing the internal representation structure of ViTs and CNNs on image classification benchmarks, we find striking differences between the two architectures, such as ViT having more uniform representations across all layers. We explore how these differences arise, finding crucial roles played by self-attention, which enables early aggregation of global information, and ViT residual connections, which strongly propagate features from lower to higher layers. We study the ramifications for spatial localization, demonstrating that ViTs successfully preserve input spatial information, with noticeable effects from different classification methods. Finally, we study the effect of (pretraining) dataset scale on intermediate features and transfer learning, and conclude with a discussion on connections to new architectures such as the MLP-Mixer.
【5】 EqGNN: Equalized Node Opportunity in Graphs Link: https://arxiv.org/abs/2108.08800
Authors: Uriel Singer, Kira Radinsky Affiliations: Technion, Israel Institute of Technology, Haifa, Israel Comments: 10 pages, 3 figures, 4 tables, 2 algorithms Abstract: Graph neural networks (GNNs) have been widely used for supervised learning tasks in graphs, reaching state-of-the-art results. However, little work has been dedicated to creating unbiased GNNs, i.e., GNNs whose classification is uncorrelated with sensitive attributes such as race or gender. Some approaches ignore the sensitive attributes or optimize for the statistical parity fairness criterion. However, it has been shown that neither approach ensures fairness; rather, both cripple the utility of the prediction task. In this work, we present a GNN framework that allows optimizing representations for the Equalized Odds fairness criterion. The architecture is composed of three components: (1) a GNN classifier predicting the utility class, (2) a sampler learning the distribution of the sensitive attributes of the nodes given their labels, whose samples are fed into (3) a discriminator that discriminates between true and sampled sensitive attributes using a novel "permutation loss" function. Using these components, we train a model to neglect information regarding the sensitive attribute only with respect to its label. To the best of our knowledge, we are the first to optimize GNNs for the equalized odds criterion. We evaluate our classifier over several graph datasets and sensitive attributes and show that our algorithm reaches state-of-the-art results.
【6】 Provably Efficient Generative Adversarial Imitation Learning for Online and Offline Setting with Linear Function Approximation Link: https://arxiv.org/abs/2108.08765
Authors: Zhihan Liu, Yufeng Zhang, Zuyue Fu, Zhuoran Yang, Zhaoran Wang Comments: 54 pages, in submission Abstract: In generative adversarial imitation learning (GAIL), the agent aims to learn a policy from an expert demonstration so that its performance cannot be discriminated from the expert policy on a certain predefined reward set. In this paper, we study GAIL in both online and offline settings with linear function approximation, where both the transition and reward function are linear in the feature maps. Besides the expert demonstration, in the online setting the agent can interact with the environment, while in the offline setting the agent only accesses an additional dataset collected by a prior. For online GAIL, we propose an optimistic generative adversarial policy optimization algorithm (OGAP) and prove that OGAP achieves $\widetilde{\mathcal{O}}(H^2 d^{3/2}K^{1/2}+KH^{3/2}dN_1^{-1/2})$ regret. Here $N_1$ represents the number of trajectories of the expert demonstration, $d$ is the feature dimension, and $K$ is the number of episodes. For offline GAIL, we propose a pessimistic generative adversarial policy optimization algorithm (PGAP). For an arbitrary additional dataset, we obtain the optimality gap of PGAP, achieving the minimax lower bound in the utilization of the additional dataset. Assuming sufficient coverage on the additional dataset, we show that PGAP achieves $\widetilde{\mathcal{O}}(H^{2}dK^{-1/2}+H^2d^{3/2}N_2^{-1/2}+H^{3/2}dN_1^{-1/2})$ optimality gap. Here $N_2$ represents the number of trajectories of the additional dataset with sufficient coverage.
【7】 Dynamic Difficulty Adjustment in Virtual Reality Exergames through Experience-driven Procedural Content Generation Link: https://arxiv.org/abs/2108.08762
Authors: Tobias Huber, Silvan Mertes, Stanislava Rangelova, Simon Flutura, Elisabeth André Affiliations: University of Augsburg, Augsburg, Germany Abstract: Virtual Reality (VR) games that feature physical activities have been shown to increase players' motivation to do physical exercise. However, for such exercises to have a positive healthcare effect, they have to be repeated several times a week. To maintain player motivation over longer periods of time, games often employ Dynamic Difficulty Adjustment (DDA) to adapt the game's challenge according to the player's capabilities. For exercise games, this is mostly done by tuning specific in-game parameters like the speed of objects. In this work, we propose to use experience-driven Procedural Content Generation for DDA in VR exercise games by procedurally generating levels that match the player's current capabilities. Not only fine-tuning specific parameters but creating completely new levels has the potential to decrease repetition over longer time periods and allows for the simultaneous adaptation of the cognitive and physical challenge of the exergame. As a proof of concept, we implement an initial prototype in which the player must traverse a maze that includes several exercise rooms, whereby the generation of the maze is realized by a neural network. Passing those exercise rooms requires the player to perform physical activities. To match the player's capabilities, we use Deep Reinforcement Learning to adjust the structure of the maze and to decide which exercise rooms to include in the maze. We evaluate our prototype in an exploratory user study utilizing both biodata and subjective questionnaires.
【8】 Neural Predictive Control for the Optimization of Smart Grid Flexibility Schedules Link: https://arxiv.org/abs/2108.08739
Authors: Steven de Jongh, Sina Steinle, Anna Hlawatsch, Felicitas Mueller, Michael Suriyah, Thomas Leibfried Affiliations: Karlsruhe Institute of Technology (KIT), Karlsruhe, Germany, elena international GmbH, Berlin, Germany Comments: 978-1-6654-4389-0/21/$31.00 © 2021 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective Abstract: Model predictive control (MPC) is a method to formulate the optimal scheduling problem for grid flexibilities in a mathematical manner. The resulting time-constrained optimization problem can be re-solved in each optimization time step using classical optimization methods such as Second Order Cone Programming (SOCP) or Interior Point Methods (IPOPT). When applying MPC in a rolling horizon scheme, the impact of uncertainty in forecasts on the optimal schedule is reduced. While MPC methods promise accurate results for time-constrained grid optimization, they are inherently limited by the calculation time needed for large and complex power system models. Learning the optimal control behaviour using function approximation offers the possibility to determine near-optimal control actions with short calculation time. A Neural Predictive Control (NPC) scheme is proposed to learn optimal control policies for linear and nonlinear power systems through imitation. It is demonstrated that this procedure can find near-optimal solutions while reducing the calculation time by an order of magnitude. The learned controllers are validated using a benchmark smart grid.
【9】 SiReN: Sign-Aware Recommendation Using Graph Neural Networks Link: https://arxiv.org/abs/2108.08735
Authors: Changwon Seo, Kyeong-Joong Jeong, Sungsu Lim, Won-Yong Shin Comments: 14 pages, 5 figures, 6 tables Abstract: In recent years, many recommender systems using network embedding (NE), such as graph neural networks (GNNs), have been extensively studied with the aim of improving recommendation accuracy. However, such attempts have focused mostly on utilizing only the information of positive user-item interactions with high ratings. Thus, it remains a challenge to make use of low rating scores for representing users' preferences, since low ratings can still be informative in designing NE-based recommender systems. In this study, we present SiReN, a new sign-aware recommender system based on GNN models. Specifically, SiReN has three key components: 1) constructing a signed bipartite graph for more precisely representing users' preferences, which is split into two edge-disjoint graphs with positive and negative edges each, 2) generating two embeddings for the partitioned graphs with positive and negative edges via a GNN model and a multi-layer perceptron (MLP), respectively, and then using an attention model to obtain the final embeddings, and 3) establishing a sign-aware Bayesian personalized ranking (BPR) loss function in the process of optimization. Through comprehensive experiments, we empirically demonstrate that SiReN consistently outperforms state-of-the-art NE-aided recommendation methods.
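For readers unfamiliar with the Bayesian personalized ranking (BPR) loss named in the abstract above, here is a minimal sketch of the standard, unsigned BPR term for one (user, preferred item, other item) triple. It is not SiReN's sign-aware variant, whose exact form the abstract does not give; the function name `bpr_loss` and the scalar-score interface are assumptions for illustration.

```python
import math


def bpr_loss(pos_score, neg_score):
    """Standard BPR term: -log(sigmoid(pos_score - neg_score)).

    pos_score and neg_score are a model's predicted preference scores for
    an item the user interacted with and one they did not (hypothetical
    inputs). The value is computed as softplus(-(pos - neg)) in a
    numerically stable form to avoid overflow for large score gaps.
    """
    diff = pos_score - neg_score
    return max(-diff, 0.0) + math.log1p(math.exp(-abs(diff)))


if __name__ == "__main__":
    print(bpr_loss(3.0, 0.0))  # preferred item ranked higher: small loss
    print(bpr_loss(0.0, 3.0))  # preferred item ranked lower: large loss
```

The loss shrinks toward zero as the preferred item's score exceeds the other item's score, and equals log 2 when the two scores tie.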
【10】 Counterfactual Attention Learning for Fine-Grained Visual Categorization and Re-identification Link: https://arxiv.org/abs/2108.08728
Authors: Yongming Rao, Guangyi Chen, Jiwen Lu, Jie Zhou Affiliations: Department of Automation, Tsinghua University, China, State Key Lab of Intelligent Technologies and Systems, China, Beijing National Research Center for Information Science and Technology, China Comments: Accepted to ICCV 2021 Abstract: Attention mechanisms have demonstrated great potential in fine-grained visual recognition tasks. In this paper, we present a counterfactual attention learning method to learn more effective attention based on causal inference. Unlike most existing methods that learn visual attention based on conventional likelihood, we propose to learn the attention with counterfactual causality, which provides a tool to measure the attention quality and a powerful supervisory signal to guide the learning process. Specifically, we analyze the effect of the learned visual attention on network prediction through counterfactual intervention and maximize the effect to encourage the network to learn more useful attention for fine-grained image recognition. Empirically, we evaluate our method on a wide range of fine-grained recognition tasks where attention plays a crucial role, including fine-grained image categorization, person re-identification, and vehicle re-identification. The consistent improvement on all benchmarks demonstrates the effectiveness of our method. Code is available at https://github.com/raoyongming/CAL
【11】 Feature-weighted Stacking for Nonseasonal Time Series Forecasts: A Case Study of the COVID-19 Epidemic Curves Link: https://arxiv.org/abs/2108.08723
Authors: Pieter Cawood, Terence L. van Zyl Affiliations: School of Computer Science and Applied Mathematics, University of the Witwatersrand, Johannesburg, South Africa, Institute for Intelligent Systems, University of Johannesburg Abstract: We investigate ensembling techniques in forecasting and examine their potential for use in nonseasonal time series similar to those in the early days of the COVID-19 pandemic. Developing improved forecast methods is essential, as they support data-driven decisions by organisations and decision-makers during critical phases. We propose using late data fusion, using a stacked ensemble of two forecasting models and two meta-features that prove their predictive power during a preliminary forecasting stage. The final ensembles include a Prophet model and a long short-term memory (LSTM) neural network as base models. The base models are combined by a multilayer perceptron (MLP), taking into account meta-features that indicate the highest correlation with each base model's forecast accuracy. We further show that the inclusion of meta-features generally improves the ensemble's forecast accuracy across two forecast horizons of seven and fourteen days. This research reinforces previous work and demonstrates the value of combining traditional statistical models with deep learning models to produce more accurate forecast models for time series across domains.
【12】 Czech News Dataset for Semantic Textual Similarity Link: https://arxiv.org/abs/2108.08708
Authors: Jakub Sido, Michal Seják, Ondřej Pražák, Miloslav Konopík, Václav Moravec Affiliations: NTIS – New Technologies for the Information Society, Department of Computer Science and Engineering, University of West Bohemia, Czech Republic, Department of Journalism, Charles University, Czech Republic Abstract: This paper describes a novel dataset consisting of sentences with semantic similarity annotations. The data originate from the journalistic domain in the Czech language. We describe the process of collecting and annotating the data in detail. The dataset contains 138,556 human annotations divided into train and test sets. In total, 485 journalism students participated in the creation process. To increase the reliability of the test set, we compute the annotation as an average of 9 individual annotations. We evaluate the quality of the dataset by measuring inter- and intra-annotator agreement. Besides agreement numbers, we provide detailed statistics of the collected dataset. We conclude our paper with a baseline experiment of building a system for predicting the semantic similarity of sentences. Due to the massive number of training annotations (116,956), the model can perform significantly better than an average annotator (0.92 versus 0.86 Pearson's correlation coefficient).
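The two measurements mentioned in the abstract above, averaging several individual annotations into one gold score and comparing predictions to gold scores with Pearson's correlation coefficient, can be sketched as follows. This is a generic illustration, not the authors' evaluation code; the function names and the toy numbers are hypothetical.

```python
import math


def mean_annotation(scores):
    """Average several individual similarity annotations into one gold score."""
    return sum(scores) / len(scores)


def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences.

    Assumes both sequences have non-zero variance; otherwise the
    denominator is zero and a ZeroDivisionError is raised.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    norm_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    norm_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (norm_x * norm_y)


if __name__ == "__main__":
    # Toy data: three sentence pairs, each with a few annotator scores.
    gold = [mean_annotation(a) for a in ([0, 1, 1], [3, 4, 5], [5, 6, 6])]
    predicted = [0.5, 4.2, 5.9]
    print(pearson(predicted, gold))
```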
【13】 Attribute-based Explanations of Non-Linear Embeddings of High-Dimensional Data Link: https://arxiv.org/abs/2108.08706
Authors: Jan-Tobias Sohns, Michaela Schmitt, Fabian Jirasek, Hans Hasse, Heike Leitte Affiliations: Laboratory of Engineering Thermodynamics (LTD), TU Kaiserslautern Comments: IEEE VIS (InfoVis/VAST/SciVis) 2021 Abstract: Embeddings of high-dimensional data are widely used to explore data, to verify analysis results, and to communicate information. Their explanation, in particular with respect to the input attributes, is often difficult. With linear projections like PCA, the axes can still be annotated meaningfully. With non-linear projections this is no longer possible, and alternative strategies such as attribute-based color coding are required. In this paper, we review existing augmentation techniques and discuss their limitations. We present the Non-Linear Embeddings Surveyor (NoLiES), which combines a novel augmentation strategy for projected data (rangesets) with interactive analysis in a small multiples setting. Rangesets use a set-based visualization approach for binned attribute values that enables the user to quickly observe structure and detect outliers. We detail the link between algebraic topology and rangesets and demonstrate the utility of NoLiES in case studies with various challenges (complex attribute value distribution, many attributes, many data points) and a real-world application to understand latent features of matrix completion in thermodynamics.
【14】 IT2CFNN: An Interval Type-2 Correlation-Aware Fuzzy Neural Network to Construct Non-Separable Fuzzy Rules with Uncertain and Adaptive Shapes for Nonlinear Function Approximation Link: https://arxiv.org/abs/2108.08704
Authors: Armin Salimi-Badr Affiliations: Shahid Beheshti University, Tehran, Iran Abstract: In this paper, a new interval type-2 fuzzy neural network able to construct non-separable fuzzy rules with adaptive shapes is introduced. To reflect the uncertainty, the shape of the fuzzy sets is considered to be uncertain. Therefore, a new form of interval type-2 fuzzy sets based on a general Gaussian model able to construct different shapes (including triangular, bell-shaped, trapezoidal) is proposed. To consider the interactions among input variables, input vectors are transformed to new feature spaces with uncorrelated variables suitable for defining each fuzzy rule. Next, the new features are fed to a fuzzification layer using the proposed interval type-2 fuzzy sets with adaptive shape. Consequently, interval type-2 non-separable fuzzy rules with proper shapes, considering the local interactions of variables and the uncertainty, are formed. For type reduction, the contributions of the upper and lower firing strengths of each fuzzy rule are adaptively selected separately. To train the different parameters of the network, the Levenberg-Marquardt optimization method is utilized. The performance of the proposed method is investigated on clean and noisy datasets to show its ability to consider the uncertainty. Moreover, the proposed paradigm is successfully applied to real-world time-series predictions, regression problems, and nonlinear system identification. According to the experimental results, our proposed model outperforms other methods with a more parsimonious structure.
【15】 Analyze and Design Network Architectures by Recursion Formulas Link: https://arxiv.org/abs/2108.08689
Authors: Yilin Liao, Hao Wang, Zhaoran Liu, Haozhe Li, Xinggao Liu Affiliations: State Key Laboratory of Industry Control Technology, College of Control Science and Engineering, Zhejiang University, Hangzhou, P.R. China Comments: It is hoped that the new network architecture is derived according to a specific purpose Abstract: The effectiveness of shortcut/skip connections has been widely verified, which has inspired extensive exploration of neural architecture design. This work attempts to find an effective way to design new network architectures. It is discovered that the main difference between network architectures can be reflected in their recursion formulas. Based on this, a methodology is proposed to design novel network architectures from the perspective of mathematical formulas. Afterwards, a case study is provided to generate an improved architecture based on ResNet. Furthermore, the new architecture is compared with ResNet and then tested on ResNet-based networks. Extensive experiments are conducted on CIFAR and ImageNet, which demonstrate the significant performance improvements provided by the architecture.
【16】 Probability Estimation of Uncertain Process Traces Link: https://arxiv.org/abs/2108.08615
Authors: Marco Pegoraro, Bianka Bakullari, Merih Seran Uysal, Wil M. P. van der Aalst Affiliations: Process and Data Science Group (PADS), Department of Computer Science, RWTH Aachen University, Aachen, Germany Comments: 12 pages, 7 figures, 4 tables, 10 references Abstract: Process mining is a scientific discipline that analyzes event data, often collected in databases called event logs. Recently, uncertain event logs have become of interest; these contain non-deterministic and stochastic event attributes that may represent many possible real-life scenarios. In this paper, we present a method to reliably estimate the probability of each such scenario, allowing their analysis. Experiments show that the probabilities calculated with our method closely match the true chances of occurrence of specific outcomes, enabling more trustworthy analyses on uncertain data.
【17】 Settling the Variance of Multi-Agent Policy Gradients Link: https://arxiv.org/abs/2108.08612
Authors: Jakub Grudzien Kuba, Muning Wen, Yaodong Yang, Linghui Meng, Shangding Gu, Haifeng Zhang, David Henry Mguni, Jun Wang Affiliations: Shanghai Jiao Tong University, Institute of Automation, University College London Abstract: Policy gradient (PG) methods are popular reinforcement learning (RL) methods where a baseline is often applied to reduce the variance of gradient estimates. In multi-agent RL (MARL), although the PG theorem can be naturally extended, the effectiveness of multi-agent PG (MAPG) methods degrades as the variance of gradient estimates increases rapidly with the number of agents. In this paper, we offer a rigorous analysis of MAPG methods by, firstly, quantifying the contributions of the number of agents and agents' explorations to the variance of MAPG estimators. Based on this analysis, we derive the optimal baseline (OB) that achieves the minimal variance. In comparison to the OB, we measure the excess variance of existing MARL algorithms such as vanilla MAPG and COMA. Considering the use of deep neural networks, we also propose a surrogate version of OB, which can be seamlessly plugged into any existing PG methods in MARL. On benchmarks of Multi-Agent MuJoCo and StarCraft challenges, our OB technique effectively stabilises training and improves the performance of multi-agent PPO and COMA algorithms by a significant margin.
【18】 Prof. Schönhage's Mysterious Machines 标题:Schönhage教授的神秘机器 链接:https://arxiv.org/abs/2108.08606
作者:J.-M. Chauvet 备注:5 pages, 3 figures 摘要:我们给出了一个简单的Schönhage存储修改机,它模拟了Rule 110细胞自动机的一次迭代。这为Schönhage关于同名机器图灵完整性的原始证明提供了一种替代结构。 摘要:We give a simple Schönhage Storage Modification Machine that simulates one iteration of the Rule 110 cellular automaton. This provides an alternative construction to Schönhage's original proof of the Turing completeness of the eponymous machines.
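For readers unfamiliar with Rule 110, the following is a minimal Python sketch of one iteration of the automaton on a finite row of cells. It is not the Storage Modification Machine construction from the paper; the function name `rule110_step` and the choice to treat cells beyond both ends as 0 are assumptions made for illustration.

```python
RULE = 110  # binary 01101110: bit k gives the new state for neighbourhood k


def rule110_step(cells):
    """Return the next generation of a row of 0/1 cells under Rule 110.

    Cells beyond both ends are treated as 0 (an assumed boundary convention).
    """
    padded = [0] + list(cells) + [0]
    next_cells = []
    for i in range(1, len(padded) - 1):
        # Encode the (left, centre, right) neighbourhood as a number in 0..7.
        pattern = (padded[i - 1] << 2) | (padded[i] << 1) | padded[i + 1]
        # The new state is bit `pattern` of the rule number.
        next_cells.append((RULE >> pattern) & 1)
    return next_cells


if __name__ == "__main__":
    row = [0, 0, 0, 1]
    for _ in range(4):
        print("".join(str(c) for c in row))
        row = rule110_step(row)
```

Repeated calls give successive generations; simulating a single such step is the task the abstract assigns to the machine.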
【19】 Forgetting Formulas and Signature Elements in Epistemic States 标题:认知态中的遗忘公式和签名元素 链接:https://arxiv.org/abs/2108.08603
作者:A. Becker,G. Kern-Isberner,K. Sauerwald,C. Beierle 机构:TU Dortmund University, Germany, FernUniversität in Hagen 备注:Accepted at NMR 2021 摘要:Delgrande关于遗忘的知识级描述提供了一种从公式集合中遗忘语法元素的通用方法,该公式集合与许多其他遗忘操作(特别是布尔的变量消除)相关联。另一方面,认知状态的边缘化是在更复杂的语义框架中积极减少签名的一种具体方法,也旨在遗忘概率论中非常著名的原子。在本文中,我们将这两种遗忘的观点结合在一起,表明边缘化可以被视为德尔格兰德方法在认知状态水平上的延伸。更准确地说,我们将德尔格兰德的遗忘公理推广到认知状态下的遗忘,并证明边缘化是满足这些公理的最具体、信息最丰富的遗忘算子。此外,通过将公理的基本思想从认知状态转移到遗忘公式,我们阐述了德尔格兰德公式遗忘概念的恰当表述。然而,在这里我们表明,这导致了遗忘公式的琐碎方法。这一发现支持了遗忘语法元素本质上不同于信念收缩的说法,如AGM信念变化框架中的公理化。 摘要:Delgrande's knowledge level account of forgetting provides a general approach to forgetting syntax elements from sets of formulas with links to many other forgetting operations, in particular, to Boole's variable elimination. On the other hand, marginalisation of epistemic states is a specific approach to actively reduce signatures in more complex semantic frameworks, also aiming at forgetting atoms that is very well known from probability theory. In this paper, we bring these two perspectives of forgetting together by showing that marginalisation can be considered as an extension of Delgrande's approach to the level of epistemic states. More precisely, we generalize Delgrande's axioms of forgetting to forgetting in epistemic states, and show that marginalisation is the most specific and informative forgetting operator that satisfies these axioms. Moreover, we elaborate suitable phrasings of Delgrande's concept of forgetting for formulas by transferring the basic ideas of the axioms to forgetting formulas from epistemic states. However, here we show that this results in trivial approaches to forgetting formulas. This finding supports the claim that forgetting syntax elements is essentially different from belief contraction, as e.g. axiomatized in the AGM belief change framework.
【20】 Towards More Efficient Federated Learning with Better Optimization Objects 标题:用更好的优化目标走向更有效的联邦学习 链接:https://arxiv.org/abs/2108.08577
作者:Zirui Zhu,Ziyi Ye 机构:Dept. of Computer Science and Technology, Tsinghua University Beijing, China 摘要:联邦学习(FL)是一种受隐私保护的机器学习范式,它允许在边缘直接训练模型,而无需上传数据。FL在实际应用中面临的最大挑战之一是边缘节点数据的异构性,这将减慢收敛速度并降低模型的性能。对于上述问题,一个有代表性的解决方案是在本地训练中添加附加约束,如FedProx、FedCurv和FedCL。然而,上述算法仍有改进的余地。我们建议使用过去获得的所有模型的聚合作为新的约束目标,以进一步提高此类算法的性能。在不同环境下的实验表明,该方法显著提高了模型的收敛速度和性能。 摘要:Federated Learning (FL) is a privacy-protected machine learning paradigm that allows model to be trained directly at the edge without uploading data. One of the biggest challenges faced by FL in practical applications is the heterogeneity of edge node data, which will slow down the convergence speed and degrade the performance of the model. For the above problems, a representative solution is to add additional constraints in the local training, such as FedProx, FedCurv and FedCL. However, the above algorithms still have room for improvement. We propose to use the aggregation of all models obtained in the past as new constraint target to further improve the performance of such algorithms. Experiments in various settings demonstrate that our method significantly improves the convergence speed and performance of the model.
【21】 Monitoring weeder robots and anticipating their functioning by using advanced topological data analysis 标题:使用高级拓扑数据分析监控除草机器人并预测其功能 链接:https://arxiv.org/abs/2108.08570
作者:Tarek Frahi,Abel Sancarlos,Matthieu Galle,Xavier Beaulieu,Anne Chambard,Antonio Falco,Elias Cueto,Francisco Chinesta 机构:ESI Group chair. PIMM Lab. ENSAM Institute of Technology. Paris, France., ESI Group,bis rue Saarinen, Rungis CEDEX, France, VITIROVER, lieu-dit, Simard, Saint-Emilion, France 摘要:本文旨在分析除草机器人在作业过程中所遵循的复杂轨迹的拓扑内容。我们将证明这些轨迹的拓扑描述符受机器人环境以及与维护操作相关的机器人状态的影响。拓扑数据分析将用于提取基于同源持久性的轨迹描述符。然后,将应用适当的度量来比较轨迹的拓扑表示,以便对其进行分类或进行有效的模式识别。 摘要:The present paper aims at analyzing the topological content of the complex trajectories that weeder-autonomous robots follow in operation. We will prove that the topological descriptors of these trajectories are affected by the robot environment as well as by the robot state, with respect to maintenance operations. Topological Data Analysis will be used for extracting the trajectory descriptors, based on homology persistence. Then, appropriate metrics will be applied in order to compare that topological representation of the trajectories, for classifying them or for making efficient pattern recognition.
【22】 Understanding and Mitigating Annotation Bias in Facial Expression Recognition 标题:面部表情识别中标注偏差的理解与缓解 链接:https://arxiv.org/abs/2108.08504
作者:Yunliang Chen,Jungseock Joo 机构:University of California, Los Angeles 备注:To appear in ICCV 2021 摘要:计算机视觉模型的性能取决于其训练数据的大小和质量。最近的研究揭示了常见图像数据集中先前未知的合成偏差,这些偏差会导致模型输出出现偏差,并提出了缓解这些偏差的方法。然而,大多数现有的工作都假设人工生成的注释可以被视为金标准和无偏见的。在本文中,我们揭示了这个假设可能是有问题的,并且应该特别注意防止模型学习这种注释偏差。我们专注于面部表情识别,并比较实验室控制和野生数据集之间的标签偏差。我们证明了许多表达数据集在性别之间存在显著的注释偏差,特别是当涉及到快乐和愤怒的表达时,并且传统方法无法完全缓解训练模型中的这种偏差。为了消除表情标注偏差,我们提出了一种AU校准的面部表情识别(AUC-FER)框架,该框架利用面部动作单元(AUs)并将三重态丢失纳入目标函数。实验结果表明,与现有的方法相比,该方法在消除表达式标注偏差方面更为有效。 摘要:The performance of a computer vision model depends on the size and quality of its training data. Recent studies have unveiled previously-unknown composition biases in common image datasets which then lead to skewed model outputs, and have proposed methods to mitigate these biases. However, most existing works assume that human-generated annotations can be considered gold-standard and unbiased. In this paper, we reveal that this assumption can be problematic, and that special care should be taken to prevent models from learning such annotation biases. We focus on facial expression recognition and compare the label biases between lab-controlled and in-the-wild datasets. We demonstrate that many expression datasets contain significant annotation biases between genders, especially when it comes to the happy and angry expressions, and that traditional methods cannot fully mitigate such biases in trained models. To remove expression annotation bias, we propose an AU-Calibrated Facial Expression Recognition (AUC-FER) framework that utilizes facial action units (AUs) and incorporates the triplet loss into the objective function. Experimental results suggest that the proposed method is more effective in removing expression annotation bias than existing techniques.
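The triplet loss mentioned in the abstract can be sketched as follows. This is the generic hinge form on Euclidean distances; the margin value, the use of non-squared distances, and the way AUC-FER selects anchor, positive, and negative samples from action units are not given in the abstract, so everything below is an illustrative assumption rather than the authors' objective.

```python
import math


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Generic triplet loss: max(0, d(a, p) - d(a, n) + margin).

    `anchor`, `positive`, `negative` are equal-length sequences of floats
    (embedding vectors). The margin of 1.0 is an arbitrary illustrative value.
    """
    d_ap = math.dist(anchor, positive)  # distance to the sample that should be close
    d_an = math.dist(anchor, negative)  # distance to the sample that should be far
    return max(0.0, d_ap - d_an + margin)


if __name__ == "__main__":
    # The positive is farther from the anchor than the negative, so the loss is positive.
    print(triplet_loss([0.0, 0.0], [3.0, 4.0], [0.0, 1.0]))
    # The negative is already farther than the positive by more than the margin.
    print(triplet_loss([0.0, 0.0], [0.0, 1.0], [3.0, 4.0]))
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, which is what pushes embeddings of the same group together.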
【23】 A relaxed technical assumption for posterior sampling-based reinforcement learning for control of unknown linear systems 标题:未知线性系统后验采样强化学习的一种松弛技术假设 链接:https://arxiv.org/abs/2108.08502
作者:Mukul Gagrani,Sagar Sudhakara,Aditya Mahajan,Ashutosh Nayyar,Yi Ouyang 机构:Department of Electrical and Computer Engineering, University of Southern California 摘要:我们回顾了Ouyang等人(arXiv:1709.04047)最近提出的控制未知线性二次(LQ)系统的Thompson采样算法。在闭环系统诱导范数的技术假设下,推导了该算法的遗憾界。在本技术说明中,我们表明,通过对算法进行轻微修改(特别是确保事件不会过早结束),可以用闭环系统光谱半径方面的较温和假设来取代诱导范数的技术假设。修改后的算法具有相同的$\tilde{\mathcal{O}}(\sqrt{T})$贝叶斯遗憾,其中$T$是时间范围,$\tilde{\mathcal{O}}(\cdot)$符号隐藏了$T$中的对数项。 摘要:We revisit the Thompson sampling algorithm to control an unknown linear quadratic (LQ) system recently proposed by Ouyang et al. (arXiv:1709.04047). The regret bound of the algorithm was derived under a technical assumption on the induced norm of the closed loop system. In this technical note, we show that by making a minor modification in the algorithm (in particular, ensuring that an episode does not end too soon), this technical assumption on the induced norm can be replaced by a milder assumption in terms of the spectral radius of the closed loop system. The modified algorithm has the same Bayesian regret of $\tilde{\mathcal{O}}(\sqrt{T})$, where $T$ is the time-horizon and the $\tilde{\mathcal{O}}(\cdot)$ notation hides logarithmic terms in $T$.
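To see why a spectral-radius condition is milder than an induced-norm condition, the sketch below builds a 2x2 matrix whose spectral radius is below 1 while an induced norm exceeds 1, and checks that its powers still decay. The matrix, the helper names, and the use of the infinity norm (rather than whichever induced norm the paper assumes) are illustrative assumptions, not taken from the paper.

```python
import cmath


def spectral_radius_2x2(m):
    """Largest eigenvalue magnitude of a 2x2 matrix, via the characteristic polynomial."""
    (a, b), (c, d) = m
    trace, det = a + d, a * d - b * c
    disc = cmath.sqrt(trace * trace - 4 * det)
    return max(abs((trace + disc) / 2), abs((trace - disc) / 2))


def induced_inf_norm(m):
    """Induced infinity norm: the maximum absolute row sum."""
    return max(sum(abs(x) for x in row) for row in m)


def matmul_2x2(p, q):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


def matpow_2x2(m, n):
    """n-th power of a 2x2 matrix by repeated multiplication (n >= 0)."""
    result = [[1.0, 0.0], [0.0, 1.0]]
    for _ in range(n):
        result = matmul_2x2(result, m)
    return result


if __name__ == "__main__":
    A = [[0.5, 2.0], [0.0, 0.5]]  # hypothetical closed-loop matrix
    print(spectral_radius_2x2(A))               # below 1: the system is stable
    print(induced_inf_norm(A))                  # above 1: a norm condition fails
    print(induced_inf_norm(matpow_2x2(A, 20)))  # powers still shrink towards zero
```

Any induced norm upper-bounds the spectral radius, so a bound on the norm implies a bound on the spectral radius but not the other way round.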
【24】 Inverse design optimization framework via a two-step deep learning approach: application to a wind turbine airfoil 标题:基于两步深度学习的逆向设计优化框架:在风力机翼型中的应用 链接:https://arxiv.org/abs/2108.08500
作者:Sunwoong Yang,Sanga Lee,Kwanjung Yee 机构:a Seoul National University, Seoul , Republic of Korea, b Korea Institute of Industrial Technology, Incheon , Republic of Korea 备注:This manuscript is being reviewed in the journal "Engineering with Computers" 摘要:尽管逆方法在气动设计中计算效率高,因为指定了期望的目标性能分布,但它有一些显著的限制,无法实现完全效率。首先,当指定的目标分布发生变化时,应重复迭代过程。可以执行目标分布优化,以澄清指定此分布时的模糊性,但在此过程中会出现一些其他问题,如分布参数化导致的表示能力损失、现实分布的过度约束、,由于理论/经验预测导致的感兴趣数量不准确,以及无法明确施加几何约束。为了解决这些问题,提出了一种基于两步深度学习的逆向设计优化框架。使用变分自动编码器和多层感知器生成真实的目标分布,并根据生成的分布分别预测感兴趣的数量和形状参数。然后,将目标分布优化作为逆设计优化进行。该框架采用主动学习和迁移学习技术来提高学习的准确性和效率。最后,该框架通过风力涡轮机叶片翼型的气动形状优化得到验证,其中逆向设计正在积极应用。优化结果表明,该框架具有足够的精度、效率和灵活性,可应用于其他逆向设计工程应用。 摘要:Though inverse approach is computationally efficient in aerodynamic design as the desired target performance distribution is specified, it has some significant limitations that prevent full efficiency from being achieved. First, the iterative procedure should be repeated whenever the specified target distribution changes. Target distribution optimization can be performed to clarify the ambiguity in specifying this distribution, but several additional problems arise in this process such as loss of the representation capacity due to parameterization of the distribution, excessive constraints for a realistic distribution, inaccuracy of quantities of interest due to theoretical/empirical predictions, and the impossibility of explicitly imposing geometric constraints. To deal with these issues, a novel inverse design optimization framework with a two-step deep learning approach is proposed. A variational autoencoder and multi-layer perceptron are used to generate a realistic target distribution and predict the quantities of interest and shape parameters from the generated distribution, respectively. Then, target distribution optimization is performed as the inverse design optimization. The proposed framework applies active learning and transfer learning techniques to improve accuracy and efficiency. 
Finally, the framework is validated through aerodynamic shape optimizations of the airfoil of a wind turbine blade, where inverse design is actively being applied. The results of the optimizations show that this framework is sufficiently accurate, efficient, and flexible to be applied to other inverse design engineering applications.
【25】 Proceedings of the 1st International Workshop on Adaptive Cyber Defense 标题:首届自适应网络防御国际研讨会论文集 链接:https://arxiv.org/abs/2108.08476
作者:Damian Marriott,Kimberly Ferguson-Walter,Sunny Fugate,Marco Carvalho 摘要:第一届自适应网络防御国际研讨会是2021年国际人工智能联合会议的一部分。本次研讨会旨在分享研究成果,探索人工智能(AI)和机器学习(ML)的独特应用,作为实现自适应网络防御的基础能力。如果不广泛依赖人类专家,网络领域目前无法得到可靠和有效的保护。熟练的网络防御者供不应求,往往无法对网络威胁作出足够快的反应。在AI和ML最新进展的基础上,网络防御研究社区通过在网络和非网络环境中采用AI和ML技术,积极开发新的动态和可持续防御。弥合人工智能与网络研究人员和从业者之间的关键差距,可以加快建立半自主网络防御的努力,从而学会识别和应对网络攻击,或与其他网络操作系统和人类专家合作发现和缓解弱点。此外,这些防御措施预计是自适应的,并且能够随着时间的推移而演变,以阻止攻击者行为的变化、系统健康状况和准备状态的变化以及用户行为随时间的自然变化。研讨会(于2021年8月19日和20日在蒙特利尔举行,主题为虚拟现实)由技术演示和小组讨论组成,重点讨论开放性问题和潜在的研究解决方案。一个领域专家小组对讲习班提交的材料进行了同行审查,会议记录由10篇技术文章组成,探讨了对国家和全球安全至关重要的挑战性问题。参加本次研讨会为促进自适应和自主网络防御这一新兴领域的研究和创新提供了新的机会。 摘要:The 1st International Workshop on Adaptive Cyber Defense was held as part of the 2021 International Joint Conference on Artificial Intelligence. This workshop was organized to share research that explores unique applications of Artificial Intelligence (AI) and Machine Learning (ML) as foundational capabilities for the pursuit of adaptive cyber defense. The cyber domain cannot currently be reliably and effectively defended without extensive reliance on human experts. Skilled cyber defenders are in short supply and often cannot respond fast enough to cyber threats. Building on recent advances in AI and ML the Cyber defense research community has been motivated to develop new dynamic and sustainable defenses through the adoption of AI and ML techniques to both cyber and non-cyber settings. Bridging critical gaps between AI and Cyber researchers and practitioners can accelerate efforts to create semi-autonomous cyber defenses that can learn to recognize and respond to cyber attacks or discover and mitigate weaknesses in cooperation with other cyber operation systems and human experts. Furthermore, these defenses are expected to be adaptive and able to evolve over time to thwart changes in attacker behavior, changes in the system health and readiness, and natural shifts in user behavior over time. 
The Workshop (held on August 19th and 20th 2021 in Montreal-themed virtual reality) was comprised of technical presentations and a panel discussion focused on open problems and potential research solutions. Workshop submissions were peer reviewed by a panel of domain experts with a proceedings consisting of 10 technical articles exploring challenging problems of critical importance to national and global security. Participation in this workshop offered new opportunities to stimulate research and innovation in the emerging domain of adaptive and autonomous cyber defense.
【26】 QUEACO: Borrowing Treasures from Weakly-labeled Behavior Data for Query Attribute Value Extraction 标题:QUEACO:从弱标签行为数据中借用宝库进行查询属性值提取 链接:https://arxiv.org/abs/2108.08468
作者:Danqing Zhang,Zheng Li,Tianyu Cao,Chen Luo,Tony Wu,Hanqing Lu,Yiwei Song,Bing Yin,Tuo Zhao,Qiang Yang 机构:Georgia Institute of Technology, GA, USA, Hong Kong University of Science and Technology, HK, China 摘要:我们研究了查询属性值提取问题,其目的是将用户查询中的命名实体识别为不同的表面形式属性值,然后将其转换为正式的规范形式。这个问题包括两个阶段:命名实体识别(NER)和属性值规范化(AVN)。然而,现有的工作只关注NER阶段,而忽略了同样重要的AVN。为了弥补这一差距,本文提出了一个电子商务搜索中统一的查询属性值提取系统QUEACO,该系统包括两个阶段。此外,通过利用大规模弱标记行为数据,我们进一步提高了提取性能,同时降低了监督成本。具体而言,对于NER阶段,QUEACO采用了一种新型的教师-学生网络,其中在强标记数据上训练的教师网络生成伪标记以细化弱标记数据以训练学生网络。同时,教师网络可以通过学生对强标签数据的反馈动态调整,以最大限度地消除弱标签带来的噪声。对于AVN阶段,我们还利用弱标记的查询到属性行为数据将查询中的表面形式属性值规范化为产品中的规范形式。在真实世界的大规模电子商务数据集上进行的大量实验证明了QUEACO的有效性。 摘要:We study the problem of query attribute value extraction, which aims to identify named entities from user queries as diverse surface form attribute values and afterward transform them into formally canonical forms. Such a problem consists of two phases: named entity recognition (NER) and attribute value normalization (AVN). However, existing works only focus on the NER phase but neglect the equally important AVN. To bridge this gap, this paper proposes a unified query attribute value extraction system in e-commerce search named QUEACO, which covers both phases. Moreover, by leveraging large-scale weakly-labeled behavior data, we further improve the extraction performance with less supervision cost. Specifically, for the NER phase, QUEACO adopts a novel teacher-student network, where a teacher network that is trained on the strongly-labeled data generates pseudo-labels to refine the weakly-labeled data for training a student network. Meanwhile, the teacher network can be dynamically adapted by the feedback of the student's performance on strongly-labeled data to maximally denoise the noisy supervisions from the weak labels. For the AVN phase, we also leverage the weakly-labeled query-to-attribute behavior data to normalize surface form attribute values from queries into canonical forms from products. Extensive experiments on a real-world large-scale E-commerce dataset demonstrate the effectiveness of QUEACO.
【27】 Semantic Reinforced Attention Learning for Visual Place Recognition 标题:语义强化的注意学习在视觉位置识别中的应用 链接:https://arxiv.org/abs/2108.08443
作者:Guohao Peng,Yufeng Yue,Jun Zhang,Zhenyu Wu,Xiaoyu Tang,Danwei Wang 机构:School of Electrical and Electronic Engineering, Nanyang Technological University, School of Automation, Beijing Institute of Technology 摘要:大规模视觉位置识别(VPR)具有内在的挑战性,因为图像中并非所有的视觉线索都有利于任务的完成。为了在特征嵌入中突出任务相关的视觉线索,现有的注意机制要么基于人工规则,要么以彻底的数据驱动方式进行训练。为了填补这两种类型之间的空白,我们提出了一种新的语义强化注意学习网络(SRALNet),其中推断出的注意可以同时受益于语义先验和数据驱动的微调。其贡献有两个方面:(1)为了抑制误导性的局部特征,提出了一种基于分层特征分布的可解释局部加权方案;(2)通过利用局部加权方案的可解释性,提出了一种语义约束的初始化方法,以便通过语义先验加强局部注意。实验表明,在城市规模的VPR基准数据集上,我们的方法优于最先进的技术。 摘要:Large-scale visual place recognition (VPR) is inherently challenging because not all visual cues in the image are beneficial to the task. In order to highlight the task-relevant visual cues in the feature embedding, the existing attention mechanisms are either based on artificial rules or trained in a thorough data-driven manner. To fill the gap between the two types, we propose a novel Semantic Reinforced Attention Learning Network (SRALNet), in which the inferred attention can benefit from both semantic priors and data-driven fine-tuning. The contribution is twofold. (1) To suppress misleading local features, an interpretable local weighting scheme is proposed based on hierarchical feature distribution. (2) By exploiting the interpretability of the local weighting scheme, a semantic constrained initialization is proposed so that the local attention can be reinforced by semantic priors. Experiments demonstrate that our method outperforms state-of-the-art techniques on city-scale VPR benchmark datasets.
【28】 Self-Supervised Video Representation Learning with Meta-Contrastive Network 标题:基于元对比网络的自监督视频表示学习 链接:https://arxiv.org/abs/2108.08426
作者:Yuanze Lin,Xun Guo,Yan Lu 机构:University of Washington, Microsoft Research Asia 备注:Accepted to ICCV 2021 摘要:自监督学习已成功地应用于训练前视频表示,其目的是有效地适应训练前域到下游任务。现有的方法仅仅利用对比损失来学习实例级别的区分。然而,类别信息的缺乏将导致难以确定的正问题,从而限制了这类方法的泛化能力。我们发现元学习的多任务过程可以解决这个问题。在本文中,我们提出了一个元对比网络(MCN),它将对比学习和元学习结合起来,以增强现有自监督方法的学习能力。我们的方法包含两个基于模型不可知元学习(MAML)的训练阶段,每个阶段包括一个对比分支和一个元分支。广泛的评估证明了我们方法的有效性。对于两个下游任务,即视频动作识别和视频检索,MCN在UCF101和HMDB51数据集上优于最先进的方法。更具体地说,使用R(2+1)D主干,MCN在视频动作识别方面达到了84.8%和54.5%的顶级精度,在视频检索方面达到了52.5%和23.7%的顶级精度。 摘要:Self-supervised learning has been successfully applied to pre-train video representations, which aims at efficient adaptation from pre-training domain to downstream tasks. Existing approaches merely leverage contrastive loss to learn instance-level discrimination. However, lack of category information will lead to hard-positive problem that constrains the generalization ability of this kind of methods. We find that the multi-task process of meta learning can provide a solution to this problem. In this paper, we propose a Meta-Contrastive Network (MCN), which combines the contrastive learning and meta learning, to enhance the learning ability of existing self-supervised approaches. Our method contains two training stages based on model-agnostic meta learning (MAML), each of which consists of a contrastive branch and a meta branch. Extensive evaluations demonstrate the effectiveness of our method. For two downstream tasks, i.e., video action recognition and video retrieval, MCN outperforms state-of-the-art approaches on UCF101 and HMDB51 datasets. To be more specific, with R(2+1)D backbone, MCN achieves Top-1 accuracies of 84.8% and 54.5% for video action recognition, as well as 52.5% and 23.7% for video retrieval.
【29】 Second-Order Specifications and Quantifier Elimination for Consistent Query Answering in Databases 标题:数据库查询一致性回答的二阶规范和量词剔除 链接:https://arxiv.org/abs/2108.08423
作者:Leopoldo Bertossi 机构:Universidad Adolfo Ibáñez, Millennium Institute for Foundational Research on Data (IMFD), Santiago, Chile 备注:This is a slightly extended and updated version of a paper published in the Proc. of the Alberto Mendelzon International Workshop of Foundations of Data Management which will also serve as extended version for a forthcoming extended abstract based on the former 摘要:来自可能不一致的数据库的查询的一致答案是同时从数据库的每个可能修复中检索的答案。修复是与原始不一致实例差异最小的一致实例。以前已经证明,数据库修复可以指定为析取逻辑程序的稳定模型。在本文中,我们展示了如何使用修复程序将一致性查询应答问题转化为推理问题,这是一个用二阶谓词逻辑编写的理论。它还研究了如何通过应用二阶量词消除技术获得一阶理论。 摘要:Consistent answers to a query from a possibly inconsistent database are answers that are simultaneously retrieved from every possible repair of the database. Repairs are consistent instances that minimally differ from the original inconsistent instance. It has been shown before that database repairs can be specified as the stable models of a disjunctive logic program. In this paper we show how to use the repair programs to transform the problem of consistent query answering into a problem of reasoning w.r.t. a theory written in second-order predicate logic. We also investigate how a first-order theory can be obtained instead by applying second-order quantifier elimination techniques.
【30】 The Multi-Modal Video Reasoning and Analyzing Competition 标题:多模态视频推理分析比赛 链接:https://arxiv.org/abs/2108.08344
作者:Haoran Peng,He Huang,Li Xu,Tianjiao Li,Jun Liu,Hossein Rahmani,Qiuhong Ke,Zhicheng Guo,Cong Wu,Rongchang Li,Mang Ye,Jiahao Wang,Jiaxu Zhang,Yuanzhong Liu,Tao He,Fuwei Zhang,Xianbin Liu,Tao Lin 机构:Singapore University of Technology and Design, Lancaster University, University of Melbourne, Xidian University, Jiangnan University, Wuhan University, Tsinghua University, Sun Yat-sen University, BOE Technology Group Co., Ltd 备注:Accepted to ICCV 2021 Workshops 摘要:在本文中,我们结合ICCV 2021介绍了多模式视频推理和分析竞赛(MMVRAC)研讨会。该竞赛由四个不同的轨道组成,即视频问答、基于骨架的动作识别、基于鱼眼视频的动作识别和人物再识别,这是基于两个数据集:SUTD TrafficQA和UAV Human。我们总结了参赛者提交的最佳表现方法,并展示了他们在比赛中取得的成绩。 摘要:In this paper, we introduce the Multi-Modal Video Reasoning and Analyzing Competition (MMVRAC) workshop in conjunction with ICCV 2021. This competition is composed of four different tracks, namely, video question answering, skeleton-based action recognition, fisheye video-based action recognition, and person re-identification, which are based on two datasets: SUTD-TrafficQA and UAV-Human. We summarize the top-performing methods submitted by the participants in this competition and show their results achieved in the competition.
【31】 End-to-End License Plate Recognition Pipeline for Real-time Low Resource Video Based Applications 标题:基于实时低资源视频应用的端到端车牌识别流水线 链接:https://arxiv.org/abs/2108.08339
作者:Alif Ashrafee,Akib Mohammed Khan,Mohammad Sabik Irbaz,MD Abdullah Al Nasim 机构:Department of Computer Science and Engineering, Islamic University of Technology, Machine Learning Team, Pioneer Alpha Ltd., A PREPRINT 备注:Under Review 摘要:自动车牌识别系统旨在为检测、定位和识别视频帧中车辆的车牌字符提供端到端解决方案。然而,在现实世界中部署此类系统需要在低资源环境中的实时性能。在我们的论文中,我们提出了一种新的两阶段检测管道与视觉API相结合,旨在提供实时推理速度以及一致准确的检测和识别性能。我们在主干MobileNet SSDv2检测模型上使用了一个haar级联分类器作为过滤器。这通过只关注高置信度检测并将其用于识别来减少推理时间。我们还采用了一种时间帧分离策略来识别同一剪辑中的多个车牌。此外,没有公开可用的孟加拉语车牌数据集,为此我们创建了一个图像数据集和一个视频数据集,其中包含野外的车牌。我们在图像数据集上训练了我们的模型,获得了86%的AP(0.5)分数,并在视频数据集上测试了我们的管道,观察到了合理的检测和识别性能(82.7%的检测率和60.8%的OCR F1分数),实时处理速度(27.2帧/秒)。 摘要:Automatic License Plate Recognition systems aim to provide an end-to-end solution towards detecting, localizing, and recognizing license plate characters from vehicles appearing in video frames. However, deploying such systems in the real world requires real-time performance in low-resource environments. In our paper, we propose a novel two-stage detection pipeline paired with Vision API that aims to provide real-time inference speed along with consistently accurate detection and recognition performance. We used a haar-cascade classifier as a filter on top of our backbone MobileNet SSDv2 detection model. This reduces inference time by only focusing on high confidence detections and using them for recognition. We also impose a temporal frame separation strategy to identify multiple vehicle license plates in the same clip. Furthermore, there are no publicly available Bangla license plate datasets, for which we created an image dataset and a video dataset containing license plates in the wild. We trained our models on the image dataset and achieved an AP(0.5) score of 86% and tested our pipeline on the video dataset and observed reasonable detection and recognition performance (82.7% detection rate, and 60.8% OCR F1 score) with real-time processing speed (27.2 frames per second).
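The AP(0.5) score quoted above is not defined in the abstract; under the common object-detection convention it denotes average precision where a detection counts as correct if its intersection-over-union (IoU) with a ground-truth box is at least 0.5. Assuming that reading, the matching criterion can be sketched as follows. The box format `(x1, y1, x2, y2)` and the function names are illustrative choices, not taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap rectangle, clamped at zero when disjoint.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def is_true_positive(pred_box, gt_box, threshold=0.5):
    """A predicted plate counts as a hit when its IoU with the ground truth meets the threshold."""
    return iou(pred_box, gt_box) >= threshold


if __name__ == "__main__":
    print(iou((0, 0, 2, 2), (1, 1, 3, 3)))               # partial overlap
    print(is_true_positive((0, 0, 4, 4), (0, 0, 4, 2)))  # IoU exactly at the threshold
```

Average precision is then computed from the precision-recall curve obtained by ranking detections by confidence and labelling each with this criterion.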
【32】 TFRD: A Benchmark Dataset for Research on Temperature Field Reconstruction of Heat-Source Systems 标题:TFRD:热源系统温度场重建研究的基准数据集 链接:https://arxiv.org/abs/2108.08298
作者:Xiaoqian Chen,Zhiqiang Gong,Xiaoyu Zhao,Wen Yao 摘要:热管理在工程中起着重要的作用。用有限的监测张量重建热源系统(TFR-HSS)的温度场在热管理中起着至关重要的作用。然而,现有的常用插值方法通常不能提供精确的重建。此外,目前还没有公共数据集可用于广泛研究重建方法,以进一步推动工程领域的现场重建。为了克服这一问题,本工作以常用的插值方法和基于代理模型的方法为基线,为TFR-HSS任务构建了一个特定的数据集,即TFRD,以推进温度场重建的研究。首先,TFR-HSS任务从实际工程问题进行数学建模,并构建了三种类型的数值建模,以将问题转化为离散映射形式。此外,本文选取了四个典型的具有不同热源信息和边界条件的重构问题,并生成标准样本作为训练样本和测试样本进行进一步的研究。最后,对TFR-HSS任务的先前方法以及最近广泛使用的深度学习方法进行了全面回顾,并对TFRD的典型方法进行了性能分析,可作为该基准的基线结果。 摘要:Heat management plays an important role in engineering. Temperature field reconstruction of heat source systems (TFR-HSS) with limited monitoring tensors plays an essential role in heat management. However, prior methods with common interpolations usually cannot provide accurate reconstruction. In addition, there exists no public dataset for wide research on reconstruction methods to further boost field reconstruction in engineering. To overcome this problem, this work constructs a specific dataset, namely TFRD, for the TFR-HSS task with commonly used methods, including the interpolation methods and the surrogate model based methods, as baselines to advance the research on temperature field reconstruction. First, the TFR-HSS task is mathematically modelled from a real-world engineering problem, and three types of numerical models have been constructed to transform the problem into discrete mapping forms. Besides, this work selects four typical reconstruction problems with different heat source information and boundary conditions and generates the standard samples as training and testing samples for further research. Finally, a comprehensive review of the prior methods for the TFR-HSS task as well as recent widely used deep learning methods is given, and we provide a performance analysis of typical methods on TFRD, which can serve as the baseline results on this benchmark.
【33】 Fact-Tree Reasoning for N-ary Question Answering over Knowledge Graphs 标题:知识图上N元问答的事实树推理 链接:https://arxiv.org/abs/2108.08297
作者:Yao Zhang,Peiyao Li,Hongru Liang,Adam Jatowt,Zhenglu Yang 机构:Nankai University, China, Sichuan University, China, University of Innsbruck, Austria 备注:11 pages, 6 figures and 4 tables 摘要:在问答(QA)任务中,多跳推理框架近年来得到了广泛的研究,以在知识图(KG)上实现更高效和可解释的答案推理。然而,多跳推理由于其线性推理性质,不适用于回答n元事实问题。我们发现有两个可行的改进:1)将基本推理单元从实体或关系升级到事实;2)将推理结构从链式升级为树状。在此基础上,我们提出了一种新的事实树推理框架,通过将问题转化为事实树并对其进行迭代事实推理来预测正确答案。通过对本文引入的n元事实KGQA数据集的综合评价,我们证明了所提出的事实树推理框架具有较高的答案预测精度。此外,我们还对两个二进制KGQA数据集上的事实树推理框架进行了评估,结果表明,与一些优秀的基线相比,我们的方法也具有很强的推理能力。这项工作对探索复杂的推理场景有直接的影响,并提供了初步的基线方法。 摘要:In the question answering(QA) task, multi-hop reasoning framework has been extensively studied in recent years to perform more efficient and interpretable answer reasoning on the Knowledge Graph(KG). However, multi-hop reasoning is inapplicable for answering n-ary fact questions due to its linear reasoning nature. We discover that there are two feasible improvements: 1) upgrade the basic reasoning unit from entity or relation to fact; and 2) upgrade the reasoning structure from chain to tree. Based on these, we propose a novel fact-tree reasoning framework, through transforming the question into a fact tree and performing iterative fact reasoning on it to predict the correct answer. Through a comprehensive evaluation on the n-ary fact KGQA dataset introduced by this work, we demonstrate that the proposed fact-tree reasoning framework has the desired advantage of high answer prediction accuracy. In addition, we also evaluate the fact-tree reasoning framework on two binary KGQA datasets and show that our approach also has a strong reasoning ability compared with several excellent baselines. This work has direct implications for exploring complex reasoning scenarios and provides a preliminary baseline approach.
【34】 Deep Contrastive Learning for Multi-View Network Embedding 标题:多视点网络嵌入的深度对比学习 链接:https://arxiv.org/abs/2108.08296
作者:Mengqi Zhang,Yanqiao Zhu,Shu Wu,Liang Wang 机构: Center for Research on Intelligent Perception and Computing, Institute of Automation, Chinese Academy of Sciences, School of Artificial Intelligence, University of Chinese Academy of Sciences 备注:Work in progress 摘要:多视图网络嵌入的目的是将网络中的节点投影到低维向量上,同时保留其多个关系和属性信息。基于对比学习的方法在这项任务中已初步显示出良好的性能。然而,大多数基于对比学习的方法大多依赖于高质量的图形嵌入,而对不同图形视图之间关系的研究较少。针对这些不足,我们设计了一个新的多视图网络嵌入节点对节点对比学习框架(CREME),该框架主要包含两个对比目标:多视图融合InfoMax和视图间InfoMin。前者从不同的图视图生成的嵌入中提取信息,而后者更好地区分不同的图视图以捕获它们之间的互补信息。具体来说,我们首先应用视图编码器来生成每个图形视图表示,并利用多视图聚合器来融合这些表示。然后,我们将两个对比目标统一为一个学习目标进行训练。在三个真实数据集上的大量实验表明,CREME的性能始终优于现有的方法。 摘要:Multi-view network embedding aims at projecting nodes in the network to low-dimensional vectors, while preserving their multiple relations and attribute information. Contrastive learning-based methods have preliminarily shown promising performance in this task. However, most contrastive learning-based methods mostly rely on high-quality graph embedding and explore less on the relationships between different graph views. To deal with these deficiencies, we design a novel node-to-node Contrastive learning framework for Multi-view network Embedding (CREME), which mainly contains two contrastive objectives: Multi-view fusion InfoMax and Inter-view InfoMin. The former objective distills information from embeddings generated from different graph views, while the latter distinguishes different graph views better to capture the complementary information between them. Specifically, we first apply a view encoder to generate each graph view representation and utilize a multi-view aggregator to fuse these representations. Then, we unify the two contrastive objectives into one learning objective for training. Extensive experiments on three real-world datasets show that CREME outperforms existing methods consistently.
【35】 AIRCHITECT: Learning Custom Architecture Design and Mapping Space 标题:AIRCHITECT:学习定制建筑设计和绘图空间 链接:https://arxiv.org/abs/2108.08295
作者:Ananda Samajdar,Jan Moritz Joseph,Matthew Denton,Tushar Krishna 机构:Georgia Tech, Atlanta, GA, RWTH Aachen University, Aachen, Germany 摘要:设计空间探索是定制体系结构设计/部署过程中一个重要但代价高昂的步骤,目的是尽可能提高性能和能效。传统上,优化需要使用模拟或启发式工具对设计空间进行迭代采样。在本文中,我们研究了使用机器学习学习优化任务的可能性,从而使用学习到的模型预测定制架构的设计和映射空间的最佳参数,绕过任何探索步骤。我们使用了三个案例研究,涉及基于脉动阵列的定制架构设计和映射空间的最佳阵列设计、SRAM缓冲区大小、映射和调度确定。在这些案例研究的范围内,我们表明,当使用工作量和设计约束进行查询时,可以捕获设计空间并训练模型来“概括”预测最优设计和映射参数。我们为我们的案例研究对优化空间进行系统的设计感知和统计分析,并突出设计空间中的模式。我们将架构设计和映射描述为一个机器学习问题,它允许我们利用现有的ML模型进行训练和推理。我们设计并训练了一个名为AIRCHITECT的定制网络体系结构,该体系结构能够以高达94.3%的测试精度学习体系结构设计空间,并预测最佳配置,在一个工作负载为10^5$GEMM的测试数据集上,平均(GeoMean)达到99.9%的最佳性能。 摘要:Design space exploration is an important but costly step involved in the design/deployment of custom architectures to squeeze out maximum possible performance and energy efficiency. Conventionally, optimizations require iterative sampling of the design space using simulation or heuristic tools. In this paper we investigate the possibility of learning the optimization task using machine learning and hence using the learnt model to predict optimal parameters for the design and mapping space of custom architectures, bypassing any exploration step. We use three case studies involving the optimal array design, SRAM buffer sizing, mapping, and schedule determination for systolic-array-based custom architecture design and mapping space. Within the purview of these case studies, we show that it is possible to capture the design space and train a model to "generalize" prediction the optimal design and mapping parameters when queried with workload and design constraints. We perform systematic design-aware and statistical analysis of the optimization space for our case studies and highlight the patterns in the design space. We formulate the architecture design and mapping as a machine learning problem that allows us to leverage existing ML models for training and inference. 
We design and train a custom network architecture called AIRCHITECT, which is capable of learning the architecture design space with as high as 94.3% test accuracy and predicting optimal configurations which achieve on average (GeoMean) of 99.9% the best possible performance on a test dataset with $10^5$ GEMM workloads.
【36】 ChMusic: A Traditional Chinese Music Dataset for Evaluation of Instrument Recognition Link: https://arxiv.org/abs/2108.08470
Authors: Xia Gong, Yuxiang Zhu, Haidi Zhu, Haoran Wei Affiliations: School of Music, Shandong University of Technology, Zibo, China, Shanghai Institute of Microsystem and Information Technology, Chinese Academy of Sciences, Shanghai, China, No., High School (Baoshan) of, East China Normal University Abstract: Musical instrument recognition is a widely used application of music information retrieval. As most previous musical instrument recognition datasets focus on western musical instruments, it is difficult for researchers to study and evaluate traditional Chinese musical instrument recognition. This paper proposes a traditional Chinese music dataset for model training and performance evaluation, named ChMusic. The dataset is free and publicly available, and contains recordings of 11 traditional Chinese musical instruments and 55 traditional Chinese music excerpts. An evaluation standard is then proposed based on the ChMusic dataset. With this standard, researchers can compare their results following the same rules, and results from different researchers become comparable.
【37】 Temporal Kernel Consistency for Blind Video Super-Resolution Link: https://arxiv.org/abs/2108.08305
Authors: Lichuan Xiang, Royson Lee, Mohamed S. Abdelfattah, Nicholas D. Lane, Hongkai Wen Affiliations: University of Warwick, University of Cambridge, Samsung AI Center, Cambridge Abstract: Deep learning-based blind super-resolution (SR) methods have recently achieved unprecedented performance in upscaling frames with unknown degradation. These models are able to accurately estimate the unknown downscaling kernel from a given low-resolution (LR) image in order to leverage the kernel during restoration. Although these approaches have largely been successful, they are predominantly image-based and therefore do not exploit the temporal properties of the kernels across multiple video frames. In this paper, we investigate the temporal properties of the kernels and highlight their importance in the task of blind video super-resolution. Specifically, we measure the kernel temporal consistency of real-world videos and illustrate how the estimated kernels might change per frame in videos with varying dynamicity of the scene and its objects. With this new insight, we revisit previous popular video SR approaches and show that the previous assumption of a fixed kernel throughout the restoration process can lead to visual artifacts when upscaling real-world videos. To counteract this, we tailor existing single-image and video SR techniques to leverage kernel consistency during both kernel estimation and video upscaling. 
Extensive experiments on synthetic and real-world videos show substantial restoration gains quantitatively and qualitatively, achieving the new state-of-the-art in blind video SR and underlining the potential of exploiting kernel temporal consistency.