Paper notes: Early Convolutions Help Transformers See Better



Triton安 · Published 2021-12-21 11:06:24

1. Motivation

Vision transformer (ViT) models exhibit substandard optimizability. In particular, they are sensitive to the choice of optimizer (AdamW vs. SGD), optimizer hyperparameters, and training schedule length. In comparison, modern convolutional neural networks are easier to optimize.

The paper's hypothesis about the cause of the problem:

In this work, we conjecture that the issue lies with the patchify stem of ViT models, which is implemented by a stride-p p×p convolution (p = 16 by default) applied to the input image. This large-kernel plus large-stride convolution runs counter to typical design choices of convolutional layers in neural networks.

In this paper we hypothesize that the issue lies primarily in the early visual processing performed by ViT.
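To make the patchify stem quoted above concrete, here is a minimal plain-Python sketch (not the paper's code). It applies a single stride-p p×p kernel to a toy single-channel image and checks that the result matches cutting the image into non-overlapping p×p patches and linearly projecting each one. The function names, the toy 4×4 image, and the 224×224 input size used for the token count are illustrative assumptions; only p = 16 comes from the quoted text.

```python
# Sketch only: single channel, single output filter, no padding, no bias.
# All names here are hypothetical and chosen for illustration.

def conv2d_stride_p(image, kernel, p):
    """Apply one p x p kernel with stride p to a 2D list of numbers."""
    h, w = len(image), len(image[0])
    out = []
    for top in range(0, h - p + 1, p):
        row = []
        for left in range(0, w - p + 1, p):
            acc = 0
            for i in range(p):
                for j in range(p):
                    acc += image[top + i][left + j] * kernel[i][j]
            row.append(acc)
        out.append(row)
    return out


def patchify_then_project(image, kernel, p):
    """Flatten each non-overlapping p x p patch and dot it with the flattened kernel."""
    h, w = len(image), len(image[0])
    flat_kernel = [v for kernel_row in kernel for v in kernel_row]
    out = []
    for top in range(0, h - p + 1, p):
        row = []
        for left in range(0, w - p + 1, p):
            patch = [image[top + i][left + j] for i in range(p) for j in range(p)]
            row.append(sum(a * b for a, b in zip(patch, flat_kernel)))
        out.append(row)
    return out


def num_tokens(height, width, p):
    """Number of patch tokens produced by a stride-p p x p stem without padding."""
    return (height // p) * (width // p)


# Toy 4 x 4 image with values 0..15 and a 2 x 2 kernel (p = 2).
toy_image = [[4 * r + c for c in range(4)] for r in range(4)]
toy_kernel = [[1, 0], [0, 1]]

conv_out = conv2d_stride_p(toy_image, toy_kernel, 2)
patch_out = patchify_then_project(toy_image, toy_kernel, 2)
print(conv_out == patch_out)      # the two views give the same numbers
print(num_tokens(224, 224, 16))   # assumed 224 x 224 input with p = 16
```

Because the stride equals the kernel size, the windows never overlap: each output value sees exactly one patch, so the stem is a single linear projection of raw pixels. That is the "large-kernel plus large-stride" design the quoted passage contrasts with typical convolutional layers.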

2. Solution

Personally, I think the paper contains one very good idea:
