🤺Universal and Transferable Adversarial Attacks on😊Aligned Language Models

More articles related to 【🤺Universal and Transferable Adversarial Attacks on😊Aligned Language Models】

Mind the Box: $\ell_1$-APGD for Sparse Adversarial Attacks on Image Classifiers

Contents: Overview, Main Content. Croce F. and Hein M. Mind the box: $\ell_1$-APGD for sparse adversarial attacks on image classifiers. In International Conference on Machine Learning (ICML), 2021. Overview: Earlier $\ell_1$ attacks, in order to guarantee \[\|x' - x\|_1 \le \epsilon, \quad x' \in [0, 1]^d, \] they…
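The constraint quoted in the entry above requires an adversarial point $x'$ to lie both inside an $\ell_1$ ball of radius $\epsilon$ around $x$ and inside the box $[0, 1]^d$. Below is a minimal sketch of a feasibility check for that constraint, assuming NumPy arrays as inputs. The function name `is_feasible` and the tolerance `tol` are hypothetical choices for illustration and do not come from the paper.

```python
import numpy as np

def is_feasible(x, x_adv, eps, tol=1e-9):
    """Check ||x_adv - x||_1 <= eps and x_adv in [0, 1]^d (hypothetical helper)."""
    # l1 distance between the clean point and the perturbed point
    in_l1_ball = np.abs(x_adv - x).sum() <= eps + tol
    # every coordinate must stay inside the box [0, 1]
    in_box = np.all((x_adv >= -tol) & (x_adv <= 1.0 + tol))
    return bool(in_l1_ball and in_box)

x = np.array([0.2, 0.5, 0.9])
inside = is_feasible(x, np.array([0.3, 0.5, 1.0]), eps=0.25)         # within ball and box
outside_box = is_feasible(x, np.array([0.2, 0.5, 1.05]), eps=0.25)   # leaves the box
outside_ball = is_feasible(x, np.array([0.5, 0.5, 0.9]), eps=0.25)   # l1 distance too large
```

The check only illustrates that both conditions must hold at once; it is not the attack itself.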

Defending Adversarial Attacks by Correcting logits

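Contents: Overview, Main Content, Experiments. Li Y., Xie L., Zhang Y., Zhang R., Wang Y., Tian Q., Defending Adversarial Attacks by Correcting logits[J]. arXiv: Learning, 2019. Overview: The authors argue that adversarial samples and natural samples follow different distributions, so the distributions of their output logits differ as well. Can this difference be used to recover the correct class? Main Content: The idea is as follows…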

DEFENSE-GAN: PROTECTING CLASSIFIERS AGAINST ADVERSARIAL ATTACKS USING GENERATIVE MODELS

Contents: Overview, Main Content. Samangouei P, Kabkab M, Chellappa R, et al. Defense-GAN: Protecting Classifiers Against Adversarial Attacks Using Generative Models[J]. arXiv: Computer Vision and Pattern Recognition, 2018. @article{samangouei2018defense-gan:, title={Defen…

Towards Deep Learning Models Resistant to Adversarial Attacks

Contents: Overview, Main Content, Note. Madry A, Makelov A, Schmidt L, et al. Towards Deep Learning Models Resistant to Adversarial Attacks[J]. arXiv: Machine Learning, 2017. @article{madry2017towards, title={Towards Deep Learning Models Resistant to Adversarial Attacks.},…

【NLP】Conditional Language Models

A language model estimates the probability that a sequence of words is a sentence a human would say. Training it also yields embeddings for the whole vocabulary. An unconditional language model simply assigns probabilities to sequences of words. That is to say, gi…

【NLP】Recurrent Neural Network and Language Models

0. Overview. What is a language model? A time series prediction problem. It assigns a probability to a sequence of words, and the probabilities of all sequences sum to one. Many natural language processing tasks can be structured as (conditional) language modell…

Paper Reading | Real-Time Adversarial Attacks

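Abstract: Previous adversarial attacks focus on static inputs, and these methods do not apply to target models with streaming input. The attacker can only observe past sample points and add perturbations to the remaining ones. This paper proposes real-time adversarial attacks against machine learning models with streaming input. 1 Introduction: In real-time processing scenarios, the attacker can only observe the past part of a data sample and can only perturb its future part, while the target model's decision is based on the entire sample. When attacking a real-time system, the attacker faces a trade-off between the observation space and the action space. That is, suppose the target system accepts a sequential input x; the attacker can choose to design the adversarial perturbation at the very beginning. In that case, however, regarding x the attacker has no…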

0-4 Evaluating Language Models: Perplexity

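Once we have a language model, we need to judge how good it is. Suppose we have some test data consisting of m sentences: s1, s2, s3, …, sm. We can look at the probability these sentences receive under a given model, which is the product of the p(Si). Computing that product directly is cumbersome, so the quality of the model is measured in another form: starting from the product, take the log to turn the multiplication into addition. As a side note, p(Si) here is simply the q(the|*,*)*q(dog|*,the)*q(…)… introduced earlier. With the formula above, the principle for judging whether a model is good is: a g…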
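The log trick described in the perplexity entry above can be sketched in a few lines of Python. The snippet is truncated before it states its final formula, so this sketch assumes the common definition: perplexity is $2^{-l}$, where $l$ is the sum of $\log_2 p(s_i)$ over the test sentences divided by the total number of words M. The function name `perplexity` and the toy probabilities are hypothetical.

```python
import math

def perplexity(sentence_probs, total_words):
    """Perplexity under the assumed definition 2 ** (-(1/M) * sum_i log2 p(s_i))."""
    # Summing logs replaces the product of sentence probabilities,
    # which would otherwise underflow for long test sets.
    log_prob = sum(math.log2(p) for p in sentence_probs)
    avg_log_prob = log_prob / total_words
    return 2 ** (-avg_log_prob)

# Hypothetical toy test set: two sentences with 5 words in total
probs = [0.25, 0.125]   # p(s_1), p(s_2)
ppl = perplexity(probs, total_words=5)
print(ppl)  # 2.0
```

Under this definition a lower perplexity means the model assigns higher probability to the test data.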

Paper Reading | Transformer-XL: Attentive Language Models beyond a Fixed-Length Context

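0 Summary: The biggest problem with the Transformer is that, in language modeling, its setup is limited to a fixed-length context. The Transformer-XL proposed in this paper lets learning go beyond a fixed length without disrupting temporal coherence. Transformer-XL consists of a segment-level recurrence mechanism and a positional encoding scheme. It not only captures long-term dependencies but also resolves the context fragmentation problem. It learns dependencies that are 80% longer than RNNs and 450% longer than vanilla Transformers, and achieves better results on both short and long sequences. Compared with van…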

Paper Notes - Calibrate Before Use: Improving Few-Shot Performance of Language Models

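Motivation: In-Context Learning (ICL) lets users complete new downstream tasks without any parameter updates. The interface is pure natural language, so users without an NLP background can also build NLP systems. The main problem with ICL is unstable model performance (strongly tied to prompt design), that is, high variance. Three factors contribute: the template; the choice of examples; the ordering (permutation) of examples. Analysis: Causes of the instability: majority label bia…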