https://arstechnica.com/information-technology/2023/03/you-can-now-run-a-gpt-3-level-ai-model-on-your-laptop-phone-and-raspberry-pi/

Things are moving at lightning speed in AI Land. On Friday, a software developer named Georgi Gerganov created a tool called "llama.cpp" that can run Meta's new GPT-3-class AI large language model, LLaMA, locally on a Mac laptop. Soon thereafter, people worked out how to run LLaMA on Windows as well. Then someone showed it running on a Pixel 6 phone, and next came a Raspberry Pi (albeit running very slowly).

If this keeps up, we may be looking at a pocket-sized ChatGPT competitor before we know it.

 
But let's back up a minute, because we're not quite there yet. (At least not today—as in literally today, March 13, 2023.) What will arrive next week, no one knows.

Since ChatGPT launched, some people have been frustrated by the AI model's built-in limits that prevent it from discussing topics that OpenAI has deemed sensitive. Thus began the dream—in some quarters—of an open source large language model (LLM) that anyone could run locally without censorship and without paying API fees to OpenAI.

Open source solutions do exist (such as GPT-J), but they require a lot of GPU RAM and storage space, and none of the open source alternatives could boast GPT-3-level performance on readily available consumer hardware.

Enter LLaMA, an LLM available in parameter sizes ranging from 7B to 65B (that's "B" as in "billion parameters," which are floating point numbers stored in matrices that represent what the model "knows"). LLaMA made a heady claim: that its smaller-sized models could match OpenAI's GPT-3, the foundational model that powers ChatGPT, in the quality and speed of their output. There was just one problem: Meta released the LLaMA code as open source but held back the "weights" (the trained "knowledge" stored in a neural network), making them available only to qualified researchers.
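
To put those parameter counts in perspective, here's a rough back-of-the-envelope calculation (an illustrative sketch, not official figures) of how much memory the raw weights occupy if each parameter is stored as a 16-bit float:

    # Rough weight sizes for the LLaMA model family, assuming each parameter
    # is a 16-bit (2-byte) float. Illustrative estimates, not Meta's figures.
    PARAM_COUNTS = {"7B": 7e9, "13B": 13e9, "33B": 33e9, "65B": 65e9}

    for name, count in PARAM_COUNTS.items():
        gib = count * 2 / 1024**3  # bytes of fp16 weights, converted to GiB
        print(f"LLaMA {name}: roughly {gib:.0f} GiB of weights at 16-bit precision")

Even the smallest model works out to around 13 GiB of raw weights, which is why compression matters so much for running it on a laptop.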

Flying at the speed of LLaMA

Meta's restrictions on LLaMA didn't last long, because on March 2, someone leaked the LLaMA weights on BitTorrent. Since then, there has been an explosion of development surrounding LLaMA. Independent AI researcher Simon Willison has compared this situation to the release of Stable Diffusion, an open source image synthesis model that launched last August. Here's what he wrote in a post on his blog:

It feels to me like that Stable Diffusion moment back in August kick-started the entire new wave of interest in generative AI—which was then pushed into over-drive by the release of ChatGPT at the end of November.

That Stable Diffusion moment is happening again right now, for large language models—the technology behind ChatGPT itself. This morning I ran a GPT-3 class language model on my own personal laptop for the first time!

AI stuff was weird already. It’s about to get a whole lot weirder.

Typically, running GPT-3 requires several datacenter-class A100 GPUs (also, the weights for GPT-3 are not public), but LLaMA made waves because it could run on a single beefy consumer GPU. And now, with optimizations that reduce the model size using a technique called quantization, LLaMA can run on an M1 Mac or a lesser Nvidia consumer GPU (although "llama.cpp" only runs on CPU at the moment—which is impressive and surprising in its own way).
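
For a sense of what quantization actually does, here's a toy sketch of squeezing a block of floating point weights down to 4-bit integers plus one shared scale factor, then reconstructing approximate values from them. This is illustrative only; llama.cpp's actual quantization formats are more elaborate:

    import numpy as np

    def quantize_block(weights):
        # Map the largest magnitude in the block to +/-7 so every weight
        # fits in a signed 4-bit integer; store one float scale per block.
        scale = np.abs(weights).max() / 7.0
        q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
        return scale, q

    def dequantize_block(scale, q):
        # Approximate reconstruction of the original weights.
        return q.astype(np.float32) * scale

    rng = np.random.default_rng(0)
    block = rng.normal(scale=0.02, size=32).astype(np.float32)
    scale, q = quantize_block(block)
    print("max reconstruction error:", np.abs(block - dequantize_block(scale, q)).max())

At roughly four bits per weight (plus a little overhead for the scales), the 7B model shrinks from around 13 GiB of 16-bit weights to about 4 GB on disk, which is what makes a MacBook Air or a Raspberry Pi a plausible host at all.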

Things are moving so quickly that it's sometimes difficult to keep up with the latest developments. (Regarding AI's rate of progress, a fellow AI reporter told Ars, "It's like those videos of dogs where you upend a crate of tennis balls on them. [They] don't know where to chase first and get lost in the confusion.")

For example, here's a list of notable LLaMA-related events based on a timeline Willison laid out in a Hacker News comment:

  • February 24, 2023: Meta AI announces LLaMA.
  • March 2, 2023: Someone leaks the LLaMA models via BitTorrent.
  • March 10, 2023: Georgi Gerganov creates llama.cpp, which can run on an M1 Mac.
  • March 11, 2023: Artem Andreenko runs LLaMA 7B (slowly) on a Raspberry Pi 4 with 4GB of RAM, at about 10 seconds per token.
  • March 12, 2023: LLaMA 7B running via NPX, a Node.js execution tool.
  • March 13, 2023: Someone gets llama.cpp running on a Pixel 6 phone, also very slowly.
  • March 13, 2023: Stanford releases Alpaca 7B, an instruction-tuned version of LLaMA 7B that "behaves similarly to OpenAI's text-davinci-003" but runs on much less powerful hardware.

After obtaining the LLaMA weights ourselves, we followed Willison's instructions and got the 7B parameter version running on an M1 MacBook Air, and it runs at a reasonable speed. You call it as a script on the command line with a prompt, and LLaMA does its best to complete it in a reasonable way.
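
As a concrete illustration of that command-line call, here's a small Python wrapper around the llama.cpp binary. The binary name, model path, prompt, and flag values are assumptions based on early llama.cpp builds and Willison's write-up, so check your own checkout for the exact options:

    import subprocess

    # Run the llama.cpp "main" binary with a prompt (paths and flags assumed;
    # adjust them to match your build and converted model files).
    result = subprocess.run(
        [
            "./main",
            "-m", "./models/7B/ggml-model-q4_0.bin",  # 4-bit quantized 7B weights
            "-p", "The first man on the moon was ",   # prompt for LLaMA to complete
            "-n", "128",                              # number of tokens to generate
            "-t", "4",                                # CPU threads to use
        ],
        capture_output=True,
        text=True,
    )
    print(result.stdout)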

A screenshot of LLaMA 7B in action on a MacBook Air running llama.cpp. (Credit: Benj Edwards / Ars Technica)

There's still the question of how much the quantization affects the quality of the output. In our tests, LLaMA 7B trimmed down to 4-bit quantization was very impressive for running on a MacBook Air—but still not on par with what you might expect from ChatGPT. It's entirely possible that better prompting techniques might generate better results.

Also, optimizations and fine-tunings come quickly when everyone has their hands on the code and the weights—even though LLaMA is still saddled with some fairly restrictive terms of use. The release of Alpaca today by Stanford proves that fine-tuning (additional training with a specific goal in mind) can improve performance, and it's still early days after LLaMA's release.

As of this writing, running LLaMA on a Mac remains a fairly technical exercise. You have to install Python and Xcode and be familiar with working on the command line. Willison has good step-by-step instructions for anyone who would like to attempt it. But that may soon change as developers continue to code away.

As for the implications of having this tech out in the wild—no one knows yet. While some worry about AI's impact as a tool for spam and misinformation, Willison says, "It’s not going to be un-invented, so I think our priority should be figuring out the most constructive possible ways to use it."

Right now, our only guarantee is that things will change rapidly.
