A Hands-On Guide to Running the ChatGLM3-6B Large Model on an Ordinary Intel Laptop (Part 4)

xnh888 2025-01-11 18:11:00 Technical Tutorials

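Background:

The first article in this series explained that running ChatGLM3-6B natively on a local machine requires a discrete GPU. In practical testing, the minimum configuration is as follows: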

INT4 version of ChatGLM3-6B, minimum requirements: RAM >= 8 GB; NVIDIA VRAM >= 5 GB (e.g. GTX 1060 6GB, RTX 2060 6GB)
FP16 version of ChatGLM3-6B, minimum requirements: RAM >= 16 GB; NVIDIA VRAM >= 13 GB (e.g. RTX 4080 16GB)
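
The gap between the 16-bit and 4-bit requirements follows largely from the size of the weights themselves. The sketch below estimates weight storage from a parameter count and a bit width. The figure of about 6.2 billion parameters is an assumption used for illustration, and real usage is higher because of activations, the KV cache, and runtime overhead.

```python
# Rough estimate of the memory needed just to hold model weights.
# ASSUMPTION: ~6.2e9 parameters for ChatGLM3-6B (illustrative, not from the article).
ASSUMED_NUM_PARAMS = 6.2e9


def weight_memory_gib(num_params: float, bits_per_weight: int) -> float:
    """Return weight storage in GiB for a given parameter count and bit width."""
    total_bytes = num_params * bits_per_weight / 8
    return total_bytes / 1024**3


fp16_gib = weight_memory_gib(ASSUMED_NUM_PARAMS, 16)
int4_gib = weight_memory_gib(ASSUMED_NUM_PARAMS, 4)
print(f"16-bit weights: {fp16_gib:.1f} GiB")
print(f"4-bit weights:  {int4_gib:.1f} GiB")
```

Going from 16 bits to 4 bits per weight cuts weight storage by a factor of four, which is what brings the model within reach of machines with far less memory.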

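However, not everyone has a machine with a discrete NVIDIA GPU, and an RTX 4080 16GB card alone costs more than 8,000 yuan, which is more than most people can afford.

So is there another way to run the ChatGLM3-6B large model locally on an ordinary laptop? This article walks through doing exactly that, step by step.

First, an introduction to BigDL-LLM, an open-source framework released by Intel. It is designed specifically around low-bit quantization on Intel hardware, which allows large models to run normally on an ordinary laptop.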

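Introduction to the BigDL-LLM framework

BigDL-LLM is an open-source acceleration library for large models on Intel® XPU platforms (such as CPUs and GPUs). It uses low-bit optimizations (such as FP4/INT4/NF4/FP8/INT8) together with the hardware acceleration features built into Intel® CPUs and GPUs to run and fine-tune large language models with very low latency.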
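
To make the idea of low-bit optimization concrete, here is a minimal sketch of symmetric 4-bit quantization in plain Python. It illustrates the general technique only and is not BigDL-LLM's actual implementation. The function names and sample weights are hypothetical.

```python
def quantize_sym_int4(values):
    """Map floats to integers in [-8, 7] using one shared scale.

    Symmetric means zero maps to zero and the scale is derived
    from the largest absolute value in the group.
    """
    max_abs = max(abs(v) for v in values)
    scale = max_abs / 7 if max_abs > 0 else 1.0
    quantized = [max(-8, min(7, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate floats from the 4-bit integers."""
    return [q * scale for q in quantized]


# Hypothetical sample weights, for illustration only.
weights = [0.12, -0.50, 0.33, 0.07, -0.21, 0.49, -0.02, 0.00]
q, scale = quantize_sym_int4(weights)
restored = dequantize(q, scale)
max_error = max(abs(a - b) for a, b in zip(weights, restored))
print(q)
print(f"scale={scale:.4f}, max_error={max_error:.4f}")
```

Each weight is stored as a 4-bit integer plus a shared scale, so the rounding error per weight is bounded by half the scale. Production libraries typically quantize in small blocks so that each block gets its own scale.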

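BigDL-LLM supports standard PyTorch APIs (such as HuggingFace Transformers and LangChain) and large-model tools (such as HuggingFace PEFT, DeepSpeed, and vLLM), helping AI developers and researchers efficiently develop and accelerate large language model algorithms and applications on Intel platforms (laptops, workstations, servers, and GPUs).

BigDL-LLM is simple to use: changing a single line of code is enough to see a noticeable speedup. A large number of models (such as LLaMA/LLaMA2, ChatGLM2/ChatGLM3, Mistral, Falcon, MPT, LLaVA, StarCoder, Whisper, Baichuan/Baichuan2, Qwen/Qwen-VL, InternLM, Aquila, and MOSS) have been verified and optimized on BigDL-LLM.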

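Quantizing and deploying ChatGLM3-6B with BigDL-LLM

Step 1: Install the Python environment

For installing Miniconda and configuring a mirror in China, see Part 1 of this series: A Hands-On Guide to Running the ChatGLM3-6B Large Model Locally (Part 1)

Create a virtual environment named py3.9 with the following commands: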

conda create -n py3.9 python=3.9
conda activate py3.9

Step 2: Install the BigDL-LLM framework

pip install --pre --upgrade bigdl-llm[all] -i https://mirrors.aliyun.com/pypi/simple/
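
After installation, a quick check can confirm that the active interpreter sees the packages the later scripts import. This is a minimal sketch using only the standard library. The list of package names is an assumption based on the imports used in this article, so adjust it as needed.

```python
import importlib.util
import sys


def check_environment(packages=("bigdl", "transformers", "torch")):
    """Report the Python version and whether each top-level package can be found."""
    report = {"python": f"{sys.version_info.major}.{sys.version_info.minor}"}
    for name in packages:
        # find_spec returns None when a top-level package is not installed
        report[name] = importlib.util.find_spec(name) is not None
    return report


report = check_environment()
for key, value in report.items():
    print(f"{key}: {value}")
```

If any entry prints False, the package is not visible in the active environment. A common cause is running pip outside the py3.9 environment.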

Step 3: Download the ChatGLM3-6B model to a local directory:

git clone https://www.modelscope.cn/ZhipuAI/chatglm3-6b.git
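
Model repositories like this one typically store their weight files through Git LFS. If LFS was not set up before cloning, the weights may arrive as tiny text pointer files, and loading fails later. The sketch below flags weight-like files that look too small to be real. The file patterns, the size threshold, and the demo file name are assumptions for illustration.

```python
import tempfile
from pathlib import Path


def find_suspect_weight_files(model_dir, patterns=("*.safetensors", "*.bin"),
                              min_bytes=1_000_000):
    """Return names of weight-like files smaller than min_bytes (possible LFS pointers)."""
    root = Path(model_dir)
    suspects = []
    for pattern in patterns:
        for path in sorted(root.glob(pattern)):
            if path.stat().st_size < min_bytes:
                suspects.append(path.name)
    return suspects


# Demo on a throwaway directory containing a fake, tiny "weight" file.
with tempfile.TemporaryDirectory() as tmp:
    pointer_text = "version https://git-lfs.github.com/spec/v1\n"
    (Path(tmp) / "fake-shard.bin").write_text(pointer_text)
    demo_suspects = find_suspect_weight_files(tmp)
print(demo_suspects)
```

To check a real download, call find_suspect_weight_files with the directory you cloned into. An empty list means no suspiciously small weight files were found.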

Step 4: Run locally and start experimenting

Command-line example

The sample code is as follows:

import time
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

CHATGLM_V3_PROMPT_FORMAT = "<|user|>\n{prompt}\n<|assistant|>"

# Specify the local path of chatglm3-6b
model_path = "D:/Dev/AGI/chatglm/chatglm3-6b"  # replace with the directory where you downloaded the ChatGLM3-6B model

# Load the ChatGLM3-6B model and quantize it to INT4
model = AutoModel.from_pretrained(model_path,
                                  load_in_4bit=True,
                                  trust_remote_code=True)
# Load the tokenizer
tokenizer = AutoTokenizer.from_pretrained(model_path,
                                          trust_remote_code=True)
# Build the prompt in ChatGLM3 format
prompt = CHATGLM_V3_PROMPT_FORMAT.format(prompt="您是谁？?")
# Encode the prompt
input_ids = tokenizer.encode(prompt, return_tensors="pt")
st = time.time()
# Run inference to generate tokens
output = model.generate(input_ids, max_new_tokens=32)
end = time.time()
# Decode the generated tokens and display them
output_str = tokenizer.decode(output[0], skip_special_tokens=True)
print(f'Inference time: {end-st} s')
print('-'*20, 'Prompt', '-'*20)
print(prompt)
print('-'*20, 'Output', '-'*20)
print(output_str)

Save the file as chatglm3_infer.py and run it from the command line:
python chatglm3_infer.py

The output is as follows:
Loading checkpoint shards: 100%|█████████████████| 7/7 [00:00<00:00, 16.92it/s] 
2024-03-05 23:39:07,027 - INFO - Converting the current model to sym_int4 format......
Inference time: 7.945275545120239 s
-------------------- Prompt --------------------
<|user|>
您是谁？?
<|assistant|>
-------------------- Output --------------------
[gMASK]sop <|user|>
您是谁？?
<|assistant|> 我是一个人工智能助手，很高兴为您服务。请问有什么问题我可以帮您解答吗？
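
The output section echoes the prompt because generate() returns the prompt tokens followed by the newly generated ones, and the script decodes the whole sequence. The reported time also covers the entire call. A minimal sketch of separating the new tokens and turning the timing into a throughput figure follows. It uses plain lists in place of tensors, and the helper names are hypothetical.

```python
def split_generated(prompt_ids, output_ids):
    """Return only the newly generated token ids (output = prompt + continuation)."""
    return output_ids[len(prompt_ids):]


def tokens_per_second(num_new_tokens, seconds):
    """Average generation throughput over the whole generate() call."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return num_new_tokens / seconds


# Dummy token ids standing in for real tokenizer output.
prompt_ids = [101, 102, 103]
output_ids = prompt_ids + [7, 8, 9, 10]

new_ids = split_generated(prompt_ids, output_ids)
rate = tokens_per_second(len(new_ids), 2.0)
print(new_ids, rate)
```

In the script above, the equivalent slice would be output[0][input_ids.shape[1]:], which can then be passed to tokenizer.decode to print the reply without the prompt. With max_new_tokens=32 and the roughly 7.9 s measured in this run, throughput is at most about 4 tokens per second, and that time includes processing the prompt.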

Streamlit web example

The sample code is as follows:

import streamlit as st
from bigdl.llm.transformers import AutoModel
from transformers import AutoTokenizer

# Set the page title, icon, and layout
st.set_page_config(
    page_title="ChatGLM3-6B + BigDL-LLM demo",
    page_icon=":robot:",
    layout="wide"
)
# Specify the local path of chatglm3-6b
model_path = "D:/Dev/AGI/chatglm/chatglm3-6b"  # replace with the directory where you downloaded the ChatGLM3-6B model

@st.cache_resource
def get_model():
    # Load the ChatGLM3-6B model and quantize it to INT4
    model = AutoModel.from_pretrained(model_path,
                                      load_in_4bit=True,
                                      trust_remote_code=True)
    # Load the tokenizer
    tokenizer = AutoTokenizer.from_pretrained(model_path,
                                              trust_remote_code=True)
    return tokenizer, model

# Load the ChatGLM3 model and tokenizer
tokenizer, model = get_model()

# Initialize the chat history and past key values
if "history" not in st.session_state:
    st.session_state.history = []
if "past_key_values" not in st.session_state:
    st.session_state.past_key_values = None

# Set max_length, top_p, and temperature
max_length = st.sidebar.slider("max_length", 0, 32768, 8192, step=1)
top_p = st.sidebar.slider("top_p", 0.0, 1.0, 0.8, step=0.01)
temperature = st.sidebar.slider("temperature", 0.0, 1.0, 0.6, step=0.01)

# Clear the chat history
buttonClean = st.sidebar.button("Clear chat history", key="clean")
if buttonClean:
    st.session_state.history = []
    st.session_state.past_key_values = None
    st.rerun()

# Render the chat history
for i, message in enumerate(st.session_state.history):
    if message["role"] == "user":
        with st.chat_message(name="user", avatar="user"):
            st.markdown(message["content"])
    else:
        with st.chat_message(name="assistant", avatar="assistant"):
            st.markdown(message["content"])

# Input and output placeholders
with st.chat_message(name="user", avatar="user"):
    input_placeholder = st.empty()
with st.chat_message(name="assistant", avatar="assistant"):
    message_placeholder = st.empty()

# Get the user's input
prompt_text = st.chat_input("Enter your question")

# If the user entered something, generate a reply
if prompt_text:

    input_placeholder.markdown(prompt_text)
    history = st.session_state.history
    past_key_values = st.session_state.past_key_values
    for response, history, past_key_values in model.stream_chat(
        tokenizer,
        prompt_text,
        history,
        past_key_values=past_key_values,
        max_length=max_length,
        top_p=top_p,
        temperature=temperature,
        return_past_key_values=True,
    ):
        message_placeholder.markdown(response)

    # Update the history and past key values
    st.session_state.history = history
    st.session_state.past_key_values = past_key_values
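
Streamlit reruns the whole script on every user interaction, which is why get_model is wrapped in @st.cache_resource: the model is loaded and quantized once and then reused across reruns. The idea can be sketched without Streamlit by using functools.lru_cache as a stand-in. Here load_model is a hypothetical placeholder, not the real loader.

```python
from functools import lru_cache

load_count = 0  # counts how many times the expensive load actually runs


@lru_cache(maxsize=None)
def load_model(model_path: str):
    """Hypothetical stand-in for an expensive model load."""
    global load_count
    load_count += 1
    return {"path": model_path}


# Simulate three script reruns that all ask for the same model.
models = [load_model("path/to/model") for _ in range(3)]
print(f"loads performed: {load_count}")
```

All three calls return the same object and the load body runs only once. Without the cache, every chat message would trigger a full model reload.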

First install Streamlit with the following command:

pip install gradio mdtex2html streamlit -i https://mirrors.aliyun.com/pypi/simple/

After installation, save the Streamlit sample code as chatglm3_web_demo.py and run it:

streamlit run chatglm3_web_demo.py

The following messages are displayed:

Welcome to Streamlit!

      If you’d like to receive helpful onboarding emails, news, offers, promotions,
      and the occasional swag, please enter your email address below. Otherwise,
      leave this field blank.

      Email: xxxxx@qq.com   # enter your own email address when prompted
2024-03-05 23:47:22.365 Error saving email: ('Connection aborted.', ConnectionResetError(10054, '远程主机强迫关闭了一个现有的连接。', None, 10054, None))       

  You can find our privacy policy at https://streamlit.io/privacy-policy        

  Summary:
  - This open source library collects usage statistics.
  - We cannot see and do not store information contained inside Streamlit apps, 
    such as text, charts, images, etc.
  - Telemetry data is stored in servers in the United States.
  - If you'd like to opt out, add the following to %userprofile%/.streamlit/config.toml,
    creating that file if necessary:

    [browser]
    gatherUsageStats = false


  You can now view your Streamlit app in your browser.

  Local URL: http://localhost:8501
  Network URL: http://192.168.3.232:8501

Loading checkpoint shards: 100%|█████████████████| 7/7 [00:00<00:00, 11.67it/s]
2024-03-05 23:47:28,854 - INFO - Converting the current model to sym_int4 format......

Once startup completes, the Streamlit page opens automatically in your browser.

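Summary

BigDL-LLM helps AI developers and researchers accelerate and optimize large language models on Intel platforms. It improves the experience of using large language models on Intel hardware and greatly lowers the hardware barrier to running them.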
