Meta Llama 3 Local Deployment


Environment Setup

Project files (the Llama 3 repository)
After the download finishes, open a command terminal in the repo's root directory (cmd on Windows, a regular terminal on Linux; if you use conda, activate your environment first) and run:

pip install -e . 

Do not close this console, because you still need to download the model; keeping it open saves time.
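
If you want a quick sanity check that the editable install succeeded, the snippet below simply imports the package the repo provides (the same llama package that the chat script later in this post relies on); the file name check_install.py is just illustrative.

# check_install.py — illustrative sanity check, not part of the official repo.
# If `pip install -e .` succeeded, this import should work without errors.
from llama import Dialog, Llama

print("llama package imported successfully")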

Model request link

Copy the link as shown in the screenshot.
Then, back in the console from before, run:

bash download.sh 

When the script asks for verification, simply paste the link you just copied.
If it fails with an error that wget is missing, download wget for Windows
and place wget.exe under C:\Windows\System32.
Once the download finishes, you can test the model with the bundled chat-completion example:

torchrun --nproc_per_node 1 example_chat_completion.py \
    --ckpt_dir Meta-Llama-3-8B-Instruct/ \
    --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model \
    --max_seq_len 512 --max_batch_size 6

Finishing Up

Create a chat.py script:

# Copyright (c) Meta Platforms, Inc. and affiliates.
# This software may be used and distributed in accordance with the terms of the Llama 3 Community License Agreement.

from typing import List, Optional

import fire

from llama import Dialog, Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    """
    Examples to run with the models finetuned for chat. Prompts correspond of chat
    turns between the user and assistant with the final one always being the user.

    An optional system prompt at the beginning to control how the model should respond
    is also supported.

    The context window of llama3 models is 8192 tokens, so `max_seq_len` needs to be <= 8192.

    `max_gen_len` is optional because finetuned models are able to stop generations naturally.
    """
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    # Modify the dialogs list to only include user inputs
    dialogs: List[Dialog] = [
        [{"role": "user", "content": ""}],  # Initialize with an empty user input
    ]

    # Start the conversation loop
    while True:
        # Get user input
        user_input = input("You: ")

        # Exit loop if user inputs 'exit'
        if user_input.lower() == 'exit':
            break

        # Append user input to the dialogs list
        dialogs[0][0]["content"] = user_input

        # Use the generator to get model response
        result = generator.chat_completion(
            dialogs,
            max_gen_len=max_gen_len,
            temperature=temperature,
            top_p=top_p,
        )[0]

        # Print model response
        print(f"Model: {result['generation']['content']}")


if __name__ == "__main__":
    fire.Fire(main)

Then run:

torchrun --nproc_per_node 1 chat.py --ckpt_dir Meta-Llama-3-8B-Instruct/ --tokenizer_path Meta-Llama-3-8B-Instruct/tokenizer.model --max_seq_len 512 --max_batch_size 6 
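
Note that chat.py above overwrites its single user message on every turn, so each reply is generated with no memory of earlier turns. If you want the model to see the conversation history, a minimal sketch of a history-keeping variant could look like the following; it reuses the same Llama.build / chat_completion API as the script above, and the file name history_chat.py is just illustrative.

# history_chat.py — illustrative sketch of a history-keeping chat loop.
# Reuses the same llama.Llama / Dialog API as chat.py above; not part of the official repo.
from typing import Optional

import fire

from llama import Dialog, Llama


def main(
    ckpt_dir: str,
    tokenizer_path: str,
    temperature: float = 0.6,
    top_p: float = 0.9,
    max_seq_len: int = 512,
    max_batch_size: int = 4,
    max_gen_len: Optional[int] = None,
):
    generator = Llama.build(
        ckpt_dir=ckpt_dir,
        tokenizer_path=tokenizer_path,
        max_seq_len=max_seq_len,
        max_batch_size=max_batch_size,
    )

    # Keep the whole conversation in a single Dialog so the model sees earlier turns.
    dialog: Dialog = []
    while True:
        user_input = input("You: ")
        if user_input.lower() == "exit":
            break

        dialog.append({"role": "user", "content": user_input})
        result = generator.chat_completion(
            [dialog],
            max_gen_len=max_gen_len,
            temperature=temperature,
            top_p=top_p,
        )[0]

        reply = result["generation"]["content"]
        print(f"Model: {reply}")

        # Feed the assistant reply back into the dialog for the next turn.
        dialog.append({"role": "assistant", "content": reply})


if __name__ == "__main__":
    fire.Fire(main)

Run it the same way as chat.py, just swapping the script name in the torchrun command above. Keep in mind that max_seq_len still bounds the full prompt, so a long conversation will eventually exceed the 512-token limit used here; raise max_seq_len (up to the 8192-token context window) or drop old turns when that happens.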

  • Author: 李琛
  • Original link: https://wapzz.net/post-15349.html
  • Copyright notice: unless otherwise stated, all articles on this blog are licensed under CC BY-NC-SA 4.0.