Conversation
…odified with deltas from ggml/examples/mpt
quantize warns because it is looking for attn_k and not attn_qkv:
Now fixed as well.
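For context, MPT stores the attention projections as a single fused attn_qkv tensor rather than separate attn_q/attn_k/attn_v tensors, so name matching that only looks for the split layout never fires. A minimal sketch of that kind of check (hypothetical helper, not the actual quantize code, assuming the warning came from counting attention tensors by name):

```cpp
#include <string>
#include <vector>

// Hypothetical helper, not the actual quantize code: count attention layers
// by tensor name. Matching only the split layout ("attn_k.weight") misses
// MPT's fused "attn_qkv.weight" tensor and triggers the mismatch warning;
// accepting either spelling fixes the count.
static int count_attention_layers(const std::vector<std::string> & names) {
    int n = 0;
    for (const auto & name : names) {
        if (name.find("attn_k.weight")   != std::string::npos ||  // split Q/K/V layout
            name.find("attn_qkv.weight") != std::string::npos) {  // fused MPT layout
            ++n;
        }
    }
    return n;
}
```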
…rom metadata rather than use 0.0 to indicate "no clamping" (more compliant with the current GGUF spec?)
…T_KEY macro instead of duplicate code
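Presumably this means the clamp value is read through the generic key lookup (the commit above mentions the GGUF_GET_KEY macro) and a missing key, rather than a 0.0 sentinel, signals "no clamping". A minimal sketch against the public gguf API, assuming the GGUF-spec key name `{arch}.attention.clamp_kqv`:

```cpp
#include "ggml.h" // gguf_find_key / gguf_get_val_f32

// Minimal sketch, not the llama.cpp code: treat a *missing* key as
// "no clamping" instead of overloading 0.0f as a sentinel value.
// Key name per the GGUF spec: "{arch}.attention.clamp_kqv".
static bool read_clamp_kqv(const struct gguf_context * ctx, float & clamp_kqv) {
    const int key_id = gguf_find_key(ctx, "mpt.attention.clamp_kqv");
    if (key_id < 0) {
        return false; // key absent: clamping disabled
    }
    clamp_kqv = gguf_get_val_f32(ctx, key_id);
    return true;
}
```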
…nd rope_shift from build_mpt
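MPT encodes positions with ALiBi rather than RoPE, so the graph builder has no rotation to apply to Q/K and no rope_shift pass over the KV cache to run. A hedged fragment in the style of the graph-building code of that period (identifiers illustrative):

```cpp
#include <math.h>
#include "ggml.h"

// Hedged fragment, ggml API of that period (identifiers illustrative):
// MPT applies ggml_alibi() to the scaled attention scores instead of
// rotating Q/K with RoPE, which is why build_mpt needs neither rope
// nor rope_shift.
static struct ggml_tensor * mpt_attn_scores(
        struct ggml_context * ctx, struct ggml_tensor * K, struct ggml_tensor * Q,
        int n_past, int n_head, int n_embd_head, float max_alibi_bias) {
    struct ggml_tensor * KQ       = ggml_mul_mat(ctx, K, Q);
    struct ggml_tensor * KQ_scale = ggml_new_f32(ctx, 1.0f / sqrtf((float) n_embd_head));
    KQ = ggml_scale(ctx, KQ, KQ_scale);                       // 1/sqrt(d) scaling
    KQ = ggml_alibi(ctx, KQ, n_past, n_head, max_alibi_bias); // ALiBi position bias
    return ggml_soft_max(ctx, KQ);
}
```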
Note that this PR does not yet include the convert-script modifications proposed in #3252 and referred to in #3417 (comment). Since this PR is based on a pre-merge commit of #3252, it may be easier to add this change after the merge.
…nvert-gptneox-hf-to-gguf.py in pr:3252
@cebtenzzre Thanks for the merge. If anyone can give this a quick try and confirm that it works, we should merge.
Works for me. The PR is now almost the same as my own previous private merge attempt. The disable-n_past-assertion changes to ggml_compute_forward_alibi_f16 and ggml_compute_forward_alibi_f32 could be made syntactically more consistent, but AFAICS they are functionally equivalent, so this is not a showstopper for merging into master.
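For illustration only (the exact diff may differ): the point is that the two functions disable the same n_past check with different syntax, which is why they are functionally equivalent even though they read inconsistently.

```cpp
#include <assert.h>

// Hedged illustration, not the actual ggml.c diff: both variants disable
// the same check, just in syntactically different ways.
static void alibi_f32_check(int n_past) {
    //assert(n_past >= 0);  // variant A: assertion commented out
    (void) n_past;
}

static void alibi_f16_check(int n_past) {
    (void) n_past;          // variant B: assertion deleted outright
}
```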
…g hparams["vocab_size"]
…example

* 'master' of github.com:ggerganov/llama.cpp: (34 commits)
  examples: support LLaVA v1.5 (multimodal model) (ggml-org#3436)
  docs : fix typo GOMP_CPU_AFFINITY (ggml-org#3597)
  cmake : fix add_compile_options on macOS
  typo : it is `--n-gpu-layers` not `--gpu-layers` (ggml-org#3592)
  ci : check if there is enough VRAM (ggml-org#3596)
  server : add completion mode (no chat) (ggml-org#3582)
  prompts : add mnemonics.txt
  server : fix kv cache management (ggml-org#3588)
  main : fix session loading bug (ggml-org#3400)
  server : add parameter -tb N, --threads-batch N (ggml-org#3584)
  common : fix mirostat state when using multiple sequences (ggml-org#3543)
  batched : add bench tool (ggml-org#3545)
  examples : add batched.swift + improve CI for swift (ggml-org#3562)
  Add MPT model to supported models in README.md (ggml-org#3574)
  Minor improvements in GPT2 tokenizer (ggml-org#3567)
  readme : add bloom (ggml-org#3570)
  llm : add bloom models (ggml-org#3553)
  swift : improvements and fixes (ggml-org#3564)
  llm : add MPT support (ggml-org#3417)
  infill. : fix tokenization (ggml-org#3508)
  ...
Co-authored-by: Cebtenzzre <cebtenzzre@gmail.com>
Co-authored-by: Georgi Gerganov <ggerganov@gmail.com>
(cherry picked from commit f5f9121)
I converted mpt-7b-chat and mpt-7b-storywriter. The conversion and quantization complete successfully and produce the .gguf files; however, the files don't work for me. When running main with them, I get an error. For reference, here is the full output: … I have already successfully converted a bunch of falcon models that work fine, but the MPT conversion script does not work for me.
Here is a hexdump of the beginning of the files: … In comparison, here is the OpenBuddy falcon conversion that works fine: … What I notice is that after … In contrast, the actual falcon model has a …
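Since the comparison hinges on what appears at the start of each file, it may be easier to decode the fixed GGUF header fields than to squint at raw hexdumps. Per the GGUF spec, a file begins with the 4-byte magic "GGUF" followed (in v2) by a little-endian u32 version, u64 tensor count, and u64 KV count (v1 used 32-bit counts). A small standalone reader, illustrative only:

```cpp
#include <cstdint>
#include <cstdio>

// Illustrative helper (not part of llama.cpp): print the fixed GGUF header
// fields so two files can be compared directly. Assumes the GGUF v2 layout:
// magic "GGUF", u32 version, u64 n_tensors, u64 n_kv, all little-endian.
int main(int argc, char ** argv) {
    if (argc != 2) {
        fprintf(stderr, "usage: %s model.gguf\n", argv[0]);
        return 1;
    }
    FILE * f = fopen(argv[1], "rb");
    if (!f) {
        perror("fopen");
        return 1;
    }
    char     magic[4];
    uint32_t version;
    uint64_t n_tensors, n_kv;
    if (fread(magic, 1, 4, f) != 4 ||
        fread(&version,   sizeof(version),   1, f) != 1 ||
        fread(&n_tensors, sizeof(n_tensors), 1, f) != 1 ||
        fread(&n_kv,      sizeof(n_kv),      1, f) != 1) {
        fprintf(stderr, "file too short\n");
        fclose(f);
        return 1;
    }
    printf("magic: %.4s  version: %u  n_tensors: %llu  n_kv: %llu\n",
           magic, version,
           (unsigned long long) n_tensors, (unsigned long long) n_kv);
    fclose(f);
    return 0;
}
```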
As per #1333 (comment)
Some comments regarding this initial implementation: