Swap layers tool #1

Ar57m · 2023-11-27T02:35:09Z

Suggestion: Could you make a tool that picks a range of layers from modelA.gguf and swaps/overwrites them directly at a specified position in modelB.gguf? Ideally it would work on f16, q8_0 and other quants if possible, and use mmap so that it doesn't need too much memory.

KerfuffleV2 · 2023-11-27T14:57:46Z

Maintainer

Do you mean generate a new model file, or actually modify the second one directly? Swapping tensors (or layers, which are sets of tensors) in the second case would only be possible if the sizes were exactly the same. So you could only swap tensors from an f16 source to an f16 destination, etc.
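
To make the size constraint concrete, here is a minimal sketch that computes how many bytes a tensor's data occupies for a few types. The function names are hypothetical and not part of any existing tool. The block layouts are the ones ggml uses for these types (q8_0 stores 32 values per block as a 2-byte f16 scale plus 32 int8 values, so 34 bytes per block).

```python
# Sketch only: byte size of a tensor's data for a few GGML types.
# Maps type name -> (elements per block, bytes per block).
BLOCK_LAYOUT = {
    "f32": (1, 4),
    "f16": (1, 2),
    "q8_0": (32, 34),  # 2-byte f16 scale + 32 int8 values
}


def tensor_nbytes(n_elements: int, ggml_type: str) -> int:
    """Return the number of bytes needed to store n_elements of ggml_type."""
    block_elements, block_bytes = BLOCK_LAYOUT[ggml_type]
    if n_elements % block_elements != 0:
        raise ValueError(
            f"{n_elements} elements is not a multiple of the "
            f"{ggml_type} block size ({block_elements})"
        )
    return n_elements // block_elements * block_bytes


def can_swap_in_place(n_elements: int, src_type: str, dst_type: str) -> bool:
    """Hypothetical check: an in-place overwrite needs identical byte sizes."""
    return tensor_nbytes(n_elements, src_type) == tensor_nbytes(n_elements, dst_type)


if __name__ == "__main__":
    n = 4096 * 4096
    print("f16 bytes: ", tensor_nbytes(n, "f16"))
    print("q8_0 bytes:", tensor_nbytes(n, "q8_0"))
    print("f16 -> f16 in place: ", can_swap_in_place(n, "f16", "f16"))
    print("f16 -> q8_0 in place:", can_swap_in_place(n, "f16", "q8_0"))
```

A 4096x4096 tensor takes a different number of bytes in f16 than in q8_0, so overwriting one with the other in place would shift every tensor stored after it. Writing a new output file avoids that problem.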

9 replies

KerfuffleV2 Dec 7, 2023
Maintainer

Did some planning for the possible definition format, not much actual code yet though.

# Dictionary of tensor/metadata sources
inputs:
    testy4:
        path: /blah/testy4.gguf
        # Guessed from the filename if not specified, one of gguf, torch, safetensors
        type: gguf

        # Convert to GGUF naming convention for architectures GGUF knows
        convert_names: true

        # Only needed for non-GGUF architectures or naming conventions
        layer_name_regex: '^blk\.([0-9]+)\..*'

# Metadata output is a list and is processed in order
# Later items can overwrite earlier ones.
metadata:
    # List of literal keys/values
    - type: literal
      items:
        - key: a.b.c

          # Possibly add something like an "auto" type that can guess what GGUF
          # would normally use.
          # Only one level arrays supported, syntax something like "uint64array"
          type: uint64

          value: 123

    # Copy metadata from input (only GGUF)
    - type: copy

      # Name from inputs section
      source: testy4

      # Define one of key or key_regex
      key_regex: '^llama\.attention\.'
      key: llama.attention.head_count

      # Defaults to false, reverse match logic
      not: false

    # Delete metadata items
    - type: delete
      # Define one of key or key_regex
      key_regex: '^llama\.attention\.'
      key: llama.attention.head_count

      # Defaults to false, reverse match logic
      not: false

# As with metadata, this is processed in order and later items can overwrite/modify earlier ones.
# This is used to build a plan for the output, so if you do something like
# copy all tensors then delete some, it won't actually copy them and then remove them.
tensors:
    - type: copy
      source: testy4

      # One of key or key_regex
      key: abc

      key_regex: 'abc'

      # Optional, convert tensors to fp16, fp32, or q8_0, probably only works for f16, f32 or bf16 inputs.
      convert: fp16
      # Maybe add support for special stuff like the permute convert.py does for some models

      not: false

    - type: rename
      source: testy4
      regex: '^some_weird_convention.blah.([0-9]+)(.*)'
      # Python pattern syntax, so \1 is group 1, etc.
      replace: 'blk.\1\2'

    - type: copy_layers
      source: testy4
      layers: [1, 2, "10-20"]

      # Result would be 1->10, 2->11, 10->12, [...] 20->22
      destination_layers_start: 10

      # optional
      key_regex: xyz

      not: false

    # Delete from output plan
    - type: delete
      # One of key or key_regex
      key: abc

      key_regex: 'abc'

      layers: [1, 2, "10-20"]

      not: false

That looks complicated because it shows every feature. If you just wanted to copy all metadata, the non-layer tensors and layers 0-10 from modela.gguf, then layers 11-20 from modelb.gguf, it would just be something like:

inputs:
  modela: { path: "/path/modela.gguf" }
  modelb: { path: "/path/modelb.gguf" }
metadata:
  - type: copy
    source: modela
tensors:
  - type: copy
    source: modela
    key_regex: '^blk\.'
    not: true # copying all tensors that don't match layer naming
  - type: copy_layers
    source: modela
    layers: [ "0-10" ]
    destination_layers_start: 0
  - type: copy_layers
    source: modelb
    layers: [ "11-20" ]
    destination_layers_start: 11

Not really any actual code yet. (Example is YAML since it's more readable but JSON as a definition format will also be supported... assuming this thing actually ever exists which isn't guaranteed.)
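
As a rough illustration of how the `layers` / `destination_layers_start` fields and the "processed in order" rule could behave, here is a minimal sketch. It is not the planned implementation: the function names (`expand_layers`, `map_layers`, `build_layer_plan`) are hypothetical, and it works only at the layer level, ignoring `key`, `key_regex` and `not`.

```python
# Sketch only: expand layer specs like [1, 2, "10-20"] and build an
# output plan where later operations overwrite earlier ones.

def expand_layers(spec):
    """Expand ints and inclusive "start-end" strings into a flat list."""
    layers = []
    for item in spec:
        if isinstance(item, int):
            layers.append(item)
        else:
            start, end = (int(part) for part in item.split("-"))
            layers.extend(range(start, end + 1))
    return layers


def map_layers(spec, destination_start):
    """Pair each source layer with consecutive destination layers."""
    return [
        (src, destination_start + offset)
        for offset, src in enumerate(expand_layers(spec))
    ]


def build_layer_plan(ops):
    """Return {output layer: (source name, source layer)}.

    No tensor data is touched here; the plan is only bookkeeping, so a
    copy followed by a delete never copies the deleted layers at all.
    """
    plan = {}
    for op in ops:
        if op["type"] == "copy_layers":
            pairs = map_layers(op["layers"], op["destination_layers_start"])
            for src, dst in pairs:
                plan[dst] = (op["source"], src)
        elif op["type"] == "delete":
            for layer in expand_layers(op["layers"]):
                plan.pop(layer, None)
        else:
            raise ValueError(f"unsupported operation type: {op['type']}")
    return plan


if __name__ == "__main__":
    # Same mapping as the comment in the full example: 1->10, 2->11, 10->12, ...
    print(map_layers([1, 2, "10-20"], 10))

    plan = build_layer_plan([
        {"type": "copy_layers", "source": "modela",
         "layers": ["0-10"], "destination_layers_start": 0},
        {"type": "copy_layers", "source": "modelb",
         "layers": ["11-20"], "destination_layers_start": 11},
    ])
    for dst in sorted(plan):
        print(dst, "<-", plan[dst])
```

Building the plan first and only reading tensor data once it is final is what would let the tool skip work for anything a later item deletes or overwrites.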

Ar57m Dec 7, 2023
Author

I'm not very skilled in programming, but it looks good to me. Here are some repositories for you to take a look at; you probably know them already:
https://github.com/Gryphe/MergeMonster
https://github.com/cg123/mergekit

KerfuffleV2 Dec 7, 2023
Maintainer

Ahh, are you saying you want to actually merge the data not just copy it around? Unfortunately doing that is beyond my capabilities. Actually merging models is different from just mixing and matching the existing data.

Ar57m Dec 7, 2023
Author

I'm sorry, I confused swapping with merging (because I use passthrough in mergekit to swap layers), my bad 😅. I mean swapping; forget what I said before.

KerfuffleV2 Jan 14, 2024
Maintainer

Sorry, unfortunately I doubt I am ever going to get to this. I haven't had any time or energy to work on personal projects lately, and in general my interest level for LLM stuff is very low. I sincerely apologize for getting your hopes up.

Maybe someone else can do something with the general design.
