vulkan: small mul_mat_vec optimizations (#10665)
* double the number of rows per workgroup
* Update ggml-vulkan.cpp
* Vulkan: Add VK_EXT_subgroup_size_control support to ensure full subgroups for coopmats
* only increase the number of rows for AMD and subgroup size 64
* fix missing NUM_ROWS for mul_mat_vec_iq4_nl_f16_f32, untested
* use subgroup min and max sizes to check for GCN (requires https://github.com/ggerganov/llama.cpp/pull/10721)
* manual merge ggml-vulkan.cpp
* set min and max subgroup size in any case
* Also double the number of rows for Intel GPUs
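
For illustration, the row-count heuristic described by the bullets above could be sketched roughly as follows. This is a minimal sketch under stated assumptions, not the code in ggml-vulkan.cpp: `DeviceInfo`, `looks_like_gcn`, `rows_per_workgroup`, and the `base_rows` parameter are hypothetical names, and treating "min and max subgroup size both equal 64" as the GCN test is an assumption about how the check might be written. The vendor IDs are the standard PCI IDs reported in `VkPhysicalDeviceProperties::vendorID`.

```cpp
#include <cassert>
#include <cstdint>

// PCI vendor IDs as reported in VkPhysicalDeviceProperties::vendorID.
constexpr uint32_t VENDOR_ID_AMD   = 0x1002;
constexpr uint32_t VENDOR_ID_INTEL = 0x8086;

// Hypothetical subset of the per-device state (not the real backend struct).
struct DeviceInfo {
    uint32_t vendor_id;
    uint32_t subgroup_min_size;  // e.g. from VK_EXT_subgroup_size_control
    uint32_t subgroup_max_size;
};

// Assumption: an AMD device whose only supported subgroup size is 64
// is treated as GCN.
inline bool looks_like_gcn(const DeviceInfo & d) {
    return d.vendor_id == VENDOR_ID_AMD &&
           d.subgroup_min_size == 64 &&
           d.subgroup_max_size == 64;
}

// Double the rows handled per workgroup on GCN-like AMD devices and on
// Intel GPUs; leave every other device at the base value.
inline uint32_t rows_per_workgroup(const DeviceInfo & d, uint32_t base_rows) {
    assert(base_rows > 0);
    if (looks_like_gcn(d) || d.vendor_id == VENDOR_ID_INTEL) {
        return base_rows * 2;
    }
    return base_rows;
}
```

With this sketch, an AMD device reporting a subgroup size range of 32 to 64 keeps the base row count, while a device reporting exactly 64 for both bounds gets twice as many rows.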