llama : use F32 precision in GLM4 attention and no FA (#9130) #14390

Job	Run time
Push Docker image to Docker Hub (light, .devops/llama-cli.Dockerfile, linux/amd64,linux/arm64)	14m 20s
Push Docker image to Docker Hub (server, .devops/llama-server.Dockerfile, linux/amd64,linux/arm64)	18m 3s
Push Docker image to Docker Hub (full, .devops/full.Dockerfile, linux/amd64,linux/arm64)	33m 44s
Push Docker image to Docker Hub (light-cuda, .devops/llama-cli-cuda.Dockerfile, linux/amd64)	1h 56m 46s
Push Docker image to Docker Hub (server-cuda, .devops/llama-server-cuda.Dockerfile, linux/amd64)	57m 19s
Push Docker image to Docker Hub (full-cuda, .devops/full-cuda.Dockerfile, linux/amd64)	2h 35m 22s
Push Docker image to Docker Hub (light-rocm, .devops/llama-cli-rocm.Dockerfile, linux/amd64,linux...	34m 23s
Push Docker image to Docker Hub (server-rocm, .devops/llama-server-rocm.Dockerfile, linux/amd64,l...	34m 42s
Push Docker image to Docker Hub (light-intel, .devops/llama-cli-intel.Dockerfile, linux/amd64)	13m 27s
Push Docker image to Docker Hub (server-intel, .devops/llama-server-intel.Dockerfile, linux/amd64)	14m 54s
Total	8h 13m 0s
