
common: llama_load_model_from_url using --model-url #6098

Merged Mar 17, 2024

Commits (53)
3221ab0
common: introduce llama_load_model_from_url to download model from hf…
phymbert Mar 16, 2024
a0ebdfc
common: llama_load_model_from_url switch to libcurl dependency
phymbert Mar 16, 2024
42b25da
common: PR feedback, rename the definition to LLAMA_USE_CURL
phymbert Mar 16, 2024
7e78285
common: LLAMA_USE_CURL in make toolchain
phymbert Mar 16, 2024
df0d822
ci: compile the server with curl, add make option curl example in def…
phymbert Mar 16, 2024
80bec98
llama_load_model_from_url: try to make the windows build passing
phymbert Mar 16, 2024
2c3a00e
Update Makefile
phymbert Mar 16, 2024
4135d4a
llama_load_model_from_url: typo
phymbert Mar 16, 2024
5d99f32
llama_load_model_from_url: download the file only if modified based o…
phymbert Mar 16, 2024
921e4af
ci: build, fix the default build to use LLAMA_CURL
phymbert Mar 16, 2024
6633689
llama_load_model_from_url: cleanup code
phymbert Mar 16, 2024
1430e89
Merge branch 'master' into hp/download-model-from-hf
phymbert Mar 16, 2024
e84206d
Update examples/server/README.md
phymbert Mar 16, 2024
4bc47b7
Update common/common.cpp
phymbert Mar 16, 2024
8751bd0
Update common/common.cpp
phymbert Mar 16, 2024
f53bfd5
Update common/common.cpp
phymbert Mar 16, 2024
b088122
Update common/common.cpp
phymbert Mar 16, 2024
f22456d
Update common/common.cpp
phymbert Mar 16, 2024
9565ae3
Update common/common.cpp
phymbert Mar 16, 2024
330e28d
Update common/common.cpp
phymbert Mar 16, 2024
89ab37a
Update common/common.cpp
phymbert Mar 16, 2024
be561a7
Update common/common.cpp
phymbert Mar 16, 2024
eb9e52a
Update common/common.cpp
phymbert Mar 16, 2024
b0b49e0
Update examples/main/README.md
phymbert Mar 16, 2024
545fef6
llama_load_model_from_url: fix compilation warning, clearer logging
phymbert Mar 16, 2024
4fadb07
server: tests: add `--model-url` tests
phymbert Mar 16, 2024
124c474
llama_load_model_from_url: coherent clearer logging
phymbert Mar 16, 2024
064dc07
common: CMakeLists.txt fix typo in logging when lib curl is not found
phymbert Mar 16, 2024
838178a
ci: tests: windows tests add libcurl
phymbert Mar 16, 2024
176f039
ci: tests: windows tests add libcurl
phymbert Mar 16, 2024
5df5605
ci: build: add libcurl in default make toolchain step
phymbert Mar 16, 2024
78812c6
llama_load_model_from_url: PR feedback, use snprintf instead of strnc…
phymbert Mar 16, 2024
1ad5a45
ci: build: add libcurl in default make toolchain step for tests
phymbert Mar 16, 2024
22b3bb3
common: fix windows build caused by double windows.h import
phymbert Mar 16, 2024
e6848ab
build: move the make build with env LLAMA_CURL to a dedicated place
phymbert Mar 16, 2024
d81acb6
build: introduce cmake option LLAMA_CURL to trigger libcurl linking t…
phymbert Mar 16, 2024
dbd9691
build: move the make build with env LLAMA_CURL to a dedicated place
phymbert Mar 16, 2024
9da4eec
llama_load_model_from_url: minor spacing and log message changes
phymbert Mar 16, 2024
89d3483
ci: build: fix ubuntu-focal-make-curl
phymbert Mar 16, 2024
13d8817
ci: build: try to fix the windows build
phymbert Mar 16, 2024
1ddaf71
common: remove old dependency to openssl
phymbert Mar 16, 2024
73b4b44
common: fix build
phymbert Mar 16, 2024
a3ed3d4
common: fix windows build
phymbert Mar 17, 2024
5e66ec8
common: fix windows tests
phymbert Mar 17, 2024
9ca4acc
common: fix windows tests
phymbert Mar 17, 2024
c1b002e
common: llama_load_model_from_url windows set CURLOPT_SSL_OPTIONS, CU…
phymbert Mar 17, 2024
cff7faa
ci: tests: print server logs in case of scenario failure
phymbert Mar 17, 2024
4fe431d
common: llama_load_model_from_url: make it working on windows: disabl…
phymbert Mar 17, 2024
47a9e5d
ci: tests: increase timeout for windows
phymbert Mar 17, 2024
31272c6
common: fix typo
phymbert Mar 17, 2024
f902ab6
common: llama_load_model_from_url use a temporary file for downloading
phymbert Mar 17, 2024
b24f30f
common: llama_load_model_from_url delete previous file before downloa…
phymbert Mar 17, 2024
fcf327f
ci: tests: fix behavior on windows
phymbert Mar 17, 2024
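Taken together, the commits add a -mu / --model-url option alongside the existing -m / --model flag: when a URL is given, the model is first downloaded to the local model path (skipped when the cached copy is still current) and then loaded from there. An illustrative invocation — the URL and file name below are placeholders, not values from this PR:

    # requires a build with LLAMA_CURL enabled (see the build changes below)
    ./server \
        --model-url https://example.com/models/ggml-model-q4_0.gguf \
        --model models/ggml-model-q4_0.gguf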
22 changes: 22 additions & 0 deletions .github/workflows/build.yml
@@ -48,6 +48,28 @@ jobs:
CC=gcc-8 make tests -j $(nproc)
make test -j $(nproc)

ubuntu-focal-make-curl:
runs-on: ubuntu-20.04

steps:
- name: Clone
id: checkout
uses: actions/checkout@v3

- name: Dependencies
id: depends
run: |
sudo apt-get update
sudo apt-get install build-essential gcc-8 libcurl4-openssl-dev

- name: Build
id: make_build
env:
LLAMA_FATAL_WARNINGS: 1
LLAMA_CURL: 1
run: |
CC=gcc-8 make -j $(nproc)

ubuntu-latest-cmake:
runs-on: ubuntu-latest

20 changes: 18 additions & 2 deletions .github/workflows/server.yml
@@ -57,7 +57,8 @@ jobs:
cmake \
python3-pip \
wget \
language-pack-en
language-pack-en \
libcurl4-openssl-dev
- name: Build
id: cmake_build
@@ -67,6 +68,7 @@
cmake .. \
-DLLAMA_NATIVE=OFF \
-DLLAMA_BUILD_SERVER=ON \
-DLLAMA_CURL=ON \
-DCMAKE_BUILD_TYPE=${{ matrix.build_type }} \
-DLLAMA_SANITIZE_${{ matrix.sanitizer }}=ON ;
cmake --build . --config ${{ matrix.build_type }} -j $(nproc) --target server
@@ -101,12 +103,21 @@
with:
fetch-depth: 0

- name: libCURL
id: get_libcurl
env:
CURL_VERSION: 8.6.0_6
run: |
curl.exe -o $env:RUNNER_TEMP/curl.zip -L "https://curl.se/windows/dl-${env:CURL_VERSION}/curl-${env:CURL_VERSION}-win64-mingw.zip"
mkdir $env:RUNNER_TEMP/libcurl
tar.exe -xvf $env:RUNNER_TEMP/curl.zip --strip-components=1 -C $env:RUNNER_TEMP/libcurl
- name: Build
id: cmake_build
run: |
mkdir build
cd build
cmake .. -DLLAMA_BUILD_SERVER=ON -DCMAKE_BUILD_TYPE=Release ;
cmake .. -DLLAMA_CURL=ON -DCURL_LIBRARY="$env:RUNNER_TEMP/libcurl/lib/libcurl.dll.a" -DCURL_INCLUDE_DIR="$env:RUNNER_TEMP/libcurl/include"
cmake --build . --config Release -j ${env:NUMBER_OF_PROCESSORS} --target server
- name: Python setup
@@ -120,6 +131,11 @@
run: |
pip install -r examples/server/tests/requirements.txt
- name: Copy Libcurl
id: prepare_libcurl
run: |
cp $env:RUNNER_TEMP/libcurl/bin/libcurl-x64.dll ./build/bin/Release/libcurl-x64.dll
- name: Tests
id: server_integration_tests
if: ${{ !matrix.disabled_on_pr || !github.event.pull_request }}
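Outside CI, the Windows wiring amounts to pointing CMake at wherever the curl development package was extracted and keeping the runtime DLL next to the binaries; a PowerShell sketch with placeholder paths:

    # placeholder path: wherever curl-<version>-win64-mingw.zip was extracted
    cmake .. -DLLAMA_CURL=ON `
        -DCURL_LIBRARY="C:/libcurl/lib/libcurl.dll.a" `
        -DCURL_INCLUDE_DIR="C:/libcurl/include"
    # at runtime the DLL must sit next to the executables, as the
    # "Copy Libcurl" step above does with libcurl-x64.dll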
1 change: 1 addition & 0 deletions CMakeLists.txt
@@ -99,6 +99,7 @@ option(LLAMA_CUDA_F16 "llama: use 16 bit floats for some
set(LLAMA_CUDA_KQUANTS_ITER "2" CACHE STRING "llama: iters./thread per block for Q2_K/Q6_K")
set(LLAMA_CUDA_PEER_MAX_BATCH_SIZE "128" CACHE STRING
"llama: max. batch size for using peer access")
option(LLAMA_CURL "llama: use libcurl to download model from a URL" OFF)
option(LLAMA_HIPBLAS "llama: use hipBLAS" OFF)
option(LLAMA_HIP_UMA "llama: use HIP unified memory architecture" OFF)
option(LLAMA_CLBLAST "llama: use CLBlast" OFF)
5 changes: 5 additions & 0 deletions Makefile
@@ -595,6 +595,11 @@ include scripts/get-flags.mk
CUDA_CXXFLAGS := $(BASE_CXXFLAGS) $(GF_CXXFLAGS) -Wno-pedantic
endif

ifdef LLAMA_CURL
override CXXFLAGS := $(CXXFLAGS) -DLLAMA_USE_CURL
override LDFLAGS := $(LDFLAGS) -lcurl
endif

#
# Print build information
#
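With this block in place, enabling the download support from the make toolchain is a single variable, assuming the libcurl development package is installed (the apt package name matches what the CI jobs above install):

    sudo apt-get install libcurl4-openssl-dev
    # adds -DLLAMA_USE_CURL to CXXFLAGS and -lcurl to LDFLAGS, per the ifdef above
    LLAMA_CURL=1 make -j $(nproc)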
13 changes: 12 additions & 1 deletion common/CMakeLists.txt
@@ -68,6 +68,17 @@ if (BUILD_SHARED_LIBS)
set_target_properties(${TARGET} PROPERTIES POSITION_INDEPENDENT_CODE ON)
endif()

set(LLAMA_COMMON_EXTRA_LIBS build_info)

# Use curl to download model url
if (LLAMA_CURL)
find_package(CURL REQUIRED)
add_definitions(-DLLAMA_USE_CURL)
include_directories(${CURL_INCLUDE_DIRS})
find_library(CURL_LIBRARY curl REQUIRED)
set(LLAMA_COMMON_EXTRA_LIBS ${LLAMA_COMMON_EXTRA_LIBS} ${CURL_LIBRARY})
endif ()

target_include_directories(${TARGET} PUBLIC .)
target_compile_features(${TARGET} PUBLIC cxx_std_11)
target_link_libraries(${TARGET} PRIVATE build_info PUBLIC llama)
target_link_libraries(${TARGET} PRIVATE ${LLAMA_COMMON_EXTRA_LIBS} PUBLIC llama)
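The CMake route is equivalent; a minimal configure-and-build sketch mirroring the Linux server CI job above:

    mkdir build && cd build
    # LLAMA_CURL=ON triggers find_package(CURL REQUIRED) and links libcurl
    # into the common library
    cmake .. -DLLAMA_CURL=ON
    cmake --build . --config Release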
238 changes: 237 additions & 1 deletion common/common.cpp
@@ -37,6 +37,9 @@
#include <sys/stat.h>
#include <unistd.h>
#endif
#if defined(LLAMA_USE_CURL)
#include <curl/curl.h>
#endif

#if defined(_MSC_VER)
#pragma warning(disable: 4244 4267) // possible loss of data
@@ -50,6 +53,18 @@
#define GGML_USE_CUBLAS_SYCL_VULKAN
#endif

#if defined(LLAMA_USE_CURL)
#ifdef __linux__
#include <linux/limits.h>
#elif defined(_WIN32)
#define PATH_MAX MAX_PATH
#else
#include <sys/syslimits.h>
#endif
#define LLAMA_CURL_MAX_PATH_LENGTH PATH_MAX
#define LLAMA_CURL_MAX_HEADER_LENGTH 256
#endif // LLAMA_USE_CURL

int32_t get_num_physical_cores() {
#ifdef __linux__
// enumerate the set of thread siblings, num entries is num cores
@@ -644,6 +659,13 @@ bool gpt_params_parse_ex(int argc, char ** argv, gpt_params & params) {
}
params.model = argv[i];
}
if (arg == "-mu" || arg == "--model-url") {
if (++i >= argc) {
invalid_param = true;
break;
}
params.model_url = argv[i];
}
if (arg == "-md" || arg == "--model-draft") {
arg_found = true;
if (++i >= argc) {
@@ -1368,6 +1390,8 @@ void gpt_print_usage(int /*argc*/, char ** argv, const gpt_params & params) {
printf(" layer range to apply the control vector(s) to, start and end inclusive\n");
printf(" -m FNAME, --model FNAME\n");
printf(" model path (default: %s)\n", params.model.c_str());
printf(" -mu MODEL_URL, --model-url MODEL_URL\n");
printf(" model download url (default: %s)\n", params.model_url.c_str());
printf(" -md FNAME, --model-draft FNAME\n");
printf(" draft model for speculative decoding\n");
printf(" -ld LOGDIR, --logdir LOGDIR\n");
@@ -1613,10 +1637,222 @@ void llama_batch_add(
batch.n_tokens++;
}

#ifdef LLAMA_USE_CURL

struct llama_model * llama_load_model_from_url(const char * model_url, const char * path_model,
struct llama_model_params params) {
// Basic validation of the model_url
if (!model_url || strlen(model_url) == 0) {
fprintf(stderr, "%s: invalid model_url\n", __func__);
return NULL;
}

// Initialize a libcurl easy handle
auto curl = curl_easy_init();

if (!curl) {
fprintf(stderr, "%s: error initializing libcurl\n", __func__);
return NULL;
}

// Set the URL and allow following HTTP redirects
curl_easy_setopt(curl, CURLOPT_URL, model_url);
curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1L);
#if defined(_WIN32)
// CURLSSLOPT_NATIVE_CA tells libcurl to use the standard certificate store of
// the operating system. Currently implemented under MS-Windows.
curl_easy_setopt(curl, CURLOPT_SSL_OPTIONS, CURLSSLOPT_NATIVE_CA);
#endif

// Check if the file already exists locally
struct stat model_file_info;
auto file_exists = (stat(path_model, &model_file_info) == 0);

// If the file exists, check for ${path_model}.etag or ${path_model}.lastModified files
char etag[LLAMA_CURL_MAX_HEADER_LENGTH] = {0};
char etag_path[LLAMA_CURL_MAX_PATH_LENGTH] = {0};
snprintf(etag_path, sizeof(etag_path), "%s.etag", path_model);

char last_modified[LLAMA_CURL_MAX_HEADER_LENGTH] = {0};
char last_modified_path[LLAMA_CURL_MAX_PATH_LENGTH] = {0};
snprintf(last_modified_path, sizeof(last_modified_path), "%s.lastModified", path_model);

if (file_exists) {
auto * f_etag = fopen(etag_path, "r");
if (f_etag) {
if (!fgets(etag, sizeof(etag), f_etag)) {
fprintf(stderr, "%s: unable to read file %s\n", __func__, etag_path);
} else {
fprintf(stderr, "%s: previous model file found %s: %s\n", __func__, etag_path, etag);
}
fclose(f_etag);
}

auto * f_last_modified = fopen(last_modified_path, "r");
if (f_last_modified) {
if (!fgets(last_modified, sizeof(last_modified), f_last_modified)) {
fprintf(stderr, "%s: unable to read file %s\n", __func__, last_modified_path);
} else {
fprintf(stderr, "%s: previous model file found %s: %s\n", __func__, last_modified_path,
last_modified);
}
fclose(f_last_modified);
}
}

// Send a HEAD request to retrieve the etag and last-modified headers
struct llama_load_model_from_url_headers {
char etag[LLAMA_CURL_MAX_HEADER_LENGTH] = {0};
char last_modified[LLAMA_CURL_MAX_HEADER_LENGTH] = {0};
};
llama_load_model_from_url_headers headers;
{
typedef size_t(*CURLOPT_HEADERFUNCTION_PTR)(char *, size_t, size_t, void *);
auto header_callback = [](char * buffer, size_t /*size*/, size_t n_items, void * userdata) -> size_t {
llama_load_model_from_url_headers *headers = (llama_load_model_from_url_headers *) userdata;

const char * etag_prefix = "etag: ";
if (strncmp(buffer, etag_prefix, strlen(etag_prefix)) == 0) {
strncpy(headers->etag, buffer + strlen(etag_prefix), n_items - strlen(etag_prefix) - 2); // Remove CRLF
}

const char * last_modified_prefix = "last-modified: ";
if (strncmp(buffer, last_modified_prefix, strlen(last_modified_prefix)) == 0) {
strncpy(headers->last_modified, buffer + strlen(last_modified_prefix),
n_items - strlen(last_modified_prefix) - 2); // Remove CRLF
}
return n_items;
};

curl_easy_setopt(curl, CURLOPT_NOBODY, 1L); // will trigger the HEAD verb
curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 1L); // hide head request progress
curl_easy_setopt(curl, CURLOPT_HEADERFUNCTION, static_cast<CURLOPT_HEADERFUNCTION_PTR>(header_callback));
curl_easy_setopt(curl, CURLOPT_HEADERDATA, &headers);

CURLcode res = curl_easy_perform(curl);
if (res != CURLE_OK) {
curl_easy_cleanup(curl);
fprintf(stderr, "%s: curl_easy_perform() failed: %s\n", __func__, curl_easy_strerror(res));
return NULL;
}

long http_code = 0;
curl_easy_getinfo(curl, CURLINFO_RESPONSE_CODE, &http_code);
if (http_code != 200) {
// HEAD not supported, we don't know if the file has changed
// force trigger downloading
file_exists = false;
fprintf(stderr, "%s: HEAD invalid http status code received: %ld\n", __func__, http_code);
}
}

// If the ETag or the Last-Modified headers are different: trigger a new download
if (!file_exists || strcmp(etag, headers.etag) != 0 || strcmp(last_modified, headers.last_modified) != 0) {
char path_model_temporary[LLAMA_CURL_MAX_PATH_LENGTH] = {0};
snprintf(path_model_temporary, sizeof(path_model_temporary), "%s.downloadInProgress", path_model);
if (file_exists) {
fprintf(stderr, "%s: deleting previous downloaded model file: %s\n", __func__, path_model);
if (remove(path_model) != 0) {
curl_easy_cleanup(curl);
fprintf(stderr, "%s: unable to delete file: %s\n", __func__, path_model);
return NULL;
}
}

// Set the output file
auto * outfile = fopen(path_model_temporary, "wb");
if (!outfile) {
curl_easy_cleanup(curl);
fprintf(stderr, "%s: error opening local file for writing: %s\n", __func__, path_model);
return NULL;
}

typedef size_t(*CURLOPT_WRITEFUNCTION_PTR)(void * data, size_t size, size_t nmemb, void * fd);
auto write_callback = [](void * data, size_t size, size_t nmemb, void * fd) -> size_t {
return fwrite(data, size, nmemb, (FILE *)fd);
};
curl_easy_setopt(curl, CURLOPT_NOBODY, 0L);
curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, static_cast<CURLOPT_WRITEFUNCTION_PTR>(write_callback));
curl_easy_setopt(curl, CURLOPT_WRITEDATA, outfile);

// display download progress
curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0L);

// start the download
fprintf(stderr, "%s: downloading model from %s to %s (server_etag:%s, server_last_modified:%s)...\n", __func__,
model_url, path_model, headers.etag, headers.last_modified);
auto res = curl_easy_perform(curl);
if (res != CURLE_OK) {
fclose(outfile);
curl_easy_cleanup(curl);
fprintf(stderr, "%s: curl_easy_perform() failed: %s\n", __func__, curl_easy_strerror(res));
return NULL;
}

long http_code = 0;
curl_easy_getinfo (curl, CURLINFO_RESPONSE_CODE, &http_code);
if (http_code < 200 || http_code >= 400) {
fclose(outfile);
curl_easy_cleanup(curl);
fprintf(stderr, "%s: invalid http status code received: %ld\n", __func__, http_code);
return NULL;
}

// Clean up
fclose(outfile);

// Write the new ETag to the .etag file
if (strlen(headers.etag) > 0) {
auto * etag_file = fopen(etag_path, "w");
if (etag_file) {
fputs(headers.etag, etag_file);
fclose(etag_file);
fprintf(stderr, "%s: model etag saved %s: %s\n", __func__, etag_path, headers.etag);
}
}

// Write the new Last-Modified to the .lastModified file
if (strlen(headers.last_modified) > 0) {
auto * last_modified_file = fopen(last_modified_path, "w");
if (last_modified_file) {
fputs(headers.last_modified, last_modified_file);
fclose(last_modified_file);
fprintf(stderr, "%s: model last modified saved %s: %s\n", __func__, last_modified_path,
headers.last_modified);
}
}

if (rename(path_model_temporary, path_model) != 0) {
curl_easy_cleanup(curl);
fprintf(stderr, "%s: unable to rename file: %s to %s\n", __func__, path_model_temporary, path_model);
return NULL;
}
}

curl_easy_cleanup(curl);

return llama_load_model_from_file(path_model, params);
}

#else

struct llama_model * llama_load_model_from_url(const char * /*model_url*/, const char * /*path_model*/,
struct llama_model_params /*params*/) {
fprintf(stderr, "%s: llama.cpp built without libcurl, downloading from a URL is not supported.\n", __func__);
return nullptr;
}

#endif // LLAMA_USE_CURL

std::tuple<struct llama_model *, struct llama_context *> llama_init_from_gpt_params(gpt_params & params) {
auto mparams = llama_model_params_from_gpt_params(params);

llama_model * model = llama_load_model_from_file(params.model.c_str(), mparams);
llama_model * model = nullptr;
if (!params.model_url.empty()) {
model = llama_load_model_from_url(params.model_url.c_str(), params.model.c_str(), mparams);
} else {
model = llama_load_model_from_file(params.model.c_str(), mparams);
}
if (model == NULL) {
fprintf(stderr, "%s: error: failed to load model '%s'\n", __func__, params.model.c_str());
return std::make_tuple(nullptr, nullptr);
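For callers that do not go through gpt_params, the new function can be used directly; a minimal sketch, assuming the declaration this PR adds to common.h — the URL and cache path below are placeholders, and error handling is reduced to a NULL check as in llama_init_from_gpt_params:

    #include <cstdio>
    #include "common.h"
    #include "llama.h"

    int main() {
        llama_backend_init();

        struct llama_model_params mparams = llama_model_default_params();

        // downloads to the local path, or reuses it when the server-side
        // ETag / Last-Modified still match the saved .etag / .lastModified
        // sidecar files, then loads the model from disk
        struct llama_model * model = llama_load_model_from_url(
            "https://example.com/models/ggml-model-q4_0.gguf", // placeholder URL
            "models/ggml-model-q4_0.gguf",                     // placeholder local path
            mparams);
        if (model == NULL) {
            fprintf(stderr, "failed to download or load model\n");
            return 1;
        }

        llama_free_model(model);
        llama_backend_free();
        return 0;
    }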