
server : refactor #5882

Merged 26 commits on Mar 7, 2024. The diff below shows changes from 22 of the 26 commits.

Commits (26)
f4e6e7e  server : refactoring (wip)  (ggerganov, Mar 5, 2024)
ef7eb33  server : remove llava/clip objects from build  (ggerganov, Mar 5, 2024)
134f5fe  server : fix empty prompt handling + all slots idle logic  (ggerganov, Mar 5, 2024)
ad1d746  server : normalize id vars  (ggerganov, Mar 5, 2024)
fef64c5  server : code style  (ggerganov, Mar 5, 2024)
b1b3ba8  server : simplify model chat template validation  (ggerganov, Mar 5, 2024)
f4800d5  server : code style  (ggerganov, Mar 5, 2024)
7635b13  server : minor  (ggerganov, Mar 5, 2024)
f84809b  llama : llama_chat_apply_template support null buf  (ggerganov, Mar 5, 2024)
22ae1a6  server : do not process embedding requests when disabled  (ggerganov, Mar 5, 2024)
cb3ce0b  server : reorganize structs and enums + naming fixes  (ggerganov, Mar 5, 2024)
4a2d5f6  server : merge oai.hpp in utils.hpp  (ggerganov, Mar 5, 2024)
61b6370  server : refactor system prompt update at start  (ggerganov, Mar 5, 2024)
aef02b1  server : disable cached prompts with self-extend  (ggerganov, Mar 6, 2024)
bfb121f  server : do not process more than n_batch tokens per iter  (ggerganov, Mar 6, 2024)
79ef3c0  server: tests: embeddings use a real embeddings model (#5908)  (phymbert, Mar 6, 2024)
36e12f8  server, tests : bump batch to fit 1 embedding prompt  (ggerganov, Mar 6, 2024)
59850f1  server: tests: embeddings fix build type Debug is randomly failing (#…  (phymbert, Mar 6, 2024)
3166ccf  server: tests: embeddings, no need to wait for server idle as it can …  (phymbert, Mar 6, 2024)
c50a510  server: refactor: clean up http code (#5912)  (phymbert, Mar 6, 2024)
c53d84e  server : avoid n_available var  (ggerganov, Mar 6, 2024)
9c8d3c8  server: refactor: better http codes  (phymbert, Mar 6, 2024)
fd74b5e  server : simplify json parsing + add comment about t_last  (ggerganov, Mar 7, 2024)
234ab58  server : rename server structs  (ggerganov, Mar 7, 2024)
818d898  server : allow to override FQDN in tests  (ggerganov, Mar 7, 2024)
87a4a10  server : add comments  (ggerganov, Mar 7, 2024)
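
One of the commits above, f84809b "llama : llama_chat_apply_template support null buf", lets callers pass a null buffer to learn the required output size before formatting. The sketch below illustrates that two-pass pattern; it is not code from this PR, it assumes the llama.h chat-template signature of that period, and the helper name format_chat and the msgs argument are illustrative.

// Minimal sketch of the size-query pattern enabled by commit f84809b.
// tmpl == nullptr means "use the model's built-in chat template".
#include <string>
#include <vector>

#include "llama.h"

static std::string format_chat(const llama_model * model,
                               const std::vector<llama_chat_message> & msgs) {
    // First pass: buf == nullptr, length == 0 -> only the required size is returned.
    int32_t res = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                            /*add_ass=*/true, nullptr, 0);
    if (res < 0) {
        return ""; // template could not be applied
    }

    // Second pass: format into a buffer of exactly that size.
    std::vector<char> buf(res);
    res = llama_chat_apply_template(model, nullptr, msgs.data(), msgs.size(),
                                    /*add_ass=*/true, buf.data(), (int32_t) buf.size());

    return std::string(buf.data(), res > 0 ? (size_t) res : 0);
}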
Files changed
3 changes: 2 additions & 1 deletion .github/workflows/server.yml
@@ -58,7 +58,8 @@ jobs:
             cmake \
             python3-pip \
             wget \
-            psmisc
+            psmisc \
+            language-pack-en

      - name: Build
        id: cmake_build
5 changes: 2 additions & 3 deletions Makefile
@@ -724,10 +724,9 @@ save-load-state: examples/save-load-state/save-load-state.cpp ggml.o llama.o $(C
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
 	$(CXX) $(CXXFLAGS) $(filter-out %.h $<,$^) $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS)

-server: examples/server/server.cpp examples/server/oai.hpp examples/server/utils.hpp examples/server/httplib.h examples/server/json.hpp examples/server/index.html.hpp examples/server/index.js.hpp examples/server/completion.js.hpp examples/llava/clip.cpp examples/llava/clip.h examples/llava/llava.h examples/llava/llava.cpp common/stb_image.h ggml.o llama.o $(COMMON_DEPS) grammar-parser.o $(OBJS)
+server: examples/server/server.cpp examples/server/utils.hpp examples/server/httplib.h examples/server/json.hpp examples/server/index.html.hpp examples/server/index.js.hpp examples/server/completion.js.hpp common/stb_image.h ggml.o llama.o $(COMMON_DEPS) grammar-parser.o $(OBJS)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
-	$(CXX) $(CXXFLAGS) -c examples/llava/clip.cpp -o $(call GET_OBJ_FILE, examples/llava/clip.cpp) -Wno-cast-qual
-	$(CXX) $(CXXFLAGS) -Iexamples/server $(filter-out %.h %.hpp $< examples/llava/clip.cpp,$^) $(call GET_OBJ_FILE, $<) $(call GET_OBJ_FILE, examples/llava/clip.cpp) -o $@ $(LDFLAGS) $(LWINSOCK2)
+	$(CXX) $(CXXFLAGS) $(filter-out %.h %.hpp $<,$^) -Iexamples/server $(call GET_OBJ_FILE, $<) -o $@ $(LDFLAGS) $(LWINSOCK2)

 gguf: examples/gguf/gguf.cpp ggml.o $(OBJS)
 	$(CXX) $(CXXFLAGS) -c $< -o $(call GET_OBJ_FILE, $<)
2 changes: 1 addition & 1 deletion examples/server-embd.py
@@ -13,7 +13,7 @@ async def main():
     model_url = "http://127.0.0.1:6900"
     responses: list[requests.Response] = await asyncio.gather(*[requests_post_async(
         url= f"{model_url}/embedding",
-        json= {"content": str(i)*1024}
+        json= {"content": str(0)*1024}
     ) for i in range(n)])

     for response in responses:
4 changes: 2 additions & 2 deletions examples/server/CMakeLists.txt
@@ -1,12 +1,12 @@
 set(TARGET server)
 option(LLAMA_SERVER_VERBOSE "Build verbose logging option for Server" ON)
 include_directories(${CMAKE_CURRENT_SOURCE_DIR})
-add_executable(${TARGET} server.cpp oai.hpp utils.hpp json.hpp httplib.h)
+add_executable(${TARGET} server.cpp utils.hpp json.hpp httplib.h)
 install(TARGETS ${TARGET} RUNTIME)
 target_compile_definitions(${TARGET} PRIVATE
     SERVER_VERBOSE=$<BOOL:${LLAMA_SERVER_VERBOSE}>
 )
-target_link_libraries(${TARGET} PRIVATE common llava ${CMAKE_THREAD_LIBS_INIT})
+target_link_libraries(${TARGET} PRIVATE common ${CMAKE_THREAD_LIBS_INIT})
 if (WIN32)
     TARGET_LINK_LIBRARIES(${TARGET} PRIVATE ws2_32)
 endif()
2 changes: 1 addition & 1 deletion examples/server/README.md
@@ -436,7 +436,7 @@ Notice that each `probs` is an array of length `n_probs`.
     "next_token": {
         "has_next_token": true,
         "n_remain": -1,
-        "num_tokens_predicted": 0,
+        "n_decoded": 0,
         "stopped_eos": false,
         "stopped_limit": false,
         "stopped_word": false,
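
The README excerpt above renames the per-slot counter num_tokens_predicted to n_decoded. As an illustration only (not code from this PR), the snippet below reads the renamed field from such a response using nlohmann::json, the same json.hpp the server already vendors; the JSON literal and variable names are made up for the example.

// Hypothetical consumer of the renamed field, using nlohmann::json.
#include <iostream>

#include "json.hpp"

using json = nlohmann::json;

int main() {
    // Shape taken from the README excerpt above; values are placeholders.
    json slot = json::parse(R"({
        "next_token": {
            "has_next_token": true,
            "n_remain": -1,
            "n_decoded": 0
        }
    })");

    // After this PR the field is "n_decoded" (formerly "num_tokens_predicted").
    const int n_decoded = slot["next_token"].value("n_decoded", 0);
    std::cout << "tokens decoded so far: " << n_decoded << std::endl;
    return 0;
}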
225 changes: 0 additions & 225 deletions examples/server/oai.hpp

This file was deleted.
