Enable CORS requests on all routes #5781

Closed
wants to merge 10 commits into from

Conversation

StrangeBytesDev
Contributor

Enables CORS requests on all endpoints for all request types. Updates tests to reflect that change.

Member

@ggerganov ggerganov left a comment

Let's put this feature behind a command-line flag: --enable-cors

By default, it should not be enabled.

@Azeirah
Contributor

Azeirah commented Feb 29, 2024

Ohhh so that's how you apply middleware to routes, I couldn't find it at first.

I read the discussion in the linked thread, and I'm honestly a bit unsure about what the right approach is. In my opinion CORS is not an amazing security mechanism because as was already stated, anyone can still approach the server using anything other than a browser that respects CORS.

Still, keeping cross-origin requests blocked would prevent sites from doing something similar to crypto mining on your PC by accessing any locally running LLMs.

I want to add two points

The first is, if you do the flag thing, at the very least add a short log to the terminal that CORS is disabled when an endpoint is rejected due to CORS, informing the user they can enable it, but should understand the security risks (and recommend setting an API key)

So when CORS is disabled and an OPTIONS endpoint is hit, log a message to the console

"Warning: site example.com is denied access to the llama server API. Pass the --enable-cors flag to allow example.com to access this API. For security reasons, it is recommended to set an API key as well with the --api-key flag."


The alternative is to enable it by default, but work on a whitelist basis. No hosts except localhost are allowed by default. You can enable hosts in something like a cors-whitelist.txt. This is how CORS is supposed to be used by design.

This is simpler and safer but requires people to interact with the whitelist file. It's possible to maybe add a Y/N prompt to the server when an unknown host is trying to access the server, or otherwise list a similar warning message as the one above

"Warning: site example.com is denied access to the llama server API. Add example.com to cors-whitelist.txt if you want it to have access to your llama.cpp API."

@StrangeBytesDev
Contributor Author

@Azeirah I love the log idea. I think that would substantially help cut down on confused users and new github issues if we disable CORS by default.

@ggerganov @ngxson thoughts on using a cors-whitelist.txt file, or adding a --cors-whitelist flag?

@ngxson
Collaborator

ngxson commented Mar 2, 2024

@ggerganov @Azeirah Instead of disabling CORS by default, I think we should only allow a limited number of whitelisted domains.

(Modified and moved to a comment below)

@ngxson
Collaborator

ngxson commented Mar 2, 2024

Also what we're currently doing:

res.set_header("Access-Control-Allow-Origin", req.get_header_value("Origin"));

This is equivalent to

res.set_header("Access-Control-Allow-Origin", "*");

That's why I say that it's redundant. What is interesting for us is to check that req.get_header_value("Origin")

@Azeirah
Contributor

Azeirah commented Mar 2, 2024

> Also what we're currently doing:
>
> res.set_header("Access-Control-Allow-Origin", req.get_header_value("Origin"));
>
> This is equivalent to
>
> res.set_header("Access-Control-Allow-Origin", "*");
>
> That's why I say that it's redundant. What is interesting for us is to check that req.get_header_value("Origin")

No, they're not equivalent. CORS requires the domain to be mirrored back explicitly in the case of credentialed requests (a credentialed request is one with the Authorization header)
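
The difference between the two header values can be sketched as follows. This is a minimal illustration, not the server's code: `CorsHeaders` and `cors_headers_for` are hypothetical names, and the `with_credentials` flag stands in for however the server would decide that a request carries credentials. It relies on the fact that browsers refuse a wildcard `Access-Control-Allow-Origin` on credentialed requests, so the exact origin has to be echoed back.

```cpp
#include <optional>
#include <string>

// Hypothetical holder for the two CORS response headers discussed here.
struct CorsHeaders {
    std::string allow_origin;                      // Access-Control-Allow-Origin
    std::optional<std::string> allow_credentials;  // Access-Control-Allow-Credentials, if sent
};

// Choose header values for a response. A wildcard is only acceptable to the
// browser when the request is sent without credentials; otherwise the exact
// Origin value must be mirrored back.
CorsHeaders cors_headers_for(const std::string & origin, bool with_credentials) {
    if (with_credentials) {
        return {origin, std::string("true")};
    }
    return {"*", std::nullopt};
}
```

Calling `cors_headers_for("http://example.com", true)` yields the mirrored origin plus `Access-Control-Allow-Credentials: true`, while the non-credentialed case can fall back to `*`.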

@ngxson
Collaborator

ngxson commented Mar 2, 2024

@Azeirah Ah yeah sorry I missed that part.

But even with what you say, then the fact that we currently mirror the Origin domain is just to allow credentialed requests for that specific domain right?

In this case, I think we should:

  • Either set Access-Control-Allow-Origin to a pre-defined value, not reflecting Origin (the notion of a whitelist, as you suggested)
  • Or still let Access-Control-Allow-Origin reflect the Origin, but internally check whether Origin is in the whitelist

Basically, what I suggest here is to either use Access-Control-Allow-Origin as it is supposed to be used (i.e. with a pre-defined whitelist) or check whether Origin is in the whitelist, but not both.

@Azeirah
Contributor

Azeirah commented Mar 2, 2024

> @Azeirah Ah yeah sorry I missed that part.
>
> But even with what you say, then the fact that we currently mirror the Origin domain is just to allow credentialed requests for that specific domain right?
>
> In this case, I think we should:
>
> * Either set `Access-Control-Allow-Origin` to a pre-defined value, not reflecting `Origin` (the notion of a whitelist, as you suggested)
> * Or still let `Access-Control-Allow-Origin` reflect the Origin, but internally check whether `Origin` is in the whitelist
>
> Basically, what I suggest here is to either use Access-Control-Allow-Origin as it is supposed to be used (i.e. with a pre-defined whitelist) or check whether Origin is in the whitelist, but not both.

Yep! I think whitelisting is the best approach. The current implementation is a catchall

@ngxson
Collaborator

ngxson commented Mar 2, 2024

> Yep! I think whitelisting is the best approach. The current implementation is a catchall

Thanks, we finally have an agreement here.

I modified my proposal. My idea is to keep Access-Control-Allow-Origin reflecting Origin (as it is now), but to check the Origin header:

| Option | Origin | Access-Control-Allow-Origin | When the user accesses any endpoint (for example /completions) | Log |
|---|---|---|---|---|
| by default | localhost | (reflect Origin) | OK (access as normal) | |
| by default | example.com | (reflect Origin) | Error code 403: { "error": "Origin not whitelisted" } | ERR: Origin example.com is not accepted |
| --whitelist-domain example.com | localhost | (reflect Origin) | Error code 403: { "error": "Origin not whitelisted" } | ERR: Origin localhost is not accepted |
| --whitelist-domain example.com | example.com | (reflect Origin) | OK (access as normal) | |
| --whitelist-domain * | (any) | (reflect Origin) | OK (access as normal) | WARN: Allowing all origin without setting an API key, this is not recommended |
| --whitelist-domain * --api-key ABCDEF | (any) | (reflect Origin) | Access as normal with API key = ABCDEF | |

By throwing an error instead of letting the browser block it, some frontends can show the error message "Origin not whitelisted" in the UI.

Do you still prefer this approach (reflect Origin), or would you prefer to have Access-Control-Allow-Origin fixed to a pre-defined value (i.e. via --whitelist-domain)?
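
The decision table in this proposal can be sketched as a small function. This is only an illustration under stated assumptions: `OriginDecision`, `check_origin`, and `should_warn_open_access` are hypothetical names, an empty `whitelist` string stands for "flag not given", and origins are compared as the bare host names used in the table rather than full `scheme://host:port` values.

```cpp
#include <string>

enum class OriginDecision { Allow, Reject };

// whitelist: value of the proposed whitelist flag; empty means the flag was
// not given, in which case only "localhost" is accepted (assumption taken
// from the "by default" rows of the table).
OriginDecision check_origin(const std::string & origin, const std::string & whitelist) {
    if (whitelist == "*") {
        return OriginDecision::Allow;  // any origin is accepted
    }
    const std::string & expected = whitelist.empty() ? std::string("localhost") : whitelist;
    // A Reject maps to the proposed 403 { "error": "Origin not whitelisted" }.
    return origin == expected ? OriginDecision::Allow : OriginDecision::Reject;
}

// True when the table's WARN row applies: all origins allowed, no API key set.
bool should_warn_open_access(const std::string & whitelist, const std::string & api_key) {
    return whitelist == "*" && api_key.empty();
}
```

Each row of the table corresponds to one call, for example `check_origin("localhost", "example.com")` for the third row.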

@Azeirah
Contributor

Azeirah commented Mar 2, 2024

Only remark I have is to allow localhost regardless. I think it's a sensible default.

I'm not 100% certain whether CORS even applies to localhost, though, so maybe it's not necessary. But many chat clients are just Electron wrappers around some JS web app, and they run into CORS issues.

StrangeBytesDev and others added 2 commits March 2, 2024 13:13
* Disable CORS requests by default.
* Add --public-domain flag to allow specifying a CORS allowed domain.
* Warn about using "*" without an API key.
@StrangeBytesDev
Contributor Author

I've implemented the feature as specified. The origin is always reflected. I added a flag "--public-domain" which lets users specify a domain that is allowed to make CORS requests. Any request that specifies an origin other than the public domain receives an error message and an HTTP 403 status. Blocked requests are logged. The public-domain setting defaults to "http://localhost:8080". Setting the public-domain to "*" without also including an API key logs a warning.

I used "public-domain" instead of "whitelist" because a CORS header only allows for setting a single origin (as far as I understand) and I implemented the origin check as a simple string comparison, so you can only specify a single CORS enabled origin. That seemed more robust, and if users want to enable multiple origins, then setting the public domain to "*" and using an API key is probably the best way to handle that.

@@ -42,6 +42,7 @@ see https://github.com/ggerganov/llama.cpp/issues/1437
- `-to N`, `--timeout N`: Server read/write timeout in seconds. Default `600`.
- `--host`: Set the hostname or ip address to listen. Default `127.0.0.1`.
Collaborator

You did not change this one to the hostname. Anyway, I don't understand why you switched from IP to hostname. Isn't an IP a valid URL in Origin?

Contributor Author

There's currently a mixture of "127.0.0.1" and "localhost" throughout the codebase. I started to normalize to a single value and moved towards localhost, as that's been the domain suggested by Azeirah and ngxson. But then I ran into trouble with the Python tests not liking localhost, so I went back, but missed this instance.
There's probably a consideration to be made about which value is more likely to be intuitive for a typical user, but maybe that's its own issue.

Collaborator

I also prefer 127.0.0.1 because sometimes localhost is not defined in Docker environments.

@@ -47,5 +47,5 @@ Feature: Security
| localhost | Access-Control-Allow-Origin | localhost |
| web.mydomain.fr | Access-Control-Allow-Origin | web.mydomain.fr |
| origin | Access-Control-Allow-Credentials | true |
| web.mydomain.fr | Access-Control-Allow-Methods | POST |
| web.mydomain.fr | Access-Control-Allow-Methods | * |
Collaborator

Actually can we just restrict to GET and POST ?

Collaborator

Yeah, you're right. I thought it was a good idea to use *, but it turns out it isn't. Can you change it to GET, POST @StrangeBytesDev?

Contributor Author

I assume we want GET, POST, OPTIONS so that preflight requests are allowed.
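
As a rough sketch of the method list this thread settles on, the header value and a matching check could look like the following. The names `k_allowed_methods`, `allow_methods_header`, and `is_method_allowed` are hypothetical and not taken from the PR; whether OPTIONS needs to be listed is the commenter's assumption, which the sketch simply follows.

```cpp
#include <array>
#include <string>
#include <string_view>

// Methods named in the review: GET and POST, plus OPTIONS for preflight.
constexpr std::array<std::string_view, 3> k_allowed_methods = {"GET", "POST", "OPTIONS"};

// Build the value for the Access-Control-Allow-Methods response header.
std::string allow_methods_header() {
    std::string value;
    for (const auto method : k_allowed_methods) {
        if (!value.empty()) {
            value += ", ";
        }
        value += method;
    }
    return value;
}

// Check an incoming request method against the same list.
bool is_method_allowed(std::string_view method) {
    for (const auto allowed : k_allowed_methods) {
        if (method == allowed) {
            return true;
        }
    }
    return false;
}
```

Keeping the list in one place means the advertised header and the actual check cannot drift apart.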

@@ -42,6 +42,7 @@ see https://github.com/ggerganov/llama.cpp/issues/1437
- `-to N`, `--timeout N`: Server read/write timeout in seconds. Default `600`.
- `--host`: Set the hostname or ip address to listen. Default `127.0.0.1`.
- `--port`: Set the port to listen. Default: `8080`.
- `--public-domain`: Set a public domain which will be allowed for Cross Origin Requests. If you are using the server as an API from a browser, this is required.
Contributor

@Azeirah Azeirah Mar 3, 2024

Please rename it to reference CORS at the very least, public domain is not the right terminology, it might confuse people. I'd recommend something like --cors-allowed-origin or --cors-origin.

- `--cors-origin`: Set what origin (example.com) is allowed to access the API. Use * to allow all origins (insecure without --api-key).  If you are using the server as an API from a browser, this parameter is required.

Collaborator

++ and the user is sending cross origin requests.

Collaborator

also, to be coherent with other parameters, I suggest --http-cors-origin

Contributor Author

My thought was that a typical user probably wouldn't know what CORS is or why they need it. I've updated the flag to be "--http-cors-origin". There may be a case for avoiding "http" in the name, since the origin "https://example.org" is different from "http://example.org".


if (req.has_header("Origin") && sparams.public_domain != "*") {
if (req.get_header_value("Origin") != sparams.public_domain) {
LOG_WARNING("Request from origin not allowed.", {{"origin", req.get_header_value("Origin")}});
Contributor

@Azeirah Azeirah Mar 3, 2024

Just wanted to note: I believe this also solves an issue the server previously had. When a request was sent that was disallowed by CORS, the server would still process the request fully, so even disallowed origins could make your PC waste cycles by calling the completion API.

This is a better way to handle CORS :)

Collaborator

The Origin header actually has the protocol and port within it. I found a simple code snippet that lets you parse a URI and extract the domain name here: https://gist.github.com/RedCarrottt/c7a056695e6951415a0368a87ad1e493
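
To illustrate the point about protocol and port, here is a minimal standard-library sketch of splitting an Origin value into its parts. It is not the linked gist and not the PR's code: `ParsedOrigin` and `parse_origin` are hypothetical names, and the parser assumes the simple `scheme://host[:port]` shape (it does not handle a bracketed IPv6 host without a port).

```cpp
#include <optional>
#include <string>

// Hypothetical result type: the three parts of an Origin header value.
struct ParsedOrigin {
    std::string scheme;
    std::string host;
    std::string port;  // empty when the Origin carries no explicit port
};

// Split "scheme://host[:port]" into its parts. Returns std::nullopt for
// values that do not have that shape, such as the literal "null" origin.
std::optional<ParsedOrigin> parse_origin(const std::string & origin) {
    const std::string sep = "://";
    const auto scheme_end = origin.find(sep);
    if (scheme_end == std::string::npos || scheme_end == 0) {
        return std::nullopt;
    }
    ParsedOrigin out;
    out.scheme = origin.substr(0, scheme_end);
    const std::string rest = origin.substr(scheme_end + sep.size());
    const auto colon = rest.rfind(':');
    if (colon == std::string::npos) {
        out.host = rest;
    } else {
        out.host = rest.substr(0, colon);
        out.port = rest.substr(colon + 1);
    }
    if (out.host.empty()) {
        return std::nullopt;
    }
    return out;
}
```

With this split, "http://example.org" and "https://example.org" share a host but differ in scheme, which is why comparing the full Origin string and comparing only the domain name give different results.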

StrangeBytesDev and others added 3 commits March 3, 2024 11:37
* Restrict HTTP requests to GET, POST, and OPTIONS.
* rename cors flag from "--public-domain" to "--http-cors-origin"
invalid_param = true;
break;
}
sparams.http_cors_origin = argv[i];
Collaborator

Is it possible here to input a list of domain names, for example example.com,mywebsite.com,example.net ?

Contributor Author

Done. It also made for a pretty convenient way to include both "localhost" and "127.0.0.1" by default.
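
A comma-separated origin list of this kind could be handled roughly as below. This is a sketch under assumptions, not the PR's implementation: `default_origins`, `parse_origin_list`, and `origin_allowed` are hypothetical names, whitespace around entries is trimmed as a convenience, and a literal `*` entry is treated as "allow everything". The two default values are the ones named in the commit message that follows.

```cpp
#include <set>
#include <sstream>
#include <string>

// Defaults named in the commit message: both spellings of the local server.
std::set<std::string> default_origins() {
    return {"http://localhost:8080", "http://127.0.0.1:8080"};
}

// Split a comma-separated flag value into a set of origins, trimming spaces
// and tabs around each entry and skipping empty entries.
std::set<std::string> parse_origin_list(const std::string & arg) {
    std::set<std::string> origins;
    std::stringstream stream(arg);
    std::string item;
    while (std::getline(stream, item, ',')) {
        const auto first = item.find_first_not_of(" \t");
        if (first == std::string::npos) {
            continue;  // entry was empty or whitespace only
        }
        const auto last = item.find_last_not_of(" \t");
        origins.insert(item.substr(first, last - first + 1));
    }
    return origins;
}

// An origin passes if it is listed exactly, or if "*" is in the list.
bool origin_allowed(const std::set<std::string> & origins, const std::string & origin) {
    return origins.count("*") > 0 || origins.count(origin) > 0;
}
```

For example, `parse_origin_list("example.com,mywebsite.com,example.net")` produces a three-element set that `origin_allowed` can check each request against.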

StrangeBytesDev and others added 2 commits March 3, 2024 15:29
* Allow setting multiple CORS enabled origins.
* Add both "http://localhost:8080" and "http://127.0.0.1:8080" by default.
* Move CORS logging below server startup to make it more visible.
@mofosyne mofosyne added Review Complexity : Low Trivial changes to code that most beginner devs (or those who want a break) can tackle. e.g. UI fix server labels May 10, 2024
Contributor

📈 llama.cpp server for bench-server-baseline on Standard_NC4as_T4_v3 for phi-2-q4_0: 537 iterations 🚀

Expand details for performance related PR only
  • Concurrent users: 8, duration: 10m
  • HTTP request : avg=8733.91ms p(95)=21757.61ms fails=, finish reason: stop=474 truncated=63
  • Prompt processing (pp): avg=92.7tk/s p(95)=366.16tk/s
  • Token generation (tg): avg=32.53tk/s p(95)=46.12tk/s
  • ggml-org/models/phi-2/ggml-model-q4_0.gguf parallel=8 ctx-size=16384 ngl=33 batch-size=2048 ubatch-size=256 pp=1024 pp+tg=2048 branch=master commit=67e60c0da4512f0a6f3d1c76448c783bf2c92aa4

[Charts: four line plots, each titled "llama.cpp bench-server-baseline on Standard_NC4as_T4_v3 duration=10m 537 iterations", x-axis from 1715428258 to 1715428888]
  • llamacpp:prompt_tokens_seconds
  • llamacpp:predicted_tokens_seconds
  • llamacpp:kv_cache_usage_ratio
  • llamacpp:requests_processing

@ggerganov ggerganov removed their request for review February 5, 2025 15:51
6 participants