Very Long Delay Behind MS App Proxy before Data Shown #107
Comments
Good morning. To answer your first question: we run a working production version of dashing-icinga2 behind a proxy, with the master located at, let's say, https://icinga2.subnet.localdomain and dashing running in a Docker container at https://monitoring.subnet.localdomain. The end user in production accesses these pages via a proxy, and dashing delivers the SSEs to the end user through that proxy as well. Apart from that, the server setup looks similar (we are using CentOS 7). That said, the end user is still within the corporate intranet, so naturally there should be no high latencies. From the logs I can't point straight at a possible error source, but I will try to look over your code changes. For now, my only guess would be that the server is somehow issuing or receiving too many requests (from users, your Icinga master, etc.), since this can sometimes lead to problems with Ruby-based evented applications.
I have never heard of msappproxy before; Google explained that this is a sort of full-blown proxy in the Azure cloud: https://docs.microsoft.com/en-us/azure/active-directory/manage-apps/application-proxy-add-on-premises-application I could imagine that the proxy intercepts network streams, one of them being the event stream built on Ruby and Server-Sent Events. SSE is known to be buggy on Windows clients, and it is highly likely that support for it in the proxy is not intact either. I would suggest asking Azure support whether they know if their proxy supports SSE. I cannot access the URL with dashing at the moment, it says 503. Keep in mind though that Dashing and the underlying Ruby framework use old web technologies, not something like WebSockets. At some point in time, you'll need to find a modern replacement for Dashing. In terms of the diffs: that change is not necessary, the library code ensures that the correct one is loaded.
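For what it's worth, one way to check whether the proxy buffers the event stream is to read /events directly and timestamp every chunk as it arrives. The following is only a rough sketch in plain Ruby (not part of dashing-icinga2); the example URL is the one from this issue, and the localhost fallback assumes dashing's default port 3030. If chunks arrive promptly against the server's FQDN but only as one big burst after roughly two minutes through the proxy, the proxy is buffering the SSE response.

```ruby
# sse_probe.rb - rough sketch: timestamp incoming chunks of an SSE stream
# to see whether a proxy in between buffers the response.
require 'net/http'
require 'uri'

# e.g. ruby sse_probe.rb https://icinga-cofc.msappproxy.net/events
uri   = URI(ARGV[0] || 'http://localhost:3030/events')
start = Time.now

Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
  http.read_timeout = 300
  request = Net::HTTP::Get.new(uri)
  request['Accept'] = 'text/event-stream'

  http.request(request) do |response|
    puts "headers after #{(Time.now - start).round(1)}s (HTTP #{response.code})"
    # With a working SSE setup, chunks should appear every few seconds as dashing
    # pushes events; a long silence followed by one burst points to buffering.
    response.read_body do |chunk|
      puts "+#{(Time.now - start).round(1)}s: #{chunk.bytesize} bytes"
    end
  end
end
```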
@mocdaniel @dnsmichi Thank you both so much for the timely responses. I wasn't aware of a potential issue with SSE, but I'll do some research to see whether it's supported. As far as a high request load goes, I'm actually the only person consistently accessing the web server right now; I had one other user test it over a Zoom meeting with Firefox, but aside from that its visibility is pretty limited. Also, interestingly, @mocdaniel, I did just run the stopwatch again (specifically: started the foreground service, waited about 5 seconds for the backend service to initialize, entered the proxy URL in the browser, and immediately started the stopwatch), and the time was essentially the same, 1m56s instead of 1m55s. And my apologies, I forgot that I had terminated the service running in the foreground without starting it back up in the background before providing the URL; here it is again, now running as a background service: https://icinga-cofc.msappproxy.net/dashingicinga Again, thank you both!
Just commented out both of the bottom iFrames, re-ran the timer, got the same 1m55s rendering time for the data. Thank you for pointing that out, though!
Can you analyse in your browser's dev console which portions take long to load and then render? Such a trace would be really interesting for learning more about the problem.
@austinjhunt I didn't refer to the request load from the user's side, but to the number of requests dashing-icinga2 issues towards your master in each refresh interval. If too many separate requests go off, this might delay the response a user of the web app gets. Still, 1.9 minutes (this is what I get when requesting https://icinga-cofc.msappproxy.net/events) is oddly high. Nonetheless, please check your Icinga 2 logs (should be at /var/log/icinga2/icinga2.log on your RHEL server) explicitly for all requests arriving during one refresh interval of your icingaweb instance. In addition, maybe measure the time a single request from dashing to the master takes to return data; you could use the API methods shipped with dashing for that purpose.
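To illustrate that check, a small one-off script along the following lines groups matching log entries into 10-second windows (the dashing SCHEDULER interval). This is only a sketch: the API user name 'dashing' and the bracketed timestamp prefix at the start of each Icinga 2 log line are assumptions about the setup, so adjust both as needed.

```ruby
# count_dashing_requests.rb - sketch: count API requests from the dashing API user
# per 10-second window in the Icinga 2 log (user name and log format assumed).
require 'time'

WINDOW  = 10 # seconds, matching the dashing SCHEDULER interval
buckets = Hash.new(0)

File.foreach('/var/log/icinga2/icinga2.log') do |line|
  next unless line.include?('dashing')          # filter by the assumed API user name
  # Icinga 2 log lines typically start with a timestamp like [2020-07-14 09:12:01 -0400]
  ts = line[/\[([^\]]+)\]/, 1] or next
  t  = Time.parse(ts) rescue next
  buckets[(t.to_i / WINDOW) * WINDOW] += 1
end

buckets.sort.each do |window_start, count|
  puts "#{Time.at(window_start)}: #{count} requests"
end
```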
@dnsmichi I exported a HAR file for the request to the proxy, which again took exactly 1m55s to load the initial data into the front end, super interesting. Can't attach the HAR file here, unfortunately, as that file type isn't supported for attachment. /events is definitely the bottleneck.
One refresh interval in icinga2.log filtering by the dashing API user:
As for the amount of time it takes for a request from dashing to the master (not sure if this was the proper way of identifying that, but it seemed logical, I'm not too familiar with Ruby): I calculated the elapsed time from the moment right before to right after the icinga.run line executes in the scheduled icinga2.rb job. It averages less than a second, definitely not the 1.9 minutes. Which leads me to think it is indeed the send_event calls (one or all) that are causing the delay. The only question is, why is it a constant delay and not a variable one? I really appreciate the feedback from both of you. Let me know if you have any additional thoughts. It does look like Azure claims their proxy handles SSE, but there's a chance it's messing up on their end.
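For reference, timing both phases separately inside the job could look roughly like the fragment below. It is a simplified sketch, not the shipped jobs/icinga2.rb: it assumes the icinga helper object is already created at the top of the job file as in dashing-icinga2, and the event payload is a placeholder. If icinga.run stays under a second and send_event also returns immediately, the delay has to sit downstream of the job, i.e. in the delivery of /events.

```ruby
# Simplified fragment of a dashing job: time the data fetch and the push separately.
# Assumes `icinga` (the Icinga 2 API helper) is created earlier in the job file.
require 'benchmark'

SCHEDULER.every '10s', first_in: 0 do |job|
  fetch_seconds = Benchmark.realtime { icinga.run }

  push_seconds = Benchmark.realtime do
    # payload shortened to a placeholder; the real job sends the collected stats here
    send_event('icinga-stats', { moreinfo: "fetch took #{fetch_seconds.round(2)}s" })
  end

  puts "icinga.run: #{(fetch_seconds * 1000).round}ms, send_event: #{(push_seconds * 1000).round}ms"
end
```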
I did not do much with Thin or Ruby as a server in general; the only thing I do know is that it is implemented as a single-thread pattern, so whenever one call blocks, everything else waits. To nail it down further, try to comment out all the object getters and only use one send_event call.
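As a sketch of that idea (not the shipped job file): a job reduced to one static send_event takes the Icinga 2 API out of the equation entirely, so if the front end still needs about two minutes through the proxy to show this single value, the data collection cannot be the cause. The widget name 'icinga-stats' is the one mentioned in this thread.

```ruby
# Minimal test job: one send_event per interval, no API calls at all (sketch only).
SCHEDULER.every '10s', first_in: 0 do |job|
  send_event('icinga-stats', {
    title: 'probe',
    moreinfo: Time.now.strftime('%H:%M:%S') # changes every run, so updates are easy to spot
  })
end
```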
Wow! Okay. Very interesting results. I commented out everything except the very first send_event call (for "icinga-stats"), and now in Chrome's network tab the /events resource is taking anywhere from 2.8 to 3.1 minutes to load. No longer a constant 1.9 minutes. Adding some screenshots. I'm a bit confused about why ALL the event data is being pulled if the only one not commented out is icinga-stats, though. I would expect that cell to be the only one with content since nothing else is getting sent from the server. I guess that means I didn't comment out the send_event calls in the right file?
Hi @austinjhunt, and sorry for the long time without a reply, we were occupied with getting the repository transfer done as smoothly as possible. Commenting out the send_event calls apparently did not make the delay go away. To me, this is a strong indicator that the proxy is the problem, given that you already measured the time needed for dashing and Icinga 2 to communicate. Also, I just reread your initial post, and the fact that there is no delay when you access the dashboard via the server's FQDN directly sounds like proxy problems as well. I will look around Microsoft's forums for a bit longer, but I am afraid there's nothing on our end we can do to resolve your issues.
We have two Icinga environments, one for test and one for prod. In production, we have our master (icingaweb2) server behind a proxy; this is the URL: https://icinga-cofc.msappproxy.net
We have the dashing dashboard located at https://icinga-cofc.msappproxy.net/dashingicinga
When you access dashing via the server name directly (say, http://icingawebserver/dashingicinga), it works like a charm.
However, when you access via the proxy URL, it also works, but data isn't loaded into the widgets until after a significant delay (1m55s according to my stopwatch).
More interestingly, running dashing in the foreground seems to indicate that data is being retrieved at the SCHEDULER interval of 10 seconds, but the front end is not updating to reflect those retrievals. At least, again, not until after a long time.
I'm wondering what sort of variables/intervals in the source you think align with this time frame, and whether it's fixable. Have you seen a working instance of dashing-icinga2 behind a proxy?
This is a captured log of running dashing in the foreground while trying to access dashing via the proxy URL:
Expected Behavior
Expecting dashing-icinga2 to show data on the front end immediately upon page refresh/load.
Current Behavior
The Dashing front end does not load data until 1m55s after page refresh (so it's succeeding, just after a long wait).
It is significant to note here that the 10s scheduler interval was not changed.
Possible Solution
Based on the log suggesting that the backend querying works, I imagine there's something on the front end referencing a request URL that's causing the data not to load in properly, but I could be completely wrong. Alternatively, there's a moving part linking the backend icinga2 job to the front-end widgets that is malfunctioning due to the use of a non-local URL/address (the proxy instead of localhost).
Steps to Reproduce (for bugs)
This is a link to our example of a malfunctioning dashing-icinga2 dashboard living behind a Microsoft App proxy: https://icinga-cofc.msappproxy.net/dashingicinga
Context
As the primary administrator of this application for the College of Charleston's IT department, my goal is to establish a reliable, good-looking dashboard to keep on a shared office monitor, showing monitoring data and alerts for the > 1000 hosts and > 3000 services we monitor, from web server VMs to physical networking gear. It's really about closing the gap between problems and solutions by improving awareness.
Your Environment
```
gem list --local dashing
ruby -V
git show -1
git diff
```
Was also tested in Firefox, but I don't have the version number as it was tested by a team member over a Zoom meeting.
Thank you!