Fix issues with Fleet desktop periodically not appearing on macOS hosts #25924

Open
ghernandez345 opened this issue Jan 31, 2025 · 23 comments
Labels: bug, #g-orchestration, P2, :release

@ghernandez345
Contributor

ghernandez345 commented Jan 31, 2025

Fleet version: Observed with the latest version of Fleet's agent (fleetd)


💥 Actual behavior

Fleet Desktop sometimes disappears from the menu bar on macOS hosts and doesn't come back unless the host is restarted.

We've seen this happen after a macOS update: #25689

🧑‍💻  Steps to reproduce

  1. Install Fleet Desktop on a macOS host
  2. Update macOS to a newer version
  3. See that Fleet Desktop doesn't appear in the menu bar after the macOS update (this happens roughly 1-5% of the time; a reboot fixes it)

🛠️ To fix

Update Fleet Desktop to be a launchd agent.

Flows that need to be checked

  • DEP setup experience
  • Host with previously installed desktop upgraded to new version
  • Package / build / deploy to TUF questions
  • Migration workflow
  • Disk encryption
@ghernandez345 ghernandez345 added #g-mdm MDM product group :product Product Design department (shows up on 🦢 Drafting board) :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. story A user story defining an entire feature labels Jan 31, 2025
@ghernandez345 ghernandez345 removed the :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. label Jan 31, 2025
@georgekarrv
Member

Adding #19172

@noahtalerman noahtalerman added :product Product Design department (shows up on 🦢 Drafting board) and removed story A user story defining an entire feature :product Product Design department (shows up on 🦢 Drafting board) #g-mdm MDM product group labels Jan 31, 2025
@lukeheath lukeheath added the story A user story defining an entire feature label Feb 7, 2025
@noahtalerman
Member

Goal

User story
As a user of Fleet Desktop,
I want to ensure that Fleet Desktop is available on my macOS host
so that I can always have access to my host details on Fleet.

Key result

Ensure a more stable Fleet Desktop experience where Fleet Desktop is always available on a macOS host.

There have been issues where Fleet Desktop periodically "disappears" on macOS hosts, so users cannot interact with Fleet Desktop and cannot access their host information on Fleet. The cause is unclear, but Fleet Desktop currently running as a daemon may be the issue, and changing it to a launch agent may help. We are not certain that this will fix the issue, though.

We could also surface information in Fleet or on the host that the host's Fleet desktop is not running and that action needs to be taken (e.g. a restart of the host). If we decide to go this route, some product ideation will be required.

Here are the current related bugs:

@noahtalerman noahtalerman added ~feature fest Will be reviewed at next Feature Fest and removed story A user story defining an entire feature :product Product Design department (shows up on 🦢 Drafting board) labels Feb 10, 2025
@getvictor
Member

@noahtalerman Recently I installed fleetd on a new laptop and ran into this issue -- Fleet Desktop didn't launch due to the error described in #25689.

This is a terrible experience for a new user. I've worked with a prospect who hit this issue, and we know that customers regularly hit this issue (we have the data from our analytics). I suspect we may be losing potential customers or prospects who hit this issue on their first try of Fleet.

I recommend we convert macOS Fleet Desktop into a launchd agent. This is an engineering task and does not need product input.

As a second step, we need to surface issues in the Fleet UI. In addition to the Desktop not launching, we need to surface other issues, like denylisted queries.

@hazcod

hazcod commented Feb 17, 2025

We're running into this during our PoC of FleetDM for new + fresh macOS installs. Is there a workaround?

@noahtalerman
Member

@hazcod sorry you're running into this! So far, the only solution we've found is restarting the host.

Please let us know if that doesn't make Fleet Desktop show up.

@noahtalerman
Member

> I recommend we convert macOS Fleet Desktop into a launchd agent. This is an engineering task and does not need product input.

@getvictor can you please track an engineering initiated story for this? Sounds like something we should prioritize ASAP.

cc @lukeheath

@lukeheath
Member

@georgekarrv @getvictor @noahtalerman It seems like this should be tracked as a priority bug. We expect the Fleet Desktop icon to appear, so if it doesn't, I agree we need to prioritize and fix this ASAP, per @getvictor's point about the new user experience.

@georgekarrv georgekarrv added the P1 Prioritize as critical label Feb 19, 2025
@georgekarrv
Member

Hey team! Please add your planning poker estimate with Zenhub @getvictor @ghernandez345 @gillespi314 @mna

@noahtalerman
Member

  • @noahtalerman: User requested this because Fleet Desktop periodically disappears on macOS hosts, preventing users from accessing their host details in Fleet. The root cause is unclear, but running Fleet Desktop as a daemon may be a factor.
    • @noahtalerman: In the interim users can manually restart their hosts or Fleet Desktop, though there is no visibility into when the app is not running.
    • @noahtalerman: Eventually Fleet could ensure a more stable Fleet Desktop experience by running it as a launch agent instead of a daemon. After that, in a follow up iteration, Fleet could surface information on the Host details page to notify users when Fleet Desktop is not running and action is needed.

@lukeheath lukeheath added P2 Prioritize as urgent and removed P1 Prioritize as critical labels Feb 20, 2025
@noahtalerman noahtalerman added #g-orchestration Orchestration product group and removed #g-mdm MDM product group labels Feb 21, 2025
@noahtalerman
Member

FYI @sharon-fdm, @lukeheath, @georgekarrv and I chatted and we want to get #g-orchestration's help on this bug next sprint. #g-mdm is prioritizing DigiCert (#25822).

Sharon, I assigned you and moved it to "Ready to estimate" so we can estimate it next week.

@sharon-fdm
Collaborator

@noahtalerman, @lukeheath, @georgekarrv, NP. We can take it.

@lukeheath
Member

Thanks @sharon-fdm and team! This is a tricky one that is very hard to reproduce. @getvictor and @georgekarrv may have some thoughts to pass on from their digging.

@getvictor
Member

If we move forward with the architectural fix of changing Fleet Desktop from a child of orbit daemon to its own launchd agent, the current issues will be automatically gone by the nature of the change.

However, the challenge is whether we hit some new issues. So, this fix will need some in-depth QA.
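
For reference, a minimal sketch of what a per-user launchd agent definition for Fleet Desktop could look like. Nothing here comes from the Fleet codebase: the label `com.example.fleet-desktop` and the executable name `fleet-desktop` inside the app bundle are hypothetical, and the sketch leaves out any environment variables or arguments that Orbit passes to Fleet Desktop today. `Label`, `ProgramArguments`, `RunAtLoad`, `KeepAlive` and `LimitLoadToSessionType` are standard launchd.plist keys.

```shell
#!/bin/sh
# Sketch only: print a per-user LaunchAgent plist for Fleet Desktop.
# The app path is the one quoted in this thread; the label and the
# executable name inside the bundle are assumptions for illustration.

APP_PATH="/opt/orbit/bin/desktop/macos/stable/Fleet Desktop.app"
AGENT_LABEL="com.example.fleet-desktop"   # hypothetical label

print_agent_plist() {
  cat <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
  <key>Label</key>
  <string>${AGENT_LABEL}</string>
  <key>ProgramArguments</key>
  <array>
    <string>${APP_PATH}/Contents/MacOS/fleet-desktop</string>
  </array>
  <key>LimitLoadToSessionType</key>
  <string>Aqua</string>
  <key>RunAtLoad</key>
  <true/>
  <key>KeepAlive</key>
  <true/>
</dict>
</plist>
EOF
}
```

With `KeepAlive` set, launchd restarts the process whenever it exits, and `LimitLoadToSessionType` of `Aqua` restricts the agent to GUI login sessions. Writing the output of `print_agent_plist` to a file under `/Library/LaunchAgents/` would make launchd load it for each user who logs in, which is also why the multi-user and upgrade flows need QA.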

@sgress454
Contributor

sgress454 commented Feb 25, 2025

> If we move forward with the architectural fix of changing Fleet Desktop from a child of orbit daemon to its own launchd agent, the current issues will be automatically gone by the nature of the change.

@getvictor this change would affect the fix I put in for #26526 as well, keep me in the loop please!

Edit - actually maybe it'd be ok as-is, if the process exits itself after detecting the missing tray icon, launchd should restart it automatically if configured correctly.

@sharon-fdm
Collaborator

Timebox 3 points to find a simple solution.
If we can't find one, implement the proposed solution, estimated at 8 points.

@getvictor
Member

>> If we move forward with the architectural fix of changing Fleet Desktop from a child of orbit daemon to its own launchd agent, the current issues will be automatically gone by the nature of the change.
>
> @getvictor this change would affect the fix I put in for #26526 as well, keep me in the loop please!
>
> Edit - actually maybe it'd be ok as-is, if the process exits itself after detecting the missing tray icon, launchd should restart it automatically if configured correctly.

@sgress454 Your fix looks to be Linux only. This issue is macOS only.

@getvictor
Member

@sharon-fdm The 8-point estimate from the MDM team was to do a POC of the fix and run through all the related flows where macOS Fleet Desktop is involved. A full fix may be larger than 8 points.

@getvictor
Member

More context.

We know this issue regularly happens to customers because we added telemetry for the 2 error messages we know about. Currently, processing this telemetry is a manual process. Here are some details on how to find the older error message: #19172 (comment)

@sharon-fdm sharon-fdm assigned sgress454 and unassigned sharon-fdm Mar 3, 2025
@sharon-fdm sharon-fdm added :release Ready to write code. Scheduled in a release. See "Making changes" in handbook. and removed :product Product Design department (shows up on 🦢 Drafting board) labels Mar 3, 2025
@sgress454
Contributor

sgress454 commented Mar 5, 2025

@getvictor did we ever explore the possibility of using

launchctl asuser $uid sudo -u "$currentUser" /usr/bin/open /opt/orbit/bin/desktop/macos/stable/Fleet\ Desktop.app

as you suggested in the previous ticket?
My research led to that possibility as well, seems worth a try over a whole revamp.

@getvictor
Member

> @getvictor did we ever explore the possibility of using
>
> launchctl asuser $uid sudo -u "$currentUser" /usr/bin/open /opt/orbit/bin/desktop/macos/stable/Fleet\ Desktop.app
>
> as you suggested in the previous ticket? My research led to that possibility as well, seems worth a try over a whole revamp.

I did not seriously consider it. We can try and see if this command works from orbit. If we can't reproduce this issue, we can then put in this experimental fix and monitor the user stats to see if people are still seeing it.

Another thing to QA is macOS with 2 accounts -- make sure Fleet Desktop works if one user logs out and another one logs in.
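
A minimal sketch of how the `launchctl asuser` command quoted above could be tried from a root shell. The function names are hypothetical, and reading the owner of `/dev/console` is a common convention for finding the GUI user on macOS, not necessarily what Orbit does. `build_open_cmd` only prints the command so it can be inspected first; `open_fleet_desktop` actually runs it and needs macOS and root.

```shell
#!/bin/sh
# Sketch: wrap the `launchctl asuser` invocation from this thread.
# Function names are illustrative, not taken from Orbit.

APP_PATH="/opt/orbit/bin/desktop/macos/stable/Fleet Desktop.app"

# Print the owner of /dev/console (BSD stat syntax, as shipped with macOS).
console_user() {
  stat -f '%Su' /dev/console
}

# Print the command that would be run for user $1 with uid $2.
build_open_cmd() {
  printf 'launchctl asuser %s sudo -u %s /usr/bin/open "%s"\n' "$2" "$1" "$APP_PATH"
}

# Resolve the console user and launch the app in that user's session.
# macOS only; must be run as root.
open_fleet_desktop() {
  ofd_user="$(console_user)" || return 1
  ofd_uid="$(id -u "$ofd_user")" || return 1
  launchctl asuser "$ofd_uid" sudo -u "$ofd_user" /usr/bin/open "$APP_PATH"
}
```

For example, `build_open_cmd alice 501` prints the full command line for a user `alice` with uid 501. In the two-account scenario, the console owner changes when the second user logs in, so the lookup would have to be repeated at that point.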

@sgress454
Contributor

> If we can't reproduce this issue

So far I haven't been able to simulate this kind of error without basically crippling my computer by, for example, killing launchservicesd. Since no one who actually saw this issue reported their computer not being able to open any apps, that's not likely the problem. I read some things about corruptions in the launch services db, but the fact that at least one person was able to run open in a terminal and get it to work makes that unlikely as a cause as well.

I did notice that while Orbit does check whether a user is logged in before trying to launch the desktop, it only checks that the user isn't root -- it doesn't check for _mbsetupuser, which could be the case after an upgrade. Why Orbit would continue to see that as the logged-in user even after it is restarted is a mystery to me, but it's worth checking for that as well and adding logs to IsUserLoggedInViaGui when root or the setup user is detected. If being stuck detecting _mbsetupuser is the issue, then attempting to launch as a different user could solve it but... how to determine that user?
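
To make the proposed check concrete, here is a rough shell analog. It is not Orbit's Go code, and the real signature of IsUserLoggedInViaGui isn't shown in this thread; the function name `is_real_gui_user` is hypothetical. It treats an empty name, root, and _mbsetupuser as "no GUI user" and logs which one was seen. It does not answer the open question of how to pick a different user.

```shell
#!/bin/sh
# Sketch: decide whether a console user name looks like a real GUI user.
# Returns 0 for ordinary accounts; returns 1 and logs to stderr when the
# name is empty, root, or the macOS setup user (_mbsetupuser).
is_real_gui_user() {
  case "$1" in
    ""|root|_mbsetupuser)
      echo "skipping Fleet Desktop launch: console user is '${1:-<none>}'" >&2
      return 1
      ;;
    *)
      return 0
      ;;
  esac
}
```

A caller could pass in the owner of /dev/console and retry later when the function returns non-zero; the stderr line is the kind of log that would show whether hosts really get stuck on _mbsetupuser.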
