Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: telemetry service #1243

Merged
merged 20 commits into from
Jan 13, 2025
Merged

Conversation

MCozhusheck
Copy link
Collaborator

@MCozhusheck MCozhusheck commented Dec 12, 2024

Description

Resolves #1186
Add new telemetry service which receives events and immediately sends them.

example event sent:

{
  "app_id": "1b%i31CBnblWxHYxSanG",
  "cpu_name": "11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz",
  "created_at": {
    "nanos_since_epoch": 439482034,
    "secs_since_epoch": 1734106140
  },
  "event_name": "checking-latest-version-mmproxy",
  "event_value": { "percentage": 10, "service": "mmproxy" },
  "gpu_name": "Intel(R) Iris(R) Xe Graphics [0x9a49]",
  "os": "Linux",
  "user_id": "v3,FZNcifkHKvDUJ9Luwc4xBe5HiGHV,0.8.24,VWE2AdhnnMOqNDd7fcQnqVVEBAeee68KBM2+8K7iznY=",
  "version": "0.8.24"
}

Motivation and Context

This should be useful for tracking how many users are stuck during application setup. We saw some users being stuck at 30% with now feedback. This should help in debugging issues like that.

How Has This Been Tested?

Run application and watch for error message coming from telemetry service. If no error are present that means that events were sent correctly. Also check the telemetry dashboard to check if any messages are present on the server.

To verify frontend command I created test button to invoke like this:

const sendTelemetry = () =>
    invoke('send_data_telemetry_service', {
        eventName: 'settings_opened',
        data: { service: 'frontend-test', percentage: 100 },
    });

And found those event present in the kafka dashboard:

{
  "app_id": "CX4h8%%G$xQ0hpnZSu@N",
  "cpu_name": "11th Gen Intel(R) Core(TM) i5-1135G7 @ 2.40GHz",
  "created_at": {
    "nanos_since_epoch": 795656093,
    "secs_since_epoch": 1736248665
  },
  "event_name": "settings_opened",
  "event_value": { "percentage": 100, "service": "frontend-test" },
  "gpu_name": "Intel(R) Iris(R) Xe Graphics [0x9a49]",
  "os": "Linux",
  "user_id": "v3,FZNcifkHKvDUJ9Luwc4xBe5HiGHV,0.8.40,VWE2AdhnnMOqNDd7fcQnqVVEBAeee68KBM2+8K7iznY",
  "version": "0.8.40"
}

What process can a PR reviewer use to test or verify this change?

Same as above.

Breaking Changes

  • None
  • Requires data directory on base node to be deleted
  • Requires hard fork
  • Other - Please specify

@MCozhusheck MCozhusheck marked this pull request as ready for review January 8, 2025 12:50
@MCozhusheck MCozhusheck requested a review from a team as a code owner January 8, 2025 12:50
@MCozhusheck MCozhusheck requested review from brianp and mmrrnn January 8, 2025 13:11
Copy link
Contributor

@leet4tari leet4tari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

utACK - Would need the BETA env

Copy link
Collaborator

@brianp brianp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added the telemetry address to CI.

Comment on lines 277 to 279
let telemetry_service = state.telemetry_service.clone();
let telemetry_service = telemetry_service.read().await;
let telemetry_service = telemetry_service.deref();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

interesting. We can't just clone it, or borrow it. We have to deref it?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes it's unnecessary, we can just use & to get reference to rw lock guard.

@@ -265,6 +309,17 @@ async fn setup_inner(
sleep(Duration::from_secs(1));
}

drop(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we have to drop after each call?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Compiler complains if we ignore result of sending telemetry data. This workaround another possibility could be to write let _ = ...

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It would be my preference to use let _ = ... just because having the drop makes me wonder if there's some special case happening that really requires the drop, where let _ = ... is a common pattern.

Comment on lines 180 to 191
let hardware = HardwareStatusMonitor::current();
let cpu_name = hardware.get_cpu_devices().await?;
let cpu_name = match cpu_name.first() {
Some(cpu) => cpu.public_properties.name.clone(),
None => "Unknown".to_string(),
};
let gpu_name = hardware.get_gpu_devices().await?;
let gpu_name = match gpu_name.first() {
Some(gpu) => gpu.public_properties.name.clone(),
None => "Unknown".to_string(),
};
let os = PlatformUtils::detect_current_os();
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it heavy to re-request this data constantly?

Most of this won't change. Could we get it once during initialization and cache it for each request?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point, moved it to initialization.

@MCozhusheck MCozhusheck force-pushed the feat/telemetry-service branch from fdb9b19 to 7c5994c Compare January 10, 2025 15:47
@brianp brianp merged commit 8a61985 into tari-project:main Jan 13, 2025
8 checks passed
@brianp brianp mentioned this pull request Jan 14, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add additional telemetry events
3 participants