Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add client metrics #751

Merged
merged 38 commits into from
Nov 11, 2024
Merged
Show file tree
Hide file tree
Changes from 8 commits
Commits
Show all changes
38 commits
Select commit Hold shift + click to select a range
36460a1
WIP: add client metrics
isacikgoz May 28, 2024
ca3db46
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz May 28, 2024
d1ca053
remove toolchain
isacikgoz May 28, 2024
efce9ac
reflect revivew comments
isacikgoz Jun 3, 2024
b462c21
fix golangci
isacikgoz Jun 3, 2024
7e067aa
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz Jun 11, 2024
6656416
remove golint
isacikgoz Jun 11, 2024
e8be503
address review comments
isacikgoz Jun 11, 2024
a7ff258
Apply suggestions from code review
isacikgoz Jul 30, 2024
1c9dbce
[MM-58179] Review networking performance at scale (#752)
streamer45 Jun 12, 2024
6b15c15
Update postgres client (#755)
streamer45 Jun 13, 2024
b1a23cd
Bump version strings in master (#757)
agarciamontoro Jun 19, 2024
eba5843
Update retransmission threshold (#760)
agnivade Jul 2, 2024
6da2986
Update coverage frequency docs (#761)
agarciamontoro Jul 3, 2024
1692691
MM-59319: Allow opensearch installations to be created (#763)
agnivade Jul 4, 2024
67dfe4d
Allow 0 shard replicas (#764)
agarciamontoro Jul 5, 2024
d205832
MM-59316: Remove cloudwatch log policy (#762)
agarciamontoro Jul 15, 2024
94df8c1
Improvements from Redis investigation (#765)
agnivade Jul 15, 2024
5e0bf36
feat: allow saml with the external auth provider (#759)
fmartingr Jul 22, 2024
7b334a4
Revert the disabling of proxy_cache_lock (#768)
agarciamontoro Jul 25, 2024
0ee03f2
Bump v1.19.0 rc2 (#770)
agarciamontoro Jul 25, 2024
738672d
Run make update-dependencies (#772)
agarciamontoro Jul 26, 2024
7638e57
Fix race condition in RoundTrip test (#774)
agarciamontoro Jul 29, 2024
2aa80fc
Bump version to v1.19.0 (#775)
agarciamontoro Jul 29, 2024
382152f
Add local file option for Mattermost download url (#767)
cpoile Jul 29, 2024
53b1c87
Fix incorrect deployer.sample.json (#776)
agnivade Jul 30, 2024
ca56401
Update the load test dashboard panels to new format (#777)
agnivade Jul 30, 2024
f91dc82
reflect reviews
isacikgoz Jul 30, 2024
a42059f
fix the TTFB measurement
isacikgoz Jul 30, 2024
016e7ac
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz Jul 30, 2024
2562236
remove ttb observation from controller actions
isacikgoz Jul 31, 2024
02a451d
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz Aug 12, 2024
abda30f
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz Sep 9, 2024
63fd563
revert TTFB measurement to be calculated only on login
isacikgoz Sep 9, 2024
06bc18a
remove random platform/agent
isacikgoz Sep 9, 2024
7d1e7a8
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz Sep 16, 2024
b4d4170
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz Nov 1, 2024
f4e3947
Merge remote-tracking branch 'origin/master' into add-client-metrics
isacikgoz Nov 7, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
12 changes: 1 addition & 11 deletions .golangci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -6,7 +6,6 @@ linters-settings:
gofmt:
simplify: true
govet:
check-shadowing: false # set this to true from time to time to check for possible issues
disable-all: true
enable:
- asmdecl # report mismatches between assembly files and Go declarations
Expand Down Expand Up @@ -36,19 +35,10 @@ linters:
disable-all: true
enable:
- gofmt # Checks whether code was gofmt-ed
- golint # Differs from gofmt. Gofmt reformats Go source code, whereas golint prints out style mistakes
# - revive # Needs a review of rules https://mattermost.atlassian.net/browse/MM-58690
isacikgoz marked this conversation as resolved.
Show resolved Hide resolved
- gosimple # Linter for Go source code that specializes in simplifying a code
- govet # Examines Go source code and reports suspicious constructs, such as Printf calls whose arguments do not align with the format string
- ineffassign # Detects when assignments to existing variables are not used
- unconvert # Removes unnecessary type conversions
- unused # Checks Go code for unused constants, variables, functions and types
- exportloopref # Checks for pointers to enclosing loop variables

issues:
exclude-rules:
- linters:
# Used to avoid errors regarding naming conventions
# We can remove this section once we decide to fix them
- golint
text: "should be.*ID|CamelCase$|FileJSON`$|ConsoleJSON`$|APIURL`$|calling this Response$"

isacikgoz marked this conversation as resolved.
Show resolved Hide resolved
58 changes: 58 additions & 0 deletions loadtest/control/simulcontroller/actions.go
Original file line number Diff line number Diff line change
Expand Up @@ -67,6 +67,8 @@ func (c *SimulController) disconnect() error {
}

func (c *SimulController) reload(full bool) control.UserActionResponse {
start := time.Now()

if full {
if err := c.disconnect(); err != nil {
return control.UserActionResponse{Err: control.NewUserError(err)}
Expand All @@ -80,6 +82,14 @@ func (c *SimulController) reload(full bool) control.UserActionResponse {
}
}

defer func() {
elapsed := time.Since(start).Seconds()
err := c.user.ObserveClientMetric(model.ClientPageLoadDuration, elapsed)
if err != nil {
mlog.Warn("Failed to store observation", mlog.Err(err))
}
}()

// A full reload always calls GET /api/v4/users?page=0&per_page=100,
// regardless of GraphQL enabled or not
_, err := c.user.GetUsers(0, 100)
Expand Down Expand Up @@ -139,11 +149,17 @@ func (c *SimulController) loginOrSignUp(u user.User) control.UserActionResponse
}

func (c *SimulController) login(u user.User) control.UserActionResponse {
start := time.Now()
for {
resp := control.Login(u)
if resp.Err == nil {
err := c.connect()
if err == nil {
elapsed := time.Since(start).Seconds()
err := c.user.ObserveClientMetric(model.ClientTimeToFirstByte, elapsed)
if err != nil {
mlog.Warn("Failed to store observation", mlog.Err(err))
}
Comment on lines +159 to +163
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still think this is not semantically correct. I'm not sure if it's a problem with the naming (how are we using it exactly in the webapp?) or with the code here. For reference: I would understand this if the metric would be called something like model.ClientTimeToLogin.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, answering to your comment in the other thread (sorry, just read it now):

TIL! We want to measure that whenever a user logs in (thinking of a scenario where opens a web page). But the user could also be logged in with the session so we may need to cover that scenario. The goal of the PR is to measure the load in the server with client metrics, but I think it could be a good idea to actually get realistic metrics from the agents.
Maybe we can add this to backlog, WDYT?

I'm ok adding a ticket to refine this to the backlog, but I'd still like to understand how the webapp uses this specific metric, I think I lack some context here, sorry.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An explanation can be found here: https://web.dev/articles/ttfb and turns out my interpretation is also incorrect. Just sent a canonical way of measuring it with your suggestion 😅

isacikgoz marked this conversation as resolved.
Show resolved Hide resolved
return resp
}
c.status <- c.newErrorStatus(err)
Expand Down Expand Up @@ -280,12 +296,20 @@ func loadTeam(u user.User, team *model.Team, gqlEnabled bool) control.UserAction
}

func (c *SimulController) switchTeam(u user.User) control.UserActionResponse {
start := time.Now()
team, err := u.Store().RandomTeam(store.SelectMemberOf | store.SelectNotCurrent)
if errors.Is(err, memstore.ErrTeamStoreEmpty) {
return control.UserActionResponse{Info: "no other team to switch to"}
} else if err != nil {
return control.UserActionResponse{Err: control.NewUserError(err)}
}
defer func() {
elapsed := time.Since(start).Seconds()
err := c.user.ObserveClientMetric(model.ClientTeamSwitchDuration, elapsed)
if err != nil {
mlog.Warn("Failed to store observation", mlog.Err(err))
}
}()

c.status <- c.newInfoStatus(fmt.Sprintf("switched to team %s", team.Id))

Expand Down Expand Up @@ -434,6 +458,7 @@ func fetchPostsInfo(u user.User, postsIds []string) error {
}

func viewChannel(u user.User, channel *model.Channel) control.UserActionResponse {
start := time.Now()
collapsedThreads, resp := control.CollapsedThreadsEnabled(u)
if resp.Err != nil {
return resp
Expand Down Expand Up @@ -477,6 +502,13 @@ func viewChannel(u user.User, channel *model.Channel) control.UserActionResponse
// frequencies are also calculated that way.
// This is a good enough approximation.
if rand.Float64() < 0.01 {
defer func() {
elapsed := time.Since(start).Seconds()
err := u.ObserveClientMetric(model.ClientRHSLoadDuration, elapsed)
if err != nil {
mlog.Warn("Failed to store observation", mlog.Err(err))
}
}()
excludeFileCount = false
}

Expand Down Expand Up @@ -520,6 +552,7 @@ func viewChannel(u user.User, channel *model.Channel) control.UserActionResponse
}

func switchChannel(u user.User) control.UserActionResponse {
start := time.Now()
team, err := u.Store().CurrentTeam()
if err != nil {
return control.UserActionResponse{Err: control.NewUserError(err)}
Expand All @@ -535,6 +568,13 @@ func switchChannel(u user.User) control.UserActionResponse {
if resp := viewChannel(u, &channel); resp.Err != nil {
return control.UserActionResponse{Err: control.NewUserError(resp.Err)}
}
defer func() {
elapsed := time.Since(start).Seconds()
err := u.ObserveClientMetric(model.ClientChannelSwitchDuration, elapsed)
if err != nil {
mlog.Warn("Failed to store observation", mlog.Err(err))
}
}()

if err := u.SetCurrentChannel(&channel); err != nil {
return control.UserActionResponse{Err: control.NewUserError(err)}
Expand Down Expand Up @@ -1472,6 +1512,7 @@ func sendTypingEventIfEnabled(u user.User, channelId string) error {
}

func (c *SimulController) viewGlobalThreads(u user.User) control.UserActionResponse {
start := time.Now()
collapsedThreads, resp := control.CollapsedThreadsEnabled(u)
if resp.Err != nil || !collapsedThreads {
return resp
Expand Down Expand Up @@ -1505,6 +1546,14 @@ func (c *SimulController) viewGlobalThreads(u user.User) control.UserActionRespo
}
}

defer func() {
elapsed := time.Since(start).Seconds()
err := c.user.ObserveClientMetric(model.ClientGlobalThreadsLoadDuration, elapsed)
if err != nil {
mlog.Warn("Failed to store observation", mlog.Err(err))
}
}()

oldestThreadId := threads[len(threads)-1].PostId
// scrolling between 1 and 3 times
numScrolls := rand.Intn(3) + 1
Expand Down Expand Up @@ -2154,3 +2203,12 @@ func (c *SimulController) generateUserReport(u user.User) control.UserActionResp

return control.UserActionResponse{Info: fmt.Sprintf("generated user report for %d users", totalUsers)}
}

func submitPerformanceReport(u user.User) control.UserActionResponse {
err := u.SubmitPerformanceReport()
if err != nil {
return control.UserActionResponse{Err: control.NewUserError(err)}
}

return control.UserActionResponse{Info: "submitted client performance report"}
}
20 changes: 17 additions & 3 deletions loadtest/control/simulcontroller/periodic.go
Original file line number Diff line number Diff line change
Expand Up @@ -10,19 +10,33 @@ import (

const (
getUsersStatusByIdsInterval = 60 * time.Second
submitClientMetircsInterval = 60 * time.Second
isacikgoz marked this conversation as resolved.
Show resolved Hide resolved
)

func (c *SimulController) periodicActions(wg *sync.WaitGroup) {
defer wg.Done()
st := time.NewTicker(getUsersStatusByIdsInterval)
mt := time.NewTicker(submitClientMetircsInterval)
isacikgoz marked this conversation as resolved.
Show resolved Hide resolved

defer func() {
mt.Stop()
st.Stop()
wg.Done()
}()

for {
select {
case <-time.After(getUsersStatusByIdsInterval):
case <-st.C:
if resp := c.getUsersStatuses(); resp.Err != nil {
c.status <- c.newErrorStatus(resp.Err)
} else {
c.status <- c.newInfoStatus(resp.Info)
}
// We can add more periodic actions here.
case <-mt.C:
if resp := submitPerformanceReport(c.user); resp.Err != nil {
c.status <- c.newErrorStatus(resp.Err)
} else {
c.status <- c.newInfoStatus(resp.Info)
}
case <-c.disconnectChan:
return
}
Expand Down
57 changes: 57 additions & 0 deletions loadtest/store/memstore/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -49,6 +49,7 @@ type MemStore struct {
threadsQueue *CQueue[model.ThreadResponse]
sidebarCategories map[string]map[string]*model.SidebarCategoryWithChannels
featureFlags map[string]bool
report *model.PerformanceReport
}

// New returns a new instance of MemStore with the given config.
Expand Down Expand Up @@ -125,6 +126,7 @@ func (s *MemStore) Clear() {
s.threadsQueue.Reset()
clearMap(s.sidebarCategories)
s.sidebarCategories = map[string]map[string]*model.SidebarCategoryWithChannels{}
s.report = &model.PerformanceReport{}
}

func (s *MemStore) setupQueues(config *Config) error {
Expand Down Expand Up @@ -1185,3 +1187,58 @@ func (s *MemStore) PostsWithAckRequests() ([]string, error) {

return ids, nil
}

func (s *MemStore) SetPerformanceReport(report *model.PerformanceReport) {
s.lock.Lock()
defer s.lock.Unlock()

s.report = report
}

func (s *MemStore) PerformanceReport() (*model.PerformanceReport, error) {
s.lock.Lock()
defer s.lock.Unlock()
if s.report == nil {
return nil, nil
}

report := &model.PerformanceReport{
Version: s.report.Version,
ClientID: s.report.ClientID,
Start: s.report.Start,
End: s.report.End,
}

if s.report.Labels != nil {
report.Labels = make(map[string]string)
}
for k, v := range s.report.Labels {
report.Labels[k] = v
}

if s.report.Histograms != nil {
report.Histograms = make([]*model.MetricSample, len(s.report.Histograms))
}
for i, h := range s.report.Histograms {
report.Histograms[i] = &model.MetricSample{
Metric: h.Metric,
Value: h.Value,
Timestamp: h.Timestamp,
Labels: h.Labels,
}
}

if s.report.Counters != nil {
report.Counters = make([]*model.MetricSample, len(s.report.Counters))
}
for i, h := range s.report.Counters {
report.Counters[i] = &model.MetricSample{
Metric: h.Metric,
Value: h.Value,
Timestamp: h.Timestamp,
Labels: h.Labels,
}
}

return report, nil
}
6 changes: 6 additions & 0 deletions loadtest/store/store.go
Original file line number Diff line number Diff line change
Expand Up @@ -144,6 +144,9 @@ type UserStore interface {

// PostsWithAckRequests returns IDs of the posts that asked for acknowledgment.
PostsWithAckRequests() ([]string, error)

// PerformanceReport returns a copy of underlying performance report
PerformanceReport() (*model.PerformanceReport, error)
}

// MutableUserStore is a super-set of UserStore which, apart from providing
Expand Down Expand Up @@ -252,4 +255,7 @@ type MutableUserStore interface {

// SidebarCategories
SetCategories(teamID string, sidebarCategories *model.OrderedSidebarCategories) error

// ClientPerformance
SetPerformanceReport(report *model.PerformanceReport)
}
4 changes: 4 additions & 0 deletions loadtest/user/user.go
Original file line number Diff line number Diff line change
Expand Up @@ -317,4 +317,8 @@ type User interface {
// GraphQL
GetInitialDataGQL() error
GetChannelsAndChannelMembersGQL(teamID string, includeDeleted bool, channelsCursor, channelMembersCursor string) (string, string, error)

// Client Metrics
ObserveClientMetric(t model.MetricType, v float64) error
SubmitPerformanceReport() error
}
68 changes: 68 additions & 0 deletions loadtest/user/userentity/report.go
Original file line number Diff line number Diff line change
@@ -0,0 +1,68 @@
package userentity

import (
"context"
"math/rand"
"time"

"github.com/mattermost/mattermost/server/public/model"
)

var (
userAgents = []string{"desktop", "firefox", "chrome", "safari", "edge", "other"}
platforms = []string{"linux", "macos", "ios", "android", "windows", "other"}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's just use other/other for this. If we use all the platforms, then this becomes hard for a user to open a browser session and observe the "actual" client metrics in a live load test.

More context here: https://community.mattermost.com/core/pl/9y8cgyxxq7yq5yjorpqykbjkrw

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for bringing this up. I think this approach will strike good balance between team wanting to use load testing tool for the web client metrics and tool being used to measure impact of web client metrics with load test.

)

func (ue *UserEntity) ObserveClientMetric(t model.MetricType, v float64) error {
report, err := ue.store.PerformanceReport()
if err != nil {
return err
}
defer ue.store.SetPerformanceReport(report)

switch t {
case model.ClientTimeToFirstByte, model.ClientFirstContentfulPaint, model.ClientLargestContentfulPaint,
model.ClientInteractionToNextPaint, model.ClientCumulativeLayoutShift, model.ClientChannelSwitchDuration,
model.ClientTeamSwitchDuration, model.ClientRHSLoadDuration:
if report.Histograms == nil {
report.Histograms = make([]*model.MetricSample, 0)
}

report.Histograms = append(report.Histograms, &model.MetricSample{
Metric: t,
Value: v,
Timestamp: float64(time.Now().UnixMilli()) / 1000,
})
default:
// server also ignores the unkown typed metrics
}
return nil
}

func (ue *UserEntity) SubmitPerformanceReport() error {
report, err := ue.store.PerformanceReport()
if err != nil {
return err
}
report.End = float64(time.Now().UnixMilli() / 1000)
isacikgoz marked this conversation as resolved.
Show resolved Hide resolved

_, err = ue.client.SubmitClientMetrics(context.Background(), report)
if err != nil {
return err
}
ue.store.SetPerformanceReport(&model.PerformanceReport{
Start: float64(time.Now().UnixMilli() / 1000),
isacikgoz marked this conversation as resolved.
Show resolved Hide resolved
})

return nil
}

func randomUserAgent() string {
i := rand.Intn(len(userAgents))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Whaaat? Are you saying there's an equal chance of Safari sending metrics than Chrome? :p

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Vanilla macos users are quite a lot among devs to be fair :p

return userAgents[i]
}

func randomPlatform() string {
i := rand.Intn(len(platforms))
return platforms[i]
}
9 changes: 9 additions & 0 deletions loadtest/user/userentity/user.go
Original file line number Diff line number Diff line change
Expand Up @@ -147,6 +147,15 @@ func New(setup Setup, config Config) *UserEntity {
if err != nil {
return nil
}
ue.store.SetPerformanceReport(&model.PerformanceReport{
Version: "0.1.0",
Labels: map[string]string{
"platform": randomPlatform(),
"agent": randomUserAgent(),
},
ClientID: model.NewId(),
Start: float64(time.Now().UnixMilli() / 1000),
isacikgoz marked this conversation as resolved.
Show resolved Hide resolved
})

return &ue
}
Expand Down
Loading