Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[RE-5863] Resolve potential race condition in log fields #185

Open
wants to merge 9 commits into
base: master
Choose a base branch
from

Conversation

jalacardio
Copy link

@jalacardio jalacardio commented Aug 3, 2022

Description

In current version of logFields, we discovered that there is a potential risk of having racing condition when multiple goroutine try to access logFields under an identical context (ctx). As you may see from the report below, when running parallelism test on the current implementation, a race condition is emerged even with a very little sample rate.
Therefore, in this PR, we are turning the old logFields into protected logFields by applying read/write lock. Now when logFields being accessed, it will first check if some other instance is also accessing it, if yes, it will wait until the access is freed up so no simultaneous write will happen.
In effects, FromContext doesn't seems to have it's performance decreased, while AddToLogContext increased about 10% of it's process time, and it's all being said with race condition is no longer a concern.

Review Guideline

FromContext
Applied with read lock as it's a public method, now we are return a copy of the map instead of giving access directly to logFields

fromContext
Same implementation as before, but now it's only for private access

AddToLogContext
Now write access is locked when writing new data to logFields

How to Test

Parallelism Test
Previously

Reader 2 read from logfields map[k1:v1 k2:v2]
Reader 3 read from logfields map[k1:v1 k2:v2]
Reader 1 read from logfields map[k1:v1 k2:v2]
==================
WARNING: DATA RACE
Write at 0x00c00009f380 by goroutine 14:
  runtime.mapassign_faststr()
      /Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
  github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
  github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
  github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite.func1()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:61 +0xa4

Previous write at 0x00c00009f380 by goroutine 16:
  runtime.mapassign_faststr()
      /Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
  github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
  github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
  github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite.func1()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:61 +0xa4

Goroutine 14 (running) created at:
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:59 +0x204
  testing.tRunner()
      /Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202

Goroutine 16 (running) created at:
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelWrite()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:59 +0x204
  testing.tRunner()
      /Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
==================
Writer 3 wrote key3:val8081
Writer 2 wrote key2:val1847
Writer 1 wrote key1:val7887
lf map[key1:val7887 key2:val1847 key3:val8081 test-key:test-value]
--- FAIL: TestParallelWrite (0.25s)
    testing.go:1092: race detected during execution of test
==================
WARNING: DATA RACE
Write at 0x00c000196090 by goroutine 23:
  runtime.mapassign_faststr()
      /Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
  github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
  github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
  github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func2()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:93 +0xa4

Previous write at 0x00c000196090 by goroutine 22:
  runtime.mapassign_faststr()
      /Users/chiahewen/.goenv/versions/1.16.8/src/runtime/map_faststr.go:202 +0x0
  github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x11d
  github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
  github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func2()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:93 +0xa4

Goroutine 23 (running) created at:
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:91 +0x2c5
  testing.tRunner()
      /Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202

Goroutine 22 (running) created at:
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:91 +0x2c5
  testing.tRunner()
      /Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
==================
Writer 3 wrote key3:val1318
Reader 3 read from logfields map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
==================
WARNING: DATA RACE
Read at 0x00c00019c098 by goroutine 20:
  reflect.typedmemmove()
      /Users/chiahewen/.goenv/versions/1.16.8/src/runtime/mbarrier.go:177 +0x0
  reflect.copyVal()
      /Users/chiahewen/.goenv/versions/1.16.8/src/reflect/value.go:1310 +0x7d
  reflect.(*MapIter).Value()
      /Users/chiahewen/.goenv/versions/1.16.8/src/reflect/value.go:1264 +0x186
  internal/fmtsort.Sort()
      /Users/chiahewen/.goenv/versions/1.16.8/src/internal/fmtsort/sort.go:65 +0x2d9
  fmt.(*pp).printValue()
      /Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:769 +0x14cf
  fmt.(*pp).printArg()
      /Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:712 +0x284
  fmt.(*pp).doPrintf()
      /Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:1026 +0x330
  fmt.Fprintf()
      /Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:204 +0x84
  fmt.Printf()
      /Users/chiahewen/.goenv/versions/1.16.8/src/fmt/print.go:213 +0x111
  github.com/carousell/Orion/utils/log/loggers/test_test.readWorker()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:23 +0x67
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func1()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:85 +0x99

Previous write at 0x00c00019c098 by goroutine 22:
  github.com/carousell/Orion/utils/log/loggers.LogFields.Add()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:19 +0x132
  github.com/carousell/Orion/utils/log/loggers.AddToLogContext()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/fields.go:38 +0xcd
  github.com/carousell/Orion/utils/log/loggers/test_test.writeWorker()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:29 +0x1ce
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite.func2()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:93 +0xa4

Goroutine 20 (running) created at:
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:83 +0x20e
  testing.tRunner()
      /Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202

Goroutine 22 (running) created at:
  github.com/carousell/Orion/utils/log/loggers/test_test.TestParallelReadAndWrite()
      /Users/chiahewen/go/1.16.8/src/github.com/carousell/Orion/utils/log/loggers/test/parallelism_test.go:91 +0x2c5
  testing.tRunner()
      /Users/chiahewen/.goenv/versions/1.16.8/src/testing/testing.go:1193 +0x202
==================
Reader 2 read from logfields map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
Reader 1 read from logfields map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
Writer 1 wrote key1:val4059
Writer 2 wrote key2:val2081
map[key1:val4059 key2:val2081 key3:val1318 test-key:test-value]
--- FAIL: TestParallelReadAndWrite (0.25s)
    testing.go:1092: race detected during execution of test
FAIL
exit status 1
FAIL    github.com/carousell/Orion/utils/log/loggers/test       1.136s

Now

Reader 9 read from logfields map[k1:v1 k2:v2]
Reader 2 read from logfields map[k1:v1 k2:v2]
...
Writer 47 wrote key47:val4078
Writer 48 wrote key48:val6159
lf map[key1:val1847 key10:val8511 key11:val3300 key12:val694 key13:val1211 key14:val8162 key15:val3237 key16:val495 key17:val9947 key18:val4728 key19:val3274 key2:val7887 key20:val5089 key21:val1445 key22:val9106 key23:val1528 key24:val5466 key25:val6258 key26:val8047 key27:val8287 key28:val2790 key29:val2888 key3:val1318 key30:val3015 key31:val5541 key32:val408 key33:val7387 key34:val6831 key35:val5429 key36:val1737 key37:val631 key38:val1485 key39:val5194 key4:val8081 key40:val5356 key41:val6413 key42:val3090 key43:val5026 key44:val4147 key45:val563 key46:val4324 key47:val4078 key48:val6159 key49:val2433 key5:val2081 key50:val1353 key6:val4059 key7:val2540 key8:val4425 key9:val456 test-key:test-value]
Reader 29 read from logfields map[test-key:test-value]
Reader 18 read from logfields map[test-key:test-value]
...
map[key1:val1957 key10:val9355 key11:val2451 key12:val4538 key13:val8266 key14:val156 key15:val2605 key16:val5561 key17:val7202 key18:val9828 key19:val4783 key2:val3721 key20:val4376 key21:val5746 key22:val1563 key23:val9002 key24:val9718 key25:val5447 key26:val5094 key27:val1577 key28:val7463 key29:val7996 key3:val3000 key30:val6420 key31:val8623 key32:val953 key33:val1137 key34:val3133 key35:val9241 key36:val59 key37:val3033 key38:val8643 key39:val3891 key4:val7189 key40:val2002 key41:val8878 key42:val9336 key43:val2546 key44:val9107 key45:val7940 key46:val6503 key47:val552 key48:val9843 key49:val2205 key5:val2199 key50:val1598 key6:val8705 key7:val9703 key8:val2888 key9:val8510 test-key:test-value]
PASS
ok      github.com/carousell/Orion/utils/log/loggers/test       1.200s

Benchmark stat
Previously

cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkFromContext
BenchmarkFromContext-8                  41110058                25.81 ns/op
BenchmarkFromAddToLogContext
BenchmarkFromAddToLogContext-8           2947244               409.4 ns/op

Now

cpu: Intel(R) Core(TM) i7-1068NG7 CPU @ 2.30GHz
BenchmarkFromContext
BenchmarkFromContext-8                  39115351                25.86 ns/op
BenchmarkFromAddToLogContext
BenchmarkFromAddToLogContext-8           2701092               442.9 ns/op

Checklist

  • Test on local
  • Test on stage
  • run go mod tidy and go mod vendor
  • change module's version back to master

Screenshots (Optional)

Related PRs (Optional)

Impacted Endpoints (Optional)

@codecov
Copy link

codecov bot commented Aug 12, 2022

Codecov Report

Merging #185 (b9a446c) into master (dafc9bf) will decrease coverage by 0.51%.
The diff coverage is n/a.

@@            Coverage Diff             @@
##           master     #185      +/-   ##
==========================================
- Coverage   58.49%   57.97%   -0.52%     
==========================================
  Files          13       13              
  Lines         559      564       +5     
==========================================
  Hits          327      327              
- Misses        225      230       +5     
  Partials        7        7              
Impacted Files Coverage Δ
utils/kafka/config.go 0.00% <0.00%> (ø)

📣 We’re building smart automated test selection to slash your CI/CD build times. Learn more

}(i)
}
wgRead.Wait()
wgWrite.Wait()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it seems we could use single wait group here, how about using the same wg?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

okay! changed.

@jalacardio jalacardio marked this pull request as ready for review September 15, 2022 03:12
@jalacardio jalacardio requested a review from a team as a code owner September 15, 2022 03:12
@jalacardio jalacardio changed the title [RE-5863] add test cases for fields [RE-5863] Resolve potential race condition in log fields Sep 15, 2022
utils/log/loggers/fields.go Show resolved Hide resolved
utils/log/loggers/fields.go Outdated Show resolved Hide resolved
utils/log/loggers/fields.go Outdated Show resolved Hide resolved
@achichen achichen self-assigned this Oct 11, 2022
plf is already a pointer thus no need to refetch from ctx and cast again
@ankurs
Copy link
Contributor

ankurs commented Oct 11, 2022

shouldn't you just change logfields to sync.Map instead ?

something along the lines of - https://github.com/go-coldbrew/log/pull/5/files

@jalacardio
Copy link
Author

That is in fact a more concise way to dealing with it, thanks for the info and will look into it.

}
}

func BenchmarkFromAddToLogContext(b *testing.B) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

function name should be BenchmarkAddToLogContext

@@ -12,6 +13,10 @@ var (

//LogFields contains all fields that have to be added to logs
type LogFields map[string]interface{}
type protectedLogFields struct {
content LogFields
mtx sync.RWMutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As @ankurs suggested, sync.Map is indeed a better option.

According to https://pkg.go.dev/sync#Map, it optimizes for our use case:

when the entry for a given key is only ever written once but read many times, as in caches that only grow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants