Add ability to handle context cancellations for TCP protocol #1389

Merged: 13 commits, Sep 23, 2024

@tinybit tinybit commented Aug 27, 2024

Summary

Issue:

Checklist

Delete items not relevant to your PR:

  • Unit and integration tests covering the common scenarios were added
  • A human-readable description of the changes was provided to include in CHANGELOG
  • For significant changes, documentation in https://github.com/ClickHouse/clickhouse-docs was updated with further explanations or tutorials

@tinybit tinybit added the bug label Aug 27, 2024
@tinybit tinybit self-assigned this Aug 27, 2024
@jkaflik jkaflik self-requested a review August 27, 2024 13:24
Contributor

@jkaflik jkaflik left a comment

LGTM. I will do a small testing on my side and then we can merge.

conn.go Outdated
Comment on lines 240 to 250
if c.closed {
err := errors.New("attempted sending on closed connection")
c.debugf("[send data] err: %v", err)
return err
}

if c.buffer == nil {
err := errors.New("attempted sending on nil buffer")
c.debugf("[send data] err: %v", err)
return err
}
Contributor

Is this something you found during debugging, or did you add it presumably just to be safe?

Collaborator Author

yup

Collaborator Author

hmm, this is outdated code, please check the latest changes
https://github.com/ClickHouse/clickhouse-go/pull/1389/files

Collaborator Author

Also, safety is not presumed; it's necessary here.
Previously, in conn_process.go, access to c.close and c.buffer was single-threaded.
Now I'm reading data in a background goroutine, and in the foreground thread I'm waiting on a select:

// do reads in background
errCh := make(chan error, 1)
doneCh := make(chan bool, 1)

go func() {
	err := c.processImpl(ctx, on) // this accesses c.reader internally (reads data)
	if err != nil {
		errCh <- err
		return
	}

	doneCh <- true
}()

// select on context or read channel (errors)
select {
case <-ctx.Done():
	c.cancel() // calls c.close(), it accesses c.reader internally (sets c.reader to nil)
	return ctx.Err()

case err := <-errCh:
	return err

case <-doneCh:
	return nil
}
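
The pattern above can be tried in isolation. What follows is a minimal sketch under stated assumptions, not the driver's code: `runWithContext`, `demoCancelled` and `demoCompleted` are hypothetical names, the `onCancel` callback stands in for `c.cancel()`, and a single buffered error channel (carrying nil on success) replaces the separate errCh and doneCh.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// runWithContext runs work in a background goroutine and returns as soon as
// either work finishes or ctx is done. onCancel stands in for c.cancel():
// it must make the background work stop (for example by closing the socket).
func runWithContext(ctx context.Context, work func() error, onCancel func()) error {
	// Buffered so the goroutine can always send, even after we have returned.
	errCh := make(chan error, 1)

	go func() {
		errCh <- work() // nil on success
	}()

	select {
	case <-ctx.Done():
		onCancel()
		return ctx.Err()
	case err := <-errCh:
		return err
	}
}

// demoCancelled simulates a read that blocks until it is interrupted.
func demoCancelled() error {
	stop := make(chan struct{})
	ctx, cancel := context.WithTimeout(context.Background(), 20*time.Millisecond)
	defer cancel()

	blockingWork := func() error {
		<-stop // blocks like a read on an idle connection
		return errors.New("interrupted")
	}
	return runWithContext(ctx, blockingWork, func() { close(stop) })
}

// demoCompleted simulates work that finishes well before the deadline.
func demoCompleted() error {
	ctx, cancel := context.WithTimeout(context.Background(), time.Second)
	defer cancel()
	return runWithContext(ctx, func() error { return nil }, func() {})
}

func main() {
	fmt.Println("cancelled:", demoCancelled())
	fmt.Println("completed:", demoCompleted())
}
```

The channel is buffered for the same reason as in the snippet above: if the foreground returns on ctx.Done(), the background goroutine can still deliver its result and exit instead of leaking.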
  1. Closing connections must be done very carefully. Currently conn.Close() is broken: calling conn.Close() multiple times can deadlock the thread it was called from.

  2. Also, calling c.close() from multiple threads is not thread-safe; doing so results in data races.

So this had to change from this:

func (c *connect) close() error {
    if c.closed {
        return nil
    }
    c.closed = true
<...>

to a thread-safe version:

func (c *connect) close() error {
	c.closeMutex.Lock()
	if c.closed {
		c.closeMutex.Unlock()
		return nil
	}
	c.closed = true
	c.closeMutex.Unlock()
<...>

This guarantees that the teardown part of c.close() runs at most once, no matter how many times or from how many threads c.close() is called.
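
The at-most-once behaviour can be checked with a small standalone program. This is a sketch under assumptions: `connection`, its `teardowns` counter and `closeConcurrently` are hypothetical names introduced for illustration; only the lock/check/set/unlock sequence mirrors the snippet above.

```go
package main

import (
	"fmt"
	"sync"
)

// connection is a hypothetical stand-in for the driver's connect struct.
type connection struct {
	closeMutex sync.Mutex
	closed     bool
	teardowns  int // counts how often the teardown section ran
}

// close flips the closed flag under the mutex; only the first caller
// gets past the check and reaches the teardown section.
func (c *connection) close() error {
	c.closeMutex.Lock()
	if c.closed {
		c.closeMutex.Unlock()
		return nil
	}
	c.closed = true
	c.closeMutex.Unlock()

	// Teardown section: reached by exactly one caller.
	c.teardowns++
	return nil
}

// closeConcurrently calls close from n goroutines and reports how many
// times the teardown section ran.
func closeConcurrently(n int) int {
	c := &connection{}
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			_ = c.close()
		}()
	}
	wg.Wait()
	return c.teardowns
}

func main() {
	fmt.Println("teardown runs:", closeConcurrently(100))
}
```

sync.Once would give the same at-most-once guarantee; the difference is that later callers of Once.Do block until the first call has finished, whereas with the flag-under-mutex approach they return immediately.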

  3. Besides this, we need to guarantee thread-safe access to c.reader from c.processImpl(), c.cancel() and c.close(), hence I've added c.readerMutex.

  4. One more thing: the order of operations when closing a connection.
    Old version:

func (c *connect) close() error {
	c.closeMutex.Lock()
	if c.closed {
		c.closeMutex.Unlock()
		return nil
	}
	c.closed = true
	c.closeMutex.Unlock()

	c.buffer = nil

	c.readerMutex.Lock()
	c.reader = nil
	c.readerMutex.Unlock()

	if err := c.conn.Close(); err != nil {
		return err
	}
	
	return nil
}

As you can see, the old version changes c.buffer and c.reader, which might currently be reading or sending data, before it closes the network connection. The connection must be closed first in order to stop any I/O operations on c.buffer and c.reader (and related contexts); only then is it safe to deallocate c.buffer and c.reader.

New version:

func (c *connect) close() error {
	c.closeMutex.Lock()
	if c.closed {
		c.closeMutex.Unlock()
		return nil
	}
	c.closed = true
	c.closeMutex.Unlock()

	if err := c.conn.Close(); err != nil {
		return err
	}

	c.buffer = nil

	c.readerMutex.Lock()
	c.reader = nil
	c.readerMutex.Unlock()

	
	return nil
}
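
The effect of closing the network connection first can be seen with an in-memory pipe. This is a hedged sketch, not the driver's implementation: `connection`, `readByte` and `demoCloseUnblocksRead` are hypothetical names, and net.Pipe stands in for a real TCP socket.

```go
package main

import (
	"bufio"
	"errors"
	"fmt"
	"net"
	"sync"
	"time"
)

// connection is a hypothetical stand-in for the driver's connect struct.
type connection struct {
	conn        net.Conn
	readerMutex sync.Mutex
	reader      *bufio.Reader
}

// readByte takes a snapshot of the reader under the mutex, then blocks
// until a byte arrives or the underlying connection is closed.
func (c *connection) readByte() (byte, error) {
	c.readerMutex.Lock()
	r := c.reader
	c.readerMutex.Unlock()
	if r == nil {
		return 0, errors.New("reader already released")
	}
	return r.ReadByte()
}

// close shuts the network connection first, which makes any pending read
// return with an error, and only then drops the reader.
func (c *connection) close() error {
	if err := c.conn.Close(); err != nil {
		return err
	}
	c.readerMutex.Lock()
	c.reader = nil
	c.readerMutex.Unlock()
	return nil
}

// demoCloseUnblocksRead starts a read that can never complete (nothing is
// written to the pipe), closes the connection, and returns the read's error.
func demoCloseUnblocksRead() error {
	client, server := net.Pipe()
	defer server.Close()
	c := &connection{conn: client, reader: bufio.NewReader(client)}

	readErr := make(chan error, 1)
	go func() {
		_, err := c.readByte()
		readErr <- err
	}()

	// Give the reader time to block; the result is an error either way.
	time.Sleep(10 * time.Millisecond)
	if err := c.close(); err != nil {
		return err
	}
	return <-readErr
}

func main() {
	fmt.Println("read returned:", demoCloseUnblocksRead())
}
```

Because the pending read is interrupted by closing the connection, nothing is still using the reader when it is set to nil afterwards.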

conn.go Outdated
@@ -131,6 +139,8 @@ type connect struct {
readTimeout time.Duration
blockBufferSize uint8
maxCompressionBuffer int
mutex sync.Mutex
Contributor

Do you think it makes sense to name it explicitly?

Suggested change
- mutex sync.Mutex
+ readerMutex sync.Mutex

Collaborator Author

yeah, good idea

Collaborator Author

done

conn.go Outdated
Comment on lines 258 to 268
if c.isClosed() {
err := errors.New("attempted sending on closed connection")
c.debugf("[send data] err: %v", err)
return err
}

if c.buffer == nil {
err := errors.New("attempted sending on nil buffer")
c.debugf("[send data] err: %v", err)
return err
}
Contributor

Is this something you found while debugging, or was it only added for safety?

Collaborator Author

@tinybit tinybit Aug 30, 2024

hmm, this is outdated code, please check the latest changes
https://github.com/ClickHouse/clickhouse-go/pull/1389/files

it was all found during debugging, yes.

Note: the if c.buffer == nil { check is already removed in the latest commit; it was an incorrect assumption on my side while trying to solve synchronisation issues.

The if c.isClosed() { check is necessary: we can't perform read/send operations on closed connections.
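
A guard of this kind could look roughly like the following. This is a sketch under assumptions: `connection`, `markClosed`, `sendData`, `errClosed` and the in-memory `sent` slice are hypothetical and only illustrate reading the closed flag under the same mutex that close uses.

```go
package main

import (
	"errors"
	"fmt"
	"sync"
)

// errClosed reuses the error text from the reviewed snippet.
var errClosed = errors.New("attempted sending on closed connection")

// connection is a hypothetical stand-in for the driver's connect struct.
type connection struct {
	closeMutex sync.Mutex
	closed     bool
	sent       [][]byte // stands in for data written to the socket
}

// isClosed reads the flag under the mutex so it never races with close.
func (c *connection) isClosed() bool {
	c.closeMutex.Lock()
	defer c.closeMutex.Unlock()
	return c.closed
}

// markClosed is a reduced close: it only sets the flag.
func (c *connection) markClosed() {
	c.closeMutex.Lock()
	c.closed = true
	c.closeMutex.Unlock()
}

// sendData refuses to send once the connection is marked closed.
func (c *connection) sendData(p []byte) error {
	if c.isClosed() {
		return errClosed
	}
	c.sent = append(c.sent, p)
	return nil
}

// sendAfterClose sends once before and once after closing.
func sendAfterClose() (before, after error) {
	c := &connection{}
	before = c.sendData([]byte("ping"))
	c.markClosed()
	after = c.sendData([]byte("ping"))
	return before, after
}

func main() {
	before, after := sendAfterClose()
	fmt.Println("before close:", before)
	fmt.Println("after close:", after)
}
```

The check only gives an early, readable error. A connection can still be closed between the check and the write, so the write path has to handle I/O errors as well.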

@jkaflik jkaflik changed the title Add ability to handle context cancellations Add ability to handle context cancellations for TCP protocol Aug 30, 2024
Contributor

jkaflik commented Aug 30, 2024

I conducted basic tests. This will be merged with the next minor release in two weeks.

@jkaflik jkaflik merged commit 014acb5 into main Sep 23, 2024
14 checks passed
codeboten referenced this pull request in open-telemetry/opentelemetry-collector-contrib Sep 24, 2024
….29.0 (#35397)

This PR contains the following updates:

| Package | Change | Age | Adoption | Passing | Confidence |
|---|---|---|---|---|---|
| [github.com/ClickHouse/clickhouse-go/v2](https://redirect.github.com/ClickHouse/clickhouse-go) | `v2.28.3` -> `v2.29.0` | [![age](https://developer.mend.io/api/mc/badges/age/go/github.com%2fClickHouse%2fclickhouse-go%2fv2/v2.29.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![adoption](https://developer.mend.io/api/mc/badges/adoption/go/github.com%2fClickHouse%2fclickhouse-go%2fv2/v2.29.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![passing](https://developer.mend.io/api/mc/badges/compatibility/go/github.com%2fClickHouse%2fclickhouse-go%2fv2/v2.28.3/v2.29.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) | [![confidence](https://developer.mend.io/api/mc/badges/confidence/go/github.com%2fClickHouse%2fclickhouse-go%2fv2/v2.28.3/v2.29.0?slim=true)](https://docs.renovatebot.com/merge-confidence/) |

---

> [!WARNING]
> Some dependencies could not be looked up. Check the Dependency
Dashboard for more information.

---

### Release Notes

<details>
<summary>ClickHouse/clickhouse-go
(github.com/ClickHouse/clickhouse-go/v2)</summary>

### [`v2.29.0`](https://redirect.github.com/ClickHouse/clickhouse-go/blob/HEAD/CHANGELOG.md#v2290-2024-09-24----Release-notes-generated-using-configuration-in-githubreleaseyml-at-main---)

[Compare Source](https://redirect.github.com/ClickHouse/clickhouse-go/compare/v2.28.3...v2.29.0)

#### What's Changed

##### Enhancements 🎉

- Add ability to handle context cancellations for TCP protocol by
[@&#8203;tinybit](https://redirect.github.com/tinybit) in
[https://github.com/ClickHouse/clickhouse-go/pull/1389](https://redirect.github.com/ClickHouse/clickhouse-go/pull/1389)

##### Other Changes 🛠

- Add Examples for batch.Column(n).AppendRow in columnar_insert.go by
[@&#8203;achmad-dev](https://redirect.github.com/achmad-dev) in
[https://github.com/ClickHouse/clickhouse-go/pull/1410](https://redirect.github.com/ClickHouse/clickhouse-go/pull/1410)

#### New Contributors

- [@&#8203;achmad-dev](https://redirect.github.com/achmad-dev) made
their first contribution in
[https://github.com/ClickHouse/clickhouse-go/pull/1410](https://redirect.github.com/ClickHouse/clickhouse-go/pull/1410)
- [@&#8203;tinybit](https://redirect.github.com/tinybit) made their
first contribution in
[https://github.com/ClickHouse/clickhouse-go/pull/1389](https://redirect.github.com/ClickHouse/clickhouse-go/pull/1389)

**Full Changelog**:
ClickHouse/clickhouse-go@v2.28.3...v2.29.0

</details>

---

### Configuration

📅 **Schedule**: Branch creation - "on tuesday" (UTC), Automerge - At any
time (no schedule defined).

🚦 **Automerge**: Disabled by config. Please merge this manually once you
are satisfied.

♻ **Rebasing**: Whenever PR becomes conflicted, or you tick the
rebase/retry checkbox.

🔕 **Ignore**: Close this PR and you won't be reminded about this update
again.

---

- [ ] <!-- rebase-check -->If you want to rebase/retry this PR, check
this box

---

This PR was generated by [Mend Renovate](https://mend.io/renovate/).
View the [repository job
log](https://developer.mend.io/github/open-telemetry/opentelemetry-collector-contrib).

---------

Co-authored-by: renovate[bot] <29139614+renovate[bot]@users.noreply.github.com>
Co-authored-by: opentelemetrybot <[email protected]>
Co-authored-by: Yang Song <[email protected]>