-
Notifications
You must be signed in to change notification settings - Fork 12
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Daemon process #18
Daemon process #18
Conversation
adopt grpc define protocol update deps
add nicpool event loop refactor provider
add log messages
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
除了上面几个小问题,整体上没大问题,但要注意两点:
- 通过 pool.Add, pool.Done 进行异步操作的地方较多,如果这块出现 bug 可能导致死锁。
- NicPool 的逻辑需要想办法通过单元测试验证下,
pkg/server/nicpool.go
Outdated
|
||
resoureStub *qingcloud.QCNicProvider | ||
|
||
sync.WaitGroup |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这里 pool 自己本身既是 sync.RWMutex 也是 sync.WaitGroup 看后面的代码的时候感觉会比较奇怪,不如作为独立的字段容易理解。此处仅仅建议。
pkg/server/nicpool.go
Outdated
go func() { | ||
defer pool.Done() | ||
//check if nic is in dict | ||
pool.RLock() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个 lock 主要是为了保护 pool.nicDict, 我觉得要么扩展一个 sync map 出来,要么用 func 把从 map 中获取数据的操作封装起来,否则这种 lock,unlock 在很多地方出现,容易出错。
cmd/hostnic/hostnic.go
Outdated
return err | ||
} | ||
nic, err := nicProvider.CreateNic() | ||
conn, err := grpc.Dial(conf.BindAddr,grpc.WithInsecure()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
这个 BindAddr 要有个默认值,如果没配置就用默认值。
validate vxnet during bootstrap
validate
validate vxnet
validate vxnet
get all items in gateway map
为了避免死锁网卡信息和网卡缓冲池是分开维护的。这里可能有不同步所以加了validationfunc确保拿到的网卡在线。加waitgroup是因为需要在关闭的时候删除返还的网卡。只有在关闭的时候才会调用wait。 |
pkg/server/nicpool.go
Outdated
cachedlist = append(cachedlist, pkg.StringPtr(nic)) | ||
log.Debugf("Got nic %s in nic pool", nic) | ||
} | ||
err := pool.nicProvider.ReclaimNic(cachedlist) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
释放空闲 nic 的操作可以提供个接口,由系统调用。不要在 shutdown pool 的时候进行操作,因为关闭 hostnic daemon 的时候是在 IaaS 操作的一个 job 中,在 job 中发起新的 job 会导致死锁。
pkg/server/nicpool.go
Outdated
pool := &NicPool{ | ||
nicDict: cmap.New(), | ||
nicpool: make(chan string, size), | ||
nicReadyPool: make(chan string, ReadyPoolSize), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
感觉 nicReadyPool size 可以直接设置成系统支持的最大 nic 数,如果 readypool 放不进去,就说明到达上限了,直接删除就行。然后有个后台的 go routine 定时清理 readypool 到一个合理的值,比如 ReadyPoolSize。 感觉这样逻辑也更清楚些,对突发的大量容器反复创建销毁的支持也更友好些。你考虑下。
add clean/stop command add prometheus endpoint
add clean/stop command add prometheus endpoint
add unit test cases
fix unit test cases
update readme
add test case for clearcache
daemon process #9
-- fill nic pool with nic id.
-- maintain nic info table
-- delete cached nic when exit.
-- manage and allocate gateway nic
-- add version number generator script
-- validate vxnet before launch server #16
-- update README
-- refactor nic pool
-- add prometheus monitoring
-- add clean cache command