Skip to content
This repository has been archived by the owner on Jan 20, 2023. It is now read-only.

system journal fills with errors: _VmDirConsumePartner failed, error code (51) #39

Open
1 of 3 tasks
oddboy opened this issue Dec 31, 2018 · 5 comments
Open
1 of 3 tasks

Comments

@oddboy
Copy link

oddboy commented Dec 31, 2018

Describe the bug
Once lightwave is installed on two domain controllers, the logs fill with errors like:

Dec 31 21:53:49 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:54:10 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:54:34 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:54:36 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:55:38 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51)

From looking at the source code, it appears that error code 51 is "LDAP SERVER BUSY". I have found some references to this pertaining to OpenLDAP and Microsoft's AD LDAP implementation. However, neither domain controller is overly busy, CPU utilization less than 5% generally with 0.2 load average.

Impact

  • Low - Not really in your way but annoyance
  • Medium - Can be worked around
  • High - Blocker

Expected behavior
Unclear. I can't tell if there is a problem or not. Changes to the directory (like create a user) seem to work correctly and get replicated to peers.

Observed behavior
The error seems to imply that there is a communication problem between domain controller nodes. The by product of this is that the logs are filled with these messages about every 40 seconds or so. There are times during which this message is not logged, but those often occur when the systems (in the "lab") are not being used for much of anything, so I'm not sure if these log lines are indication of an actual problem.

To Reproduce
Steps to reproduce the behavior:

  1. Install and configure lightwave on first node.
  2. Replace default certificates with enterprise signed CA
  3. Add a secondary node to domain
  4. Check logs, see error

Environment:

  • OS: Photon 2.0, PHOTON_BUILD_NUMBER=0922243
  • Lightwave Version: 1.3.1.7-1.ph2
  • Likewise version: 6.2.11.4-4.ph2

Additional context
As noted above, the only deviation from a very default install in my case was that I replaced the default certificate on the first node with one signed by our enterprise CA. Note however that the same problem occurs even with a fully default configuration, that is, with no customizations at all.

@oddboy
Copy link
Author

oddboy commented Jan 4, 2019

additional logging seems to reveal that this error is caused by a failure of CLDAP queries, aka "LDAP Pings". For reference, https://ldapwiki.com/wiki/LDAP%20ping.

Jan 04 04:34:24 dc01 vmdird[1168]: t@140192521856768: VmDirSendLdapResponse: code (97), Error (14)() Jan 04 04:34:24 dc01 vmdird[1168]: t@140192521856768: VmDirSendLdapResponse: code (97), Error (14)() Jan 04 04:34:24 dc01 vmdird[1168]: t@140192521856768: Request Control: 1.3.6.1.4.1.4203.1.9.1.1 Jan 04 04:34:25 dc01 vmdird[1168]: t@140193663264512: _VmDirConsumePartner failed, error code (51) Jan 04 04:34:25 dc01 vmdird[1168]: t@140192521856768: VmDirSendLdapResult: Request (Search), Error (0), Message (), (0) socket (192.168.3.118)

the OID in the above log extract is the one OpenLDAP uses for these CLDAP messages. From what I understand, these are meant to be done over UDP.

However, it appears that on my lightwave installation, nothing is listening on UDP/389.

So, it appears as if any functions that depend on the status of these CLDAP pings fail because nothing is listending on UDP/389.

is there a way to get vmdird to listen on UDP?

@oddboy
Copy link
Author

oddboy commented Jan 4, 2019

a packet capture that shows a SRV DNS querie for _ldap._tcp.dc._msdcs., followed by a series of A and AAAA lookups for the listed DCs, then a connection attempt from the client trying to join to both of the DCs on UDP/389, followed by ICMP unreachable messages indicating that there's nothing accepting connections on the DC's on UDP/389.

screen shot here

so, I guess it's a likewise problem rather than a lightwave / vmdir issue?

not sure where to go from here...

@sruo
Copy link
Contributor

sruo commented Jan 4, 2019

Describe the bug
Once lightwave is installed on two domain controllers, the logs fill with errors like:

Dec 31 21:53:49 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:54:10 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:54:34 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:54:36 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51) Dec 31 21:55:38 dc01 vmdird[1532]: t@139860643936000: _VmDirConsumePartner failed, error code (51)

Hi Oddboy:
Above error is from consumer is benign. In this version, when a node is pulling changes from a partner, it will send LDAP_BUSY(51) when receiving a replication search request.

If you upgrade to latest Lightwave version, this problem no longer exits.
Let me know if you have further questions.

Thanks.

@sruo
Copy link
Contributor

sruo commented Jan 4, 2019

"

So, it appears as if any functions that depend on the status of these CLDAP pings fail because nothing is listending on UDP/389.

is there a way to get vmdird to listen on UDP?
"
Current version of vmdir does not listen on UPD port.
Which feature/function in your use case requires CLDAP (UDP) ping?

@oddboy
Copy link
Author

oddboy commented Jan 14, 2019

I can see that the commands lw-get-dc-name and domainjoin-cli appear to use the CLDAP queries as these fail with error:

Failed to query time on domain controller. Error code 2453 (NERR_DCNotFound).

I'm not sure what other effect this might be having

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants