This repository has been archived by the owner on Jul 23, 2020. It is now read-only.

OpenShift get watch is returning bogus data #3240

Closed
chmouel opened this issue Apr 23, 2018 · 13 comments

Comments

@chmouel

chmouel commented Apr 23, 2018

I am trying to debug an idler bug and have encountered a problem on prod-preview: when using oc get watch, we seem to get a lot of build events that have already been captured, and this seems to loop forever, or at least for a very long time.

The build is called "app-test-1-sadsadsa-1" and seems to be located on 2a for user ssamal-preview.

Without using the idler, just via the CLI, I am capturing the watch output:

oc get build -w --all-namespaces -o json > /tmp/a.json

I then wait for a while, and I can see the same build object showing up again and again:

% jq 'select(.metadata.name == "app-test-1-sadsadsa-1") .metadata.ownerReferences[0].uid' /tmp/a.json|wc -l
     706

This seems to go on forever, and I had to stop the capture before it filled up my laptop's disk space.
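
For reference, here is a small standalone Go sketch (not part of the original report) that does the equivalent of the jq count above: it stream-decodes the captured watch output in /tmp/a.json and counts how often each build shows up. The file path and field names come from the capture command above; everything else is illustrative.

package main

import (
	"encoding/json"
	"fmt"
	"io"
	"os"
)

// Count how many times each build appears in a file produced by
// `oc get build -w --all-namespaces -o json > /tmp/a.json`.
// The file is a stream of concatenated JSON build objects, one per watch event.
func main() {
	f, err := os.Open("/tmp/a.json")
	if err != nil {
		panic(err)
	}
	defer f.Close()

	type build struct {
		Metadata struct {
			Namespace string `json:"namespace"`
			Name      string `json:"name"`
		} `json:"metadata"`
	}

	counts := map[string]int{}
	dec := json.NewDecoder(f)
	for {
		var b build
		if err := dec.Decode(&b); err == io.EOF {
			break
		} else if err != nil {
			panic(err)
		}
		counts[b.Metadata.Namespace+"/"+b.Metadata.Name]++
	}
	for key, n := range counts {
		fmt.Printf("%6d %s\n", n, key)
	}
}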

This is set as SEV1 because I need it fixed before I can fix the other SEV1, #2352.

@kbsingh
Collaborator

kbsingh commented Apr 23, 2018

If we can get a better description and breakdown than 'something', 'someone', 'some things', 'other things', we can try to help triage the issue.

@chmouel
Author

chmouel commented Apr 23, 2018

@kbsingh I am not sure what you are not understanding here; it's all in the snippet above. I have rephrased some of it, if that's what you are asking.

@pbergene
Collaborator

What version of oc tools are you using?

@chmouel
Author

chmouel commented Apr 23, 2018

@pbergene oc v3.6.0+c4dd4cf, but that's happening inside the idler's loop too.

@chmouel
Author

chmouel commented Apr 23, 2018

To explain further what I mean by "happening inside the idler's loop": the idler loop uses an oc watch to know when there is a new build event (see the sketch at the end of this comment). This is what I currently get from the system; it works fine at first and then loops forever on the tenant ssamal-preview:

[screenshot: idler debug output repeatedly printing the same build for the ssamal-preview tenant]

The only added code is this diff:

diff --git a/internal/condition/build_condition.go b/internal/condition/build_condition.go
index 64f7b97..f4444f5 100644
--- a/internal/condition/build_condition.go
+++ b/internal/condition/build_condition.go
@@ -28,6 +28,7 @@ func (c *BuildCondition) Eval(object interface{}) (bool, error) {
 	}
 
 	if u.HasActiveBuilds() {
+		fmt.Printf("DEBUG!!!: %s, %s, %s\n", u.ActiveBuild.Status.StartTimestamp, u.Name, u.ActiveBuild.Metadata.Name)
 		return false, nil
 	}

I can't add anything in this loop to fix fabric8-services/fabric8-jenkins-idler#141 until I can get reliable data.
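
For context on what "the idler loop is using oc watch" means: below is a minimal illustrative sketch of such a watch loop, written against a recent client-go dynamic client. This is not the idler's actual code; the kubeconfig handling, the build.openshift.io/v1 GroupVersionResource, and the client-go version (>= 0.18 for the context-aware Watch signature) are assumptions on my part.

package main

import (
	"context"
	"fmt"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
	"k8s.io/apimachinery/pkg/runtime/schema"
	"k8s.io/client-go/dynamic"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Illustrative only: load ~/.kube/config and watch builds cluster-wide.
	config, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client, err := dynamic.NewForConfig(config)
	if err != nil {
		panic(err)
	}
	builds := schema.GroupVersionResource{Group: "build.openshift.io", Version: "v1", Resource: "builds"}
	w, err := client.Resource(builds).Namespace(metav1.NamespaceAll).Watch(context.Background(), metav1.ListOptions{})
	if err != nil {
		panic(err)
	}
	for ev := range w.ResultChan() {
		b := ev.Object.(*unstructured.Unstructured)
		// Every spurious update described in this issue shows up here
		// as yet another MODIFIED event for the same build.
		fmt.Println(ev.Type, b.GetNamespace(), b.GetName(), b.GetResourceVersion())
	}
}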

@chmouel
Author

chmouel commented Apr 23, 2018

I have discussed this with @pbergene.

According to the monitoring system (Zabbix) there was some idling done on prod, so the idler is still working.

Only some projects are looping forever; not all of them have this issue.

Since there are multiple workers in the idler, only one of them would get stuck in this case while the others would still work properly, which is what seems to be happening.

If other workers get stuck on other bogus namespaces, then we would probably end up with the Jenkins idler not functioning at all anymore.

@pbergene
Collaborator

There are multiple builds that seem to be looping on 2a, but not on 2. Compiling a list on HK.

@aslakknutsen
Collaborator

@chmouel Isn't this just the sync plugin updating the model over and over for no reason?

@chmouel
Author

chmouel commented Apr 23, 2018

@aslakknutsen ah yes, I forgot about that little bugger, thanks. I am still not sure about one thing, then:

  • how come this happens on some projects that have an active build and not on others?

I don't see any difference between the different payloads coming through either.

Fixing #2352 would shut this up anyway.

@chmouel
Author

chmouel commented Apr 23, 2018

Removing the SEV1 since this is the expected bogus behaviour. We should probably make the other issue about the sync-plugin a SEV1, since it's not just a silly bug but affects performance badly. pradeepto, do you know if we have an uber issue for that one?

@jfchevrette
Contributor

I've done some troubleshooting on this. The following namespaces/builds are currently exhibiting this issue on starter-us-east-2a

dgutierr playground-1
djalma-silva-jr helloworldvertx-1
dmitresso openshiftspace-1
fabrice-pipart teknichrono-12
jszhaw osiotest-1
ldimaggi-osiotest5 testapr231524499600658-1
osio-ci-ee9 testapr231524498240285-1
rhn-support-jlee helloworld-vertx-1
ssamal-preview app-test-1-sadsadsa-1
ssamal-preview launcher-test-sunil.app-test-1rtuio1-1

Upon watching the builds with -ojson or -oyaml, I noticed that only the resourceVersion field kept changing. This field is updated whenever OpenShift updates an object. The number on the above builds is incrementing at a pace of ~60 per second.

I then ran a watch in one terminal while I killed the Jenkins instance from another, and it stopped: the build object stopped being updated and thus stopped being reported to my watch command.

So it would appear that something in Jenkins is sending tons of OpenShift API calls per second to update the builds, for some reason. I believe that finding and fixing whatever is doing this in Jenkins will solve this issue.
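
As a hedged illustration of the observation above (only resourceVersion changes between events), a consumer such as the idler could ignore these no-op updates by comparing successive objects with metadata.resourceVersion stripped out. This is just a sketch, not code from the idler or the sync plugin:

package main

import (
	"encoding/json"
	"fmt"
	"reflect"
)

// stripResourceVersion removes metadata.resourceVersion so that two
// otherwise-identical build objects compare as equal.
func stripResourceVersion(obj map[string]interface{}) map[string]interface{} {
	if meta, ok := obj["metadata"].(map[string]interface{}); ok {
		delete(meta, "resourceVersion")
	}
	return obj
}

// changedBesidesResourceVersion reports whether two JSON-encoded build
// objects differ in anything other than metadata.resourceVersion.
func changedBesidesResourceVersion(oldJSON, newJSON []byte) (bool, error) {
	var oldObj, newObj map[string]interface{}
	if err := json.Unmarshal(oldJSON, &oldObj); err != nil {
		return false, err
	}
	if err := json.Unmarshal(newJSON, &newObj); err != nil {
		return false, err
	}
	return !reflect.DeepEqual(stripResourceVersion(oldObj), stripResourceVersion(newObj)), nil
}

func main() {
	// Two hypothetical watch events for the same build: only
	// resourceVersion differs, as observed in this issue.
	oldJSON := []byte(`{"metadata":{"name":"app-test-1-sadsadsa-1","resourceVersion":"1000"},"status":{"phase":"Running"}}`)
	newJSON := []byte(`{"metadata":{"name":"app-test-1-sadsadsa-1","resourceVersion":"1060"},"status":{"phase":"Running"}}`)

	changed, err := changedBesidesResourceVersion(oldJSON, newJSON)
	if err != nil {
		panic(err)
	}
	fmt.Println("meaningful change:", changed) // prints: meaningful change: false
}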

@jfchevrette
Contributor

I also see this on the following builds running in starter-us-east-2

aazores app-test-1-4
asalles test-1
dlabrecq osio-prod-app2a-1
ibuziuk trest-vert-1
ldimaggi app-test-1-1
lxia max63chars-0123456789012345678901234567890123456789012345end-1
pgarg-1 test-quickstart-1
sunil-thaha wildfly-test-health-check-6
vemishra veethika-m-1

@chmouel
Author

chmouel commented Apr 24, 2018

As identified, this is caused by the sync plugin generating that bogus behaviour. I have created a new issue, #3266, to track this down and am closing this one.

@chmouel chmouel closed this as completed Apr 24, 2018