psite-create returns success but site is not created #181

Open
jbutkus opened this issue Oct 2, 2014 · 7 comments

Comments

@jbutkus

jbutkus commented Oct 2, 2014

The drush pantheon-site-create command returns a success exit code (0), but the site is not created (convergence has failed).

We are using the latest version of Drush, as suggested (7.0-dev).

By convention, shell commands exit with status zero (0) when the operation succeeds and non-zero when it fails.

Can this command be changed to behave this way, or could an option such as --strict-posix be added that enables this behavior when used?
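The convention in question is that callers branch on the exit status alone. A minimal shell sketch of what that looks like from the caller's side follows; `run_checked` is a hypothetical helper (not part of Drush or Terminus), and `true`/`false` stand in for a succeeding and a failing command.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run a command, report a failure on stderr,
# and return the command's exit status unchanged.
run_checked() {
    "$@"
    local status=$?
    if [[ $status -ne 0 ]]; then
        echo "command failed with exit status $status: $*" >&2
    fi
    return "$status"
}

# 'true' and 'false' stand in for a succeeding and a failing command.
run_checked true && echo "true  -> $?"     # prints: true  -> 0
run_checked false || echo "false -> $?"    # prints: false -> 1 (after the stderr message)
```

A wrapper like this only helps if the wrapped command reports failure through its exit status, which is exactly what this issue asks of pantheon-site-create.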

@joshkoenig
Contributor

Is this problem replicable? The job that waits on site creation should already detect failures and return a workflow error if convergence or any other spin-up task returns a failed status.

Side note: I don't believe we have any documentation telling you to use Drush 7 (it is unstable). If you can point me to that I will fix it.

@jbutkus
Author

jbutkus commented Oct 3, 2014

There was a support case opened first. I received confirmation that it's replicable, and only then, per the suggestion, submitted this issue.

Regarding the Drush version, we followed the setup procedure as originally outlined. Is there an official setup guide we should follow for best results?

@joshkoenig
Contributor

Can you share the process to replicate? That will assist us in closing this issue.

As of now we don't provide instructions for setting up Drush, but 7.x-dev is a development release, so it may cause unrelated stability issues.

@jbutkus
Author

jbutkus commented Oct 3, 2014

@joshkoenig, thank you. We are now testing Drush 6.4, and since it seems to be working, we will switch our production machines to it soon.

Right now we have worked around this issue as follows (pseudo-code):

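# Create the site; stop if the create command itself reports a failure.
drush pantheon-site-create "$SITE_NAME" \
                        --json \
                        --label="$SITE_NAME" \
                        --organization="$ORGANIZATION" \
                        --product="$PRODUCT"

if [[ $? -ne 0 ]]
then
    echo "drush pantheon-site-create has failed" >&2
    exit 1
fi

# The exit status above can be 0 even when convergence failed, so poll the
# site every 15 seconds, up to 20 times (about 5 minutes), until the
# WordPress installer page appears. SITE_URL is assumed to hold the site's URL.
NO_INSTALLER=1
RETRIES=20
while [[ $RETRIES -gt 0 ]]
do
    sleep 15
    if [[ $( curl -sL "$SITE_URL" ) =~ "Install WordPress" ]]
    then
        NO_INSTALLER=0
        break
    fi
    RETRIES=$(( RETRIES - 1 ))
done

# Import our SQL only once the installer is confirmed to be up.
if [[ $NO_INSTALLER -eq 0 ]]
then
    terminus wp db query --site="$SITE_NAME" < "$OUR_SQL_FILE"
fi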

If we skip the curl check, we occasionally hit an error that then shows up in the Dashboard. On average, it happens at least once per 100 spin-ups.
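The polling loop in the workaround above can be factored into a reusable helper. This is a sketch under stated assumptions: `wait_until` and `installer_visible` are hypothetical names, the check runs immediately and then sleeps between attempts (the original loop sleeps first), and the helper returns non-zero when every attempt fails, so a script built on it can exit with a failure status instead of silently skipping the SQL import.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a check command until it succeeds.
# Usage: wait_until RETRIES INTERVAL_SECONDS command [args...]
# Returns 0 as soon as the command succeeds, 1 if every attempt fails.
wait_until() {
    local retries=$1 interval=$2
    shift 2
    local attempt
    for (( attempt = 1; attempt <= retries; attempt++ )); do
        if "$@"; then
            return 0
        fi
        # No need to sleep after the final failed attempt.
        if (( attempt < retries )); then
            sleep "$interval"
        fi
    done
    return 1
}

# Hypothetical check mirroring the workaround: does the page at URL $1
# contain the WordPress installer text?
installer_visible() {
    curl -sL "$1" | grep -qF "Install WordPress"
}

# Example use (not run here), with SITE_NAME, SITE_URL, and OUR_SQL_FILE
# set as in the workaround script:
#   if wait_until 20 15 installer_visible "$SITE_URL"; then
#       terminus wp db query --site="$SITE_NAME" < "$OUR_SQL_FILE"
#   else
#       echo "installer page never appeared" >&2
#       exit 1
#   fi
```

Passing the check as a command keeps the retry logic separate from what "ready" means, so the same helper could poll a different readiness signal without changes.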

@joshkoenig
Contributor

OK, so it's not really replicable in the sense of being reproducible deterministically by the same set of actions. It's a "fails 1% of the time" issue.

Can you follow up with an example of where it has failed? I can look into why this failure was not detected.

@jbutkus
Author

jbutkus commented Oct 3, 2014

So for now, is it best to stop our attempts to create these sites when this happens and, if possible, send you the log with full debug output for such a case?

@joshkoenig
Contributor

We just need to know the UUID of a site that didn't work (you can safely post that here; UUIDs alone are harmless) so that we can look at what part of the process failed.

Terminus is meant to detect failed workflow operations and exit with a nonzero status if they occur, but it appears there is an edge case where that is not happening.
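A minimal sketch of the behavior described here: poll a workflow until it reaches a terminal state, then turn that state into the exit status, so a failed convergence surfaces as a non-zero exit. This is not Terminus's actual code; the status words `succeeded`, `failed`, and `running`, and the status-reporting command passed in, are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical sketch, not Terminus's implementation.
# Map an (assumed) workflow status word to a return code:
#   0 = succeeded, 1 = failed, 2 = still running, 3 = unrecognized.
workflow_exit_status() {
    case "$1" in
        succeeded) return 0 ;;
        failed)    return 1 ;;
        running)   return 2 ;;
        *)         return 3 ;;
    esac
}

# Usage: wait_for_workflow RETRIES INTERVAL_SECONDS status_command [args...]
# status_command is assumed to print a single status word.
# Returns 0 only if the workflow succeeds; a failed or unknown status,
# a failed status query, or a timeout all return 1.
wait_for_workflow() {
    local retries=$1 interval=$2
    shift 2
    local attempt status code
    for (( attempt = 1; attempt <= retries; attempt++ )); do
        status=$("$@") || return 1    # the status query itself failed
        code=0
        workflow_exit_status "$status" || code=$?
        case $code in
            0) return 0 ;;
            2) if (( attempt < retries )); then sleep "$interval"; fi ;;
            *) echo "workflow ended with status '$status'" >&2; return 1 ;;
        esac
    done
    echo "workflow still running after $retries checks" >&2
    return 1
}
```

The edge case reported in this issue corresponds to a path where a failed status is treated like success; in this sketch, anything other than an explicit `succeeded` ends in a non-zero return.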
