Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve handling of storage provider faults #27

Open
michaelschnyder opened this issue Mar 27, 2017 · 0 comments
Open

Improve handling of storage provider faults #27

michaelschnyder opened this issue Mar 27, 2017 · 0 comments

Comments

@michaelschnyder
Copy link
Member

This issue is based on a series of issues related to jobbr < v.1.0. We had several issues with a non reliable MSSQL Server which caused a couple of application terminations as shown below. I'm adding this here and not in mssql since the responsibility of fault-tolerance should be in the server and not in every provider.

Expected behavior is to not crash if possible or at least let other jobs complete and start new jobs if the storage provider is available again.

There were different origins for accessing the database, which should be re-evaluated against the current implementation

When scheduling new jobs

Application: JobServer.Service.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Data.SqlClient.SqlException
Stack:
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, Boolean, Int32, System.Threading.Tasks.Task ByRef, Boolean, System.Data.SqlClient.SqlDataReader)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, System.String, System.Threading.Tasks.TaskCompletionSource`1<System.Object>, Int32, System.Threading.Tasks.Task ByRef, Boolean)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, System.String)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(System.Data.CommandBehavior, System.String)
   at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(System.Data.CommandBehavior)
   at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(System.Data.CommandBehavior)
   at Dapper.SqlMapper+<QueryImpl>d__61`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Collections.Generic.List`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.Linq.Enumerable.ToList[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at Dapper.SqlMapper.Query[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Data.IDbConnection, System.String, System.Object, System.Data.IDbTransaction, Boolean, System.Nullable`1<Int32>, System.Nullable`1<System.Data.CommandType>)
   at Jobbr.Server.Dapper.DapperStorageProvider.GetFutureJobRunsByTriggerId(Int64)
   at Jobbr.Server.Core.JobService.GetNextJobRunByTriggerId(Int64)
   at Jobbr.Server.Core.DefaultScheduler.CreateSchedule(Jobbr.Server.Model.JobTriggerBase)
   at Jobbr.Server.Core.DefaultScheduler.ScheduleJobRuns(System.Object)
   at System.Threading.TimerQueueTimer.CallCallbackInContext(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
   at System.Threading.TimerQueue.AppDomainTimerCallback()

StartReadyJobsFromQueue

It seems there are two different ways to fail from here:

First (2x, / 3 Months)

Application: Fis.JobServer.Service.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Data.DataException
Stack:
   at Dapper.SqlMapper.ThrowDataException(System.Exception, Int32, System.Data.IDataReader, System.Object)
   at DynamicClass.Deserializefb7261cb-e3af-4055-bd56-687d3f6f86a8(System.Data.IDataReader)
   at Dapper.SqlMapper+<QueryImpl>d__61`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Collections.Generic.List`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.Linq.Enumerable.ToList[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at Dapper.SqlMapper.Query[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Data.IDbConnection, System.String, System.Object, System.Data.IDbTransaction, Boolean, System.Nullable`1<Int32>, System.Nullable`1<System.Data.CommandType>)
   at Jobbr.Server.Dapper.DapperStorageProvider.GetJobRuns()
   at Jobbr.Server.Dapper.DapperStorageProvider.Update(Jobbr.Server.Model.JobRun)
   at Jobbr.Server.Core.JobService.UpdateJobRunState(Jobbr.Server.Model.JobRun, Jobbr.Common.Model.JobRunState)
   at Jobbr.Server.Core.DefaultJobExecutor.StartReadyJobsFromQueue(System.Object)
   at System.Threading.TimerQueueTimer.CallCallbackInContext(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
   at System.Threading.TimerQueue.AppDomainTimerCallback()

Second (also 2x / 3 Months)

Application: Fis.JobServer.Service.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Data.SqlClient.SqlException
Stack:
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory, System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>, System.Data.Common.DbConnectionOptions)
   at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory, System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>, System.Data.Common.DbConnectionOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>)
   at System.Data.SqlClient.SqlConnection.TryOpen(System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>)
   at System.Data.SqlClient.SqlConnection.Open()
   at Dapper.SqlMapper+<QueryImpl>d__61`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Collections.Generic.List`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.Linq.Enumerable.ToList[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at Dapper.SqlMapper.Query[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Data.IDbConnection, System.String, System.Object, System.Data.IDbTransaction, Boolean, System.Nullable`1<Int32>, System.Nullable`1<System.Data.CommandType>)
   at Jobbr.Server.Dapper.DapperStorageProvider.GetJobRuns()
   at Jobbr.Server.Dapper.DapperStorageProvider.Update(Jobbr.Server.Model.JobRun)
   at Jobbr.Server.Core.JobService.UpdateJobRunState(Jobbr.Server.Model.JobRun, Jobbr.Common.Model.JobRunState)
   at Jobbr.Server.Core.DefaultJobExecutor.StartReadyJobsFromQueue(System.Object)
   at System.Threading.TimerQueueTimer.CallCallbackInContext(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.TimerQueueTimer.CallCallback()
   at System.Threading.TimerQueueTimer.Fire()
   at System.Threading.TimerQueue.FireNextTimers()
   at System.Threading.TimerQueue.AppDomainTimerCallback()

Progress Updates when DB is down

Starts always in Jobbr.Server.Core.JobRunContext.ProcOnOutputDataReceived and fails withing the storage layer.

Application: Fis.JobServer.Service.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Data.DataException
Stack:
   at Dapper.SqlMapper.ThrowDataException(System.Exception, Int32, System.Data.IDataReader, System.Object)
   at DynamicClass.Deserializeb7edca92-8010-45c0-86e5-f388654751b8(System.Data.IDataReader)
   at Dapper.SqlMapper+<QueryImpl>d__61`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Collections.Generic.List`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.Linq.Enumerable.ToList[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at Dapper.SqlMapper.Query[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Data.IDbConnection, System.String, System.Object, System.Data.IDbTransaction, Boolean, System.Nullable`1<Int32>, System.Nullable`1<System.Data.CommandType>)
   at Jobbr.Server.Dapper.DapperStorageProvider.GetJobRuns()
   at Jobbr.Server.Dapper.DapperStorageProvider.Update(Jobbr.Server.Model.JobRun)
   at Jobbr.Server.Core.JobService.UpdateJobRunProgress(Jobbr.Server.Model.JobRun, Double)
   at Jobbr.Server.Core.JobRunContext.HandleMessage(Jobbr.Server.ServiceMessaging.ProgressServiceMessage)
   at DynamicClass.CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, Jobbr.Server.Core.JobRunContext, System.Object)
   at Jobbr.Server.Core.JobRunContext.ProcOnOutputDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)
   at System.Diagnostics.Process.OutputReadNotifyUser(System.String)
   at System.Diagnostics.AsyncStreamReader.FlushMessageQueue()
   at System.Diagnostics.AsyncStreamReader.GetLinesFromStringBuilder()
   at System.Diagnostics.AsyncStreamReader.ReadBuffer(System.IAsyncResult)
   at System.IO.Stream+ReadWriteTask.InvokeAsyncCallback(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.IO.Stream+ReadWriteTask.System.Threading.Tasks.ITaskCompletionAction.Invoke(System.Threading.Tasks.Task)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task.FinishStageTwo()
   at System.Threading.Tasks.Task.Finish(Boolean)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean)
   at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

and

Application: Fis.JobServer.Service.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Data.SqlClient.SqlException
Stack:
   at System.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory, System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>, System.Data.Common.DbConnectionOptions)
   at System.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(System.Data.Common.DbConnection, System.Data.ProviderBase.DbConnectionFactory, System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>, System.Data.Common.DbConnectionOptions)
   at System.Data.SqlClient.SqlConnection.TryOpenInner(System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>)
   at System.Data.SqlClient.SqlConnection.TryOpen(System.Threading.Tasks.TaskCompletionSource`1<System.Data.ProviderBase.DbConnectionInternal>)
   at System.Data.SqlClient.SqlConnection.Open()
   at Dapper.SqlMapper+<QueryImpl>d__61`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Collections.Generic.List`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.Linq.Enumerable.ToList[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at Dapper.SqlMapper.Query[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Data.IDbConnection, System.String, System.Object, System.Data.IDbTransaction, Boolean, System.Nullable`1<Int32>, System.Nullable`1<System.Data.CommandType>)
   at Jobbr.Server.Dapper.DapperStorageProvider.GetJobRunById(Int64)
   at Jobbr.Server.Core.JobService.UpdateJobRunProgress(Jobbr.Server.Model.JobRun, Double)
   at Jobbr.Server.Core.JobRunContext.HandleMessage(Jobbr.Server.ServiceMessaging.ProgressServiceMessage)
   at DynamicClass.CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, Jobbr.Server.Core.JobRunContext, System.Object)
   at Jobbr.Server.Core.JobRunContext.ProcOnOutputDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)
   at System.Diagnostics.Process.OutputReadNotifyUser(System.String)
   at System.Diagnostics.AsyncStreamReader.FlushMessageQueue()
   at System.Diagnostics.AsyncStreamReader.GetLinesFromStringBuilder()
   at System.Diagnostics.AsyncStreamReader.ReadBuffer(System.IAsyncResult)
   at System.IO.Stream+ReadWriteTask.InvokeAsyncCallback(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.IO.Stream+ReadWriteTask.System.Threading.Tasks.ITaskCompletionAction.Invoke(System.Threading.Tasks.Task)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task.FinishStageTwo()
   at System.Threading.Tasks.Task.Finish(Boolean)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean)
   at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()

and

Application: Fis.JobServer.Service.exe
Framework Version: v4.0.30319
Description: The process was terminated due to an unhandled exception.
Exception Info: System.Data.SqlClient.SqlException
Stack:
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, Boolean, Int32, System.Threading.Tasks.Task ByRef, Boolean, System.Data.SqlClient.SqlDataReader)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, System.String, System.Threading.Tasks.TaskCompletionSource`1<System.Object>, Int32, System.Threading.Tasks.Task ByRef, Boolean)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(System.Data.CommandBehavior, System.Data.SqlClient.RunBehavior, Boolean, System.String)
   at System.Data.SqlClient.SqlCommand.ExecuteReader(System.Data.CommandBehavior, System.String)
   at System.Data.SqlClient.SqlCommand.ExecuteDbDataReader(System.Data.CommandBehavior)
   at System.Data.Common.DbCommand.System.Data.IDbCommand.ExecuteReader(System.Data.CommandBehavior)
   at Dapper.SqlMapper+<QueryImpl>d__61`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]].MoveNext()
   at System.Collections.Generic.List`1[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]]..ctor(System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at System.Linq.Enumerable.ToList[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Collections.Generic.IEnumerable`1<System.__Canon>)
   at Dapper.SqlMapper.Query[[System.__Canon, mscorlib, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089]](System.Data.IDbConnection, System.String, System.Object, System.Data.IDbTransaction, Boolean, System.Nullable`1<Int32>, System.Nullable`1<System.Data.CommandType>)
   at Jobbr.Server.Dapper.DapperStorageProvider.GetJobRunById(Int64)
   at Jobbr.Server.Core.JobService.UpdateJobRunProgress(Jobbr.Server.Model.JobRun, Double)
   at Jobbr.Server.Core.JobRunContext.HandleMessage(Jobbr.Server.ServiceMessaging.ProgressServiceMessage)
   at DynamicClass.CallSite.Target(System.Runtime.CompilerServices.Closure, System.Runtime.CompilerServices.CallSite, Jobbr.Server.Core.JobRunContext, System.Object)
   at Jobbr.Server.Core.JobRunContext.ProcOnOutputDataReceived(System.Object, System.Diagnostics.DataReceivedEventArgs)
   at System.Diagnostics.Process.OutputReadNotifyUser(System.String)
   at System.Diagnostics.AsyncStreamReader.FlushMessageQueue()
   at System.Diagnostics.AsyncStreamReader.GetLinesFromStringBuilder()
   at System.Diagnostics.AsyncStreamReader.ReadBuffer(System.IAsyncResult)
   at System.IO.Stream+ReadWriteTask.InvokeAsyncCallback(System.Object)
   at System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.Threading.ExecutionContext.Run(System.Threading.ExecutionContext, System.Threading.ContextCallback, System.Object, Boolean)
   at System.IO.Stream+ReadWriteTask.System.Threading.Tasks.ITaskCompletionAction.Invoke(System.Threading.Tasks.Task)
   at System.Threading.Tasks.Task.FinishContinuations()
   at System.Threading.Tasks.Task.FinishStageThree()
   at System.Threading.Tasks.Task.FinishStageTwo()
   at System.Threading.Tasks.Task.Finish(Boolean)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(System.Threading.Tasks.Task ByRef)
   at System.Threading.Tasks.Task.ExecuteEntry(Boolean)
   at System.Threading.Tasks.Task.System.Threading.IThreadPoolWorkItem.ExecuteWorkItem()
   at System.Threading.ThreadPoolWorkQueue.Dispatch()
   at System.Threading._ThreadPoolWaitCallback.PerformWaitCallback()
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants