Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#:drain? #t does not wait until all fibers have completed. #47

Open
emixa-d opened this issue Jun 3, 2021 · 4 comments
Open

#:drain? #t does not wait until all fibers have completed. #47

emixa-d opened this issue Jun 3, 2021 · 4 comments
Labels

Comments

@emixa-d
Copy link
Collaborator

emixa-d commented Jun 3, 2021

No threads, no operations, only run-fibers with #:drain? #t and spawn-fiber.
FWIW, the bug doesn't manifest when disabling preemption (#:hz 0). Anyway, here is a test case waiting to succeed:

;;;; Copyright (C) 2021 Maxime Devos
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
;;;; License as published by the Free Software Foundation; either
;;;; version 3 of the License, or (at your option) any later version.
;;;;
;;;; This library is distributed in the hope that it will be useful,
;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;;;; Lesser General Public License for more details.
;;;;
;;;; You should have received a copy of the GNU Lesser General Public
;;;; License along with this library; if not, write to the Free Software
;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

(use-modules (ice-9 control)
             (fibers)
             (srfi srfi-43)
             (srfi srfi-64)
             (rnrs bytevectors))

(define do-nothing-but-dont-tell-guile
  (eval '(lambda _ (values)) (current-module)))

(define N_ITERATIONS 20000)
;; (* 2 (current-processor-count)) on the system
;; the test was written on (and failing), in case
;; that matters.
(define N_THREAD 8)

(define (thread thread-index)
  "Do things, make garbage and return #t."
  (pk 'fiber-start thread-index)
  (do ((i 0 (+ 1 i)))
      ((>= i N_ITERATIONS))
    (do-nothing-but-dont-tell-guile thread-index i)
    ;; Curiously, if I remove the following expression,
    ;; then the bug consistently does not manifest. Some connection
    ;; to garbage management perhaps?
    ;;
    ;; Using (cons thread-index i) also manifests the bug,
    ;; but (subjectively) less frequently.
    (do-nothing-but-dont-tell-guile
     (make-bytevector 600)))
  ;; If the test fails and you look at the output,
  ;; notice that not all (fiber-start INDEX) are matched
  ;; by a (fiber-end INDEX).
  (pk 'fiber-end thread-index)
  #t)

(define (check-all-done per-thread)
  (define (bad? a)
    (not (eq? a #t)))
  (define bad (vector-index bad? per-thread))
  (if bad
      (begin (pk 'bad!! bad (vector-ref per-thread bad))
             #f)
      #t))

(test-assert "all fibers are xrun to completion"
  (let ((thread-indices (iota N_THREAD))
        ;; When a thread has completed, an element
        ;; of this vector is set to #t.
        (thread-done? (make-vector (+ 1 N_THREAD))))
    (run-fibers
     (lambda ()
       (define (run! thread-index)
         (spawn-fiber
          (lambda ()
            (vector-set! thread-done? thread-index
                         (thread thread-index)))))
       (for-each run! thread-indices))
     #:drain? #t
     ;; No need
     #:install-suspendable-ports? #f
     #:hz 700)
    (vector-set! thread-done? N_THREAD #t)
    (check-all-done thread-done?)))

Pinging @amirouche since someone on IRC suggested so.
Your N_THREAD and N_ITERATIONS may vary. Also, this test case occasionally succeeds.

@amirouche
Copy link

Oh but it does not rely on POSIX threads? Then I do not know.

@emixa-d
Copy link
Collaborator Author

emixa-d commented Jun 4, 2021

This is for version 1.0.0 btw. Didn't test on master.

@emixa-d
Copy link
Collaborator Author

emixa-d commented Jun 4, 2021

I can reproduce on master (44d17e6) (GNU Guile 2.2.7) as well.

@emixa-d
Copy link
Collaborator Author

emixa-d commented Jun 4, 2021

My suspicion:

When creating a scheduler with parallelism (make-scheduler #:parallelism N), it created a few other schedulers internally, which do not share a workqueue.

For testing whether the fibers have finished (see scheduler-work-pending & run-fibers), scheduler-work-pending? is used. But that procedure only looks at the workqueues of only one scheduler, and all other schedulers (co-schedulers?) are ignored!

Seems like we need to call scheduler-work-pending? on all schedulers in run-fibers.

@emixa-d emixa-d added the bug label Sep 4, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

2 participants