Question about checking linearizability of a history with a custom model #26

metanet · 2018-08-16T15:10:28Z

Hi there,

I am not sure if I should open this issue to this repo or the jepsen repo. I am giving a shot here and can port the issue to the jepsen repo if requested.

I have a very simple jepsen test taking 20 seconds and injecting no faults in the environment. It is a mutex test for now, but I will use it for testing re-entrant mutex in future once I make this version work.

In my test, i have 5 clients. Each client is acquiring and releasing the same distributed lock object repeatedly.

Here is my generator:

     :generator (->> [{:type :invoke, :f :acquire}
                                      ;(gen/sleep 1)
                                      {:type :invoke, :f :release}
                                      (gen/sleep 1)]
                                     cycle
                                     gen/seq
                                     gen/each
                                     (gen/stagger 1/10))

Here is my custom model:

(defrecord ReentrantMutex [owner lockCount]
  Model
  (step [this op]
    (if (nil? (:value op))
      (knossos.model/inconsistent "no owner!")
      (condp = (:f op)
        :acquire (if (or (nil? owner) (= owner (:value op)))
                   (createReentrantMutex this (:value op) (inc lockCount) op)
                   (do
                     ;(info (str "cannot acquire owner: " owner " lock count: " lockCount " op: " op " this model hash: " (System/identityHashCode this)))
                     (knossos.model/inconsistent "cannot acquire")))
        :release (if (or (nil? owner) (not= owner (:value op)))
                   (do
                     ;(info (str "cannot release owner: " owner " lock count: " lockCount " op: " op " this model hash: " (System/identityHashCode this)))
                     (knossos.model/inconsistent "cannot release"))
                   (if (= lockCount 1)
                     (createReentrantMutex this nil 0 op)
                     (createReentrantMutex this owner (dec lockCount) op))))))

  Object
  (toString [this] (str "owner: " owner ", lockCount: " lockCount)))

(defn createEmptyReentrantMutex []
  (let [
        initial (ReentrantMutex. nil 0)
        ;_ (info (str "initial model: " (System/identityHashCode initial)))
        ]
    initial)
  )

(defn createReentrantMutex [prev owner lockCount op]
  (
    let [
         newModel (ReentrantMutex. owner lockCount)
         ;_ (info (str "owner: " owner ", count: " lockCount ", op: " op " new hash: " (System/identityHashCode newModel) " prev owner: " (:owner prev) " prev lock count: " (:lockCount prev) " prev hash:" (System/identityHashCode prev)))
         ]
    newModel)
  )

My model is the ReentrantMutex defrecord. Initially, it is ReentrantMutex(nil, 0), means nobody holding the lock. For instance, if n1 acquires the lock, it will be ReentrantMutex("n1", 1).

When I run this test for 20 seconds without introducing any failure in the system, I get a history of operations.

I am sharing my history files and the timeline file here.
history.txt
history.edn.txt
timeline.html.txt

Sorry for the file extensions. Please rename "history.edn.txt" to "history.edn" and "timeline.html.txt" to "timeline.html.

In short, the history of operations is this one: N3_ACQ => N3_REL => N2_ACQ => N2_REL => N4_ACQ => N4_REL => N1_ACQ => N1_REL => N5_ACQ => N5_REL => N3_ACQ => N3_REL => N2_ACQ => N2_REL => N4_ACQ => N4_REL => N1_ACQ => N1_REL => N5_ACQ => N5_REL => N3_ACQ => N3_REL => N2_ACQ => N2_REL => N4_ACQ

So the lock is held by N3, N2, N4, N1, N5, N3, N2, N4, N1, N5, N3, N2, N4 in order. You can verify this by checking the history files and the timeline file.

The thing is, when the analyzer starts running with my model, it never terminates (when info logs above are commented out). I cannot say that I fully understand how it works internally but I see that it is walking through possible histories based on concurrency of operations, but each one failing with one of the knossos.model/inconsistent returns in my model. Moreover, as I run it with logs printed, I see states like ReentrantMutex("n2", 555). Probably, I am misunderstanding something, but I don't expect to see such states because acquire and release operations of the same client are sequential. Since I have a short history, I expect the analyzer to complete the verification step in short time. Therefore, I suspect I am doing something wrong in my model, or using the tool in some wrong way.

I have tried many things, but couldn't manage to make this model work. Could you help me with this issue? Any hint or a pointer would be very helpful.

Thanks in advance.

The text was updated successfully, but these errors were encountered:

aphyr · 2018-08-16T15:33:46Z

Having two definitions for createReentrantMutex looks a little unusual to me--I think normally this would be (defn createReentrantMutex ([] ...) ([arg1 arg2 arg3] ...)), but if it compiles and you're not getting arity exceptions, maybe it works! This miiiight be due to an infinite state space interacting poorly with model memoization code that runs before the analysis begins; we normally test with finite, compact spaces, and this one is countably infinite, so that miiiight be what's going on. Can you get a thread dump of the process using jstack or similar?

metanet · 2018-08-16T15:46:38Z

Thanks for the response.

I know it is not idiomatic clojure :) I am a clojure newbie. The code above works fine. The very first model objects are created with the createEmptyReentrantMutex function and then the rest goes on ReentrantMutex and createReentrantMutex.

Btw, I also run this test for 10 secs, only with a few acquire and releases, but still no luck.

I will get back to you with a thread dump.

metanet · 2018-08-17T09:31:44Z

Here is a thread dump for another run of the same test.
dump2.txt

aphyr · 2018-08-17T11:49:22Z

Yep, you're being bit by the memorization phase, which is designed for finite state spaces--but you've got an infinite state space with this model. You're in luck, I actually have some time today for the first time in months, and I know how to fix this, so I'll take a shot at it later on :)On Aug 17, 2018 05:31, Ensar Basri Kahveci <[email protected]> wrote:Here is a thread dump for another run of the same test. dump2.txt —You are receiving this because you commented.Reply to this email directly, view it on GitHub, or mute the thread.

metanet · 2018-08-17T12:02:49Z

that would be very helpful. thanks a lot.

looking forward to get the fix.

metanet · 2018-08-17T12:23:16Z

I have been cleaning up my code to give you a reproducer. You can run the test via https://github.com/metanet/jepsen/tree/custom-model if you like.
Just cd into the hazelcast directory and hit lein run test --workload lock --time-limit 20.

aphyr · 2018-08-17T14:17:08Z

Right, I've released 0.3.3-SNAPSHOT which aborts memoization for large state spaces. #25, you may also find this of use! Would y'all like to give that a shot and see if it helps?

metanet · 2018-08-19T09:00:17Z

thanks a lot. I will re-run it and ping you back.

metanet · 2018-08-19T10:41:58Z

I updated the knossos dependency of jepsen to 0.3.4-SNAPSHOT, re-installed jepsen, and run the test, but still encountering the same issue :(

metanet · 2018-08-27T08:51:20Z

ok, I have some good news. your explanation about the search space helped me to simplify my model a little bit. My goal is to test reentrancy, such that a client acquires the lock 2 times, and releases it. However, my model turned out to be too loose for this test. So I made the following simple change to reduce the search space and now the analysis step terminates successfully.

Thanks for your help.

(defrecord ReentrantMutex [owner lockCount]
  Model
  (step [this op]
    (if (nil? (:value op))
      (knossos.model/inconsistent "no owner!")
      (condp = (:f op)
        :acquire (if (and (< lockCount 3) (or (nil? owner) (= owner (:value op)))) ;;; <<<=== CHANGED LINE
                   (createReentrantMutex this (:value op) (inc lockCount) op)
                   (do
                     ;(info (str "cannot acquire owner: " owner " lock count: " lockCount " op: " op " this model hash: " (System/identityHashCode this)))
                     (knossos.model/inconsistent "cannot acquire")))
        :release (if (or (nil? owner) (not= owner (:value op)))
                   (do
                     ;(info (str "cannot release owner: " owner " lock count: " lockCount " op: " op " this model hash: " (System/identityHashCode this)))
                     (knossos.model/inconsistent "cannot release"))
                   (if (= lockCount 1)
                     (createReentrantMutex this nil 0 op)
                     (createReentrantMutex this owner (dec lockCount) op))))))

metanet closed this as completed Aug 27, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Question about checking linearizability of a history with a custom model #26

Question about checking linearizability of a history with a custom model #26

metanet commented Aug 16, 2018 •

edited

Loading

aphyr commented Aug 16, 2018

metanet commented Aug 16, 2018

metanet commented Aug 17, 2018

aphyr commented Aug 17, 2018 via email

metanet commented Aug 17, 2018

metanet commented Aug 17, 2018

aphyr commented Aug 17, 2018

metanet commented Aug 19, 2018

metanet commented Aug 19, 2018

metanet commented Aug 27, 2018 •

edited

Loading

Question about checking linearizability of a history with a custom model #26

Question about checking linearizability of a history with a custom model #26

Comments

metanet commented Aug 16, 2018 • edited Loading

aphyr commented Aug 16, 2018

metanet commented Aug 16, 2018

metanet commented Aug 17, 2018

aphyr commented Aug 17, 2018 via email

metanet commented Aug 17, 2018

metanet commented Aug 17, 2018

aphyr commented Aug 17, 2018

metanet commented Aug 19, 2018

metanet commented Aug 19, 2018

metanet commented Aug 27, 2018 • edited Loading

metanet commented Aug 16, 2018 •

edited

Loading

metanet commented Aug 27, 2018 •

edited

Loading