Jack stops working after some time #307
Comments
JackPosixSemaphore is not used anymore since 1.9.11-RC1, please update your jack2 version and try again. |
Ok. I've recompiled from source and now semaphore error is gone. But JackPosixProcessSync::LockedTimedWait error is still there. Nov 13 12:24:42 echo jackd[2911]: jackdmp 1.9.12 |
Hi loblik, Can you run jackd with -v (verbose mode) and paste the output while it is failing? I had the same issue and narrowed it down to the alsa driver thread being stuck in xruns. I have a fix in my local copy, and will issue a pull request for it soon. Would be nice to know if it would help you too. |
I'm getting the same problem. I'm just running LSP plugins as standalone JACK applications in debug mode and spontaneously get: Jack: JackRequest::Notification On the plugin side: JACK version: |
arre525 - I have similar issues with both 1.9.11 and 1.9.12 versions. My use case is different, I don't use the network manager, I use various plugin hosts and effects programs for realtime music performance. I've suspected the ALSA driver also, but did not have the insight on how to fix it. I see that you have a clone of the Jack2 repo, but I could not find any commit that looked like a fix for the ALSA driver. I would be happy to test your fix, or to provide you with verbose data logs from a driver failure event. If you're still monitoring this issue, please get back to me. I would love to get this issue fixed and be able to reliably use jack. |
@Rippert - Yes, I had been planning on pushing my change to my repo and then issuing a pull request, but when doing some final tests I noticed there were still more issues than just the one I fixed. (And due to lack of time to do a more proper analysis I stopped working on it altogether). Anyway, the main point was that in verbose mode, you'd get something like this:
Which goes on and on. Anyway, I'll commit what I had already on my repo (or you just patch it from the diff on my blog), and hope it helps you as well! |
@Rippert : See arre525@a4c9ee5 |
@arre525 : Thanks. I have seen similar errors. I didn't save them, as you said, they go on and on. I've also seen your blog before. It's one of the reasons I kept trying to get Jack to work on a Raspberry Pi. I will try your code when I get back home. Which version of the kernel are you using? The xrun problem seemed to get better for me recently when Raspbian upgraded to 4.14. Not totally fixed, just less frequent. The lockups do seem to occur after a burst of xruns, latency spikes as you call them. Like when opening or closing a process that connects to Jack. Sometimes when starting up Jack itself. Pretty random, and increasing the number of samples per period doesn't really fix it as you still get the bursts of xruns occasionally. You should go ahead and submit your pull request to the official repo. You've documented it and its shortcomings pretty well, so the maintainers can decide whether it's worth including. Thanks, |
@falkTX and other jack audio devs: I have seen the problem arre525 mentions on my Macbook Air with a core i5 running the KXStudio OS, so it's not just an ARM problem. It is a lot less frequent on my x86 machine, but still happens occasionally. I'm not sure if there's anything you can do if it's a bug in the kernel code as arre525 suspects, but just to let you know. |
@Rippert : I am using kernel 4.18.8 (with the 4.18.8-RT9 patchset for realtime behavior) |
@arre525 your change makes sense, and something I worked-around in ARM devices too (though in a different way). |
@7890 not tested yet. I believe I should build and install JACK from the develop branch? |
@sadko4u yes, thanks |
Just did a test on Raspbian Lite - looks like it still has the problem. Control: Experiment: Please let me know if you want me to try something differently, I will try. Also, please note - I have been using Arch Linux Arm for the last month with the default repo version of jack2 from pacman (1.9.12), and it does not have this problem. This is with the same hardware. I can get some heavy xruns, but it is always recoverable by changing the buffersize, and I can always kill jackd without the -KILL switch. This seems in line with the statement by @arre525 that his patch does not fix all cases and a change to the kernel may be needed. The Arch kernel is more recent and has the PREEMPT patch compared with the Raspbian kernel. There are surely many other differences also. Let me know if you would like more info. This was just a very quick test. Thanks for all your work on this, |
Getting a new kernel for raspbian would be the final test then, if it is possible. |
I tried using rpi-update to upgrade the kernel. it started at: The new kernel didn't change anything in terms of jackd locking up. It still ended up with rolling xruns and needing the -KILL switch to stop it. I'm not sure that this is a valid test as the kernel wasn't compiled with the same options as the Arch Linux Kernel, for example it doesn't have the PREEMPT option. I'll check the There are also other differences between Arch and Raspbian, Arch being a much more minimal system than even Raspbian Lite intrinsically. I could try compiling a new kernel for Raspbian, but I'd need to be able to get the compilation parameters of the Arch kernel to try to get them equal. If anyone has suggestions for where to find such information, let me know. |
Hi, Thanks for merging the change and promoting me to co-author:)
In any case, the ideal scenario is that we do a more proper investigation for the real cause here. I have plenty of Pis lying around here. If I find some time (might not be any time soon but I'll do my best), I can also have a further look into this. @Rippert , perhaps to kickstart me, do you have an image that I can simply flash on an SD card (and a description of the HW you are using?) We'll nail this bug eventually:) |
@arre525 thanks for the detailed info. Responding to your bullets:
Unfortunately I'm getting this on Raspbian:
I like the idea of just using the ARCH kernel in Raspbian. Other than copying over the /boot partition and adjusting the PART_UUIDs in there, what else would I have to do? I don't have an image right now, but could make one. It will be fairly large given the multiple operating systems (using PINN) and all the customization inside each one. It's on a 60GB SSD, but only about 10GB of actual data between ARCH and Raspbian. I can probably extract each OS from the disk separately if that would help. |
I rebooted into Arch and tried reducing the buffersize of jackd. I got it down to 8 periods (jack_bufsize spit errors when I tried 4). Looking at guitarix, there were the rolling xruns, and they had the xx.xxe08 signature. However, I was still able to increase the buffer back to 64 without losing control of jackd. Guitarix needed a restart, but it worked fine afterward. So, similar results in terms of xruns, but jack remains responsive to commands and does not need to be killed to recover. The sound never stopped working. For Arch:
@arre525 I was able to get the kernel configuration out of Arch. It's attached, along with the other diagnostics you asked for, as text files because they are so long. Let me know if you have a problem with those and I can just cut and paste them. Raspbian_lsmod_dmesg.txt I'll start looking at how to compile a new kernel for raspbian with the Arch config. I compiled kernels back in the 90s, but not since then. I think some things may have changed in the meantime :) |
Thanks.
Nah, never mind then. It will take us more time than it saves.
Copying the full /boot would probably also work, but it is somewhat overkill. Normally the minimum that you should do is (off the top of my head):
I would guess that's all you need to swap kernels. (It does seem easier now than compiling the kernel yourself:) |
@arre525 If you want a prepackaged image for ARCH, you can install the PINN image on your SD card (or HD): PINN is a fork of NOOBS, and one of the OS options is ARCH. That's where I got my current ARCH install from. I also tried out the install direct from the ARCH LINUX ARM site (not an image, so more work), and it worked well with jack too, but I wanted to get more use out of my 64G SSD, so I switched to PINN so I could have multiple OSes on one drive. I'll try your instructions to get the ARCH kernel into Raspbian when I get the chance, thanks. |
I followed your instructions and was able to boot Raspbian using the ARCH kernel, but it wouldn't load any modules:
Took a look at the snd_bcm2835 module to see if I could load it but no go:
Seems to be looking in the right place, but acting like the modules were compiled against a different kernel or something. I did try replacing the whole /boot partition contents, and I had to change /sbin/init from an absolute symbolic link to a relative one to get it to finish booting. Unfortunately, still the same Exec format error on the modules. |
OK, that was dumb. Raspbian doesn't like compressed modules. So a quick Unfortunately, I still ended up with unrecoverable xruns in jackd. So it's something else that's different between ARCH and Raspbian. By the way, I did double check:
It's definitely running the ARCH kernel and modules. |
Nice, thanks for putting the time into it. Fascinating. The exact same kernel and same jackd version, and still different results? |
It's not really the same jackd version. They just both say 1.9.12 when queried. On ARCH I just installed from the standard repo using pacman. On Raspbian, I built from source using the latest develop branch code (originally to test the new commit added from your fork). I'll look into trying to transfer the binary from ARCH to Raspbian, or building on ARCH statically as you mention if that doesn't work. I suppose it's possible to download the source and the PKGBUILD files on ARCH similar to how you can do that on Debian. I've only ever done that on Debian. It's possible the Arch folks have some trick up their sleeve in there. I seem to remember that you were using some different hardware for one of your builds in your blog, an OrangePi? Is that right? If it had the same problem, then there should be some kind of common factor. |
If you haven't compiled it yet on arch, I guess the simplest is to compile (statically) on raspbian and transfer that to ARCH. (If you compile statically you don't depend on any libs, so it's easier to be sure it works when you move it to a new platform) |
Hi, I don't have an overview of the current status - just a note on statically: there is libjack somewhere in this scenario, you can't (or shouldn't) statically link to it. If you see different behaviour of same jackd, you could check if libjack is (compiled) the same (way). If this is not related or you have thought already about it, please just ignore. |
Yes, I do. This is the code (linux/alsa/alsa_driver.c)
The idea is that it shows the delay between when the last trigger tstamp changed (e.g. when audio started processing again) and the current time. Regardless of the fact that it is printed in the wrong format (a big negative number), as long as the alsa tstamp is constant, the difference between those numbers is still an accurate delay. What I gather from your post and mine is that it simply takes too long to service it, causing the "logical" xruns. Simply having more processes running that linux has to schedule between could be a simple explanation for the difference between raspbian and arch, but to know for sure you'd have to force xruns on both and compare the delays. What I still want to do is look into whether jackd is being too inefficient in handling the xrun. (If it has to do extra processing recovering from it, it will take more time, and jump right back into one.) (Again, once I find more time, will keep you posted) Regarding the comment about the static build, yes, typically in the configure you can specify this so it ends up in the cflags. But as @7890 mentioned this might be more tricky than I first naively anticipated. Since you would compile on the two targets anyway, it seems to make more sense simply to compile the exact same code and take it from there. |
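A small sketch of the arithmetic described in the comment above, assuming both timestamps are plain `struct timeval` values. `elapsed_usec` is a hypothetical helper written for illustration, not the function in linux/alsa/alsa_driver.c, and the zeroed timestamp in the usage note is only one possible way to end up with a large negative reading.

```c
#include <stdint.h>
#include <sys/time.h>

/* Hypothetical helper: microseconds from `start` to `now`.
 * The subtraction is done in signed 64-bit integers, so a `now` that is
 * earlier than `start` yields a negative value instead of wrapping. */
int64_t elapsed_usec(struct timeval start, struct timeval now)
{
    return ((int64_t)now.tv_sec - (int64_t)start.tv_sec) * 1000000
         + ((int64_t)now.tv_usec - (int64_t)start.tv_usec);
}
```

If one of the two timestamps is never filled in (for example left at zero), every reading is offset by the same constant. The absolute value is then meaningless, but the difference between two successive readings still equals the real time that passed between them, which is the point made above.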
Ahh, thanks for the code snippet. I understand the large negative numbers now. |
Well, I can't get ARCH to become unrecoverable, but I can get it to spit out a lot of xruns just by lowering the number of periods. So here is some output from ARCH when Guitarix is running and I reduce the periods to 8:
So, instead of the ~1 msec differential on Raspbian, ARCH has ~20 msec. So it's possible that ARCH simply has more time to recover between each xrun than Raspbian. Which seems different than the idea that ARCH has more available processor cycles than Raspbian due to less load. Of course, it's possible that whatever is causing the weird operation of the |
Forget the last post. I just tried it again, and now the timestamps are ~1 msec apart on ARCH. Not sure why the difference, but obviously not a very reliable metric. |
Check this out. I quickly tested this on my x86 virtual machine, recompiling jackd myself.
This makes it possible to show accurately how long it takes for the alsa_driver_wait function to be called each time (without relying on alsa timestamps). Some findings:
Note how " Jack: 29275 us (and 0 s) passed since last alsa_driver_wait " takes 30ms while the rest only takes 2-5ms. Also the buffer length (and even jackd) says the buffer is 2 ms. The delays are around that, but they don't immediately cause an xrun. Going to look a bit further, but I think I can pretty much nail it down from here. |
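The measurement described above can be sketched roughly as follows. The actual debug patch is not reproduced in this thread, so `interval_probe` and `interval_probe_tick` are hypothetical names, and the choice of `CLOCK_MONOTONIC` is an assumption made so that the reading does not depend on ALSA timestamps or on wall-clock adjustments.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdint.h>
#include <time.h>

/* Hypothetical probe: remembers when it was last ticked. */
typedef struct {
    struct timespec last;
    int primed;              /* 0 until the first tick has been recorded */
} interval_probe;

/* Returns the microseconds elapsed since the previous tick,
 * or -1 on the first tick or if the clock cannot be read. */
int64_t interval_probe_tick(interval_probe *p)
{
    struct timespec now;
    int64_t usec = -1;

    if (clock_gettime(CLOCK_MONOTONIC, &now) != 0)
        return -1;

    if (p->primed) {
        /* Work in nanoseconds first so a negative tv_nsec delta is harmless. */
        int64_t ns = ((int64_t)now.tv_sec - (int64_t)p->last.tv_sec) * 1000000000
                   + ((int64_t)now.tv_nsec - (int64_t)p->last.tv_nsec);
        usec = ns / 1000;
    }
    p->last = now;
    p->primed = 1;
    return usec;
}
```

Calling `interval_probe_tick` once at the top of the function under study and logging the result gives one reading per call, comparable to the "us passed since last alsa_driver_wait" figures quoted above. Note that the logging itself costs time, which matters at period sizes of a few milliseconds.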
Some more printing:
This is pretty retarted:
Anyway, pretty interesting stuff. If I find some more juicy details, will share:) |
@Rippert I just discovered "soft mode" in jackd. Even without recompiling, could you simply try the -s option (in the alsa parameters). |
Finally brought out the orange pi again. Guitarix has a damn good reverb setting:) Anyway, some observations:
|
For the negative xrun delay printing, I can confirm this is fixed already now by commit 756b4fa (setting snd_pcm_sw_params_set_tstamp_mode). Before:
Now:
|
@Rippert what build are you using? (git show) Cause I don't understand why you still see the negative timestamps.. |
@arre525 really interesting findings!
This must be remembered - it's a valuable discovery |
Given this much traffic (mea culpa:) on this thread, it is perhaps also time to focus it back on the initial issue.
Most of the time there is no issue; but sometimes it suddenly goes into a loop of xruns and does not recover. Even though printfs can play a major role, this is not the only cause: If, in the case it "works", I open another ssh session and do a print as fast as possible, it does not trigger the xruns. => So this is I guess still the main focus of the thread |
Yes.. it seems like once the error is triggered, it can not recover. It also seems like the system can cope with desired arguments, but not for every try. It has the signature of a bug. |
But it may be using the wrong library path. I tried to use LD_LIBRARY_PATH, but it won't start when I do. I need to create a new fresh Raspbian and ARCH to try out the static build anyway, so I'll have to get back to you after I do that. |
Tried one more thing, but the results are weird, in a kinda good way. I reconfigured, compiled, and installed jack2 with the libpath set to overwrite the old apt-get installed libraries in So it looks like the code changes did do something, but there is some new problem now with verbose output. It may still be just something in my system setup, but the positive changes in the non-recoverable mode and the xrun out to clients seem pretty good. @arre525 I'm not sure why you are still getting unrecoverable states, except that you're using an OrangePi zero which is a bit slower than an RPi 3B+. Anyway, thanks for the improved code, and sorry about my poor use of libraries. |
Just for completeness, I reverted to the experimental Raspbian kernel:
Results were the same, no unrecoverable state.
Again, no unrecoverable state when lowering jackd all the way down to 8 periods and then back up to a reasonable number. |
Testing @arre525 alsa driver patch. I reverted to the commit just before it was added and rebuilt/installed:
Sure enough, it is now quite easy to get into the unrecoverable state by lowering the periods. So Arnout's patch definitely improved things. |
Something is still odd - I've tried with and without patch and jackd gets into unrecoverable state when doing this:
The worst case is that hitting a limit once renders the server unresponsive, which is not good. Btw there were no other clients running during the test. I think this behavior is not limited to rpi computers - we'll have to investigate more how alsa + jackd organize for smooth data passing and handle odd cases in a reasonably robust way; this is not the simplest task. Thank you both, OP and all on this thread, for bringing it up and tracking it so far!
|
Re "with and without patch": I naively only applied the patch @arre525 posted 2 days ago. If there is anything else to apply I will give it a spin! Meanwhile I'll try to understand the core of the issue better, we'll find a solution. |
@7890 your comment about not being limited to rpi computers sounds like you are using some other hardware, are you? If it's an rpi, can you show your /boot/config.txt contents and the output of aplay -L? I can try to reproduce your results if so. |
@Rippert yes not an rpi just a random PC.
|
@7890 I've seen similar problems on my old MacBook running KXStudio, a while back, but they went away after an update and I haven't been able to reproduce them since. You may be having problems with your internal soundcard. Do you have a USB interface you could try? |
I'm running into this problem a lot now, I think this is a regression in the Alsa driver. I have not bisected to find out when it occurred, but I don't recall it being this bad when running v1.9.12. I'm running commit b35fa69 (Tue Mar 26 22:47:48 2019 +0100). When I run jackd with the Alsa driver I can get into the unrecoverable state by loading the system (my method is simply spamming ctrl+n in Chrome to spawn new windows).
I get the same behavior even if I don't use verbose mode. If I instead run jackd with the dummy driver and hook up alsa_in and alsa_out to the same sound card I do not end up in the unrecoverable state. I get some audio glitches when loading the system but it always recovers. |
Hi @falkTX ! We're having similar issues:
with jackd2 1.9.12, fresh build from repository code (tag v1.9.12). Exactly the same problem with 1.9.11, installed package from stretch repo. Initially everything works OK, but after a variable time (from minutes to hours), all clients are disconnected like that. Jack server is still running and, in fact, we can restart the clients (without restarting jackd) and everything works OK until the next "error event". Please note that v1.9.12 also prints the "JackPosixSemaphore::TimedWait err = Connection timed out", which doesn't fit your statement above. Thanks! |
Hi @falkTX & everybody! https://github.com/zynthian/zynthian-sys/issues/93 Resume: Regards, |
Nice find @jofemodo! I have a feeling jackd must be trying to compare system timestamps somewhere instead of relying on a monotonic clock for this. If someone had time to look at this, can't be too hard to trace back to the relevant code. Don't think this is the same root cause as most other ppl are encountering in this thread, but anyway one that needs fixing. Fancy work at zynthian btw |
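To illustrate the monotonic-clock idea from the comment above: a minimal sketch of a timed wait whose deadline is taken from `CLOCK_MONOTONIC`, so that stepping the wall clock cannot shorten or stretch the timeout. Whether jackd's timed waits really depend on the wall clock is only a suspicion voiced in this thread, and the `mono_event` type and its functions are hypothetical, not jack2 code.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical one-shot event guarded by a mutex and condition variable. */
typedef struct {
    pthread_mutex_t mutex;
    pthread_cond_t  cond;
    int             ready;
} mono_event;

/* Returns 0 on success, otherwise a pthread error number. */
int mono_event_init(mono_event *ev)
{
    pthread_condattr_t attr;
    int rc = pthread_condattr_init(&attr);
    if (rc != 0)
        return rc;
    /* Make pthread_cond_timedwait() interpret deadlines on CLOCK_MONOTONIC. */
    rc = pthread_condattr_setclock(&attr, CLOCK_MONOTONIC);
    if (rc == 0)
        rc = pthread_cond_init(&ev->cond, &attr);
    pthread_condattr_destroy(&attr);
    if (rc != 0)
        return rc;
    ev->ready = 0;
    rc = pthread_mutex_init(&ev->mutex, NULL);
    if (rc != 0)
        pthread_cond_destroy(&ev->cond);
    return rc;
}

void mono_event_signal(mono_event *ev)
{
    pthread_mutex_lock(&ev->mutex);
    ev->ready = 1;
    pthread_cond_signal(&ev->cond);
    pthread_mutex_unlock(&ev->mutex);
}

/* Waits up to `usec` microseconds. Returns 0 if signalled,
 * ETIMEDOUT if the deadline passed, or another error number. */
int mono_event_wait(mono_event *ev, long usec)
{
    struct timespec deadline;
    int rc = 0;

    if (clock_gettime(CLOCK_MONOTONIC, &deadline) != 0)
        return errno;
    deadline.tv_sec  += usec / 1000000L;
    deadline.tv_nsec += (usec % 1000000L) * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&ev->mutex);
    while (!ev->ready && rc == 0)      /* loop guards against spurious wakeups */
        rc = pthread_cond_timedwait(&ev->cond, &ev->mutex, &deadline);
    if (ev->ready) {
        ev->ready = 0;                 /* consume the event */
        rc = 0;
    }
    pthread_mutex_unlock(&ev->mutex);
    return rc;
}
```

The key line is `pthread_condattr_setclock`: without it the condition variable measures deadlines against `CLOCK_REALTIME`, and a clock step (for example from NTP or a manual date change) moves every pending deadline with it.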
Thanks! We are really happy with our little wonderful machine ;-) Regards, |
I opened a separate issue: #469 , although both could be related ... |
I start jack using following command:
jackd -R -d alsa -d hw:Device -r 44100
Then I also load netmanager with:
jack_load netmanager -i "-c"
After some time jack stops working with error "Connection timed out". This is probably because sem_timedwait in JackPosixSemaphore.cpp returns ETIMEDOUT.
I noticed JackAndroidSemaphore.cpp has very similar code and it does some special handling of the ETIMEDOUT error. So could it be something is missing here?
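For reference, a minimal sketch of a timed semaphore wait that tells ETIMEDOUT apart from other failures. This is not the code in JackPosixSemaphore.cpp or JackAndroidSemaphore.cpp; the helper name `timed_wait_usec` and its return convention are made up for illustration, and what a caller should do after a timeout is deliberately left open.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <semaphore.h>
#include <time.h>

/* Hypothetical helper: wait on `sem` for at most `usec` microseconds.
 * Returns 0 if the semaphore was acquired, 1 on timeout, -1 on any other error. */
int timed_wait_usec(sem_t *sem, long usec)
{
    struct timespec deadline;

    /* sem_timedwait() expects an absolute deadline on CLOCK_REALTIME. */
    if (clock_gettime(CLOCK_REALTIME, &deadline) != 0)
        return -1;
    deadline.tv_sec  += usec / 1000000L;
    deadline.tv_nsec += (usec % 1000000L) * 1000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec  += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    for (;;) {
        if (sem_timedwait(sem, &deadline) == 0)
            return 0;
        if (errno == EINTR)
            continue;       /* interrupted by a signal: retry with the same deadline */
        if (errno == ETIMEDOUT)
            return 1;       /* deadline passed: report it separately */
        return -1;          /* EINVAL or another genuine failure */
    }
}
```

Separating the timeout from hard errors lets the caller choose a policy (retry, log, or tear the client down) instead of treating every non-zero result the same way.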
It's running as realtime process on quadcore ARMv7 without any other workload. So I guess performance is not a problem.
Full log follows.
Nov 10 01:10:23 echo jackd[4543]: jackdmp 1.9.11
Nov 10 01:10:23 echo jackd[4543]: Copyright 2001-2005 Paul Davis and others.
Nov 10 01:10:23 echo jackd[4543]: Copyright 2004-2014 Grame.
Nov 10 01:10:23 echo jackd[4543]: jackdmp comes with ABSOLUTELY NO WARRANTY
Nov 10 01:10:23 echo jackd[4543]: This is free software, and you are welcome to redistribute it
Nov 10 01:10:23 echo jackd[4543]: under certain conditions; see the file COPYING for details
Nov 10 01:10:23 echo jackd[4543]: JACK server starting in realtime mode with priority 10
Nov 10 01:10:23 echo jackd[4543]: self-connect-mode is "Don't restrict self connect requests"
Nov 10 01:10:23 echo jackd[4543]: audio_reservation_init
Nov 10 01:10:23 echo jackd[4543]: Acquire audio card Audio1
Nov 10 01:10:23 echo jackd[4543]: creating alsa driver ... hw:Device|hw:Device|1024|2|44100|0|0|nomon|swmeter|-|32bit
Nov 10 01:10:23 echo jackd[4543]: configuring for 44100Hz, period = 1024 frames (23.2 ms), buffer = 2 periods
Nov 10 01:10:23 echo jackd[4543]: ALSA: final selected sample format for capture: 24bit little-endian in 3bytes format
Nov 10 01:10:23 echo jackd[4543]: ALSA: use 2 periods for capture
Nov 10 01:10:23 echo jackd[4543]: ALSA: final selected sample format for playback: 24bit little-endian in 3bytes format
Nov 10 01:10:23 echo jackd[4543]: ALSA: use 2 periods for playback
Nov 10 01:10:24 echo jack_wait[4544]: server is available
Nov 10 01:10:24 echo jackd[4543]: Starting Jack NetManager
Nov 10 01:10:24 echo jackd[4543]: Listening on '225.3.19.154:19000'
Nov 10 01:10:24 echo jack_load[4552]: netmanager is running.
Nov 10 01:10:24 echo jack_load[4552]: client name = netmanager
Nov 10 01:10:24 echo systemd[1]: Started Jack Audio Connection Kit daemon.
Nov 11 06:43:21 echo jackd[4543]: JackPosixSemaphore::TimedWait err = Connection timed out
Nov 11 06:43:21 echo jackd[4543]: SuspendRefNum error
Nov 11 06:43:21 echo jackd[4543]: JackClient::Execute error name = netmanager
Nov 11 06:43:21 echo jackd[4543]: JackPosixProcessSync::LockedTimedWait error usec = 464380 err = Connection timed out
Nov 11 06:43:21 echo jackd[4543]: JackEngine::ClientDeactivate wait error ref = 3 name = netmanager
Nov 11 10:56:34 echo jackd[4543]: JackPosixProcessSync::LockedTimedWait error usec = 5000000 err = Connection timed out
Nov 11 10:56:34 echo jackd[4543]: Driver is not running
Nov 11 10:56:34 echo jackd[4543]: Cannot create new client
Nov 11 10:56:34 echo jackd[4543]: CheckSize error size = 32 Size() = 12
Nov 11 10:56:34 echo jackd[4543]: CheckRead error
Nov 11 10:56:34 echo jackd[4543]: CheckSize error size = -1 Size() = 4
Nov 11 10:56:34 echo jackd[4543]: CheckRead error
Nov 11 10:56:34 echo jackd[4543]: CheckSize error size = 0 Size() = 12
Nov 11 10:56:34 echo jackd[4543]: CheckRead error
Nov 11 16:22:29 echo jackd[4543]: Jack main caught signal 15
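As a cross-check of the "period = 1024 frames (23.2 ms), buffer = 2 periods" line in the log above, the period length follows directly from the frame count and the sample rate. The helper names below are hypothetical and only restate that arithmetic.

```c
/* Hypothetical helpers: duration of one period, and of the whole buffer,
 * in milliseconds. */
double period_ms(unsigned frames, unsigned rate_hz)
{
    return 1000.0 * (double)frames / (double)rate_hz;
}

double buffer_ms(unsigned frames, unsigned periods, unsigned rate_hz)
{
    return (double)periods * period_ms(frames, rate_hz);
}
```

With the values from the log, `period_ms(1024, 44100)` is about 23.2 ms and `buffer_ms(1024, 2, 44100)` about 46.4 ms, so a timeout of 464380 usec as reported later in the log corresponds to roughly 20 periods.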