Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Tests] Java Runtime Environment crashes with fatal error with network stream #245

Closed
hukoyu opened this issue May 14, 2020 · 13 comments
Closed
Labels
area: samples Samples bug needs-triage Bug does not yet have a priority assigned p1 Medium priority
Milestone

Comments

@hukoyu
Copy link
Collaborator

hukoyu commented May 14, 2020

Added new java scenario test with this PR #244 and caught this exception. The test just opens a network stream to https://www.microsoft.com address and reads from the stream line by line.

Please look at the PR for sample code used and Makefile.

Exception:

OpenJDK 64-Bit Server VM warning: Can't detect primordial thread stack location - find_vma failed
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f763d035278, pid=36, tid=0x00007f7600b31e00
#
# JRE version: OpenJDK Runtime Environment (8.0_212-b04) (build 1.8.0_212-b04)
# Java VM: OpenJDK 64-Bit Server VM (25.212-b04 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 3.12.0
# Distribution: Custom build (Sat May  4 17:33:35 UTC 2019)
# Problematic frame:
# J 668 C2 java.util.Arrays.equals([B[B)Z (54 bytes) @ 0x00007f763d035278 [0x00007f763d035260+0x18]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
bad count while changing owner
2020-05-14T01:32:57.000000Z [(H)ERROR] tid(0x7f76ecb38700) | :OE_ENCLAVE_ABORTING [/home/hukoyu/openenclave/host/calls.c:oe_call_enclave_function_by_table_id:91]
Makefile:26: recipe for target 'run-hw' failed```

cc @prp @davidchisnall @letmaik 
@hukoyu
Copy link
Collaborator Author

hukoyu commented May 14, 2020

@hukoyu
Copy link
Collaborator Author

hukoyu commented May 14, 2020

Here is the full log of run-hw for this sample java app
java-network.log.log

@prp
Copy link
Member

prp commented May 14, 2020

@hukoyu, could you do the following:

  • run this with gdb and include a stack trace at the time when the SIGSEGV has happened?
  • include a log with LKL syscall and signal tracing turned on to show the last syscall before the SIGSEV?
  • do you see the same behaviour in hardware and software mode?
  • do you see the same behaviour when running with a single ethread?

@letmaik letmaik added the bug label May 14, 2020
@hukoyu
Copy link
Collaborator Author

hukoyu commented May 14, 2020

@prp the issue reproes with SGXLKL_ETHREADS=1. For run-hw we get same error I reported initially. For run-sw error looks a bit different:

[[  SGX-LKL ]] lkl_start_init(): creating LKL termination thread
[[  SGX-LKL ]] create_lkl_termination_thread(): enter
[[  SGX-LKL ]] lkl_start_init(): calling lkl_mount_virtial()
[[  SGX-LKL ]] lkl_termination_thread(): enter
[[  SGX-LKL ]] lkl_termination_thread(): Performed LKL syscall to get host task allocated (pid=36)
[[  SGX-LKL ]] lkl_start_init(): calling init_random()
[[  SGX-LKL ]] init_random(): Adding entropy to entropy pool
[    0.010284] random: crng init done
[[  SGX-LKL ]] init_enclave_clock(): Setting enclave realtime clock
[    0.048176] software IO TLB: mapped [mem 0x7fef8705d000-0x7fef8b05d000] (64MB)
[[  SGX-LKL ]] wg0 has public key /ZjlMYO5vpGVxmi2utd0CUb/a5NY7ZfMfS6DbzMCuTw=
[    0.049942] virtio_blk virtio2: [vda] 819200 512-byte logical blocks (419 MB/400 MiB)
[[  SGX-LKL ]] FAIL: Bug: next_delay_ns already set for timer
2020-05-14T09:26:45.000000Z [(H)ERROR] tid(0x7fef84052700) | :OE_ENCLAVE_ABORTING [/home/hukoyu/openenclave/host/calls.c:oe_call_enclave_function_by_table_id:91]
[   SGX-LKL  ] init (0: 19 exit=0) [   SGX-LKL  ] FAIL: sgxlkl_ethread_init() failed (ethread_id=0 result=19 (OE_ENCLAVE_ABORTING))
Makefile:30: recipe for target 'run-sw' failed
make: *** [run-sw] Error 1```

@hukoyu
Copy link
Collaborator Author

hukoyu commented May 14, 2020

@prp with default SGXLKL_ETHREADS value (4 in my VM) it fails for run-hw with the initial error I reported. But it passes for run-sw. Here is the log for run-sw:
java-network-run-sw.log

@hukoyu
Copy link
Collaborator Author

hukoyu commented May 14, 2020

@prp here are logs with SGXLKL_TRACE_LKL_SYSCALL=1 SGXLKL_TRACE_SIGNAL=1
java-network-run-hw-verbose.log
java-network-run-sw-verbose.log

@hukoyu
Copy link
Collaborator Author

hukoyu commented May 14, 2020

@prp here is the call stack from debuging:

Thread 10 "ENCLAVE" received signal SIGILL, Illegal instruction.
[Switching to Thread 0x7fffd4b9b700 (LWP 26995)]
0x00007fff3cdad1a9 in ?? ()
(gdb) bt
#0  0x00007fff3cdad1a9 in ?? ()
#1  0x0000000000000246 in ?? ()
#2  0x00007fff3cdad160 in ?? ()
#3  0x00007fff3dc99738 in ?? ()
#4  0x00007fff3dcc91a4 in ?? ()
#5  0x00007fff3da0bccc in ?? ()
#6  0x00007fff3da0cf5e in ?? ()
#7  0x00007fff3d80ab5b in ?? ()
#8  0x00007fff3d9e3520 in ?? ()
#9  0x00007fff3d83c34e in JNI_CreateJavaVM ()
#10 0x00007fff3dd10f32 in ?? ()
#11 0x00007fff005a7730 in _exec (lt_=0x7fff00b31f30) at sched/lthread.c:160
#12 0x0000000000000000 in ?? ()

letmaik pushed a commit that referenced this issue May 15, 2020
Disabling java-network scenario test for now due to #245
@prp prp added area: samples Samples p1 Medium priority labels May 15, 2020
@prp prp added this to the Milestone 1 milestone May 15, 2020
@prp
Copy link
Member

prp commented May 15, 2020

@prp here is the call stack from debuging:

Thread 10 "ENCLAVE" received signal SIGILL, Illegal instruction.
[...]

@hukoyu since this is a SIGILL, it's not the problem. You can continue execution here.

@hukoyu
Copy link
Collaborator Author

hukoyu commented May 15, 2020

@prp I skipped the SIGILL erros and got SIGSEV but call stack is still same

Thread 9 "ENCLAVE" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffd539c700 (LWP 18747)]
0x00007fff3cdad2b4 in ?? ()
(gdb) bt
#0  0x00007fff3cdad2b4 in ?? ()
#1  0x0000000000000246 in ?? ()
#2  0x00007fff3cdad160 in ?? ()
#3  0x00007fff3dc99738 in ?? ()
#4  0x00007fff3dcc91a4 in ?? ()
#5  0x00007fff3da0bccc in ?? ()
#6  0x00007fff3da0cf5e in ?? ()
#7  0x00007fff3d80ab5b in ?? ()
#8  0x00007fff3d9e3520 in ?? ()
#9  0x00007fff3d83c34e in JNI_CreateJavaVM ()
#10 0x00007fff3dd10f32 in ?? ()
#11 0x00007fff005a7730 in _exec (lt_=0x7fff00b31f50) at sched/lthread.c:160
#12 0x0000000000000000 in ?? ()

@prp
Copy link
Member

prp commented May 16, 2020

This suggests that the problem is in the JVM code. The next step would be to run this with a debug build of the HotSpot JVM that includes symbols. Given that it only fails in HW mode, it must be caused by a new recent regression that we have introduced.

@hukoyu
Copy link
Collaborator Author

hukoyu commented Jun 4, 2020

basic-network-(debug)-(run-hw)-(8-ethreads) also fails.
https://tfsproduks1.visualstudio.com/Ab5352d9a-5ed5-48f2-b767-7f273b55b53c/sgx-lkl/_TestManagement/Runs?runId=9086&_a=resultSummary&resultId=100009

[   SGX-LKL  ] SGX-LKL (OE) Git version c68d9cf-dirty LKL version 5.3.0 [DEBUG build (-O0)] [HARDWARE DEBUG]
[   SGX-LKL  ] nproc=4 ETHREADS=8 CMDLINE="mem=32M"
[   SGX-LKL  ] HW TLS support: conf->fsgsbase=1
[   SGX-LKL  ] Registering disk 0 (path='sgxlkl-alpine.img', mnt='/', [RW   ])
[   SGX-LKL  ] get_signed_libsgxlkl_path... result=/opt/sgx-lkl-debug/bin/../lib/libsgxlkl.so.signed
[   SGX-LKL  ] oe_create_enclave... result=0 (OE_OK)
[   SGX-LKL  ] sgxlkl_enclave_init(ethread_id=0)
[[  SGX-LKL ]] wg0 has public key Rg0sc/Ix4oRyKDC6gfxAoYtqvdXcoMNEgW1rytQab1Q=
[   SGX-LKL  ] ethread (4: 0) ethread (5: 0) init (0: 0 exit=18) ethread (7: 0) ethread (6: 0) ethread (1: 0) ethread (3: 0) ethread (2: 0) 
[   SGX-LKL  ] oe_terminate_enclave... done
[   SGX-LKL  ] SGX-LKL-OE exit: exit_status=18
make: *** [run-hw] Error 18

@SeanTAllen
Copy link
Contributor

I ran into this today locally. Here's what I have for information. Much of it is supporting the previous information:

Happens for me with make run-hw (not make run-sw) with either debug or non-release build.

sean on dc-sean-dev in languages/java/network on  oe_port [!?]  make run-hw
sgx-lkl-java --hw-debug sgxlkl-java.img MainApp
[   SGX-LKL  ] SGX-LKL (OE) 0.2.0-dev (d3e0b9a) LKL 5.3.0 [DEBUG build (-O0)] [HARDWARE DEBUG]
[   SGX-LKL  ] nproc=8 ETHREADS=8 CMDLINE="mem=32M"
[   SGX-LKL  ] HW TLS support: conf->fsgsbase=1
[   SGX-LKL  ] Registering disk 0 (path='sgxlkl-java.img', mnt='/', [RW    ])
[   SGX-LKL  ] get_signed_libsgxlkl_path... result=/home/sean/code/lsds/test/build_musl/./libsgxlkl.so.signed
[   SGX-LKL  ] oe_create_enclave... result=0 (OE_OK)
[   SGX-LKL  ] sgxlkl_enclave_init(ethread_id=0)
[[  SGX-LKL ]] sgxlkl_enclave_init(): enter
[[  SGX-LKL ]] sgxlkl_enclave_show_attribute(): enclave base=0x7f9a80000000 size=1.043 GB
[[  SGX-LKL ]] sgxlkl_enclave_show_attribute(): enclave heap base=0x7f9a80b49000 size=1024.00 M end=0x7f9ac0b49000
[[  SGX-LKL ]] sgxlkl_enclave_init(): calling _dlstart_c()
[[  SGX-LKL ]] _register_enclave_signal_handlers(): Registering OE exception handler...
[[  SGX-LKL ]] lkl_start_init(): kernel command line: 'mem=32M console=hvc0'
[[  SGX-LKL ]] lkl_start_init(): lkl_start_kernel() called
[    0.000000] Linux version 5.3.0+ (sean@dc-sean-dev) (gcc version 7.5.0 (Ubuntu 7.5.0-3ubuntu1~18.04)) #1 Wed Jun 10 15:54:03 UTC 2020
[    0.000000] memblock address range: 0x7f9abe8c8000 - 0x7f9ac08c8000
[    0.000000] Built 1 zonelists, mobility grouping on.  Total pages: 8080
[    0.000000] Kernel command line: mem=32M console=hvc0 virtio_mmio.device=256@0x1000000:1 virtio_mmio.device=273@0x2000000:2
[    0.000000] Dentry cache hash table entries: 4096 (order: 3, 32768 bytes, linear)
[    0.000000] Inode-cache hash table entries: 2048 (order: 2, 16384 bytes, linear)
[    0.000000] mem auto-init: stack:off, heap alloc:off, heap free:off
[    0.000000] Memory available: 32176k/32768k RAM
[    0.000000] SLUB: HWalign=32, Order=0-3, MinObjects=0, CPUs=1, Nodes=1
[    0.000000] NR_IRQS: 4096
[    0.000000] lkl: irqs initialized
[    0.000000] clocksource: lkl: mask: 0xffffffffffffffff max_cycles: 0x1cd42e4dffb, max_idle_ns: 881590591483 ns
[    0.000000] lkl: time and timers initialized (irq3)
[    0.000537] printk: console [hvc0] enabled
[    0.000537] pid_max: default: 4096 minimum: 301
[    0.001096] Mount-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.001096] Mountpoint-cache hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.002211] devtmpfs: initialized
[    0.006822] random: get_random_bytes called from 0x00007f9a806649a4 with crng_init=0
[    0.006822] clocksource: jiffies: mask: 0xffffffff max_cycles: 0xffffffff, max_idle_ns: 19112604462750000 ns
[    0.007326] futex hash table entries: 16 (order: -4, 384 bytes, linear)
[    0.007326] xor: automatically using best checksumming function   8regs
[    0.007326] NET: Registered protocol family 16
[    0.010059] clocksource: Switched to clocksource lkl
[    0.010615] NET: Registered protocol family 2
[    0.011188] tcp_listen_portaddr_hash hash table entries: 256 (order: 0, 4096 bytes, linear)
[    0.011188] TCP established hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.011188] TCP bind hash table entries: 512 (order: 0, 4096 bytes, linear)
[    0.011188] TCP: Hash tables configured (established 512 bind 512)
[    0.011748] UDP hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.011748] UDP-Lite hash table entries: 128 (order: 0, 4096 bytes, linear)
[    0.011748] NET: Registered protocol family 1
[    0.011748] virtio-mmio: Registering device virtio-mmio.0 at 0x1000000-0x10000ff, IRQ 1.
[    0.012297] virtio-mmio: Registering device virtio-mmio.1 at 0x2000000-0x2000110, IRQ 2.
[    0.012297] workingset: timestamp_bits=62 max_order=14 bucket_order=0
[    0.013964] NET: Registered protocol family 38
[    0.013964] io scheduler mq-deadline registered
[    0.013964] io scheduler kyber registered
[    0.097783] software IO TLB: mapped [mem 0x7f9b1c40b000-0x7f9b2040b000] (64MB)
[    0.101438] loop: module loaded
[    0.101998] device-mapper: ioctl: 4.40.0-ioctl (2019-01-18) initialised: [email protected]
[    0.103125] wireguard: WireGuard 0.0.20191219 loaded. See www.wireguard.com for information.
[    0.103125] wireguard: Copyright (C) 2015-2019 Jason A. Donenfeld <[email protected]>. All Rights Reserved.
[    0.103684] NET: Registered protocol family 10
[    0.104244] Segment Routing with IPv6
[    0.104244] sit: IPv6, IPv4 and MPLS over IPv4 tunneling driver
[    0.104805] This architecture does not have kernel memory protection.
[    0.105378] Run /init as init process
[[  SGX-LKL ]] lkl_start_init(): lkl_start_kernel() finished
[[  SGX-LKL ]] lkl_start_init(): creating LKL termination thread
[[  SGX-LKL ]] create_lkl_termination_thread(): enter
[[  SGX-LKL ]] lkl_start_init(): calling lkl_mount_virtial()
[[  SGX-LKL ]] lkl_termination_thread(): enter
[[  SGX-LKL ]] lkl_termination_thread(): Performed LKL syscall to get host task allocated (pid=36)
[[  SGX-LKL ]] lkl_start_init(): calling init_random()
[[  SGX-LKL ]] init_random(): Adding entropy to entropy pool
[    0.105936] random: crng init done
[[  SGX-LKL ]] init_enclave_clock(): Setting enclave realtime clock
[[  SGX-LKL ]] wg0 has public key quT14z9b7JSZU37M55ReMiTD9rf5CF21pUh93cQAVTE=
Azure Quote Provider: libdcap_quoteprov.so [ERROR]: Could not retreive environment variable for 'AZDCAP_DEBUG_LOG_LEVEL'
[    0.138647] virtio_blk virtio2: [vda] 819200 512-byte logical blocks (419 MB/400 MiB)
[[  SGX-LKL ]] lkl_mount_disk(): lkl_mount_disk(dev="/dev/vda", mnt="/mnt/vda", ro=0)
[    0.140323] EXT4-fs (vda): warning: mounting unchecked fs, running e2fsck is recommended
[    0.141452] EXT4-fs (vda): mounted filesystem without journal. Opts:
[[  SGX-LKL ]] lkl_mount_disks(): Set working directory: /
[[  SGX-LKL ]] libc_start_main_stage2(): Calling app main: /usr/bin/java
OpenJDK 64-Bit Server VM warning: Can't detect primordial thread stack location - find_vma failed
#
# A fatal error has been detected by the Java Runtime Environment:
#
#  SIGSEGV (0xb) at pc=0x00007f9abd065438, pid=36, tid=0x00007f9a80b56fb0
#
# JRE version: OpenJDK Runtime Environment (8.0_212-b04) (build 1.8.0_212-b04)
# Java VM: OpenJDK 64-Bit Server VM (25.212-b04 mixed mode linux-amd64 compressed oops)
# Derivative: IcedTea 3.12.0
# Distribution: Custom build (Sat May  4 17:33:35 UTC 2019)
# Problematic frame:
# J 643 C2 java.util.Arrays.equals([B[B)Z (54 bytes) @ 0x00007f9abd065438 [0x00007f9abd065420+0x18]
#
# Failed to write core dump. Core dumps have been disabled. To enable core dumping, try "ulimit -c unlimited" before starting Java again
#
bad count while changing owner
[[  SGX-LKL ]] FAIL: Kernel panic! Aborting...
2020-06-10T16:01:49.000000Z [(H)ERROR] tid(0x7f9b117fa700) | :OE_ENCLAVE_ABORTING [/home/sean/openenclave-sgxlkl/host/calls.c:oe_call_enclave_function_by_table_id:91]
[   SGX-LKL  ] ethread (7: 19) [   SGX-LKL  ] FAIL: sgxlkl_ethread_init() failed (id=7 result=19 (OE_ENCLAVE_ABORTING))
Makefile:26: recipe for target 'run-hw' failed
make: *** [run-hw] Error 1

The segfault is caused by the network call in the Java program. I can recreate the error with this smaller program:

import java.io.*;
import java.net.*;

public class MainApp
{
        public static void main(String[] args) throws MalformedURLException {
                URL url = new URL("https://www.google.com/");
                try {
                    url.openStream();
                } catch (Exception e) {
                     System.out.println("x");
                }
        }
}

I've tested with 1 to 8 ETHREADS. Always the same result.

sgx-lkl commit: d3e0b9a
oe commit: 8fec3caf5acae75cd4657557d20f70d7e072f0ae

@prp prp changed the title Java Runtime Environment crashes with fatal error with network stream [Tests] Java Runtime Environment crashes with fatal error with network stream Jun 27, 2020
@davidchisnall davidchisnall added the needs-triage Bug does not yet have a priority assigned label Jul 28, 2020
@SeanTAllen
Copy link
Contributor

Closing as a duplicate of #680.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area: samples Samples bug needs-triage Bug does not yet have a priority assigned p1 Medium priority
Projects
None yet
Development

No branches or pull requests

5 participants