Force ViewBackend dispatch when the bridge connection is lost #161

psaavedra · 2021-05-07T12:09:24Z

Case of use:

WebView 1 load a Page Domain1/page (surface 1)
WebView 1 load a Page Domain2/page (surface 2) (but is filtered by the Content Filters Policy (surface 2 is inmediately destroyed)
WebView 2 load a Page Domain1/page (surface 3)
WebView 2 is closed
WebView 1 not updated anymore.

Explanation:

The figure shows the situation in 2):

At that point, during the creation of the new surface (4) will trigger the ViewBackend:dispatchFrameCallbacks() but for the surface number 4, being inhibited the previous one (the surface 3). So this Surface 3 will not process the callbacks associated to the frame reached inmediately after the register action is done (2nd red box) and it will be never received the commit from the WebProcess because this will keep waiting for the wl_callback_send_done() triggered in the Surface::dispatchFrameCallbacks().

The 1b95b25 attempts to address that situation adding an Instance::dispatchFrameCallbacks() of the previous surface when the new surface is registered but that does not guarantee that you get the frame ACK message associated to the previous surface after the register (the orange box just after the blue box). The result: the previous surface never sends the wl_callback_send_done() associated for this surface.

So far the real problem. From here how the bug is manifested: ...

In the state in which everything is we have:

The new surface (4) was created but destroyed immediately afterwards (but not by an unregisterSurface() but by a wl_client_destroy)
The previous surface (3) did not send the wl_callback_send_done() to the target renderer backend so the WebProcess stops to generate a new frame.

Then 3 scenarios:

If a page from a different domain is loaded on the same view: Then the situation can be recovered since a new Surface is going to be created and the previous one dies
If the new request is in the same domain. GAME_OVER. There is no new a registerSurface() and the Surface 3 is still pending a Surface::dispatchFrameCallbacks()
Creation a new page in a view ... apparently everything is still working. We will continue ...

... The new page load works in the new WebView because: 1) it is a new surface 2) associated with a new ViewBackend but recovers the situation? The answer is no: ...

A priori you could thinkg that the registerSurface() will shoot the ViewBackend::dispatchFrameCallbacks() for the previous surface. That is True but for the previous surface in ViewBackend but this is new one so only has one Surface (number 5) therefore it will not dispatch anything. As a consequence once this WebView is closed returns to the previous View, we will have that Surface 3 is still in the same inconsistent state.

So where can we make sure the dispatch for a previous Surface is going to work when those lazy frame happends?. I said before... there is not an unregisterSurface() for Surface 4, it dies abruptly therefore the place where to make sure that the previous frame of a previous surface is dispatched is in the bridgeConnectionLost().

As each ViewBackend (and hence web view) can get its contents rendered by different WPEWebProcess due to PSON, keep around a list of all the differen bridgeIds which have been associated with it in a std::vector. Newly added elements are always pushed at the back, so the most recent association (i.e. the "current" active one) is always the last element. Keeping the list allows getting back to the previously used ones.

Arrange for calling ViewBackend::unregisterSurface() also when a Surface is destroyed. This is a no-op if the nested compositor had the chance to receive and handle a FdoIPC::UnregisterSurface message before the surface is destroyed; but allows undoing the bridge association when the Wayland client connection is closed beforehand. The destruction callback for the surface is guaranteed to always be called by libwayland, even on dropped connections, to allow cleaning up reagardless of the fate of its clients. Among others, this covers the case in which a WPEWebProcess is abruptly exited, for example due to a crash or a top-level resource load being blocked by a content filter. Co-authored-by: Adrian Perez de Castro <[email protected]> Acked-by: Adrian Perez de Castro <[email protected]> Signed-off-by: Pablo Saavedra <[email protected]>

…urface In an un/register surface action is taken; make sure that surfaces other than the new active one do not have callbacks pending to be dispatched. Co-authored-by: Adrian Perez de Castro <[email protected]> Acked-by: Adrian Perez de Castro <[email protected]> Signed-off-by: Pablo Saavedra <[email protected]>

The ~ViewBackend sets now the m_backend to NULL and the ViewBackend::dispatchFrameCallbacks() checks if m_backend is not NULL before call wpe_view_backend_dispatch_frame_displayed(). Acked-by: Adrian Perez de Castro <[email protected]> Signed-off-by: Pablo Saavedra <[email protected]>

This patch address the following scenario: * View 1: Opens a non-blocked page (surface1) (domain1/pageA) * View 1: Opens a blocked page (surface2) (domain2/pageA) * That add a new bridge Id in the bridgeIds * Frames related to the surface1 arrives late (after the new surface is registered). There is not a dispatch notification because the `bridgeIds.back()`, at that moment, is associated to the `surface2`. Therefore `wpe_view_backend_dispatch_frame_displayed(m_backend);` is inhibited * Destroy surface2 because the content is filtered * View 2: Opens a non-blocked page (surface3) (domain3/pageC) * View 2: Closed * Destroy ViewBackend * Destroy Surface3 * View 1: Open a non-blocked page in the same domain that the related to the surface1 (domain1/pageB) * Issue: New frames will not be generated because the previous `wpe_view_backend_dispatch_frame_displayed` was not called for related WebProcess backend (1) Signed-off-by: Pablo Saavedra <[email protected]>

... instead of do it throught the ViewBackend::unregisterSurface(). During destruction we only want to remove the surface-viewbackend association but we do not want to call dispatchFrameCallbacks because the nested compositor client is dying Acked-by: Pablo Saavedra <[email protected]>

jameshilliard · 2021-07-23T18:23:33Z

So I tried applying this on top of 1.10.0 to see if it would have any effect on #166. I got the following error.

(cog:398): Cog-Core-WARNING **: 18:20:23.914: <app:///> Crash!: The renderer process crashed. Reloading the page may fix intermittent failures.
--398-- REDIR: 0xb21d1d0 (libstdc++.so.6:operator delete(void*)) redirected to 0x484f064 (operator delete(void*))
==398== Invalid read of size 4
==398==    at 0xB158D68: WS::Instance::unregisterSurface(WS::Surface*) (ws.cpp:551)
==398==    by 0xB158DBB: operator() (ws.cpp:157)
==398==    by 0xB158DBB: WS::s_compositorInterface::{lambda(wl_client*, wl_resource*, unsigned int)#1}::operator()(wl_client, wl_resource, unsigned int) const::{lambda(wl_resource)#1}::_FUN(wl_resource) (ws.cpp:159)
==398==    by 0xB944F83: destroy_resource (wayland-server.c:724)
==398==    by 0xB94916B: for_each_helper (wayland-util.c:372)
==398==    by 0xB9496BF: wl_map_for_each (wayland-util.c:385)
==398==    by 0xB945107: wl_client_destroy (wayland-server.c:883)
==398==    by 0xB9451DB: wl_client_connection_data (wayland-server.c:337)
==398==    by 0xB947033: wl_event_loop_dispatch (event-loop.c:1027)
==398==    by 0xB157CDF: operator() (ws.cpp:75)
==398==    by 0xB157CDF: WS::ServerSource::{lambda(_GSource*, int (*)(void*), void*)#3}::_FUN(_GSource*, int (*)(void*), void*) (ws.cpp:84)
==398==    by 0x80F91A3: g_main_dispatch (gmain.c:3337)
==398==    by 0x80F91A3: g_main_context_dispatch (gmain.c:4055)
==398==    by 0x80F9403: g_main_context_iterate.constprop.0 (gmain.c:4131)
==398==    by 0x80F94EB: g_main_context_iteration (gmain.c:4196)
==398==  Address 0x50810ea8 is 8 bytes inside a block of size 24 free'd
==398==    at 0x484F0E8: operator delete(void*) (vg_replace_malloc.c:802)
==398==    by 0xB159103: deallocate (new_allocator.h:133)
==398==    by 0xB159103: deallocate (alloc_traits.h:492)
==398==    by 0xB159103: _M_deallocate_node_ptr (hashtable_policy.h:2064)
==398==    by 0xB159103: _M_deallocate_node (hashtable_policy.h:2054)
==398==    by 0xB159103: _M_erase (hashtable.h:1888)
==398==    by 0xB159103: std::_Hashtable<unsigned int, std::pair<unsigned int const, WS::Surface*>, std::allocator<std::pair<unsigned int const, WS::Surface*> >, std::__detail::_Select1st, std::equal_to<unsigned int>, std::hash<unsigned int>, std::__detail::_Mod_range_hashing, std::__detail::_Default_ranged_hash, std::__detail::_Prime_rehash_policy, std::__detail::_Hashtable_traits<false, false, true> >::erase(std::__detail::_Node_const_iterator<std::pair<unsigned int const, WS::Surface*>, false, false>) (hashtable.h:1863)
==398==    by 0xB158D83: erase (hashtable.h:807)
==398==    by 0xB158D83: erase (unordered_map.h:797)
==398==    by 0xB158D83: WS::Instance::unregisterSurface(WS::Surface*) (ws.cpp:549)
==398==    by 0xB158DBB: operator() (ws.cpp:157)
==398==    by 0xB158DBB: WS::s_compositorInterface::{lambda(wl_client*, wl_resource*, unsigned int)#1}::operator()(wl_client, wl_resource, unsigned int) const::{lambda(wl_resource)#1}::_FUN(wl_resource) (ws.cpp:159)
==398==    by 0xB944F83: destroy_resource (wayland-server.c:724)
==398==    by 0xB94916B: for_each_helper (wayland-util.c:372)
==398==    by 0xB9496BF: wl_map_for_each (wayland-util.c:385)
==398==    by 0xB945107: wl_client_destroy (wayland-server.c:883)
==398==    by 0xB9451DB: wl_client_connection_data (wayland-server.c:337)
==398==    by 0xB947033: wl_event_loop_dispatch (event-loop.c:1027)
==398==    by 0xB157CDF: operator() (ws.cpp:75)
==398==    by 0xB157CDF: WS::ServerSource::{lambda(_GSource*, int (*)(void*), void*)#3}::_FUN(_GSource*, int (*)(void*), void*) (ws.cpp:84)
==398==    by 0x80F91A3: g_main_dispatch (gmain.c:3337)
==398==    by 0x80F91A3: g_main_context_dispatch (gmain.c:4055)
==398==  Block was alloc'd at
==398==    at 0x484CD80: operator new(unsigned long) (vg_replace_malloc.c:417)
==398==    by 0xB1581FF: allocate (new_allocator.h:115)
==398==    by 0xB1581FF: allocate (alloc_traits.h:460)
==398==    by 0xB1581FF: _M_allocate_node<std::pair<unsigned int const, WS::Surface*> > (hashtable_policy.h:2032)
==398==    by 0xB1581FF: _Scoped_node<std::pair<unsigned int const, WS::Surface*> > (hashtable.h:272)
==398==    by 0xB1581FF: _M_emplace<std::pair<unsigned int const, WS::Surface*> > (hashtable.h:1673)
==398==    by 0xB1581FF: insert<std::pair<unsigned int const, WS::Surface*> > (hashtable_policy.h:1021)
==398==    by 0xB1581FF: insert (unordered_map.h:586)
==398==    by 0xB1581FF: WS::Instance::registerSurface(unsigned int, WS::Surface*) (ws.cpp:407)
==398==    by 0xB4CC7A7: ffi_call_SYSV (sysv.S:114)
==398==    by 0xB4CBFCF: ffi_call_int (ffi.c:747)
==398==    by 0xB9482C3: wl_closure_invoke (connection.c:1018)
==398==    by 0xB945443: wl_client_connection_data (wayland-server.c:432)
==398==    by 0xB947033: wl_event_loop_dispatch (event-loop.c:1027)
==398==    by 0xB157CDF: operator() (ws.cpp:75)
==398==    by 0xB157CDF: WS::ServerSource::{lambda(_GSource*, int (*)(void*), void*)#3}::_FUN(_GSource*, int (*)(void*), void*) (ws.cpp:84)
==398==    by 0x80F91A3: g_main_dispatch (gmain.c:3337)
==398==    by 0x80F91A3: g_main_context_dispatch (gmain.c:4055)
==398==    by 0x80F9403: g_main_context_iterate.constprop.0 (gmain.c:4131)
==398==    by 0x80F94EB: g_main_context_iteration (gmain.c:4196)
==398==    by 0x7F5AFEF: g_application_run (gapplication.c:2560)
==398== 

(cog:398): WPE-FDO-WARNING **: 18:20:25.015: Instance::dispatchFrameCallbacks(): Cannot find surface with bridgeId 1 in view backend map. Probably the associated surface is gone due to a premature exit in the client side

aperezdc and others added 6 commits May 5, 2021 01:48

psaavedra requested a review from aperezdc May 7, 2021 12:09

Base automatically changed from aperezdc/bridgeid-list to master May 12, 2021 15:38

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Force ViewBackend dispatch when the bridge connection is lost #161

Force ViewBackend dispatch when the bridge connection is lost #161

psaavedra commented May 7, 2021

jameshilliard commented Jul 23, 2021

Force ViewBackend dispatch when the bridge connection is lost #161

Are you sure you want to change the base?

Force ViewBackend dispatch when the bridge connection is lost #161

Conversation

psaavedra commented May 7, 2021

jameshilliard commented Jul 23, 2021