Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Object::retain crash in Sprite::setSpriteFrame(std::string_view) #2292

Open
aryamansingh2008 opened this issue Dec 27, 2024 · 23 comments
Open

Comments

@aryamansingh2008
Copy link

There are multiple crashes where the spriteFrameCache is unable to find a spriteFrame.

This only happens when android app receives onTrimMemory signal, on which I am calling Director::purgeCacheData method using mGLSurfaceView.queueEvent. This only happens in android and I am unable to repro the crash on my device.

I am calling the SpriteFrameCache::addSpriteFramesWithFile method with the plist name in which the spriteFrame belongs, right before calling Sprite::setFrameFrame. Can there be any reason why SpriteFrameCache::addSpriteFramesWithFile method would fail? Should axmol return error in such cases?

null pointer dereference: SIGSEGV  0x0000000000000008
#00 pc 0x24b9f40 libMyGame.so (ax::Object::retain() [Object.cpp:85]) 
#01 pc 0x241edbc libMyGame.so (ax::Sprite::setSpriteFrame(ax::SpriteFrame*) [Sprite.cpp:1574])
@rh101
Copy link
Contributor

rh101 commented Dec 27, 2024

Can you please show an example of the code? Specifically, which version of SpriteFrameCache::addSpriteFramesWithFile is called, along with the code used to get the sprite frame and pass it to Sprite::setSpriteFrame().

Also, if you take a look at Sprite::setSpriteFrame(), you will see a very important comment:

void Sprite::setSpriteFrame(SpriteFrame* spriteFrame)
{
    // retain the sprite frame
    // do not removed by SpriteFrameCache::removeUnusedSpriteFrames
    if (_spriteFrame != spriteFrame)
    {
        AX_SAFE_RELEASE(_spriteFrame);
        _spriteFrame = spriteFrame;
        spriteFrame->retain();
    }
...
}

It explicitly states: do not removed by SpriteFrameCache::removeUnusedSpriteFrames

The problem is the call to Director::purgeCacheData, which does the following:

void Director::purgeCachedData()
{
    FontFNT::purgeCachedData();
    FontAtlasCache::purgeCachedData();

    if (s_SharedDirector->getGLView())
    {
        SpriteFrameCache::getInstance()->removeUnusedSpriteFrames();  ////<<<<<< Here is the issue
        _textureCache->removeUnusedTextures();

        // Note: some tests such as ActionsTest are leaking refcounted textures
        // There should be no test textures left in the cache
        AXLOGD("{}\n", _textureCache->getCachedTextureInfo());
    }
    FileUtils::getInstance()->purgeCachedEntries();
}

If there are references to any of the frames from a texture atlas, then removeUnusedSpriteFrames should never be used, since it's pointless, as the texture would not be freed, so you're not actually saving memory at all.

@aryamansingh2008
Copy link
Author

void setAssets() {
	SpriteFrameCache::getInstance()->addSpriteFramesWithFile("game.plist");

	spriteNode->setSpriteFrame("image1.png");
}

From my understanding, whenever setSpriteFrame(SpriteFrame* spriteFrame) is called, it will retain the spriteFrame and the director's purgeCachedData won't remove it as it is not unused. Even if, it is removed, the next time I call setAssets() method, addSpriteFramesWithFile should add it back and spriteNode->setSpriteFrame("image1.png"); should be safe to call.

@aryamansingh2008
Copy link
Author

If there are references to any of the frames from a texture atlas, then removeUnusedSpriteFrames should never be used, since it's pointless, as the texture would not be freed, so you're not actually saving memory at all.

Yeah, that's true. But isn't it necessary to be called for _textureCache->removeUnusedTextures(); to even work. Since, sprite frames retain textures.

@rh101
Copy link
Contributor

rh101 commented Dec 27, 2024

From my understanding, whenever setSpriteFrame(SpriteFrame* spriteFrame) is called, it will retain the spriteFrame and the director's purgeCachedData won't remove it as it is not unused. Even if, it is removed, the next time I call setAssets() method, addSpriteFramesWithFile should add it back and spriteNode->setSpriteFrame("image1.png"); should be safe to call.

That would be the obvious way it should work, but it's clearly not, and now I'm actually curious why this is so. The comment in the code, do not removed by SpriteFrameCache::removeUnusedSpriteFrames, is from Cocos2d-x. and not a change made in Axmol, so it's been in there for a long time.

@rh101
Copy link
Contributor

rh101 commented Dec 27, 2024

@aryamansingh2008 I cannot reproduce this issue at all. The only way I could get it to fail is if an invalid frame name is passed to setSpriteFrame, one that doesn't exist in the PLIST.

The test code used is as follows:

SpriteFrameCache::getInstance()->addSpriteFramesWithFile("grossini.plist");
auto sprite = Sprite::createWithSpriteFrameName("grossini_dance_01.png");
sprite->setPositionNormalized(Vec2(0.5f, 0.5f));
addChild(sprite);

scheduleOnce([this](float) {
    Director::getInstance()->purgeCachedData(); // Purge unused frames

    scheduleOnce([this](float) {
            SpriteFrameCache::getInstance()->addSpriteFramesWithFile("grossini.plist"); // Reload atlas

            auto frame2Sprite = Sprite::createWithSpriteFrameName("grossini_dance_02.png"); // create sprite from different frame
            addChild(frame2Sprite);

        }, 1.f, "c2");

}, 1.0f, "c1");

The scheduled callbacks with 1 second delay between each are just there so I could see what is happening, but other than that they are not important.

Does your code do anything else between the call to purgeCachedData and the call to addSpriteFramesWithFile?

@aryamansingh2008
Copy link
Author

Same here, I am also unable to repro this.

This is what I am doing using mGLSurfaceView.queueEvent whenever I receive low memory signal.

JNIEXPORT void Java_org_cocos2dx_cpp_AppActivity_purgeCachedData(JNIEnv* env, jobject thiz) {
	DeviceInfoInterface::getInstance()->sendCrashlyticsLog("Purge signal received");
	ax::Director::getInstance()->getScheduler()->runOnAxmolThread([] { ax::Director::getInstance()->purgeCachedData(); });
}

And the addSpriteFramesWithFile and setSpriteFrame part gets triggered from touch events like button click.

I tried to reproduce it by doing this in director drawScene

// Draw the Scene
void Director::drawScene()
{
    _renderer->beginFrame();

    // calculate "global" dt
    calculateDeltaTime();

    if (_glView)
    {
        _glView->pollEvents();
    }

    // tick before glClear: issue #533
    if (!_paused)
    {
        _eventDispatcher->dispatchEvent(_eventBeforeUpdate);
        _scheduler->update(_deltaTime);
        this->purgeCachedData();                    // purge cache data on every frame
        _eventDispatcher->dispatchEvent(_eventAfterUpdate);
        UserDefault::getInstance()->checkAndFlush();
    }
    
    //.... rest of the code

@aryamansingh2008
Copy link
Author

aryamansingh2008 commented Dec 27, 2024

That would be the obvious way it should work, but it's clearly not, and now I'm actually curious why this is so.
Why do you say it's clearly not the behaviour? I didn't get this idea from the cocos2d-x's comment.

// do not removed by SpriteFrameCache::removeUnusedSpriteFrames
I think this just means that this spriteFrame wouldn't be removed when calling removeUnusedSpriteFrames as this will be retained.

@rh101
Copy link
Contributor

rh101 commented Dec 27, 2024

That would be the obvious way it should work, but it's clearly not, and now I'm actually curious why this is so.
Why do you say it's clearly not the behaviour? I didn't get this idea from the cocos2d-x's comment.

The remark "but it's clearly not" means that the way it should work is not how it seems to be working, since it is resulting in a crash, with regards to the SpriteFrameCache::addSpriteFramesWithFile() then Sprite::setSpriteFrame() calls.

// do not removed by SpriteFrameCache::removeUnusedSpriteFrames
I think this just means that this spriteFrame wouldn't be removed when calling removeUnusedSpriteFrames as this will be retained.

My interpretation of the comment is based on the "do not", yet you are interpreting it to mean "is not". Perhaps you are right, and the comment should have been "is not removed by SpriteFrameCache::removeUnusedSpriteFrames".

@rh101
Copy link
Contributor

rh101 commented Dec 27, 2024

In another post you mentioned that you have modified the engine in some way. Are you able to compile your application with an unmodified version of Axmol and release that?

Also, is there a specific reason to modify the engine rather than add that functionality in the project? Modifying the engine means you need to be 100% certain that those modifications are not the cause of the issues, and just to clarify, I'm not stating that they are. When posting an issue here, we cannot really help unless we are testing with the exact same engine code.

An example of this was the ANR issue you posted, where the code was different to what is in Axmol (the setPreserveEGLContextOnPause). It just ends up wasting a lot of time, so less likely for anyone to really be able to help resolve the issue.

@aryamansingh2008
Copy link
Author

aryamansingh2008 commented Jan 3, 2025

My team has been using cocos 3.14 version for a long time and had added many custom functionalities on top of that like using graphemeCluster to calculate visible string length in Input Text Field, caching writeable path in android to avoid JNI calls, adding custom writeable path in macOS so that it doesn't use Documents folder, handle scale of RichImageElement in Rich Text, caching csb files in memory to avoid disk reads, UserDefault disk writes changed to per frame instead of per instruction for USER_DEFAULT_PLAIN_MODE, adding isSpriteFramesWithFileLoaded in SpriteFrameCache's addSpriteFramesWithFile implementation to again avoid disk reads, and so on. These custom changes were mostly around improving performance of the engine and might not necessarily apply for all.

Said that, we have planned to move these custom changes out of the engine which we can and to raise PRs for the others to get those functionalities added in the axmol project.

@rh101
Copy link
Contributor

rh101 commented Jan 3, 2025

Said that, we have planned to move these custom changes out of the engine which we can and to raise PRs for the others to get those functionalities added in the axmol project.

That would be good.

Any luck tracking down the source of the crash?

@aryamansingh2008
Copy link
Author

No, but the code causing us problem is this

void setAssets() {
	if (!SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist")) {
		SpriteFrameCache::getInstance()->addSpriteFramesWithFile("game.plist");
	}

	spriteNode->setSpriteFrame("image1.png");
}

I have added logs to narrow down the root cause of this. My assumption is that either there is a bug in the SpriteSheet full state or there is a bug in addSpriteFramesWithDictionary flow which is ignoring few sprite frames. Will update here once the logs are released in production. (I am unable to repro the issue)

@rh101
Copy link
Contributor

rh101 commented Jan 3, 2025

Any chance you could change the code to this equivalent code, so you will prevent the crash and log the issue:

auto frame = SpriteFrameCache::getInstance()->findFrame("image1.png");
if (frame)
{
    spriteNode->setSpriteFrame(frame);
}
else
{
    // Log error here and handle the error condition (such as wipe entire sheet/reload it etc.)
}

At least then you would know that SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist") returned true yet the frame did not exist, which may indicate that there may be a problem in the SpriteFrameCache or the PlistSpriteSheetLoader.

Also, is there any chance that Director::getInstance()->purgeCachedData(); is being called from any thread other than the main thread, such as from Android java code that runs as a result of some OS event? If it is, then that would most likely be the problem, and simply fixed by running it on the axmol thread (via runOnAxmolThread in C++ or AxmolEngine.runOnGLThread(runnable) for Java code).

@aryamansingh2008
Copy link
Author

aryamansingh2008 commented Jan 3, 2025

Also, is there any chance that Director::getInstance()->purgeCachedData(); is being called from any thread other than the main thread, such as from Android java code that runs as a result of some OS event? If it is, then that would most likely be the problem, and simply fixed by running it on the axmol thread (via runOnAxmolThread in C++ or AxmolEngine.runOnGLThread(runnable) for Java code).

We are scheduling this call back using AxmolEngine.runOnGLThread(runnable) and using runOnAxmolThread as well. I know it is redundant, but that surely isn't the problem here.

At least then you would know that SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist") returned true yet the frame did not exist, which may indicate that there may be a problem in the SpriteFrameCache or the PlistSpriteSheetLoader.

The logs I have added are doing similar work. I am logging result of SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist") and SpriteFrameCache::getInstance()->getSpriteFrameByName(image1.png) both before and after the SpriteFrameCache::getInstance()->addSpriteFramesWithFile("game.plist") call. So, that should give us enough data.

@rh101
Copy link
Contributor

rh101 commented Jan 14, 2025

@aryamansingh2008 I was responding to the post you put up, but you deleted it, so I'm not sure what is going on there. Regardless of that, you mentioned this:

I further looked into why this could happen, one possible reason I could see was if there are multiple plist which have image1.png sprite frame, this could happen since _spriteFrameToSpriteSheetMap can point to a different plist and lead to unexpected behaviour. But in my case, image1.png is only present in game.plist.

That is correct. Sprite sheets must contain globally unique IDs for the frame entries in them, since they're all stored in the same list. This is mentioned in the wiki post here, including suggestions on the possible ways to organise image resources to ensure that they are unique.

@aryamansingh2008
Copy link
Author

Sorry, I needed to clarify something about the logs, the previous message didn't have full details.

SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist") // return true
SpriteFrameCache::getInstance()->getSpriteFrameByName("image1.png") // returns null

if (!SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist")) {
    !SpriteFrameCache::getInstance()->addSpriteFrameWithFiles("game.plist");
}

SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist") // return true
SpriteFrameCache::getInstance()->getSpriteFrameByName("image1.png") // returns null

Important to note that I didn't test the behaviour of calling addSpriteFrameWithFiles despite isSpriteFramesWithFileLoaded being true.

Also, I have checked all resources, image1.png is only present in game.plist. Can some other commonly named sprite frame cause this undefined behaviopur?

Please let me know if there could be any other reason or any hypos, it will help me to debug further.

@rh101
Copy link
Contributor

rh101 commented Jan 15, 2025

How many frames are in that sprite sheet?
Approximately how many frames in total across all sprite sheets?

Are you certain that there are no duplicate names across any of the sprite sheets?

I've tried so many different combinations of purging via Director::getInstance()->purgeCachedData(); and then reloading the sprite sheets, and I cannot reproduce this at all.

I've been trying to find any possible way that isSpriteFramesWithFileLoaded returns true when it should return false, and so far I have had no luck finding why that would ever happen.

If you get rid of the call to Director::getInstance()->purgeCachedData();, does this issue still occur? Also, just out of curiosity, how often does the application receive a onTrimMemory signal? Is it using a large amount of memory?

@aryamansingh2008
Copy link
Author

There are around 300 sprite sheets, which on average have 10 sprite frames per sprite sheet.

I haven't check for duplicates across all 3000 frames, but I have checked for the problem causing sprite frame "image1.png", it is just present in single sprite sheet (game.plist).

The application uses around 700MB RAM on high resolution devices, the purge signal is not very frequent but the crash seems to have affected around 0.5% of the users.

We are planning to add logs during purge signal to verify the link between image1.png and game.plist. The hypo is that, _spriteFrameToSpriteSheetMap["image1.png"] may not be pointing to game.plist. Any other ideas on what could be wrong?

Also, I wanted to know, why did axmol take decision of removing unused sprite frames from cache even when texture is present in the memory. These sprite frames could be used without any extra load of reading from disk. Should the removeUnusedSpriteFrames flow be like, removing sprite frames from cache only if all of the sprite sheet's frames are unused. Basically, only when the texture can be released.

@rh101
Copy link
Contributor

rh101 commented Jan 27, 2025

Also, I wanted to know, why did axmol take decision of removing unused sprite frames from cache even when texture is present in the memory.

This isn't a change in Axmol, since the code has been in there from at least Cocos2d-x v3, as you can see here: https://github.com/cocos2d/cocos2d-x/blob/e4b6a5ef8fcdc99a7ebea606312b67fe4c534b9f/cocos/base/CCDirector.cpp#L699

If you suspect that the behavior of the code has changed in some way since Cocos2d-x, then it's best to always check.

There are some changes related to the frame cache, and perhaps there is a subtle bug in there, or it could be that the changes have brought an existing bug to light which is in either the engine or the project, but it's difficult to track down without some way to reproduce it.

@rh101
Copy link
Contributor

rh101 commented Jan 27, 2025

We are planning to add logs during purge signal to verify the link between image1.png and game.plist. The hypo is that, _spriteFrameToSpriteSheetMap["image1.png"] may not be pointing to game.plist. Any other ideas on what could be wrong?

Can you please explain the theory regarding spriteFrameToSpriteSheetMap, because I'm not sure I follow how that would affect it?

Previously, you've noted that the following returns null:

SpriteFrameCache::getInstance()->getSpriteFrameByName("image1.png")

The implementation is as follows:

SpriteFrame* SpriteFrameCache::getSpriteFrameByName(std::string_view name)
{
    auto* frame = findFrame(name);
    if (!frame)
    {
        AXLOGD("axmol: SpriteFrameCache: Frame '{}' isn't found", name);
    }
    return frame;
}

and findFrame:

SpriteFrame* SpriteFrameCache::findFrame(std::string_view frame)
{
    return _spriteFrames.at(frame);
}

_spriteFrameToSpriteSheetMap is not accessed at all in the chain of calls above, so the problem is likely elsewhere. This is assuming that no strange bug exists that is affecting _spriteFrames in some way (such as a memory overwrite).

@halx99 Out of curiosity, is there any chance at all that the hash map implementations would end up with hash value collisions? I just noticed that they were changed from this:

Map<std::string, SpriteFrame*> _spriteFrames;
std::unordered_map<std::string, std::shared_ptr<SpriteSheet>> _spriteSheets;
std::unordered_map<std::string, std::shared_ptr<SpriteSheet>> _spriteFrameToSpriteSheetMap;

to this:

StringMap<SpriteFrame*> _spriteFrames;
hlookup::string_map<std::shared_ptr<SpriteSheet>> _spriteSheets;
hlookup::string_map<std::shared_ptr<SpriteSheet>> _spriteFrameToSpriteSheetMap;

I'm not familiar with the hlookup or the StringMap (which also seems to be using hlookup).

@rh101
Copy link
Contributor

rh101 commented Jan 27, 2025

The application uses around 700MB RAM on high resolution devices, the purge signal is not very frequent but the crash seems to have affected around 0.5% of the users.

Is it possible to manually trigger the purge signal from the Android OS for testing? Just so it goes through the same code path.

Also, if it's possible, make the SpriteFrameCache::hasFrame(std::string_view frame) public instead of protected, and call it to check if the frame exists in _spriteFrameToSpriteSheetMap prior to using it. If it should be in there, but it's not, then that that should be logged as an error.

bool SpriteFrameCache::hasFrame(std::string_view frame) const
{
    return _spriteFrameToSpriteSheetMap.find(frame) != _spriteFrameToSpriteSheetMap.end();
}

@aryamansingh2008
Copy link
Author

aryamansingh2008 commented Jan 28, 2025

Can you please explain the theory regarding spriteFrameToSpriteSheetMap, because I'm not sure I follow how that would affect it?

SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist") // return true
SpriteFrameCache::getInstance()->getSpriteFrameByName("image1.png") // returns null

if (!SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist")) {
    !SpriteFrameCache::getInstance()->addSpriteFrameWithFiles("game.plist");
}

SpriteFrameCache::getInstance()->isSpriteFramesWithFileLoaded("game.plist") // return true
SpriteFrameCache::getInstance()->getSpriteFrameByName("image1.png") // returns null

Since game.plist SpriteSheet info says it is full and image1.png is not present in spriteFrames map, the hypo was that at some point image1.png didn't refer to game.plist SpriteSheet in the spriteFrameToSpriteSheetMap. So, when image1.png was removed, this change didn't reflect in the game.plist SpriteSheet. By checking the code, it looks like it could only happen when spriteFrameToSpriteSheetMap["image1.png"] did not refer game.plist SpriteSheet.

Is it possible to manually trigger the purge signal from the Android OS for testing? Just so it goes through the same code path

We have tried to repro it this way, but we were not able to repro the crash.

Also, if it's possible, make the SpriteFrameCache::hasFrame(std::string_view frame) public instead of protected, and call it to check if the frame exists in _spriteFrameToSpriteSheetMap prior to using it. If it should be in there, but it's not, then that that should be logged as an error.

Yeah, this we can do. But any idea on why/when this method's behaviour could be out of sync with SpriteFrameCache::getInstance()->getSpriteFrameByName(string_view)?

@rh101
Copy link
Contributor

rh101 commented Jan 28, 2025

Since game.plist SpriteSheet info says it is full and image1.png is not present in spriteFrames map, the hypo was that at some point image1.png didn't refer to game.plist SpriteSheet in the spriteFrameToSpriteSheetMap. So, when image1.png was removed, this change didn't reflect in the game.plist SpriteSheet. By checking the code, it looks like it could only happen when spriteFrameToSpriteSheetMap["image1.png"] did not refer game.plist SpriteSheet.

Sorry, I still don't quite follow. What do you mean by "when image1.png was removed"? How is it removed? Are you able to show example code, or even pseudo code, to explain the scenario you're describing? I'm trying to understand the sequence of calls to achieve what you're describing.

If you're implying that "image1.png" belongs to a different sprite sheet, and then that sprite sheet, or frame, is removed before attempting to use "image1.png" from a sprite sheet that should still be in memory, then that brings us back to the point of requirement of having unique names across all sprite sheets, which was mentioned in a previous post. You already mentioned that "Image1.png" is unique, so this should not be the issue.

If you mean that "image1.png" is an individual file, and it is removed, then that should not affect anything, because individual image files are not added to the SpriteFrameCache::_spriteFrames list.

I've added more tests to the cpp-tests project in the attempt to create as many possible scenarios that could trigger an issue, including one that involves calling Director::purgeCachedData();, and none of them have failed in any way.

@halx99 halx99 changed the title Object::retain crash in Sprite::setFrameFrame(std::string_view) Object::retain crash in Sprite::setSpriteFrame(std::string_view) Feb 5, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants