Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Command sync -r fails for file with filetype <unicode> on perforce #41

Open
xun-eric-zhang opened this issue Apr 8, 2024 · 3 comments

Comments

@xun-eric-zhang
Copy link

Hi, thanks for the awesome work on p4vfs!
I would like to report an issue I had when testing out p4vfs. I found out an solution, but wanted to check if there is a better fix, or if I missed something for my setup.

  • Background
    Perforce server is in Unicode mode, Client P4CHARSET is set to utf8.

  • Issue
    If a file unicode_test.txt is of unicode filetype on perforce, then following command would fail.
    p4vfs.exe sync -r -f //depot//unicode_test.txt
    And gives the error: - Translation of file content failed near line ...
    The same command in p4 would succeed:
    p4 sync -f //depot//unicode_test.txt
    The same p4vfs command in local mode would also succeed:
    p4vfs.exe sync -r -f -t //depot//unicode_test.txt
    Just in case I attached the file for my test there: UnicodeTest.txt

  • Investigation
    By outputing m_P4->m_ClientApi->GetCharset().Text() in FDepotClient::Connect(const DepotConfig& config), I found the issue.
    In p4vfs.exe, the FDepotClient is properly picking up P4CHARSET settings and is set to utf8. However in P4VFS.Service.exe, it does not pick up P4CHARSET, and charset is set to auto. I'm not sure why, but this is the reason we have translation problem using sync command in service mode, but not in local mode.

  • Fix
    Added the following in FDepotClient::Connect(const DepotConfig& config)

DepotString charset = GetEnvImpersonated(DepotConstants::P4CHARSET); 
if (charset.empty() == false) 
{ 
    m_P4->m_ClientApi->SetCharset(charset.c_str()); 
} 

Much appreciated if you can share any insight on this problem. Thank you.

@jessk-msft
Copy link
Collaborator

Hi Eric, thanks for your report and investigation into this problem. The Microsoft P4VFS still does not have proper support for unicode enabled perforce servers. The ignored use of P4CHARSET variable is one example of what is missing.

Your solution to have the P4VFS.Service get the P4CHARSET value from the impersonated environment (either the calling process or session user) is slightly functional:

  • Setting the P4CHARSET for unicode servers is required. Calling SetCharset will get things going.
  • If the P4CHARSET is UTF8, then the server metadata being sent (filenames, tags, etc) will work for ASCII characters (0-127) as UTF8 characters. However for characters in extended ASCII (> 127), such as CP-1252 (ANSI), or multi-byte unicode characters, we'd start to see incorrect decoding of strings. So watch out for filenames with unicode characters.

The real problem in P4VFS is that DepotString is always assumed to be multi-byte CP-1252 string. This is the default Windows encoding and is default for P4API on Windows with non-unicode servers. Instead of just a typedef to ANSI string, I always imagined that the DepotString could be smart wrapper for UTF8 or ANSI string. It could decode to UTF16 (wchar_t) when needed.

I'd be really curious to know how popular this feature would be.
Recent work in P4VFS.UnitTest to support duplicating the test server with different settings (ie, unicode enabled) has put us in a great position to properly test support unicode enabled features.

Regards
-Jess

@xun-eric-zhang
Copy link
Author

Hi Jess,

Thank you so much for the detailed explanation!
Indeed I got hit by the problem when filename contains non-ASCII characters.

Regarding the popularity of this feature, I'll speak of my experience.
Since our team is based in Japan, it's pretty common to have Perforce server setup in unicode mode. This probably would apply to other dev teams that has a working language with non-ASCII characters.
When I'm testing with UE5 source, I got around 2700 files assigned as unicode type by perforce.

It would be fantastic if there is plan for P4VFS to support unicode Perforce server. I'm sure a broad audience would appreciate that :)

Best,
Eric

@hulucc
Copy link

hulucc commented Dec 30, 2024

@jessk-msft I figured out a workaround to make p4vfs 1.28.2.0 work with unicode enabled perforce server without filename and file content encoding problem.

  1. Add app.manifest below to P4VFS.Console and P4VFS.Service, which will make any AString in p4vfs take as utf8 string, also make p4api works in utf8 mode.
<assembly manifestVersion="1.0" xmlns="urn:schemas-microsoft-com:asm.v1">
  <assemblyIdentity type="win32" name="..." version="6.0.0.0"/>
  <application>
    <windowsSettings>
      <activeCodePage xmlns="http://schemas.microsoft.com/SMI/2019/WindowsSettings">UTF-8</activeCodePage>
    </windowsSettings>
  </application>
</assembly>
  1. Patch source/P4VFS.Core/Source/DepotClient.cpp SetOutputEncoding(const DepotString& fileType), which disable any p4vfs side charset convertion when filetype is unicode
		CharSetApi::CharSet charSet = CharSetApi::CSLOOKUP_ERROR;
		if (fileType.empty() == false)
		{
			if (strstr(fileType.c_str(), "text"))
				charSet = CharSetApi::NOCONV;
			else if (strstr(fileType.c_str(), "utf8"))
				charSet = CharSetApi::UTF_8;
			else if (strstr(fileType.c_str(), "utf16"))
				charSet = CharSetApi::UTF_16_LE;
			else if (strstr(fileType.c_str(), "unicode"))
				// charSet = CharSetApi::UTF_16_LE;
				charSet = CharSetApi::NOCONV;
		}

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants