not save as utf8, UnicodeDecodeError('utf-8', #9
Comments
Hello, @Yensan. Can you post a gist to an example notebook along with the command you used to reproduce it? Thanks!
@jbn |
Sorry for the delay, @Yensan! I was unable to replicate this. Are you on Windows? I think the default command-line encoding on Windows is not Unicode, so piping the output is going to cause a problem. Try running nbmerge file_1.ipynb file_2.ipynb file_3.ipynb -o _merged.ipynb instead to skip piping. If that doesn't help, let me know and I'll go back to debugging.
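To illustrate why writing the file directly can sidestep the console encoding, here is a minimal sketch. It is not nbmerge's actual code: `write_json_utf8` and the sample `notebook` dict are hypothetical names made up for this example. The only point is that passing an explicit `encoding="utf-8"` when opening the output file makes the result independent of the locale's preferred encoding.

```python
import io
import json
import os
import tempfile


def write_json_utf8(obj, path):
    """Hypothetical helper: write obj as JSON, always encoded as UTF-8."""
    # The explicit encoding means the locale default (e.g. cp936) is never used.
    with io.open(path, "w", encoding="utf-8") as fp:
        fp.write(json.dumps(obj, ensure_ascii=False, indent=1))


# Hypothetical stand-in for a merged notebook containing Chinese text.
notebook = {"cells": [{"cell_type": "markdown", "source": ["你好"]}]}

path = os.path.join(tempfile.mkdtemp(), "_merged.ipynb")
write_json_utf8(notebook, path)

# Read the raw bytes back and decode them as UTF-8 to confirm the round trip.
with io.open(path, "rb") as fp:
    raw = fp.read()
roundtrip = json.loads(raw.decode("utf-8"))
```

With shell redirection, by contrast, the bytes that land in the file depend on whatever encoding `sys.stdout` happens to use at the time.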
@jbn |
Hi @Yensan. I read up a bit on the problem and would like to fix it. Any chance I could get you to run this script: https://gist.github.com/jbn/6b87f180cff5dae4b6554ef58ba26c6f in the directory with your notebooks, replacing "./YOUR_NOTEBOOK_FILE.ipynb" with your notebook name. If you copy and paste the output, it should be a relatively easy fix. Thanks if you can :)
(⊙o⊙) Oh! Sorry, I can't open https://gist.github.com/ on my network, because of the 'Great Firewall' issue 😄
import sys, locale
exprs = """
locale.getpreferredencoding()
type(fp)
fp.encoding
sys.stdout.isatty()
sys.stdout.encoding
sys.stdin.isatty()
sys.stdin.encoding
sys.stderr.isatty()
sys.stderr.encoding
sys.getdefaultencoding()
sys.getfilesystemencoding()
"""
with open("./YOUR_NOTEBOOK_FILE.ipynb", "r") as fp:
    for expr in exprs.strip().split():
        print(expr.rjust(30), eval(expr))

Can't help with the ctypes issue. Never really use that code.
I am so sorry to reply so late; my career has been rather tortuous. (I would be grateful for any remote job.) This .ipynb file was edited on both Windows and Mac. I then ran your script on Windows 10 Pro (Simplified Chinese). This Windows 10 is a virtual machine, but that doesn't matter; the result is the same.

Python 3.5.2 (v3.5.2:4def2a2901a5, Jun 25 2016, 22:18:55) [MSC v.1900 64 bit (AMD64)] on win32

Windows:

C:\Users\aC>systeminfo
Host Name: C53
OS Name: Microsoft Windows 10 Pro
OS Version: 10.0.17763 N/A Build 17763
OS Manufacturer: Microsoft Corporation
OS Configuration: Standalone Workstation
OS Build Type: Multiprocessor Free
Original Install Date: 2019/1/6, 14:03:29
System Boot Time: 2019/1/11, 0:28:07
System Type: x64-based PC
Processor(s): 1 Processor(s) Installed.
[01]: Intel64 Family 6 Model 61 Stepping 4 GenuineIntel ~1600 Mhz
BIOS Version: Parallels Software International Inc. 14.0.1 (45154), 2018/9/7
System Locale: zh-cn;Chinese (China)
Input Locale: en-us;English (United States)

Your script output:

locale.getpreferredencoding() cp936
type(fp) <class '_io.TextIOWrapper'>
fp.encoding cp936
sys.stdout.isatty() True
sys.stdout.encoding cp936
sys.stdin.isatty() True
sys.stdin.encoding cp936
sys.stderr.isatty() True
sys.stderr.encoding cp936
sys.getdefaultencoding() utf-8
sys.getfilesystemencoding() mbcs
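The output above shows cp936 (GBK) as the preferred, file, and console encoding on this machine. As a rough illustration of how that can lead to the reported error (a sketch only, not taken from nbmerge), the snippet below encodes a short Chinese string as cp936 and then tries to decode those bytes as UTF-8. The sample string and variable names are made up for the example.

```python
# Sample text; any Chinese characters would do.
text = "中文"

# What ends up on disk if the text is written with the cp936 locale default.
gbk_bytes = text.encode("cp936")

# Reading those bytes back as UTF-8 fails, because GBK byte pairs
# are generally not valid UTF-8 sequences.
try:
    gbk_bytes.decode("utf-8")
    failed = False
except UnicodeDecodeError as exc:
    failed = True
    print(exc.encoding, exc.reason)

# Decoding with the encoding that was actually used recovers the text.
recovered = gbk_bytes.decode("cp936")
```

This suggests the data itself is intact; the mismatch is between the encoding used when writing and the one assumed when reading.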
Hello @jbn,
!nbmerge 1.ipynb 2.ipynb 3.ipynb > merged.ipynb
Thx a lot!! Best,
To clarify, is this issue only on Windows, and not Unix (Linux or macOS)? EDIT: I just ran this on Ubuntu Bionic (copy-pasted Chinese characters into two notebooks) and ran into no issues whatsoever. So I think it could be helpful to label this issue as specific to Windows (to avoid unnecessarily freaking out or turning off people who aren't running this on Windows). This is a great package, by the way! Elegant solution to a recurring problem.
If a notebook contains characters beyond ASCII, the output is not saved as UTF-8.
For example, with Chinese text in *.ipynb, the file is actually saved as GBK. This causes
UnicodeDecodeError('utf-8',