How to set the charset? Some results are encoded in GBK, and Chinese characters are garbled #30

Open
DitieNoder opened this issue May 19, 2022 · 8 comments

Comments

@DitieNoder

DitieNoder commented May 19, 2022

Chinese characters in the records appear as black squares. I tried converting them from GBK to UTF-8, but it does not work.
From what I found on Google, in Node.js a string that has already been decoded as UTF-8 cannot be converted back to GBK.

1. I found this doc: https://software.indexdata.com/yaz/doc/zoom.html. It says the charset option is used in the Content-Type header of HTTP requests.
image

2. I tried the following settings, but none of them had any effect:
.set('charset',' iso8859') //ASCII utf8
.set('lang','56')
.set('Content-Type','text/html;charset:utf-8')
.set('Content-Language','zh-CN')

Please have a look.
This is the record.raw from zoom.connection('z39.91marc.cn:2100/uc_bib'), accessed with a user/password:
image

3. The record.raw from this server, zoom.connection('192.83.186.170:210/INNOPAC'), displays normally:
image

4. With .set('preferredRecordSyntax', 'cnmarc'), the module shows this warning. Could the module support 'cnmarc'?
image

@dengelke
Owner

Thanks @DitieNoder for another detailed example.

Can you please provide a working code example, ideally against an open server somewhere, so I can replicate this on my local machine? It is a bit tricky to investigate otherwise.

@DitieNoder
Author

DitieNoder commented May 19, 2022

var zoom = require('node-zoom2');
zoom.connection('www.yuntsg.net:2100/calis')
  .set('user', 'xxxxx')
  .set('password', 'xxxxx')
  .set('preferredRecordSyntax', 'cnmarc')
  .query('prefix', '@attr 1=7 ' + '9787508690254')
  .search(function (err, resultset) {
    resultset && resultset.getRecords(0, resultset.size, function (err, records) {
      console.log(resultset.size)
      while (records && records.hasNext()) {
        var record = records.next();
        console.log(record.raw);
      }
    });
  });

Some databases that use GB2312:
National Library database (GB2312): cbook
CALIS database (GB2312): calis
CIP database (GB2312): CIP

Some databases that support UTF-8:
National Library database (UTF-8): cbook_utf8
CALIS database (UTF-8): calis_utf8
CIP database (UTF-8): CIP_utf8

@dengelke
Owner

Using the zoomsh tool from the yaz repository, I can reproduce exactly the same behaviour as your Node.js test example, which suggests the problem lies somewhere in the underlying C library that this Node.js library relies on.

zoomsh
set user aizhi
set password 579012
set preferredRecordSyntax cnmarc
connect www.yuntsg.net:2100/calis
search @attr 1=7 9787508690254
show 0 2

I would suggest raising a ticket there; they might be able to assist or explain the configuration required to get the characters to display correctly.

@adamdickmeiss

  1. cnmarc is not a record syntax, so the value will be ignored. This target seems to return UNIMARC records.
  2. ZOOM YAZ can convert the character set of each record, so the conversion happens client-side (not server-side). See https://software.indexdata.com/yaz/doc/zoom.records.html

This example should illustrate this:

zoomsh
set user aizhi
set password 579012
set preferredRecordSyntax unimarc
connect www.yuntsg.net:2100/calis
search @attr 1=7 9787508690254
show 0 1 render;charset=gb2312
show 0 1 xml;charset=gb2312
show 0 1 json;charset=gb2312

@dengelke
Owner

dengelke commented May 28, 2022

Thanks @adamdickmeiss! I now have a working code example for node-zoom2.

var zoom = require('node-zoom2');
zoom.connection('www.yuntsg.net:2100/calis')
  .set('user', 'aizhi')
  .set('password', '579012')
  .set('preferredRecordSyntax', 'unimarc')
  .query('prefix', '@attr 1=7 ' + '9787508690254')
  .search(function (err, resultset) {
    resultset.getRecords(0, resultset.size, function (err, records) {
      while (records && records.hasNext()) {
        var record = records.next();
        console.log(record.render); // this is garbled
        console.log(record.get('render;charset=gb2312')) // this renders correctly
        console.log(record.xml); // this is garbled
        console.log(record.get('xml;charset=gb2312')) // this renders correctly
        console.log(record.json); // this is garbled
        console.log(record.get('json;charset=gb2312')) // this renders correctly
      }
    });
  });
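
The type strings used above follow the pattern `format;charset=name`. As a rough sketch, the string could be built from the database name using the charsets the reporter listed earlier in this thread. `DATABASE_CHARSETS` and `recordType` are hypothetical names, not part of node-zoom2, and this has not been tested against the real servers.

```javascript
// Charsets of the databases listed by the reporter earlier in this thread.
const DATABASE_CHARSETS = {
  cbook: 'gb2312',
  calis: 'gb2312',
  CIP: 'gb2312',
  cbook_utf8: 'utf-8',
  calis_utf8: 'utf-8',
  CIP_utf8: 'utf-8',
};

// Hypothetical helper: build the type string to pass to record.get().
function recordType(format, database) {
  const charset = DATABASE_CHARSETS[database];
  // Unknown database: fall back to the plain format name, with no charset hint.
  return charset ? `${format};charset=${charset}` : format;
}

console.log(recordType('render', 'calis'));
console.log(recordType('xml', 'cbook_utf8'));
```

The result would then be passed as `record.get(recordType('render', 'calis'))` inside the loop.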

@DitieNoder
Author

DitieNoder commented May 28, 2022

Would you please share a screenshot of the output from the new code example? How did you get the correct result?

I ran the code, but the output is still garbled, and the type of record.get('render;charset=gb2312') is string, not Buffer.
image

I get the same garbled result when running the .js file from CMD.
image

My computer runs Simplified Chinese Windows 10 x64 Enterprise 21H2, and the Node.js version is v16.15.0.

@adamdickmeiss

GB2312 requires iconv, which is probably not working on Windows.

@DitieNoder
Author

Thank you for your attention.

As far as I know, in Node.js, if the result from the server is a Buffer, the iconv module can decode it from GB2312 to UTF-8.
But the result is now a string, as you can see in the picture, so even with iconv the characters are still garbled.

image
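
The Buffer-versus-string point can be illustrated with a small standalone sketch that does not touch node-zoom2 at all. It assumes a Node.js build with full ICU, so that `TextDecoder` accepts the `gbk` label; the sample bytes are the GBK encoding of "中文". It only shows why bytes already misread as UTF-8 cannot be repaired afterwards, not where that misreading happens inside the library.

```javascript
// GBK/GB2312 bytes for the two characters "中文".
const gbkBytes = Buffer.from([0xd6, 0xd0, 0xce, 0xc4]);

// Decoding the raw bytes with the right charset recovers the text.
// Assumption: this Node.js build ships full ICU, so 'gbk' is a known label.
const decoded = new TextDecoder('gbk').decode(gbkBytes);

// Misreading the same bytes as UTF-8 replaces each invalid byte with U+FFFD.
const garbled = gbkBytes.toString('utf8');

// Re-encoding the garbled string does not give the original bytes back,
// so no later conversion of the string can recover the characters.
const roundTrip = Buffer.from(garbled, 'utf8');

console.log(decoded);
console.log(garbled.includes('\uFFFD'));
console.log(roundTrip.equals(gbkBytes));
```

This is why a conversion library can only help if it receives the raw bytes (a Buffer) rather than a string that has already been decoded.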
