How to set the charset? Some results are encoded in GBK, and Chinese characters are garbled #30

DitieNoder · 2022-05-19T07:57:32Z

Chinese characters in the records appear as black squares. I tried converting them from GBK to UTF-8, but it does not work.
Google says that in Node.js, a string that has already been decoded as UTF-8 cannot be converted back to GBK.
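A minimal sketch of why that is, using only Node's built-in `Buffer`. The sample bytes `D6 D0 B9 FA` (中国 in GB2312/GBK) are my own choice for illustration. Decoding them as UTF-8 replaces the invalid sequences with U+FFFD, and encoding that string again does not give back the original bytes.

```javascript
// Sample bytes: "中国" encoded in GB2312/GBK (chosen for illustration).
const gbkBytes = Buffer.from([0xd6, 0xd0, 0xb9, 0xfa]);

// Decoding GBK bytes as UTF-8 replaces invalid sequences with U+FFFD,
// which many fonts draw as a black square or diamond.
const misdecoded = gbkBytes.toString('utf8');

// Encoding the damaged string again yields different bytes: the
// original GBK data is gone, so no later GBK conversion can restore it.
const roundTrip = Buffer.from(misdecoded, 'utf8');

console.log(misdecoded.includes('\uFFFD')); // true
console.log(roundTrip.equals(gbkBytes));    // false
```

The practical consequence is that any GBK-to-UTF-8 conversion has to happen on the raw bytes, before anything decodes them as UTF-8.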

1. I found this doc, https://software.indexdata.com/yaz/doc/zoom.html, which says the charset is used in the Content-Type header of HTTP requests.

2. The following settings were attempted, but none of them had any effect:

```javascript
.set('charset',' iso8859') //ASCII utf8
.set('lang','56')
.set('Content-Type','text/html;charset:utf-8')
.set('Content-Language','zh-CN')
```

Please have a look.
The record.raw comes from zoom.connection('z39.91marc.cn:2100/uc_bib'), connected with a user and password.

3. The record.raw from this server, zoom.connection('192.83.186.170:210/INNOPAC'), displays correctly.

4. When I use .set('preferredRecordSyntax', 'cnmarc'), the module prints a warning. Could the module support 'cnmarc'?


dengelke · 2022-05-19T13:57:05Z

Thanks @DitieNoder for another detailed example.

Could you please provide a working code example, ideally using an open server, so I can replicate this on my local machine? It is a bit tricky to investigate otherwise.

DitieNoder · 2022-05-19T14:43:00Z

> Thanks @DitieNoder for another detailed example.

```javascript
var zoom = require('node-zoom2');

zoom.connection('www.yuntsg.net:2100/calis')
  .set('user', 'xxxxx')
  .set('password', 'xxxxx')
  .set('preferredRecordSyntax', 'cnmarc')
  .query('prefix', '@attr 1=7 ' + '9787508690254')
  .search(function (err, resultset) {
    if (err) return console.error(err); // report connection/search errors
    resultset.getRecords(0, resultset.size, function (err, records) {
      if (err) return console.error(err);
      console.log(resultset.size);
      while (records && records.hasNext()) {
        var record = records.next();
        console.log(record.raw); // raw record bytes as returned by the server
      }
    });
  });
```

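Some GB2312 databases:
- National Library of China database (GB2312): cbook
- CALIS database (GB2312): calis
- CIP database (GB2312): CIP

Some UTF-8 databases (databases that support UTF-8):
- National Library of China database (UTF-8): cbook_utf8
- CALIS database (UTF-8): calis_utf8
- CIP database (UTF-8): CIP_utf8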
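To keep that split explicit in client code, a small lookup table can record which charset each database uses and pick the record type string accordingly (the `render;charset=...` form that appears in the zoomsh commands later in this thread). `DATABASE_CHARSETS` and `renderTypeFor` are hypothetical names, and returning plain `render` for the `_utf8` databases assumes, based only on the list above, that they need no conversion; that has not been confirmed in this thread.

```javascript
// Hypothetical lookup table built from the database list above.
const DATABASE_CHARSETS = {
  cbook: 'gb2312',
  calis: 'gb2312',
  CIP: 'gb2312',
  cbook_utf8: 'utf-8',
  calis_utf8: 'utf-8',
  CIP_utf8: 'utf-8',
};

// Hypothetical helper: returns 'render;charset=gb2312' for GB2312
// databases, or plain 'render' for databases assumed to be UTF-8 already.
function renderTypeFor(database) {
  const charset = DATABASE_CHARSETS[database];
  if (charset === undefined) {
    throw new Error('Unknown database: ' + database);
  }
  return charset === 'utf-8' ? 'render' : 'render;charset=' + charset;
}

console.log(renderTypeFor('calis'));      // render;charset=gb2312
console.log(renderTypeFor('calis_utf8')); // render
```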

dengelke · 2022-05-20T14:00:15Z

Using the zoomsh tool from the YAZ repository, I can replicate exactly the same behaviour as your Node.js example, which suggests the problem lies in the underlying C library that this Node.js library relies on.

```
zoomsh
set user aizhi
set password 579012
set preferredRecordSyntax cnmarc
connect www.yuntsg.net:2100/calis
search @attr 1=7 9787508690254
show 0 2
```

I would suggest raising a ticket there; they might be able to help or explain the configuration required to display the characters correctly.

adamdickmeiss · 2022-05-27T16:22:05Z

cnmarc is not a record syntax, so the value is ignored. This target seems to return UNIMARC records.
ZOOM (YAZ) can convert the character set of each record, and that conversion happens client-side, not server-side: https://software.indexdata.com/yaz/doc/zoom.records.html

This example should illustrate this:

```
zoomsh
set user aizhi
set password 579012
set preferredRecordSyntax unimarc
connect www.yuntsg.net:2100/calis
search @attr 1=7 9787508690254
show 0 1 render;charset=gb2312
show 0 1 xml;charset=gb2312
show 0 1 json;charset=gb2312
```
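Each `show` command above selects an output format and a source charset through a single type string; the same strings appear as `record.get()` arguments in the node-zoom2 example that follows. A minimal sketch of a hypothetical `recordType` helper that assembles them. The optional `,to` target charset reflects my reading of the YAZ record documentation linked above (UTF-8 is assumed when it is omitted) and should be checked against that page.

```javascript
// Hypothetical helper that assembles a ZOOM record type string such as
// 'render;charset=gb2312' or 'xml;charset=gb2312,utf-8'.
function recordType(format, fromCharset, toCharset) {
  if (!fromCharset) {
    return format; // no character set conversion requested
  }
  const charset = toCharset ? fromCharset + ',' + toCharset : fromCharset;
  return format + ';charset=' + charset;
}

// The three variants used in the zoomsh session above.
const types = ['render', 'xml', 'json'].map((f) => recordType(f, 'gb2312'));
console.log(types);
```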

dengelke · 2022-05-28T03:55:30Z

Thanks @adamdickmeiss! I now have a working code example for node-zoom2.

```javascript
var zoom = require('node-zoom2');

zoom.connection('www.yuntsg.net:2100/calis')
  .set('user', 'aizhi')
  .set('password', '579012')
  .set('preferredRecordSyntax', 'unimarc') // the target returns UNIMARC
  .query('prefix', '@attr 1=7 ' + '9787508690254')
  .search(function (err, resultset) {
    if (err) return console.error(err);
    resultset.getRecords(0, resultset.size, function (err, records) {
      if (err) return console.error(err);
      while (records && records.hasNext()) {
        var record = records.next();
        console.log(record.render); // this is garbled
        console.log(record.get('render;charset=gb2312')); // this renders correctly
        console.log(record.xml); // this is garbled
        console.log(record.get('xml;charset=gb2312')); // this renders correctly
        console.log(record.json); // this is garbled
        console.log(record.get('json;charset=gb2312')); // this renders correctly
      }
    });
  });
```

DitieNoder · 2022-05-28T14:13:56Z

> Thanks @adamdickmeiss! I now have a working code example for node-zoom2.


Would you please share a screenshot of the output from the new code example? How did you get the correct result?

I ran the code, but the output is garbled as before, and the type of `record.get('render;charset=gb2312')` is string, not Buffer.

I get the same garbled result when running the .js file from CMD.

My computer runs Simplified Chinese Windows 10 x64 Enterprise 21H2, and the Node.js version is v16.15.0.
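When comparing outputs like these, it can help to check for U+FFFD, the replacement character that usually shows up as the black squares described at the start of the issue, and to confirm the returned type programmatically rather than by eye. A minimal sketch with hypothetical helpers `looksGarbled` and `describeValue`; note that it only detects bytes that failed to decode, not text decoded with the wrong legacy charset, which can produce valid but meaningless characters.

```javascript
// Hypothetical diagnostic: true if the string contains U+FFFD, the
// replacement character inserted when bytes fail to decode.
function looksGarbled(text) {
  return typeof text === 'string' && text.includes('\uFFFD');
}

// Hypothetical helper: report whether a value is a Buffer or a plain type.
function describeValue(value) {
  if (Buffer.isBuffer(value)) return 'Buffer';
  return typeof value;
}

console.log(looksGarbled('中国'));                                      // false
console.log(looksGarbled(Buffer.from([0xd6, 0xd0]).toString('utf8'))); // true
console.log(describeValue('text'));                                     // string
```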

adamdickmeiss · 2022-05-28T19:58:04Z

GB2312 requires iconv. Probably not working on Windows.

DitieNoder · 2022-05-29T00:57:58Z

> GB2312 requires iconv. Probably not working on Windows.

Thank you for your attention.

As far as I know, in Node.js, if the result from the server is a Buffer, the iconv module can decode it from GB2312 to UTF-8.
But here the result is a string, as you can see in the screenshot, so the characters stay garbled even when iconv is used.
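The point about Buffers can be illustrated without iconv: Node's WHATWG `TextDecoder` accepts the `gb2312` label (which the Encoding Standard maps to GBK), so raw bytes can be decoded directly. A minimal sketch, assuming a Node.js build with full ICU (the default for official binaries since v13) and assuming the GB2312 bytes are available as a Buffer; `toText` is a hypothetical name. A value that is already a string is returned unchanged, because its original bytes can no longer be recovered.

```javascript
// Hypothetical helper: decode GB2312/GBK bytes to a JS string.
// Requires a Node.js build with full ICU for the 'gb2312' label.
function toText(value) {
  if (Buffer.isBuffer(value)) {
    // fatal: true throws on malformed input instead of emitting U+FFFD.
    return new TextDecoder('gb2312', { fatal: true }).decode(value);
  }
  // Already a string: decoding happened elsewhere and the original
  // bytes are no longer available, so there is nothing left to convert.
  return value;
}

const bytes = Buffer.from([0xd6, 0xd0, 0xb9, 0xfa]); // "中国" in GB2312
console.log(toText(bytes)); // 中国
console.log(toText('already a string'));
```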

DitieNoder mentioned this issue on May 20, 2022, in indexdata/yaz#64: "How to set the charset? In Node.js, Chinese characters are garbled in results returned from some GB2312 servers" (closed).
