LineReader stops reading when it hits a character like "É" or "ñ" #5

pkamb · 2011-09-14T17:47:22Z

So you have a text file such as:

diner
restaurant
lunch-spot
greasy spoon
café // "é" character
coffee shop
cafeteria

LineReader stops reading when it hits the "café" line above. Never gets to "coffee shop".

johnjohndoe · 2011-09-15T09:46:46Z

Maybe the file is not encoded using UTF-8? I use NSUTF8StringEncoding in the FileReader. See (NSString*)readLine in line 72. Maybe you can find a way to discover the encoding type of the file before you start reading its content. You are welcome to fork the project.
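
To illustrate the suspicion above, here is a minimal Python sketch (not the LineReader code itself) of what happens when bytes saved in a single-byte encoding are decoded strictly as UTF-8. The function name `read_lines_strict_utf8` and the "stop at the first failure" behaviour are assumptions chosen to mimic the reported symptom.

```python
# Sample list as a Latin-1 (ISO-8859-1) editor would save it: "é" becomes the single byte 0xE9.
latin1_bytes = "diner\ncafé\ncoffee shop\n".encode("latin-1")
# The same list saved as UTF-8: "é" becomes the two bytes 0xC3 0xA9.
utf8_bytes = "diner\ncafé\ncoffee shop\n".encode("utf-8")


def read_lines_strict_utf8(data: bytes) -> list:
    """Decode line by line as UTF-8 and stop at the first line that fails to decode."""
    lines = []
    for raw in data.splitlines():
        try:
            lines.append(raw.decode("utf-8"))
        except UnicodeDecodeError:
            break  # hypothetical: a reader that gives up when decoding fails
    return lines


print(read_lines_strict_utf8(latin1_bytes))  # ['diner']
print(read_lines_strict_utf8(utf8_bytes))    # ['diner', 'café', 'coffee shop']
```

The lone byte 0xE9 is not a valid UTF-8 sequence, so the Latin-1 version never gets past "café", which matches the behaviour described in the report.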

ZuzooVn · 2013-02-13T08:05:36Z

Hi, I still have this problem.

johnjohndoe · 2013-02-13T08:08:19Z

Have you verified which character encoding is used by the file you are trying to read?

ZuzooVn · 2013-02-13T08:22:58Z

Hi, it's Unicode (UTF-8)

johnjohndoe · 2013-02-13T16:00:12Z

Could you upload a zipped sample somewhere? Then I will find the time to take a look at it in a few days.

ZuzooVn · 2013-02-13T16:53:14Z

I think you can create a new document with some characters like í, é, ñ. Or I will upload some sample data.

johnjohndoe · 2013-02-13T19:11:37Z

I think you should really upload an example file somewhere. I can write an ñ into either an extended-ASCII (for example Latin-1) or a UTF-8 encoded file.
You can also find out the character encoding of the file yourself with an editor. If you are using Windows, I recommend Notepad++. On Mac OS X or Linux, run the following command in a shell: $ file filename.

ZuzooVn · 2013-02-14T05:40:08Z

This is the file's info: Non-ISO extended-ASCII English text, with very long lines, with CRLF line terminators.

This is the file: http://www.mediafire.com/?1cwr4if28w504md

It has an "î" character.

johnjohndoe · 2013-02-15T16:21:31Z

Agreed. As I suspected, the file is not encoded as UTF-8.

I converted the file to UTF-8 using Notepad++ (options are visible in the menu) so you can try again with this file.

Bagging-and-Transporting-Koi-wps-utf8-txt.html
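
The same conversion can be scripted instead of done by hand in an editor. The following is only a sketch: the helper name `convert_to_utf8` is hypothetical, and the default source encoding `cp1252` is an assumption, since `file` reported only "Non-ISO extended-ASCII" and did not name the actual code page.

```python
from pathlib import Path


def convert_to_utf8(src: Path, dst: Path, source_encoding: str = "cp1252") -> None:
    """Re-encode a text file to UTF-8, leaving line terminators (e.g. CRLF) untouched."""
    # Work on raw bytes so that no newline translation happens.
    text = src.read_bytes().decode(source_encoding)  # raises UnicodeDecodeError on bytes the code page does not define
    dst.write_bytes(text.encode("utf-8"))
```

Usage would look like `convert_to_utf8(Path("input.txt"), Path("input-utf8.txt"))`, where both file names are placeholders. If the guess about the source encoding is wrong, the result will still be valid UTF-8 but may show the wrong accented characters, so it is worth checking a few lines by eye.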

ZuzooVn · 2013-02-16T09:04:29Z

Maybe we should automatically convert every file to UTF-8 before we start reading its content.

johnjohndoe · 2013-02-16T20:41:03Z

I suggest that you look for a way to recognize the character encoding up front. Feel free to add it to the LineReader.
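
One common way to recognize the encoding up front is to check for a UTF-8 byte order mark, then try a strict UTF-8 decode, and only then fall back to a single-byte encoding. The Python sketch below shows the idea; `guess_encoding` is a hypothetical helper, not part of LineReader, and the `latin-1` fallback is an assumption (it accepts every byte, so it never fails, but it may not be the encoding the file was really written in).

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Return a codec name that can decode `data`, preferring UTF-8."""
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # UTF-8 with a byte order mark
    try:
        data.decode("utf-8")  # strict: raises on any invalid byte sequence
        return "utf-8"
    except UnicodeDecodeError:
        return "latin-1"  # assumed last resort: maps all 256 byte values


sample = "café\ncoffee shop\n".encode("latin-1")
encoding = guess_encoding(sample)
print(encoding)                 # latin-1
print(sample.decode(encoding))  # decodes without stopping at "café"
```

This is a heuristic rather than real detection: text that happens to be valid UTF-8 is always reported as UTF-8, and any non-UTF-8 file is treated as Latin-1 even if it was written in another code page.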
