Replies: 7 comments
-
What is the shape of the data? How many columns? What data types? How many records does the predicate filter out? Does the file contain any records that span multiple lines, i.e. quoted fields that contain newline characters? Technically, it isn't possible to start parsing a CSV file from the middle: because of the rules around quoted fields, you can't know the parse state, so a newline character might or might not indicate the end of a record. However, while it isn't possible for arbitrary data sets, it might be possible for your dataset, assuming it doesn't have any quoted fields containing newlines. If that's the case, then you should be able to seek the stream to the middle of the file, locate the next newline, and start parsing from there. Having said that, if the data never changes, I would probably apply the filter and sorting to the data once and store it in a format that doesn't require parsing.
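A minimal sketch of that seek-and-resync idea, assuming the dataset has no quoted fields containing newlines (the in-memory CSV here is a stand-in for a real file stream):

```csharp
using System;
using System.IO;
using System.Text;

class MidFileScan
{
    // Counts the complete records found after seeking to the midpoint.
    static int CountRecordsFromMiddle(Stream stream)
    {
        // Seek to roughly the middle of the file.
        stream.Seek(stream.Length / 2, SeekOrigin.Begin);
        using var reader = new StreamReader(stream);
        // Discard the (likely partial) line we landed in; the next
        // ReadLine() then starts at a real record boundary.
        reader.ReadLine();
        int count = 0;
        while (reader.ReadLine() != null)
            count++;
        return count;
    }

    static void Main()
    {
        // In-memory stand-in for the real CSV file.
        var csv = "1,2,3\n4,5,6\n7,8,9\n10,11,12\n";
        using var stream = new MemoryStream(Encoding.ASCII.GetBytes(csv));
        Console.WriteLine(CountRecordsFromMiddle(stream)); // prints 1
    }
}
```

This only works because, without embedded newlines, every `\n` is guaranteed to be a record boundary; with quoted multi-line fields the first `ReadLine()` could resynchronize in the wrong place.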
-
The CSV has 3 columns, input, output, and finalValue; all columns are of integer data type. The condition finalValue > 0 removes around 10k records. Regarding "if the data never changes, I would probably apply the filter and sorting to the data and store it in a format that won't require parsing" — can you please give an example of how that can be achieved?
-
You can use System.IO.BinaryReader/Writer:

```csharp
// pre-process: filter and sort once, then write the packed binary file
using var csv = CsvDataReader.Create(fileName, options);
var result = csv.GetRecords<Record>().Where(x => x.finalValue > 0).ToArray();
result = result.OrderBy(x => x.input).ToArray();

using var oStream = File.Create("myData.bin");
using var bw = new BinaryWriter(oStream);
// write the number of records first
bw.Write(result.Length);
foreach (var record in result)
{
    bw.Write(record.input);
    bw.Write(record.output);
    bw.Write(record.finalValue);
}
```

That will give you a file, "myData.bin", that contains the filtered/sorted data as packed binary integers. Reading it back is then just:

```csharp
using var iStream = File.OpenRead("myData.bin");
using var br = new BinaryReader(iStream);
var count = br.ReadInt32();
var records = new Record[count];
for (int i = 0; i < count; i++)
{
    var record = new Record();
    record.input = br.ReadInt32();
    record.output = br.ReadInt32();
    record.finalValue = br.ReadInt32();
    records[i] = record; // store it in the array
}
```

This will almost certainly be quite a bit faster than parsing the CSV.
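If the per-field BinaryReader loop itself ever becomes the bottleneck, a variant of the same idea reads the whole file in one call and slices the integers out of the byte array. This is a sketch against the same layout (a count, then input/output/finalValue triples); the small in-memory buffer stands in for `File.ReadAllBytes("myData.bin")`:

```csharp
using System;
using System.IO;

struct Record { public int input, output, finalValue; }

class BulkLoad
{
    static Record[] Load(byte[] bytes)
    {
        int count = BitConverter.ToInt32(bytes, 0);
        var records = new Record[count];
        for (int i = 0; i < count; i++)
        {
            int off = 4 + i * 12; // 4-byte header + 3 x 4-byte ints per record
            records[i].input = BitConverter.ToInt32(bytes, off);
            records[i].output = BitConverter.ToInt32(bytes, off + 4);
            records[i].finalValue = BitConverter.ToInt32(bytes, off + 8);
        }
        return records;
    }

    static void Main()
    {
        // Build a tiny buffer in the same format for demonstration.
        using var ms = new MemoryStream();
        using var bw = new BinaryWriter(ms);
        bw.Write(2);                         // record count
        bw.Write(1); bw.Write(2); bw.Write(3);
        bw.Write(4); bw.Write(5); bw.Write(6);
        bw.Flush();

        var records = Load(ms.ToArray());
        Console.WriteLine($"{records.Length} {records[1].finalValue}"); // prints "2 6"
    }
}
```

Note BitConverter assumes the file was written on a machine with the same endianness, which holds when the same program writes and reads it.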
-
Thanks for the help here. When I run the above code, the first 3 lines (the CSV load, filter, and sort) take 15 seconds on a Raspberry Pi. Is there a way just this load can be optimized? The post-load processing is working quite fast for me.
-
The idea is that you only have to do that once, as an external preprocessing step, since the data doesn't change. From then on your main process only deals with the .bin file, and the CSV is never used again. The optimization is that you eschew CSV parsing in your main process altogether. Maybe I misunderstood your requirements.
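One way to wire that up, as a sketch ("myData.bin" from the earlier comment; `RebuildCache` and `LoadBinary` are hypothetical placeholders for the pre-processing and BinaryReader code):

```csharp
using System;
using System.IO;

class Program
{
    static void Main()
    {
        const string cache = "myData.bin";
        if (!File.Exists(cache))
        {
            // One-time pre-processing: parse the CSV, filter, sort,
            // and write the packed binary file.
            RebuildCache(cache);
        }
        // Every normal run takes this fast path and never touches the CSV.
        LoadBinary(cache);
        Console.WriteLine("cache ready");
    }

    // Hypothetical placeholders for the code shown in the earlier comment.
    static void RebuildCache(string path) { /* CSV -> bin */ }
    static void LoadBinary(string path) { /* BinaryReader loop */ }
}
```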
-
My intention is to reduce the processing time while reading the CSV and converting it to strongly typed data. The series of steps in the // pre-process block above takes around 20 seconds to execute; is there a possibility I could reduce this time? I noticed that if I don't call .ToArray() and instead process one record at a time, that speeds things up, but I also have a requirement to sort the data before processing.
-
If the records never change, there is no reason to optimize that step, as you only need to run it once.
-
Hi, I have a scenario wherein I am fetching more than 50 million records, and the fetching takes a good amount of time, 15+ seconds. The records always remain the same, so I was wondering if there is a feasible way to use the data reader for this scenario. If not, can anyone suggest how the processing time can be reduced? I am using the below code to fetch the data:

```csharp
using var csv = CsvDataReader.Create(fileName, options);
result = csv.GetRecords<Record>().Where(x => x.finalValue > 0).ToArray();
result = result.OrderBy(x => x.input).ToArray();
```