-
Hello, The input file consists of two different tables and I need to read them both. (This does not adhere to the csv standard, so I assume it can't be solved using your API, and it will be needed to split the file first. But please let me know if I am mistaken). |
Beta Was this translation helpful? Give feedback.
Replies: 8 comments 12 replies
-
Without knowing more about the shape of your file, here are a couple things you can try. You can set Another option would be to read the header comment and parse the row info. You can do this manually on the TextReader before passing to Hopefully one of these options works for you. |
Beta Was this translation helpful? Give feedback.
-
You could write a Reader implementation that on the first readline reads the line with the file format spec (a list of table name + line count tuples) and then return null after all line for one table are read. The first read loop will start with the first table's header line and return null once all lines are read. The next read loop will return all rows of the second table and so on. |
Beta Was this translation helpful? Give feedback.
-
reader = reader.Skip(skip).Take(take); This bit doesn't work for me. It can't convert from CsvDataReader to DbDataReader I also tried csv = (csv.Skip(skip).Take(take).AsDbDataReader() as CsvDataReader)!; but it throws a null exception. any ideas @MarkPflug |
Beta Was this translation helpful? Give feedback.
-
var dataTable = new DataTable("test_table");
var workingPath = "C:\\Work\\2023\\Data\\MESA";
var file = "EI09 Individual U65 OEP Bonus _ 41980784_January.xls";
var ext = Path.GetExtension(file);
var fileOnly = Path.GetFileNameWithoutExtension(file);
var excel = ext.StartsWith(".xl");
var ix = 0;
var recordCount = 0;
var fieldCount = 0;
DbDataReader dbr = null;
CsvDataReader csv = null;
if (excel)
{
System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
var sheetName = "";
var edr = ExcelDataReader.Create(Path.Combine(workingPath, file), new ExcelDataReaderOptions() { });
do
{
sheetName = edr.WorksheetName;
using var cdw = CsvDataWriter.Create(Path.Combine(workingPath, fileOnly + "-" + sheetName + ".csv"), new CsvDataWriterOptions() { });
cdw.Write(edr);
recordCount = edr.RowCount;
fieldCount = edr.FieldCount;
} while (edr.NextResult());
file = fileOnly + "-" + sheetName + ".csv";
}
else
{
file = fileOnly + ".csv";
}
var text = File.OpenText(Path.Combine(workingPath, file)).ReadToEnd().Split('\n', '\r');
var skip = 3; // header starts on 3
var take = recordCount-4; // footer is 4 lines
//var reader = CsvDataReader.Create(textReader);
csv = CsvDataReader.Create(File.OpenText(Path.Combine(workingPath, file)), new CsvDataReaderOptions
{
Delimiter = ',',
Comment = '#',
Escape = '\"'
});
csv = (csv.Skip(skip).Take(take).AsDbDataReader() as CsvDataReader)!; // throw null exception this is a basic test I'm writing to strip out headers in XLS or CSV file for pre-processing and bulk import var result = analyzer.Analyze(csv);
var schema = result.GetSchema();
var schemaBuilder = result.GetSchemaBuilder();
var sch = schemaBuilder.Build();
var columns = schema.GetColumnSchema();
var dataSchema = new CsvSchema(schema);
var validateData = true;
var errorList = new ConcurrentBag<string>();
dataTable.Load(csv); |
Beta Was this translation helpful? Give feedback.
-
it's still not skipping from the reader. csv = CsvDataReader.Create(Path.Combine(workingPath, file), new CsvDataReaderOptions
{
Delimiter = ',',
Comment = '#',
Escape = '\"'
});
var reader = (csv.Skip(skip).Take(take));
var header = string.Join(",", reader.GetColumnSchema().Select(c => string.IsNullOrEmpty(c.ColumnName) ? $"col_{++ix}": c.ColumnName.Replace('\n', ' ').Replace('\r', ' ').Trim() ).ToArray()); // header-string for MD5_HASH reader isn't returning the first column where the actual header starts for the file. the first 3 lines are report heading info that i need to skip |
Beta Was this translation helpful? Give feedback.
-
Hi @MarkPflug -- thanks for the support! In some cases, yes they will. The main issue I'm having is that I need to be able to skip x lines at start and trim X lines from the end of files to process them correctly. I am also requiring the header be the first line so I can determine the file type by signature. Skip/Take is likely the solution here but when I call reader.GetColumnSchema(), it's returning the first record -- not skipping X lines. |
Beta Was this translation helpful? Give feedback.
-
Thanks @MarkPflug - I sent you the sample file. |
Beta Was this translation helpful? Give feedback.
-
@MarkPflug I was testing this and I noticed that given the sample file I sent you and this code: System.Text.Encoding.RegisterProvider(System.Text.CodePagesEncodingProvider.Instance);
var sheetName = "";
var edr = ExcelDataReader.Create(Path.Combine(workingPath, file), new ExcelDataReaderOptions { });
var dr = edr.Skip(3).Take(edr.RowCount - 3);
do
{
sheetName = edr.WorksheetName;
using var cdw = CsvDataWriter.Create(Path.Combine(workingPath, fileOnly + "-" + sheetName + ".csv"), new CsvDataWriterOptions()
{
});
cdw.Write(dr);
recordCount = edr.RowCount;
fieldCount = edr.FieldCount;
} while (dr.NextResult()); the resulting file contains only 1 column -- I think the reader is thrown off by the fact that there is a header that I can't seem to skip here. |
Beta Was this translation helpful? Give feedback.
Without knowing more about the shape of your file, here are a couple things you can try.
You can set
CsvDataReaderOptions.ResultSetMode = ResultSetMode.MultiResult
. This will cause the reader to consider all changes in the number of columns to be the start of a new dataset. You would callRead
in a loop, like normal, but then you would need to callNextResult
to start reading the next table. This only works if the number of columns in the tables is different though, or there is an empty line that separates them.Another option would be to read the header comment and parse the row info. You can do this manually on the TextReader before passing to
CsvDataReader.Create
, or you can use theCs…