Request: compare lines within two CSV files, difference in new CSV #7

Open
hspjanssen opened this issue Jan 19, 2022 · 5 comments

@hspjanssen

Hello,

First of all, great work!

I have a feature request (maybe not really in the scope of your project?): compare two outputs (CSV files) and write the lines that differ to a new CSV file. The reason is that I pull a table from a URL, and the table has approximately 2000 lines. A line can be added each day, but at the moment I have to go through the complete file again to find the differences.

Hope this is clear, otherwise please ask.

Many thanks and keep up the good work.

@flother
Owner

flother commented Jan 20, 2022

This is definitely outside the scope of this project, but fortunately this is something you can already do by combining HTMLTab with existing tools. Let's say that yesterday you requested data from https://example.com/data.csv and saved it to a local file named yesterday.csv. Today you can see the differences using:

sdiff --suppress-common-lines yesterday.csv <(htmltab https://example.com/data.csv)

That will use HTMLTab to get the latest version of the CSV and pass it in as the second argument to sdiff (side-by-side diff), comparing it to yesterday.csv and only outputting the lines that have changed. I know that sdiff is available on macOS and Linux by default, but I'm not sure about Windows.
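If the goal is a new CSV file rather than a side-by-side view, one possible approach is sketched below. It is only a sketch: the file names (yesterday.csv, today.csv, changes.csv) and the three-column sample rows are made up for illustration, and it assumes every row sits on a single line. It uses plain grep to keep the lines of today's file that have no exact match in yesterday's file.

```shell
# Hypothetical sample data standing in for two daily snapshots.
# In real use, today.csv would be the saved output of htmltab.
printf 'P,Team,Pts\n1,Man City,53\n2,Chelsea,43\n' > yesterday.csv
printf 'P,Team,Pts\n1,Man City,56\n2,Chelsea,43\n3,Liverpool,42\n' > today.csv

# Start the result file with the header row.
head -n 1 today.csv > changes.csv

# Append every data line of today.csv that does not appear verbatim in
# yesterday.csv: -F fixed strings, -x whole-line match, -v invert the match,
# -f read the patterns from a file. grep exits with status 1 when nothing
# is selected, so "|| true" keeps that case from looking like a failure.
tail -n +2 today.csv | grep -Fxv -f yesterday.csv >> changes.csv || true

cat changes.csv
```

With the sample data, changes.csv ends up holding the header plus the changed Man City row and the new Liverpool row. Lines that were removed since yesterday are not reported; swapping the two file names would list those instead.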

This is actually a really good use case for HTMLTab. I'll add this to the documentation as an example at some point.

@hspjanssen
Author

hspjanssen commented Jan 20, 2022

Thanks! Checked it, works like a charm.

Is there also a simple way to filter lines within the file which match a specific string?

For example, take the table at the bottom of your page https://flother.github.io/htmltab/ and filter on column "W" for the lines that match the string "12".

Original table:
P  Team       GP  W   D  L  F   A   GD  Pts
1  Man City   21  17  2  2  53  13  40  53
2  Chelsea    21  12  7  2  45  16  29  43
3  Liverpool  20  12  6  2  52  18  34  42
4  Arsenal    20  11  2  7  33  25  8   35
5  West Ham   20  10  4  6  37  27  10  34
6  Spurs      18  10  3  5  23  20  3   33

The result should be:
P  Team       GP  W   D  L  F   A   GD  Pts
2  Chelsea    21  12  7  2  45  16  29  43
3  Liverpool  20  12  6  2  52  18  34  42

Thanks in advance!

@flother
Owner

flother commented Jan 21, 2022

Yep, that's possible too. The simplest way is to pipe the output through the standard Unix tool grep:

$ htmltab https://www.theguardian.com/football/premierleague/table | grep Leicester
1,Man City,22,18,2,2,54,13,41,56,Won against Newcastle Won against Leicester Won against Brentford Won against Arsenal Won against Chelsea
2,Liverpool,21,13,6,2,55,18,37,45,Won against Newcastle Drew with Spurs Lost to Leicester Drew with Chelsea Won against Brentford
5,Spurs,19,11,3,5,26,22,4,36,Drew with Liverpool Won against C Palace Drew with Southampton Won against Watford Won against Leicester
10,Leicester,19,7,4,8,33,36,-3,25,Lost to Aston Villa Won against Newcastle Lost to Man City Won against Liverpool Lost to Spurs
19,Newcastle,20,1,9,10,20,43,-23,12,Lost to Leicester Lost to Liverpool Lost to Man City Drew with Man Utd Drew with Watford

That's string matching within the whole file though, not individual columns. If you're feeling adventurous you should try the excellent xsv. That will allow you to search in particular columns:

$ htmltab https://www.theguardian.com/football/premierleague/table | xsv search --select Team Leicester
P,Team,GP,W,D,L,F,A,GD,Pts,Form
10,Leicester,19,7,4,8,33,36,-3,25,Lost to Aston Villa Won against Newcastle Lost to Man City Won against Liverpool Lost to Spurs

You can use xsv to remove columns and format the result:

$ htmltab https://www.theguardian.com/football/premierleague/table | xsv search --select Team Leicester | xsv select '!Form' | xsv table
P   Team       GP  W   D   L   F   A   GD  Pts
10  Leicester  19  7   4   8   33  36  -3  25
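For the case in the original question (rows whose "W" column is exactly 12), the same idea can be sketched without xsv, using awk on the six rows quoted above. This is a rough sketch that assumes no field contains a quoted comma; table.csv and filtered.csv are hypothetical file names, and the col and want variables are just illustrative names. With xsv, anchoring the pattern should give a similar result, as noted in the comment.

```shell
# The table from the question, saved as CSV (hypothetical file name).
cat > table.csv <<'EOF'
P,Team,GP,W,D,L,F,A,GD,Pts
1,Man City,21,17,2,2,53,13,40,53
2,Chelsea,21,12,7,2,45,16,29,43
3,Liverpool,20,12,6,2,52,18,34,42
4,Arsenal,20,11,2,7,33,25,8,35
5,West Ham,20,10,4,6,37,27,10,34
6,Spurs,18,10,3,5,23,20,3,33
EOF

# On the header row, find the position of the column named in "col" and
# print the header. On every other row, print it only if that column
# equals "want".
# Roughly equivalent with xsv: xsv search --select W '^12$' table.csv
awk -F, -v col=W -v want=12 '
  NR == 1 { for (i = 1; i <= NF; i++) if ($i == col) c = i; print; next }
  c && $c == want
' table.csv > filtered.csv

cat filtered.csv
```

The result is the header followed by the Chelsea and Liverpool rows. Matching on the whole field matters here: an unanchored search for 12 would also match values such as 112 or 120.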

@hspjanssen
Author

Works great, many thanks for your support, really helps a lot!

You can close this request for now.

Have a nice weekend!

@flother
Owner

flother commented Jan 21, 2022

Glad I could help.

I'll keep this issue open as a reminder to add the example to the documentation. Once I've done that, I'll close it.
