In this example, we compare 2 workbook with only 1 data range each one. Working area is set only on numeric cells, so we can do positional difference. In second workbook we have add a new line, row 12 corresponding to VAR_9 and changed a cell in position M7.
(xltoy)$ xltoy diff data/data_sample1.xlsx data/data_sample1_diff.xlsx
add:
D12: 0.0005004366657703548
E12: 0.000547121974635698
F12: 0.0005295198569146752
G12: 0.0005072172479631228
H12: 0.0005379024723558388
I12: 0.0005430174783953075
J12: 0.0005689182087171011
K12: 0.0005538596087349697
L12: 0.0005297508191981888
M12: 0.0005409815166305603
N12: 0.0005876844243079103
O12: 0.0006204197757602877
P12: 0.0006232399068120899
Q12: 0.0007001965787723482
R12: 0.0007432987416372932
S12: 0.0007323677326847715
change:
data_sample:
M7: 905509 -> 905510
First workbook is from case1 In second workbook range was moved to another position
(xltoy)$ xltoy diff data/data_sample1.xlsx data/data_sample1_relative.xlsx --relative
<no output>
So in relative mode, no difference found
This exercise was done using data from https://data.world/ a free data sources provider.
It contains >3M cells.
Procedure in 5 steps:
- Download Air_Quality_Measures_on_the_National_Environmental_Health_Tracking_Network.csv
- Set the data working area and save the file as Air_Quality.xlsx
- Now this file was considered as the referrer file so we can store it in an efficient representation using xltoy collect
- Make some changes, add and remove cells are allowed too. and save this version as Air_Quality2.xlsx
- Use xltoy diff to find differences.
in practice:
-
wget https://query.data.world/s/aci46g2wmbpn5egnve7nj3aikoqw2m -O Air_Quality.csv
-
(excel) <open with Excel and set data range>
-
xltoy collect Air_Quality.xlsx --data --json -vvv > Air_Quality.json
- --data, to exclude eventually formulas.
- --json, we want output in JSON format.
- -vvv (triple v) means log in higher verbosity:DEBUG.
-
(excel) <Simulate some change in a new file>
-
xltoy diff Air_Quality.json Air_Quality2.xlsx --timeit -vvv
- json Vs xlsx to see different parsing times.
- --timeit, to print out the execution time.
- -vvv (triple v) means log in higher verbosity:DEBUG.
at the point 4) I've changed 2 cells randomly and the result og point 5) was:
change:
Air_Quality:
H146162: Wood -> Iron
N218636: 0 -> 1
It means, that change was found in sheet Air_Quality two cells involved: H146162 and N218636
It took:
- about 7 secs. to open file in Excel
- 35 sec. to do the snapshoot in json format
- 1.5 sec. to load json file
- diff costs less than 1sec.