-
Notifications
You must be signed in to change notification settings - Fork 335
Looking for the Best Excel Add ins
Data analysts often turn to add-ins when they encounter complicated computations in Excel. Here we examine and compare some common add-ins in terms of deployment process, development efficiency, application fluidity and, particularly, computational capabilities. esProc SPL shows more excellent performance than others. Looking Looking for the Best Excel Add-ins for details.
Excel provides convenient methods, including easy to use functions and intuitive buttons and menus, for performing simple computations. But it’s hard for it alone to accomplish complicated computations and certain special ones. So it offers the add-in interface through which an external application can be connected to use their language or script to help Excel handle those computations.
Now I’ll investigate a number of common Excel add-ins and evaluate their computational capabilities.
One of the earliest Excel add-ins, Excel DNA lets you create dynamic library functions for Excel using C#, F#, VB.net, etc.
First, you need to write a user-defined function. Below is an example written in C#, which is listed in the official website of Excel DNA. MyFunction is the name of the user-defined function.
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;
using ExcelDna.Integration;
namespace MyLibrary
{
public class Class1
{
[ExcelFunction(Description="few people use this way!")]
public static string MyFunction(string name)
{
return "Bonjour" + name;
}
}
}
The code needs to be compiled as a dynamic library to be used in Excel.
Then you configure the relationship between the user-defined function and the add-in. The following DnaSample.dna file configures the add-in’s name ("My name") and the corresponding dynamic library Mylibrary.dll, which includes multiple user-defined functions.
<DnaLibrary Name="My name" RuntimeVersion="v4.0">
<ExternalLibrary Path="Mylibrary.dll" />
</DnaLibary>
Finally you configure the new add-in in Excel to be able to call MyFunction from a cell, as shown below:
A | A | ||||
---|---|---|---|---|---|
1 | =MyFunction("jason") | press enter | --> | 1 | Bonjour jason |
2 | 2 |
Since a compiled program is ready to execute and closely integrated into Excel, the execution is smooth and fast. So the biggest advantage of Excel DNA is the high fluidity.
As the name implies, the add-in can make full use of the facilities Windows DNA Architecture offers, such as development languages and tools, integration with Excel, interactive debugging and etc.
In theory, languages like C#, F# and VB.net are universal and all-powerful. But I noted that the sample programs in their websites are all about string output. Surely this is not a good choice for display of strengths. So why is that?
Because their capabilities exist only on paper.
Those languages lack class libraries for structured data computations. Hardcoding is needed even for the most basic calculations. The extremely roundabout code isn’t suitable for handling complicated computations.
Besides, C#, F# and VB.net are compiled languages instead of interpreted languages. This type of programming languages requires users to maintain a compilation environment for compiling the algorithm, in case it is changed. But configuring Windows compilation environment is complex and difficult to learn. Actually these languages have a high technological threshold. This means that Excel DNA is more suitable for professional programmers who use it as an interface, rather than for most of the data analysts who directly use it for desktop analysis.
Other add-ins, such as Java-based JINX, also lacks class libraries for structured data computations. So JINX is unsuitable for data computations, too. Even Excel VBA, the spreadsheet tool’s built-in add-in, is no better in terms of expression capability (which means that it isn’t suitable for computing data). But as it doesn’t require integration and compilation, it is more competitive than Excel DNA and/JINX.
An add-in needs to be more convenient and easier to use at least than VBA to get popularity. So Microsoft released Excel JavaScript in 2013, a language intended for use by add-ins and more convenient than VBA.
Excel JavaScript has similar uses as other add-in languages. The point is that it is an interpreted language and thus supports modifying a program anytime and then executing it immediately without compilation. This is a big difference from Excel DNA. Generally an interpreted language has low fluidity. But as an Excel built-in, Excel JavaScript can be executed in the same process with the spreadsheet tool. In actual practice, the execution is fluid and fast, only slower than Excel DNA.
n Excel built-in add-in brings a lot of benefits. You don’t need to download the add-in and can develop programs without configuration. Excel JavaScript inherits the Excel’s cross-platform capabilities. A program can be seamlessly migrated between the standalone edition, web edition and Mac edition. The add-in can also access an Excel object, including workbook, sheet and cell. This greatly speeds up the development progress.
Yet these merits are what VBA also has. So what’s unique about Excel JavaScript?
It has great interface management ability. It can access Excel menu bar, buttons and a pop-up dialog using simpler syntax and define an add-in interface in a JS file. That’s much convenient than VBA.
Unfortunately, the interface management isn’t the key aspect of a data computing add-in. It’s not worthy of attention.
Our focus should be the computational capability. It’s a pity that JavaScript is still not equipped with any structured computation functions. It doesn’t have any advantage in handling complicated computations. It’s just another Excel-based scripting language.
A standard data computing add-in should have class libraries for structured computations, like PyXLL. PyXLL is a Pyhon-based add-in. Python Pandas has the structured computations class libraries.
PyXLL doesn’t need hardcoding when implementing simple algorithms, such as grouping & aggregation over the specified area. Here’s one example. Select a batch of employee records from an Excel worksheet, pass them to a user-defined function groupEmp, perform grouping & aggregation algorithm in PyXLL and return the result. The code for implementing the process is as follows:
import pandas as pd
import numpy as np
from pyxll import xl_func
@xl_func("dataframe")
def groupEmp(df):
df=df.groupby("deptid")['salary'].agg([len, np.sum, np.mean]) # Core code: Group summary
return df
The core code occupies only one line. Others are basically the routine. The program is succinct.
Of course the cooperation of multiple functions, instead of a single basic function, is needed to perform complicated and special computations. The problem is that it’s not convenient to do that in PyXLL.
Here’s one example of standardizing and then grouping and aggregate data. Based on a Excel table recording unit styles (columns A-E), the user-defined function will group records by STYLE and BEDROOMS and calculate averages over SQFEET column, BATHS column and PRICE column. You need to remove $ from each value of the string style PRICE column and convert it to numeric style for the computation.
The source data:
A | B | C | D | E | |
---|---|---|---|---|---|
1 | STYLE | SQFEET | BEDROOMS | BATHS | PRICE |
2 | RANCH | 1250 | 2 | 1.0 | $64,000 |
3 | SPLIT | 1190 | 1 | 1.0 | $65,850 |
4 | RANCH | 1500 | 3 | 3.0 | $86,650 |
5 | CONDO | 1400 | 2 | 1.5 | $80,050 |
6 | … |
The processed data stored in a new sheet:
A | B | C | D | E | |
---|---|---|---|---|---|
1 | STYLE | BEDROOMS | SQFEET | BATHS | PRICE |
2 | CONDO | 2 | 1630.0 | 1.75 | 95370.0 |
3 | CONDO | 3 | 1390.0 | 2.50 | 79350.0 |
4 | CONDO | 4 | 2105.0 | 2.50 | 127150.0 |
5 | RANCH | 1 | 720.0 | 1.00 | 34550.0 |
6 | … |
The user-defined function for implementing the algorithm is as follows (only core code is shown):
for i in range(1, len(b)):
b[i][4] = b[i][4].replace(“$”,‘ ‘)
b[i][4] = b[i][4].replace(“,”,‘ ‘)
for i in range(1, len(b)):
for j in [1, 2, 3, 4]:
b[i][j] = eval(b[i][j])
data = pandas.DataFrame(b[1:],columns=b[0])
out = data.groupby([‘STYLE’,‘BEDROOMS’]).mean()
return out
Only one line is for grouping, but six lines for pre-processing. That’s a little complicated.
Another example is splitting one row into multiple rows. Column A stores IDs and column B stores corresponding List values. There are List values that have multiple members separated by space. The user-defined function needs to split them by spaces and correspond each of them to the ID.
The source code:
A | B |
---|---|
ID | List |
1 | A1 B1 C1 D1 |
2 | A2 |
3 | A3 B3 C3 |
4 | A3 B4 D4 |
… | … |
The processed data stored in a new sheet:
A | B |
---|---|
ID | List |
1 | A1 |
1 | B1 |
1 | C1 |
1 | D1 |
2 | A2 |
… | … |
The user-defined function for implementing the algorithm is as follows:
split_dict = df.set_index('ID').T.to_dict('list')
split_list = []
for key,value in split_dict.items():
anomalies = value[0].split(' ')
key_array = np.tile(key,len(anomalies))
split_df = pd.DataFrame(np.array([key_array,anomalies]).T,columns=['ID','ANOMALIES'])
split_list.append(split_df)
df = pd.concat(split_list,ignore_index=True)
return df
The core code is complicated. PyXLL is not good at handling complicated or special computations.
PyXLL has one more problem. Excel has to call an external interpreter to parse the Python script. This results in very low fluidity and seriously bad user experience. Yet low fluidity isn’t a unique PyXLL problem. It’s common among all scripts add-ins requiring external interpreters, such as XLwings, Bert and RExcel. XLwings is another Python-based add-in and so share same pros and cons with PyXLL. Both Bert and RExcel are R-based. The R language is designed for implementing scientific modeling algorithms. Its structured computation class library isn’t professional. So both add-ins have weaker computing ability and lower fluidity than PyXLL.
he biggest advantage of an interpreted language is that they support immediate execution without compilation and that they are easy to maintain and modify.
esProc SPL is the professional data computing engine that offers an Excel add-in to write scripts using its SPL language. Similar to PyXLL, it boasts a wealth of structured computation function to implement simple algorithms effortlessly. To group and aggregate records in the specified area, for example, you just need the following script (groupEmp.dfx):
A | B | |
---|---|---|
1 | =create(eid, name,deptid,salary).record(arg1) | /arg1 receives the two-dimensional table generated from the Excel file |
2 | =A1.groups(deptid;sum(salary),avg(salary)) | / Core code: grouping & aggregation |
The core code is A2. It’s concise. Then we can call the user-defined function in an Excel cell. The syntax is =dfx("groupEmp",A1:D20).
SPL handles other basic algorithms in a simple and easy way (Here only the core code is shown):
A | B | |
---|---|---|
3 | =A1.select(salary>8000 && salary<10000) | /Filtering |
4 | =A1.sort(-salary) | /Inverse sorting |
5 | =A1.group(deptid).conj(~.sort(salary)) | /Windowing; sort employee records in each department by salary |
6 | =connect@l("mssql").query("select * from dept") | /Read in department records from the database |
7 | =A1.join@i(deptid,A6:deptid,deptname) | /Perform a join between the Excel table and the database table |
Now we can see that the standard of evaluating an add-in’s computational ability is how well it does in handling complicated or special computations.
Compared with PyXLL, esProc has an edge in this aspect.
It’s much easier and simpler in converting data to standard format and then grouping it in SPL than in PyXLL:
A | |
---|---|
2 | =A2.run(int(replace(replace(#5,”$”,””),”,”,””)):PRICE) |
3 | =A3.groups(STYLE,BEDROOMS;avg(SQFEET):SQFEET,avg(BATHS):BATHS,avg(PRICE):PRICE) |
It’s extremely simple to split one row into multiple rows in SPL:
A | |
---|---|
2 | =A1.news(ANOMALIES.split("");ID,~:ANOMALIES) |
A more complicated example is to calculate installments. An Excel worksheet records loan information, including load ID, amount, number of terms (by month) and annual rate:
A | B | C | D | |
---|---|---|---|---|
1 | LoanID | LoanAmt | Term | Rate |
2 | L01 | 100000 | 5 | 4.8 |
3 | L02 | 20000 | 2 | 5.0 |
4 | L03 | 500000 | 12 | 4.5 |
The user-defined function needs to get the detailed data for each term, including payable amount, interest, principal and principal balance. The expected result in a new worksheet is as follows:
A | B | C | D | E | F | G | H | |
---|---|---|---|---|---|---|---|---|
1 | LoanID | LoanAmt | Payment | Term | Rate | Interest | Principal | PrincipalBalance |
2 | L01 | 100000 | 20238.13 | 5 | 4.75 | 395.83 | 19842.29 | 80159.71 |
3 | L01 | 100000 | 20238.13 | 5 | 4.75 | 317.29 | 19920.83 | 60236.87 |
4 | L01 | 100000 | 20238.13 | 5 | 4.75 | 238.44 | 19999.69 | 40237.18 |
5 | L01 | 100000 | 20238.13 | 5 | 4.75 | 159.27 | 20078.85 | 20158.33 |
6 | … |
It’s rather convenient to get this done in SPL:
A | |
---|---|
2 | =A2.derive(Rate/100/12:mRate,LoanAmt*mRate*power((1+mRate),Term)/(power((1+mRate),Term)-1):mPayment) |
3 | =A3.news((t=LoanAmt,Term);LoanID, LoanAmt, mPayment:payment, Term, Rate, t* mRate:interest, payment-interest:principal, t=t-principal:principlebalance) |
Though having powerful computational ability, esProc is also an add-in whose execution depends on the external interpreter JVM. So the unfluidity problem still exists.
Is there a way to both retain the computational capability and run the application smoothly?
esProc SPL successfully achieve that by replacing the user-defined function with the clipboard!
To find the students whose score of each subject ranks in top 3, for example. Column A contains student names, column B-D contains scores of math, English and physics respectively. Now we want to find the eligible students to append them to the scores in college.
The source data:
A | B | C | D | |
---|---|---|---|---|
1 | name | math | english | physics |
2 | lily | 97 | 100 | 99 |
3 | Joshua | 100 | 99 | 100 |
4 | Sarah | 98 | 99 | 96 |
5 | Bertram | 94 | 95 | 85 |
6 | Paula | 91 | 88 | 91 |
7 | Sophia | 92 | 81 | 76 |
8 | Ben | 87 | 80 | 76 |
9 | Ruth | 92 | 91 | 87 |
10 | Pag | 95 | 87 | 87 |
Select the valid cells, copy them to the clipboard through ctrl+C, and then execute the following SPL script:
A | B | |
---|---|---|
1 | =clipboard().import@t() | /Get data from the clipboard |
2 | =A1.top(-3;math).(name) | /Get the 3 students whose math scores rank in top 3 |
3 | =A1.top(-3;english).(name) | |
4 | =A1.top(-3;physics).(name) | |
5 | =join@p(A2;A3;A4).export() | /Join 3 tables into a two-dimensional table and export them as strings |
6 | =clipboard(A5) | |
After the above script is executed, just select cell B11 and press ctrl+V and then data in the clipboard is copied into B11-D13. This gets the same result as the user-defined function does, as shown below:
A | B | C | D | |
---|---|---|---|---|
… | … | … | … | |
9 | Ruth | 92 | 91 | 87 |
10 | Pag | 95 | 87 | 87 |
11 | Joshua | Lily | Joshua | |
12 | Sarah | Sarah | Lily | |
13 | lily | Joshua | Sarah |
With esProc, you can always use the clipboard instead of writing a user-defined function except for certain cases, such as multiple areas are involved in the computation.
hrough the clipboard, you can have both a fluid computing process and the powerful computing ability, without the complicated deployment of the add-in. For data analysts, that’s really convenient.
In theory all add-ins including PyXLL can solve the unfluidity problem through the clipboard only if they can provide corresponding functions (getting data from the clipboard and convert it into the structured data within the Excel) in future versions.
In a nutshell, a fluid add-in has weak computing ability, and one that has strong computing ability is not so fluid. Yet with the clipboard method that can make up for the fluidity limitations, esProc SPL is surely the most suitable add-in for data analysts.
SPL Resource: SPL Official Website | SPL Blog | Download esProc SPL | SPL Source Code