Getting data in
Associated directory: 02-getting-data-in
We want to start small and gradually collect more and more financial information. But what does "start small" actually mean, and how can we then move forward and add more data over time?
Let's try and illustrate this with a small example.
What is the absolute minimum of financial tracking you can do? It would probably be taking note of your account balance now and then.
You start with your main account. Normally it would be the account that you use to fund most of your day-to-day expenses. It is probably a "checking" (or "current", or "debit") card account at the bank of your choice. If you have several of them -- just pick one for now, we will deal with the rest shortly.
To begin, you look at the current balance ($1230) in that account and record it in your journal:
YYYY-01-01 Start balance for assets:current
equity:opening balances -$1230
assets:current $1230
In a month, you've earned some money (let's say $1000), and you spent some of it. You check the balance again, and it is $1100 now. As we are doing the absolute minimum, we would record this as an unclassified expenditure:
YYYY-02-01 Start balance for assets:current
income:job -$1000
assets:current = $1100
expenses:unknown
Hledger could now do some (trivial) math for you to figure out how much you spent this month. After a couple of months, you would know how much you are spending, but you would not know what you are spending it on.
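The math behind the balance assignment is simple; a quick sketch with the amounts from the example above (the variable names are mine, just for illustration):

```shell
# opening balance, income, and the asserted closing balance from the example
opening=1230
income=1000
closing=1100

# hledger infers the expenses:unknown posting as whatever balances the entry
spent=$((opening + income - closing))
echo "expenses:unknown  \$${spent}"   # -> expenses:unknown  $1130
```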
The next step would be to slice that monolithic slab of expenses into smaller expense categories. This could be done based on the transaction statement that you can get from your bank, but going over the statements and converting each line into a journal entry would be extremely tedious and error-prone.
Fortunately, hledger allows you to automate this.
You would want to get CSV statements for your account covering a reasonable period of time. You can start small: get a statement for the last month or quarter, or a calendar/fiscal year-to-date. You can always import older statements later (this will be covered in later chapters). For now, let's save that single statement in ./import/<institution name>/in. For example, my current account was once with Lloyds, so their statements will go into ./import/lloyds/in/(unknown).csv.
From now on I will keep using Lloyds as an example, and we will start with a simple CSV statement:
$ cat import/lloyds/in/99966633_20171223_1844.csv
Transaction Date,Transaction Type,Sort Code,Account Number,Transaction Description,Debit Amount,Credit Amount,Balance,
01/05/2017,BP,'12-34-56,99966633,AVIVA,100,,986.93
07/04/2017,BP,'12-34-56,99966633,OASIS COFFEE ,2.76,,1086.93
07/04/2017,DEB,'12-34-56,99966633,WAITROSE,5.24,,1089.69
01/04/2017,,'12-34-56,99966633,INTEREST (NET) ,,1.21,1094.93
31/03/2017,BGC,'12-34-56,99966633,HSBC,100,,1093.72
30/03/2017,BGC,'12-34-56,99966633,EMPLOYER INC,,1093.72,1193.72
If you are lucky, your bank will allow you to export an arbitrary range of transactions, and you will be able to pick the convention that suits you best. For example, you might want to export at most one file per year, per quarter, or per month.
However, financial institutions can be rather arbitrary in what they provide. You might be getting monthly statements, but generated on the 15th of the month (so they inconveniently cross year boundaries). Or the number of rows in a statement might be limited to 100 or 1000 (I have seen this in real life).
Or you might want to just download transactions since the last date you grabbed files from this financial institution, however many days ago that was.
This should not matter much. Save as many or as few files as you want, combine them or split them. Multiple files do not mean additional processing overhead.
Quite often these CSV files will not be directly suitable for hledger's CSV import facility. You might want or need to do some data cleaning or transformation. Let's say that in our example I want to remove extra spaces from the description field and replace numeric account names with human-readable ones. Even though this is not necessary to import my sample file, I want to demonstrate how a clean-up script could be written and run. Let's place the clean-up script in ./import/lloyds/in2csv:
$ cat import/lloyds/in2csv
#!/bin/bash
sed -E -e 's/ +/ /g; s/,99966633,/,assets:Lloyds:current,/' < "$1"
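To see what this clean-up is meant to do, you can feed it a single line from the sample statement by hand (the line below is copied from the CSV shown earlier):

```shell
# one raw line from the sample statement: note the doubled spaces in the
# description and the numeric account column
line="07/04/2017,BP,'12-34-56,99966633,OASIS COFFEE  ,2.76,,1086.93"

# apply the same substitutions: collapse runs of spaces, and replace the
# account number with a human-readable account name
cleaned=$(printf '%s\n' "$line" | sed -E -e 's/ +/ /g; s/,99966633,/,assets:Lloyds:current,/')
echo "$cleaned"
```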
Then you need to write conversion rules for hledger's CSV import command (place them in ./import/lloyds/lloyds.rules) and a script that can use these rules to print out a converted journal file. Place it in ./import/lloyds/csv2journal. Usually, it is as simple as this:
$ cat import/lloyds/csv2journal
#!/bin/bash
hledger print --rules-file lloyds.rules -f "$1"
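The contents of the rules file itself are not shown here; a minimal sketch of what lloyds.rules might contain for this statement layout (field names and account choices below are illustrative, not the repository's actual rules):

```
# skip the CSV header row
skip 1
date-format %d/%m/%Y
# the clean-up script has already replaced the account number with
# assets:Lloyds:current, so that column can feed account1 directly
fields date, type, sort_code, account1, description, amount-out, amount-in, balance
currency £
# leave everything uncategorized for now; we will sort these out later
account2 expenses:unknown
```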
We would want to use those scripts to convert files in ./in into files in ./csv, and then convert those into journal files:
import
└── lloyds
├── in
│ └── 99966633_20171223_1844.csv ----.
├── csv in2csv
│ └── 99966633_20171223_1844.csv <---' ----.
└── journal csv2journal
└── 99966633_20171223_1844.journal <--------------'
Build rules (from ./export/export.hs) will automatically invoke the conversion scripts named in2csv and csv2journal.
The build rules are demand-driven instead of supply-driven. What I mean by that is that they will not try to discover and convert all the in/*.csv files (as a "supply-driven" build system would do). Instead, when you include a file from the import/lloyds/journal/ directory (like import/lloyds/journal/<whatever>.journal), the build rules will try to produce import/lloyds/csv/<whatever>.csv, which will in turn demand the existence of import/lloyds/in/<whatever>.csv, and will call the in2csv and csv2journal scripts as necessary.
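The demand chain can be sketched in plain shell (a simplification of what the build rules do; the file name is the one from the sample layout above):

```shell
# the file someone "demanded" by including it from a yearly journal
want="99966633_20171223_1844"

# producing that journal first demands the cleaned csv ...
need_csv="import/lloyds/csv/${want}.csv"
# ... which in turn demands the raw downloaded statement
need_in="import/lloyds/in/${want}.csv"

echo "in2csv:      ${need_in} -> ${need_csv}"
echo "csv2journal: ${need_csv} -> import/lloyds/journal/${want}.journal"
```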
Let's "enroll" our lloyds directory into this automatic conversion mechanism by editing export/export.hs and populating the lists under these two comment lines:
-- Enumerate directories with auto-generated cleaned csv files
[ "//import/lloyds/csv/*.csv" ] |%> in2csv
-- Enumerate directories with auto-generated journals
[ "//import/lloyds/journal/*.journal" ] |%> csv2journal
Shake operates with absolute pathnames, so the path to import in the example above starts with // to indicate that the rule csv2journal will fire whenever you include a .journal file with import/lloyds/journal in its path name, wherever that may be on your disk.
Your conversion scripts will run from the directory where they reside, so you can refer to other files (like lloyds.rules) by their relative names.
Most probably your account was not opened in 2017. If your account already had some balance on 1st Jan 2017 (the start date of the downloaded statement), you will need to record this in your journal. As with all other manual transactions, you will put this one in the suitable top-level yearly file; in this case, it is 2017.journal:
$ hledger -f 2017.journal print 'desc:opening'
2017-01-01 opening balances
assets:Lloyds:current = £100.00
equity:opening balances
Make sure you use equity:opening balances, as this is the account that hledger close uses to generate end-of-year equity carry-over transactions (we will cover them later).
Normally hledger tries to deduce the format of your amounts (£1234.56 or £1,234.56 or £1234,56 or even £1_234.56) from the amounts encountered in your journals. However, it is good practice to specify the format you want via a commodity directive. Those directives can go into a single file (commodities.journal), which will be included in all your yearly files:
$ cat commodities.journal
commodity £1000.00
Finally, everything is in place to convert the downloaded file to .journal.
Here is the crucial bit: instead of copying the contents of that file into your journal for the current year, let's just include it, so now your 2017.journal will look like this:
$ cat 2017.journal
;; This journal is for the current year
include commodities.journal
;; Opening balances
2017/01/01 opening balances
assets:Lloyds:current = £100.00
equity:opening balances
include ./import/lloyds/journal/99966633_20171223_1844.journal
As explained earlier, now that we "demand" this journal file to exist, the build rules will produce it from the corresponding source files.
Crucially, the build rules will not automatically convert all .csv files in import to .journal.
Now you can run ./export.sh. This will automatically run the in2csv and csv2journal scripts with the right set of arguments and regenerate all your reports. If you are keeping reports under version control, you will be able to diff them against the previous version and see at a glance exactly what has changed and where.
You will notice that the import rules put all expenses into a single expenses:unknown account (which means that they are not categorized in any way). That's fine; let's assume that you do not have time to sort them out just now. We will do this later.
I've included a sample data file and conversion scripts in 02-getting-data-in:
import
└── lloyds
├── in
│ └── 99966633_20171223_1844.csv - sample of the original downloaded file
├── csv
│ └── 99966633_20171223_1844.csv - cleaned up file, ready to be consumed by
│ 'hledger print'
├── journal
│ └── 99966633_20171223_1844.journal - generated journal
├── in2csv - conversion script to produce ./csv/*
├── csv2journal - conversion script to produce ./journal/*
└── lloyds.rules - CSV conversion rules
The full list of changes from the previous step can be seen in diffs/01-to-02.diff. There is a single change there not covered so far: the CSV conversion rules in lloyds.rules are added as a dependency of the autogenerated Lloyds journals, so that changes in the rules file will cause all journals to be regenerated:
extraDeps file
| "//lloyds//*.journal" ?== file = ["lloyds.rules"]
| otherwise = []
Build rules automatically add in2csv and csv2journal as dependencies of the files they produce, so any changes in those scripts will cause all relevant outputs to be regenerated.
At this point, our little project looks like this:
$ tree
.
├── 2017.journal
├── all.journal
├── commodities.journal
├── export
│ ├── 2017-all.journal
│ ├── 2017-balance-sheet.txt
│ ├── 2017-cash-flow.txt
│ ├── 2017-income-expenses.txt
│ ├── export
│ ├── export.hi
│ ├── export.hs
│ └── export.o
├── export.sh
├── import
│ └── lloyds
│ ├── csv
│ │ └── 99966633_20171223_1844.csv
│ ├── csv2journal
│ ├── in
│ │ └── 99966633_20171223_1844.csv
│ ├── in2csv
│ ├── journal
│ │ └── 99966633_20171223_1844.journal
│ └── lloyds.rules
└── README.md
6 directories, 19 files
Now that you have converted a single statement, let's get the full history of a single account, and save and convert all of its statements.