forked from jbms/beancount-import
-
Notifications
You must be signed in to change notification settings - Fork 0
/
Copy pathofx.py
1552 lines (1333 loc) · 64.8 KB
/
ofx.py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
704
705
706
707
708
709
710
711
712
713
714
715
716
717
718
719
720
721
722
723
724
725
726
727
728
729
730
731
732
733
734
735
736
737
738
739
740
741
742
743
744
745
746
747
748
749
750
751
752
753
754
755
756
757
758
759
760
761
762
763
764
765
766
767
768
769
770
771
772
773
774
775
776
777
778
779
780
781
782
783
784
785
786
787
788
789
790
791
792
793
794
795
796
797
798
799
800
801
802
803
804
805
806
807
808
809
810
811
812
813
814
815
816
817
818
819
820
821
822
823
824
825
826
827
828
829
830
831
832
833
834
835
836
837
838
839
840
841
842
843
844
845
846
847
848
849
850
851
852
853
854
855
856
857
858
859
860
861
862
863
864
865
866
867
868
869
870
871
872
873
874
875
876
877
878
879
880
881
882
883
884
885
886
887
888
889
890
891
892
893
894
895
896
897
898
899
900
901
902
903
904
905
906
907
908
909
910
911
912
913
914
915
916
917
918
919
920
921
922
923
924
925
926
927
928
929
930
931
932
933
934
935
936
937
938
939
940
941
942
943
944
945
946
947
948
949
950
951
952
953
954
955
956
957
958
959
960
961
962
963
964
965
966
967
968
969
970
971
972
973
974
975
976
977
978
979
980
981
982
983
984
985
986
987
988
989
990
991
992
993
994
995
996
997
998
999
1000
"""OFX transaction source.
This is based on the original ledgerhub OFX parsing by Martin Blais, but has
been heavily modified.
It imports transactions, balance information, and price information.
This is known to work for various types of checking, investment, retirement, and
credit card accounts. However, to support any particular institution it may be
necessary to make some minor changes.
Data format
===========
To use, first download transaction data into a directory on the filesystem. You
might have a directory structure like:
financial/
data/
institution/
account_id/
*.ofx
where financial/ also contains your beancount journal files, and `institution`
and `account_id` are replaced by the actual institution and account id.
If your institution supports the OFX protocol, you can use the finance_dl.ofx
module to automatically download data in the requisite format. You can also
manually download OFX-format statements and place them in a directory.
Duplicate transactions, as determined by the FITID field, are automatically
excluded. Therefore, if downloading manually, you should just ensure that there
are no gaps in the date ranges selected; overlap will not cause any problems.
Specifying the source to beancount_import
=========================================
Within your Python script for invoking beancount_import, you might use an
expression like the following to specify the ofx source:
dict(module='beancount_import.source.ofx',
ofx_filenames=(
glob.glob(os.path.join(journal_dir, 'data/institution1/*/*.ofx'))
+ glob.glob(os.path.join(journal_dir, 'data/institution2/*/*.ofx'))
),
cache_filename=os.path.join(journal_dir, 'data/ofx_cache.pickle'),
)
where `journal_dir` refers to the financial/ directory.
The `cache_filename` key is optional, but is recommended to speed up parsing if
you have a large amount of OFX data. When using the `cache_filename` option,
adding and deleting OFX files is fine, but if you modify existing OFX files, you
must delete the cahe file manually.
Specifying individual accounts
==============================
The source specification does not itself indicate anything about the mapping to
accounts. That information is instead specified in the Beancount journal
itself.
OFX files contain three tags that are used to identify the account: `ORG`,
`BROKERID`, and `ACCTID`. You can associate a given OFX account with a
Beancount account/account prefix as follows:
2014-01-01 open Assets:Investment:Vanguard
ofx_org: "Vanguard"
ofx_broker_id: "vanguard.com"
account_id: "XXXXXXXXXXXX"
ofx_account_type: "securities_only"
The `ofx_org`, `ofx_broker_id`, `account_id`, and `ofx_account_type` metadata
fields must all be specified. If the OFX file does not include an `ORG` or
`BROKERID` tag, specify the empty string `""` as the `ofx_org` or
`ofx_broker_id`, respectively.
The `ofx_account_type` must be one of the following:
- `"cash_only"`: Indicates that the account holds cash only, no securities.
Transactions are posted directly to the specified Beancount account, rather
than to a `:Cash` suffix. Any transactions involving securities found in
the OFX file will be assumed to due to a bank deposit sweep program, or
similar, and are ignored.
- `"securities_and_cash"`: Indicates that the account holds both securities and
cash. All BUY* transactions are funded from a `:Cash`-suffixed account, and
the proceeds of all SELL* and INCOME transactions are also posted to the
`:Cash`-suffixed account.
- `"securities_only"`: Indicates that the account holds securities, and no cash
(e.g. many types of Vanguard accounts). All BUY* transactions are funded
from an unknown, external account (which may be predicted automatically), and
the proceeds of all SELL* and INCOME transactions are also posted to an
unknown, external account.
- `"securities_and_pending_cash"`: Like the `"securities_only"` case, indicates
that the account holds securities, and no cash. However, a fictitious
`:Cash`-suffixed account is still used to fund all BUY* transactions and for
the proceeds from all SELL* and INCOME transactions (like the
`"securities_and_cash"` case). Additionally, for each such transaction, a
separate transfer transaction between the `:Cash`-suffixed account and an
unknown external account is also created, in order to restore the balance of
the `:Cash` account to zero. In most cases the `"securities_only"` account
type will be preferable to this account type, but this account type may make
it easier to manually enter transactions, may be more compatible with
existing transactions not imported using this module, and may also help avoid
balance discrepancies in the case that two institutions post a transaction on
different dates.
Depending on the type of the account and the type of transactions to be
imported, some additional metadata fields must also be specified for the
account.
For all types of accounts, the following optional metadata fields may be
specified:
- `quantity_round_digits`: Rounds the number of units to this many decimal
places. This can be useful for avoiding balance errors in some cases. For
example, Vanguard 401k accounts may benefit from a `quantity_round_digits`
value of 3.
- `ignore_transaction_regexp`: Specifies a regular expression (in the syntax
accepted by the Python `re` module) to be matched against the start of the
narration of each generated transaction. Transactions that match are
skipped. This is useful for dealing with spurious transactions of a
particular form.
For accounts with non-cash holdings, the following metadata fields may be
specified:
- `xxx_income_account`, where `xxx` is one of `div`, `interest`, `cgshort`,
`cglong`, or `misc`: Specifies the income account prefix used for
transactions with an `INCOMETYPE` of `DIV`, `INTEREST`, `CGSHORT`, `CGLONG`,
or `MISC`, respectively. If you specify a `div_income_account` of
`"Income:MyBank:Dividends"`, and then receive a dividend from a security
`XYZ`, the income will be posted to `Income:MyBank:Dividends:XYZ`.
- `capital_gains_account`: Specifies the account prefix used for capital gains
and losses due to the sale of any security. If you specify a
`capital_gains_account` of `"Income:MyBank:Capital-Gains"` and then sell
shares of a security `XYZ`, the gains or losses will be posted to
`Income:MyBank:Capital-Gains:XYZ`.
- `fees_account`: Brokerage fees associated with individual transactions are
posted to this account. A typical value would be
`"Expenses:Investment-Fees:MyBank:Brokerage-Fees"`.
- `commission_account`: Commission fees associated with individual transactions
are posted to this account. A typical value would be
`"Expenses:Investment-Fees:MyBank:Commission-Fees"`.
Each of the above fields is required if, and only if, an imported transaction
requires it. For example, you need not specify a `div_income_account` if you
have no dividend transactions.
For 401(k) accounts, several additional metadata fields may also be specified:
- `xxx_account`, where `xxx` is one of `pretax`, `aftertax`, `match`,
`profitsharing`, `rollover`: Specifies the account prefix to use for the
PRETAX, AFTERTAX, MATCH, PROFITSHARING, ROLLOVER sub-accounts, respectively.
If not specified, defaults to a suffix of `:PreTax`, `:AfterTax`, `:Match`,
`:ProfitSharing`, `:Rollover`, respectively on the main account. Using a
typical `pretax_account` value of
`"Assets:Retirement:MyBank:Company401k:PreTax"`, shares of security XYZ would
be held in `Assets:Retirement:MyBank:Company401k:PreTax:XYZ` and
`Assets:Retirement:MyBank:Company401k:PreTax:Cash` would be used as the cash
account (if the `account_type` is `securities_and_cash` or
`securities_and_pending_cash`).
- `xxx_contribution_account`, where `xxx` is one of `pretax`, `aftertax`,
`match`, `profitsharing`, `rollover`: Specifies the account from which BUY*
transactions for the specified account type are funded. If not specified,
generated BUY* transactions will leave the funding source unknown
(i.e. `Expenses:FIXME`). For example, you may wish to specify a
`match_contribution_account` of `"Income:MyCompany:Match"`.
IRA account example
-------------------
2014-01-01 open Assets:Investment:Vanguard
ofx_org: "Vanguard"
ofx_broker_id: "vanguard.com"
account_id: "XXXXXXXXXXXX"
ofx_account_type: "securities_only"
div_income_account: "Income:Vanguard:Dividends"
capital_gains_account: "Income:Vanguard:Capital-Gains"
Investment account with cash holdings example
---------------------------------------------
2014-01-01 open Assets:Investment:MyBank
ofx_org: "Vanguard"
ofx_broker_id: "vanguard.com"
account_id: "XXXXXXXXXXXX"
ofx_account_type: "securities_and_cash"
div_income_account: "Income:MyBank:Dividends"
interest_income_account: "Income:MyBank:Interest"
fees_account: "Expenses:Financial:Investment-Fees:MyBank"
capital_gains_account: "Income:MyBank:Capital-Gains"
401(k) account example
----------------------
2014-01-01 open Assets:Retirement:Vanguard:Company401k
ofx_org: "Vanguard"
ofx_broker_id: "vanguard.com"
account_id: "XXXXXXX"
ofx_account_type: "securities_only"
div_income_account: "Income:Vanguard:Dividends"
fees_account: "Expenses:Financial:Investment-Fees:Vanguard"
match_contribution_account: "Income:Company:Match"
aftertax_account: "Assets:Retirement:Vanguard:Company401k:AfterTax"
pretax_account: "Assets:Retirement:Vanguard:Company401k:PreTax"
match_account: "Assets:Retirement:Vanguard:Company401k:Match"
capital_gains_account: "Income:Vanguard:Capital-Gains"
quantity_round_digits: 3
Checking, savings, or credit card account
-----------------------------------------
2014-01-01 open Assets:Checking
ofx_org: "MyBank"
ofx_broker_id: "MyBank"
ofx_account_type: "cash_only"
account_id: "XXXXXXXX"
Dealing with missing ticker names
---------------------------------
By default, the TICKER tag specified for each security in the OFX file is used
as the currency name and sub-account name within the beancount file. However,
sometimes the TICKER field is missing or not a valid Beancount currency
identifier. In that case, you must manually specify the mapping from CUSIP to a
symbol name by creating a commodity directive in your beancount journal as
follows:
1900-01-01 commodity VANGUARD-92204E878
cusip: "92204E878"
Dealing with cash-equivalent securities
---------------------------------------
In some accounts, a part or all of the cash holdings may be held under a
cash-equivalent security (with a unit price of 1) corresponding to a Bank
Deposit Sweep Program, or something similar. By default, this would result in a
bunch of spurious BUY/SELL transactions that convert between cash and this
cash-equivalent security. To avoid this problem, for any cash-equivalent
security, you can create a commodity directive with a `equivalent_currency`
metadata field specifying the currency to which it is equivalent. For example:
1900-01-01 commodity QTSAQ
equivalent_currency: "USD"
Imported transaction format
===========================
Cash-only transactions
----------------------
2012-07-27 * "INVBANKTRAN - TRANSFERRED FROM VS X10-08144-1"
Assets:Savings:Fidelity 115.83 USD
date: 2012-07-27
ofx_fitid: "X0000000000000000000002"
ofx_memo: "TRANSFERRED FROM VS X10-08144-1"
ofx_type: "INVBANKTRAN"
Expenses:FIXME -115.83 USD
The `ofx_fitid` metadata field associates the posting with the corresponding
entry in the OFX file. The `ofx_name`/`ofx_memo` and `ofx_type` metadata fields
provide features for predicting the unknown `Expenses:FIXME` account.
Buy and sell transactions for `"securities_and_cash"` accounts
--------------------------------------------------------------
For BUY* and SELL* OFX transactions when the `account_type` is
`"securities_and_cash"`, Beancount transactions of the following form are
generated:
2018-08-01 * "BUYSTOCK"
Assets:Investment:MyBank:SDVMV 60.01318 SDVMV {67.141053527 USD}
date: 2018-08-01
ofx_fitid: "aedf1852aa39a54-623ee.4d104.5"
ofx_type: "BUYSTOCK"
Assets:Investment:MyBank:Cash -4115.86 USD
ofx_fitid: "aedf1852aa39a54-623ee.4d104.5"
Expenses:Investment:MyBank:Fees 63.4869 USD
Expenses:Investment:MyBank:Commission 23.0233 USD
2018-08-01 * "SELLSTOCK"
Assets:Investment:MyBank:EEBHF -41.50416 EEBHF {} @ 83.661853593 USD
date: 2018-08-01
ofx_fitid: "4a5141ead2c672e8a559.65-80e.b"
ofx_type: "SELLSTOCK"
Income:MyBank:Capital-Gains:EEBHF
Assets:Investment:MyBank:Cash 3382.60 USD
ofx_fitid: "4a5141ead2c672e8a559.65-80e.b"
Expenses:Investment:MyBank:Fees 31.9944 USD
Expenses:Investment:MyBank:Commission 57.7239 USD
Note that the cost of the shares is not specified in the generated SELL
transaction, because the OFX transaction does not provide lot information. For
the same reason, the amount for the `Capital-Gains` posting is also left
unspecified. Instead, the Beancount booking mechanism will determine the cost
and the capital gains automatically.
Buy and sell transactions for `"securities_and_pending_cash"` accounts
----------------------------------------------------------------------
When the `account_type` is `"securities_and_pending_cash"`, exactly the same
Beancount transactions are generated as for an `account_type` of
`"securities_and_cash"`, but in addition a separate transfer transaction is also
created for posting to the `:Cash` account:
2011-07-15 * "SELLMF - THIS IS A MEMO"
Assets:Investment:Vanguard:VFINX -42.123 VFINX {} @ 100 USD
ofx_fitid: "01234567890.0123.07152011.0"
date: 2011-07-15
ofx_memo: "THIS IS A MEMO"
ofx_type: "SELLMF"
Income:Vanguard:Capital-Gains:VFINX
Assets:Investment:Vanguard:Cash 4212.30 USD
ofx_fitid: "01234567890.0123.07152011.0"
2011-07-15 * "Transfer due to: SELLMF - THIS IS A MEMO"
Assets:Investment:Vanguard:Cash -4212.30 USD
ofx_fitid: ">01234567890.0123.07152011.0"
date: 2011-07-15
ofx_memo: "THIS IS A MEMO"
ofx_type_transfer: "SELLMF"
Expenses:FIXME 4212.30 USD
The `ofx_type_transfer` and `ofx_name`/`ofx_memo` metadata fields provide
features for predicting the unknown `Expenses:FIXME` account.
REINVEST transactions
---------------------
2018-06-21 * "REINVEST - DIV"
Assets:Retirement:Vanguard:Roth-IRA:TYCDT 31.704 TYCDT {2.94 USD}
date: 2018-06-21
ofx_fitid: "7c9254b784a.a9bd.edcfa27b.b"
ofx_type: "REINVEST"
Income:Vanguard:Dividends:TYCDT -93.21 USD
INCOME transactions
-------------------
For `INCOME` OFX transactions, Beancount transactions of the following form are
generated when the `account_type` is `"securities_and_cash"`:
2018-07-02 * "INCOME - DIV"
Assets:Investment:MyBank:Cash 62.65 USD
date: 2018-07-02
ofx_fitid: "fd2561ce31fca077e.87f.0"
ofx_type: "INCOME"
Income:MyBank:Dividends:URMFO -62.65 USD
Outgoing TRANSFER transactions
------------------------------
For outgoing `TRANSFER` transactions, Beancount transactions of the following
form are generated:
2013-09-05 * "TRANSFER - MATCH - Investment Expense"
Assets:Retirement:Vanguard:Company401k:Match:VANGUARD-92202V351 -0.04241 VANGUARD-92202V351 {} @ 39.37 USD
date: 2013-09-05
ofx_fitid: "1234567890123456795AAA"
ofx_memo: "Investment Expense"
ofx_type: "TRANSFER"
Income:Vanguard:Capital-Gains:VANGUARD-92202V351
Expenses:FIXME 1.67 USD
The `ofx_memo`/`ofx_name` and `ofx_type` metadata fields provide features for
predicting the unknown `Expenses:FIXME` account.
Incoming TRANSFER transactions
------------------------------
For incoming `TRANSFER` transactions, Beancount transactions of the following
form are generated:
2014-06-30 * "TRANSFER"
Assets:Investment:ExampleOrg:BAR 6.800992 BAR {1 USD, "FIXME"}
date: 2014-06-30
ofx_fitid: "XXXXXXXXX2"
ofx_type: "TRANSFER"
Expenses:FIXME -198.69 USD
Because the OFX transaction does not provide lot information, a dummy price and
label of `{1 USD, "FIXME"}` is specified in the transaction. After adding the
generated transaction to your journal, you will need to manually replace the
posting with one or more postings specifying the correct lots. In the modified
transaction, the `date` and `ofx_fitid` fields should also be duplicated on all
of the manually created postings, as shown below:
2014-06-30 * "TRANSFER"
Assets:Investment:ExampleOrg:BAR 6.000000 BAR {23.57 USD, 2012-03-07}
date: 2014-06-30
ofx_fitid: "XXXXXXXXX2"
ofx_type: "TRANSFER"
Assets:Investment:ExampleOrg:BAR 0.800992 BAR {29.41 USD, 2012-04-08}
date: 2014-06-30
ofx_fitid: "XXXXXXXXX2"
ofx_type: "TRANSFER"
Assets:Investment:AcmeOrg:BAR -6.000000 BAR {23.57 USD, 2012-03-07}
date: 2014-07-01
ofx_fitid: "YYYYYYYYY7"
ofx_type: "TRANSFER"
Assets:Investment:AcmeOrg:BAR -0.800992 BAR {29.41 USD, 2012-04-08}
date: 2014-07-01
ofx_fitid: "YYYYYYYYY7"
ofx_type: "TRANSFER"
"""
import pickle
import re
from typing import Set, Tuple, Any, Dict, Union, List, Optional, NamedTuple
import os
import collections
import datetime
import tempfile
import bs4
from atomicwrites import atomic_write
from beancount.core.data import Transaction, Posting, Balance, Commodity, Price, Open, EMPTY_SET
from beancount.core.flags import FLAG_OKAY
from beancount.core.number import D
from beancount.core.number import ZERO
from beancount.core.number import Decimal
from beancount.core.amount import Amount
from beancount.core.position import CostSpec
from beancount.core.number import MISSING
from ..posting_date import get_posting_date, POSTING_DATE_KEY
from . import ImportResult, Source, SourceResults, InvalidSourceReference
from ..journal_editor import JournalEditor
from ..matching import FIXME_ACCOUNT, CHECK_KEY
from ..training import ExampleKeyValuePairs
# find_child and parse_ofx_time were derived from implementation in beancount/ingest/importers/ofx.py{,test}
# Copyright (C) 2016 Martin Blais
# GNU GPLv2
def find_child(node, name, conversion=None):
"""Find a child under the given node and return its value.
Args:
node: A <STMTTRN> bs4.element.Tag.
name: A string, the name of the child node.
conversion: A callable object used to convert the value to a new data type.
Returns:
A string, or None.
"""
child = node.find(name)
if not child:
return None
if not child.contents:
value = ''
else:
value = child.contents[0].strip()
if conversion:
value = conversion(value)
return value
def parse_ofx_time(date_str):
"""Parse an OFX time string and return a datetime object.
Args:
date_str: A string, the date to be parsed.
Returns:
A datetime.datetime instance.
"""
if len(date_str) < 14:
return datetime.datetime.strptime(date_str[:8], '%Y%m%d')
return datetime.datetime.strptime(date_str[:14], '%Y%m%d%H%M%S')
RawBalanceEntry = NamedTuple('RawBalanceEntry', [
('filename', str),
('date', datetime.date),
('uniqueid', Optional[str]),
('units', Optional[Decimal]),
('unitprice', Optional[Decimal]),
('inv401ksource', Optional[str]),
('tolerance', Optional[Decimal]),
])
RawCashBalanceEntry = NamedTuple('RawCashBalanceEntry', [
('filename', str),
('date', datetime.date),
('number', Decimal),
])
RawTransactionEntry = NamedTuple('RawTransactionEntry', [
('filename', str),
('date', datetime.date),
('fitid', str),
('trantype', str),
('total', Decimal),
('incometype', Optional[str]),
('inv401ksource', Optional[str]),
('memo', Optional[str]),
('name', Optional[str]),
('trntype', Optional[str]),
('uniqueid', Optional[str]),
('units', Optional[Decimal]),
('unitprice', Optional[Decimal]),
('tferaction', Optional[str]),
('fees', Optional[Decimal]),
('commission', Optional[Decimal]),
('opttradetype', Optional[str]),
('optsharesperctrct', Optional[Decimal]),
('checknum', Optional[str]),
])
SecurityInfo = NamedTuple('SecurityInfo', [
('uniqueid', str),
('name', Optional[str]),
('ticker', Optional[str]),
])
def get_info(
raw: Union[RawBalanceEntry, RawTransactionEntry]) -> Dict[str, Any]:
return dict(
type='application/x-ofx',
filename=raw.filename,
)
OFX_FITID_KEY = 'ofx_fitid'
FITID_TRANSFER_PREFIX = '>'
OFX_TYPE_KEY = 'ofx_type'
OFX_TYPE_TRANSFER_KEY = 'ofx_type_transfer'
OFX_MEMO_KEY = 'ofx_memo'
OFX_NAME_KEY = 'ofx_name'
CUSIP_KEY = 'cusip'
EQUIVALENT_CURRENCY = 'equivalent_currency'
# Used for generating unique features.
DESC_KEY = 'desc'
OFX_FEATURE_KEYS = [OFX_TYPE_KEY, OFX_TYPE_TRANSFER_KEY, OFX_MEMO_KEY, OFX_NAME_KEY]
cache_version_number = 4
valid_account_types = frozenset([
'cash_only',
'securities_only',
'securities_and_cash',
'securities_and_pending_cash',
])
inv401k_account_keys = {
'PRETAX': ('pretax_account', 'PreTax'),
'AFTERTAX': ('aftertax_account', 'AfterTax'),
'MATCH': ('match_account', 'Match'),
'PROFITSHARING': ('profitsharing_account', 'ProfitSharing'),
'ROLLOVER': ('rollover_account', 'Rollover'),
}
# "Auxiliary" accounts are those given to
# SourceResults.add_skip_training_account because they are neither the source
# nor the target account, and should be ignored while building training
# examples.
AUX_CAPITAL_GAINS_KEY = 'capital_gains'
AUX_INTEREST_KEY = 'interest_income'
AUX_FEE_KEYS = ['fees', 'commission']
AUX_ACCOUNT_KEYS = [AUX_CAPITAL_GAINS_KEY, AUX_INTEREST_KEY] + AUX_FEE_KEYS
def get_aux_account_by_key(account: Open, key: str, results: SourceResults) -> str:
"""Like get_account_by_key. Ensures the account isn't used for training."""
subaccount = account.meta.get(key)
if subaccount is None:
raise KeyError('%s: must specify %s' % (account.account, key))
if subaccount not in results.skip_training_accounts:
raise ValueError('%s is an auxiliary account but was not added to SourceResults skip_traing_accounts; this should be done in PrepareState')
return subaccount
def get_account_by_key(account: Open, key: str, default_suffix: Optional[str] = None) -> str:
result = account.meta.get(key)
if result is not None: return result
if default_suffix is None:
raise KeyError('%s: must specify %s' % (account.account, key))
return account.account + ':' + default_suffix
def normalize_fraction(d: Decimal) -> Decimal:
normalized = d.normalize()
sign, digits, exponent = normalized.as_tuple()
if exponent > 0:
return Decimal((sign, tuple(digits) + (0, ) * exponent, 0))
else:
return normalized
def is_valid_commodity_name(ticker: Optional[str]) -> bool:
return (ticker is not None and
re.match(r'^[A-Z][A-Z0-9-]*', ticker) is not None)
def get_securities(soup: bs4.BeautifulSoup) -> List[SecurityInfo]:
"""Extract the list of securities from the OFX file."""
seclistmsgsrsv = soup.find('seclistmsgsrsv1')
if not seclistmsgsrsv:
return []
securities = []
for secinfo in seclistmsgsrsv.find_all('secinfo'):
uniqueid = find_child(secinfo, 'uniqueid')
secname = find_child(secinfo, 'secname')
ticker = find_child(secinfo, 'ticker')
securities.append(
SecurityInfo(uniqueid=uniqueid, name=secname, ticker=ticker))
return securities
STOCK_BUY_SELL_TYPES = set(
['BUYMF', 'SELLMF', 'SELLSTOCK', 'BUYSTOCK', 'REINVEST', 'BUYDEBT',
'SELLDEBT', 'SELLOTHER'])
SELL_TYPES = set(['SELLMF', 'SELLSTOCK', 'SELLDEBT', 'SELLOTHER'])
OPT_TYPES = set(['BUYOPT', 'SELLOPT'])
RELATED_ACCOUNT_KEYS = ['aftertax_account', 'pretax_account', 'match_account']
# Tolerance allowed in transaction balancing. In units of base currency used, e.g. USD.
TOLERANCE = 0.05
UNITPRICE_ERROR_LOWER_BOUND = 0.2
UNITPRICE_ERROR_UPPER_BOUND = 5.0
class ParsedOfxStatement(object):
def __init__(self, seen_fitids, filename, securities_map, org, stmtrs):
filename = os.path.abspath(filename)
self.filename = filename
self.securities_map = securities_map
self.org = org
account_id = self.account_id = find_child(stmtrs, 'acctid')
self.broker_id = find_child(stmtrs, 'brokerid') or ''
self.currency = find_child(stmtrs, 'curdef')
raw_transactions = self.raw_transactions = []
raw_balance_entries = self.raw_balance_entries = []
raw_cash_balance_entries = self.raw_cash_balance_entries = []
# Set of (date, uniqueid) pairs where there were transactions.
# We don't emit balance entries for dates where there was activity, because there is no way
# to represent the fact that the balance includes those transactions, and we don't want to
# manually modify the balance by subtracting the transactions.
security_activity_dates = self.security_activity_dates = set()
cash_activity_dates = self.cash_activity_dates = set()
self.ofx_id = account_ofx_id = (org, self.broker_id, account_id)
for invtranlist in stmtrs.find_all(re.compile('invtranlist|banktranlist')):
for tran in invtranlist.find_all(
re.compile(
'^(buymf|sellmf|reinvest|buystock|sellstock|buyopt|sellopt|buydebt|selldebt|sellother|transfer|income|invbanktran|stmttrn)$'
)):
fitid = find_child(tran, 'fitid')
date = parse_ofx_time(
find_child(tran, 'dttrade') or
find_child(tran, 'dtposted')).date()
# We include the date along with the FITID because some financial institutions fail
# to produce truly unique FITID values. For example, National Financial Services
# (Fidelity) sometimes produces duplicates when the amount is the same.
full_fitid = (account_ofx_id, date, fitid)
uniqueid = find_child(tran, 'uniqueid')
if uniqueid is not None:
security_activity_dates.add((date, uniqueid))
cash_activity_dates.add(date)
if full_fitid in seen_fitids:
continue
seen_fitids.add(full_fitid)
trantype = tran.name.upper()
if trantype == 'INVBANKTRAN' or trantype == 'STMTTRN':
total = find_child(tran, 'trnamt', D)
else:
total = find_child(tran, 'total', D)
units = find_child(tran, 'units', D)
unitprice = find_child(tran, 'unitprice', D)
if units and total and unitprice:
error_ratio = abs(units * unitprice / total)
if error_ratio > UNITPRICE_ERROR_UPPER_BOUND or error_ratio < UNITPRICE_ERROR_LOWER_BOUND:
id_type = find_child(tran, 'uniqueidtype')
unique_id = find_child(tran, 'uniqueid')
units_x_unitprice = units*unitprice
unitprice = abs(total / units)
print(
f"Transaction [{id_type} {unique_id}]: Mismatch between UNITS * UNITPRICE = {units_x_unitprice:.2f} and TOTAL = {total:.2f}. Inferring price: {unitprice:.3f}")
opttrantype = None
shares_per_contract = find_child(tran, 'shperctrct', D)
if trantype == 'BUYOPT':
opttrantype = find_child(tran, 'optbuytype')
elif trantype == 'SELLOPT':
opttrantype = find_child(tran, 'optselltype')
raw = RawTransactionEntry(
trantype=trantype,
fitid=fitid,
date=date,
total=total,
incometype=find_child(tran, 'incometype'),
inv401ksource=find_child(tran, 'inv401ksource'),
memo=find_child(tran, 'memo'),
name=find_child(tran, 'name'),
trntype=find_child(tran, 'trntype'),
uniqueid=uniqueid,
units=units,
unitprice=unitprice,
tferaction=find_child(tran, 'tferaction'),
fees=find_child(tran, 'fees', D),
commission=find_child(tran, 'commission', D),
checknum=find_child(tran, 'checknum'),
opttradetype=opttrantype,
optsharesperctrct=shares_per_contract,
filename=filename)
raw_transactions.append(raw)
for inv_bal in stmtrs.find_all('invbal'):
availcash = find_child(inv_bal, 'availcash', D)
self.availcash = availcash
for bal in inv_bal.find_all('bal'):
if find_child(bal, 'value', D) == availcash:
date = find_child(bal, 'dtasof', parse_ofx_time)
if date is not None:
date = date.date()
raw_cash_balance_entries.append(
RawCashBalanceEntry(
date=date, number=availcash, filename=filename))
break
for bal in stmtrs.find_all('ledgerbal'):
bal_amount_str = find_child(bal, 'balamt')
if not bal_amount_str.strip():
continue
bal_amount = D(bal_amount_str)
date = find_child(bal, 'dtasof', parse_ofx_time).date()
raw_cash_balance_entries.append(
RawCashBalanceEntry(
date=date, number=bal_amount, filename=filename))
for invposlist in stmtrs.find_all('invposlist'):
for invpos in invposlist.find_all('invpos'):
time_str = find_child(invpos, 'dtpriceasof')
units = find_child(invpos, 'units', D)
unitprice = find_child(invpos, 'unitprice', D)
mktval = find_child(invpos, 'mktval', D)
tolerance = None
if mktval and mktval > 0:
error_ratio = units*unitprice/mktval
# these thresholds are arbitrary and could be tightened
if error_ratio > UNITPRICE_ERROR_UPPER_BOUND or error_ratio < UNITPRICE_ERROR_LOWER_BOUND:
id_type = find_child(invpos, 'uniqueidtype')
unique_id = find_child(invpos, 'uniqueid')
units_x_unitprice = units*unitprice
unitprice = mktval / units if units > 0 else None
print(
f"Balance [{id_type} {unique_id}]: Mismatch between UNITS * UNITPRICE = {units_x_unitprice:.2f} and MKTVAL = {mktval:.2f}. Inferring price: {unitprice:.3f}")
if self.org == "Vanguard":
# For Vanguard balance, tolerance needs to be set. See
# https://beancount.github.io/docs/precision_tolerances.html#explicit-tolerances-on-balance-assertions
tolerance = round(abs(units) * Decimal(0.001))
t = parse_ofx_time(time_str)
date = t.date()
raw_balance_entries.append(
RawBalanceEntry(
date=date,
uniqueid=find_child(invpos, 'uniqueid'),
units=units,
unitprice=unitprice,
inv401ksource=find_child(invpos, 'inv401ksource'),
filename=filename,
tolerance=tolerance))
def get_entries(self, prepare_state):
account = prepare_state.ofx_id_to_account.get(self.ofx_id)
results = prepare_state.results
if account is None:
results.add_warning(
'No account matching OFX ORG, BROKERID, ACCTID triplet: %r. Known accounts: %r' %
(self.ofx_id, prepare_state.ofx_id_to_account.keys()))
return
account_type = account.meta.get('ofx_account_type')
if account_type not in valid_account_types:
results.add_error(
'account %s has invalid ofx_account_type %r, must be one of %r'
% (account.account, account_type, valid_account_types),
account.meta)
quantity_round_digits = account.meta.get('quantity_round_digits', None)
has_securities = (account_type != 'cash_only')
has_cash_account = (account_type != 'securities_only')
has_real_cash_account = (account_type in ('securities_and_cash',
'cash_only'))
has_transfer_cash_account = (account_type == 'securities_and_pending_cash')
ignore_re = account.meta.get('ignore_transaction_regexp')
if quantity_round_digits is not None:
quantity_round_digits = int(quantity_round_digits)
ofx_id = self.ofx_id
matched_transactions = prepare_state.matched_transactions
matched_cash_transactions = prepare_state.matched_cash_transactions
matched_cash_transfer_transactions = prepare_state.matched_cash_transfer_transactions
securities_map = self.securities_map
commodities_by_cusip = prepare_state.commodities_by_cusip
security_activity_dates = self.security_activity_dates
cash_activity_dates = self.cash_activity_dates
cash_securities_map = prepare_state.cash_securities_map
def get_security(unique_id: str) -> Optional[str]:
commodity = commodities_by_cusip.get(unique_id)
if commodity is not None:
return commodity
if unique_id not in securities_map:
results.add_error(
'Missing id for security %r. You must specify it manually using a commodity directive with a cusip metadata field.'
% (unique_id, ))
return None
sec = securities_map[unique_id]
ticker = sec.ticker
# Treasury bill and bond start with 912
if ticker.startswith("912"):
# Prepend "T" to make it a valid ticker
ticker = "T" + ticker
if ticker is None:
results.add_error(
'Missing ticker for security %r. You must specify it manually using a commodity directive with a cusip metadata field.'
% (unique_id, ))
return None
if not is_valid_commodity_name(ticker):
results.add_error(
'Ticker %r for security %r is not a valid commodity name. You must specify it manually using a commodity directive with a cusip metadata field.'
% (ticker, unique_id))
return ticker
def get_subaccount(inv401ksource: Optional[str],
security: Optional[str] = None) -> str:
suffix = ''
if security is not None:
suffix = ':' + security
prefix = account.account
if inv401ksource is not None:
if (inv401ksource == 'OTHERNONVEST') or (inv401ksource == 'OTHERVEST'):
# For balance entries, OTHERNONVEST indicates an aggregate
# balance (at least with Vanguard).
return prefix
key, default_suffix = inv401k_account_keys[inv401ksource]
prefix = get_account_by_key(account, key, default_suffix)
return prefix + suffix
def get_subaccount_cash(inv401ksource: Optional[str] = None) -> str:
return get_subaccount(inv401ksource, 'Cash' if has_securities else None)
for raw in self.raw_transactions:
match_key = (ofx_id, raw.date, raw.fitid)
if has_real_cash_account:
if has_securities and match_key in matched_transactions:
continue
if match_key in matched_cash_transactions: continue
security_transaction_exists = False
cash_transaction_exists = False
elif has_transfer_cash_account:
security_transaction_exists = match_key in matched_transactions
cash_transaction_exists = match_key in matched_cash_transactions
cash_transfer_transaction_exists = match_key in matched_cash_transfer_transactions
if (security_transaction_exists and cash_transaction_exists ==
cash_transfer_transaction_exists):
continue
else:
if match_key in matched_transactions: continue
security_transaction_exists = False
cash_transaction_exists = False
total = raw.total
unitprice = ZERO
memo = raw.memo and ' '.join(raw.memo.split())
name = raw.name and ' '.join(raw.name.split())
if memo == 'Price as of date based on closing price':
memo = None
if name and memo and (memo.startswith(name) or memo.endswith(name)):
# Remove redundant name field.
name = None
if name and memo and (name.startswith(memo) or name.startswith(memo)):
# Remove redundant memo field
memo = None
narration = ' - '.join(
filter(None,
(raw.trantype, raw.incometype, raw.inv401ksource, name,
memo)))
if ignore_re and re.match(ignore_re, narration):
continue
entry = Transaction(
meta=None,
date=raw.date,
flag=FLAG_OKAY,
payee=None,
narration=narration,
tags=EMPTY_SET,
links=EMPTY_SET,
postings=[])
base_meta = [(OFX_FITID_KEY, raw.fitid)]
posting_meta = collections.OrderedDict(base_meta)
posting_meta[POSTING_DATE_KEY] = raw.date
posting_meta[OFX_TYPE_KEY] = raw.trantype
if memo:
posting_meta[OFX_MEMO_KEY] = memo
if name:
posting_meta[OFX_NAME_KEY] = name
if raw.checknum:
stripped_checknum = raw.checknum.lstrip('0')
if stripped_checknum:
posting_meta[CHECK_KEY] = D(stripped_checknum)
fee_total = ZERO
for fee_key in AUX_FEE_KEYS:
amount = getattr(raw, fee_key, None)
if amount is not None and amount != ZERO:
fee_total += amount
entry.postings.append(
Posting(
account=get_aux_account_by_key(
account,
fee_key + '_account',
results),
units=Amount(number=amount, currency=self.currency),
cost=None,
price=None,
flag=None,
meta=None))
cash_transfer_transaction_amount = None
if raw.trantype == 'INCOME' or raw.trantype == 'INVBANKTRAN' or raw.trantype == 'STMTTRN':
# Cash-only transaction
if total == ZERO:
# Cash-only transaction with total amount of zero should be ignored.
# These are sometimes produced by Fidelity.
continue
cur_amount = Amount(number=round(total, 2), currency=self.currency)
if not has_cash_account:
#if raw.trantype != 'INCOME':
raise ValueError('Cash transaction not expected')
# entry.postings.append(
# Posting(
# meta=None,
# account=FIXME_ACCOUNT,
# units=cur_amount,
# cost=None,
# price=None,
# flag=None,
# ))
else:
if has_transfer_cash_account:
cash_transfer_transaction_amount = cur_amount
cash_account_name = get_subaccount_cash(raw.inv401ksource)
results.add_account(cash_account_name)
entry.postings.append(
Posting(
meta=posting_meta,
account=cash_account_name,
units=cur_amount,
cost=None,
price=None,
flag=None,
))
if raw.trantype == 'INCOME':
assert total >= ZERO
security = get_security(raw.uniqueid)
if security is None:
continue