This manual is a work in progress and is not complete. For basic commands see ../README.md.
The following sections use Emacs Lisp examples to configure Org Roam BibTeX. User options can also be set via the Customize interface: run M-x customize, or from the menu click Options -> Customize Emacs -> Top Level Customization Group, and search for org-roam-bibtex.
Org Roam BibTeX makes it possible to automatically pre-expand Org-capture %^{...} and Org Roam-style ${...} template placeholders with the values of one or more fields of the BibTeX entry for which the note is being created.
Here’s an example of how to add a basic template for a bibliography note to org-roam-capture-templates:
(setq org-roam-capture-templates
'(;; ... other templates
;; bibliography note template
("r" "bibliography reference" plain "%?"
:target
(file+head "references/${citekey}.org" "#+title: ${title}\n")
:unnarrowed t)))
If there is more than one template in org-roam-capture-templates, you will be prompted for the key of the template you want to use (r in the example above). Otherwise, the only template will be used without prompting.
This option defines the format style of a citation key in the ROAM_REFS property. Org-ref v2, Org-ref v3 and Org-cite styles are supported:
- org-ref-v2 (default): use the old Org-ref cite:link format
- org-ref-v3: use the new Org-ref cite:&link format
- org-cite: use the Org-cite @element format
This can also be a custom format string.
Note that for typical Org-roam use these styles are mostly cosmetic.
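As a minimal sketch, switching new notes to the Org-cite style could look like the form below. It assumes the option is named orb-roam-ref-format, which this section does not state, so check the name with C-h v before relying on it.

```elisp
;; Assumption: the option described above is `orb-roam-ref-format'.
;; Write citation keys into ROAM_REFS in the Org-cite @element style
;; (e.g. @Doe2020) instead of the default Org-ref v2 cite:link style.
(setq orb-roam-ref-format 'org-cite)
```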
A list of template placeholders for pre-expanding. Any BibTeX field can be set for preformatting, including Bibtex-completion virtual fields such as ‘=key=’ and ‘=type=’. BibTeX fields can also be referred to by their aliases defined in orb-bibtex-field-aliases.
Usage example:
(setq orb-preformat-keywords '("citekey" "author" "date"))
(setq org-roam-capture-templates
'(("r" "bibliography reference" plain
"%?
%^{author} published %^{entry-type} in %^{date}: fullcite:%\\1."
:target
(file+head "references/${citekey}.org" "#+title: ${title}\n")
:unnarrowed t)))
By default, orb-preformat-keywords is configured to expand the following BibTeX fields: “citekey”, “date”, “entry-type”, “pdf?”, “note?”, “author”, “editor”, “author-abbrev”, “editor-abbrev”, “author-or-editor-abbrev”.
Special cases:
- The “file” keyword will be treated specially if the value of orb-process-file-keyword is non-nil. See its docstring for an explanation.
- The “title” keyword does not need to be set for preformatting if it is used only within the :target section of a template.
This variable takes effect when orb-preformat-templates is set to t (default). See also orb-edit-note for further details.
Consult the bibtex-completion package for additional information about BibTeX field names.
If orb-process-file-keyword is non-nil, the “file” field will be treated specially. If the field contains only one file name, its value will be used for template expansion. If it contains several file names, the user will be prompted to choose one. The file names can be filtered based on their extensions by setting the orb-attached-file-extensions variable, so that only those matching the extension or extensions will be considered for retrieval. The “file” keyword must be set for preformatting as usual. Consult the docstrings of these variables for additional customization options.
Non-nil to force abbreviation of file names by orb-get-attached-file. When this option is set to a non-nil value, the home directory part of the file name returned by orb-get-attached-file will be abbreviated to ~/. Symlinked directories will be abbreviated according to directory-abbrev-alist; see abbreviate-file-name for details. Otherwise, the file name is used as is.
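For example, assuming the option documented here is orb-abbreviate-file-name (the name this manual uses for it in the BibDesk section), enabling it is a one-line setting:

```elisp
;; Ask `orb-get-attached-file' to return abbreviated paths, so a file under
;; your home directory is reported as ~/... rather than as an absolute path.
(setq orb-abbreviate-file-name t)
```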
When retrieving an attached file, keep only files with these extensions. This is a list of case-insensitive file extensions, given as strings without a dot. Set it to nil to keep all file names regardless of their extensions.
BibTeX entries are searched for attached files according to bibtex-completion-pdf-field (default “file”) and in BibDesk-specific Bdsk-File-N fields.
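A sketch of two possible settings, assuming this docstring belongs to orb-attached-file-extensions (the variable mentioned in the description of the “file” keyword). The second form would override the first, so keep only one of them:

```elisp
;; Only consider PDF and DjVu attachments; matching is case-insensitive,
;; so "PDF" and "pdf" both qualify.
(setq orb-attached-file-extensions '("pdf" "djvu"))

;; Or consider every attached file, whatever its extension.
(setq orb-attached-file-extensions nil)
```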
Whether to look up BibDesk-specific file fields Bdsk-File.
If this is non-nil, attachments given in BibDesk-specific file fields will be considered in addition to those found through the bibtex-completion-find-pdf mechanism when performing a template expansion, opening an attachment with orb-note-actions, or scraping a PDF with ORB PDF Scrapper.
Duplicates will be resolved, but since duplicate comparison is performed using file-truename, this will lead to the expansion of symlink paths if such are used in the normal BibTeX file field, for example. See also orb-abbreviate-file-name on how to abbreviate the retrieved file names.
Set this to the symbol only to look up only BibDesk attachments without using bibtex-completion-find-pdf.
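As an illustration only: this section does not give the option’s name, and the sketch assumes it is orb-use-bibdesk-attachments. Under that assumption, restricting the lookup to BibDesk fields would be:

```elisp
;; Assumption: the option is named `orb-use-bibdesk-attachments'.
;; The symbol only = consult Bdsk-File-N fields exclusively, skipping
;; `bibtex-completion-find-pdf'; any other non-nil value consults both.
(setq orb-use-bibdesk-attachments 'only)
```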
The command orb-insert-link can be used to create Org links to bibliographic notes of the form [[id:note_id][Description]]. It is similar to Org-roam’s command org-roam-node-insert. The difference between the two is that Org-roam’s version creates a link to any existing Org-roam note (“node”) or creates a new note if it does not exist. ORB’s version consults the bibliography file and lets you create a link to an existing note associated with a BibTeX entry, or create a new note for an entry that doesn’t have one yet.
The Description part of the link is controlled by the user option orb-insert-link-description, which see. The global setting can be overridden for a single invocation with a numerical prefix:
- C-1 M-x orb-insert-link forces title
- C-2 M-x orb-insert-link forces citekey
- C-8 M-x orb-insert-link forces citation-org-cite
- C-9 M-x orb-insert-link forces citation-org-ref-3
- C-0 M-x orb-insert-link forces citation-org-ref-2
If a region of text is active (selected) when calling orb-insert-link, the text in the region will be replaced with a link and the region’s text will be used as the link description, similar to org-roam-node-insert.
Normally, the case of the link description will be preserved. It is possible to force lowercase by supplying either one or three universal arguments C-u.
Finally, bibtex-completion-cache will be re-populated if either two or three universal arguments C-u are supplied.
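If you often need one particular override, it can be wrapped in a small command of your own. The sketch below defines a hypothetical command, my-orb-insert-title-link, and assumes orb-insert-link reads its prefix argument in the standard way, so binding current-prefix-arg to 1 should be equivalent to C-1 M-x orb-insert-link:

```elisp
;; Hypothetical convenience command; not part of org-roam-bibtex.
(defun my-orb-insert-title-link ()
  "Call `orb-insert-link' as if it had been invoked with C-1.
This forces the note title as the link description."
  (interactive)
  ;; `call-interactively' takes the prefix argument from `current-prefix-arg'.
  (let ((current-prefix-arg 1))
    (call-interactively #'orb-insert-link)))
```

The same pattern with current-prefix-arg bound to '(4) would mimic a single C-u, i.e. forcing a lowercase description.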
Interface to use with orb-insert-link. Supported interfaces are helm-bibtex, ivy-bibtex, and generic (orb-insert-generic).
When using helm-bibtex or ivy-bibtex as orb-insert-interface, choosing the action "Edit note & insert a link" will insert the desired link. For convenience, this action is made the default for the duration of an orb-insert-link session. It will not persist when helm-bibtex or ivy-bibtex proper are run. Otherwise, the command is just the usual helm-bibtex/ivy-bibtex; for example, it is possible to run other helm-bibtex or ivy-bibtex actions. When an action other than "Edit note & insert a link" is run, no link will be inserted, although the session can be resumed later with helm-resume or ivy-resume, respectively, where it will be possible to select the "Edit note & insert a link" action.
When using the generic interface, a simple list of available citation keys is presented using completing-read, and after a candidate is chosen the appropriate link will be inserted.
Please note that this variable should be set using the Customize interface, use-package’s :custom keyword, or Doom’s setq! macro. Simple setq will not work.
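For example, either of the following forms goes through the Customize machinery rather than a plain setq. The use-package variant assumes you load org-roam-bibtex with use-package; ivy-bibtex is just an example value.

```elisp
;; Option 1: use-package's :custom keyword (the value is evaluated,
;; hence the quote).
(use-package org-roam-bibtex
  :after org-roam
  :custom
  (orb-insert-interface 'ivy-bibtex))

;; Option 2: plain Emacs Lisp; `customize-set-variable' honours the
;; option's :set function, if any, just like the Customize interface.
(customize-set-variable 'orb-insert-interface 'ivy-bibtex)
```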
This variable determines what piece of information should be used as the link description when creating a link with orb-insert-link:
[[id:note_id][Description]]
This variable determines the ‘Description’ part from the example above. It is an s-format string, where special placeholders of the form “${field}” will be expanded with data from the respective BibTeX field of the associated BibTeX entry. If the value of the field cannot be retrieved, the user will be prompted to input a value interactively. When retrieving BibTeX data, the user options orb-bibtex-field-aliases and orb-bibtex-entry-get-value-function are respected.
This variable can also be one of the following symbols:
- title: equivalent to “${title}”
- citekey: equivalent to “${citekey}”
When this is set to one of the following symbols, create a citation instead of an Org link:
- citation-org-ref-2: insert an Org-ref v2 citation link using org-ref-default-citation-link, default “cite:citation-key”
- citation-org-ref-3: insert an Org-ref v3 citation link using org-ref-default-citation-link, default “cite:&citation-key”
- citation-org-cite: insert an Org-cite citation [cite:@citation-key]
In other words, orb-insert-link can behave like a BibTeX-aware version of org-roam-node-insert or like an Org-roam-aware version of org-cite-insert (or org-ref-insert-cite-link or citar-insert-citation), depending on the user’s choice.
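A minimal sketch of a custom description format; the particular combination of placeholders is only an example:

```elisp
;; Link descriptions such as "Doe2020: Some Title"; ${...} placeholders are
;; filled from the note's BibTeX entry, respecting field aliases like citekey.
(setq orb-insert-link-description "${citekey}: ${title}")

;; Alternatively, make orb-insert-link insert Org-cite citations instead:
;; (setq orb-insert-link-description 'citation-org-cite)
```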
The global value of this option can be overridden for a single invocation of orb-insert-link with a numerical prefix:
- C-1 M-x orb-insert-link forces title
- C-2 M-x orb-insert-link forces citekey
- C-8 M-x orb-insert-link forces citation-org-cite
- C-9 M-x orb-insert-link forces citation-org-ref-3
- C-0 M-x orb-insert-link forces citation-org-ref-2
Whether to follow the newly created link.
How the selection candidates should be presented when using the generic interface:
- key: only citation keys. Fast and pretty, but too little contextual information
- entry: formatted entry. More information, but not particularly pretty. Consider using helm-bibtex or ivy-bibtex instead.
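Both of the preceding options can be set in the usual way. The names used below, orb-insert-follow-link and orb-insert-generic-candidates-format, are assumptions since this section does not spell them out; verify them with C-h v first.

```elisp
;; Assumed option names; check them with C-h v before use.
;; Visit the note behind a link right after orb-insert-link creates it.
(setq orb-insert-follow-link t)
;; Present formatted entries rather than bare keys in the generic interface.
(setq orb-insert-generic-candidates-format 'entry)
```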
(setq org-roam-capture-templates
'(("r" "bibliography reference" plain
(file "/path/to/template.org") ; <-- template stored in a separate file
:target
(file+head "references/${citekey}.org" "#+title: ${title}\n")
:unnarrowed t)))
Content of /path/to/template.org:
#+PROPERTY: type %^{entry-type}
#+FILETAGS: %^{keywords}
#+PROPERTY: authors %^{author}
In this %\1 %\3 concluded that %?
fullcite:%\1
You can also use a function to generate the template on the fly; see org-capture-templates for details.
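A sketch of the function variant, assuming ORB pre-expands the string returned by the function just as it does for a file template. The helper name my-orb-bib-template is hypothetical; the (function ...) form is the one documented for org-capture-templates.

```elisp
;; Hypothetical helper that builds the template text at capture time.
(defun my-orb-bib-template ()
  "Return the body of a bibliography note template as a string."
  (concat "#+PROPERTY: type %^{entry-type}\n"
          "#+PROPERTY: authors %^{author}\n"
          "%?\n"))

(setq org-roam-capture-templates
      '(("r" "bibliography reference" plain
         (function my-orb-bib-template) ; template computed by a function
         :target
         (file+head "references/${citekey}.org" "#+title: ${title}\n")
         :unnarrowed t)))
```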
Below is an example of a template ready for use with org-noter or interleave:
(setq orb-preformat-keywords
'("citekey" "title" "url" "author-or-editor" "keywords" "file")
orb-process-file-keyword t
orb-attached-file-extensions '("pdf"))
(setq org-roam-capture-templates
'(("r" "bibliography reference" plain
(file "/path/to/template.org")
:target
(file+head "references/${citekey}.org" "#+title: ${title}\n"))))
Content of /path/to/template.org:
- tags ::
- keywords :: %^{keywords}
* %^{title}
:PROPERTIES:
:Custom_ID: %^{citekey}
:URL: %^{url}
:AUTHOR: %^{author-or-editor}
:NOTER_DOCUMENT: %^{file}
:NOTER_PAGE:
:END:

Here %^{file} is the special file keyword: if more than one file name is available, the user will be prompted to choose one.
Type M-x orb-note-actions or bind this command to a key such as C-c n a to quickly access additional commands that take the note’s BibTeX key as an input and process it to perform some useful actions.
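For example, a binding limited to Org buffers could look like this. C-c n a is only the suggestion above, and the sketch assumes orb-note-actions is available (loaded or autoloaded) when you press the key:

```elisp
;; Make C-c n a run orb-note-actions in Org buffers.
(with-eval-after-load 'org
  (define-key org-mode-map (kbd "C-c n a") #'orb-note-actions))
```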
Note actions are divided into three groups: default, extra, and user, set via orb-note-actions-default, orb-note-actions-extra, and orb-note-actions-user, respectively. There is no big conceptual difference between the three, except that the default note actions are commands provided by bibtex-completion, extra note actions are extra commands provided by org-roam-bibtex, and user note actions are left for user customization.
The following interfaces are supported: default (using completing-read), ido, ivy, helm and hydra. The interface can be set via the orb-note-actions-interface user variable.
(setq orb-note-actions-interface 'hydra)
Alternatively, orb-note-actions-interface can be set to a custom function that will provide completion for available note actions. The function must take one argument CITEKEY, which is a list whose car is the current note’s citation key:
(setq orb-note-actions-interface #'my-orb-note-actions-interface)
NOTE: This variable should be set using the Customize interface, use-package’s :custom keyword, or Doom’s setq! macro. Simple setq will not work.
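A minimal sketch of such a function is shown below. It uses the hypothetical name from the setq form above and assumes the three note action variables hold (DESCRIPTION . FUNCTION) cons cells, the format this manual gives for installing note actions; it simply offers all of them through completing-read.

```elisp
;; Hypothetical custom interface for `orb-note-actions-interface'.
(defun my-orb-note-actions-interface (citekey)
  "Offer all ORB note actions for CITEKEY and run the selected one.
CITEKEY is a list whose car is the current note's citation key."
  (let* ((actions (append orb-note-actions-default
                          orb-note-actions-extra
                          orb-note-actions-user))
         ;; Completion candidates are the DESCRIPTION strings.
         (choice (completing-read
                  (format "Note action for %s: " (car citekey))
                  actions nil t))
         (fn (cdr (assoc choice actions))))
    ;; Hand the same CITEKEY list over to the chosen action, if any.
    (when fn
      (funcall fn citekey))))
```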
:PROPERTIES:
:ID: uuid1234-...
:ROAM_REFS: cite:Doe2020
:END:
#+title: My note
(defun my-orb-note-actions-interface (citekey)
  ;; For the above note, (car citekey) => "Doe2020"
  ...)
To install a note action, add a cons cell of the form (DESCRIPTION . FUNCTION) to one of the note actions variables:
(with-eval-after-load 'orb-note-actions
  (add-to-list 'orb-note-actions-user
               (cons "My note action" #'my-note-action)))
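For instance, a complete user action that copies the note’s citation key to the kill ring could look like this; my-orb-copy-citekey is a hypothetical name:

```elisp
;; Hypothetical user note action.
(defun my-orb-copy-citekey (citekey)
  "Copy the current note's citation key to the kill ring.
CITEKEY is a list whose car is the citation key."
  (let ((key (car citekey)))
    (kill-new key)
    (message "Copied %s" key)))

;; Register it in the user group of note actions.
(with-eval-after-load 'orb-note-actions
  (add-to-list 'orb-note-actions-user
               (cons "Copy citation key" #'my-orb-copy-citekey)))
```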
A note action must take a single argument CITEKEY, which is a list whose car is the current note’s citation key:
(defun my-note-action (citekey)
  (let ((key (car citekey)))
    ...))
ORB PDF Scrapper is an Emacs interface to anystyle, open-source software based on powerful machine-learning algorithms. It requires anystyle-cli, which can be installed with [sudo] gem install anystyle-cli. Note that ruby and gem must already be present on the system. ruby is shipped with macOS, but you will have to install it on other operating systems; please refer to the relevant section in the official documentation for ruby. You may also want to consult the anystyle documentation to learn more about how it works.
Once anystyle-cli is installed, ORB PDF Scrapper can be launched with orb-note-actions while in an Org-roam buffer containing a #+ROAM_KEY: BibTeX key. References are extracted from the PDF file associated with the note, which is found through the corresponding BibTeX record.
The reference-retrieval process consists of three interactive steps described below.
In the first step, the PDF file is searched for references, which are eventually output in the ORB PDF Scrapper buffer as plain text. The buffer is in text-mode, the major mode for editing general text files.
You need to review the retrieved references and prepare them for the next step in such a way that there is only one reference per line. You may also need to remove any extra text captured together with the references. Some PDF files will produce a nicely-formed list of references that will require little to no manual editing, while others will require varying degrees of manual intervention.
Generally, it is possible to train a custom anystyle finder model responsible for PDF parsing to improve the output quality, but this is not currently supported by ORB PDF Scrapper. As a small and somewhat naïve aid, the sanitize text command bound to C-c C-u may assist in putting each reference onto a separate line.
When you have finished editing the text data, press C-c C-c to proceed to the second step.
Press C-x C-s to save your progress or C-x C-w to write the text references into a file.
Press C-c C-k at any time to abort the ORB PDF Scrapper process.
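In the second step, the references are parsed by anystyle and returned as BibTeX data, and the ORB PDF Scrapper buffer switches to bibtex-mode, which is helpful for reviewing and editing the BibTeX data and correcting possible parsing errors.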
Again, depending on the citation style used in the particular book or article, the parsing quality can vary greatly and might require more or less manual post-editing. It is possible to train a custom anystyle parser model to improve the parsing quality. See Training a Parser model for more details.
Press C-c C-u to generate BibTeX keys for the records in the buffer, or C-u C-c C-u to generate a key for the record at point. See ORB Autokey configuration on how to configure the BibTeX key generation. During key generation, it is also possible to automatically set the values of BibTeX fields: see the orb-pdf-scrapper-set-fields docstring for more details.
Press C-x C-s to save your progress or C-x C-w to write the BibTeX entries into a file.
Press C-c C-r to return to the text-editing mode in its last state. Note that all the progress in BibTeX mode will be lost.
Press C-c C-c to proceed to the third step. If the BibTeX buffer was edited and the changes were not saved, e.g. by pressing C-x C-s, you will be prompted to generate BibTeX keys by default. The variable orb-pdf-prompt-to-generate-keys controls this behaviour more finely.
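In the third step, the BibTeX data is processed and the result is shown in the ORB PDF Scrapper buffer in org-mode.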
The processing involves sorting the references into four groups under the respective Org headlines: in-roam, in-bib, valid, and invalid, and inserting the grouped references as either an Org plain list of org-ref-style citations, or an Org table with columns corresponding to different BibTeX fields.
- in-roam: These references have notes with the respective #+ROAM_KEY: citation keys in the org-roam database.
- in-bib: These references are not yet in the org-roam database, but they are present in user BibTeX file(s) (see bibtex-completion-bibliography).
- invalid: These references match orb-pdf-scrapper-invalid-key-pattern and are considered invalid. Adjust this variable to your criteria of validity.
- valid: All other references fall into this group. They look fine but are not yet in user Org-roam and BibTeX databases.
Set orb-pdf-scrapper-group-references to nil if you do not need reference grouping.
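In a configuration file, that is a single form:

```elisp
;; Insert the references without sorting them into
;; in-roam / in-bib / valid / invalid groups.
(setq orb-pdf-scrapper-group-references nil)
```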
Review and edit the generated Org data, then press C-c C-c to insert the references into the note’s buffer and finish the ORB PDF Scrapper process.
Press C-x C-s to save your progress or C-x C-w to write the Org data into a file.
Press C-c C-r to return to BibTeX editing mode in its last state. Note that all the progress in the current mode will be lost.
The following user variables control the appearance of the generated Org data: orb-pdf-scrapper-group-references, orb-pdf-scrapper-grouped-export, orb-pdf-scrapper-ungrouped-export, orb-pdf-scrapper-table-export-fields, orb-pdf-scrapper-list-style, orb-pdf-scrapper-reference-numbers, orb-pdf-scrapper-citekey-format. These variables can be set through the Customize interface or with setq. Refer to their respective docstrings in Emacs for more information.
Export destinations for the generated data are configured with orb-pdf-scrapper-export-options. Consult its docstring for a detailed explanation. The following example demonstrates various possibilities.
(setq orb-pdf-scrapper-export-options
      '((org  ;; <= TYPE
         ;; Export to a heading in the buffer of origin.
         ;; TARGET is `heading', LOCATION is the heading title,
         ;; and PROPERTIES follow as keyword arguments.
         (heading "References (extracted by ORB PDF Scrapper)"
                  :property-drawer ("PDF_SCRAPPER_TYPE"
                                    "PDF_SCRAPPER_SOURCE"
                                    "PDF_SCRAPPER_DATE")))
        (txt
         ;; Export to a file "references.org"
         (path "references.org"
               ;; under a heading "New references"
               :placement (heading "New references"
                                   :property-drawer ("PDF_SCRAPPER_TYPE"
                                                     "PDF_SCRAPPER_SOURCE"
                                                     "PDF_SCRAPPER_DATE")
                                   ;; Put the new heading in front of other headings
                                   :placement prepend)))
        (bib
         ;; Export to a file in an existing directory. The file name will be CITEKEY.bib
         (path "/path/to/references-dir/"
               :placement prepend
               ;; Include only the references that are not in the target file
               ;; *and* the file(s) specified in bibtex-completion-bibliography
               :filter-bib-entries bibtex-completion-bibliography))))
Currently, the core data set (explained below) must be installed manually by the user as follows:
- Use find, locate or similar tools to find the file core.xml buried in the res/parser/ subdirectory of the anystyle gem, e.g. locate core.xml | grep anystyle. On macOS, with anystyle installed as a system gem, the file path would look similar to:
"/Library/Ruby/Gems/2.6.0/gems/anystyle-1.3.11/res/parser/core.xml"
The actual path will vary slightly depending on the currently installed versions of ruby and anystyle. On Linux and Windows, this path will be different.
- Copy this file into the location specified in orb-anystyle-parser-training-set, or anywhere else where you have disk-write access, and adjust the aforementioned variable accordingly.
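The two steps can also be done from Emacs itself. The sketch below uses the macOS example path given above as the source, which will differ on your machine, and a hypothetical destination under user-emacs-directory:

```elisp
(let ((core-xml
       ;; Example source path from above; adjust to your ruby/anystyle versions.
       "/Library/Ruby/Gems/2.6.0/gems/anystyle-1.3.11/res/parser/core.xml")
      ;; Hypothetical destination: any location you can write to.
      (target (expand-file-name "anystyle/core.xml" user-emacs-directory)))
  ;; Create the destination directory, including missing parents.
  (make-directory (file-name-directory target) t)
  ;; Without OK-IF-ALREADY-EXISTS, this signals an error instead of
  ;; overwriting a training set you have already extended.
  (copy-file core-xml target)
  ;; Point ORB at the copied training set.
  (setq orb-anystyle-parser-training-set target))
```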
To generate training data, press C-c C-t in the ORB PDF Scrapper buffer in either text-mode or BibTeX-mode. In each case, the plain-text references obtained in the text-mode step described above will be used to generate source XML data for a training set.
The generated XML data replaces the text or the BibTeX references in the ORB PDF Scrapper buffer, and the major mode switches to xml-mode.
The XML data must be edited manually (this is the whole point of creating a custom training model), which usually consists of simply correcting the placement of bibliographic data within the XML elements (data fields). It is extremely important to review the source data carefully, since any mistakes here will make their way into the model, leading to poorer parsing in the future.
It would be quite tedious to create the whole data-set by hand (hundreds or thousands of individual bibliographic records), so the best workflow for making a good custom data-set is to use the core data-set shipped with anystyle and append to it several data-sets generated in ORB PDF Scrapper training sessions from individual PDF files, incrementally re-training the model in between. This approach is implemented in ORB PDF Scrapper. From personal experience, adding reference data incrementally from 4–5 PDF files raises the parser success rate to virtually 100%. Follow the instructions described in Prerequisites to install the core data-set.
Once the editing is done, press C-c C-c to train the model. The XML data in the ORB PDF Scrapper buffer will be automatically appended to the custom core.xml file, which will be used for training.
Alternatively, press C-c C-t to review the updated core.xml file and press C-c C-c when finished.
The major mode will now switch to fundamental-mode, and the anystyle stdout output will appear in the buffer. Training the model can take several minutes, depending on the size of the training data-set and the computing resources available on your device. The process is run in a shell subprocess, so you will be able to continue your work and return to the ORB PDF Scrapper buffer later.
Once the training is complete, press C-c C-c to return to the previous editing mode. You can now re-generate the BibTeX data and see the improvements achieved with the re-trained model.
The format of generated BibTeX keys is controlled by the orb-autokey-format variable, which can be set through the Customize interface or by adding a setq form to your Emacs configuration file.
ORB Autokey format currently supports the following wildcards:
Wildcard | Field | Description |
---|---|---|
%a | author | first author’s (or editor’s) last name |
%t | title | first word of title |
%f{field} | field | first word of arbitrary field |
%y | year | year YYYY (date or year field) |
%p | page | first page |
%e{(expr)} | elisp | elisp expression |
(setq orb-autokey-format "%a%y") => "doe2020"
- Capitalized versions:
Wildcard | Field | Description |
---|---|---|
%A | author | same as %a, but preserve the original capitalization |
%T | title | same as %t, but preserve the original capitalization |
%F{field} | field | same as %f{field}, but preserve the original capitalization |
(setq orb-autokey-format "%A%y") => "Doe2020"
- Starred versions
Wildcard | Field | Description |
---|---|---|
%a*, %A* | author | include author’s (editor’s) initials |
%t*, %T* | title | do not ignore words in orb-autokey-titlewords-ignore |
%y* | year | year’s last two digits, YY |
%p* | page | use the “pagetotal” field instead of the default “pages” |
(setq orb-autokey-format "%A*%y") => "DoeJohn2020"
- Optional parameters
Wildcard | Field | Parameters |
---|---|---|
%a[N][M][D] | author | N, M, D |
%t[N][M][D] | title | N, M, D |
%f{field}[N][M][D] | field | N, M, D |
%p[D] | page | D |
where:
- N: include the first N words/names
- M: include at most the first M characters of each word/name
- D: put delimiter D between words
N and M should be a single digit 1-9. Putting more digits or any other symbols will lead to ignoring the optional parameter and those following it altogether. D should be a single alphanumeric symbol or one of -_.:|.
Optional parameters work both with capitalized and starred versions where applicable.
(setq orb-autokey-format "%A*[1][4][-]%y") => "DoeJ2020"
(setq orb-autokey-format "%A*[2][7][-]:%y") => "DoeJohn-DoeJane:2020"
- Elisp expression
- can be anything
- should return a string or nil
- will be evaluated before expanding other wildcards and therefore can be used to insert other wildcards
- will have the variable entry bound to the value of the BibTeX entry the key is being generated for, as returned by bibtex-completion-get-entry. The variable may be safely manipulated in a destructive manner.
%e{(or (bibtex-completion-get-value "volume" entry) "N/A")}
%e{(my-function entry)}
Check the variables orb-autokey-invalid-symbols, orb-autokey-empty-field-token and orb-autokey-titlewords-ignore for additional settings.
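Putting several of these pieces together, a format string with an embedded expression might look like this. The sketch reuses the volume expression from the examples above; note that double quotes inside the Lisp string have to be escaped:

```elisp
;; Author's last name, then the year, then the volume (or "N/A" if the
;; entry has none); `entry' is bound by ORB while the key is generated.
(setq orb-autokey-format
      "%a%y%e{(or (bibtex-completion-get-value \"volume\" entry) \"N/A\")}")
```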
You may add each of the sections below to org-roam-mode-sections to show their information in the org-roam-buffer. For example, the configuration below will show a formatted BibTeX reference, a formatted BibTeX abstract, and any backlinks/reflinks to the node in the org-roam-buffer.
(setf org-roam-mode-sections '(orb-section-reference
orb-section-abstract
org-roam-backlinks-section
org-roam-reflinks-section))
The orb-section-reference section shows a formatted reference for the current node.
This variable may be set to a function or an alist. If set to a function, it should take a BibTeX key and return a formatted reference. By default, it is set to the function bibtex-completion-apa-format-reference. When setting this to an alist, keys should be the various types of BibTeX entry as strings, and values should be s-format-compatible format strings for the BibTeX entry.
The orb-section-abstract section shows a formatted abstract for the current node.
This variable allows configuration of how the abstract is retrieved and formatted. It can be a function which takes a BibTeX key and returns a formatted abstract, or it can be one of two symbols. The first is :org-format, which will retrieve the Org-formatted text from the BibTeX record and fontify it as appropriate. The second is :pandoc-from-tex, which will retrieve a LaTeX-formatted abstract from the BibTeX record, use pandoc to convert it to Org format, and then format it as with :org-format.
The orb-section-file section shows a link to the PDF or similar file for a node.
orb-anystyle provides a convenient Elisp key-value interface to anystyle-cli, and can be used anywhere else within Emacs. Check its docstring for more information. You may also want to consult the anystyle-cli documentation.
This Elisp expression:
(orb-anystyle 'parse
              :format 'bib
              :stdout nil
              :overwrite t
              :input "Doe2020.txt"
              :output "bib"
              :parser-model "/my/custom/model.mod")
…executes the following anystyle call:
anystyle --no-stdout --overwrite -F "/my/custom/model.mod" -f bib parse "Doe2020.txt" "bib"
The following variables can be used to configure orb-anystyle and the default command-line options that will be passed to anystyle:
orb-anystyle-executable
orb-anystyle-user-directory
orb-anystyle-default-buffer
orb-anystyle-find-crop
orb-anystyle-find-layout
orb-anystyle-find-solo
orb-anystyle-finder-training-set
orb-anystyle-finder-model
orb-anystyle-parser-model
orb-anystyle-parser-training-set
orb-anystyle-pdfinfo-executable
orb-anystyle-pdftotext-executable
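As a sketch, pointing ORB at a particular anystyle installation and a custom parser model could look like this; both paths are hypothetical:

```elisp
;; Hypothetical paths; replace them with the ones on your system.
(setq orb-anystyle-executable "/usr/local/bin/anystyle"
      orb-anystyle-parser-model "~/anystyle/custom-parser.mod")
```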