diff --git a/CHANGELOG.md b/CHANGELOG.md
index f403363..326d9d6 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -10,7 +10,7 @@ Well, at least we try!
## [0.2.3] - 2020-05-10
### Added
-- `orb--replace-virtual-field` and `orb--virtual-fields-alist` for
+- `orb--replace-virtual-fields` and `orb--virtual-fields-alist` for
mapping `bibtex-completion` virtual field names to more conventional
words, namely these:
``` elisp
diff --git a/README.md b/README.md
index b2d077e..77671ce 100644
--- a/README.md
+++ b/README.md
@@ -249,14 +249,19 @@ notes from the completion-list.
Type `M-x orb-note-actions` to easily access additional commands useful
in note's context. These commands are run with the note's BibTeX key
as an argument. The key is taken from the `#+ROAM_KEY:` file property.
-See section [`Orb Note Actions`](#orb-note-actions-section) for
+See section [ORB Note Actions](#orb-note-actions-section) for
details.
Configuration
---------------
-### Org Roam BibTeX - BibTeX aware capture template expansion
+The following sections use Emacs Lisp examples to configure Org Roam
+BibTeX. If you are not comfortable with Lisp yet, you can always use
+the Customize interface to achieve the same result: run `M-x customize`,
+or click `Options -> Customize Emacs -> Top Level Customization Group`
+in the menu, and search for `org-roam-bibtex`.
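+
+The Customize group can also be opened directly from Lisp. This is
+a minimal sketch that assumes the package is already loaded, so that the
+`org-roam-bibtex` group is defined:
+
+``` el
+;; Open the Customize buffer for the `org-roam-bibtex' group, which
+;; also lists its subgroups.
+(customize-group 'org-roam-bibtex)
+```
+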
+### Org Roam BibTeX - BibTeX aware capture template expansion
#### `orb-templates`
This variable specifies the templates to use when creating a new
@@ -427,7 +432,7 @@ Below shows how this can be used to integrate with
Do not forget to escape the quotes inside the `%`-escapes form!
-### Orb Note Actions - BibTeX record-related commands
+### ORB Note Actions - BibTeX record-related commands
#### Overview
Type `M-x orb-note-actions` or bind this command to a key such as `C-c
@@ -474,9 +479,8 @@ is the current note's citation key:
#### Adding new note actions
-To install a note action, add a cons
-cell of format `(DESCRIPTION . FUNCTION)` to one of the note actions
-variables:
+To install a note action, add a cons cell of format `(DESCRIPTION
+. FUNCTION)` to one of the note actions variables:
``` el
(with-eval-after-load 'orb-note-actions
@@ -489,13 +493,321 @@ whose car is the current note's citation key:
``` el
(defun my-note-action (citekey)
(let ((key (car citekey)))
- ...
- ))
+ ...))
```
+### ORB PDF Scrapper - Retrieve references from PDFs
+#### Overview
+
+ORB PDF Scrapper is an Emacs interface to
+[`anystyle`](https://github.com/inukshuk/anystyle), an open-source tool
+based on machine-learning algorithms. It requires `anystyle-cli`,
+which can be installed with `[sudo] gem install anystyle-cli`. Note that
+`ruby` and `gem` must already be present on the system. `ruby` is shipped
+with macOS, but you will have to install it on other operating systems; please
+refer to the relevant section in the official documentation for `ruby`. You
+may also want to consult the [`anystyle`
+documentation](https://rubydoc.info/gems/anystyle) to learn more about how it
+works.
+
+Once `anystyle-cli` is installed, ORB PDF Scrapper can be launched with
+`orb-note-actions` while in an Org-roam buffer containing a `#+ROAM_KEY:`
+BibTeX key. References are retrieved from the PDF file associated with the
+note; the PDF file itself is located through the corresponding BibTeX record.
+
+The reference-retrieval process consists of three interactive steps described
+below.
+
+#### Text mode
+In the first step, the PDF file is searched for references, which are
+eventually output in the ORB PDF Scrapper buffer as plain text. The
+buffer is in the `text-mode` major-mode for editing general text
+files.
+
+You need to review the retrieved references and prepare them for the next step
+in such a way that there is only one reference per line. You may also need to
+remove any extra text captured together with the references. Some PDF files
+will produce a nicely-formed list of references that will require little to no
+manual editing, while others will need a different degree of manual
+intervention.
+
+Generally, it is possible to train a custom `anystyle` finder model
+responsible for PDF-parsing to improve the output quality, but this is
+not currently supported by ORB PDF Scrapper. As a small and somewhat
+naïve aid, the `sanitize text` command bound to `C-c C-u` may assist
+in putting each reference onto a separate line.
+
+After you are finished with editing the text data, press `C-c C-c` to
+proceed to the second step.
+
+Press `C-c C-k` anytime to abort the ORB PDF Scrapper process.
+
+#### BibTeX mode
+In the second step, the obtained list of plain text references, one
+reference per line, is parsed and converted into BibTeX format. The
+resulting BibTeX records are presented to the user in the ORB PDF
+Scrapper buffer replacing the text references. The buffer's major
+mode switches to `bibtex-mode`, which is helpful for reviewing and
+editing the BibTeX data and correcting possible parsing errors.
+
+Again, depending on the citation style used in the particular book or article,
+the parsing quality can vary greatly and might require more or less manual
+post-editing. It is possible to train a custom `anystyle` parser model to
+improve the parsing quality. See [Training a Parser
+model](#training-a-parser-model) for more details.
+
+Press `C-c C-u` to generate BibTeX keys for the records in the buffer or `C-u
+C-c C-u` to generate a key for the record at point. See [ORB Autokey
+configuration](#orb-autokey-configuration) on how to configure the BibTeX key
+generation. During key generation, it is also possible to automatically set
+the values of BibTeX fields: see `orb-pdf-scrapper-set-fields` docstring for
+more details.
+
+Press `C-c C-r` to return to the text-editing mode in its last state. Note
+that all the progress in BibTeX mode will be lost.
+
+Press `C-c C-c` to proceed to the third step.
+
+#### Org mode
+In the third step, the BibTeX records are processed internally by ORB PDF
+Scrapper, and the result replaces the BibTeX data in the ORB PDF Scrapper
+buffer, which switches to `org-mode`.
+
+The processing involves sorting the references into four groups under
+the respective Org-mode headlines: `in-roam`, `in-bib`, `valid`, and
+`invalid`, and inserting the grouped references as either an Org-mode
+plain-list of `org-ref`-style citations, or an Org-mode table with
+columns corresponding to different BibTeX fields.
+
+* `in-roam` --- These references have notes with the respective
+ `#+ROAM_KEY:` citation keys in the `org-roam` database.
+* `in-bib` --- These references are not yet in the `org-roam` database
+ but they are present in user BibTeX file(s) (see
+ `bibtex-completion-bibliography`).
+* `invalid` --- These references matched against
+ `orb-pdf-scrapper-invalid-key-pattern` and are considered invalid.
+ Adjust this variable to your criteria of validity.
+* `valid` --- All other references fall into this group. They look
+ fine but are not yet in user Org-roam and BibTeX databases.
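+
+For instance, keys that contain the empty-field token could be treated
+as invalid. This is only a sketch: it assumes that
+`orb-pdf-scrapper-invalid-key-pattern` holds a regular expression string
+(check its docstring to confirm), and the pattern is an illustration:
+
+``` el
+;; Assumption: the variable is a regexp matched against generated keys.
+;; Treat any key containing "N/A" (the default value of
+;; `orb-autokey-empty-field-token') as invalid.
+(setq orb-pdf-scrapper-invalid-key-pattern "N/A")
+```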
+
+Review and edit the generated Org-mode data, or press `C-c C-c` to
+insert the references into the note's buffer and finish the ORB PDF
+Scrapper.
+
+Press `C-c C-r` to return to BibTeX editing mode in its last state.
+Note that all the progress in the current mode will be lost.
+
+The following user variables control the appearance of the generated
+Org-mode data: `orb-pdf-scrapper-refsection-headings`,
+`orb-pdf-scrapper-export-fields`. These variables can be set through
+the Customize interface or with `setq`. Refer to their respective
+docstrings in Emacs for more information.
+
+#### Training a Parser model
+##### Prerequisites
+Currently, the core data set (explained below) must be installed manually by the user as follows:
+
+1. Use `find`, `locate` or similar tools to find the file `core.xml` buried in
+ `res/parser/` subdirectory of `anystyle` gem, e.g. `locate core.xml | grep
+ anystyle`. On macOS, with `anystyle` installed as a system gem, the file
+ path would look similar to:
+
+ `"/Library/Ruby/Gems/2.6.0/gems/anystyle-1.3.11/res/parser/core.xml"`
+
+ The actual path will vary slightly depending on the currently-installed
+ versions of `ruby` and `anystyle`.
+
+ On Linux and Windows, this path will be different.
+2. Copy this file into the location specified in
+ `orb-anystyle-parser-training-set`, or anywhere else where you have
+ disk-write access, and adjust the aforementioned variable accordingly.
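+
+The two steps above can be sketched in Emacs Lisp as follows. The source
+path is the example path from step 1 and the target directory is only an
+illustration; both will differ on your system:
+
+``` el
+;; Hypothetical paths: substitute the real location of core.xml found in
+;; step 1 and a directory you can write to.
+(let ((source "/Library/Ruby/Gems/2.6.0/gems/anystyle-1.3.11/res/parser/core.xml")
+      (target-dir (expand-file-name "anystyle" user-emacs-directory)))
+  ;; Create the target directory, including parents, if it is missing.
+  (make-directory target-dir t)
+  ;; Signals an error if the target file already exists.
+  (copy-file source (expand-file-name "core.xml" target-dir))
+  ;; Point ORB at the copied training set.
+  (setq orb-anystyle-parser-training-set
+        (expand-file-name "core.xml" target-dir)))
+```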
+
+##### Running a training session
+Training a custom parser model on custom user data will greatly improve the
+parsing of plain-text references. A training session can be initiated by
+pressing `C-c C-t` in the ORB PDF Scrapper buffer in either text-mode or
+BibTeX-mode. In each case, the plain-text references obtained in the `text
+mode` step described above will be used to generate source XML data for
+a training set.
+
+The generated XML data replaces the text or the BibTeX references in the
+ORB PDF Scrapper buffer, and the major-mode switches to `xml-mode`.
+
+The XML data must be edited manually---this is the whole point of creating
+a custom training model---which usually consists in simply correcting the
+placement of bibliographic data within the XML elements (data fields). It is
+extremely important to review the source data carefully since any mistakes
+here will make their way into the model, thereby leading to poorer parsing in
+the future.
+
+It would be quite tedious to create the whole data-set by hand---hundreds or
+thousands of individual bibliographic records---so the best workflow for
+making a good custom data-set is to use the core data-set shipped with
+`anystyle` and append to it several data-sets generated in ORB PDF Scrapper
+training sessions from individual PDF files, incrementally re-training the
+model in between. This approach is implemented in ORB PDF Scrapper. From
+personal experience, adding references data incrementally from 4--5 PDF files
+raises the parser success rate to virtually 100%. Follow the instructions
+described in [Prerequisites](#parser-model-prerequisites) to install the core
+data-set.
+
+Once the editing is done, press `C-c C-c` to train the model. The XML data in
+the ORB PDF Scrapper buffer will be automatically appended to the custom
+`core.xml` file which will be used for training. Alternatively, press `C-c
+C-t` to review the updated `core.xml` file and press `C-c C-c` when finished.
+
+The major mode will now switch to `fundamental-mode`, and the `anystyle`
+`stdout` output will appear in the buffer. Training the model can take
+_several minutes_, depending on the size of the training data-set and the
+computing resources available on your device. The process is run in a shell
+subprocess, so you will be able to continue your work and return to ORB PDF
+Scrapper buffer later.
+
+Once the training is complete, press `C-c C-c` to return to the previous
+editing-mode. You can now re-generate the BibTeX data and see the
+improvements achieved with the re-trained model.
+
+#### ORB Autokey configuration
+##### `orb-autokey-format`
+You can specify the format of autogenerated BibTeX keys by setting the
+`orb-autokey-format` variable through the Customize interface, or by adding
+a `setq` form in your Emacs configuration file.
+
+ORB Autokey format currently supports the following wildcards:
+
+###### Basic
+
+| Wildcard | Field | Description |
+|:-----------|:-------|:---------------------------------------|
+| %a | author | first author's (or editor's) last name |
+| %t | title | first word of title |
+| %f{field} | field | first word of arbitrary field |
+| %y | year | year YYYY (date or year field) |
+| %p | page | first page |
+| %e{(expr)} | elisp | elisp expression |
+
+``` el
+(setq orb-autokey-format "%a%y") => "doe2020"
+```
+
+###### Extended
+
+1. Capitalized versions:
+
+| Wildcard | Field | Description |
+|:----------|:-------|:-------------------------------------|
+| %A | author | |
+| %T | title | Same as %a,%t,%f{field} but |
+| %F{field} | field | preserve the original capitalization |
+
+``` el
+(setq orb-autokey-format "%A%y") => "Doe2020"
+```
+
+2. Starred versions
+
+| Wildcard   | Field  | Description                                              |
+|:-----------|:-------|:---------------------------------------------------------|
+| %a\*, %A\* | author | - include author's (editor's) initials                   |
+| %t\*, %T\* | title  | - do not ignore words in `orb-autokey-titlewords-ignore` |
+| %y\*       | year   | - year's last two digits __YY                            |
+| %p\*       | page   | - use "pagetotal" field instead of default "pages"       |
+
+``` el
+(setq orb-autokey-format "%A*%y") => "DoeJohn2020"
+```
+
+3. Optional parameters
+
+| Wildcard | Field | Description |
+|:-------------------|:-------|:--------------------------------------------------|
+| %a[N][M][D] | author | |
+| %t[N][M][D] | title | > include first N words/names |
+| %f{field}[N][M][D] | field | > include at most M first characters of word/name |
+| %p[D] | page | > put delimiter D between words |
+
+`N` and `M` should be a single digit `1-9`. Putting more digits or any
+other symbols will lead to ignoring the optional parameter and those
+following it altogether. `D` should be a single alphanumeric symbol or
+one of `-_.:|`.
+
+Optional parameters work both with capitalized and starred versions
+where applicable.
+
+``` el
+(setq orb-autokey-format "%A*[1][4][-]%y") => "DoeJ2020"
+(setq orb-autokey-format "%A*[2][7][-]:%y") => "DoeJohn-DoeJane:2020"
+```
+
+4. Elisp expression
+
+* can be anything
+* should return a string or nil
+* will be evaluated before expanding other wildcards and therefore can
+ be used to insert other wildcards
+* will have the `entry` variable bound to the value of the BibTeX entry the
+ key is being generated for, as returned by
+ `bibtex-completion-get-entry`. The variable may be safely manipulated
+ in a destructive manner.
+
+``` el
+%e{(or (bibtex-completion-get-value "volume" entry) "N/A")}
+%e{(my-function entry)}
+```
+
+##### Other variables
+
+Check variables `orb-autokey-invalid-symbols`,
+`orb-autokey-empty-field-token`, `orb-autokey-titlewords-ignore` for
+additional settings.
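+
+A configuration touching these variables might look like the sketch below.
+The values are illustrations rather than recommendations, and the feature
+name `orb-core` is an assumption based on the name of the file that defines
+the variables:
+
+``` el
+;; Token used when a BibTeX field is nil or empty (the default is "N/A").
+(setq orb-autokey-empty-field-token "NA")
+
+;; Additionally ignore the title word "Of" during key generation.
+;; Each element is a case-sensitive regular expression.
+(with-eval-after-load 'orb-core
+  (add-to-list 'orb-autokey-titlewords-ignore "Of"))
+```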
+
+#### Orb Anystyle
+
+The function `orb-anystyle` provides a convenient Elisp key--value interface
+to `anystyle-cli`, and can be used anywhere else within Emacs. Check its
+docstring for more information. You may also want to consult [`anystyle-cli`
+documentation](https://rubydoc.info/gems/anystyle).
+
+###### Example
+This Elisp expression:
+``` el
+(orb-anystyle 'parse
+ :format 'bib
+ :stdout nil
+ :overwrite t
+ :input "Doe2020.txt"
+ :output "bib"
+ :parser-model "/my/custom/model.mod")
+```
+
+…executes the following anystyle call:
+
+``` sh
+anystyle --no-stdout --overwrite -P "/my/custom/model.mod" -f bib parse "Doe2020.txt" "bib"
+```
+
+The following variables can be used to configure `orb-anystyle` and
+the default command-line options that will be passed to `anystyle`:
+
+###### `orb-anystyle`
+* `orb-anystyle-executable`
+* `orb-anystyle-user-directory`
+* `orb-anystyle-default-buffer`
+
+###### Default command-line options
+* `orb-anystyle-find-crop`
+* `orb-anystyle-find-layout`
+* `orb-anystyle-find-solo`
+* `orb-anystyle-finder-training-set`
+* `orb-anystyle-finder-model`
+* `orb-anystyle-parser-model`
+* `orb-anystyle-parser-training-set`
+* `orb-anystyle-pdfinfo-executable`
+* `orb-anystyle-pdftotext-executable`
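+
+As a sketch, the defaults could be set once and then omitted from
+individual calls. The paths are hypothetical, and the `find` call assumes
+that a file named `paper.pdf` exists in the current directory:
+
+``` el
+;; Hypothetical paths; adjust to your system.
+(setq orb-anystyle-executable "/usr/local/bin/anystyle"
+      orb-anystyle-parser-model "/my/custom/model.mod"
+      orb-anystyle-find-layout nil)
+
+;; Run `anystyle find' on a PDF and request the `ref' output format.
+;; The parser model is taken from the variable set above, and the output
+;; goes to the buffer named by `orb-anystyle-default-buffer'.
+(orb-anystyle 'find
+  :format 'ref
+  :stdout t
+  :input "paper.pdf")
+```
+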
Community
---------------
-For help, support, or if you just want to hang out with us, you can find us here:
+For help, support, or if you just want to
+hang out with us, you can find us here:
* **IRC**: channel **#org-roam** on [freenode](https://freenode.net/kb/answer/chat)
* **Slack**: channel **#org-roam-bibtex** on [Org Roam](https://join.slack.com/t/orgroam/shared_invite/zt-deoqamys-043YQ~s5Tay3iJ5QRI~Lxg)
diff --git a/orb-anystyle.el b/orb-anystyle.el
new file mode 100644
index 0000000..0ee69aa
--- /dev/null
+++ b/orb-anystyle.el
@@ -0,0 +1,394 @@
+;;; orb-anystyle.el --- Org Roam BibTeX: Elisp interface to anystyle -*- coding: utf-8; lexical-binding: t -*-
+
+;; Copyright © 2020 Mykhailo Shevchuk
+;; Copyright © 2020 Leo Vivier
+
+;; Author: Mykhailo Shevchuk
+;; Leo Vivier
+;; URL: https://github.com/org-roam/org-roam-bibtex
+;; Keywords: org-mode, roam, convenience, bibtex, helm-bibtex, ivy-bibtex, org-ref
+;; Version: 0.2.3
+
+;; This file is NOT part of GNU Emacs.
+
+;; This program is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs; see the file COPYING. If not, write to the
+;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+;; Boston, MA 02110-1301, USA.
+
+;; N.B. This file contains code snippets adopted from other
+;; open-source projects. These snippets are explicitly marked as such
+;; in place. They are not subject to the above copyright and
+;; authorship claims.
+
+;;; Commentary:
+;;
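+;; This library provides `orb-anystyle', a keyword-argument interface
+;; to the anystyle command-line program.  A minimal usage sketch,
+;; assuming a file named "references.txt" (hypothetical) exists and
+;; contains one plain-text reference per line:
+;;
+;;   (orb-anystyle 'parse
+;;     :format 'bib
+;;     :stdout t
+;;     :input "references.txt")
+;;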
+
+;;; Code:
+;; * Library requires
+
+(require 'orb-core)
+
+(eval-when-compile
+ (require 'subr-x)
+ (require 'cl-macs))
+
+;; * Customize definitions
+
+(defcustom orb-anystyle-executable "anystyle"
+ "Anystyle executable path or program name."
+ :type '(choice (const "anystyle")
+ (file :tag "Path to executable" :must-match t))
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-pdfinfo-executable nil
+ "Path to pdfinfo executable to be passed to anystyle.
+When this is nil, anystyle will look for it in the system path."
+ :type '(choice
+ (file :tag "Path to executable")
+ (const nil))
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-pdftotext-executable nil
+ "Path to pdftotext executable to be passed to anystyle.
+When this is nil, anystyle will look for it in the system path."
+ :type '(choice
+ (file :tag "Path to executable")
+ (const nil))
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-parser-model nil
+ "Path to anystyle custom parser model."
+ :type '(choice
+ (file :tag "Path to file" :must-match t)
+ (const :tag "Built-in" nil))
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-finder-model nil
+ "Path to anystyle custom finder model."
+ :type '(choice
+ (file :tag "Path to file" :must-match t)
+ (const :tag "Built-in" nil))
+ :group 'orb-anystyle)
+
+;; --crop is currently broken upstream
+
+(defcustom orb-anystyle-find-crop nil
+ "Crop value in pt to be passed to `anystyle find'.
+An integer or a cons cell of integers."
+ :type '(choice (integer :tag "Top and bottom")
+ (cons :tag "Top, bottom, left and right"
+ (integer :tag "Top and bottom")
+ (integer :tag "Left and right"))
+ (const :tag "Do not crop" nil))
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-find-solo nil
+ "Non-nil to pass the `--solo' flag."
+ :type '(choice (const :tag "Yes" t)
+ (const :tag "No" nil))
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-find-layout nil
+ "Non-nil to pass the `--layout' flag."
+ :type '(choice (const :tag "Yes" t)
+ (const :tag "No" nil))
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-default-buffer "*Orb Anystyle Output*"
+ "Default buffer name for anystyle output."
+ :type 'string
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-user-directory
+ (concat (file-name-as-directory user-emacs-directory) "anystyle")
+ "Directory to keep anystyle user files."
+ :type 'directory
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-parser-training-set
+ (concat (file-name-as-directory orb-anystyle-user-directory) "core.xml")
+ "XML file containing parser training data."
+ :type '(file :must-match t)
+ :group 'orb-anystyle)
+
+(defcustom orb-anystyle-finder-training-set
+ (f-join (file-name-as-directory orb-anystyle-user-directory) "ttx/")
+ "Directory containing finder training data (.ttx files)."
+ :type 'directory
+ :group 'orb-anystyle)
+
+;; * Main functions
+
+;;;###autoload
+(cl-defun orb-anystyle (command
+ &key (exec orb-anystyle-executable)
+ verbose help version adapter
+ ((:finder-model fmodel) orb-anystyle-finder-model)
+ ((:parser-model pmodel) orb-anystyle-parser-model)
+ (pdfinfo orb-anystyle-pdfinfo-executable)
+ (pdftotext orb-anystyle-pdftotext-executable)
+ format stdout overwrite
+ (crop orb-anystyle-find-crop)
+ (solo orb-anystyle-find-solo)
+ (layout orb-anystyle-find-layout)
+ input output
+ (buffer orb-anystyle-default-buffer))
+ "Run anystyle COMMAND with `shell-command'.
+COMMAND and the following keyword arguments are recognized:
+
+Anystyle CLI options
+==========
+1) EXEC :exec => string (valid executable)
+- default value can be set through `orb-anystyle-executable'
+
+2) COMMAND (positional argument) => symbol or string
+- valid values: find parse help check license train
+
+3) Global options can be passed with the following keys.
+
+FMODEL :finder-model => string (valid file path)
+PMODEL :parser-model => string (valid file path)
+PDFINFO :pdfinfo => string (valid executable)
+PDFTOTEXT :pdftotext => string (valid executable)
+ADAPTER :adapter => anything
+STDOUT :stdout => boolean
+HELP :help => boolean
+VERBOSE :verbose => boolean
+VERSION :version => boolean
+OVERWRITE :overwrite => boolean
+FORMAT :format => string, symbol or list of unquoted symbols
+
+- FORMAT must be one or more output formats accepted by anystyle commands:
+ parse => bib csl json ref txt xml
+ find => bib csl json ref txt ttx xml
+- string must be space- or comma-separated, additional spaces are
+ ignored
+
+Default values for some of these options can be set globally via
+the following variables: `orb-anystyle-finder-model',
+`orb-anystyle-parser-model', `orb-anystyle-pdfinfo-executable',
+`orb-anystyle-pdftotext-executable'.
+
+4) Command options can be passed with the following keys:
+
+CROP :crop => integer or cons cell of integers
+LAYOUT :layout => boolean
+SOLO :solo => boolean
+
+- Command options are ignored for commands other than find
+- anystyle help -c flag is not supported
+
+Default values for these options can be set globally via the
+following variables: `orb-anystyle-find-crop',
+`orb-anystyle-find-layout', `orb-anystyle-find-solo'.
+
+5) INPUT :input => string (file path)
+
+6) OUTPUT :output => string (file path)
+
+`shell-command'-related options
+==========
+
+7) BUFFER :buffer => buffer-or-name
+
+- `shell-command''s OUTPUT-BUFFER
+- can be a cons cell (OUTPUT-BUFFER . ERROR-BUFFER)
+- when nil, defaults to `orb-anystyle-default-buffer'
+
+anystyle CLI command synopsis:
+anystyle [global options] command [command options] [arguments...].
+
+Homepage: https://anystyle.io
+Github: https://github.com/inukshuk/anystyle-cli
+Courtesy of its authors."
+ (declare (indent 1))
+ (let* ((commands '(find parse check train help license))
+ (exec (executable-find exec))
+ (buf (if (consp buffer) buffer (list buffer)))
+ ;; '(a b c) => "a,b,c"
+ (to-string (lambda (str)
+ (--reduce-from
+ (format "%s,%s" acc it)
+ (car str) (cdr str))))
+ ;; debug
+ ;; (anystyle-run (lambda (str)
+ ;; (message "command: %s \nbuffers: %s and %s" str (car buf) (cdr buf))))
+ (anystyle-run (lambda (str)
+ (if (eq command 'train)
+ ;; train can take minutes, so run it in a sub-process
+ (start-process-shell-command
+ "anystyle" (car buf) str)
+ (shell-command str
+ (car buf) (cdr buf)))))
+ global-options command-options anystyle)
+ ;; executable is a must
+ (unless exec
+ (user-error "Anystyle executable not found! \
+Install anystyle-cli before running Orb PDF Scrapper"))
+ ;; we process :version and :help before checking command
+ ;; since with this global flag command is not required
+ (cond
+ ;; help flag takes priority
+ (help
+ (setq global-options " --help"
+ command-options ""
+ input nil
+ output nil))
+ ;; anystyle ignores everything with --version flag except the
+ ;; --help flag, which we've just resolved above
+ (version
+ (setq global-options " --version"
+ command nil
+ command-options ""
+ input nil
+ output nil))
+ ;; otherwise command is a must
+ ((not command)
+ (user-error "Anystyle command required: \
+find, parse, check, train, help or license")))
+ (when (stringp command)
+ (setq command (intern command)))
+ ;; command must be a valid command
+ (unless (or (null command) (memq command commands))
+ (user-error "Invalid command %s. Valid commands are \
+find, parse, check, train, help and license" command))
+ ;;
+ ;; command specific arguments
+ (cl-case command
+ (help
+ (when (stringp input)
+ (setq input (intern input)))
+ (unless (or (and global-options
+ (string= global-options " --help"))
+ (memq input commands))
+ (user-error "Invalid input %s. Valid input for 'anystyle help': \
+find, parse, check, train, help or license" input)))
+ (license
+ (setq input nil
+ output nil
+ global-options ""
+ command-options ""))
+ (check
+ (setq output nil))
+ (find
+ ;; pdfinfo and pdftotext must be present in the system
+ (when (and pdfinfo (not (executable-find pdfinfo)))
+ (user-error "Executable not found: pdfinfo, %s" pdfinfo))
+ (when (and pdftotext (not (executable-find pdftotext)))
+ (user-error "Executable not found: pdftotext, %s" pdftotext))
+ (setq global-options
+ (orb--format "%s" global-options
+ " --pdfinfo=\"%s\"" pdfinfo
+ " --pdftotext=\"%s\"" pdftotext))
+ ;; Command options
+ ;; N.B. Help command accepts a command option -c but it's totally
+ ;; irrelevant for us:
+ ;;
+ ;; [COMMAND OPTIONS]
+ ;; -c - List commands one per line, to assist with shell completion
+ ;; so we do not implement it
+ ;;
+ ;; :crop value should be integer; if no value was explicitly supplied,
+ ;; use the default from `orb-anystyle-find-crop'
+ (when crop
+ (unless (consp crop)
+ (setq crop (list crop)))
+ (let ((x (car crop))
+ (y (or (cdr crop) 0)))
+ (unless (and (integerp x)
+ (integerp y))
+ (user-error "Invalid value %s,%s. Number expected" x y))
+ (setq crop (format "%s,%s" x y))))
+ ;; parse only accepts --[no]-layout, so we ignore the rest
+ ;; append command options to command
+ (setq command-options
+ (orb--format " --crop=%s" crop
+ " --layout" (cons layout " --no-layout")
+ " --solo" (cons solo " --no-solo")))))
+ ;; Arguments relevant for more than one command
+ ;;
+ ;; find, parse:
+ ;; format option should be one of accepted types if present
+ (when (and (memq command '(find parse))
+ format)
+ (when (stringp format)
+ (setq format
+ (-map #'intern
+ (split-string (string-trim format)
+ "[, ]" t " "))))
+ (unless (listp format)
+ (setq format (list format)))
+ (let ((accepted-formats
+ (cl-case command
+ (find '(bib csl json ref txt ttx xml))
+ (parse '(bib csl json ref txt xml)))))
+ (when (--none? (memq it accepted-formats) format)
+ (user-error
+ "Invalid format(s) %s. Valid formats for command %s: %s"
+ (funcall to-string format)
+ command
+ (funcall to-string accepted-formats)))
+ ;; convert format to a comma-separated string and append
+ ;; it to global options
+ (setq global-options
+ (orb--format "%s" global-options
+ " -f %s" (funcall to-string format)))))
+ ;; find, parse, check accept
+ ;; finder and parser models
+ (when (memq command '(find parse check))
+ (when (and fmodel (not (f-exists? fmodel)))
+ (display-warning 'org-roam-bibtex
+ (format "Finder model file not found: %s, \
+using the default one" fmodel))
+ (setq fmodel nil))
+ (when (and pmodel (not (f-exists? pmodel)))
+ (display-warning 'org-roam-bibtex
+ (format "Parser model file not found: %s, \
+using the default one" pmodel))
+ (setq pmodel nil))
+ (setq global-options (orb--format "%s" global-options
+ " -F \"%s\"" fmodel
+ " -P \"%s\"" pmodel)))
+ ;; find, train, parse and check:
+ ;; 1) require input, which should be a valid path
+ ;; 2) something called ruby adapter, probably a right place here
+ ;; 3) --verbose, --stdout, --overwrite if non-nil
+ (when (memq command '(find train parse check))
+ (unless input
+ (user-error "Input required for command %s" command))
+ (unless (and (stringp input) (f-exists? input))
+ (user-error "Invalid input file or directory %s" input))
+ (setq global-options
+ (orb--format
+ "%s" global-options
+ " --verbose" (cons verbose " --no-verbose")
+ ;; this flag does nothing for check
+ " --stdout" (cons stdout " --no-stdout")
+ " --adapter=\"%s\"" adapter
+ " --overwrite" (cons overwrite " --no-overwrite"))))
+ ;; Set arguments and run the program
+ ;;
+ (setq anystyle (orb--format "%s" exec
+ "%s" global-options
+ " %s" command
+ "%s" command-options
+ " \"%s\"" input
+ " \"%s\"" output))
+ (funcall anystyle-run anystyle)))
+
+(provide 'orb-anystyle)
+;;; orb-anystyle.el ends here
+;; Local Variables:
+;; fill-column: 79
+;; End:
diff --git a/orb-compat.el b/orb-compat.el
index d68de7c..dda466e 100644
--- a/orb-compat.el
+++ b/orb-compat.el
@@ -1,4 +1,4 @@
-;;; org-roam-bibtex-compat.el --- Connector between Org-roam, BibTeX-completion, and Org-ref -*- coding: utf-8; lexical-binding: t -*-
+;;; org-roam-bibtex-compat.el --- Org Roam BibTeX: Obsolete definitions -*- coding: utf-8; lexical-binding: t -*-
;; Copyright © 2020 Mykhailo Shevchuk
;; Copyright © 2020 Leo Vivier
diff --git a/orb-core.el b/orb-core.el
index cf9b2ef..b07278f 100644
--- a/orb-core.el
+++ b/orb-core.el
@@ -43,23 +43,50 @@
(require 'org-roam)
(require 'orb-utils)
+(require 'orb-compat)
+
+(eval-when-compile
+ (require 'cl-macs)
+ (require 'subr-x)
+ (require 'rx))
(declare-function
bibtex-completion-get-entry "bibtex-completion" (entry-key))
+(declare-function
+ bibtex-completion-get-value "bibtex-completion" (field entry &optional default))
(declare-function
bibtex-completion-find-pdf "bibtex-completion" (key-or-entry &optional find-additional))
-;; Customize groups
+;; * Customize groups (global)
+;; All modules should put their `defgroup' definitions here
+
(defgroup org-roam-bibtex nil
"Org-ref and Bibtex-completion integration for Org-roam."
:group 'org-roam
:prefix "orb-")
(defgroup orb-note-actions nil
- "Orb Note Actions - run actions useful in note's context."
+ "Orb Note Actions - run actions in note's context."
:group 'org-roam-bibtex
:prefix "orb-note-actions-")
+(defgroup orb-pdf-scrapper nil
+ "Orb PDF Scrapper - retrieve references from PDF."
+ :group 'org-roam-bibtex
+ :prefix "orb-pdf-scrapper-")
+
+(defgroup orb-anystyle nil
+ "Elisp interface to `anystyle-cli'."
+ :group 'org-roam-bibtex
+ :prefix "orb-anystyle-")
+
+(defgroup orb-autokey nil
+ "Automatic generation of BibTeX citation keys."
+ :group 'org-roam-bibtex
+ :prefix "orb-autokey-")
+
+;; Various utility functions
+
;;;###autoload
(defun orb-process-file-field (citekey)
"Process the 'file' BibTeX field and resolve if there are multiples.
@@ -84,6 +111,322 @@ Returns the path to the note file, or nil if it doesn’t exist."
(let* ((completions (org-roam--get-ref-path-completions)))
(plist-get (cdr (assoc citekey completions)) :path)))
+;; * Automatic generation of citation keys
+
+(defcustom orb-autokey-format "%a%y%T[4][1]"
+ "Format string for automatically generated citation keys.
+
+Supported wildcards:
+
+Basic
+==========
+
+ %a |author| - first author's (or editor's) last name
+ %t |title | - first word of title
+ %f{field} |field | - first word of arbitrary field
+ %y |year | - year YYYY
+ %p |page | - first page
+ %e{(expr)} |elisp | - execute elisp expression
+
+Extended
+==========
+
+1. Capitalized versions:
+
+ %A |author| >
+ %T |title | > Same as %a,%t,%f{field} but
+ %F{field} |field | > preserve original capitalization
+
+2. Starred versions
+
+ %a*, %A* |author| - include author's (editor's) initials
+ %t*, %T* |title | - do not ignore words in `orb-autokey-titlewords-ignore'
+ %y* |year | - year's last two digits __YY
+ %p* |page | - use \"pagetotal\" field instead of default \"pages\"
+
+3. Optional parameters
+
+ %a[N][M][D] |author| >
+ %t[N][M][D] |title | > include first N words/names
+ %f{field}[N][M][D] |field | > include at most M first characters of word/name
+ %p[D] |page | > put delimiter D between words
+
+N and M should be a single digit 1-9. Putting more digits or any
+other symbols will lead to ignoring the optional parameter and
+those following it altogether. D should be a single alphanumeric
+symbol or one of `-_.:|'.
+
+Optional parameters work both with capitalized and starred
+versions where applicable.
+
+4. Elisp expression
+
+ - can be anything
+ - should return a string or nil
+ - will be evaluated before expanding other wildcards and therefore
+can insert other wildcards
+ - will have `entry' variable bound to the value of BibTeX entry the key
+is being generated for, as returned by `bibtex-completion-get-entry'.
+The variable may be safely manipulated in a destructive manner.
+
+%e{(or (bibtex-completion-get-value \"volume\" entry) \"N/A\")}
+%e{(my-function entry)}
+
+Key generation is performed by `orb-autokey-generate-key'."
+ :risky t
+ :type 'string
+ :group 'orb-autokey)
+
+(defcustom orb-autokey-titlewords-ignore
+ '("A" "An" "On" "The" "Eine?" "Der" "Die" "Das"
+ "[^[:upper:]].*" ".*[^[:upper:][:lower:]0-9].*")
+ "Patterns from title that will be ignored during key generation.
+Every element is a regular expression to match parts of the title
+that should be ignored during automatic key generation. Case
+sensitive."
+ ;; Default value was taken from `bibtex-autokey-titleword-ignore'.
+ :type '(repeat :tag "Regular expression" regexp)
+ :group 'orb-autokey)
+
+(defcustom orb-autokey-empty-field-token "N/A"
+ "String to use when BibTeX field is nil or empty."
+ :type 'string
+ :group 'orb-autokey)
+
+(defcustom orb-autokey-invalid-symbols
+ " \"'()={},~#%\\"
+ "Characters not allowed in a BibTeX key.
+The key will be stripped of these characters."
+ :type 'string
+ :group 'orb-autokey)
+
+;;;
+;;;###autoload
+(defun orb-autokey-generate-key (entry &optional control-string)
+ "Generate citation key from ENTRY according to `orb-autokey-format'.
+Return a string. If optional CONTROL-STRING is non-nil, use it
+instead of `orb-autokey-format'."
+ (let* ((case-fold-search nil)
+ (str (or control-string orb-autokey-format))
+ ;; star regexp: group 3!
+ (star '(opt (group-n 3 "*")))
+ ;; optional parameters: regexp groups 4-6!
+ (opt1 '(opt (and "[" (opt (group-n 4 digit)) "]")))
+ (opt2 '(opt (and "[" (opt (group-n 5 digit)) "]")))
+ (opt3 '(opt (and "[" (opt (group-n 6 (any alnum "_.:|-"))) "]")))
+ ;; capital letters: regexp group 2!
+ ;; author wildcard regexp
+ (a-rx (macroexpand
+ `(rx (group-n 1 (or "%a" (group-n 2 "%A"))
+ ,star ,opt1 ,opt2 ,opt3))))
+ ;; title wildcard regexp
+ (t-rx (macroexpand
+ `(rx (group-n 1 (or "%t" (group-n 2 "%T"))
+ ,star ,opt1 ,opt2 ,opt3))))
+ ;; any field wildcard regexp
+ ;; required parameter: group 7!
+ (f-rx (macroexpand
+ `(rx (group-n 1 (or "%f" (group-n 2 "%F"))
+ (and "{" (group-n 7 (1+ letter)) "}")
+ ,opt1 ,opt2 ,opt3))))
+ ;; year wildcard regexp
+ (y-rx (rx (group-n 1 "%y" (opt (group-n 3 "*")))))
+ ;; page wildcard regexp
+ (p-rx (macroexpand `(rx (group-n 1 "%p" ,star ,opt3))))
+ ;; elisp expression wildcard regexp
+ ;; elisp sexp: group 8!
+ (e-rx (rx (group-n 1 "%e"
+ "{" (group-n 8 "(" (1+ ascii) ")") "}"))))
+ ;; Evaluating elisp expression should go first because it can produce
+ ;; additional wildcards
+ (while (string-match e-rx str)
+ (setq str (replace-match
+ (save-match-data
+ (orb--autokey-evaluate-expression
+ (match-string 8 str) entry)) t nil str 1)))
+ ;; Expansions of all other wildcards are actually
+ ;; variations of calls to `orb--autokey-format-field' with many
+ ;; commonalities, so we wrap them into a macro
+ (cl-macrolet
+ ((expand
+ (wildcard &key field value entry capital
+ starred words characters delimiter)
+ (let ((cap (or capital '(match-string 2 str)))
+ (star (or starred '(match-string 3 str)))
+ (opt1 (or words '(match-string 4 str)))
+ (opt2 (or characters '(match-string 5 str)))
+ (opt3 (or delimiter '(match-string 6 str))))
+ `(while (string-match ,wildcard str)
+ (setq str (replace-match
+ ;; we can safely pass nil values
+ ;; `orb--autokey-format-field' should
+ ;; handle them correctly
+ (orb--autokey-format-field ,field
+ :entry ,entry :value ,value
+ :capital ,cap :starred ,star
+ :words ,opt1 :characters ,opt2 :delimiter ,opt3)
+ t nil str 1))))))
+ ;; Handle author wildcards
+ (expand a-rx
+ :field "=name="
+ :value (or (bibtex-completion-get-value "author" entry)
+ (bibtex-completion-get-value "editor" entry)))
+ ;; Handle title wildcards
+ (expand t-rx
+ :field "title"
+ :value (or (bibtex-completion-get-value "title" entry) ""))
+ ;; Handle custom field wildcards
+ (expand f-rx
+ :field (match-string 7 str)
+ :entry entry)
+ ;; Handle pages wildcards %p*[-]
+ (expand p-rx
+ :field (if (match-string 3 str)
+ "pagetotal" "pages")
+ :entry entry
+ :words "1"))
+ ;; Handle year wildcards
+ ;; it's simple, so we do not use `orb--autokey-format-field' here
+ ;; year should be well-formed: YYYY
+ ;; TODO: put year into cl-macrolet
+ (let ((year (or (bibtex-completion-get-value "year" entry)
+ (bibtex-completion-get-value "date" entry))))
+ (if (or (not year)
+ (string-empty-p year)
+ (string= year orb-autokey-empty-field-token))
+ (while (string-match y-rx str)
+ (setq str (replace-match orb-autokey-empty-field-token
+ t nil str 1)))
+ (while (string-match y-rx str)
+ (setq year (format "%04d" (string-to-number year))
+ str (replace-match
+ (format "%s" (if (match-string 3 str)
+ (substring year 2 4)
+ (substring year 0 4)))
+ t nil str 1)))))
+ str))
+
+(defun orb--autokey-format-field (field &rest specs)
+ "Return BibTeX FIELD formatted according to plist SPECS.
+
+Recognized keys:
+==========
+:entry - BibTeX entry to use
+:value - Value of BibTeX field to use
+ instead of retrieving it from :entry
+:capital - capitalized version
+:starred - starred version
+:words - first optional parameter (number of words)
+:characters - second optional parameter (number of characters)
+:delimiter - third optional parameter (delimiter)
+
+All values should be strings, including those representing numbers.
+
+This function is used internally by `orb-autokey-generate-key'."
+ (declare (indent 1))
+ (-let* (((&plist :entry entry
+ :value value
+ :capital capital
+ :starred starred
+ :words words
+ :characters chars
+ :delimiter delim) specs)
+ ;; field values will be split into a list of words. `separator' is a
+ ;; regexp for word separators: either a whitespace, one or more
+ ;; dashes, or en dash, or em dash
+ (separator "\\([ \n\t]\\|[-]+\\|[—–]\\)")
+ (invalid-chars-rx
+ (rx-to-string `(any ,orb-autokey-invalid-symbols) t))
+ (delim (or delim ""))
+ result)
+ ;; 0. virtual field "=name=" is used internally here and in
+ ;; `orb-autokey-generate-key'; it stands for author or editor
+ (if (string= field "=name=")
+ ;; in name fields, logical words are full names consisting of several
+ ;; words and containing spaces and punctuation, separated by a logical
+ ;; separator, the word "and"
+ (setq separator " and "
+ value (or value
+ (bibtex-completion-get-value "author" entry)
+ (bibtex-completion-get-value "editor" entry)))
+ ;; otherwise proceed with value or get it from entry
+ (setq value (or value
+ (bibtex-completion-get-value field entry))))
+ (if (or (not value)
+ (string-empty-p value))
+ (setq result orb-autokey-empty-field-token)
+ (when (> (length value) 0)
+ (save-match-data
+ ;; 1. split field into words
+ (setq result (split-string value separator t "[ ,.;:-]+"))
+ ;; 1a) only for title;
+ ;; STARRED = include words from `orb-autokey-titlewords-ignore'
+ ;; unstarred version filters the keywords, starred ignores this block
+ (when (and (string= field "title")
+ (not starred))
+ (let ((ignore-rx (concat "\\`\\(?:"
+ (mapconcat #'identity
+ orb-autokey-titlewords-ignore
+ "\\|") "\\)\\'"))
+ (words ()))
+ (setq result (dolist (word result (nreverse words))
+ (unless (string-match-p ignore-rx word)
+ (push word words))))))
+ ;; 2. take number of words equal to WORDS if that is set
+ ;; or just the first word; also 0 = 1.
+ (if words
+ (setq words (string-to-number words)
+ result (-take (if (> words (length result))
+ (length result)
+ words)
+ result))
+ (setq result (list (car result))))
+ ;; 2a) only for "=name=" field, i.e. author or editor
+ ;; STARRED = include initials
+ (when (string= field "=name=")
+ ;; NOTE: here we expect name field 'Doe, J. B.'
+ ;; should ideally be able to handle 'Doe, John M. Longname, Jr'
+ (let ((r-x (if starred
+ "[ ,.\t\n]"
+ "\\`\\(.*?\\),.*\\'"))
+ (rep (if starred "" "\\1"))
+ (words ()))
+ (setq result
+ (dolist (name result (nreverse words))
+ (push (s-replace-regexp r-x rep name) words)))))
+ ;; 3. take at most CHARS number of characters from every word
+ (when chars
+ (let ((words ()))
+ (setq chars (string-to-number chars)
+ result (dolist (word result (nreverse words))
+ (push
+ (substring word 0
+ (if (< chars (length word))
+ chars
+ (length word)))
+ words)))))
+ ;; 4. almost there: concatenate words, include DELIMiter
+ (setq result (mapconcat #'identity result delim))
+ ;; 5. CAPITAL = preserve case
+ (unless capital
+ (setq result (downcase result))))))
+ ;; return result stripped of the invalid characters
+ (s-replace-regexp invalid-chars-rx "" result t)))
+
+(defun orb--autokey-evaluate-expression (expr &optional entry)
+ "Evaluate arbitrary elisp EXPR passed as readable string.
+The expression will have value of ENTRY bound to `entry' variable
+at its disposal. ENTRY should be a BibTeX entry as returned by
+`bibtex-completion-get-entry'. The result returned should be a
+string or nil."
+ (let ((result (eval `(let ((entry (quote ,(copy-tree entry))))
+ ,(read expr)))))
+ (unless (or (stringp result)
+ (not result))
+ (user-error "Result: %s, invalid type. \
+Expression must return a string or nil" result))
+ (or result "")))
+
(provide 'orb-core)
;;; orb-core.el ends here
;; Local Variables:
diff --git a/orb-note-actions.el b/orb-note-actions.el
index 97841cc..efc21b3 100644
--- a/orb-note-actions.el
+++ b/orb-note-actions.el
@@ -54,6 +54,8 @@
(declare-function org-ref-format-entry "org-ref-bibtex" (key))
+(declare-function orb-pdf-scrapper-run "orb-pdf-scrapper" (key))
+
;; * Customize definitions
(defcustom orb-note-actions-frontend 'default
@@ -94,7 +96,8 @@ Each action is a cons cell DESCRIPTION . FUNCTION."
:group 'orb-note-actions)
(defcustom orb-note-actions-extra
- '(("Save citekey to kill-ring and clipboard" . orb-note-actions-copy-citekey))
+ '(("Save citekey to kill-ring and clipboard" . orb-note-actions-copy-citekey)
+ ("Run Orb PDF Scrapper" . orb-note-actions-scrap-pdf))
"Extra actions for `orb-note-actions'.
Each action is a cons cell DESCRIPTION . FUNCTION."
:risky t
@@ -127,7 +130,7 @@ CANDIDATES. NAME is a string formatted with
constructed from `orb-note-actions-default',
`orb-note-actions-extra', and `orb-note-actions-user."
(declare (indent 1) (debug (symbolp &rest form)))
- (let* ((frontend-name (symbol-name (eval frontend)))
+ (let* ((frontend-name (symbol-name frontend))
(fun-name (intern (concat "orb-note-actions--" frontend-name))))
`(defun ,fun-name (citekey)
,(format "Provide note actions using %s interface.
@@ -140,18 +143,18 @@ CITEKEY is the citekey." (capitalize frontend-name))
orb-note-actions-user))))
,@body))))
-(orb-note-actions--frontend! 'default
+(orb-note-actions--frontend! default
(let ((f (cdr (assoc (completing-read name candidates) candidates))))
(funcall f (list citekey))))
-(orb-note-actions--frontend! 'ido
+(orb-note-actions--frontend! ido
(let* ((c (cl-map 'list 'car candidates))
(f (cdr (assoc (ido-completing-read name c) candidates))))
(funcall f (list citekey))))
(declare-function orb-note-actions-hydra/body "orb-note-actions" nil t)
-(orb-note-actions--frontend! 'hydra
+(orb-note-actions--frontend! hydra
;; we don't use candidates here because for a nice hydra we need each
;; group of completions separately (default, extra, user), so just
;; silence the compiler
@@ -187,7 +190,7 @@ CITEKEY is the citekey." (capitalize frontend-name))
Falling back to default.")
(orb-note-actions--default citekey)))
-(orb-note-actions--frontend! 'ivy
+(orb-note-actions--frontend! ivy
(if (fboundp 'ivy-read)
(ivy-read name
candidates
@@ -199,7 +202,7 @@ Falling back to default.")
Falling back to default.")
(orb-note-actions--default citekey)))
-(orb-note-actions--frontend! 'helm
+(orb-note-actions--frontend! helm
(if (fboundp 'helm)
(helm :sources
`(((name . ,name)
@@ -218,13 +221,19 @@ Falling back to default.")
;; * Note actions
(defun orb-note-actions-copy-citekey (citekey)
- "Save note's citekey to `kill-ring' and copy it to clipboard.
-Since CITEKEY is actually a list of one element, the car of the
-list is used."
+ "Save note's citation key to `kill-ring' and copy it to clipboard.
+CITEKEY is a list whose car is a citation key."
(with-temp-buffer
(insert (car citekey))
(copy-region-as-kill (point-min) (point-max))))
+(defun orb-note-actions-scrap-pdf (citekey)
+ "Wrapper around `orb-pdf-scrapper-run'.
+CITEKEY is a list whose car is a citation key."
+ (require 'orb-pdf-scrapper)
+ (orb-pdf-scrapper-run (car citekey)))
+
+
;; * Main functions
;;;###autoload
diff --git a/orb-pdf-scrapper.el b/orb-pdf-scrapper.el
new file mode 100644
index 0000000..33362c8
--- /dev/null
+++ b/orb-pdf-scrapper.el
@@ -0,0 +1,1133 @@
+;;; orb-pdf-scrapper.el --- Org Roam BibTeX: PDF reference scrapper -*- coding: utf-8; lexical-binding: t -*-
+
+;; Copyright © 2020 Mykhailo Shevchuk
+;; Copyright © 2020 Leo Vivier
+
+;; Author: Mykhailo Shevchuk
+;; Leo Vivier
+;; URL: https://github.com/org-roam/org-roam-bibtex
+;; Keywords: org-mode, roam, convenience, bibtex, helm-bibtex, ivy-bibtex, org-ref
+;; Version: 0.2.3
+
+;; This file is NOT part of GNU Emacs.
+
+;; This program is free software; you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation; either version 3, or (at your option)
+;; any later version.
+;;
+;; This program is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+;; GNU General Public License for more details.
+;;
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs; see the file COPYING. If not, write to the
+;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
+;; Boston, MA 02110-1301, USA.
+
+;; N.B. This file contains code snippets adapted from other
+;; open-source projects. These snippets are explicitly marked as such
+;; in place. They are not subject to the above copyright and
+;; authorship claims.
+
+;;; Commentary:
+;;
+
+;;; Code:
+;; * Library requires
+
+(require 'orb-core)
+(require 'orb-anystyle)
+
+;; it's fine here since `orb-pdf-scrapper' is autoloaded
+(require 'bibtex-completion)
+
+(require 'bibtex)
+(require 'rx)
+(require 'cl-extra)
+
+(eval-when-compile
+ (require 'cl-lib)
+ (require 'cl-macs)
+ (require 'subr-x))
+
+(declare-function bibtex-set-field "org-ref" (field value &optional nodelim))
+
+;; * Customize definitions
+
+;; TODO: make these defcustom
+
+(defcustom orb-pdf-scrapper-refsection-headings
+ '((parent "References")
+ (in-roam "In Org Roam database" list)
+ (in-bib "In BibTeX file" list)
+ (valid "Valid citation keys" table)
+ (invalid "Invalid citation keys" table))
+ "Determines appearance of Org-mode data generated by Orb PDF Scrapper.
+A list of five elements of form (GROUP TITLE TYPE).
+
+GROUP must be one of the symbols `parent', `in-roam', `in-bib',
+`valid' or `invalid'.
+
+TITLE is an arbitrary string, which will be the title of the
+group's headline.
+
+TYPE must be one of the symbols `list' or `table' determining how
+the generated citations will appear under the group's headline.
+TYPE is ignored for the `parent' group and defaults to `list' for
+other groups when set to nil."
+ :type '(list (list :tag "Parent headline"
+ (const :format "" parent)
+ (string :tag "Title"))
+ (list :tag "\nIn-roam"
+ (const :format "" in-roam)
+ (string :tag "Title")
+ (radio :tag "Type" :value list
+ (const list) (const table)))
+ (list :tag "\nIn-bib"
+ (const :format "" in-bib)
+ (string :tag "Title")
+ (radio :tag "Type" :value list
+ (const list) (const table)))
+ (list :tag "\nValid"
+ (const :format "" valid)
+ (string :tag "Title")
+ (radio :tag "Type" :value table
+ (const list) (const table)))
+ (list :tag "\nInvalid"
+ (const :format "" invalid)
+ (string :tag "Title")
+ (radio :tag "Type" :value table
+ (const list) (const table))))
+ :group 'orb-pdf-scrapper)
+
+(defcustom orb-pdf-scrapper-set-fields
+ '(("author" orb-pdf-scrapper--invalidate-nil-value)
+ ("editor" orb-pdf-scrapper--invalidate-nil-value
+ "book" "collection")
+ ("title" orb-pdf-scrapper--invalidate-nil-value)
+ ("journal" orb-pdf-scrapper--invalidate-nil-value
+ "article")
+ ("date" orb-pdf-scrapper--invalidate-nil-value)
+ ("volume" orb-pdf-scrapper--invalidate-nil-value
+ "article" "incollection")
+ ("pages" orb-pdf-scrapper--fix-or-invalidate-range
+ "article" "incollection"))
+ "BibTeX fields to set during key generation.
+A list in which each element is of the form (FIELD FUNCTION . TYPES).
+
+FIELD is a BibTeX field name to be set.
+
+FUNCTION is a function that will be called to generate the value,
+it takes two arguments, FIELD and ENTRY, which is the current entry.
+
+TYPES is a list of strings corresponding to BibTeX entry types
+for which the FIELD should be set. If it is nil, set the FIELD
+for all entry types."
+ :risky t
+ :type '(repeat
+ (list :tag "Item"
+ (string :tag "Field")
+ (function :tag "Function")
+ (repeat :tag "Entry types" :inline t
+ (string :tag "Type"))))
+ :group 'orb-pdf-scrapper)
+(defcustom orb-pdf-scrapper-export-fields
+ '("author" "editor" "journal" "date" "volume" "pages")
+ "BibTeX fields to export into Org mode table.
+A list of strings, each of which is the name of a BibTeX field
+to export.
+
+The fields are exported for all entry types, in the order given.
+
+For the \"author\" and \"editor\" fields, at most the first three
+names are exported."
+ :type '(repeat (string :tag "Field"))
+ :group 'orb-pdf-scrapper)
+
+(defcustom orb-pdf-scrapper-invalid-key-pattern "\\`.*N/A.*\\'"
+ "Regexp to match an invalid key."
+ :type 'regexp
+ :group 'orb-pdf-scrapper)
+
+;; * Helper functions: citekey generation
+
+(defvar orb-pdf-scrapper--refs nil)
+
+(defun orb-pdf-scrapper--invalidate-nil-value (field entry)
+ "Return value of FIELD or `orb-autokey-empty-field-token' if it is nil.
+ENTRY is a BibTeX entry."
+ (bibtex-completion-get-value field entry orb-autokey-empty-field-token))
+
+(defun orb-pdf-scrapper--fix-or-invalidate-range (field entry)
+ "Replace missing or non-standard delimiter between two strings with \"--\".
+FIELD is the name of a BibTeX field from ENTRY. Return
+`orb-autokey-empty-field-token' if the value is nil.
+
+This function is primarily intended for fixing anystyle parsing
+artefacts such as those often encountered in \"pages\" field,
+where two numbers have only spaces between them."
+ (replace-regexp-in-string "\\`[[:alnum:]]*?\\([- –]+\\)[[:alnum:]]*\\'"
+ "--"
+ (bibtex-completion-get-value
+ field entry orb-autokey-empty-field-token)
+ nil nil 1))
+
+(defun orb-pdf-scrapper--get-entry-info (entry &optional collect-only)
+ "Collect some information from and about the BibTeX ENTRY for further use.
+Take a bibtex entry as returned by `bibtex-completion-get-entry'
+and return a plist with the following keys set:
+
+:key |string | citekey generated with `orb-autokey-generate-key'
+:validp |boolean| according to `orb-pdf-scrapper-invalid-key-pattern'
+:set-fields |(cons) | as per `orb-pdf-scrapper-set-fields'
+:export-fields |(cons) | as per `orb-pdf-scrapper-export-fields'
+
+Each element of `:set-fields' and `:export-fields' lists is a
+cons cell (FIELD . VALUE).
+
+If optional COLLECT-ONLY is non-nil, do not generate the key,
+`:set-fields' is set to nil."
+ (let ((type (bibtex-completion-get-value "=type=" entry))
+ ;; return values
+ key validp set-fields export-fields
+ ;; internal variable
+ fields)
+ ;; when requested to collect keys, just do that
+ (if collect-only
+ (setq key (bibtex-completion-get-value "=key=" entry)
+ fields entry)
+ ;; otherwise
+ ;; prepare fields for setting
+ (dolist (set-field orb-pdf-scrapper-set-fields)
+ (let ((field-name (car set-field))
+ (export-types (cddr set-field)))
+ ;; push the field for setting only when entry type is one of the
+ ;; specified types or nil, which means set the field regardless of
+ ;; entry type
+ (when (or (not export-types)
+ (member type export-types))
+ (push (cons field-name
+ ;; call the function if provided
+ (if-let ((fn (cadr set-field)))
+ (funcall fn field-name entry)
+ ;; otherwise get the value from current entry
+ (bibtex-completion-get-value field-name entry "")))
+ set-fields))))
+ ;; prioritize fields from set-fields over entry fields
+ ;; for autokey generation
+ (let ((-compare-fn (lambda (x y)
+ (string= (car x) (car y)))))
+ (setq fields (-union set-fields entry)
+ key (orb-autokey-generate-key fields))))
+ ;; validate the new shiny key (or the old existing one)
+ ;; not sure if save-match-data is needed here
+ ;; but it seems to be always a good choice
+ (save-match-data
+ (setq validp (and (not (string-match-p
+ orb-pdf-scrapper-invalid-key-pattern key))
+ t)))
+ ;; list fields for org export
+ (dolist (field orb-pdf-scrapper-export-fields)
+ (let ((value (bibtex-completion-get-value field fields "")))
+ ;; truncate author list to first three names, append "et.al." instead
+ ;; of the remaining names
+ ;; This is a hard-coded "reasonable default"
+ ;; and it may be replaced with something more
+ ;; flexible in the future
+ (when (or (string= field "author")
+ (string= field "editor"))
+ (setq value (split-string value " and " t "[ ,.;:-]+")
+ value (if (> (length value) 3)
+ (append (-take 3 value) '("et.al."))
+ value)
+ value (concat (mapconcat #'identity value "; "))))
+ (push (cons field value) export-fields)))
+ ;; return the plist
+ (list :key key
+ :validp validp
+ :set-fields set-fields
+ :export-fields (nreverse export-fields))))
+
+(defun orb-pdf-scrapper--update-record-at-point (&optional collect-only)
+ "Generate citation key and update the BibTeX record at point.
+Calls `orb-pdf-scrapper--get-entry-info' to get information about
+BibTeX record at point and updates it accordingly. If optional
+COLLECT-ONLY is non-nil, do not generate the key and do not set
+the fields.
+
+This is an auxiliary function for command
+`orb-pdf-scrapper-generate-keys'."
+ (let* ((entry (parsebib-read-entry (parsebib-find-next-item)))
+ (key-plist (orb-pdf-scrapper--get-entry-info entry collect-only))
+ (new-key (plist-get key-plist :key))
+ (validp (plist-get key-plist :validp))
+ (fields-to-set (plist-get key-plist :set-fields))
+ (formatted-entry (plist-get key-plist :export-fields)))
+ (unless collect-only
+ (save-excursion
+ ;; update citekey
+ ;; adjusted from bibtex-clean-entry
+ (bibtex-beginning-of-entry)
+ (re-search-forward bibtex-entry-maybe-empty-head)
+ (if (match-beginning bibtex-key-in-head)
+ (delete-region (match-beginning bibtex-key-in-head)
+ (match-end bibtex-key-in-head)))
+ (insert new-key)
+ ;; set the bibtex fields
+ (when fields-to-set
+ (dolist (field fields-to-set)
+ (bibtex-set-field (car field) (cdr field))))))
+ ;; return the result (NEW-KEY FORMATTED-ENTRY . VALIDP)
+ ;; TODO: for testing until implemented
+ (cons new-key (cons formatted-entry validp))))
+
+(defun orb-pdf-scrapper--sort-refs (refs)
+ "Sort references REFS.
+Auxiliary function for `orb-pdf-scrapper-generate-keys'.
+REFS should be a list of items of form (CITEKEY FORMATTED-ENTRY . VALIDP).
+
+References marked valid by `orb-pdf-scrapper-keygen-function' function
+are further sorted into four groups:
+
+'in-roam - available in the `org-roam' database;
+'in-bib - available in `bibtex-completion-bibliography' file(s);
+'valid - marked valid by the keygen function but are not
+available in the user databases;
+'invalid - marked invalid by the keygen function."
+ (let* ((bibtex-completion-bibliography (orb-pdf-scrapper--get :global-bib))
+ ;; When using a quoted list here, sorted-refs is not erased in
+ ;; consecutive runs
+ (sorted-refs (list (list 'in-roam) (list 'in-bib)
+ (list 'valid) (list 'invalid))))
+ (dolist (ref refs)
+ (cond ((org-roam-db-query [:select [ref]
+ :from refs
+ :where (= ref $s1)]
+ (format "%s" (car ref)))
+ (push
+ (cons (format "cite:%s" (car ref)) (cadr ref))
+ (cdr (assoc 'in-roam sorted-refs))))
+ ((bibtex-completion-get-entry (car ref))
+ (push
+ (cons (format "cite:%s" (car ref)) (cadr ref))
+ (cdr (assoc 'in-bib sorted-refs))))
+ ((cddr ref)
+ (push
+ (cons (format "cite:%s" (car ref)) (cadr ref))
+ (cdr (assoc 'valid sorted-refs))))
+ (t
+ (push
+ (cons (format "cite:%s" (car ref)) (cadr ref))
+ (cdr (assoc 'invalid sorted-refs))))))
+ sorted-refs))
+
+;; * Helper functions: dispatcher
+
+(defvar orb-pdf-scrapper--plist nil
+ "Communication channel for Orb PDF Scrapper.")
+
+(defvar orb-pdf-scrapper--buffer "*Orb PDF Scrapper*"
+ "Orb PDF Scrapper special buffer.")
+
+(defmacro orb--with-scrapper-buffer! (&rest body)
+ "Execute BODY with `orb-pdf-scrapper--buffer' as current.
+If the buffer does not exist it will be created."
+ (declare (indent 0) (debug t))
+ `(save-current-buffer
+ (set-buffer (get-buffer-create orb-pdf-scrapper--buffer))
+ ,@body))
+
+(defmacro orb--when-current-context! (context &rest body)
+ "Execute BODY if CONTEXT is current context.
+Run `orb-pdf-scrapper-dispatcher' with `error' context
+otherwise. If CONTEXT is a list then current context must be a
+member of that list."
+ (declare (indent 1) (debug t))
+ `(if (not (orb-pdf-scrapper--current-context-p ,context))
+ (orb-pdf-scrapper-dispatcher 'error)
+ ,@body))
+
+(defun orb-pdf-scrapper--current-context-p (context)
+ "Return t if CONTEXT is current context.
+CONTEXT can also be a list, in which case non-nil is returned when
+current context is its member."
+ (if (listp context)
+ (memq (orb-pdf-scrapper--get :context) context)
+ (eq (orb-pdf-scrapper--get :context) context)))
+
+(defun orb-pdf-scrapper--refresh-mode (mode)
+ "Restart `orb-pdf-scrapper-mode' with new major MODE."
+ (cl-case mode
+ (txt
+ (text-mode)
+ (orb-pdf-scrapper--put :callee 'edit-bib
+ :context 'start
+ :caller 'edit-txt))
+ (bib
+ (bibtex-mode)
+ ;; anystyle uses biblatex dialect
+ (bibtex-set-dialect 'biblatex t)
+ (orb-pdf-scrapper--put :callee 'edit-org
+ :context 'start
+ :caller 'edit-bib))
+ (org
+ (org-mode)
+ (orb-pdf-scrapper--put :callee 'checkout
+ :context 'start
+ :caller 'edit-org))
+ (xml
+ (xml-mode)
+ (cl-case (orb-pdf-scrapper--get :context)
+ ;; since :callee is not used in training session, we set :callee here to
+ ;; the original :caller, so that we can return to the editing mode we
+ ;; were called from if the training session is to be cancelled
+ (start
+ (orb-pdf-scrapper--put :callee (orb-pdf-scrapper--get :caller)
+ :context 'edit
+ :caller 'edit-xml))))
+ (train
+ (fundamental-mode)
+ (cl-case (orb-pdf-scrapper--get :context)
+ (train
+ (orb-pdf-scrapper--put :context 'train
+ :caller 'train))
+ ;; Since the session was not cancelled, we return to text, as everything
+ ;; else should be regenerated anyway.
+ (finished
+ (orb-pdf-scrapper--put :callee 'edit-txt
+ :context 'continue
+ :caller 'train))))
+ (t
+ (unwind-protect
+ (error "Oops...something went wrong. \
+Pressing the RED button, just in case")
+ (orb-pdf-scrapper-dispatcher 'error))))
+ (set-buffer-modified-p nil)
+ (setq mark-active nil)
+ (orb-pdf-scrapper-mode -1)
+ (orb-pdf-scrapper-mode +1)
+ (goto-char (point-min)))
+
+(defun orb-pdf-scrapper--edit-txt ()
+ "Edit text references in `orb-pdf-scrapper--buffer'."
+ ;; callee will be overridden in case of error
+ (cl-case (orb-pdf-scrapper--get :context)
+ ;; parse pdf file and switch to text editing mode
+ (start
+ (let ((temp-txt (orb--temp-file "orb-pdf-scrapper-" ".txt"))
+ (pdf-file (orb-pdf-scrapper--get :pdf-file)))
+ (orb-pdf-scrapper--put :temp-txt temp-txt)
+ (let ((same-window-buffer-names (list orb-pdf-scrapper--buffer)))
+ (pop-to-buffer orb-pdf-scrapper--buffer))
+ (setq buffer-file-name nil)
+ (orb--with-message! (format "Scrapping %s.pdf" (f-base pdf-file))
+ (erase-buffer)
+ (orb-anystyle 'find
+ :format 'ref
+ :layout nil
+ :finder-model orb-anystyle-finder-model
+ :input pdf-file
+ :stdout t
+ :buffer orb-pdf-scrapper--buffer))
+ (setq buffer-undo-list nil)
+ (orb-pdf-scrapper--refresh-mode 'txt)))
+ ;; read the previously generated text file
+ (continue
+ (if-let ((temp-txt (orb-pdf-scrapper--get :temp-txt))
+ (_ (f-exists? temp-txt)))
+ (progn
+ (pop-to-buffer orb-pdf-scrapper--buffer)
+ (erase-buffer)
+ (insert-file-contents temp-txt)
+ (setq buffer-undo-list (orb-pdf-scrapper--get :txt-undo-list))
+ (orb-pdf-scrapper--refresh-mode 'txt))
+ (orb-pdf-scrapper-dispatcher 'error)))
+ (t
+ (orb-pdf-scrapper-dispatcher 'error))))
+
+(defun orb-pdf-scrapper--edit-bib ()
+ "Generate and edit BibTeX data in `orb-pdf-scrapper--buffer'."
+ (pop-to-buffer orb-pdf-scrapper--buffer)
+ (cl-case (orb-pdf-scrapper--get :context)
+ (start
+ (let* ((temp-bib (or (orb-pdf-scrapper--get :temp-bib)
+ (orb--temp-file "orb-pdf-scrapper-" ".bib"))))
+ (orb-pdf-scrapper--put :temp-bib temp-bib)
+ ;; save previous progress in txt buffer
+ (write-region (orb--buffer-string)
+ nil (orb-pdf-scrapper--get :temp-txt) nil -1)
+ (orb-pdf-scrapper--put :txt-undo-list (copy-tree buffer-undo-list))
+ (orb--with-message! "Generating BibTeX data"
+ ;; Starting from Emacs 27, whether shell-command erases buffer
+ ;; is controlled by `shell-command-dont-erase-buffer', so we
+ ;; make sure the buffer is clean
+ (erase-buffer)
+ (orb-anystyle 'parse
+ :format 'bib
+ :parser-model orb-anystyle-parser-model
+ :input (orb-pdf-scrapper--get :temp-txt)
+ :stdout t
+ :buffer orb-pdf-scrapper--buffer)
+ (write-region (orb--buffer-string) nil temp-bib nil -1))
+ (setq buffer-undo-list nil))
+ (orb-pdf-scrapper--refresh-mode 'bib))
+ (continue
+ (if-let ((temp-bib (orb-pdf-scrapper--get :temp-bib))
+ (_ (f-exists? temp-bib)))
+ (progn
+ (erase-buffer)
+ (insert-file-contents temp-bib)
+ (setq buffer-undo-list (orb-pdf-scrapper--get :bib-undo-list))
+ (orb-pdf-scrapper--refresh-mode 'bib))
+ (orb-pdf-scrapper-dispatcher 'error)))
+ (t
+ (orb-pdf-scrapper-dispatcher 'error))))
+
+(defun orb-pdf-scrapper--insert-org-as-list (ref-alist)
+ "Insert REF-ALIST as Org-mode list."
+ (dolist (ref ref-alist)
+ (insert "- " (car ref) "\n" )))
+
+(defun orb-pdf-scrapper--insert-org-as-table (ref-alist)
+ "Insert REF-ALIST as Org-mode table."
+ (insert
+ (concat "|citekey|"
+ (mapconcat #'identity
+ orb-pdf-scrapper-export-fields "|")
+ "|\n"))
+ (forward-line -1)
+ (org-table-insert-hline)
+ (forward-line 2)
+ (let ((table ""))
+ (dolist (ref ref-alist)
+ (setq table
+ (format "%s|%s|%s|\n" table (car ref)
+ (mapconcat
+ (lambda (field)
+ (bibtex-completion-get-value field (cdr ref) ""))
+ orb-pdf-scrapper-export-fields "|"))))
+ (insert table))
+ (forward-line -1)
+ (org-table-align))
+
+(defun orb-pdf-scrapper--edit-org ()
+ "Edit generated Org-mode data."
+ (pop-to-buffer orb-pdf-scrapper--buffer)
+ (cl-case (orb-pdf-scrapper--get :context)
+ (start
+ ;; if the BibTeX buffer was modified, save it and maybe generate keys
+ (orb-pdf-scrapper-generate-keys
+ nil
+ (if (buffer-modified-p)
+ ;; TODO: it's clumsy
+ ;; not "yes" means generate
+ ;; not "no" means collect only
+ (not (y-or-n-p "Generate BibTeX keys? "))
+ t))
+ (when (> (cl-random 100) 98)
+ (orb--with-message! "Pressing the RED button"))
+ (write-region (orb--buffer-string)
+ nil (orb-pdf-scrapper--get :temp-bib) nil 1)
+ (orb-pdf-scrapper--put :bib-undo-list (copy-tree buffer-undo-list))
+ ;; generate Org-mode buffer
+ (let* ((temp-org (or (orb-pdf-scrapper--get :temp-org)
+ (orb--temp-file "orb-pdf-scrapper-" ".org"))))
+ (orb-pdf-scrapper--put :temp-org temp-org
+ :caller 'edit-org)
+ ;; we must change the mode in the beginning to get all the Org-mode
+ ;; facilities
+ (orb-pdf-scrapper--refresh-mode 'org)
+ (orb--with-message! "Generating Org data"
+ (erase-buffer)
+ ;; insert parent heading
+ (org-insert-heading nil nil t)
+ (insert
+ (concat
+ (cadr (assoc 'parent orb-pdf-scrapper-refsection-headings))
+ " (retrieved by Orb PDF Scrapper from "
+ (f-filename (orb-pdf-scrapper--get :pdf-file)) ")"))
+ (org-end-of-subtree)
+ ;; insert child headings: in-roam, in-bib, valid, invalid
+ (dolist (ref-group
+ (orb-pdf-scrapper--sort-refs orb-pdf-scrapper--refs))
+ (when-let* ((group (car ref-group))
+ (refs (cdr ref-group))
+ (heading
+ (cdr (assoc group
+ orb-pdf-scrapper-refsection-headings)))
+ (title (car heading))
+ (type (cadr heading)))
+ (org-insert-heading '(16) nil t)
+ ;; insert heading
+ (insert (format "%s\n" title))
+ (org-demote)
+ (org-end-of-subtree)
+ ;; insert references
+ (insert (format "\n#+name: %s\n" group))
+ (cl-case type
+ ('table
+ (orb-pdf-scrapper--insert-org-as-table refs))
+ (t
+ (orb-pdf-scrapper--insert-org-as-list refs)))))
+ (write-region (orb--buffer-string) nil temp-org nil -1)
+ (setq buffer-undo-list nil)
+ (set-buffer-modified-p nil)
+ (goto-char (point-min)))))
+ ('continue
+ (if-let ((temp-org (orb-pdf-scrapper--get :temp-org))
+ (f-exists? temp-org))
+ (progn
+ (erase-buffer)
+ (insert-file-contents temp-org)
+ (setq buffer-undo-list (orb-pdf-scrapper--get :org-undo-list))
+ (orb-pdf-scrapper--refresh-mode 'org))
+ (orb-pdf-scrapper-dispatcher 'error)))))
+
+(defun orb-pdf-scrapper--edit-xml ()
+ "Edit XML data."
+ (pop-to-buffer orb-pdf-scrapper--buffer)
+ (cl-case (orb-pdf-scrapper--get :context)
+ ('start
+ (let* ((temp-xml (or (orb-pdf-scrapper--get :temp-xml)
+ (orb--temp-file "orb-pdf-scrapper-" ".xml"))))
+ (orb-pdf-scrapper--put :temp-xml temp-xml)
+ (orb--with-message! "Generating XML data"
+ ;; Save progress in text mode when called from there. If called from
+ ;; anywhere else, text mode progress is already saved and other data
+ ;; will be re-generated anyway.
+ (when (eq (orb-pdf-scrapper--get :caller) 'edit-txt)
+ (write-region (orb--buffer-string)
+ nil (orb-pdf-scrapper--get :temp-txt) nil -1)
+ (orb-pdf-scrapper--put :txt-undo-list (copy-tree buffer-undo-list)))
+ (erase-buffer)
+ (orb-anystyle 'parse
+ :format 'xml
+ :parser-model orb-anystyle-parser-model
+ :input (orb-pdf-scrapper--get :temp-txt)
+ :stdout t
+ :buffer orb-pdf-scrapper--buffer)
+ (write-region (orb--buffer-string) nil temp-xml nil -1)
+ (setq buffer-undo-list nil)
+ (orb-pdf-scrapper--refresh-mode 'xml))))
+ ('edit-master
+ (progn
+ (erase-buffer)
+ (insert-file-contents orb-anystyle-parser-training-set)
+ ;; we allow the user to see which file they are editing
+ (setq buffer-file-name orb-anystyle-parser-training-set)
+ (setq buffer-undo-list nil)
+ (orb-pdf-scrapper--refresh-mode 'xml)))
+ (t
+ (orb-pdf-scrapper-dispatcher 'error))))
+
+(defun orb-pdf-scrapper--update-master-file ()
+ "Append generated XML data to `orb-anystyle-parser-training-set'."
+ (orb--with-scrapper-buffer!
+ (orb--with-message! (format "Appending to master training set %s"
+ orb-anystyle-parser-training-set)
+ ;; save any progress in XML mode
+ (write-region (orb--buffer-string) nil
+ (orb-pdf-scrapper--get :temp-xml) nil -1)
+ (let (new-data)
+ ;; strip down the header and footer tokens from our data
+ (save-excursion
+ (save-match-data
+ (let* (beg end)
+ (goto-char (point-min))
+ (re-search-forward "\\(^[ \t]*<dataset>[ \t]*\n\\)" nil t)
+ (setq beg (or (match-end 1)
+ (point-min)))
+ (re-search-forward "\\(^[ \t]*</dataset>[ \t]*\n\\)" nil t)
+ (setq end (or (match-beginning 1)
+ (point-max)))
+ (setq new-data (orb--buffer-string beg end)))))
+ ;; append our data to the master file
+ (with-temp-buffer
+ (insert-file-contents orb-anystyle-parser-training-set)
+ ;; backup the master file
+ (let ((master-backup (concat orb-anystyle-parser-training-set ".back")))
+ (orb-pdf-scrapper--put :master-backup master-backup)
+ (rename-file orb-anystyle-parser-training-set master-backup t))
+ (goto-char (point-max))
+ (forward-line -1)
+ (insert new-data)
+ (f-touch orb-anystyle-parser-training-set)
+ (write-region (orb--buffer-string) nil
+ orb-anystyle-parser-training-set nil -1))))))
+
+(defun orb-pdf-scrapper--train (&optional review)
+ "Update parser training set and run anystyle train.
+If optional REVIEW is non-nil, run `orb-pdf-scrapper--edit-xml'
+in `:edit-master' context."
+ (pop-to-buffer orb-pdf-scrapper--buffer)
+ ;; edit the master file or proceed to training
+ (if review
+ ;; we've been requested to review the master file
+ (progn
+ (orb-pdf-scrapper--update-master-file)
+ (orb-pdf-scrapper--put :context 'edit-master)
+ (orb-pdf-scrapper--edit-xml))
+ ;; start the training process otherwise
+ (orb-pdf-scrapper--update-master-file)
+ (message "Training anystyle parser model...")
+ (when buffer-file-name
+ (save-buffer))
+ (setq buffer-file-name nil)
+ (erase-buffer)
+ (orb-pdf-scrapper--put :context 'train)
+ (orb-pdf-scrapper--refresh-mode 'train)
+ (insert (format "\
+This can take several minutes depending on the size of your training set.
+You can continue your work meanwhile and return here later.\n
+Training set => %s
+Parser model => %s\n
+anystyle output:
+=====================\n"
+ orb-anystyle-parser-training-set
+ orb-anystyle-parser-model))
+ (goto-char (point-min))
+ ;; normally, anystyle runs with `shell-command'; anystyle train, however,
+ ;; can take minutes on large files, so it runs in a shell sub-process
+ (let ((training-process
+ (orb-anystyle 'train
+ :stdout t
+ :overwrite t
+ :input orb-anystyle-parser-training-set
+ :output orb-anystyle-parser-model
+ :buffer orb-pdf-scrapper--buffer)))
+ (orb-pdf-scrapper--put :training-process training-process)
+ ;; finalize
+ (set-process-sentinel
+ training-process
+ (lambda (_p result)
+ (orb--with-scrapper-buffer!
+ (if (string= result "finished\n")
+ (orb--with-scrapper-buffer!
+ (goto-char (point-max))
+ (insert "=====================\n\nDone!")
+ (message "Training anystyle parser model...done")
+ (orb-pdf-scrapper--put :context 'finished
+ :training-process nil)
+ (orb-pdf-scrapper--refresh-mode 'train))
+ (orb-pdf-scrapper--put :context 'error
+ :training-process nil))))))))
+
+(defun orb-pdf-scrapper--checkout ()
+ "Finalize Orb PDF Scrapper process.
+Insert generated Org data into the note buffer that started the
+process."
+ (cl-case (orb-pdf-scrapper--get :context)
+ ('start
+ (pop-to-buffer (orb-pdf-scrapper--get :original-buffer))
+ (save-restriction
+ (save-excursion
+ (widen)
+ (goto-char (point-max))
+ (insert-file-contents (orb-pdf-scrapper--get :temp-org))))
+ (orb-pdf-scrapper-dispatcher 'kill))
+ (t
+ (orb-pdf-scrapper-dispatcher 'error))))
+
+(defun orb-pdf-scrapper--cleanup ()
+ "Clean up before and after Orb PDF Scrapper process."
+ (setq orb-pdf-scrapper--refs ())
+ (dolist (prop (list :running :callee :context :caller
+ :current-key :prevent-concurring
+ :temp-txt :temp-bib :temp-org :temp-xml
+ :pdf-file :global-bib :master-backup
+ :txt-undo-list :bib-undo-list :org-undo-list
+ :training-process :window-conf :original-buffer))
+ (orb-pdf-scrapper--put prop nil)))
+
+
+;; * Minor mode
+
+;;; Code in this section was adopted from org-capture.el
+;;
+;; Copyright (C) 2010-2020 Free Software Foundation, Inc.
+;; Author: Carsten Dominik
+(defvar orb-pdf-scrapper-mode-map
+ (let ((map (make-sparse-keymap)))
+ (define-key map "\C-c\C-k" #'orb-pdf-scrapper-kill)
+ map)
+ "Keymap for `orb-pdf-scrapper-mode' minor mode.
+The keymap is updated automatically according to the Orb PDF
+Scrapper process context. It is not supposed to be modified
+directly by the user.")
+
+(defcustom orb-pdf-scrapper-mode-hook nil
+ "Hook for the `orb-pdf-scrapper-mode' minor mode."
+ :type 'hook
+ :group 'orb-pdf-scrapper)
+
+(define-minor-mode orb-pdf-scrapper-mode
+ "Minor mode for special key bindings in an orb-pdf-scrapper buffer.
+Turning on this mode runs the normal hook `orb-pdf-scrapper-mode-hook'."
+ nil " OPS" orb-pdf-scrapper-mode-map
+ (when orb-pdf-scrapper-mode
+ (orb-pdf-scrapper--update-keymap)
+ (setq-local
+ header-line-format
+ (orb-pdf-scrapper--format-header-line))))
+
+(defun orb-pdf-scrapper--put (&rest props)
+ "Add properties PROPS to `orb-pdf-scrapper--plist'.
+Returns the new plist."
+ (while props
+ (setq orb-pdf-scrapper--plist
+ (plist-put orb-pdf-scrapper--plist
+ (pop props)
+ (pop props)))))
+
+(defun orb-pdf-scrapper--get (prop)
+ "Get PROP from `orb-pdf-scrapper--plist'."
+ (plist-get orb-pdf-scrapper--plist prop))
+;;;
+;;; End of code adopted from org-capture.el
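+
+;; Usage sketch for the two accessors above (the values are illustrative,
+;; not taken from a real session):
+;;
+;; (orb-pdf-scrapper--put :callee 'edit-bib :context 'start)
+;; (orb-pdf-scrapper--get :callee) ; => edit-bib
+;;
+;; `orb-pdf-scrapper--put' consumes its arguments pairwise, so any number
+;; of PROP VALUE pairs can be set in one call.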
+
+;; TODO combine `orb-pdf-scrapper--format-header-line'
+;; and `orb-pdf-scrapper--update-keymap' into one
+;; function and use a macro to generate each entry
+(defun orb-pdf-scrapper--format-header-line ()
+ "Return formatted buffer header line depending on context."
+ (substitute-command-keys
+ (format "\\<orb-pdf-scrapper-mode-map>Orb PDF Scrapper: %s. %s"
+ (orb-pdf-scrapper--get :current-key)
+ (cl-case (orb-pdf-scrapper--get :caller)
+ ('edit-txt
+ "\
+Generate BibTeX `\\[orb-pdf-scrapper-dispatcher]', \
+sanitize text `\\[orb-pdf-scrapper-sanitize-text]', \
+train parser `\\[orb-pdf-scrapper-training-session]', \
+abort `\\[orb-pdf-scrapper-kill]'.")
+ ('edit-bib
+ "\
+Generate Org `\\[orb-pdf-scrapper-dispatcher]', \
+generate keys `\\[orb-pdf-scrapper-generate-keys]', \
+return to text `\\[orb-pdf-scrapper-cancel]', \
+train parser `\\[orb-pdf-scrapper-training-session]', \
+abort `\\[orb-pdf-scrapper-kill]'.")
+ ('edit-org
+ "\
+Finish `\\[orb-pdf-scrapper-dispatcher]', \
+return to BibTeX `\\[orb-pdf-scrapper-cancel]', \
+abort `\\[orb-pdf-scrapper-kill]'.")
+ ('edit-xml
+ (cl-case (orb-pdf-scrapper--get :context)
+ ('edit
+ (format "\
+Train `\\[orb-pdf-scrapper-training-session]', \
+review %s `\\[orb-pdf-scrapper-review-master-file]', \
+cancel `\\[orb-pdf-scrapper-cancel]', \
+abort `\\[orb-pdf-scrapper-kill]'."
+ (file-name-nondirectory
+ orb-anystyle-parser-training-set)))
+ ('edit-master
+ "\
+Train `\\[orb-pdf-scrapper-training-session]', \
+cancel `\\[orb-pdf-scrapper-cancel]', \
+abort `\\[orb-pdf-scrapper-kill]'.")))
+ ('train
+ (cl-case (orb-pdf-scrapper--get :context)
+ ('train
+ "\
+Abort `\\[orb-pdf-scrapper-kill]'.")
+ ('continue
+ "\
+Finish `\\[orb-pdf-scrapper-dispatcher]', \
+abort `\\[orb-pdf-scrapper-kill]'.")))
+ (t
+ "\
+Press the RED button `\\[orb-pdf-scrapper-kill]'.")))))
+
+(defun orb-pdf-scrapper--update-keymap ()
+ "Update `orb-pdf-scrapper-mode-map' according to current editing mode.
+The editing mode is read from the `:caller' property of
+`orb-pdf-scrapper--plist' and refined by its `:context' property."
+ (let ((map orb-pdf-scrapper-mode-map))
+ (cl-case (orb-pdf-scrapper--get :caller)
+ ;;
+ ('edit-txt
+ (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher)
+ (define-key map "\C-c\C-u" #'orb-pdf-scrapper-sanitize-text)
+ (define-key map "\C-c\C-t" #'orb-pdf-scrapper-training-session)
+ (define-key map "\C-c\C-r" nil))
+ ;;
+ ('edit-bib
+ (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher)
+ (define-key map "\C-c\C-u" #'orb-pdf-scrapper-generate-keys)
+ (define-key map "\C-c\C-t" #'orb-pdf-scrapper-training-session)
+ (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel))
+ ;;
+ ('edit-org
+ (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher)
+ (define-key map "\C-c\C-u" nil)
+ (define-key map "\C-c\C-t" nil)
+ (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel))
+ ('edit-xml
+ (cl-case (orb-pdf-scrapper--get :context)
+ ('edit
+ (define-key map "\C-c\C-c" #'orb-pdf-scrapper-training-session)
+ (define-key map "\C-c\C-u" nil)
+ (define-key map "\C-c\C-t" #'orb-pdf-scrapper-review-master-file)
+ (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel))
+ ('edit-master
+ (define-key map "\C-c\C-c" #'orb-pdf-scrapper-training-session)
+ (define-key map "\C-c\C-u" nil)
+ (define-key map "\C-c\C-t" nil)
+ (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel))))
+ ('train
+ (cl-case (orb-pdf-scrapper--get :context)
+ ('train
+ (define-key map "\C-c\C-c" nil)
+ (define-key map "\C-c\C-r" nil)
+ (define-key map "\C-c\C-u" nil)
+ (define-key map "\C-c\C-t" nil))
+ ('continue
+ (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher))))
+ (t
+ (define-key map "\C-c\C-u" nil)
+ (define-key map "\C-c\C-t" nil)
+ (define-key map "\C-c\C-r" nil)))))
+
+;; * Interactive functions
+
+(defun orb-pdf-scrapper-generate-keys (&optional at-point collect-only)
+ "Generate BibTeX citation keys in the current buffer.
+\\<orb-pdf-scrapper-mode-map>
+During the Orb PDF Scrapper interactive process, when editing
+BibTeX data, press \\[orb-pdf-scrapper-generate-keys] to generate
+citation keys using the function specified in
+`orb-pdf-scrapper-keygen-function'. When called interactively
+with a \\[universal-argument] prefix argument AT-POINT, generate
+key only for the record at point.
+
+When called from Lisp, if optional COLLECT-ONLY is non-nil, do
+not generate the key and update the records, just collect records
+for future use."
+ (interactive "P")
+ (orb--with-message! "Generating citation keys"
+ (let ((bibtex-help-message nil)
+ (bibtex-contline-indentation 2)
+ (bibtex-text-indentation 2))
+ (save-excursion
+ (if (equal at-point '(4))
+ ;; generate key at point
+ (progn
+ (bibtex-beginning-of-entry)
+ (let* ((old-key (save-excursion
+ (re-search-forward
+ bibtex-entry-maybe-empty-head)
+ (bibtex-key-in-head)))
+ (old-ref (assoc old-key orb-pdf-scrapper--refs))
+ (new-ref (orb-pdf-scrapper--update-record-at-point
+ collect-only)))
+ (if old-ref
+ (setf (car old-ref) (car new-ref)
+ (cdr old-ref) (cdr new-ref))
+ (cl-pushnew new-ref orb-pdf-scrapper--refs :test 'equal))))
+ ;; generate keys in the buffer otherwise
+ (let ((refs ()))
+ (goto-char (point-min))
+ (bibtex-skip-to-valid-entry)
+ (while (not (eobp))
+ (cl-pushnew (orb-pdf-scrapper--update-record-at-point
+ collect-only)
+ refs)
+ (bibtex-skip-to-valid-entry))
+ (setq orb-pdf-scrapper--refs refs)))))
+ (write-region (orb--buffer-string) nil
+ (orb-pdf-scrapper--get :temp-bib) nil -1)
+ (set-buffer-modified-p nil)))
+
+(defun orb-pdf-scrapper-sanitize-text (&optional contents)
+ "Run string processing in current buffer.
+Try to get every reference onto a new line. Return this buffer's
+contents (`orb--buffer-string').
+
+If optional string CONTENTS was specified, run processing on this
+string instead. Return modified CONTENTS."
+ (interactive)
+ (let* ((rx1 '(and "(" (** 1 2 (any "0-9")) ")"))
+ (rx2 '(and "[" (** 1 2 (any "0-9")) "]"))
+ (rx3 '(and "(" (any "a-z") (opt (any space)) ")"))
+ (rx4 '(and " " (any "a-z") ")"))
+ (regexp (rx-to-string
+ `(group-n 1 (or (or (and ,rx1 " " ,rx3)
+ (and ,rx2 " " ,rx3))
+ (or (and ,rx1 " " ,rx4)
+ (and ,rx2 " " ,rx4))
+ (or ,rx1 ,rx2)
+ (or ,rx3 ,rx4))) t)))
+ (if contents
+ (--> contents
+ (s-replace "\n" " " it)
+ (s-replace-regexp regexp "\n\\1" it))
+ (goto-char (point-min))
+ (while (re-search-forward "\n" nil t)
+ (replace-match " " nil nil))
+ (goto-char (point-min))
+ (while (re-search-forward regexp nil t)
+ (replace-match "\n\\1" nil nil))
+ (goto-char (point-min))
+ (orb--buffer-string))))
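+
+;; Usage sketch, assuming a made-up reference string: when CONTENTS is
+;; given, the buffer is left untouched and a new string is returned in
+;; which line breaks are flattened to spaces and every numbering marker
+;; such as "(1)" or "[12]" starts a new line:
+;;
+;; (orb-pdf-scrapper-sanitize-text "(1) Foo, A.\n(2) Bar, B.")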
+
+(defun orb-pdf-scrapper-training-session (&optional context)
+ "Run training session subroutines depending on CONTEXT.
+If CONTEXT is not provided, it will be read from
+`orb-pdf-scrapper--plist''s `:context'."
+ (interactive)
+ (pop-to-buffer orb-pdf-scrapper--buffer)
+ (let ((context (or context (orb-pdf-scrapper--get :context))))
+ (orb-pdf-scrapper--put :context context)
+ (cl-case context
+ ('start
+ ;; generate xml
+ (orb-pdf-scrapper--edit-xml))
+ ((edit edit-master)
+ (orb-pdf-scrapper--train nil))
+ ('finished
+ (orb-pdf-scrapper-dispatcher 'edit-txt 'continue))
+ (t (orb-pdf-scrapper-dispatcher 'error)))))
+
+(defun orb-pdf-scrapper-review-master-file ()
+ "Review parser training set (master file)."
+ (interactive)
+ (orb-pdf-scrapper--train t))
+
+(defun orb-pdf-scrapper-dispatcher (&optional callee context)
+ "Call Orb PDF Scrapper subroutine CALLEE in context CONTEXT.
+CALLEE and CONTEXT can be passed directly as optional variables,
+or they will be read from `orb-pdf-scrapper--plist''s
+respectively `:callee' and `:context' properties.
+
+Recognized CALLEEs are:
+==========
+'edit-txt - `orb-pdf-scrapper--edit-txt'
+'edit-bib - `orb-pdf-scrapper--edit-bib'
+'edit-org - `orb-pdf-scrapper--edit-org'
+'train - `orb-pdf-scrapper-training-session'
+'checkout - `orb-pdf-scrapper--checkout'
+
+Passing or setting any other CALLEE will kill the process.
+
+This function also checks `:prevent-concurring' property in
+`orb-pdf-scrapper--plist' and will offer to restart the process
+if its value is non-nil."
+ ;; TODO: check for whether the user killed any of the buffers
+ (interactive)
+ (let ((callee (or callee (orb-pdf-scrapper--get :callee)))
+ (context (or context (orb-pdf-scrapper--get :context))))
+ ;; in case context was passed as an argument
+ (orb-pdf-scrapper--put :callee callee
+ :context context)
+ (if
+ ;; Prevent another Orb PDF Scrapper process from running
+ ;; Ask user whether to kill the currently running process
+ (orb-pdf-scrapper--get :prevent-concurring)
+ (if (y-or-n-p
+ (format "Another Orb PDF Scrapper process is running: %s. \
+Kill it and start a new one %s? "
+ (orb-pdf-scrapper--get :current-key)
+ (orb-pdf-scrapper--get :new-key)))
+ ;; Kill the process and start a new one
+ (progn
+ (orb--with-message! "Killing current process"
+ (orb-pdf-scrapper--cleanup))
+ (orb-pdf-scrapper-run (orb-pdf-scrapper--get :new-key)))
+ ;; Do nothing
+ (orb-pdf-scrapper--put :prevent-concurring nil))
+ ;; Finalize the requested context otherwise
+ (cl-case callee
+ ('edit-txt
+ (orb-pdf-scrapper--edit-txt))
+ ('edit-bib
+ (orb-pdf-scrapper--edit-bib))
+ ;; edit org
+ ('edit-org
+ (orb-pdf-scrapper--edit-org))
+ ('checkout
+ ;; currently, this is unnecessary but may be useful
+ ;; if some recovery options are implemented
+ (orb--with-scrapper-buffer!
+ (write-region (orb--buffer-string)
+ nil (orb-pdf-scrapper--get :temp-org) nil 1))
+ (orb-pdf-scrapper--checkout))
+ (t
+ ;; 1 in 100 should not be too annoying
+ (when (> (cl-random 100) 98)
+ (message "Oops...")
+ (sleep-for 1)
+ (message "Oops...Did you just ACCIDENTALLY press the RED button?")
+ (sleep-for 1)
+ (message "Activating self-destruction subroutine...")
+ (sleep-for 1)
+ (message "Activating self-destruction subroutine...Bye-bye")
+ (sleep-for 1))
+ (let ((kill-buffer-query-functions nil))
+ (and (get-buffer orb-pdf-scrapper--buffer)
+ (kill-buffer orb-pdf-scrapper--buffer)))
+ (set-window-configuration (orb-pdf-scrapper--get :window-conf))
+ (orb-pdf-scrapper--cleanup))))))
+
+(defun orb-pdf-scrapper-cancel ()
+ "Discard edits and return to previous editing mode."
+ (interactive)
+ (cl-case (orb-pdf-scrapper--get :caller)
+ ('edit-bib
+ (orb--with-scrapper-buffer!
+ (orb-pdf-scrapper--put :bib-undo-list nil))
+ (orb-pdf-scrapper-dispatcher 'edit-txt 'continue))
+ ('edit-org
+ (orb-pdf-scrapper-dispatcher 'edit-bib 'continue))
+ ('edit-xml
+ (when-let ((master-backup (orb-pdf-scrapper--get :master-backup)))
+ (rename-file master-backup orb-anystyle-parser-training-set t))
+ (orb-pdf-scrapper-dispatcher (orb-pdf-scrapper--get :callee) 'continue))
+ (t
+ (orb-pdf-scrapper-dispatcher 'error))))
+
+(defun orb-pdf-scrapper-kill ()
+ "Kill the interactive Orb PDF Scrapper process."
+ (interactive)
+ (when-let (process (orb-pdf-scrapper--get :training-process))
+ (kill-process process))
+ (orb-pdf-scrapper-dispatcher 'kill))
+
+
+;; * Main functions
+
+;; entry point
+
+;;;###autoload
+(defun orb-pdf-scrapper-run (key)
+ "Run Orb PDF Scrapper interactive process.
+KEY is note's citation key."
+ (interactive)
+ (if (orb-pdf-scrapper--get :running)
+ (progn
+ (orb-pdf-scrapper--put :prevent-concurring t
+ :new-key key)
+ (orb-pdf-scrapper-dispatcher))
+ ;; in case previous process was not killed properly
+ (orb-pdf-scrapper--cleanup)
+ (orb-pdf-scrapper--put :callee 'edit-txt
+ :context 'start
+ :caller 'run
+ :current-key key
+ :new-key nil
+ :pdf-file (file-truename
+ (orb-process-file-field key))
+ :running t
+ :prevent-concurring nil
+ :global-bib bibtex-completion-bibliography
+ :original-buffer (current-buffer)
+ :window-conf (current-window-configuration))
+ (orb-pdf-scrapper-dispatcher)))
+
+(provide 'orb-pdf-scrapper)
+;;; orb-pdf-scrapper.el ends here
+;; Local Variables:
+;; fill-column: 79
+;; End:
diff --git a/orb-utils.el b/orb-utils.el
index ae88848..b13a197 100644
--- a/orb-utils.el
+++ b/orb-utils.el
@@ -8,7 +8,6 @@
;; URL: https://github.com/org-roam/org-roam-bibtex
;; Keywords: org-mode, roam, convenience, bibtex, helm-bibtex, ivy-bibtex, org-ref
;; Version: 0.2.3
-;; Package-Requires: ((emacs "26.1"))
;; This file is NOT part of GNU Emacs.
@@ -27,6 +26,11 @@
;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor,
;; Boston, MA 02110-1301, USA.
+;; N.B. This file contains code snippets adopted from other
+;; open-source projects. These snippets are explicitly marked as such
+;; in place. They are not subject to the above copyright and
+;; authorship claims.
+
;;; Commentary:
;;
;; This file contains utility macros and helper functions used accross
@@ -36,16 +40,24 @@
;;; Code:
;; * Library requires
-(require 'orb-compat)
(defvar orb-citekey-format)
;; * Macros
+(defmacro orb--with-message! (message &rest body)
+ "Put MESSAGE before and after BODY.
+Append \"...\" to the first message and \"...done\" to the second.
+Return result of evaluating the BODY."
+ (declare (indent 1) (debug (stringp &rest form)))
+ `(prog2
+ (message "%s..." ,message)
+ (progn ,@body)
+ (message "%s...done" ,message)))
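+
+;; Usage sketch (message text and body are illustrative):
+;;
+;; (orb--with-message! "Parsing references"
+;; (+ 1 2))
+;;
+;; This echoes "Parsing references...", evaluates the body, echoes
+;; "Parsing references...done" and returns the body's value, here 3.
+;; Note that the MESSAGE form is evaluated twice by the expansion.
+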
;; * Functions
-(defun orb-unformat-citekey (citekey)
+(defun orb--unformat-citekey (citekey)
"Remove format from CITEKEY.
Format is `orb-citekey-format'."
(string-match "\\(.*\\)%s\\(.*\\)" orb-citekey-format)
@@ -55,6 +67,123 @@ Format is `orb-citekey-format'."
(length orb-citekey-format)))))
(substring citekey beg end)))
+(defun orb--buffer-string (&optional start end)
+ "Return buffer (sub)string with no text properties.
+Like `buffer-substring-no-properties' but START and END are
+optional and equal to (`point-min') and (`point-max'),
+respectively, if nil."
+ (buffer-substring-no-properties (or start (point-min))
+ (or end (point-max))))
+
+(defun orb--format (&rest args)
+ "Format ARGS conditionally and return a string.
+ARGS must be a plist, whose keys are `format' control strings and
+values are `format' objects. Thus only one object per control
+string is allowed. The result will be concatenated into a single
+string.
+
+In the simplest case, it behaves as a sort of interleaved `format':
+==========
+
+\(orb--format \"A: %s\" 'hello
+ \" B: %s\" 'world
+ \" C: %s\" \"!\")
+
+ => 'A: hello B: world C: !'
+
+If format object is nil, it will be formatted as empty string:
+==========
+
+\(orb--format \"A: %s\" 'hello
+ \" B: %s\" nil
+ \" C: %s\" \"!\")
+ => 'A: hello C: !'
+
+Object can also be a cons cell. If its car is nil then its cdr
+will be treated as default value and formatted as \"%s\":
+==========
+
+\(orb--format \"A: %s\" 'hello
+ \" B: %s\" '(nil . dworl)
+ \" C: %s\" \"!\")
+ => 'A: hellodworl C: !'
+
+Finally, if the control string is nil, the object will be formatted as \"%s\":
+==========
+
+\(orb--format \"A: %s\" 'hello
+ \" B: %s\" '(nil . \" world\")
+ nil \"!\")
+=> 'A: hello world!'."
+ (let ((res ""))
+ (while args
+ (let ((str (pop args))
+ (obj (pop args)))
+ (unless (consp obj)
+ (setq obj (cons obj nil)))
+ (setq res
+ (concat res
+ (format (or (and (car obj) str) "%s")
+ (or (car obj) (cdr obj) ""))))))
+ res))
+
+;;; Code in this section was adopted from ob-core.el
+;;
+;; Copyright (C) 2009-2020 Free Software Foundation, Inc.
+;;
+;; Authors: Eric Schulte
+;; Dan Davison
+
+(defvar orb--temp-dir)
+(unless (or noninteractive (boundp 'orb--temp-dir))
+ (defvar orb--temp-dir
+ (or (and (boundp 'orb--temp-dir)
+ (file-exists-p orb--temp-dir)
+ orb--temp-dir)
+ (make-temp-file "orb-" t))
+"Directory to hold temporary files created during reference parsing.
+Used by `orb--temp-file'. This directory will be removed on Emacs
+shutdown."))
+
+(defun orb--temp-file (prefix &optional suffix)
+ "Create a temporary file in the `orb--temp-dir'.
+Passes PREFIX and SUFFIX directly to `make-temp-file' with the
+value of variable `temporary-file-directory' temporarily set to
+the value of `orb--temp-dir'."
+ (let ((temporary-file-directory
+ (or (and (boundp 'orb--temp-dir)
+ (file-exists-p orb--temp-dir)
+ orb--temp-dir)
+ temporary-file-directory)))
+ (make-temp-file prefix nil suffix)))
+
+(defun orb--remove-temp-dir ()
+ "Remove `orb--temp-dir' on Emacs shutdown."
+ (when (and (boundp 'orb--temp-dir)
+ (file-exists-p orb--temp-dir))
+ ;; taken from `delete-directory' in files.el
+ (condition-case nil
+ (progn
+ (mapc (lambda (file)
+ ;; This test is equivalent to
+ ;; (and (file-directory-p fn) (not (file-symlink-p fn)))
+ ;; but more efficient
+ (if (eq t (car (file-attributes file)))
+ (delete-directory file)
+ (delete-file file)))
+ (directory-files orb--temp-dir 'full
+ directory-files-no-dot-files-regexp))
+ (delete-directory orb--temp-dir))
+ (error
+ (message "Failed to remove temporary Org-roam-bibtex directory %s"
+ (if (boundp 'orb--temp-dir)
+ orb--temp-dir
+ "[directory not defined]"))))))
+
+(add-hook 'kill-emacs-hook 'orb--remove-temp-dir)
+
+;;; End of code adopted from ob-core.el
+
(provide 'orb-utils)
;;; orb-utils.el ends here
;; Local Variables:
diff --git a/org-roam-bibtex.el b/org-roam-bibtex.el
index 19eb73d..7bfa275 100644
--- a/org-roam-bibtex.el
+++ b/org-roam-bibtex.el
@@ -1,4 +1,4 @@
-;;; org-roam-bibtex.el --- Org Roam meets BibTeX -*- coding: utf-8; lexical-binding: t -*-
+;;; org-roam-bibtex.el --- Org Roam meets BibTeX -*- coding: utf-8; lexical-binding: t -*-
;; Copyright © 2020 Jethro Kuan
;; Copyright © 2020 Mykhailo Shevchuk