diff --git a/CHANGELOG.md b/CHANGELOG.md index f403363..326d9d6 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -10,7 +10,7 @@ Well, at least we try! ## [0.2.3] - 2020-05-10 ### Added -- `orb--replace-virtual-field` and `orb--virtual-fields-alist` for +- `orb--replace-virtual-fields` and `orb--virtual-fields-alist` for mapping `bibtex-completion` virtual field names to more conventional words, namely these: ``` elisp diff --git a/README.md b/README.md index b2d077e..77671ce 100644 --- a/README.md +++ b/README.md @@ -249,14 +249,19 @@ notes from the completion-list. Type `M-x orb-note-actions` to easily access additional commands useful in note's context. These commands are run with the note's BibTeX key as an argument. The key is taken from the `#+ROAM_KEY:` file property. -See section [`Orb Note Actions`](#orb-note-actions-section) for +See section [ORB Note Actions](#orb-note-actions-section) for details. Configuration --------------- -### Org Roam BibTeX - BibTeX aware capture template expansion +The following sections use Emacs Lisp examples to configure Org Roam +BibTeX. If you are not comfortable with Lisp yet, remember you can +always use the Customize interface to achieve the same, run `M-x +customize` or from menu click `Options -> Customize Emacs -> Top Level +Customization Group` and search for `org-roam-bibtex`. +### Org Roam BibTeX - BibTeX aware capture template expansion #### `orb-templates` This variable specifies the templates to use when creating a new @@ -427,7 +432,7 @@ Below shows how this can be used to integrate with Do not forget to escape the quotes inside the `%`-escapes form! -### Orb Note Actions - BibTeX record-related commands +### ORB Note Actions - BibTeX record-related commands #### Overview Type `M-x orb-note-actions` or bind this command to a key such as `C-c @@ -474,9 +479,8 @@ is the current note's citation key: #### Adding new note actions -To install a note action, add a cons -cell of format `(DESCRIPTION . 
FUNCTION)` to one of the note actions -variables: +To install a note action, add a cons cell of format `(DESCRIPTION +. FUNCTION)` to one of the note actions variables: ``` el (with-eval-after-load 'orb-note-actions @@ -489,13 +493,321 @@ whose car is the current note's citation key: ``` el (defun my-note-action (citekey) (let ((key (car citekey))) - ... - )) + ...)) ``` +### ORB PDF Scrapper - Retrieve references from PDFs +#### Overview + +ORB PDF Scrapper is an Emacs interface to +[`anystyle`](https://github.com/inukshuk/anystyle), an open-source software +based on powerful machine-learning algorithms. It requires `anystyle-cli`, +which can be installed with `[sudo] gem install anystyle-cli`. Note that +`ruby` and `gem` must already be present in the system. `ruby` is shipped +with MacOS, but you will have to install it on other operating systems; please +refer to the relevant section in the official documentation for `ruby`. You +may also want to consult the [`anystyle` +documentation](https://rubydoc.info/gems/anystyle) to learn more about how it +works. + +Once `anystyle-cli` is installed, ORB PDF Scrapper can be launched with +`orb-note-actions` while in an Org-roam buffer containing a `#+ROAM_KEY:` +BibTeX key. References are retrieved from a PDF file associated with the note +which is retrieved from the corresponding BibTeX record. + +The reference-retrieval process consists of three interactive steps described +below. + +#### Text mode +In the first step, the PDF file is searched for references, which are +eventually output in the ORB PDF Scrapper buffer as plain text. The +buffer is in the `text-mode` major-mode for editing general text +files. + +You need to review the retrieved references and prepare them for the next step +in such a way that there is only one reference per line. You may also need to +remove any extra text captured together with the references. 
Some PDF files +will produce a nicely-formed list of references that will require little to no +manual editing, while others will need a different degree of manual +intervention. + +Generally, it is possible to train a custom `anystyle` finder model +responsible for PDF-parsing to improve the output quality, but this is +not currently supported by ORB PDF Scrapper. As a small and somewhat +naïve aid, the `sanitize text` command bound to `C-c C-u` may assist +in putting each reference onto a separate line. + +After you are finished with editing the text data, press `C-c C-c` to +proceed to the second step. + +Press `C-c C-k` anytime to abort the ORB PDF Scrapper process. + +#### BibTeX mode +In the second step, the obtained list of plain text references, one +reference per line, is parsed and converted into BibTeX format. The +resulting BibTeX records are presented to the user in the ORB PDF +Scrapper buffer replacing the text references. The buffer's major +mode switches to `bibtex-mode`, which is helpful for reviewing and +editing the BibTeX data and correcting possible parsing errors. + +Again, depending on the citation style used in the particular book or article, +the parsing quality can vary greatly and might require more or less manual +post-editing. It is possible to train a custom `anystyle` parser model to +improve the parsing quality. See [Training a Parser +model](#training-a-parser-model) for more details. + +Press `C-c C-u` to generate BibTeX keys for the records in the buffer or `C-u +C-c C-u` to generate a key for the record at point. See [ORB Autokey +configuration](#orb-autokey-configuration) on how to configure the BibTeX key +generation. During key generation, it is also possible to automatically set +the values of BibTeX fields: see `orb-pdf-scrapper-set-fields` docstring for +more details. + +Press `C-c C-r` to return to the text-editing mode in its last state. Note +that all the progress in BibTeX mode will be lost. 
+ +Press `C-c C-c` to proceed to the third step. + +#### Org mode +In the third step, the BibTeX records are processed internally by ORB PDF +Scrapper, and the result replaces the BibTeX data in the ORB PDF Scrapper, +which switches to `org-mode`. + +The processing involves sorting the references into four groups under +the respective Org-mode headlines: `in-roam`, `in-bib`, `valid`, and +`invalid`, and inserting the grouped references as either an Org-mode +plain-list of `org-ref`-style citations, or an Org-mode table with +columns corresponding to different BibTeX fields. + +* `in-roam` --- These references have notes with the respective + `#+ROAM_KEY:` citation keys in the `org-roam` database. +* `in-bib` --- These references are not yet in the `org-roam` database + but they are present in user BibTeX file(s) (see + `bibtex-completion-bibliography`). +* `invalid` --- These references matched against + `orb-pdf-scrapper-invalid-key-pattern` and are considered invalid. + Adjust this variable to your criteria of validity. +* `valid` --- All other references fall into this group. They look + fine but are not yet in user Org-roam and BibTeX databases. + +Review and edit the generated Org-mode data, or press `C-c C-c` to +insert the references into the note's buffer and finish the ORB PDF +Scrapper. + +Press `C-c C-r` to return to BibTeX editing mode in its last state. +Note that all the progress in current mode will be lost. + +The following user variables control the appearance of the generated +Org-mode data: `orb-pdf-scrapper-refsection-headings`, +`orb-pdf-scrapper-export-fields`. These variables can be set through +the Customize interface or with `setq`. Refer to their respective +docstrings in Emacs for more information. + +#### Training a Parser model +##### Prerequisites +Currently, the core data set (explained below) must be installed manually by the user as follows: + +1. 
Use `find`, `locate` or similar tools to find the file `core.xml` buried in + `res/parser/` subdirectory of `anystyle` gem, e.g. `locate core.xml | grep + anystyle`. On MacOS, with `anystyle` installed as a system gem, the file + path would look similar to: + + `"/Library/Ruby/Gems/2.6.0/gems/anystyle-1.3.11/res/parser/core.xml"` + + The actual path will vary slightly depending on the currently-installed + versions of `ruby` and `anystyle`. + + On Linux and Windows, this path will be different. +2. Copy this file into the location specified in + `orb-anystyle-parser-training-set`, or anywhere else where you have + disk-write access, and adjust the aforementioned variable accordingly. + +##### Running a training session +Training a custom parser model on custom user data will greatly improve the +parsing of plain-text references. A training session can be initiated by +pressing `C-c C-t` in the ORB PDF Scrapper buffer in either text-mode or +BibTeX-mode. In each case, the plain-text references obtained in the `text +mode` step described above will be used to generate source XML data for +a training set. + +The generated XML data replaces the text or the BibTeX references in the +ORB PDF Scrapper buffer, and the major-mode switches to `xml-mode`. + +The XML data must be edited manually---this is the whole point of creating +a custom training model---which usually consists in simply correcting the +placement of bibliographic data within the XML elements (data fields). It is +extremely important to review the source data carefully since any mistakes +here will make its way into the model, thereby leading to poorer parsing in +the future. 
+ +It would be quite tedious to create the whole data-set by hand--- hundreds or +thousands of individual bibliographic records---so the best workflow for +making a good custom data-set is to use the core data-set shipped with +`anystyle` and append to it several data-sets generated in ORB PDF Scrapper +training sessions from individual PDF files, incrementally re-training the +model in between. This approach is implemented in ORB PDF Scrapper. From +personal experience, adding references data incrementally from 4--5 PDF files +raises the parser success rate to virtually 100%. Follow the instructions +described in [Prerequisites](#parser-model-prerequisites) to install the core +data-set. + +Once the editing is done, press `C-c C-c` to train the model. The XML data in +the ORB PDF Scrapper buffer will be automatically appended to the custom +`core.xml` file which will be used for training. Alternatively, press `C-c +C-t` to review the updated `core.xml` file and press `C-c C-c` when finished. + +The major mode will now switch to `fundamental-mode`, and the `anystyle` +`stdout` output will appear in the buffer. Training the model can take +_several minutes_, depending on the size of the training data-set and the +computing resources available on your device. The process is run in a shell +subprocess, so you will be able to continue your work and return to ORB PDF +Scrapper buffer later. + +Once the training is complete, press `C-c C-c` to return to the previous +editing-mode. You can now re-generate the BibTeX data and see the +improvements achieved with the re-trained model. + +#### ORB Autokey configuration +#### `orb-autokey-format` +You can specify the format of autogenerated BibTeX keys by setting the +`orb-autokey-format` variable through the Customize interface, or by adding +a `setq` form in your Emacs configuration file. 
+ +ORB Autokey format currently supports the following wildcards: + +###### Basic + +| Wildcard | Field | Description | +|:-----------|:-------|:---------------------------------------| +| %a | author | first author's (or editor's) last name | +| %t | title | first word of title | +| %f{field} | field | first word of arbitrary field | +| %y | year | year YYYY (date or year field) | +| %p | page | first page | +| %e{(expr)} | elisp | elisp expression | + +``` el +(setq orb-autokey-format "%a%y") => "doe2020" +``` + +###### Extended + +1. Capitalized versions: + +| Wildcard | Field | Description | +|:----------|:-------|:-------------------------------------| +| %A | author | | +| %T | title | Same as %a,%t,%f{field} but | +| %F{field} | field | preserve the original capitalization | + +``` el +(setq orb-autokey-format "%A%y") => "Doe2020" +``` + +2. Starred versions + +| Wildcard | Field | Description | +|:---------|:-------|:-------------------------------------------------------| +| %a*, %A* | author | - include author's (editor's) initials | +| %t*, %T* | title | - do not ignore words in orb-autokey-titlewords-ignore | +| %y* | year | - year's last two digits __YY | +| %p* | page | - use "pagetotal" field instead of default "pages" | + +``` el +(setq orb-autokey-format "%A*%y") => "DoeJohn2020" +``` + +3. Optional parameters + +| Wildcard | Field | Description | +|:-------------------|:-------|:--------------------------------------------------| +| %a[N][M][D] | author | | +| %t[N][M][D] | title | > include first N words/names | +| %f{field}[N][M][D] | field | > include at most M first characters of word/name | +| %p[D] | page | > put delimiter D between words | + +`N` and `M` should be a single digit `1-9`. Putting more digits or any +other symbols will lead to ignoring the optional parameter and those +following it altogether. `D` should be a single alphanumeric symbol or +one of `-_.:|`.
+ +Optional parameters work both with capitalized and starred versions +where applicable. + +``` el +(setq orb-autokey-format "%A*[1][4][-]%y") => "DoeJ2020" +(setq orb-autokey-format "%A*[2][7][-]:%y") => "DoeJohn-DoeJane:2020" +``` + +4. Elisp expression + +* can be anything +* should return a string or nil +* will be evaluated before expanding other wildcards and therefore can + be used to insert other wildcards +* will have entry variable bound to the value of BibTeX entry the key + is being generated for, as returned by + bibtex-completion-get-entry. The variable may be safely manipulated + in a destructive manner. + +``` el +%e{(or (bibtex-completion-get-value "volume" entry) "N/A")} +%e{(my-function entry)} +``` + +##### Other variables + +Check variables `orb-autokey-invalid-symbols`, +`orb-autokey-empty-field-token`, `orb-autokey-titlewords-ignore` for +additional settings. + +#### Orb Anystyle + +The function `orb-anystyle` provides a convenient Elisp key--value interface +to `anystyle-cli`, and can be used anywhere else within Emacs. Check its +docstring for more information. You may also want to consult [`anystyle-cli` +documentation](https://rubydoc.info/gems/anystyle). 
+ +###### Example +This Elisp expression: +``` el +(orb-anystyle 'parse + :format 'bib + :stdout nil + :overwrite t + :input "Doe2020.txt" + :output "bib" + :parser-model "/my/custom/model.mod") +``` + +…executes the following anystyle call: + +``` sh +anystyle --no-stdout --overwrite -P "/my/custom/model.mod" -f bib parse "Doe2020.txt" "bib" +``` + +The following variables can be used to configure `orb-anystyle` and +the default command-line options that will be passed to `anystyle`: + +###### `orb-anystyle` +* `orb-anystyle-executable` +* `orb-anystyle-user-directory` +* `orb-anystyle-default-buffer` + +###### Default command-line options +* `orb-anystyle-find-crop` +* `orb-anystyle-find-layout` +* `orb-anystyle-find-solo` +* `orb-anystyle-finder-training-set` +* `orb-anystyle-finder-model` +* `orb-anystyle-parser-model` +* `orb-anystyle-parser-training-set` +* `orb-anystyle-pdfinfo-executable` +* `orb-anystyle-pdftotext-executable` Community --------------- -For help, support, or if you just want to hang out with us, you can find us here: +For help, support, or if you just want to +hang out with us, you can find us here: * **IRC**: channel **#org-roam** on [freenode](https://freenode.net/kb/answer/chat) * **Slack**: channel **#org-roam-bibtex** on [Org Roam](https://join.slack.com/t/orgroam/shared_invite/zt-deoqamys-043YQ~s5Tay3iJ5QRI~Lxg) diff --git a/orb-anystyle.el b/orb-anystyle.el new file mode 100644 index 0000000..0ee69aa --- /dev/null +++ b/orb-anystyle.el @@ -0,0 +1,394 @@ +;;; orb-anystyle.el --- Org Roam BibTeX: Elisp interface to anystyle -*- coding: utf-8; lexical-binding: t -*- + +;; Copyright © 2020 Mykhailo Shevchuk +;; Copyright © 2020 Leo Vivier + +;; Author: Mykhailo Shevchuk +;; Leo Vivier +;; URL: https://github.com/org-roam/org-roam-bibtex +;; Keywords: org-mode, roam, convenience, bibtex, helm-bibtex, ivy-bibtex, org-ref +;; Version: 0.2.3 + +;; This file is NOT part of GNU Emacs.
+ +;; This program is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; This program is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs; see the file COPYING. If not, write to the +;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, +;; Boston, MA 02110-1301, USA. + +;; N.B. This file contains code snippets adopted from other +;; open-source projects. These snippets are explicitly marked as such +;; in place. They are not subject to the above copyright and +;; authorship claims. + +;;; Commentary: +;; + +;;; Code: +;; * Library requires + +(require 'orb-core) + +(eval-when-compile + (require 'subr-x) + (require 'cl-macs)) + +;; * Customize definitions + +(defcustom orb-anystyle-executable "anystyle" + "Anystyle executable path or program name." + :type '(choice (const "anystyle") + (file :tag "Path to executable" :must-match t)) + :group 'orb-anystyle) + +(defcustom orb-anystyle-pdfinfo-executable nil + "Path to pdfinfo executable to be passed to anystyle. +When this is nil, anystyle will look for it in the system path." + :type '(choice + (file :tag "Path to executable") + (const nil)) + :group 'orb-anystyle) + +(defcustom orb-anystyle-pdftotext-executable nil + "Path to pdftotext executable to be passed to anystyle. +When this is nil, anystyle will look for it in the system path." + :type '(choice + (file :tag "Path to executable") + (const nil)) + :group 'orb-anystyle) + +(defcustom orb-anystyle-parser-model nil + "Path to anystyle custom parser model." 
+ :type '(choice + (file :tag "Path to file" :must-match t) + (const :tag "Built-in" nil)) + :group 'orb-anystyle) + +(defcustom orb-anystyle-finder-model nil + "Path to anystyle custom finder model." + :type '(choice + (file :tag "Path to file" :must-match t) + (const :tag "Built-in" nil)) + :group 'orb-anystyle) + +;; --crop is currently broken upstream + +(defcustom orb-anystyle-find-crop nil + "Crop value in pt to be passed to `anystyle find'. +An integer or a cons cell of integers." + :type '(choice (integer :tag "Top and bottom") + (cons :tag "Top, bottom, left and right" + (integer :tag "Top and bottom") + (integer :tag "Left and right")) + (const :tag "Do not crop" nil)) + :group 'orb-anystyle) + +(defcustom orb-anystyle-find-solo nil + "Non-nil to pass the `--solo' flag." + :type '(choice (const :tag "Yes" t) + (const :tag "No" nil)) + :group 'orb-anystyle) + +(defcustom orb-anystyle-find-layout nil + "Non-nil to pass the `--layout' flag." + :type '(choice (const :tag "Yes" t) + (const :tag "No" nil)) + :group 'orb-anystyle) + +(defcustom orb-anystyle-default-buffer "*Orb Anystyle Output*" + "Default buffer name for anystyle output." + :type 'string + :group 'orb-anystyle) + +(defcustom orb-anystyle-user-directory + (concat (file-name-as-directory user-emacs-directory) "anystyle") + "Directory to keep anystyle user files." + :type 'directory + :group 'orb-anystyle) + +(defcustom orb-anystyle-parser-training-set + (concat (file-name-as-directory orb-anystyle-user-directory) "core.xml") + "XML file containing parser training data." + :type '(file :must-match t) + :group 'orb-anystyle) + +(defcustom orb-anystyle-finder-training-set + (f-join (file-name-as-directory orb-anystyle-user-directory) "ttx/") + "Directory containing finder training data (.ttx files)."
+ :type 'directory + :group 'anystyle) + +;; * Main functions + +;;;###autoload +(cl-defun orb-anystyle (command + &key (exec orb-anystyle-executable) + verbose help version adapter + ((:finder-model fmodel) orb-anystyle-finder-model) + ((:parser-model pmodel) orb-anystyle-parser-model) + (pdfinfo orb-anystyle-pdfinfo-executable) + (pdftotext orb-anystyle-pdftotext-executable) + format stdout overwrite + (crop orb-anystyle-find-crop) + (solo orb-anystyle-find-solo) + (layout orb-anystyle-find-layout) + input output + (buffer orb-anystyle-default-buffer)) + "Run anystyle COMMAND with `shell-command'. +ARGS is a plist with the following recognized keys: + +Anystyle CLI options +========== +1) EXEC :exec => string (valid executable) +- default value can be set through `orb-anystyle-executable' + +2) COMMAND :command => symbol or string +- valid values: find parse help check license train + +3) Global options can be passed with the following keys. + +FMODEL :finder-model => string (valid file path) +PMODEL :parser-model => string (valid file path) +PDFINFO :pdfinfo => string (valid executable) +PDFTOTEXT :pdftotext => string (valid executable) +ADAPTER :adapter => anything +STDOUT :stdout => boolean +HELP :help => boolean +VERBOSE :verbose => boolean +VERSION :version => boolean +OVERWRITE :overwrite => boolean +FORMAT :format => string, symbol or list of unquoted symbols + +- FORMAT must be one or more output formats accepted by anystyle commands: + parse => bib csl json ref txt xml + find => bib csl json ref txt ttx xml +- string must be space- or comma-separated, additional spaces are + ignored + +Default values for some of these options can be set globally via +the following variables: `orb-anystyle-finder-model', +`orb-anystyle-parser-model', `orb-anystyle-pdfinfo-executable', +`orb-anystyle-pdftotext-executable'. 
+ +4) Command options can be passed with the following keys: + +CROP :crop => integer or cons cell of integers +LAYOUT :layout => boolean +SOLO :solo => boolean + +- Command options are ignored for commands other than find +- anystyle help -c flag is not supported + +Default values for these options can be set globally via the +following variables: `orb-anystyle-find-crop', +`orb-anystyle-find-layout', `orb-anystyle-find-solo'. + +5) INPUT :input => string (file path) + +6) OUTPUT :output => string (file path) + +`shell-command'-related options +========== + +7) BUFFER :buffer => buffer-or-name + +- `shell-command''s OUTPUT-BUFFER +- can be a cons cell (OUTPUT-BUFFER . ERROR-BUFFER) +- when nil, defaults to `orb-anystyle-default-buffer' + +anystyle CLI command synopsis: +anystyle [global options] command [command options] [arguments...]. + +Homepage: https://anystyle.io +Github: https://github.com/inukshuk/anystyle-cli +Courtesy of its authors." + (declare (indent 1)) + (let* ((commands '(list find parse check train help license)) + (exec (executable-find exec)) + (buf (if (consp buffer) buffer (list buffer))) + ;; '(a b c) => "a,b,c" + (to-string (lambda (str) + (--reduce-from + (format "%s,%s" acc it) + (car str) (cdr str)))) + ;; debug + ;; (anystyle-run (lambda (str) + ;; (message "command: %s \nbuffers: %s and %s" str (car buf) (cdr buf)))) + (anystyle-run (lambda (str) + (if (eq command 'train) + ;; train can take minutes, so run it in a sub-process + (start-process-shell-command + "anystyle" (car buf) str) + (shell-command str + (car buf) (cdr buf))))) + global-options command-options anystyle) + ;; executable is a must + (unless exec + (user-error "Anystyle executable not found! 
\ +Install anystyle-cli before running Orb PDF Scrapper")) + ;; we process :version and :help before checking command + ;; since with this global flag command is not required + (cond + ;; help flag takes priority + (help + (setq global-options " --help" + command-options "" + input nil + output nil)) + ;; anystyle ignores everything with --version flag except the + ;; --help flag, which we've just resolved above + (version + (setq global-options "--version" + command nil + command-options "" + input nil + output nil)) + ;; otherwise command is a must + ((not command) + (user-error "Anystyle command required: \ +find, parse, check, train, help or license"))) + (when (stringp command) + (setq command (intern command))) + ;; command must be a valid command + (unless (memq command commands) + (user-error "Invalid command %s. Valid commands are \ +find, parse, check, train, help and license" command)) + ;; + ;; command specific arguments + (cl-case command + ('help + (when (stringp input) + (setq input (intern input))) + (unless (or (and global-options + (string= global-options " --help")) + (memq input commands)) + (user-error "Invalid input %s. Valid input for 'anystyle help': \ +find, parse, check, train, help or license" input))) + ('license + (setq input nil + output nil + global-options "" + command-options "")) + ('check + (setq output nil)) + ('find + ;; pdfinfo and pdftotext must be present in the system + (when (and pdfinfo (not (executable-find pdfinfo))) + (user-error "Executable not found: pdfinfo, %s" pdfinfo)) + (when (and pdftotext (not (executable-find pdftotext))) + (user-error "Executable not found: pdftotext, %s" pdftotext)) + (setq global-options + (orb--format "%s" global-options + " --pdfinfo=\"%s\"" pdfinfo + " --pdftotext=\"%s\"" pdftotext)) + ;; Command options + ;; N.B. 
Help command accepts a command option -c but it's totally + ;; irrelevant for us: + ;; + ;; [COMMAND OPTIONS] + ;; -c - List commands one per line, to assist with shell completion + ;; so we do not implement it + ;; + ;; :crop value should be integer; if no value was explicitly supplied, + ;; use the default from `orb-anystyle-find-crop' + (when crop + (unless (consp crop) + (setq crop (list crop))) + (let ((x (car crop)) + (y (or (cdr crop) 0))) + (unless (and (integerp x) + (integerp y)) + (user-error "Invalid value %s,%s. Number expected" x y)) + (setq crop (format "%s,%s" x y)))) + ;; parse only accepts --[no]-layout, so we ignore the rest + ;; append command options to command + (setq command-options + (orb--format " --crop=%s" crop + " --layout" (cons layout " --no-layout") + " --solo" (cons solo " --no-solo"))))) + ;; Arguments relevant for more than one command + ;; + ;; find, parse: + ;; format option should be one of accepted types if present + (when (and (memq command '(find parse)) + format) + (when (stringp format) + (setq format + (-map #'intern + (split-string (string-trim format) + "[, ]" t " ")))) + (unless (listp format) + (setq format (list format))) + (let ((accepted-formats + (cl-case command + ('find '(bib csl json ref txt ttx xml)) + ('parse '(bib csl json ref txt xml))))) + (when (--none? (memq it accepted-formats) format) + (user-error + "Invalid format(s) %s. Valid formats for command %s: %s" + (funcall to-string format) + command + (funcall to-string accepted-formats))) + ;; convert format to a comma-separated string and append + ;; it to global options + (setq global-options + (orb--format "%s" global-options + " -f %s" (funcall to-string format))))) + ;; find, parse, check accept + ;; finder and parser models + (when (memq command '(find parse check)) + (when (and fmodel (not (f-exists?
fmodel))) + (display-warning 'org-roam-bibtex + (format "Finder model file not found: %s, \ +using the default one" fmodel)) + (setq fmodel nil)) + (when (and pmodel (not (f-exists? pmodel))) + (display-warning 'org-roam-bibtex + (format "Parser model file not found: %s, \ +using the default one" pmodel)) + (setq pmodel nil)) + (setq global-options (orb--format "%s" global-options + " -F \"%s\"" fmodel + " -P \"%s\"" pmodel))) + ;; find, train, parse and check: + ;; 1) require input, which should be a valid path + ;; 2) something called ruby adapter, probably a right place here + ;; 3) --verbose, --stdout, --overwrite if non-nil + (when (memq command '(find train parse check)) + (unless input + (user-error "Input required for command %s" command)) + (unless (and (stringp input) (f-exists? input)) + (user-error "Invalid input file or directory %s" input)) + (setq global-options + (orb--format + "%s" global-options + " --verbose" (cons verbose " --no-verbose") + ;; this flag does nothing for check + " --stdout" (cons stdout " --no-stdout") + " --adapter=\"%s\"" adapter + " --overwrite" (cons overwrite " --no-overwrite")))) + ;; Set arguments and run the program + ;; + (setq anystyle (orb--format "%s" exec + "%s" global-options + " %s" command + "%s" command-options + " \"%s\"" input + " \"%s\"" output)) + (funcall anystyle-run anystyle))) + +(provide 'orb-anystyle) +;;; orb-anystyle.el ends here +;; Local Variables: +;; fill-column: 79 +;; End: diff --git a/orb-compat.el b/orb-compat.el index d68de7c..dda466e 100644 --- a/orb-compat.el +++ b/orb-compat.el @@ -1,4 +1,4 @@ -;;; org-roam-bibtex-compat.el --- Connector between Org-roam, BibTeX-completion, and Org-ref -*- coding: utf-8; lexical-binding: t -*- +;;; org-roam-bibtex-compat.el --- Org Roam BibTeX: Obsolete definitions -*- coding: utf-8; lexical-binding: t -*- ;; Copyright © 2020 Mykhailo Shevchuk ;; Copyright © 2020 Leo Vivier diff --git a/orb-core.el b/orb-core.el index cf9b2ef..b07278f 100644 --- a/orb-core.el +++
b/orb-core.el @@ -43,23 +43,50 @@ (require 'org-roam) (require 'orb-utils) +(require 'orb-compat) + +(eval-when-compile + (require 'cl-macs) + (require 'subr-x) + (require 'rx)) (declare-function bibtex-completion-get-entry "bibtex-completion" (entry-key)) +(declare-function + bibtex-completion-get-value "bibtex-completion" (field entry &optional default)) (declare-function bibtex-completion-find-pdf (key-or-entry &optional find-additional)) -;; Customize groups +;; * Customize groups (global) +;; All modules should put their `defgroup' definitions here + (defgroup org-roam-bibtex nil "Org-ref and Bibtex-completion integration for Org-roam." :group 'org-roam :prefix "orb-") (defgroup orb-note-actions nil - "Orb Note Actions - run actions useful in note's context." + "Orb Note Actions - run actions in note's context." :group 'org-roam-bibtex :prefix "orb-note-actions-") +(defgroup orb-pdf-scrapper nil + "Orb PDF Scrapper - retrieve references from PDF." + :group 'org-roam-bibtex + :prefix "orb-pdf-scrapper-") + +(defgroup orb-anystyle nil + "Elisp interface to `anystyle-cli`." + :group 'org-roam-bibtex + :prefix "orb-anystyle-") + +(defgroup orb-autokey nil + "Automatic generation of BibTeX citation keys." + :group 'org-roam-bibtex + :prefix "orb-autokey-") + +;; Various utility functions + ;;;###autoload (defun orb-process-file-field (citekey) "Process the 'file' BibTeX field and resolve if there are multiples. @@ -84,6 +111,322 @@ Returns the path to the note file, or nil if it doesn’t exist." (let* ((completions (org-roam--get-ref-path-completions))) (plist-get (cdr (assoc citekey completions)) :path))) +;; * Automatic generation of citation keys + +(defcustom orb-autokey-format "%a%y%T[4][1]" + "Format string for automatically generated citation keys. 
+ +Supported wildcards: + +Basic +========== + + %a |author| - first author's (or editor's) last name + %t |title | - first word of title + %f{field} |field | - first word of arbitrary field + %y |year | - year YYYY + %p |page | - first page + %e{(expr)} |elisp | - execute elisp expression + +Extended +========== + +1. Capitalized versions: + + %A |author| > + %T |title | > Same as %a,%t,%f{field} but + %F{field} |field | > preserve original capitalization + +2. Starred versions + + %a*, %A* |author| - include author's (editor's) initials + %t*, %T* |title | - do not ignore words in `orb-autokey-titlewords-ignore' + %y* |year | - year's last two digits __YY + %p* |page | - use \"pagetotal\" field instead of default \"pages\" + +3. Optional parameters + + %a[N][M][D] |author| > + %t[N][M][D] |title | > include first N words/names + %f{field}[N][M][D] |field | > include at most M first characters of word/name + %p[D] |page | > put delimiter D between words + +N and M should be a single digit 1-9. Putting more digits or any +other symbols will lead to ignoring the optional parameter and +those following it altogether. D should be a single alphanumeric +symbol or one of `-_.:|'. + +Optional parameters work both with capitalized and starred +versions where applicable. + +4. Elisp expression + + - can be anything + - should return a string or nil + - will be evaluated before expanding other wildcards and therefore +can insert other wildcards + - will have `entry' variable bound to the value of BibTeX entry the key +is being generated for, as returned by `bibtex-completion-get-entry'. +The variable may be safely manipulated in a destructive manner. + +%e{(or (bibtex-completion-get-value \"volume\" entry) \"N/A\")} +%e{(my-function entry)} + +Key generation is performed by `orb-autokey-generate-key'." + :risky t + :type 'string + :group 'org-roam-bibtex) + +(defcustom orb-autokey-titlewords-ignore + '("A" "An" "On" "The" "Eine?"
"Der" "Die" "Das" + "[^[:upper:]].*" ".*[^[:upper:][:lower:]0-9].*") + "Patterns from title that will be ignored during key generation. +Every element is a regular expression to match parts of the title +that should be ignored during automatic key generation. Case +sensitive." + ;; Default value was taken from `bibtex-autokey-titleword-ignore'. + :type '(repeat :tag "Regular expression" regexp) + :group 'orb-autokey) + +(defcustom orb-autokey-empty-field-token "N/A" + "String to use when BibTeX field is nil or empty." + :type 'string + :group 'orb-autokey) + +(defcustom orb-autokey-invalid-symbols + " \"'()={},~#%\\" + "Characters not allowed in a BibTeX key. +The key will be stripped of these characters." + :type 'string + :group 'orb-autokey) + +;;; +;;;###autoload +(defun orb-autokey-generate-key (entry &optional control-string) + "Generate citation key from ENTRY according to `orb-autokey-format'. +Return a string. If optional CONTROL-STRING is non-nil, use it +instead of `orb-autokey-format'." + (let* ((case-fold-search nil) + (str (or control-string orb-autokey-format)) + ;; star regexp: group 3! + (star '(opt (group-n 3 "*"))) + ;; optional parameters: regexp groups 4-6! + (opt1 '(opt (and "[" (opt (group-n 4 digit)) "]"))) + (opt2 '(opt (and "[" (opt (group-n 5 digit)) "]"))) + (opt3 '(opt (and "[" (opt (group-n 6 (any alnum "_.:|-"))) "]"))) + ;; capital letters: regexp group 2! + ;; author wildcard regexp + (a-rx (macroexpand + `(rx (group-n 1 (or "%a" (group-n 2 "%A")) + ,star ,opt1 ,opt2 ,opt3)))) + ;; title wildcard regexp + (t-rx (macroexpand + `(rx (group-n 1 (or "%t" (group-n 2 "%T")) + ,star ,opt1 ,opt2 ,opt3)))) + ;; any field wildcard regexp + ;; required parameter: group 7!
+ (f-rx (macroexpand + `(rx (group-n 1 (or "%f" (group-n 2 "%F")) + (and "{" (group-n 7 (1+ letter)) "}") + ,opt1 ,opt2 ,opt3)))) + ;; year wildcard regexp + (y-rx (rx (group-n 1 "%y" (opt (group-n 3 "*"))))) + ;; page wildcard regexp + (p-rx (macroexpand `(rx (group-n 1 "%p" ,star ,opt3)))) + ;; elisp expression wildcard regexp + ;; elisp sexp: group 8! + (e-rx (rx (group-n 1 "%e" + "{" (group-n 8 "(" (1+ ascii) ")") "}")))) + ;; Elisp expressions must be evaluated first because they can produce + ;; additional wildcards + (while (string-match e-rx str) + (setq str (replace-match + (save-match-data + (orb--autokey-evaluate-expression + (match-string 8 str) entry)) t nil str 1))) + ;; Expansions of all other wildcards are actually + ;; variations of calls to `orb--autokey-format-field' with many + ;; commonalities, so we wrap them into a macro + (cl-macrolet + ((expand + (wildcard &key field value entry capital + starred words characters delimiter) + (let ((cap (or capital '(match-string 2 str))) + (star (or starred '(match-string 3 str))) + (opt1 (or words '(match-string 4 str))) + (opt2 (or characters '(match-string 5 str))) + (opt3 (or delimiter '(match-string 6 str)))) + `(while (string-match ,wildcard str) + (setq str (replace-match + ;; we can safely pass nil values + ;; `orb--autokey-format-field' should + ;; handle them correctly + (orb--autokey-format-field ,field + :entry ,entry :value ,value + :capital ,cap :starred ,star + :words ,opt1 :characters ,opt2 :delimiter ,opt3) + t nil str 1)))))) + ;; Handle author wildcards + (expand a-rx + :field "=name=" + :value (or (bibtex-completion-get-value "author" entry) + (bibtex-completion-get-value "editor" entry))) + ;; Handle title wildcards + (expand t-rx + :field "title" + :value (or (bibtex-completion-get-value "title" entry) "")) + ;; Handle custom field wildcards + (expand f-rx + :field (match-string 7 str) + :entry entry) + ;; Handle pages wildcards %p*[-] + (expand p-rx + :field (if (match-string 3 str)
"pagetotal" "pages") + :entry entry + :words "1")) + ;; Handle year wildcards + ;; it's simple, so we do not use `orb--autokey-format-field' here + ;; year should be well-formed: YYYY + ;; TODO: put year into cl-macrolet + (let ((year (or (bibtex-completion-get-value "year" entry) + (bibtex-completion-get-value "date" entry)))) + (if (or (not year) + (string-empty-p year) + (string= year orb-autokey-empty-field-token)) + (while (string-match y-rx str) + (setq str (replace-match orb-autokey-empty-field-token + t nil str 1))) + (while (string-match y-rx str) + (setq year (format "%04d" (string-to-number year)) + str (replace-match + (format "%s" (if (match-string 3 str) + (substring year 2 4) + (substring year 0 4))) + t nil str 1))))) + str)) + +(defun orb--autokey-format-field (field &rest specs) + "Return BibTeX FIELD formatted according to plist SPECS. + +Recognized keys: +========== +:entry - BibTeX entry to use +:value - Value of BibTeX field to use + instead retrieving it from :entry +:capital - capitalized version +:starred - starred version +:words - first optional parameter (number of words) +:characters - second optional parameter (number of characters) +:delimiter - third optional parameter (delimiter) + +All values should be strings, including those representing numbers. + +This function is used internally by `orb-autokey-generate-key'." + (declare (indent 1)) + (-let* (((&plist :entry entry + :value value + :capital capital + :starred starred + :words words + :characters chars + :delimiter delim) specs) + ;; field values will be split into a list of words. `separator' is a + ;; regexp for word separators: either a whitespace, one or more + ;; dashes, or en dash, or em dash + (separator "\\([ \n\t]\\|[-]+\\|[—–]\\)") + (invalid-chars-rx + (rx-to-string `(any ,orb-autokey-invalid-symbols) t)) + (delim (or delim "")) + result) + ;; 0. 
virtual field "=name=" is used internally here and in + ;; `orb-autokey-generate-key'; it stands for author or editor + (if (string= field "=name=") + ;; in name fields, logical words are full names consisting of several + ;; words and containing spaces and punctuation, separated by a logical + ;; separator, the word "and" + (setq separator " and " + value (or value + (bibtex-completion-get-value "author" entry) + (bibtex-completion-get-value "editor" entry))) + ;; otherwise proceed with value or get it from entry + (setq value (or value + (bibtex-completion-get-value field entry)))) + (if (or (not value) + (string-empty-p value)) + (setq result orb-autokey-empty-field-token) + (when (> (length value) 0) + (save-match-data + ;; 1. split field into words + (setq result (split-string value separator t "[ ,.;:-]+")) + ;; 1a) only for title; + ;; STARRED = include words from `orb-autokey-titlewords-ignore' + ;; unstarred version filters the keywords, starred ignores this block + (when (and (string= field "title") + (not starred)) + (let ((ignore-rx (concat "\\`\\(?:" + (mapconcat #'identity + orb-autokey-titlewords-ignore + "\\|") "\\)\\'")) + (words ())) + (setq result (dolist (word result (nreverse words)) + (unless (string-match-p ignore-rx word) + (push word words)))))) + ;; 2. take number of words equal to WORDS if that is set + ;; or just the first word; also 0 = 1. + (if words + (setq words (string-to-number words) + result (-take (if (> words (length result)) + (length result) + words) + result)) + (setq result (list (car result)))) + ;; 2a) only for "=name=" field, i.e. author or editor + ;; STARRED = include initials + (when (string= field "=name=") + ;; NOTE: here we expect name field 'Doe, J. B.' + ;; should ideally be able to handle 'Doe, John M.
Longname, Jr' + (let ((r-x (if starred + "[ ,.\t\n]" + "\\`\\(.*?\\),.*\\'")) + (rep (if starred "" "\\1")) + (words ())) + (setq result + (dolist (name result (nreverse words)) + (push (s-replace-regexp r-x rep name) words))))) + ;; 3. take at most CHARS number of characters from every word + (when chars + (let ((words ())) + (setq chars (string-to-number chars) + result (dolist (word result (nreverse words)) + (push + (substring word 0 + (if (< chars (length word)) + chars + (length word))) + words))))) + ;; 4. almost there: concatenate words, include DELIMiter + (setq result (mapconcat #'identity result delim)) + ;; 5. CAPITAL = preserve case + (unless capital + (setq result (downcase result)))))) + ;; return result stripped of the invalid characters + (s-replace-regexp invalid-chars-rx "" result t))) + +(defun orb--autokey-evaluate-expression (expr &optional entry) + "Evaluate arbitrary elisp EXPR passed as readable string. +The expression will have value of ENTRY bound to `entry' variable +at its disposal. ENTRY should be a BibTeX entry as returned by +`bibtex-completion-get-entry'. The result returned should be a +string or nil." + (let ((result (eval `(let ((entry (quote ,(copy-tree entry)))) + ,(read expr))))) + (unless (or (stringp result) + (not result)) + (user-error "Result: %s, invalid type. \ +Expression must be string or nil" result)) + (or result ""))) + (provide 'orb-core) ;;; orb-core.el ends here ;; Local Variables: diff --git a/orb-note-actions.el b/orb-note-actions.el index 97841cc..efc21b3 100644 --- a/orb-note-actions.el +++ b/orb-note-actions.el @@ -54,6 +54,8 @@ (declare-function org-ref-format-entry "org-ref-bibtex" (key)) +(declare-function orb-pdf-scrapper-run "orb-pdf-scrapper" (key)) + ;; * Customize definitions (defcustom orb-note-actions-frontend 'default @@ -94,7 +96,8 @@ Each action is a cons cell DESCRIPTION . FUNCTION." :group 'orb-note-actions) (defcustom orb-note-actions-extra - '(("Save citekey to kill-ring and clipboard" . 
orb-note-actions-copy-citekey)) + '(("Save citekey to kill-ring and clipboard" . orb-note-actions-copy-citekey) + ("Run Orb PDF Scrapper" . orb-note-actions-scrap-pdf)) "Extra actions for `orb-note-actions'. Each action is a cons cell DESCRIPTION . FUNCTION." :risky t @@ -127,7 +130,7 @@ CANDIDATES. NAME is a string formatted with constructed from `orb-note-actions-default', `orb-note-actions-extra', and `orb-note-actions-user." (declare (indent 1) (debug (symbolp &rest form))) - (let* ((frontend-name (symbol-name (eval frontend))) + (let* ((frontend-name (symbol-name frontend)) (fun-name (intern (concat "orb-note-actions--" frontend-name)))) `(defun ,fun-name (citekey) ,(format "Provide note actions using %s interface. @@ -140,18 +143,18 @@ CITEKEY is the citekey." (capitalize frontend-name)) orb-note-actions-user)))) ,@body)))) -(orb-note-actions--frontend! 'default +(orb-note-actions--frontend! default (let ((f (cdr (assoc (completing-read name candidates) candidates)))) (funcall f (list citekey)))) -(orb-note-actions--frontend! 'ido +(orb-note-actions--frontend! ido (let* ((c (cl-map 'list 'car candidates)) (f (cdr (assoc (ido-completing-read name c) candidates)))) (funcall f (list citekey)))) (declare-function orb-note-actions-hydra/body "orb-note-actions" nil t) -(orb-note-actions--frontend! 'hydra +(orb-note-actions--frontend! hydra ;; we don't use candidates here because for a nice hydra we need each ;; group of completions separately (default, extra, user), so just ;; silence the compiler @@ -187,7 +190,7 @@ CITEKEY is the citekey." (capitalize frontend-name)) Falling back to default.") (orb-note-actions--default citekey))) -(orb-note-actions--frontend! 'ivy +(orb-note-actions--frontend! ivy (if (fboundp 'ivy-read) (ivy-read name candidates @@ -199,7 +202,7 @@ Falling back to default.") Falling back to default.") (orb-note-actions--default citekey))) -(orb-note-actions--frontend! 'helm +(orb-note-actions--frontend! 
helm (if (fboundp 'helm) (helm :sources `(((name . ,name) @@ -218,13 +221,19 @@ Falling back to default.") ;; * Note actions (defun orb-note-actions-copy-citekey (citekey) - "Save note's citekey to `kill-ring' and copy it to clipboard. -Since CITEKEY is actually a list of one element, the car of the -list is used." + "Save note's citation key to `kill-ring' and copy it to clipboard. +CITEKEY is a list whose car is a citation key." (with-temp-buffer (insert (car citekey)) (copy-region-as-kill (point-min) (point-max)))) +(defun orb-note-actions-scrap-pdf (citekey) + "Wrapper around `orb-pdf-scrapper-run'. +CITEKEY is a list whose car is a citation key." + (require 'orb-pdf-scrapper) + (orb-pdf-scrapper-run (car citekey))) + + ;; * Main functions ;;;###autoload diff --git a/orb-pdf-scrapper.el b/orb-pdf-scrapper.el new file mode 100644 index 0000000..33362c8 --- /dev/null +++ b/orb-pdf-scrapper.el @@ -0,0 +1,1133 @@ +;;; orb-pdf-scrapper.el --- Org Roam BibTeX: PDF reference scrapper -*- coding: utf-8; lexical-binding: t -*- + +;; Copyright © 2020 Mykhailo Shevchuk +;; Copyright © 2020 Leo Vivier + +;; Author: Mykhailo Shevchuk +;; Leo Vivier +;; URL: https://github.com/org-roam/org-roam-bibtex +;; Keywords: org-mode, roam, convenience, bibtex, helm-bibtex, ivy-bibtex, org-ref +;; Version: 0.2.3 + +;; This file is NOT part of GNU Emacs. + +;; This program is free software; you can redistribute it and/or modify +;; it under the terms of the GNU General Public License as published by +;; the Free Software Foundation; either version 3, or (at your option) +;; any later version. +;; +;; This program is distributed in the hope that it will be useful, +;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the +;; GNU General Public License for more details. +;; +;; You should have received a copy of the GNU General Public License +;; along with GNU Emacs; see the file COPYING.
If not, write to the +;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, +;; Boston, MA 02110-1301, USA. + +;; N.B. This file contains code snippets adopted from other +;; open-source projects. These snippets are explicitly marked as such +;; in place. They are not subject to the above copyright and +;; authorship claims. + +;;; Commentary: +;; + +;;; Code: +;; * Library requires + +(require 'orb-core) +(require 'orb-anystyle) + +;; it's fine here since `orb-pdf-scrapper' is autoloaded +(require 'bibtex-completion) + +(require 'bibtex) +(require 'rx) +(require 'cl-extra) + +(eval-when-compile + (require 'cl-lib) + (require 'cl-macs) + (require 'subr-x)) + +(declare-function bibtex-set-field "org-ref" (field value &optional nodelim)) + +;; * Customize definitions + +;; TODO: make these defcustom + +(defcustom orb-pdf-scrapper-refsection-headings + '((parent "References") + (in-roam "In Org Roam database" list) + (in-bib "In BibTeX file" list) + (valid "Valid citation keys" table) + (invalid "Invalid citation keys" table)) + "Determines appearance of Org-mode data generated by Orb PDF Scrapper. +A list of five elements of the form (GROUP TITLE TYPE). + +GROUP must be one of the symbols `parent', `in-roam', `in-bib', +`valid' or `invalid'. + +TITLE is an arbitrary string, which will be the title of the +group's headline. + +TYPE must be one of the symbols `list' or `table' determining how +the generated citations will appear under the group's headline. +TYPE is ignored for the `parent' group and defaults to `list' for +other groups when set to nil."
+ :type '(list (list :tag "Parent headline" + (const :format "" parent) + (string :tag "Title")) + (list :tag "\nIn-roam" + (const :format "" in-roam) + (string :tag "Title") + (radio :tag "Type" :value list + (const list) (const table))) + (list :tag "\nIn-bib" + (const :format "" in-bib) + (string :tag "Title") + (radio :tag "Type" :value list + (const list) (const table))) + (list :tag "\nValid" + (const :format "" valid) + (string :tag "Title") + (radio :tag "Type" :value table + (const list) (const table))) + (list :tag "\nInvalid" + (const :format "" invalid) + (string :tag "Title") + (radio :tag "Type" :value table + (const list) (const table)))) + :group 'orb-pdf-scrapper) + +(defcustom orb-pdf-scrapper-set-fields + '(("author" orb-pdf-scrapper--invalidate-nil-value) + ("editor" orb-pdf-scrapper--invalidate-nil-value + "book" "collection") + ("title" orb-pdf-scrapper--invalidate-nil-value) + ("journal" orb-pdf-scrapper--invalidate-nil-value + "article") + ("date" orb-pdf-scrapper--invalidate-nil-value) + ("volume" orb-pdf-scrapper--invalidate-nil-value + "article" "incollection") + ("pages" orb-pdf-scrapper--fix-or-invalidate-range + "article" "incollection")) + "BibTeX fields to set during key generation. +A list in which each element is of the form (FIELD FUNCTION . TYPES). + +FIELD is a BibTeX field name to be set. + +FUNCTION is a function that will be called to generate the value. +It takes two arguments: FIELD and ENTRY, which is the current entry. + +TYPES is a list of strings corresponding to BibTeX entry types +for which the FIELD should be set. If it is nil, set the FIELD +for all entry types." + :risky t + :type '(repeat + (list :tag "Item" + (string :tag "Field") + (function :tag "Function") + (repeat :tag "Entry types" :inline t + (string :tag "Type")))) + :group 'orb-pdf-scrapper) +(defcustom orb-pdf-scrapper-export-fields + '("author" "editor" "journal" "date" "volume" "pages") + "BibTeX fields to export into Org mode table.
+A list in which each element is of form (FIELD . TYPES). + +FIELD is a field to export. + +TYPES is a list of strings corresponding to BibTeX entry types +for which the FIELD should be set. If it is nil, set the FIELD +for all entry types." + :type '(repeat (string :tag "Field")) + :group 'orb-pdf-scrapper) + +(defcustom orb-pdf-scrapper-invalid-key-pattern "\\`.*N/A.*\\'" + "Regexp to match an invalid key." + :type 'regexp + :group 'orb-pdf-scrapper) + +;; * Helper functions: citekey generation + +(defvar orb-pdf-scrapper--refs nil) + +(defun orb-pdf-scrapper--invalidate-nil-value (field entry) + "Return value of FIELD or `orb-autokey-empty-field-token' if it is nil. +ENTRY is a BibTeX entry." + (bibtex-completion-get-value field entry orb-autokey-empty-field-token)) + +(defun orb-pdf-scrapper--fix-or-invalidate-range (field entry) + "Replace missing or non-standard delimiter between two strings with \"--\". +FIELD is the name of a BibTeX field from ENTRY. Return +`orb-autokey-empty-field-token' if the value is nil. + +This function is primarily intended for fixing anystyle parsing +artefacts such as those often encountered in \"pages\" field, +where two numbers have only spaces between them." + (replace-regexp-in-string "\\`[[:alnum:]]*?\\([- –]+\\)[[:alnum:]]*\\'" + "--" + (bibtex-completion-get-value + field entry orb-autokey-empty-field-token) + nil nil 1)) + +(defun orb-pdf-scrapper--get-entry-info (entry &optional collect-only) + "Collect some information from and about the BibTeX ENTRY for further use.
+Take a bibtex entry as returned by `bibtex-completion-get-entry'\ +and return a plist with the following keys set: + +:key |string | citekey generated with `orb-autokey-generate-key' +:validp |boolean| according to `orb-pdf-scrapper-invalid-key-pattern' +:set-fields |(cons) | as per `orb-pdf-scrapper-set-fields' +:export-fields |(cons) | as per `orb-pdf-scrapper-export-fields' + +Each element of the `:set-fields' and `:export-fields' lists is a +cons cell (FIELD . VALUE). + +If optional COLLECT-ONLY is non-nil, do not generate the key; +`:set-fields' is set to nil." + (let ((type (bibtex-completion-get-value "=type=" entry)) + ;; return values + key validp set-fields export-fields + ;; internal variable + fields) + ;; when requested to collect keys, just do that + (if collect-only + (setq key (bibtex-completion-get-value "=key=" entry) + fields entry) + ;; otherwise + ;; prepare fields for setting + (dolist (set-field orb-pdf-scrapper-set-fields) + (let ((field-name (car set-field)) + (export-types (cddr set-field))) + ;; push the field for setting only when entry type is one of the + ;; specified types or nil, which means set the field regardless of + ;; entry type + (when (or (not export-types) + (member type export-types)) + (push (cons field-name + ;; call the function if provided + (if-let ((fn (cadr set-field))) + (funcall fn field-name entry) + ;; otherwise get the value from current entry + (bibtex-completion-get-value field-name entry ""))) + set-fields)))) + ;; prioritize fields from set-fields over entry fields + ;; for autokey generation + (let ((-compare-fn (lambda (x y) + (string= (car x) (car y))))) + (setq fields (-union set-fields entry) + key (orb-autokey-generate-key fields)))) + ;; validate the new shiny key (or the old existing one) + ;; not sure if save-match-data is needed here + ;; but it seems to be always a good choice + (save-match-data + (setq validp (and (not (string-match-p + orb-pdf-scrapper-invalid-key-pattern key)) + t))) + ;; list
fields for org export + (dolist (field orb-pdf-scrapper-export-fields) + (let ((value (bibtex-completion-get-value field fields ""))) + ;; truncate author list to first three names, append et.al instead + ;; of the remaining names + ;; This is a hard-coded "reasonable default" + ;; and it may be replaced with something more + ;; flexible in the future + (when (or (string= field "author") + (string= field "editor")) + (setq value (split-string value " and " t "[ ,.;:-]+") + value (if (> (length value) 3) + (append (-take 3 value) '("et.al.")) + value) + value (concat (mapconcat #'identity value "; ")))) + (push (cons field value) export-fields))) + ;; return the entry + (list :key key + :validp validp + :set-fields set-fields + :export-fields (nreverse export-fields)))) + +(defun orb-pdf-scrapper--update-record-at-point (&optional collect-only) + "Generate citation key and update the BibTeX record at point. +Calls `orb-pdf-scrapper--get-entry-info' to get information about +BibTeX record at point and updates it accordingly. If optional +COLLECT-ONLY is non-nil, do not generate the key and do not set +the fields. + +This is an auxiliary function for command +`orb-pdf-scrapper-generate-keys'." 
+ (let* ((entry (parsebib-read-entry (parsebib-find-next-item))) + (key-plist (orb-pdf-scrapper--get-entry-info entry collect-only)) + (new-key (plist-get key-plist :key)) + (validp (plist-get key-plist :validp)) + (fields-to-set (plist-get key-plist :set-fields)) + (formatted-entry (plist-get key-plist :export-fields))) + (unless collect-only + (save-excursion + ;; update citekey + ;; adjusted from bibtex-clean-entry + (bibtex-beginning-of-entry) + (re-search-forward bibtex-entry-maybe-empty-head) + (if (match-beginning bibtex-key-in-head) + (delete-region (match-beginning bibtex-key-in-head) + (match-end bibtex-key-in-head))) + (insert new-key) + ;; set the bibtex fields + (when fields-to-set + (dolist (field fields-to-set) + (bibtex-set-field (car field) (cdr field)))))) + ;; return the result ((NEW-KEY . ENTRY) . VALIDP) + ;; TODO: for testing until implemented + (cons new-key (cons formatted-entry validp)))) + +(defun orb-pdf-scrapper--sort-refs (refs) + "Sort references REFS. +Auxiliary function for `orb-pdf-scrapper-generate-keys'. +REFS should be an alist of form ((CITEKEY . FORMATTED-ENTRY) . VALIDP). + +References marked valid by `orb-pdf-scrapper-keygen-function' function +are further sorted into four groups: + +'in-roam - available in the `org-roam' database; +'in-bib - available in `bibtex-completion-bibliography' file(s); +'valid - marked valid by the keygen function but are not +available in the user databases; +'invalid - marked invalid by the keygen function." 
+ (let* ((bibtex-completion-bibliography (orb-pdf-scrapper--get :global-bib)) + ;; When using a quoted list here, sorted-refs is not erased in + ;; consecutive runs + (sorted-refs (list (list 'in-roam) (list 'in-bib) + (list 'valid) (list 'invalid)))) + (dolist (ref refs) + (cond ((org-roam-db-query [:select [ref] + :from refs + :where (= ref $s1)] + (format "%s" (car ref))) + (push + (cons (format "cite:%s" (car ref)) (cadr ref)) + (cdr (assoc 'in-roam sorted-refs)))) + ((bibtex-completion-get-entry (car ref)) + (push + (cons (format "cite:%s" (car ref)) (cadr ref)) + (cdr (assoc 'in-bib sorted-refs)))) + ((cddr ref) + (push + (cons (format "cite:%s" (car ref)) (cadr ref)) + (cdr (assoc 'valid sorted-refs)))) + (t + (push + (cons (format "cite:%s" (car ref)) (cadr ref)) + (cdr (assoc 'invalid sorted-refs)))))) + sorted-refs)) + +;; * Helper functions: dispatcher + +(defvar orb-pdf-scrapper--plist nil + "Communication channel for Orb PDF Scrapper.") + +(defvar orb-pdf-scrapper--buffer "*Orb PDF Scrapper*" + "Orb PDF Scrapper special buffer.") + +(defmacro orb--with-scrapper-buffer! (&rest body) + "Execute BODY with `orb-pdf-scrapper--buffer' as current. +If the buffer does not exist it will be created." + (declare (indent 0) (debug t)) + `(save-current-buffer + (set-buffer (get-buffer-create orb-pdf-scrapper--buffer)) + ,@body)) + +(defmacro orb--when-current-context! (context &rest body) + "Execute BODY if CONTEXT is current context. +Run `orb-pdf-scrapper-dispatcher' with `error' context +otherwise. If CONTEXT is a list then current context must be a +member of that list." + (declare (indent 1) (debug t)) + `(if (not (orb-pdf-scrapper--current-context-p ,context)) + (orb-pdf-scrapper-dispatcher 'error) + ,@body)) + +(defun orb-pdf-scrapper--current-context-p (context) + "Return non-nil if CONTEXT is current context. +CONTEXT can also be a list, in which case non-nil is returned when +current context is its member."
+ (if (listp context) + (memq (orb-pdf-scrapper--get :context) context) + (eq (orb-pdf-scrapper--get :context) context))) + +(defun orb-pdf-scrapper--refresh-mode (mode) + "Restart `orb-pdf-scrapper-mode' with new major MODE." + (cl-case mode + ('txt + (text-mode) + (orb-pdf-scrapper--put :callee 'edit-bib + :context 'start + :caller 'edit-txt)) + ('bib + (bibtex-mode) + ;; anystyle uses biblatex dialect + (bibtex-set-dialect 'biblatex t) + (orb-pdf-scrapper--put :callee 'edit-org + :context 'start + :caller 'edit-bib)) + ('org + (org-mode) + (orb-pdf-scrapper--put :callee 'checkout + :context 'start + :caller 'edit-org)) + ('xml + (xml-mode) + (cl-case (orb-pdf-scrapper--get :context) + ;; since :callee is not used in training session, we set :callee here to + ;; the original :caller, so that we can return to the editing mode we + ;; were called from if the training session is to be cancelled + ('start + (orb-pdf-scrapper--put :callee (orb-pdf-scrapper--get :caller) + :context 'edit + :caller 'edit-xml)))) + ('train + (fundamental-mode) + (cl-case (orb-pdf-scrapper--get :context) + ('train + (orb-pdf-scrapper--put :context 'train + :caller 'train)) + ;; Since the session was not cancelled, we return to text, as everything + ;; else should be regenerated anyway. + ('finished + (orb-pdf-scrapper--put :callee 'edit-txt + :context 'continue + :caller 'train)))) + (t + (unwind-protect + (error "Oops...something went wrong. \ +Pressing the RED button, just in case") + (orb-pdf-scrapper-dispatcher 'error)))) + (set-buffer-modified-p nil) + (setq mark-active nil) + (orb-pdf-scrapper-mode -1) + (orb-pdf-scrapper-mode +1) + (goto-char (point-min))) + +(defun orb-pdf-scrapper--edit-txt () + "Edit text references in `orb-pdf-scrapper--buffer'." 
+ ;; callee will be overridden in case of error + (cl-case (orb-pdf-scrapper--get :context) + ;; parse pdf file and switch to text editing mode + ('start + (let ((temp-txt (orb--temp-file "orb-pdf-scrapper-" ".txt")) + (pdf-file (orb-pdf-scrapper--get :pdf-file))) + (orb-pdf-scrapper--put :temp-txt temp-txt) + (let ((same-window-buffer-names (list orb-pdf-scrapper--buffer))) + (pop-to-buffer orb-pdf-scrapper--buffer)) + (setq buffer-file-name nil) + (orb--with-message! (format "Scrapping %s.pdf" (f-base pdf-file)) + (erase-buffer) + (orb-anystyle 'find + :format 'ref + :layout nil + :finder-model orb-anystyle-finder-model + :input pdf-file + :stdout t + :buffer orb-pdf-scrapper--buffer)) + (setq buffer-undo-list nil) + (orb-pdf-scrapper--refresh-mode 'txt))) + ;; read the previously generated text file + ('continue + (if-let ((temp-txt (orb-pdf-scrapper--get :temp-txt)) + (f-exists? temp-txt)) + (progn + (pop-to-buffer orb-pdf-scrapper--buffer) + (erase-buffer) + (insert-file-contents temp-txt) + (setq buffer-undo-list (orb-pdf-scrapper--get :txt-undo-list)) + (orb-pdf-scrapper--refresh-mode 'txt)) + (orb-pdf-scrapper-dispatcher 'error))) + (t + (orb-pdf-scrapper-dispatcher 'error)))) + +(defun orb-pdf-scrapper--edit-bib () + "Generate and edit BibTeX data in `orb-pdf-scrapper--buffer'." + (pop-to-buffer orb-pdf-scrapper--buffer) + (cl-case (orb-pdf-scrapper--get :context) + ('start + (let* ((temp-bib (or (orb-pdf-scrapper--get :temp-bib) + (orb--temp-file "orb-pdf-scrapper-" ".bib")))) + (orb-pdf-scrapper--put :temp-bib temp-bib) + ;; save previous progress in txt buffer + (write-region (orb--buffer-string) + nil (orb-pdf-scrapper--get :temp-txt) nil -1) + (orb-pdf-scrapper--put :txt-undo-list (copy-tree buffer-undo-list)) + (orb--with-message! 
"Generating BibTeX data" + ;; Starting from Emacs 27, whether shell-command erases buffer + ;; is controlled by `shell-command-dont-erase-buffer', so we + ;; make sure the buffer is clean + (erase-buffer) + (orb-anystyle 'parse + :format 'bib + :parser-model orb-anystyle-parser-model + :input (orb-pdf-scrapper--get :temp-txt) + :stdout t + :buffer orb-pdf-scrapper--buffer) + (write-region (orb--buffer-string) nil temp-bib nil -1)) + (setq buffer-undo-list nil)) + (orb-pdf-scrapper--refresh-mode 'bib)) + ('continue + (if-let ((temp-bib (orb-pdf-scrapper--get :temp-bib)) + (f-exists? temp-bib)) + (progn + (erase-buffer) + (insert-file-contents temp-bib) + (setq buffer-undo-list (orb-pdf-scrapper--get :bib-undo-list)) + (orb-pdf-scrapper--refresh-mode 'bib)) + (orb-pdf-scrapper-dispatcher 'error))) + (t + (orb-pdf-scrapper-dispatcher 'error)))) + +(defun orb-pdf-scrapper--insert-org-as-list (ref-alist) + "Insert REF-ALIST as Org-mode list." + (dolist (ref ref-alist) + (insert "- " (car ref) "\n" ))) + +(defun orb-pdf-scrapper--insert-org-as-table (ref-alist) + "Insert REF-ALIST as Org-mode table." + (insert + (concat "|citekey|" + (mapconcat #'identity + orb-pdf-scrapper-export-fields "|") + "|\n")) + (forward-line -1) + (org-table-insert-hline) + (forward-line 2) + (let ((table "")) + (dolist (ref ref-alist) + (setq table + (format "%s|%s|%s|\n" table (car ref) + (mapconcat + (lambda (field) + (bibtex-completion-get-value field (cdr ref) "")) + orb-pdf-scrapper-export-fields "|")))) + (insert table)) + (forward-line -1) + (org-table-align)) + +(defun orb-pdf-scrapper--edit-org () + "Edit generated Org-mode data." + (pop-to-buffer orb-pdf-scrapper--buffer) + (cl-case (orb-pdf-scrapper--get :context) + ('start + ;; if the BibTeX buffer was modified, save it and maybe generate keys + (orb-pdf-scrapper-generate-keys + nil + (if (buffer-modified-p) + ;; TODO: it's clumsy + ;; not "yes" means generate + ;; not "no" means collect only + (not (y-or-n-p "Generate BibTeX keys? 
")) + t)) + (when (> (cl-random 100) 98) + (orb--with-message! "Pressing the RED button")) + (write-region (orb--buffer-string) + nil (orb-pdf-scrapper--get :temp-bib) nil 1) + (orb-pdf-scrapper--put :bib-undo-list (copy-tree buffer-undo-list)) + ;; generate Org-mode buffer + (let* ((temp-org (or (orb-pdf-scrapper--get :temp-org) + (orb--temp-file "orb-pdf-scrapper-" ".org")))) + (orb-pdf-scrapper--put :temp-org temp-org + :caller 'edit-org) + ;; we must change the mode in the beginning to get all the Org-mode + ;; facilities + (orb-pdf-scrapper--refresh-mode 'org) + (orb--with-message! "Generating Org data" + (erase-buffer) + ;; insert parent heading + (org-insert-heading nil nil t) + (insert + (concat + (cadr (assoc 'parent orb-pdf-scrapper-refsection-headings)) + " (retrieved by Orb PDF Scrapper from " + (f-filename (orb-pdf-scrapper--get :pdf-file)) ")")) + (org-end-of-subtree) + ;; insert child headings: in-roam, in-bib, valid, invalid + (dolist (ref-group + (orb-pdf-scrapper--sort-refs orb-pdf-scrapper--refs)) + (when-let* ((group (car ref-group)) + (refs (cdr ref-group)) + (heading + (cdr (assoc group + orb-pdf-scrapper-refsection-headings))) + (title (car heading)) + (type (cadr heading))) + (org-insert-heading '(16) nil t) + ;; insert heading + (insert (format "%s\n" title)) + (org-demote) + (org-end-of-subtree) + ;; insert references + (insert (format "\n#+name: %s\n" group)) + (cl-case type + ('table + (orb-pdf-scrapper--insert-org-as-table refs)) + (t + (orb-pdf-scrapper--insert-org-as-list refs))))) + (write-region (orb--buffer-string) nil temp-org nil -1) + (setq buffer-undo-list nil) + (set-buffer-modified-p nil) + (goto-char (point-min))))) + ('continue + (if-let ((temp-org (orb-pdf-scrapper--get :temp-org)) + (f-exists? 
temp-org)) + (progn + (erase-buffer) + (insert-file-contents temp-org) + (setq buffer-undo-list (orb-pdf-scrapper--get :org-undo-list)) + (orb-pdf-scrapper--refresh-mode 'org)) + (orb-pdf-scrapper-dispatcher 'error))))) + +(defun orb-pdf-scrapper--edit-xml () + "Edit XML data." + (pop-to-buffer orb-pdf-scrapper--buffer) + (cl-case (orb-pdf-scrapper--get :context) + ('start + (let* ((temp-xml (or (orb-pdf-scrapper--get :temp-xml) + (orb--temp-file "orb-pdf-scrapper-" ".xml")))) + (orb-pdf-scrapper--put :temp-xml temp-xml) + (orb--with-message! "Generating XML data" + ;; save progress in text mode when called from there if called from + ;; anywhere else, text mode progress is already saved, other data will + ;; be re-generated anyway + (when (eq (orb-pdf-scrapper--get :caller) 'edit-txt) + (write-region (orb--buffer-string) + nil (orb-pdf-scrapper--get :temp-txt) nil -1) + (orb-pdf-scrapper--put :txt-undo-list (copy-tree buffer-undo-list))) + (erase-buffer) + (orb-anystyle 'parse + :format 'xml + :parser-model orb-anystyle-parser-model + :input (orb-pdf-scrapper--get :temp-txt) + :stdout t + :buffer orb-pdf-scrapper--buffer) + (write-region (orb--buffer-string) nil temp-xml nil -1) + (setq buffer-undo-list nil) + (orb-pdf-scrapper--refresh-mode 'xml)))) + ('edit-master + (progn + (erase-buffer) + (insert-file-contents orb-anystyle-parser-training-set) + ;; we allow the user to see which file they are editing + (setq buffer-file-name orb-anystyle-parser-training-set) + (setq buffer-undo-list nil) + (orb-pdf-scrapper--refresh-mode 'xml))) + (t + (orb-pdf-scrapper-dispatcher 'error)))) + +(defun orb-pdf-scrapper--update-master-file () + "Append generated XML data to `orb-anystyle-parser-training-set'." + (orb--with-scrapper-buffer! + (orb--with-message! 
(format "Appending to master training set %s" + orb-anystyle-parser-training-set) + ;; save any progress in XML mode + (write-region (orb--buffer-string) nil + (orb-pdf-scrapper--get :temp-xml) nil -1) + (let (new-data) + ;; strip down the header and footer tokens from our data + (save-excursion + (save-match-data + (let* (beg end) + (goto-char (point-min)) + (re-search-forward "\\(^[ \t]*[ \t]*\n\\)" nil t) + (setq beg (or (match-end 1) + (point-min))) + (re-search-forward "\\(^[ \t]*[ \t]*\n\\)" nil t) + (setq end (or (match-beginning 1) + (point-max))) + (setq new-data (orb--buffer-string beg end))))) + ;; append our data to the master file + (with-temp-buffer + (insert-file-contents orb-anystyle-parser-training-set) + ;; backup the master file + (let ((master-backup (concat orb-anystyle-parser-training-set ".back"))) + (orb-pdf-scrapper--put :master-backup master-backup) + (rename-file orb-anystyle-parser-training-set master-backup t)) + (goto-char (point-max)) + (forward-line -1) + (insert new-data) + (f-touch orb-anystyle-parser-training-set) + (write-region (orb--buffer-string) nil + orb-anystyle-parser-training-set nil -1)))))) + +(defun orb-pdf-scrapper--train (&optional review) + "Update parser training set and run anystyle train. +If optional REVIEW is non-nil, run `orb-pdf-scrapper--edit-xml' +in `:edit-master' context." 
+ (pop-to-buffer orb-pdf-scrapper--buffer) + ;; edit the master file or proceed to training + (if review + ;; we've been requested to review the master file + (progn + (orb-pdf-scrapper--update-master-file) + (orb-pdf-scrapper--put :context 'edit-master) + (orb-pdf-scrapper--edit-xml)) + ;; start the training process otherwise + (orb-pdf-scrapper--update-master-file) + (message "Training anystyle parser model...") + (when buffer-file-name + (save-buffer)) + (setq buffer-file-name nil) + (erase-buffer) + (orb-pdf-scrapper--put :context 'train) + (orb-pdf-scrapper--refresh-mode 'train) + (insert (format "\ +This can take several minutes depending on the size of your training set. +You can continue your work meanwhile and return here later.\n +Training set => %s +Parser model => %s\n +anystyle output: +=====================\n" + orb-anystyle-parser-training-set + orb-anystyle-parser-model)) + (goto-char (point-min)) + ;; normally, anystyle runs with `shell-command', anystyle train, however, + ;; can take minutes on large files, so it runs in a shell sub-process + (let ((training-process + (orb-anystyle 'train + :stdout t + :overwrite t + :input orb-anystyle-parser-training-set + :output orb-anystyle-parser-model + :buffer orb-pdf-scrapper--buffer))) + (orb-pdf-scrapper--put :training-process training-process) + ;; finalize + (set-process-sentinel + training-process + (lambda (_p result) + (orb--with-scrapper-buffer! + (if (string= result "finished\n") + (orb--with-scrapper-buffer! + (goto-char (point-max)) + (insert "=====================\n\nDone!") + (message "Training anystyle parser model...done") + (orb-pdf-scrapper--put :context 'finished + :training-process nil) + (orb-pdf-scrapper--refresh-mode 'train)) + (orb-pdf-scrapper--put :context 'error + :training-process nil)))))))) + +(defun orb-pdf-scrapper--checkout () + "Finalize Orb PDF Scrapper process. +Insert generated Org data into the note buffer that started the +process."
+ (cl-case (orb-pdf-scrapper--get :context) + ('start + (pop-to-buffer (orb-pdf-scrapper--get :original-buffer)) + (save-restriction + (save-excursion + (widen) + (goto-char (point-max)) + (insert-file-contents (orb-pdf-scrapper--get :temp-org)))) + (orb-pdf-scrapper-dispatcher 'kill)) + (t + (orb-pdf-scrapper-dispatcher 'error)))) + +(defun orb-pdf-scrapper--cleanup () + "Clean up before and after Orb PDF Scrapper process." + (setq orb-pdf-scrapper--refs ()) + (dolist (prop (list :running :callee :context :caller + :current-key :prevent-concurring + :temp-txt :temp-bib :temp-org :temp-xml + :pdf-file :global-bib :master-backup + :txt-undo-list :bib-undo-list :org-undo-list + :training-process :window-conf :original-buffer)) + (orb-pdf-scrapper--put prop nil))) + + +;; * Minor mode + +;;; Code in this section was adopted from org-capture.el +;; +;; Copyright (C) 2010-2020 Free Software Foundation, Inc. +;; Author: Carsten Dominik +(defvar orb-pdf-scrapper-mode-map + (let ((map (make-sparse-keymap))) + (define-key map "\C-c\C-k" #'orb-pdf-scrapper-kill) + map) + "Keymap for `orb-pdf-scrapper-mode' minor mode. +The keymap is updated automatically according to the Orb PDF +Scrapper process context. It is not supposed to be modified +directly by the user.") + +(defcustom orb-pdf-scrapper-mode-hook nil + "Hook for the `orb-pdf-scrapper-mode' minor mode." + :type 'hook + :group 'orb-pdf-scrapper) + +(define-minor-mode orb-pdf-scrapper-mode + "Minor mode for special key bindings in an orb-pdf-scrapper buffer. +Turning on this mode runs the normal hook `orb-pdf-scrapper-mode-hook'." + nil " OPS" orb-pdf-scrapper-mode-map + (when orb-pdf-scrapper-mode + (orb-pdf-scrapper--update-keymap) + (setq-local + header-line-format + (orb-pdf-scrapper--format-header-line)))) + +(defun orb-pdf-scrapper--put (&rest props) + "Add properties PROPS to `orb-pdf-scrapper--plist'. +Returns the new plist."
+ (while props + (setq orb-pdf-scrapper--plist + (plist-put orb-pdf-scrapper--plist + (pop props) + (pop props))))) + +(defun orb-pdf-scrapper--get (prop) + "Get PROP from `orb-pdf-scrapper--plist'." + (plist-get orb-pdf-scrapper--plist prop)) +;;; +;;; End of code adopted from org-capture.el + +;; TODO combine `orb-pdf-scrapper--format-header-line' +;; and `orb-pdf-scrapper--update-keymap' into one +;; function and use a macro to generate each entry +(defun orb-pdf-scrapper--format-header-line () + "Return formatted buffer header line depending on context." + (substitute-command-keys + (format "\\<orb-pdf-scrapper-mode-map>Orb PDF Scrapper: %s. %s" + (orb-pdf-scrapper--get :current-key) + (cl-case (orb-pdf-scrapper--get :caller) + ('edit-txt + "\ +Generate BibTeX `\\[orb-pdf-scrapper-dispatcher]', \ +sanitize text `\\[orb-pdf-scrapper-sanitize-text]', \ +train parser `\\[orb-pdf-scrapper-training-session]', \ +abort `\\[orb-pdf-scrapper-kill]'.") + ('edit-bib + "\ +Generate Org `\\[orb-pdf-scrapper-dispatcher]', \ +generate keys `\\[orb-pdf-scrapper-generate-keys]', \ +return to text `\\[orb-pdf-scrapper-cancel]', \ +train parser `\\[orb-pdf-scrapper-training-session]', \ +abort `\\[orb-pdf-scrapper-kill]'.") + ('edit-org + "\ +Finish `\\[orb-pdf-scrapper-dispatcher]', \ +return to BibTeX `\\[orb-pdf-scrapper-cancel]', \ +abort `\\[orb-pdf-scrapper-kill]'.") + ('edit-xml + (cl-case (orb-pdf-scrapper--get :context) + ('edit + (format "\ +Train `\\[orb-pdf-scrapper-training-session]', \ +review %s `\\[orb-pdf-scrapper-review-master-file]', \ +cancel `\\[orb-pdf-scrapper-cancel]', \ +abort `\\[orb-pdf-scrapper-kill]'."
+ (file-name-nondirectory + orb-anystyle-parser-training-set))) + ('edit-master + "\ +Train `\\[orb-pdf-scrapper-training-session]', \ +cancel `\\[orb-pdf-scrapper-cancel]', \ +abort `\\[orb-pdf-scrapper-kill]'."))) + ('train + (cl-case (orb-pdf-scrapper--get :context) + ('train + "\ +Abort `\\[orb-pdf-scrapper-kill]'.") + ('continue + "\ +Finish `\\[orb-pdf-scrapper-dispatcher]', \ +abort `\\[orb-pdf-scrapper-kill]'."))) + (t + "\ +Press the RED button `\\[orb-pdf-scrapper-kill]'."))))) + +(defun orb-pdf-scrapper--update-keymap () + "Update `orb-pdf-scrapper-mode-map' according to current editing mode. +Context is read from `orb-pdf-scrapper--plist' property `:context'." + (let ((map orb-pdf-scrapper-mode-map)) + (cl-case (orb-pdf-scrapper--get :caller) + ;; + ('edit-txt + (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher) + (define-key map "\C-c\C-u" #'orb-pdf-scrapper-sanitize-text) + (define-key map "\C-C\C-t" #'orb-pdf-scrapper-training-session) + (define-key map "\C-c\C-r" nil)) + ;; + ('edit-bib + (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher) + (define-key map "\C-c\C-u" #'orb-pdf-scrapper-generate-keys) + (define-key map "\C-C\C-t" #'orb-pdf-scrapper-training-session) + (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel)) + ;; + ('edit-org + (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher) + (define-key map "\C-c\C-u" nil) + (define-key map "\C-C\C-t" nil) + (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel)) + ('edit-xml + (cl-case (orb-pdf-scrapper--get :context) + ('edit + (define-key map "\C-c\C-c" #'orb-pdf-scrapper-training-session) + (define-key map "\C-c\C-u" nil) + (define-key map "\C-C\C-t" #'orb-pdf-scrapper-review-master-file) + (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel)) + ('edit-master + (define-key map "\C-c\C-c" #'orb-pdf-scrapper-training-session) + (define-key map "\C-c\C-u" nil) + (define-key map "\C-C\C-t" nil) + (define-key map "\C-c\C-r" #'orb-pdf-scrapper-cancel)))) + ('train + (cl-case 
(orb-pdf-scrapper--get :context) + ('train + (define-key map "\C-c\C-c" nil) + (define-key map "\C-c\C-r" nil) + (define-key map "\C-c\C-u" nil) + (define-key map "\C-c\C-t" nil)) + ('continue + (define-key map "\C-c\C-c" #'orb-pdf-scrapper-dispatcher)))) + (t + (define-key map "\C-c\C-u" nil) + (define-key map "\C-c\C-t" nil) + (define-key map "\C-c\C-r" nil))))) + +;; * Interactive functions + +(defun orb-pdf-scrapper-generate-keys (&optional at-point collect-only) + "Generate BibTeX citation keys in the current buffer. +\\<orb-pdf-scrapper-mode-map> +During the Orb PDF Scrapper interactive process, when editing +BibTeX data, press \\[orb-pdf-scrapper-generate-keys] to generate +citation keys using the function specified in +`orb-pdf-scrapper-keygen-function'. When called interactively +with a \\[universal-argument] prefix argument AT-POINT, generate +key only for the record at point. + +When called from Lisp, if optional COLLECT-ONLY is non-nil, do +not generate the key and update the records, just collect records +for future use." + (interactive "P") + (orb--with-message! 
"Generating citation keys" + (let ((bibtex-help-message nil) + (bibtex-contline-indentation 2) + (bibtex-text-indentation 2)) + (save-excursion + (if (equal at-point '(4)) + ;; generate key at point + (progn + (bibtex-beginning-of-entry) + (let* ((old-key (save-excursion + (re-search-forward + bibtex-entry-maybe-empty-head) + (bibtex-key-in-head))) + (old-ref (assoc old-key orb-pdf-scrapper--refs)) + (new-ref (orb-pdf-scrapper--update-record-at-point + collect-only))) + (if old-ref + (setf (car old-ref) (car new-ref) + (cdr old-ref) (cdr new-ref)) + (cl-pushnew new-ref orb-pdf-scrapper--refs :test 'equal)))) + ;; generate keys in the buffer otherwise + (let ((refs ())) + (goto-char (point-min)) + (bibtex-skip-to-valid-entry) + (while (not (eobp)) + (cl-pushnew (orb-pdf-scrapper--update-record-at-point + collect-only) + refs) + (bibtex-skip-to-valid-entry)) + (setq orb-pdf-scrapper--refs refs))))) + (write-region (orb--buffer-string) nil + (orb-pdf-scrapper--get :temp-bib) nil -1) + (set-buffer-modified-p nil))) + +(defun orb-pdf-scrapper-sanitize-text (&optional contents) + "Run string processing in current buffer. +Try to get every reference onto newline. Return this buffer's +contents (`orb--buffer-string'). + +If optional string CONTENTS was specified, run processing on this +string instead. Return modified CONTENTS." 
+ (interactive) + (let* ((rx1 '(and "(" (** 1 2 (any "0-9")) ")")) + (rx2 '(and "[" (** 1 2 (any "0-9")) "]")) + (rx3 '(and "(" (any "a-z") (opt (any space)) ")")) + (rx4 '(and " " (any "a-z") ")")) + (regexp (rx-to-string + `(group-n 1 (or (or (and ,rx1 " " ,rx3) + (and ,rx2 " " ,rx3)) + (or (and ,rx1 " " ,rx4) + (and ,rx2 " " ,rx4)) + (or ,rx1 ,rx2) + (or ,rx3 ,rx4))) t))) + (if contents + (--> contents + (s-replace "\n" " " it) + (s-replace-regexp regexp "\n\\1" it)) + (goto-char (point-min)) + (while (re-search-forward "\n" nil t) + (replace-match " " nil nil)) + (goto-char (point-min)) + (while (re-search-forward regexp nil t) + (replace-match "\n\\1" nil nil)) + (goto-char (point-min)) + (orb--buffer-string)))) + +(defun orb-pdf-scrapper-training-session (&optional context) + "Run training session subroutines depending on CONTEXT. +If context is not provided, it will be read from +`orb-pdf-scrapper--plist''s `:context'." + (interactive) + (pop-to-buffer orb-pdf-scrapper--buffer) + (let ((context (or context (orb-pdf-scrapper--get :context)))) + (orb-pdf-scrapper--put :context context) + (cl-case context + ('start + ;; generate xml + (orb-pdf-scrapper--edit-xml)) + ((edit edit-master) + (orb-pdf-scrapper--train nil)) + ('finished + (orb-pdf-scrapper-dispatcher 'edit-txt 'continue)) + (t (orb-pdf-scrapper-dispatcher 'error))))) + +(defun orb-pdf-scrapper-review-master-file () + "Review parser training set (master file)." + (interactive) + (orb-pdf-scrapper--train t)) + +(defun orb-pdf-scrapper-dispatcher (&optional callee context) + "Call Orb PDF Scrapper subroutine CALLEE in context CONTEXT. +CALLEE and CONTEXT can be passed directly as optional variables, +or they will be read from `orb-pdf-scrapper--plist''s +respectively `:callee' and `:context' properties. 
+ +Recognized CALLEEs are: +========== +'edit-txt - `orb-pdf-scrapper--edit-txt' +'edit-bib - `orb-pdf-scrapper--edit-bib' +'edit-org - `orb-pdf-scrapper--edit-org' +'train - `orb-pdf-scrapper-training-session' +'checkout - `orb-pdf-scrapper--checkout' + +Passing or setting any other CALLEE will kill the process. + +This function also checks `:prevent-concurring' property in +`orb-pdf-scrapper--plist' and will suggest to restart the process +if its value is non-nil." + ;; TODO: check for whether the user killed any of the buffers + (interactive) + (let ((callee (or callee (orb-pdf-scrapper--get :callee))) + (context (or context (orb-pdf-scrapper--get :context)))) + ;; in case context was passed as an argument + (orb-pdf-scrapper--put :callee callee + :context context) + (if + ;; Prevent another Orb PDF Scrapper process from running + ;; Ask user whether to kill the currently running process + (orb-pdf-scrapper--get :prevent-concurring) + (if (y-or-n-p + (format "Another Orb PDF Scrapper process is running: %s. \ +Kill it and start a new one %s? " + (orb-pdf-scrapper--get :current-key) + (orb-pdf-scrapper--get :new-key))) + ;; Kill the process and start a new one + (progn + (orb--with-message! "Killing current process" + (orb-pdf-scrapper--cleanup)) + (orb-pdf-scrapper-run (orb-pdf-scrapper--get :new-key))) + ;; Do nothing + (orb-pdf-scrapper--put :prevent-concurring nil)) + ;; Finalize the requested context otherwise + (cl-case callee + ('edit-txt + (orb-pdf-scrapper--edit-txt)) + ('edit-bib + (orb-pdf-scrapper--edit-bib)) + ;; edit org + ('edit-org + (orb-pdf-scrapper--edit-org)) + ('checkout + ;; currently, this is unnecessary but may be useful + ;; if some recovery options are implemented + (orb--with-scrapper-buffer! 
+ (write-region (orb--buffer-string) + nil (orb-pdf-scrapper--get :temp-org) nil 1)) + (orb-pdf-scrapper--checkout)) + (t + ;; 1 in 100 should not be too annoying + (when (> (cl-random 100) 98) + (message "Oops...") + (sleep-for 1) + (message "Oops...Did you just ACCIDENTALLY press the RED button?") + (sleep-for 1) + (message "Activating self-destruction subroutine...") + (sleep-for 1) + (message "Activating self-destruction subroutine...Bye-bye") + (sleep-for 1)) + (let ((kill-buffer-query-functions nil)) + (and (get-buffer orb-pdf-scrapper--buffer) + (kill-buffer orb-pdf-scrapper--buffer))) + (set-window-configuration (orb-pdf-scrapper--get :window-conf)) + (orb-pdf-scrapper--cleanup)))))) + +(defun orb-pdf-scrapper-cancel () + "Discard edits and return to previous editing mode." + (interactive) + (cl-case (orb-pdf-scrapper--get :caller) + ('edit-bib + (orb--with-scrapper-buffer! + (orb-pdf-scrapper--put :bib-undo-list nil)) + (orb-pdf-scrapper-dispatcher 'edit-txt 'continue)) + ('edit-org + (orb-pdf-scrapper-dispatcher 'edit-bib 'continue)) + ('edit-xml + (when-let ((master-backup (orb-pdf-scrapper--get :master-backup))) + (rename-file master-backup orb-anystyle-parser-training-set t)) + (orb-pdf-scrapper-dispatcher (orb-pdf-scrapper--get :callee) 'continue)) + (t + (orb-pdf-scrapper-dispatcher 'error)))) + +(defun orb-pdf-scrapper-kill () + "Kill the interactive Orb PDF Scrapper process." + (interactive) + (when-let (process (orb-pdf-scrapper--get :training-process)) + (kill-process process)) + (orb-pdf-scrapper-dispatcher 'kill)) + + +;; * Main functions + +;; entry point + +;;;###autoload +(defun orb-pdf-scrapper-run (key) + "Run Orb PDF Scrapper interactive process. +KEY is note's citation key." 
+ (interactive) + (if (orb-pdf-scrapper--get :running) + (progn + (orb-pdf-scrapper--put :prevent-concurring t + :new-key key) + (orb-pdf-scrapper-dispatcher)) + ;; in case previous process was not killed properly + (orb-pdf-scrapper--cleanup) + (orb-pdf-scrapper--put :callee 'edit-txt + :context 'start + :caller 'run + :current-key key + :new-key nil + :pdf-file (file-truename + (orb-process-file-field key)) + :running t + :prevent-concurring nil + :global-bib bibtex-completion-bibliography + :original-buffer (current-buffer) + :window-conf (current-window-configuration)) + (orb-pdf-scrapper-dispatcher))) + +(provide 'orb-pdf-scrapper) +;;; orb-pdf-scrapper.el ends here +;; Local Variables: +;; fill-column: 79 +;; End: diff --git a/orb-utils.el b/orb-utils.el index ae88848..b13a197 100644 --- a/orb-utils.el +++ b/orb-utils.el @@ -8,7 +8,6 @@ ;; URL: https://github.com/org-roam/org-roam-bibtex ;; Keywords: org-mode, roam, convenience, bibtex, helm-bibtex, ivy-bibtex, org-ref ;; Version: 0.2.3 -;; Package-Requires: ((emacs "26.1")) ;; This file is NOT part of GNU Emacs. @@ -27,6 +26,11 @@ ;; Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, ;; Boston, MA 02110-1301, USA. +;; N.B. This file contains code snippets adopted from other +;; open-source projects. These snippets are explicitly marked as such +;; in place. They are not subject to the above copyright and +;; authorship claims. + ;;; Commentary: ;; ;; This file contains utility macros and helper functions used accross @@ -36,16 +40,24 @@ ;;; Code: ;; * Library requires -(require 'orb-compat) (defvar orb-citekey-format) ;; * Macros +(defmacro orb--with-message! (message &rest body) + "Put MESSAGE before and after BODY. +Append \"...\" to the first message and \"...done\" to the second. +Return result of evaluating the BODY." + (declare (indent 1) (debug (stringp &rest form))) + `(prog2 + (message "%s..." 
,message) + (progn ,@body) + (message "%s...done" ,message))) ;; * Functions -(defun orb-unformat-citekey (citekey) +(defun orb--unformat-citekey (citekey) "Remove format from CITEKEY. Format is `orb-citekey-format'." (string-match "\\(.*\\)%s\\(.*\\)" orb-citekey-format) @@ -55,6 +67,123 @@ Format is `orb-citekey-format'." (length orb-citekey-format))))) (substring citekey beg end))) +(defun orb--buffer-string (&optional start end) + "Return buffer (sub)string with no text properties. +Like `buffer-substring-no-properties' but START and END are +optional and equal to (`point-min') and (`point-max'), +respectively, if nil." + (buffer-substring-no-properties (or start (point-min)) + (or end (point-max)))) + +(defun orb--format (&rest args) + "Format ARGS conditionally and return a string. +ARGS must be a plist, whose keys are `format' control strings and +values are `format' objects. Thus only one object per control +string is allowed. The result will be concatenated into a single +string. + +In the simplest case, it behaves as a sort of interleaved `format': +========== + +\(orb--format \"A: %s\" 'hello + \" B: %s\" 'world + \" C: %s\" \"!\") + + => 'A: hello B: world C: !' + +If format object is nil, it will be formatted as empty string: +========== + +\(orb--format \"A: %s\" 'hello + \" B: %s\" nil + \" C: %s\" \"!\") + => 'A: hello C: !' + +Object can also be a cons cell. If its car is nil then its cdr +will be treated as default value and formatted as \"%s\": +========== + +\(orb--format \"A: %s\" 'hello + \" B: %s\" '(nil . dworl) + \" C: %s\" \"!\") + => 'A: hellodworl C: !' + +Finally, if the control string is nil, the object will be formatted as \"%s\": +========== + +\(orb--format \"A: %s\" 'hello + \" B: %s\" '(nil . \" world\") + nil \"!\") +=> 'A: hello world!'."
+ (let ((res "")) + (while args + (let ((str (pop args)) + (obj (pop args))) + (unless (consp obj) + (setq obj (cons obj nil))) + (setq res + (concat res + (format (or (and (car obj) str) "%s") + (or (car obj) (cdr obj) "")))))) + res)) + +;;; Code in this section was adopted from ob-core.el +;; +;; Copyright (C) 2009-2020 Free Software Foundation, Inc. +;; +;; Authors: Eric Schulte +;; Dan Davison + +(defvar orb--temp-dir) +(unless (or noninteractive (boundp 'orb--temp-dir)) + (defvar orb--temp-dir + (or (and (boundp 'orb--temp-dir) + (file-exists-p orb--temp-dir) + orb--temp-dir) + (make-temp-file "orb-" t)) +"Directory to hold temporary files created during reference parsing. +Used by `orb--temp-file'. This directory will be removed on Emacs +shutdown.")) + +(defun orb--temp-file (prefix &optional suffix) + "Create a temporary file in the `orb--temp-dir'. +Passes PREFIX and SUFFIX directly to `make-temp-file' with the +value of variable `temporary-file-directory' temporarily set to +the value of `orb--temp-dir'." + (let ((temporary-file-directory + (or (and (boundp 'orb--temp-dir) + (file-exists-p orb--temp-dir) + orb--temp-dir) + temporary-file-directory))) + (make-temp-file prefix nil suffix))) + +(defun orb--remove-temp-dir () + "Remove `orb--temp-dir' on Emacs shutdown." 
+ (when (and (boundp 'orb--temp-dir) + (file-exists-p orb--temp-dir)) + ;; taken from `delete-directory' in files.el + (condition-case nil + (progn + (mapc (lambda (file) + ;; This test is equivalent to + ;; (and (file-directory-p fn) (not (file-symlink-p fn))) + ;; but more efficient + (if (eq t (car (file-attributes file))) + (delete-directory file) + (delete-file file))) + (directory-files orb--temp-dir 'full + directory-files-no-dot-files-regexp)) + (delete-directory orb--temp-dir)) + (error + (message "Failed to remove temporary Org-roam-bibtex directory %s" + (if (boundp 'orb--temp-dir) + orb--temp-dir + "[directory not defined]")))))) + +(add-hook 'kill-emacs-hook 'orb--remove-temp-dir) + +;;; End of code adopted from ob-core.el + (provide 'orb-utils) ;;; orb-utils.el ends here ;; Local Variables: diff --git a/org-roam-bibtex.el b/org-roam-bibtex.el index 19eb73d..7bfa275 100644 --- a/org-roam-bibtex.el +++ b/org-roam-bibtex.el @@ -1,4 +1,4 @@ -;;; org-roam-bibtex.el --- Org Roam meets BibTeX -*- coding: utf-8; lexical-binding: t -*- +;;; org-roam-bibtex.el --- Org Roam meets BibTeX -*- coding: utf-8; lexical-binding: t -*- ;; Copyright © 2020 Jethro Kuan ;; Copyright © 2020 Mykhailo Shevchuk