Explorer revamp #428

Open
wants to merge 219 commits into
base: master
Changes from 26 commits
Commits
219 commits
1d3f587
First commit :)
sal-uva Apr 8, 2024
c4a4606
Use regular `iterate_items` method when looping through dataset + min…
sal-uva Apr 8, 2024
cac644e
Change wording in Explorer settings
sal-uva Apr 9, 2024
8b78452
Allow Explorer CSS to be inserted and changed in Settings
sal-uva Apr 9, 2024
0fe3ea6
Move around Explorer CSS files
sal-uva Apr 9, 2024
e06760a
Edit custom Explorer CSS options
sal-uva Apr 9, 2024
a921967
Forgot to save these
sal-uva Apr 9, 2024
e37ebc9
Typozzz
sal-uva Apr 10, 2024
a7668f0
First setup for dynamic Explorer options in Settings
sal-uva Apr 10, 2024
59e33b0
First steps to datasource table user input
sal-uva Apr 15, 2024
712aff1
Merge branch 'explorer-improvements' of https://github.com/digitalmet…
sal-uva Apr 15, 2024
46628c6
Add basic UserInput.DATASOURCES_TABLE functionality, and use in Explo…
sal-uva Apr 15, 2024
340d1ff
Simplify config setting name
sal-uva Apr 15, 2024
dfbe5f3
Only show Explorer when enabled per data source
sal-uva Apr 15, 2024
28abb42
First steps in integrating the Explorer more with the main interface
sal-uva Apr 15, 2024
70d00b1
First steps in bringing back sorting
sal-uva Apr 16, 2024
e937362
More sorting stuff
sal-uva Apr 17, 2024
11eaaf9
Fix and simplify sorting, control box styling
sal-uva Apr 17, 2024
c33fd72
Style and fix annotation field editor, enable config settings for CSS
sal-uva Apr 18, 2024
00993ed
Fix annotation saving, improve CSS inclusions
sal-uva Apr 19, 2024
7149a6d
Make sure annotations are kept in NDJSON and CSV, change custom field…
sal-uva Apr 22, 2024
93460ba
Improve Instagram template, add location fields to Instagram search
sal-uva Apr 22, 2024
6929986
Simplify template settings, add Twitter and Instagram template
sal-uva Apr 23, 2024
7d81a79
Merge remote-tracking branch 'origin/master' into explorer-improvements
sal-uva Apr 23, 2024
08735b7
Remove prints
sal-uva Apr 23, 2024
759b36a
Don't prepend 'annotations'
sal-uva Apr 23, 2024
0cf2ccd
Don't commafy post body in generic Explorer template
sal-uva Apr 24, 2024
bc386b3
Leftover string in config definition
sal-uva Apr 24, 2024
7bf4ed8
No user input needed for 9GAG
sal-uva Apr 24, 2024
4c66f41
Remove old files
sal-uva Apr 24, 2024
6744736
Rudimentary TikTok template
sal-uva Apr 24, 2024
afa2d3a
Make invalid fields have a red border
sal-uva Apr 29, 2024
1716fe4
Make sure that annotation fields do not use existing column names in …
sal-uva Apr 29, 2024
bdfa308
Get rid of non-necessary code and libraries
sal-uva Apr 30, 2024
a7f7375
Move save annotation functions to dataset.py
sal-uva Apr 30, 2024
6715c7e
Merge branch 'explorer-improvements' of https://github.com/digitalmet…
sal-uva Apr 30, 2024
fcad68b
Add 'social_mediafy' template filter to add links to URLs, hashtags, …
sal-uva Apr 30, 2024
7a7af83
Improve social_mediafy regexes and add to templates
sal-uva May 1, 2024
e1dc2f2
Allow reverse-sorting by dataset order
sal-uva May 1, 2024
19b4639
Add animations
sal-uva May 1, 2024
67a8298
There needs to be something to save!
sal-uva May 1, 2024
5a2862b
Add coauhtors to instagram map_item and add to Explorer template
sal-uva May 2, 2024
67f0746
Better social_mediafy regexes, implement per platform
sal-uva May 2, 2024
3e28667
Typo in LinkedIn search
sal-uva May 13, 2024
544cd91
Make quote tweets just that little bit nicer.
sal-uva May 14, 2024
432bdec
Fix tag links in template filter
sal-uva May 14, 2024
aeaec85
Fix bug in saving annotation fields (misnamed variable)
sal-uva May 14, 2024
edc89ca
Fix when Save buttons are enabled/disabled
sal-uva May 14, 2024
8f60c3a
Add quote tweet info to Twitter map_item()
sal-uva May 14, 2024
580a083
No sets!
sal-uva May 14, 2024
fa9595c
Add quote tweet information to Twitter Explorer template
sal-uva May 14, 2024
e76f18c
No telegram CSS yet
sal-uva May 14, 2024
4bc50dd
Merge branch 'master' into explorer-improvements
sal-uva Jun 6, 2024
36d4589
Fix set bug
sal-uva Jun 11, 2024
eb51d40
Small Twitter Explorer style change
sal-uva Jun 11, 2024
179e9d2
Merge branch 'explorer-improvements' of https://github.com/digitalmet…
sal-uva Jun 11, 2024
e3f66e8
Only get and set annotations for top-level datasets
sal-uva Jun 11, 2024
ae73d9c
LinkedIn Explorer template pt.1
sal-uva Jun 14, 2024
de39a53
Merge branch 'explorer-improvements' of https://github.com/digitalmet…
sal-uva Jun 14, 2024
f4159cc
LinkedIn Explorer template pt.2
sal-uva Jun 14, 2024
70a7767
LinkedIn Explorer template pt.3
sal-uva Jun 17, 2024
825bb40
Change file download functionality: Show download .csv button if ther…
sal-uva Jul 1, 2024
4a8e694
Totally remove empty annotations from database when they are removed …
sal-uva Jul 2, 2024
90320e9
Add annotation labels to dataset metadata box
sal-uva Jul 2, 2024
9eb68f2
...but keep empty strings in annotations dict so we know a field got …
sal-uva Jul 2, 2024
d6c2d21
Change label of original data download; ndjson string is a bit too large
sal-uva Jul 2, 2024
596fbe2
Add Twitter/Zeeschuimer profile banner URL
sal-uva Jul 2, 2024
42df958
Add a tooltip that explains Explorer saving behaviour
sal-uva Jul 2, 2024
88ab97a
Slight edit of tooltip wording
sal-uva Jul 2, 2024
b22385e
Nicely align option fields in the Annotations editor
sal-uva Jul 2, 2024
868af5e
Improve saving and deleting annotations; less clutter of empty values
sal-uva Jul 2, 2024
afbf897
Lead to map_item() download on Datasets overview. Only lead to "Origi…
sal-uva Jul 3, 2024
007f8bf
Fix config import in Explorer
sal-uva Jul 3, 2024
dc26b10
Revert changes to parent dataset writing
sal-uva Jul 4, 2024
96eb11e
Use correct config name
sal-uva Jul 4, 2024
3fd55ca
Only show dataset download buttons when the file is actually there.
sal-uva Jul 4, 2024
4a92b14
Remove unused UserInput
sal-uva Jul 4, 2024
c4b1943
Remove unnecessary UserInput imports
sal-uva Jul 4, 2024
fcb7473
Use dictionary order as sort order for config settings
sal-uva Jul 4, 2024
e672933
Change name of "Explore" button to "Explore & annotate"
sal-uva Jul 9, 2024
0d2eef2
Space out Twitter metrics better
sal-uva Jul 9, 2024
7389e9b
Include index in Explorer posts loop
sal-uva Jul 9, 2024
b8e1267
Telegram Explorer template v0.5
sal-uva Jul 9, 2024
f3f6f41
Add a string character counter template that also handles graphemes
sal-uva Jul 10, 2024
e904351
Fix incorrect emoji handling with resolved references in Telegram
sal-uva Jul 10, 2024
9fcd9aa
Get markdown text from telegram messages
sal-uva Jul 10, 2024
c0c7bfa
..but then a bit more elegant and also for resolved messages
sal-uva Jul 10, 2024
92ac7c4
styling
sal-uva Jul 10, 2024
d4256ae
Telegram template v1.0
sal-uva Jul 10, 2024
2204cf7
Show URLs nicely in Telegram template
sal-uva Jul 10, 2024
a904e65
Add markdown text to Telegram
sal-uva Jul 12, 2024
bb96af7
Typo in Truth social search
sal-uva Jul 12, 2024
bc4f566
Update Tumblr search so it works with the Neue Posts Format.
sal-uva Jul 12, 2024
10c8853
Fix notes fetching for Tumblr, add extra notes metrics to NDJSONs and…
sal-uva Jul 15, 2024
ce16f97
Make Tumblr search work with new blocks formatting, include some new …
sal-uva Jul 16, 2024
e392544
Tumblr Explorer Template v0.5
sal-uva Jul 16, 2024
40f5fa0
Bump PyTumblr version
sal-uva Jul 17, 2024
9e486ad
Dashes are okay for Tumblr Blog names
sal-uva Jul 17, 2024
9837a8a
Better styling for Tumblr Explorer Template
sal-uva Jul 17, 2024
4d19e52
Include post blocks in the right order in Tumblr Explorer Template
sal-uva Jul 17, 2024
d5d14e0
Get block orders and start changing how note retrieval works in Tumbl…
sal-uva Jul 17, 2024
8e885f0
Fix Markdown, include audio and video, and follow correct block order…
sal-uva Jul 17, 2024
c7fa5fa
Skip URLs in social mediafy template filter if it's already markdown
sal-uva Jul 17, 2024
281bf56
add markdown
sal-uva Jul 17, 2024
b65ad43
Typo in pagination
sal-uva Jul 17, 2024
e1d25da
Add video to Tumblr Template
sal-uva Jul 17, 2024
70e0c9b
Be more honest with errors
sal-uva Jul 17, 2024
95d03f4
Add more layout options for Tumblr
sal-uva Jul 23, 2024
451f2bb
No post reshuffling after the fact
sal-uva Jul 24, 2024
996512d
Skip duplicate posts in a better way
sal-uva Jul 24, 2024
05e5c7b
Don't hashtagify
sal-uva Jul 24, 2024
8263ebc
Skip duplicate Tumblr posts and format Ask content better
sal-uva Jul 24, 2024
df7185a
More Tumblr Explorer templating
sal-uva Jul 26, 2024
f6858ba
Revamp Tumblr search v0.5
sal-uva Jul 26, 2024
7d9df2d
Improve Tumblr querying
sal-uva Jul 26, 2024
6fe891c
Change options for Tumblr
sal-uva Jul 30, 2024
622f779
Revamp Tumblr search and allow reblogs in Explorer template
sal-uva Jul 30, 2024
c8f204e
Improve and fix revamped Tumblr search
sal-uva Jul 31, 2024
da62b83
Some more warnings in the Tumblr search info
sal-uva Jul 31, 2024
0e739b3
Migrate script for expanded annotation table
sal-uva Jul 31, 2024
3c524c1
Get annotations per row
sal-uva Jul 31, 2024
9cfd5bd
First steps in revamping annotation saving
sal-uva Aug 7, 2024
e95c4bd
remove unused variables in explorer.js
sal-uva Aug 7, 2024
09a5350
Merge branch 'explorer-improvements' of https://github.com/digitalmet…
sal-uva Aug 7, 2024
2fdb876
First steps in giving annotations their own class
sal-uva Aug 7, 2024
6ddae4e
Fix mistakes in database.sql
sal-uva Aug 8, 2024
f679b7a
Make Annotation object usable
sal-uva Aug 8, 2024
24425e1
General annotations improvements and make processors save annotations
sal-uva Aug 9, 2024
7721341
fix: Bug in migrate
sal-uva Aug 12, 2024
22f6ea2
Fixes in migrate script
sal-uva Aug 12, 2024
e5cce49
Improve Annotation() and make map_item() fetch annotation values
sal-uva Aug 12, 2024
be8ac89
First steps to make new annotation system work with Explorer
sal-uva Aug 12, 2024
f0e61c3
Make annotations editable and saveable in Explorer
sal-uva Aug 13, 2024
28032b5
Make Tumblr search code a bit neater
sal-uva Aug 19, 2024
12b54b1
Add hash function to helpers
sal-uva Aug 19, 2024
2e6185c
Revert test code in count posts processor
sal-uva Aug 19, 2024
4ac3e62
Change parameter in Jinja2 template
sal-uva Aug 19, 2024
50cae61
Don't initialise Annotation() twice
sal-uva Aug 19, 2024
cdbe6ed
Clean up and revert some JS
sal-uva Aug 19, 2024
851e067
Separate annoatation field into a component
sal-uva Aug 19, 2024
f0a9708
Make processor and Explorer annotation features co-exist peacefully
sal-uva Aug 19, 2024
e78099f
Test annotation processor
sal-uva Aug 20, 2024
90a0eb0
Improve Tumblr search description
sal-uva Aug 20, 2024
288dc1a
Convert timestamps to the client's local time zone in Explorer
sal-uva Aug 20, 2024
a6230aa
Add processor that lets you download annotation metadata
sal-uva Aug 20, 2024
1a31d60
Add annotation metadata processor
sal-uva Aug 20, 2024
6286691
Add `author_original` to Annotation() attributes
sal-uva Aug 20, 2024
79e8661
'Fix' dummy processor
sal-uva Aug 20, 2024
a43ed90
Make lines separable in tooltip
sal-uva Aug 20, 2024
e378938
Remove a humongous amount of code in explorer.js by simply refreshing…
sal-uva Aug 20, 2024
908544b
Style changes in Explorer
sal-uva Aug 20, 2024
5e77fe2
Redesign annotation field input controls, make them sortable, plus so…
sal-uva Aug 21, 2024
af71c6c
Fix and simplify annotation field saving, re-enable saving options (a…
sal-uva Aug 22, 2024
a417283
Forgot a postgresql field in migrate script
sal-uva Aug 22, 2024
09f26dc
Revamp annotation saving from annotations made in Explorer
sal-uva Aug 22, 2024
88b7609
Add saving notice and fix dropdown saving
sal-uva Aug 23, 2024
9e06bf6
(almost) fix options saving
sal-uva Aug 26, 2024
33b527f
LinkedIn template: Don't make the play button icon as big as the post…
sal-uva Aug 26, 2024
50b53ef
Fix spinner
sal-uva Aug 26, 2024
e1b83b7
Don't contain tag drags in config
sal-uva Aug 26, 2024
b4c9437
Ibid
sal-uva Aug 26, 2024
1d35442
Simplify and fix option field saving, let changes also affect annotat…
sal-uva Aug 26, 2024
9cfc4e5
Save state of shoing and hiding annotations in URL params
sal-uva Aug 26, 2024
cd4ff4b
Don't let commas mess with options
sal-uva Aug 26, 2024
b9382b8
Return float in a float conversion function :)
sal-uva Aug 26, 2024
b097a54
Demove `dataset.file_exists()` in favour of more direct check
sal-uva Aug 26, 2024
a7eb872
Show processor parameters for processor-generated annotations in the …
sal-uva Aug 26, 2024
c553245
Delete annotations when a dataset is deleted, also for child datasets.
sal-uva Aug 26, 2024
cea24de
Don't use wrong variable in error handling when saving annotations
sal-uva Aug 26, 2024
9d81fb2
Don't use built-in variable name `id` in `Annotation()`
sal-uva Aug 26, 2024
6e8f352
Ibid
sal-uva Aug 26, 2024
5393741
Make PyCharm happy with `annotations.py` formatting
sal-uva Aug 26, 2024
cbbd89f
Rename `hash_values()` helper function to `hash_to_md5()`
sal-uva Aug 26, 2024
aa1ca56
Delete dummy annotator processor
sal-uva Aug 26, 2024
c878708
Fix typos in `annotation.py`
sal-uva Aug 26, 2024
1c4b905
Merge branch 'master' into explorer-improvements
sal-uva Aug 26, 2024
b8ebec4
Don't explicitly delete annotations for child datasets
sal-uva Aug 27, 2024
4d16317
Bump nltk
sal-uva Aug 27, 2024
0152645
Fix migrate script
sal-uva Aug 27, 2024
869aa1b
Don't store processor metadata in annotations table; make join on dat…
sal-uva Aug 27, 2024
3e77335
Make property badges look nicer in Explorer
sal-uva Aug 27, 2024
ecf2441
Ibid
sal-uva Aug 27, 2024
2c44ccb
Add Perspective API processor
sal-uva Aug 27, 2024
ce7836e
Disable disabled buttons in Explorer (resolves https://github.com/dig…
sal-uva Aug 28, 2024
ff2bf7e
Don't fail dataset sorting in Explorer views if the field is missing …
sal-uva Aug 28, 2024
55d81f5
Only do mapped item things when a row is already a mapped item
sal-uva Aug 28, 2024
475692c
Better description in Explorer save tooltip
sal-uva Aug 28, 2024
8811cae
utf-8 in csv->excel processor
sal-uva Aug 28, 2024
bc342ae
Uneccesary import
sal-uva Aug 28, 2024
8652ba2
Add `map_item()` to Perspective processor
sal-uva Aug 28, 2024
2ab9e86
Make mapped csvs downloadable for processor from the UI
sal-uva Aug 28, 2024
4fcafb9
Don't print in perspective processor
sal-uva Aug 30, 2024
9d322df
Move perspective processor
sal-uva Sep 2, 2024
057dd14
Typo in Tumblr search
sal-uva Sep 2, 2024
05e52c0
Add GPT processor
sal-uva Sep 2, 2024
cfc1e0e
Fix GPT processor and make it compatible with any NDJSON or CSV file
sal-uva Sep 6, 2024
b7ba19a
Don't fail migrate script in edge case when new annotations table is …
sal-uva Sep 9, 2024
94f415d
Improve GPT Prompting processor, add some more error handling and fri…
sal-uva Sep 9, 2024
a75c64d
Don't save annotations when no changes are made.
Sep 11, 2024
ae0712b
Space out tweets better in Explorer template
sal-uva Sep 12, 2024
8b63121
Merge branch 'explorer-improvements' of https://github.com/digitalmet…
sal-uva Sep 12, 2024
2bbb83c
No spacy
Sep 17, 2024
e0bbb83
Merge remote-tracking branch 'origin/explorer-improvements' into expl…
Sep 17, 2024
141b7af
Rename Explorer migrate script
Sep 17, 2024
20925a9
Merge remote-tracking branch 'origin/master' into explorer-improvements
Sep 17, 2024
1f9e1c3
Merge branch 'refs/heads/master' into explorer-improvements
sal-uva Sep 17, 2024
09d1551
Include openai library in setup
sal-uva Sep 17, 2024
b914687
Fix merge errors in result-result-row.html
sal-uva Sep 17, 2024
4d3cc1c
Add option to add a custom (fine-tuned) model to GPT processor.
sal-uva Sep 17, 2024
e07d748
Change wording and compatibility of Annotation metadata processor
sal-uva Sep 18, 2024
fa4d236
Change settings and wording for OpenAI LLM processor
sal-uva Sep 18, 2024
754b339
Allow Google API key in config
sal-uva Sep 18, 2024
e37db4f
Better error handling for Perspective processor
sal-uva Sep 18, 2024
c6f587a
Strong no longer
sal-uva Sep 18, 2024
f4b108e
Get rid of some unncessary code in Perspective processor
sal-uva Oct 1, 2024
024875e
Merge branch 'master' into explorer-improvements
sal-uva Oct 15, 2024
b90bb3e
Fix merge issues in Explorer API and Telegram search
sal-uva Oct 15, 2024
e5dba0e
Merge branch 'master' into explorer-improvements
sal-uva Oct 15, 2024
ba0bd6e
Merge branch 'master' into explorer-improvements
stijn-uva Dec 11, 2024
4 changes: 2 additions & 2 deletions backend/lib/processor.py
@@ -400,7 +400,7 @@ def add_field_to_parent(self, field_name, new_data, which_parent=source_dataset,

TODO: could be improved by accepting different types of data depending on csv or ndjson.

:param str field_name: name of the desired
:param str field_name: Name of the desired new field
:param List new_data: List of data to be added to parent dataset
:param DataSet which_parent: DataSet to be updated (e.g., self.source_dataset, self.dataset.get_parent(), self.dataset.top_parent())
:param bool update_existing: False (default) will raise an error if the field_name already exists
@@ -418,7 +418,7 @@ def add_field_to_parent(self, field_name, new_data, which_parent=source_dataset,
parent_path = which_parent.get_results_path()

if len(new_data) != which_parent.num_rows:
raise ProcessorException('Must have new data point for each record: parent dataset: %i, new data points: %i' % (which_parent.num_rows, len(new_data)))
self.dataset.update_status('The amount of new data points and existing records don\'t match; data may be misaligned (parent dataset: %i, new data points: %i)' % (which_parent.num_rows, len(new_data)))

self.dataset.update_status("Adding new field %s to the source file" % field_name)

Expand Down
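The hunk above replaces a hard failure with a status message when the number of new data points differs from the parent dataset's row count. A minimal standalone sketch of that behaviour follows; `check_alignment` and the `status_log` list are hypothetical stand-ins for the processor method and `dataset.update_status()`, and are not part of 4CAT.

```python
def check_alignment(num_rows, new_data, status_log):
    """Log a warning, rather than raising, when lengths differ.

    `status_log` is a hypothetical stand-in for dataset.update_status().
    Returns True when every parent row has exactly one new data point.
    """
    if len(new_data) != num_rows:
        status_log.append(
            "The number of new data points and existing records don't match; "
            "data may be misaligned (parent dataset: %i, new data points: %i)"
            % (num_rows, len(new_data))
        )
        return False
    return True


status_log = []
aligned = check_alignment(3, ["a", "b"], status_log)
```

The trade-off is that processing continues with possibly misaligned rows, so callers that need strict alignment would have to check the return value themselves.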
44 changes: 36 additions & 8 deletions common/lib/config_definition.py
@@ -102,8 +102,8 @@
"privileges.can_use_explorer": {
"type": UserInput.OPTION_TOGGLE,
"default": True,
"help": "Can use explorer",
"tooltip": "Controls whether users can use the Explorer feature to navigate datasets."
"help": "Can use Explorer",
"tooltip": "Controls whether users can use the Explorer feature to analyse and annotate datasets."
},
"privileges.can_export_datasets": {
"type": UserInput.OPTION_TOGGLE,
@@ -305,22 +305,50 @@
"global": True
},
# Explorer settings
# The maximum allowed amount of rows (prevents timeouts and memory errors)
"explorer.max_posts": {
"explorer.__basic-explanation": {
"type": UserInput.OPTION_INFO,
"help": "4CAT's Explorer feature lets you navigate and annotate datasets as if they "
"appeared on their original platform. This is intended to facilitate qualitative "
"exploration and manual coding."
},
"explorer.__max_posts": {
"type": UserInput.OPTION_TEXT,
"default": 100000,
"help": "Amount of posts",
"coerce_type": int,
"tooltip": "Amount of posts to show in Explorer. The maximum allowed amount of rows (prevents timeouts and "
"tooltip": "Maximum number of posts to be considered by the Explorer (prevents timeouts and "
"memory errors)"
},
"explorer.posts_per_page": {
"explorer.__posts_per_page": {
"type": UserInput.OPTION_TEXT,
"default": 50,
"help": "Posts per page",
"coerce_type": int,
"tooltip": "Posts to display per page"
"tooltip": "Number of posts to display per page"
},
"explorer._config_explanation": {
"type": UserInput.OPTION_INFO,
"help": "Per data source, you can enable or disable the Explorer. Posts will be formatted through a <em>generic</em> template "
"made of [this HTML file](https://github.com/digitalmethodsinitiative/4cat/tree/master/webtool/templates/explorer/"
"templates/generic.html) and [this CSS file](https://github.com/digitalmethodsinitiative/4cat/tree/master/webtool/"
"static/css/explorer/generic.css). For various data sources, <em>data source-specific</em> templates are also available. "
"These are made of a custom HTML template in [this directory](https://github.com/digitalmethodsinitiative/4cat/tree/master/"
"webtool/datasource-templates/explorer/templates) and a custom CSS file [in this directory](https://github.com/digitalmethodsinitiative/4cat/tree/master/webtool/static/css/explorer)."
},
"explorer.config": {
"type": UserInput.OPTION_DATASOURCES_TABLE,
"help": "Explorer settings per data source",
"default": {"fourchan": {"enabled": True}, "eightchan": {"enabled": True}, "eightkun": {"enabled": True}, "ninegag": {"enabled": True}, "bitchute": {"enabled": True}, "dmi-tcat": {"enabled": True}, "dmi-tcatv2": {"enabled": True}, "douban": {"enabled": True}, "douyin": {"enabled": False}, "imgur": {"enabled": True}, "upload": {"enabled": True}, "instagram": {"enabled": True}, "linkedin": {"enabled": True}, "parler": {"enabled": True}, "reddit": {"enabled": True}, "telegram": {"enabled": True}, "tiktok": {"enabled": True}, "tiktok-urls": {"enabled": True}, "tumblr": {"enabled": True}, "twitter": {"enabled": True}, "twitterv2": {"enabled": True}, "usenet": {"enabled": True}, "vk": {"enabled": True}},
"columns": {
"enabled": {
"type": UserInput.OPTION_TOGGLE,
"help": "Enable Explorer",
"tooltip": "Whether the Explorer is available for this data source",
"default": True
}
}
},
# Web tool settings
# These are used by the FlaskConfig class in config.py
# Flask may require a restart to update them
@@ -515,7 +543,7 @@
"4cat": "4CAT Tool settings",
"api": "API credentials",
"flask": "Flask settings",
"explorer": "Data Explorer",
"explorer": "Explorer",
"datasources": "Data sources",
"expire": "Dataset expiration settings",
"mail": "Mail settings & credentials",
Expand Down
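The `explorer.config` setting above stores one small dictionary per data source. A hedged sketch of how such a value could be queried is shown below; `explorer_enabled` is a hypothetical helper, the dictionary is a shortened excerpt of the default in the diff, and the fallback for unknown data sources is an assumption of this sketch rather than documented 4CAT behaviour.

```python
# Shortened excerpt of the `explorer.config` default shown in the diff.
explorer_config = {
    "fourchan": {"enabled": True},
    "douyin": {"enabled": False},
    "tumblr": {"enabled": True},
}


def explorer_enabled(datasource, config, default=False):
    """Return whether the Explorer is enabled for a data source.

    Unknown data sources fall back to `default` (an assumption made
    for this sketch).
    """
    return bool(config.get(datasource, {}).get("enabled", default))


fourchan_on = explorer_enabled("fourchan", explorer_config)
douyin_on = explorer_enabled("douyin", explorer_config)
```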
18 changes: 16 additions & 2 deletions common/lib/dataset.py
@@ -15,7 +15,7 @@
import backend
from common.config_manager import config
from common.lib.job import Job, JobNotFoundException
from common.lib.helpers import get_software_commit, NullAwareTextIOWrapper, convert_to_int
from common.lib.helpers import get_software_commit, NullAwareTextIOWrapper, convert_to_int, convert_to_float, flatten_dict
from common.lib.item_mapping import MappedItem, MissingMappedField, DatasetItem
from common.lib.fourcat_module import FourcatModule
from common.lib.exceptions import (ProcessorInterruptedException, DataSetException, DataSetNotFoundException,
@@ -275,8 +275,9 @@ def _iterate_items(self, processor=None):
yield item

elif path.suffix.lower() == ".ndjson":
# In NDJSON format each line in the file is a self-contained JSON

with path.open(encoding="utf-8") as infile:

for line in infile:
if hasattr(processor, "interrupted") and processor.interrupted:
raise ProcessorInterruptedException("Processor interrupted while iterating through NDJSON file")
@@ -337,6 +338,10 @@ def iterate_items(self, processor=None, warn_unmappable=True, map_missing="defau
if own_processor and own_processor.map_item_method_available(dataset=self):
item_mapper = True

# Annotation fields are dynamically added,
# so we're always going to accept these.
annotation_fields = self.get_annotation_fields()

# Loop through items
for i, item in enumerate(self._iterate_items(processor)):
# Save original to yield
@@ -381,6 +386,15 @@ def iterate_items(self, processor=None, warn_unmappable=True, map_missing="defau

else:
mapped_item = original_item

# Re-add annotation fields to a mapped item.
if annotation_fields:
for annotation_field in annotation_fields.values():
label = annotation_field["label"]
if type(mapped_item) is MappedItem:
mapped_item.data[label] = original_item.get(label, "")
else:
mapped_item[label] = original_item.get(label, "")

# yield a DatasetItem, which is a dict with some special properties
yield DatasetItem(mapper=item_mapper, original=original_item, mapped_object=mapped_item, **(mapped_item.get_item_data() if type(mapped_item) is MappedItem else mapped_item))
Expand Down
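The last hunk copies annotation values from the original item back onto the mapped item, because `map_item()` does not know about dynamically added annotation fields. The sketch below illustrates that step with plain dictionaries only; `readd_annotation_fields` is a hypothetical name, the field IDs are invented, and the `MappedItem` branch from the diff (which writes to `mapped_item.data`) is left out.

```python
def readd_annotation_fields(mapped_item, original_item, annotation_fields):
    """Copy annotation values from the original item onto the mapped item.

    Annotations missing from the original item become empty strings,
    mirroring the `.get(label, "")` calls in the diff.
    """
    for annotation_field in annotation_fields.values():
        label = annotation_field["label"]
        mapped_item[label] = original_item.get(label, "")
    return mapped_item


# Hypothetical annotation field definitions, keyed by an invented field ID.
annotation_fields = {
    "field-1": {"label": "sentiment"},
    "field-2": {"label": "notes"},
}
original = {"id": "1", "body": "hello", "sentiment": "positive"}
mapped = readd_annotation_fields({"id": "1", "body": "hello"}, original, annotation_fields)
```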
18 changes: 17 additions & 1 deletion common/lib/helpers.py
@@ -206,6 +206,22 @@ def convert_to_int(value, default=0):
except (ValueError, TypeError):
return default

def convert_to_float(value, default=0):
"""
Convert a value to a floating point, with a fallback

The fallback is used if an Error is thrown during conversion to float.
This is a convenience function, but beats putting try-catches everywhere
we're using user input as a floating point number.

:param value: Value to convert
:param float default: Default value, if conversion not possible
:return float: Converted value
"""
try:
return float(value)
except (ValueError, TypeError):
return default

def timify_long(number):
"""
@@ -789,7 +805,7 @@ def flatten_dict(d: MutableMapping, parent_key: str = '', sep: str = '.'):
Lists will be converted to json strings via json.dumps()

:param MutableMapping d: Dictionary like object
:param str partent_key: The original parent key prepending future nested keys
:param str parent_key: The original parent key prepending future nested keys
:param str sep: A separator string used to combine parent and child keys
:return dict: A new dictionary with no nested values
"""
Expand Down
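The `convert_to_float()` helper added in this file can be exercised as follows. The function body is reproduced from the diff; the example inputs are illustrative only.

```python
def convert_to_float(value, default=0):
    """Convert a value to a float, returning `default` on failure."""
    try:
        return float(value)
    except (ValueError, TypeError):
        return default


parsed = convert_to_float("3.5")                 # numeric string converts
unparseable = convert_to_float("abc")            # ValueError, so the default is used
missing = convert_to_float(None, default=1.5)    # TypeError, so the default is used
```

Note that the fallback is returned unchanged, so with the default of `0` a failed conversion yields an `int` rather than a `float`.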
17 changes: 17 additions & 0 deletions common/lib/user_input.py
@@ -35,6 +35,8 @@ class UserInput:
OPTION_FILE = "file" # file upload
OPTION_HUE = "hue" # colour hue
OPTION_DATASOURCES = "datasources" # data source toggling
OPTION_DATASOURCES_TABLE = "datasources_table" # a table with settings per data source
OPTION_DATASOURCES_TEXT = "datasources_text" # text input per data source (via dropdown)

OPTIONS_COSMETIC = (OPTION_INFO, OPTION_DIVIDER)

@@ -143,6 +145,21 @@ def parse_all(options, input, silently_correct=True):
parsed_input[option] = [datasource for datasource, v in datasources.items() if v["enabled"]]
parsed_input[option.split(".")[0] + ".expiration"] = datasources

elif settings.get("type") == UserInput.OPTION_DATASOURCES_TABLE:
# special case, parse table values to generate a dict
columns = list(settings["columns"].keys())
table_input = {}

for datasource in list(settings["default"].keys()):
table_input[datasource] = {}
for column in columns:

choice = input.get(option + "-" + datasource + "-" + column, False)
column_settings = settings["columns"][column] # sub-settings per column
table_input[datasource][column] = UserInput.parse_value(column_settings, choice, table_input, silently_correct=True)

parsed_input[option] = table_input

elif option not in input:
# not provided? use default
parsed_input[option] = settings.get("default", None)
Expand Down
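The `OPTION_DATASOURCES_TABLE` branch above builds a nested dictionary from form fields named `<option>-<datasource>-<column>`. A simplified, self-contained sketch of that loop is given below; `parse_table_input` is a hypothetical name, and a plain `bool()` call stands in for `UserInput.parse_value()`, which in 4CAT handles coercion per column type.

```python
def parse_table_input(option, settings, form):
    """Build {datasource: {column: value}} from flat form fields.

    Field names follow the "<option>-<datasource>-<column>" pattern
    used in the diff. Missing fields default to False.
    """
    columns = list(settings["columns"].keys())
    table_input = {}
    for datasource in settings["default"]:
        table_input[datasource] = {}
        for column in columns:
            raw = form.get(option + "-" + datasource + "-" + column, False)
            # Simplified stand-in for UserInput.parse_value() on a toggle.
            table_input[datasource][column] = bool(raw)
    return table_input


settings = {
    "default": {"fourchan": {"enabled": True}, "douyin": {"enabled": False}},
    "columns": {"enabled": {"default": True}},
}
form = {"explorer.config-fourchan-enabled": "on"}
parsed = parse_table_input("explorer.config", settings, form)
```

Because the loop iterates over the keys of `settings["default"]`, only data sources listed in the default value can appear in the parsed result.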
92 changes: 0 additions & 92 deletions datasources/dmi-tcat/explorer/dmi-tcat-explorer.json

This file was deleted.

84 changes: 0 additions & 84 deletions datasources/dmi-tcatv2/explorer/dmi-tcat-explorer.css

This file was deleted.
