This repository has been archived by the owner on Jul 2, 2020. It is now read-only.
Things to do (a.k.a. "road map"):
---------------------------------
a) Jackrabbit as back-end repository for:
   1) storage of whole "original" (as downloaded) documents:
      a) versioned: to have some audit trail
      b) caching: to be able to re-run harvesting/enhancements
         after we update our code, without making a request to
         the original source server
   2) storage of whole processed documents:
      a) for now, the search GUI has nowhere to link to, so this will also
         allow us to build a presentation layer for the data we harvest
      b) further publication/replication of data: OAI-PMH, APIs, ...
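
A minimal sketch of the "versioned + caching" idea from item a), using a plain
in-memory map as a stand-in for Jackrabbit. This is not the Jackrabbit API; the
class and method names (RawDocumentStore, store, latest, history) are
hypothetical and only illustrate the intended behaviour: keep every distinct
downloaded version for the audit trail, and serve the latest copy locally so
that harvesting can be re-run without contacting the source server.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Hypothetical in-memory stand-in for a versioned store of raw harvested documents. */
class RawDocumentStore {
    // Each source URL maps to its stored versions, oldest first.
    private final Map<String, List<String>> versions = new HashMap<>();

    /**
     * Stores the content as a new version unless it equals the latest one.
     * Returns the number of versions kept for that URL afterwards.
     */
    int store(String sourceUrl, String content) {
        List<String> history = versions.computeIfAbsent(sourceUrl, k -> new ArrayList<>());
        if (history.isEmpty() || !history.get(history.size() - 1).equals(content)) {
            history.add(content);
        }
        return history.size();
    }

    /** Returns the cached latest version, if any, without touching the source server. */
    Optional<String> latest(String sourceUrl) {
        List<String> history = versions.get(sourceUrl);
        if (history == null || history.isEmpty()) {
            return Optional.empty();
        }
        return Optional.of(history.get(history.size() - 1));
    }

    /** Returns a read-only copy of all versions (the audit trail), oldest first. */
    List<String> history(String sourceUrl) {
        List<String> history = versions.getOrDefault(sourceUrl, Collections.<String>emptyList());
        return Collections.unmodifiableList(new ArrayList<>(history));
    }
}
```

A harvester would call store() after each download, and a re-run of the
enhancement code would read from latest() instead of downloading again.
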
b) document presentation layer: using processed documents from Jackrabbit, we
   need to implement "pages" which will display details about organizations,
   procurement, ... so that we can link to them (from search pages, from
   anywhere else)
   - needed feature: SOA friendly URLs like http://opendata.sk/dataset/organizations/<org_id>,
     possibly also with http://opendata.sk/dataset/organizations/<ico> redirecting to
     .../<org_id>
   - idea: assuming processed documents are XML, we can employ XSLT to produce HTML
   note: This might/should be reused by ODN Search, i.e. that's where
   the search results should point to.
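
The XSLT idea from item b) could look roughly like the sketch below, which uses
only the standard javax.xml.transform API from the JDK. The XML layout
(<organization> with <name> and <ico> children), the stylesheet and the class
name OrganizationPageRenderer are assumptions made for illustration; the real
format of the processed documents is not defined yet.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

/** Hypothetical renderer: turns a processed organization document (XML) into an HTML page. */
class OrganizationPageRenderer {
    // Assumed stylesheet; the element names "organization", "name" and "ico" are illustrative only.
    static final String STYLESHEET =
          "<xsl:stylesheet version=\"1.0\" xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:output method=\"html\"/>"
        + "<xsl:template match=\"/organization\">"
        + "<html><body>"
        + "<h1><xsl:value-of select=\"name\"/></h1>"
        + "<p>ICO: <xsl:value-of select=\"ico\"/></p>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to the given XML document and returns the resulting HTML. */
    static String render(String xml) {
        try {
            Transformer transformer = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(STYLESHEET)));
            StringWriter html = new StringWriter();
            transformer.transform(new StreamSource(new StringReader(xml)), new StreamResult(html));
            return html.toString();
        } catch (TransformerException e) {
            // Covers both a broken stylesheet and a failed transformation.
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }
}
```

In a real deployment the stylesheet would more likely be loaded from a file and
the compiled form reused across requests, rather than rebuilt on each call.
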
c) adding more harvesters: direct harvesting of ORSR to get more data
   about companies, direct harvesting of procurement portals to have
   a shot at getting more data from the scanned documents themselves, ...
d) transform the existing application into something like a container
   or whatever (maybe even using OSGi) so that it loads and runs the
   Harvester and other components (as a sort of plug-ins)
   i) at first, we only need to split the code into appropriate
      classes and packages, all in one project
   ii) later we can think about making it configurable so that ODN
       can be deployed in varying configurations (e.g. somebody does
       not need certain harvesters and APIs, so there would be a way to
       configure and deploy ODN to meet those criteria)
   Use ODCleanStore2 for that as it already does precisely that.
e) ...
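
A rough sketch of the container idea from item d), in plain Java rather than
OSGi or ODCleanStore2. The Harvester interface, the HarvesterContainer class
and the "enabled names" configuration are hypothetical; they only show how
components could be registered as plug-ins and switched on or off per
deployment.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

/** Hypothetical plug-in contract for a harvester component. */
interface Harvester {
    /** Unique name used to enable or disable the harvester in a deployment. */
    String name();

    /** Runs one harvesting pass and returns the number of harvested records. */
    int harvest();
}

/** Hypothetical container that holds registered harvesters and runs the enabled ones. */
class HarvesterContainer {
    // Registration order is preserved so that runs are predictable.
    private final Map<String, Harvester> registered = new LinkedHashMap<>();

    void register(Harvester harvester) {
        registered.put(harvester.name(), harvester);
    }

    /**
     * Runs only the harvesters whose names appear in the given configuration.
     * Returns the record count per harvester that was run.
     */
    Map<String, Integer> runEnabled(Set<String> enabledNames) {
        Map<String, Integer> results = new LinkedHashMap<>();
        for (Harvester harvester : registered.values()) {
            if (enabledNames.contains(harvester.name())) {
                results.put(harvester.name(), harvester.harvest());
            }
        }
        return results;
    }
}
```

Step i) corresponds to extracting each harvester behind such an interface
inside one project; step ii) corresponds to reading the set of enabled names
from configuration instead of hard-coding it.
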