forked from sdsykes/ferret
TODO
====
* C
- IMPORTANT:
+ FIX file descriptor overflow. See Tickets #341 and #343
- add .. operator to query parser. For example, [100 200] could be written as
100..200 or 100...201, as with Ruby Ranges
- remove exception handling from C code. All errors to be handled by return
values.
- Move to sqlite's locking model. Ferret should work fine in a multi-process
environment.
- Add optional logging. To be enabled at compilation time, perhaps?
- Add support for changing zlib and bzlib compression parameters
- Improve unit test coverage to 100%
- Add benchmark suite
- Add Rakefile for development purposes
+ task to publish gcov and benchmark results to ferret wiki
- Index rebuilding of old versioned indexes.
- Add a globally accessible, thread-safe symbol table. This will be very
useful for storing field names so that no objects need to strdup the
field-names but can just store the symbol representative instead.
+ this has been done but it can be improved using actual Symbol structs
instead of plain char*
- Make threading optional at compile time
- to_json should limit output to prevent memory overflow on large indexes.
Perhaps we could use some type of buffered read for this.
- Make BitVector run as fast as bitset from C++ STL. See
c/benchmark/bm_bitvector.c
- Add a symbol table for field names. This will mean that we won't need to
worry about mallocing and freeing field names which happens all over the
place.
- Divide the headers into public and private (the private headers to be
stored in the src directory).
- Group-by search, i.e. you should be able to pass a field to group search
results by
- Auto-loading of documents during search, i.e. actual documents get returned
instead of document numbers.
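
For the .. operator item in the C list above, a minimal sketch of the intended
range semantics may help. It assumes integer bounds only and that A...B excludes
its upper bound, as in Ruby. The IntRange type and parse_int_range function are
hypothetical names for illustration and are not part of Ferret's query parser.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type; not part of Ferret's API. */
typedef struct {
    long lower; /* inclusive lower bound */
    long upper; /* inclusive upper bound */
} IntRange;

/*
 * Parse "A..B" (both bounds included) or "A...B" (upper bound excluded)
 * into an inclusive range. Returns 1 on success, 0 on malformed input.
 */
int parse_int_range(const char *text, IntRange *out)
{
    char *end;
    const char *upper_start;
    long lower, upper;
    int exclusive = 0;

    lower = strtol(text, &end, 10);
    if (end == text || strncmp(end, "..", 2) != 0) {
        return 0; /* no lower bound, or no ".." after it */
    }
    end += 2;
    if (*end == '.') { /* a third dot excludes the upper bound */
        exclusive = 1;
        end++;
    }

    upper_start = end;
    upper = strtol(upper_start, &end, 10);
    if (end == upper_start || *end != '\0') {
        return 0; /* no upper bound, or trailing characters */
    }

    if (exclusive) {
        upper -= 1;
    }
    if (upper < lower) {
        return 0; /* empty range */
    }
    out->lower = lower;
    out->upper = upper;
    return 1;
}
```

Under these assumptions both "100..200" and "100...201" yield the same bounds
as [100 200]. Errors are reported through the return value, in line with the
item about removing exception handling from the C code.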
* Ruby bindings
- argument checking for every method. We need a new api for argument checking
so that the arguments get checked at the start of each method that could
cause a segfault.
- improve memory management. It is way too complex at the moment. I also need
to document how it works so that other developers understand what is going
on.
- Replace Data_Wrap_Struct with ferret alternative which handles rewrapping
of structs automatically and also knows when to release a struct by using
refcounting.
* Ruby
- integrate rcov
- improve unit test coverage to 100%
* Documentation.
- generate Ruby binding documentation with custom build template similar to
jaxdoc http://rubyforge.org/projects/jaxdoc
- all documentation should meet DOCUMENTATION_STANDARDS
- documentation in C code to be generated by doxygen
Someday Maybe
=============
* apply for Google Summer of Code 2009
* optimize read and write vint
- test the following outside of ferret before implementing
- perform a binary scan using bit-wise or to find out how many bytes need
to be written
- if the write/read will overflow the buffer, split it into two, refreshing
the buffer in between
- use Duff's device to write bytes now that we know how many we need
* add super-fast, language-based dictionary compression
* add portable stacktrace function. Perhaps implement as an external library.
- See http://www.nongnu.org/libunwind/
- See http://www.tlug.org.za/wiki/index.php/Obtaining_a_stack_trace_in_C_upon_SIGSEGV
* investigate unscored searching
* user defined sorting
* Fix highlighting to work for external fields
* investigate faster string hashing method
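
For the vint item above, here is a rough sketch of the "size first, then
unrolled write" idea, to be tested outside of ferret as the item suggests. It
assumes a Lucene-style encoding (7 data bits per byte, low-order group first,
high bit set on every byte except the last) and 32-bit values. The function
names are hypothetical. Plain comparisons stand in for the bit-wise scan, the
fall-through switch is only in the spirit of Duff's device (there is no
interleaved loop), and buffer splitting is left out.

```c
#include <stddef.h>
#include <stdint.h>

/* Bytes needed to encode value, found by a binary scan over the
 * 7-bit group boundaries (middle boundary tested first). */
size_t vint_size(uint32_t value)
{
    if (value < (1u << 14)) {
        return (value < (1u << 7)) ? 1 : 2;
    }
    if (value < (1u << 28)) {
        return (value < (1u << 21)) ? 3 : 4;
    }
    return 5;
}

/* Write value into buf (at least 5 bytes) and return the byte count.
 * The switch jumps to the right entry point and falls through, so
 * there is no per-byte loop test. */
size_t vint_write(uint8_t *buf, uint32_t value)
{
    size_t n = vint_size(value);
    uint8_t *p = buf;

    switch (n) {
    case 5: *p++ = (uint8_t)(value | 0x80); value >>= 7;
        /* fall through */
    case 4: *p++ = (uint8_t)(value | 0x80); value >>= 7;
        /* fall through */
    case 3: *p++ = (uint8_t)(value | 0x80); value >>= 7;
        /* fall through */
    case 2: *p++ = (uint8_t)(value | 0x80); value >>= 7;
        /* fall through */
    default: *p++ = (uint8_t)value; /* last byte, high bit clear */
    }
    return n;
}

/* Read one vint from buf into *out and return the bytes consumed. */
size_t vint_read(const uint8_t *buf, uint32_t *out)
{
    uint32_t value = 0;
    unsigned shift = 0;
    size_t i = 0;
    uint8_t b;

    do {
        b = buf[i++];
        value |= (uint32_t)(b & 0x7F) << shift;
        shift += 7;
    } while ((b & 0x80) && i < 5);

    *out = value;
    return i;
}
```

Because vint_size is known before any byte is written, a caller could compare it
against the space left in the buffer and refresh the buffer first, which is the
split case the item mentions.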
Done
====
* add rake install task
* FIX :create parameter so that it only deletes the files owned by Ferret.
* fix compression. Currently nothing is happening if you set a field to
:compress. I guess we'll just assume zlib is installed, as I think it has to
be for Ruby to be installed.
* add bzlib support
* integrate gcov
* add a field cache to IndexReader
* setup email alerts for svn commits
* Ranged, unordered searching. Ie search through the index until you have the
required number of documents and then break. This will require the ability to
start searches from a particular doc-num.
+ See searcher_search_unordered in the C code and Searcher#scan in Ruby
* improve unit test code. I'd like to implement some way to print out a stack
trace when a test fails so that it is easy to find the source of the error.
* catch segfaults and print stack trace so users can post helpful bug tickets.
Again, see the same links for adding stacktrace to unit tests.
* Add string Sort descriptor
* fix memory bug
* add MultiReader interface
* add lexicographical sort (byte sort)
* Add highlighting
* add field compression
* Fix highlighting to work for compressed fields
* Add Ferret::Index::Index
* Fix:
+ Working Query: field1:value1 AND NOT field2:value2
+ Failing Query: field1:value1 AND ( NOT field2:value2 )
* update benchmark suite to use getrusage
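
As an illustration of the getrusage item above, CPU time for a benchmark run
can be measured roughly like this on a POSIX system. This is not the code in
Ferret's benchmark suite. cpu_seconds, time_benchmark and busy_loop are
hypothetical names used only for this sketch.

```c
#include <sys/time.h>
#include <sys/resource.h>

/* User + system CPU seconds used so far by this process,
 * or -1.0 if getrusage fails. */
double cpu_seconds(void)
{
    struct rusage usage;

    if (getrusage(RUSAGE_SELF, &usage) != 0) {
        return -1.0;
    }
    return (double)usage.ru_utime.tv_sec + usage.ru_utime.tv_usec / 1e6
         + (double)usage.ru_stime.tv_sec + usage.ru_stime.tv_usec / 1e6;
}

/* Hypothetical helper: CPU seconds spent inside one benchmark function. */
double time_benchmark(void (*fn)(void))
{
    double start = cpu_seconds();

    fn();
    return cpu_seconds() - start;
}

/* Stand-in workload so the sketch has something to measure. */
void busy_loop(void)
{
    volatile unsigned long sink = 0;
    unsigned long i;

    for (i = 0; i < 1000000UL; i++) {
        sink += i;
    }
}
```

A call such as time_benchmark(busy_loop) reports CPU time rather than
wall-clock time, so the result is less sensitive to other load on the machine.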