updated to new API
Crenshinibon committed Oct 29, 2013
1 parent 8e2ff4b commit 3b863b4
Showing 6 changed files with 184 additions and 58 deletions.
2 changes: 1 addition & 1 deletion .meteor/release
@@ -1 +1 @@
0.6.6.1
0.6.6.2
124 changes: 93 additions & 31 deletions README.md
@@ -18,8 +18,8 @@ Alternatively you can use [Meteorite](https://atmosphere.meteor.com). And add sp

This repository is itself a Meteor app and serves as an example of how to use Spomet.

Get Started
===========
Searching
=========

I tried to make using the package as simple as possible:

@@ -57,37 +57,85 @@ result =
```
<script src="https://gist.github.com/Crenshinibon/6710149.js"></script>

Add documents to the search by calling the method *add* with a *Spomet.Findable* instance:
Adding
======

Spomet.add new Spomet.Findable text, path, base, type, rev
Add documents to the search by calling the method *add* with a hash:

* text
The text to be indexed.
* path
Part of the identifier, relative to the base. Useful for identifying parts of the base document, e.g. attribute identifiers of the stored document.
* base
The base path, e.g. the id of the document whose text should be indexed.
* type
The document's type. Useful for distinguishing between different types of documents.
* rev
A revision number to support multiple versions of a document.

```coffee-script
Spomet.add
  text: 'this is some text that should be searchable'
  path: '/description'
  base: 'SOMEREFID'
  type: 'post'
```

*text* and *base* are mandatory. *type* will be substituted with 'default' and *path* with '/' in case those are missing.

Spomet provides a basic revisioning mechanism. A version number is incremented in case a document with the same identifying attributes is added.

So in case there already exists a document with the same *path*, *base* and *type*, Spomet will add the new one nevertheless. The version number is increased by one. In case no document with the identifying parameters exists the revision number 1 will be used.
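
The defaulting and versioning rules above can be sketched roughly as follows. This is a minimal illustration, not Spomet's implementation: `store`, `normalize`, `latestVersion` and the standalone `add` are hypothetical names, and an in-memory array stands in for the package's collections. The id layout (type, base, path, version) follows the `post-oid1-/-1` value asserted in the package's tests.

```coffeescript
# Hypothetical in-memory stand-in for Spomet's document store.
store = []

# Fill in the documented defaults for the optional attributes.
normalize = (doc) ->
  throw new Error 'text and base are mandatory' unless doc.text? and doc.base?
  text: doc.text
  base: doc.base
  path: doc.path ? '/'
  type: doc.type ? 'default'

# Highest stored version among documents sharing type, base and path (0 if none).
latestVersion = (doc) ->
  versions = (d.version for d in store when d.type is doc.type and d.base is doc.base and d.path is doc.path)
  Math.max 0, versions...

# Store the document as a new version and return its id.
add = (doc) ->
  entry = normalize doc
  entry.version = latestVersion(entry) + 1
  entry.docId = [entry.type, entry.base, entry.path, entry.version].join '-'
  store.push entry
  entry.docId
```

Adding the same *path*, *base* and *type* twice yields two entries in this sketch, the second one carrying version 2.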


Replacing
=========

*Spomet.replace* is meant to add a new version of a document and remove a prior version of it. *replace* takes the same hash parameter as *add*.

Advanced
```
Spomet.replace
  text: 'this is some other text'
  path: '/description'
  base: 'SOMEREFID'
  type: 'post'
```

Only *base* and *text* are mandatory.

Additionally, there is an optional second parameter: an integer version number that specifies which version of the document is removed in favor of the new one.

If you omit the second parameter, the latest version of the document is deleted and the new one inserted.

*replace* adds a document, even when there isn't a prior version.

If you don't want or need to handle different versions, you should always use *replace* instead of *add*.
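
A rough sketch of the replace behavior described above, again with a hypothetical in-memory array instead of Spomet's collections. The names `store` and `sameDoc` are made up for illustration, and how the new version is numbered is an assumption, since the README does not spell it out.

```coffeescript
# Hypothetical in-memory store; entries carry text, base, path, type, version.
store = []

# Two entries describe the same document when type, base and path agree.
sameDoc = (a, b) -> a.type is b.type and a.base is b.base and a.path is b.path

replace = (doc, version) ->
  entry =
    text: doc.text
    base: doc.base
    path: doc.path ? '/'
    type: doc.type ? 'default'
  versions = (d.version for d in store when sameDoc d, entry)
  latest = Math.max 0, versions...
  # Without an explicit version, the latest one is dropped.
  drop = version ? latest
  store = (d for d in store when not (sameDoc(d, entry) and d.version is drop))
  # Assumption: numbering continues from the highest version seen so far.
  entry.version = latest + 1
  store.push entry
  entry
```

Calling `replace` on an empty store simply inserts the document, which mirrors the statement that *replace* adds a document even when there is no prior version.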

Removing
========

You can delete documents from the search by calling *Spomet.remove* with a *Spomet.Findable* instance as the parameter or with the *docId*.
You can remove documents by calling *Spomet.remove* with a hash parameter. *remove* is very flexible; its behavior depends on the attributes of the hash.

Spomet.remove 'post-id1234-description-2'
Spomet.remove new Spomet.Findable null, 'description', 'id1234', 'post', 2
Basically it performs a search with the given parameters and removes all matching documents. Here are a few examples:

You can update already indexed documents, dismissing the prior version.
Remove all documents of type 'default':

Spomet.update new Spomet.Findable text, path, base, type, rev

The document, with *rev - 1* gets removed from the search as a result.
Spomet.remove
  type: 'default'

Remove all documents with a given base reference:

You can create your own searches by instantiating *Spomet.Search*.
Spomet.remove
  base: 'BASEREFID'

Remove all documents with the given path:

Spomet.remove
  path: '/description'

Remove a specific version of a document:

Spomet.remove
  type: 'default'
  base: 'BASEREFID'
  path: '/description'
  version: 2

You have to be careful, though. Leaving out a parameter might result in the removal of more documents than intended.
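
The matching behavior can be pictured with the following sketch, which shows why a sparse hash removes more documents. It is only an illustration under assumptions: `docs`, `matches` and the standalone `remove` are hypothetical, and the real package works on its own collections.

```coffeescript
# Hypothetical stand-in for the indexed documents.
docs = [
  {type: 'default', base: 'BASEREFID', path: '/description', version: 1}
  {type: 'default', base: 'BASEREFID', path: '/description', version: 2}
  {type: 'post', base: 'OTHERID', path: '/description', version: 1}
]

# True when every attribute present in the selector equals the document's value.
matches = (doc, selector) ->
  for key, value of selector
    return false unless doc[key] is value
  true

# Remove all matching documents and return the removed ones.
remove = (selector) ->
  removed = (d for d in docs when matches d, selector)
  docs = (d for d in docs when not matches d, selector)
  removed
```

With all four attributes given, exactly one entry matches; with only `path: '/description'`, every entry in this sample matches and is removed.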

Custom Search
=============

Besides using the provided search box, you might want to create your own searches. You can achieve this by instantiating *Spomet.Search*.

mySearch = new Spomet.Search
mySearch.find 'some text'
@@ -97,28 +145,42 @@ You can create your own searches by instantiating *Spomet.Search*.
Technology
==========

The current implementation uses four simple indexes. They are supposed to balance precision and recall. There haven't been any tests yet. So future updates might fine-tune the parameters and introduce further indexes.
The current implementation uses four simple indexes. They are supposed to balance precision and recall. There haven't been many tests yet. So future updates might fine-tune the parameters and introduce further indexes.

Currently there is a 3gram based index, a simple word index, a custom index and a wordgroup index. Whereas wordgroups are groups of two words.
Currently there is a 3gram based index, a simple word index, a custom index (using four letter parts of words as tokens) and a wordgroup index, where wordgroups are groups of two words.
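
To make the four token types more concrete, here is a small sketch of how such tokenizers could look. All function names are hypothetical, and the details (lowercasing, splitting on whitespace, windows taken inside each word, word pairs joined by a space) are assumptions; Spomet's actual indexes may tokenize differently.

```coffeescript
# Lowercase the text and split it on whitespace (assumed normalisation).
words = (text) ->
  (w for w in text.toLowerCase().split(/\s+/) when w.length > 0)

# All substrings of length n, taken with a sliding window.
ngrams = (s, n) ->
  (s.slice(i, i + n) for i in [0..s.length - n] by 1)

# One token per word.
fullWordTokens = (text) -> words text

# One token per pair of neighbouring words.
wordGroupTokens = (text) ->
  ws = words text
  ("#{ws[i]} #{ws[i + 1]}" for i in [0...ws.length - 1] by 1)

# Three-letter windows inside each word.
threeGramTokens = (text) ->
  tokens = []
  for w in words text
    tokens = tokens.concat ngrams(w, 3)
  tokens

# Four-letter windows inside each word (one reading of "four letter parts").
customTokens = (text) ->
  tokens = []
  for w in words text
    tokens = tokens.concat ngrams(w, 4)
  tokens
```

Short tokens such as 3grams match more loosely and favor recall, while full words and wordgroups match more strictly and favor precision, which is the balance the indexes are meant to strike.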

Future enhancements might include stemming, either algorithm based (e.g. Porter) or based on a lexicon, as well as phonetics.

Furthermore, the implementation is not very efficient, I fear. There is plenty of room to optimize certain aspects.

The server process handles the heavy lifting of indexing, finding and scoring the documents.

When there are many documents to index the server might stall.
When there are many documents to index or search, the server might stall.

A future enhancement might include establishing a separate process (deployable on a different host) for the indexing. Client side indexing might not be doable, because of security considerations.
A future enhancement might include establishing a separate process (deployable on a different host) for indexing. Client side indexing might not be doable, because of security considerations.

If you experience performance issues you might want to disable certain Indexes, you should start with the 3Gram index.
Controlling Indexes
===================

There are handy Meteor methods to achieve this:
If you experience performance issues you might want to disable certain indexes. You should start with the 3Gram index.

There are handy Meteor methods to disable indexes globally:

Meteor.call 'disableThreeGramIndex'
Meteor.call 'disableCustomIndex'
Meteor.call 'disableWordGroupIndex'
Meteor.call 'disableFullWordIndex'

Besides disabling indexes in general (i.e. not using them while indexing **and** searching), you can disable indexes per search.

You set the indexes to be used during a search by calling *setIndexNames* on the *Search* object you are using. As its only parameter, this method expects an array of index names.

mySearch = new Spomet.Search
mySearch.setIndexNames ['fullword','custom']
mySearch.find 'some text'
mySearch.results()

The index names are: 'fullword', 'wordgroup', 'threegram' and 'custom'.

Tests
=====
@@ -129,7 +191,7 @@ Run the tests from the project's root folder with:

laika --compilers coffee:coffee-script

Note: There might be some false errors, indicating some curly braces problem, when you run all tests at once.
**Note:** There might be some false errors, indicating some curly braces problem, when you run all tests at once.

Warning
=======
20 changes: 16 additions & 4 deletions spomet.coffee
@@ -27,17 +27,29 @@ if Meteor.isClient

Template.addable.events
'click input' : () ->
Spomet.add new Spomet.Findable this.title, '/title', this._id, 'post', 1
Spomet.add new Spomet.Findable this.text, '/text', this._id, 'post', 1
Posts.update {_id: this._id},{$set: {indexed: true}}
Spomet.add
  text: @title
  path: '/title'
  base: @_id
  type: 'post'
Spomet.add
  text: @text
  path: '/text'
  base: @_id
  type: 'post'
Posts.update {_id: @_id},{$set: {indexed: true}}

Template.ownText.events
'submit form': (e) ->
e.preventDefault()
tarea = $(e.target).find('textarea').first()
text = tarea.val()
id = CustomContent.insert {text: text}
Spomet.add new Spomet.Findable text, 'custom', id, 'custom', 1
Spomet.add
  text: text
  path: 'custom'
  base: id
  type: 'custom'
tarea.val ''


15 changes: 12 additions & 3 deletions tests/client.coffee
@@ -5,9 +5,18 @@ suite 'Client Find', () ->
server.eval () ->
Spomet.reset()

e1 = new Spomet.Findable 'this should be easily found', '/', 'OID1', 1
e2 = new Spomet.Findable 'much more harder to find', '/', 'OID2', 1
e3 = new Spomet.Findable 'more harder to find, is that really the case', '/', 'OID3', 1
e1 =
  text: 'this should be easily found'
  path: '/',
  base: 'OID1'
e2 =
  text: 'much more harder to find'
  path: '/'
  base: 'OID2'
e3 =
  text: 'more harder to find, is that really the case'
  path: '/'
  base: 'OID3'

Spomet.add e1
Spomet.add e2
43 changes: 31 additions & 12 deletions tests/index.coffee
@@ -5,15 +5,20 @@ suite 'Index', () ->
server.eval () ->
Spomet.reset()

doc = new Spomet.Findable 'some simple text', '/', 'oid1', 'post', 1
Spomet.Index.add doc, (docId, message) ->
emit 'added', docId
doc =
  text: 'some simple text'
  path: '/'
  base: 'oid1'
  type: 'post'

emit 'threegram', Spomet.ThreeGramIndex.collection.find().fetch()
emit 'fullword', Spomet.FullWordIndex.collection.find().fetch()
emit 'wordgroup', Spomet.WordGroupIndex.collection.find().fetch()
emit 'custom', Spomet.CustomIndex.collection.find().fetch()
emit 'docs', Spomet.Documents.collection.find({meta: {$exists: false}}).fetch()
docSpec = Spomet.Index.add doc
emit 'added', Spomet._docId docSpec

emit 'threegram', Spomet.ThreeGramIndex.collection.find().fetch()
emit 'fullword', Spomet.FullWordIndex.collection.find().fetch()
emit 'wordgroup', Spomet.WordGroupIndex.collection.find().fetch()
emit 'custom', Spomet.CustomIndex.collection.find().fetch()
emit 'docs', Spomet.Documents.collection.find({meta: {$exists: false}}).fetch()

server.once 'added', (docId) ->
assert.equal docId, 'post-oid1-/-1'
@@ -33,7 +38,7 @@ suite 'Index', () ->
server.once 'docs', (docs) ->
assert.equal 1, docs.length
assert.equal docs[0].docId, 'post-oid1-/-1'
assert.equal docs[0].findable.text, 'some simple text'
assert.equal docs[0].text, 'some simple text'

iTokens = docs[0].indexTokens
assert.equal iTokens.length, 25
@@ -43,11 +48,25 @@ suite 'Index', () ->
server.eval () ->
Spomet.reset()

doc1 = new Spomet.Findable 'can be found', '/', 'oid1', 'post', 1
doc1 =
  text: 'can be found'
  path: '/'
  base: 'oid1'
  type: 'post'
Spomet.Index.add doc1
doc2 = new Spomet.Findable 'this not', '/', 'oid2', 'post', 1

doc2 =
  text: 'this not'
  path: '/'
  base: 'oid2'
  type: 'post'
Spomet.Index.add doc2
doc3 = new Spomet.Findable 'quite hard to find', '/', 'oid3', 'post', 1

doc3 =
  text: 'quite hard to find'
  path: '/'
  base: 'oid3'
  type: 'post'
Spomet.Index.add doc3


38 changes: 31 additions & 7 deletions tests/server.coffee
@@ -7,11 +7,24 @@ suite 'Server Find', () ->
server.eval () ->
Spomet.reset()

e1 = new Spomet.Findable 'this is should be easily found', '/', 'OID1', 'post', 1
e1 =
  text: 'this is should be easily found'
  path: '/'
  base: 'OID1'
  type: 'post'
Spomet.add e1
e2 = new Spomet.Findable 'this is is', '/', 'OID2', 'post', 1

e2 =
  text: 'this is is'
  base: 'OID2'
  type: 'post'
Spomet.add e2
e3 = new Spomet.Findable 'much more much more harder to find', '/', 'OID3', 'post', 1

e3 =
  text: 'much more much more harder to find'
  path: '/'
  base: 'OID3'
  type: 'post'
Spomet.add e3

emit 'tg1', Spomet.ThreeGramIndex.collection.find().fetch()
@@ -21,7 +34,8 @@ suite 'Server Find', () ->

emit 'docCount1', Spomet.FullWordIndex.collection.findOne {token: 'is'}

Spomet.remove e2.docId
Spomet.remove
  base: e2.base

emit 'tg2', Spomet.ThreeGramIndex.collection.find().fetch()
emit 'fw2', Spomet.FullWordIndex.collection.find().fetch()
@@ -93,9 +107,19 @@ suite 'Server Find', () ->
server.eval () ->
Spomet.reset()

e1 = new Spomet.Findable 'this should be easily found', '/', 'OID1', 'post', 1
e2 = new Spomet.Findable 'harder to find', '/', 'OID2', 'post', 1
e3 = new Spomet.Findable 'much more harder to find', '/', 'OID3', 'post', 1
e1 =
  text: 'this should be easily found'
  path: '/'
  base: 'OID1'
  type: 'post'
e2 =
  text: 'harder to find'
  base: 'OID2'
  type: 'post'
e3 =
  text: 'much more harder to find'
  base: 'OID3'
  type: 'post'

Spomet.add e1
Spomet.add e2
