Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Multithreaded #333

Open
mmatthias opened this issue Nov 13, 2014 · 5 comments
Open

Multithreaded #333

mmatthias opened this issue Nov 13, 2014 · 5 comments
Assignees

Comments

@mmatthias
Copy link

I think it is no big deal to make the resources thread safe. It would be an advantage to initialize some classes only once.

These classes are thread safe:

KeywordBasedFragmentAnnotator
new
SentenceAsFragmentAnnotator

They only have one final variable and that one is thread safe
more information here http://www.javamex.com/tutorials/synchronization_final.shtml

These classes are also thread safe but could be implemented in a better way:

TokenAsFragmentAnnotator
DependencyAsFragmentAnnotator

tokenPOSFilter, dependencyTypeFilter, governorPOSFilter, dependentPOSFilter - These variables could be final because the access is only a read. These methods are used by the maps "contain" and "isEmpty".

These classes have to be modified:

CachedLAPAccess

This class is not thread safe. Main modification should be done by converting the Hashmap into an ConcurrentHashMap. There are some other variables that maybe has to be discussed if they have to be uniq for each call. Are the stored information unique for a fragment graph? Is it good to have a global CachedLAPAccess or should we have global Lap implementations and use CachedLAPAccess only in local context?

@gilnoh
Copy link
Member

gilnoh commented Nov 13, 2014

(On the issue of CachedLAPAccess)
Yep. the class is not thread safe. I will work on it to make thread safe, via

  • make variables thread-safe.
  • lock / synchronize a region where underlying tool is not thread safe (underlying TreeTagger calling mechanism is not thread safe).

I will add more (detailed) comments later, but for now, I believe that main gain of cached LAP comes from detecting / saving common calls and not-calling underlying tool (so saves time). This is possible because we only keep one instance (global). If we move to "local" one, then I would say we should use normal (non-cached) LAP. Which will simply use less memory. CachedLAPAccess saves time by using up more memory (storing CASes) ... (I am assuming that the same sentence can appear in other fragments, too.)

I will check the class in terms of thread-safety, and then get back to you again.

@mmatthias
Copy link
Author

I do not know if this is the right way to solve this. I think the better way is to have a local cached one because our webserver runs without any shout down for month. If we cache the growing dataset in a global collection than we could run some day out of memory. ... As we only need to build the whole graph in one call I think it is enough to have the cache local. Would it be possible to have the implementations of the laps global and use the cache in a local way? ... I think we could modify the classes in this way. I could make the adaption with Kathrin when she is next time in our office.

@gilnoh
Copy link
Member

gilnoh commented Nov 13, 2014

Oh, now I see more clearly about what you mean by "global" and "local". I completely agree.

CachedLAP is only required when you build a graph; and once the graph is stable, you can throw out the CachedLAP instance (and all its memory). When you next requires to expand the graph, only then you need another (a new, clean) instance of CachedLAP. So in this sense, you should only activate one CachedLAP instance when you build / extend entailment trees.

I will make cached LAP more thread-safe and get back to this comment once more...

@salvadora
Copy link
Contributor

I added the final keyword in the fragment annotators as requested.

@gilnoh
Copy link
Member

gilnoh commented Nov 13, 2014

I've updated CachedLAP as thread-safe as much as I can see. #336

Underlying LAPAccess is not thread safe (which would not be called much, since we are using cache), and I've surrounded it with synchornized method; and I've updated all instance variables to be accessed in a way that is thread safe.

I do not know the context of thread-safe attempt --- but just let me outline that, many other codes (as I see) of TL is not thread-safe. For example, codes like Entailment graph generations are not thread safe, and I guess it would be very hard to make all of them thread-safe (unless you do simply blocking such as wrap everything in sync, etc).

So, :-) I am assuming that you are trying to make codes that are related to USE-case 2 "usage" of the already built graph --- but not on building parts. Is this correct? :-) Just wanted to know this is the case.

Anyhow, CachedLAPAccess would be now thread-safe. --- eh, or so I believe.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants