<?xml version="1.0" encoding="utf-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Lijax</title><link>http://www.yeyanbo.com/</link><description></description><atom:link href="http://www.yeyanbo.com/rss.xml" rel="self"></atom:link><lastBuildDate>Mon, 16 Sep 2013 11:20:00 +0800</lastBuildDate><item><title>Google Summer of Code 14</title><link>http://www.yeyanbo.com/en/google-summer-of-code-14.html</link><description><h2>Last Week(9.2-9.8)</h2>
<ul>
<li>Tested consensus methods on some real data;</li>
<li>Identified and fixed bugs in the Adams consensus method.</li>
</ul>
<h2>This Week(9.9-9.15)</h2>
<ul>
<li>Add examples and documentation to the Biopython wiki.</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 16 Sep 2013 11:20:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-09-16:en/google-summer-of-code-14.html</guid><category>GSOC</category><category>biopython</category><category>adam consensus</category></item><item><title>Google Summer of Code 13</title><link>http://www.yeyanbo.com/en/google-summer-of-code-13.html</link><description><h2>Last Week(9.2-9.8)</h2>
<ul>
<li>Tested tree construction methods on some real data;</li>
<li>Set 'identity' model as default and adapted to protein sequence as well in DistanceCalculator;</li>
<li>Improved the ParsimonyTreeConstructor and test.</li>
</ul>
<h2>This Week(9.9-9.15)</h2>
<ul>
<li>Improve tests and documentation for the Consensus module;</li>
<li>Clean up all code.</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Tue, 10 Sep 2013 14:25:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-09-10:en/google-summer-of-code-13.html</guid><category>GSOC</category><category>biopython</category><category>test</category><category>tree construction</category></item><item><title>Google Summer of Code 12</title><link>http://www.yeyanbo.com/en/google-summer-of-code-12.html</link><description><h2>Last Week(8.26-9.1)</h2>
<ul>
<li>Implemented bootstrap method for MSA object and two interface methods;</li>
<li>Restructured TreeConstructor classes to allow MSA independent of constructors;</li>
<li>Fixed two bugs in the <code>upgma</code> and <code>nj</code> algorithms.</li>
</ul>
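<p>A minimal sketch of the bootstrap idea, assuming the alignment is a plain list of equal-length strings rather than a Biopython MSA object. The function name <code>bootstrap_columns</code> is hypothetical and is not the interface added to the module; it only shows columns being resampled with replacement.</p>

```python
import random


def bootstrap_columns(rows, times, seed=None):
    """Yield `times` pseudo-replicates of an alignment given as equal-length strings."""
    rng = random.Random(seed)
    length = len(rows[0])
    if any(len(row) != length for row in rows):
        raise ValueError("all aligned sequences must have the same length")
    for _ in range(times):
        # Draw column indices with replacement; every row uses the same draw,
        # so each resampled column is still a real column of the alignment.
        columns = [rng.randrange(length) for _ in range(length)]
        yield ["".join(row[i] for i in columns) for row in rows]


msa = ["ACGTAC", "ACGTTC", "ATGTAC"]
replicates = list(bootstrap_columns(msa, times=3, seed=0))
```

<p>Each replicate has the same shape as the input, so it can be passed to any tree constructor to produce one bootstrap tree.</p>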
<h2>This Week(9.2-9.8)</h2>
<ul>
<li>Test more on TreeConstruction module; Improve documentation;</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Tue, 03 Sep 2013 09:25:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-09-03:en/google-summer-of-code-12.html</guid><category>GSOC</category><category>biopython</category><category>bootstrap methods</category><category>tree construction</category></item><item><title>Google Summer of Code 11</title><link>http://www.yeyanbo.com/en/google-summer-of-code-11.html</link><description><h2>Last Week(8.19-8.25)</h2>
<ul>
<li>Implemented the branch support method <code>get_support(target_tree, trees)</code> for a target tree and a series of bootstrap replicate trees;</li>
</ul>
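<p>The idea behind branch support can be sketched as follows, assuming trees are written as nested tuples of leaf names instead of <code>Bio.Phylo</code> tree objects. The helper names <code>leaf_sets</code> and <code>branch_support</code> are hypothetical; the sketch only counts how often each clade of the target tree appears among the replicate trees.</p>

```python
def leaf_sets(tree):
    """Return (leaves, clades): the leaf names under `tree` and the
    leaf set of every internal node, including the root."""
    if isinstance(tree, str):
        return frozenset([tree]), set()
    leaves, found = frozenset(), set()
    for child in tree:
        child_leaves, child_found = leaf_sets(child)
        leaves = leaves | child_leaves
        found = found | child_found
    found.add(leaves)
    return leaves, found


def branch_support(target, replicates):
    """Map each clade of `target` to the percentage of replicates containing it."""
    _, target_clades = leaf_sets(target)
    replicate_clades = [leaf_sets(tree)[1] for tree in replicates]
    return {
        clade: 100.0 * sum(clade in clades for clades in replicate_clades) / len(replicates)
        for clade in target_clades
    }


target = (("A", "B"), ("C", "D"))
replicates = [
    (("A", "B"), ("C", "D")),
    (("A", "C"), ("B", "D")),
    ((("A", "B"), "C"), "D"),
    (("A", "B"), ("C", "D")),
]
support = branch_support(target, replicates)
```

<p>Here the clade (A, B) appears in three of the four replicates and (C, D) in two of them.</p>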
<h2>This Week(8.26-9.1)</h2>
<ul>
<li>Implement bootstrap method for MSA object;</li>
<li>Implement two interface methods for bootstrap trees and the final consensus tree.</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 26 Aug 2013 11:30:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-08-26:en/google-summer-of-code-11.html</guid><category>GSOC</category><category>biopython</category><category>branch support</category></item><item><title>Google Summer of Code 10</title><link>http://www.yeyanbo.com/en/google-summer-of-code-10.html</link><description><h2>Last Week(8.12-8.18)</h2>
<ul>
<li>Improved BitString documentation;</li>
<li>Fixed majority consensus bugs;</li>
<li>Finished the Adams consensus algorithm;</li>
</ul>
<h2>This Week(8.19-8.25)</h2>
<ul>
<li>Implement the branch support method for a target tree and a series of bootstrap replicate trees;</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 19 Aug 2013 15:30:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-08-19:en/google-summer-of-code-10.html</guid><category>GSOC</category><category>biopython</category><category>majority rule consensus</category><category>adam consensus</category></item><item><title>Google Summer of Code 9</title><link>http://www.yeyanbo.com/en/google-summer-of-code-9.html</link><description><h2>Last Week(8.5-8.11)</h2>
<ul>
<li>Added two new methods <code>independent</code> and <code>iscompatible</code> to the <code>BitString</code> class;</li>
<li>Improved <code>_count_clades</code> method to store branch length as well and changed the <code>strict_consensus</code> method accordingly;</li>
<li>Implemented majority-rule consensus method;</li>
<li>Wrote the first version of <code>adam_consensus</code> method, not tested yet;</li>
</ul>
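<p>A rough sketch of what the compatibility checks and the majority-rule count could look like, assuming each clade is a string of '0' and '1' characters over a fixed taxon order. The function names below are illustrative and are not the <code>BitString</code> methods themselves.</p>

```python
from collections import Counter


def bit_and(a, b):
    """Position-wise AND of two equal-length bit strings."""
    return "".join("1" if x == "1" and y == "1" else "0" for x, y in zip(a, b))


def contains(a, b):
    """True if every taxon in clade b is also in clade a."""
    return bit_and(a, b) == b


def independent(a, b):
    """True if the two clades share no taxon."""
    return "1" not in bit_and(a, b)


def is_compatible(a, b):
    """Two clades can sit in one tree if they are nested or disjoint."""
    return contains(a, b) or contains(b, a) or independent(a, b)


def majority_clades(trees, cutoff=0.5):
    """Keep the clades found in more than `cutoff` of the trees."""
    counts = Counter(clade for tree in trees for clade in tree)
    return {clade for clade, n in counts.items() if n > cutoff * len(trees)}


trees = [{"1100", "0011"}, {"1100", "0110"}, {"1100", "0011"}]
majority = majority_clades(trees)
```

<p>With a cutoff above one half, any two clades that survive are guaranteed to be compatible, which is why the majority-rule tree is always well defined.</p>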
<h2>This Week(8.12-8.18)</h2>
<ul>
<li>Complete the <code>adam_consensus</code> method;</li>
<li>Improve tests and documentation for consensus methods;</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 12 Aug 2013 18:10:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-08-12:en/google-summer-of-code-9.html</guid><category>GSOC</category><category>biopython</category><category>majority rule consensus</category><category>adam consensus</category></item><item><title>Google Summer of Code 8</title><link>http://www.yeyanbo.com/en/google-summer-of-code-8.html</link><description><h2>Last Week(7.29-8.4)</h2>
<ul>
<li>Fixed a small bug and improved the code structure of the TreeConstruction module.</li>
<li>Improved documentation and tests for the TreeConstruction module.</li>
<li>Submitted mid-term evaluation.</li>
</ul>
<h2>This Week(8.5-8.11)</h2>
<ul>
<li>Write majority-rule consensus method;</li>
<li>Get a clear understanding of the Adams consensus method and try to implement it.</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 05 Aug 2013 11:10:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-08-05:en/google-summer-of-code-8.html</guid><category>GSOC</category><category>biopython</category><category>test</category><category>document</category><category>midterm</category></item><item><title>Google Summer of Code 7</title><link>http://www.yeyanbo.com/en/google-summer-of-code-7.html</link><description><h2>Last Week(7.22-7.28)</h2>
<ul>
<li>Implemented a <code>BitString</code> class with binary manipulation methods in <code>Consensus</code> module.</li>
<li>Finished <code>strict_consensus</code> method using <code>BitString</code>.</li>
<li>Wrote a method to import and convert protein substitution matrices from <code>SubsMat.MatrixInfo</code> to <code>Matrix</code> of nested list in <code>TreeConstruction</code> module.</li>
</ul>
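<p>To illustrate how a bit-string encoding helps with strict consensus, here is a small sketch that assumes each tree is given as a list of clades (sets of taxon names). The names <code>to_bitstring</code> and <code>strict_clades</code> are hypothetical and this is not the module's <code>strict_consensus</code> code; it only shows that the strict consensus keeps the clades shared by every tree.</p>

```python
def to_bitstring(clade, taxa):
    """Encode a clade as '1'/'0' flags following the fixed order of `taxa`."""
    return "".join("1" if name in clade else "0" for name in taxa)


def strict_clades(trees, taxa):
    """Return the encoded clades that occur in every tree."""
    encoded = [{to_bitstring(clade, taxa) for clade in tree} for tree in trees]
    return set.intersection(*encoded)


taxa = ["A", "B", "C", "D", "E"]
tree1 = [{"A", "B"}, {"A", "B", "C"}, {"D", "E"}]
tree2 = [{"A", "B"}, {"D", "E"}, {"C", "D", "E"}]
shared = strict_clades([tree1, tree2], taxa)
```

<p>Because every clade becomes a fixed-length string, comparing and counting clades across trees reduces to set operations on strings.</p>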
<h2>This Week(7.29-8.2)</h2>
<ul>
<li>Clean up existing code, improve tests and documentation;</li>
<li>Write and submit mid-term evaluations.</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 29 Jul 2013 11:10:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-07-29:en/google-summer-of-code-7.html</guid><category>GSOC</category><category>biopython</category><category>bitstring</category><category>strict consensus</category></item><item><title>Google Summer of Code 6</title><link>http://www.yeyanbo.com/en/google-summer-of-code-6.html</link><description><h2>Last Week(7.15-7.21)</h2>
<ul>
<li>Rewrote and tested the ParsimonyScorer class with the combination of Fitch algorithm and Sankoff algorithm;</li>
<li>Implemented the NNITreeSearcher class for searching the best parsimony tree;</li>
<li>Restructured all parsimony classes and finished the ParsimonyTreeConstructor class.</li>
</ul>
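<p>For readers unfamiliar with the Fitch algorithm mentioned above, here is a minimal sketch, assuming a binary tree written as nested tuples of leaf names and an alignment stored as a dict of strings. The function names are illustrative and this is not the ParsimonyScorer implementation.</p>

```python
def fitch_site(tree, states):
    """Return (state_set, changes) for one alignment column.

    tree: nested 2-tuples of leaf names; states: dict leaf name -> character.
    """
    if isinstance(tree, str):
        return {states[tree]}, 0
    left, right = tree
    left_set, left_cost = fitch_site(left, states)
    right_set, right_cost = fitch_site(right, states)
    common = left_set.intersection(right_set)
    if common:
        # The children agree on at least one state: no extra change needed.
        return common, left_cost + right_cost
    # Otherwise take the union and count one change.
    return left_set.union(right_set), left_cost + right_cost + 1


def fitch_score(tree, alignment):
    """Sum the per-column Fitch counts over the whole alignment."""
    names = list(alignment)
    length = len(alignment[names[0]])
    return sum(
        fitch_site(tree, {name: alignment[name][i] for name in names})[1]
        for i in range(length)
    )


tree = (("A", "B"), ("C", "D"))
alignment = {"A": "AAG", "B": "AAG", "C": "ATG", "D": "CTG"}
score = fitch_score(tree, alignment)
```

<p>Fitch counts every substitution as one step; the Sankoff algorithm generalizes this to arbitrary substitution costs.</p>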
<h2>This Week(7.22-7.28)</h2>
<ul>
<li>Write more tests for finished algorithms.</li>
<li>Write a binary class that will be used for clade storing and counting in the consensus tree searching algorithms.</li>
</ul>
<h2>Sample Usage for Parsimony Tree</h2>
<div class="highlight"><pre> <span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">AlignIO</span>
<span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">Phylo</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">Matrix</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">ParsimonyScorer</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">NNITreeSearcher</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">ParsimonyTreeConstructor</span>
<span class="c"># alignment</span>
<span class="n">aln</span> <span class="o">=</span> <span class="n">AlignIO</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">&#39;TreeConstruction/msa.phy&#39;</span><span class="p">),</span> <span class="s">&#39;phylip&#39;</span><span class="p">)</span>
<span class="c"># start tree</span>
<span class="n">tree</span> <span class="o">=</span> <span class="n">Phylo</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="s">&#39;./TreeConstruction/upgma.tre&#39;</span><span class="p">,</span> <span class="s">&#39;newick&#39;</span><span class="p">)</span>
<span class="c">#create parsimony scorer from a penalty matrix</span>
<span class="n">alphabet</span> <span class="o">=</span> <span class="p">[</span><span class="s">&#39;A&#39;</span><span class="p">,</span> <span class="s">&#39;T&#39;</span><span class="p">,</span> <span class="s">&#39;C&#39;</span><span class="p">,</span> <span class="s">&#39;G&#39;</span><span class="p">]</span>
<span class="n">penalty_matrix</span> <span class="o">=</span> <span class="p">[[</span><span class="mi">0</span><span class="p">],</span>
<span class="p">[</span><span class="mf">2.5</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
<span class="p">[</span><span class="mf">2.5</span><span class="p">,</span> <span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">],</span>
<span class="p">[</span> <span class="mi">1</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="mf">2.5</span><span class="p">,</span> <span class="mi">0</span><span class="p">]]</span>
<span class="n">matrix</span> <span class="o">=</span> <span class="n">Matrix</span><span class="p">(</span><span class="n">alphabet</span><span class="p">,</span> <span class="n">penalty_matrix</span><span class="p">)</span>
<span class="n">scorer</span> <span class="o">=</span> <span class="n">ParsimonyScorer</span><span class="p">(</span><span class="n">matrix</span><span class="p">)</span>
<span class="c"># create tree searcher of Nearest Neighbor Interchange</span>
<span class="n">searcher</span> <span class="o">=</span> <span class="n">NNITreeSearcher</span><span class="p">(</span><span class="n">scorer</span><span class="p">)</span>
<span class="c"># create parsimony tree constructor</span>
<span class="n">constructor</span> <span class="o">=</span> <span class="n">ParsimonyTreeConstructor</span><span class="p">(</span><span class="n">aln</span><span class="p">,</span> <span class="n">searcher</span><span class="p">,</span> <span class="n">tree</span><span class="p">)</span>
<span class="c"># build the best parsimony tree</span>
<span class="n">best_tree</span> <span class="o">=</span> <span class="n">constructor</span><span class="o">.</span><span class="n">build_tree</span><span class="p">()</span>
</pre></div></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Sun, 21 Jul 2013 11:10:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-07-21:en/google-summer-of-code-6.html</guid><category>GSOC</category><category>biopython</category><category>maximum parsimony</category><category>consensus tree</category></item><item><title>Google Summer of Code 5</title><link>http://www.yeyanbo.com/en/google-summer-of-code-5.html</link><description><p>I was busy with my future job and a bioinformatics course during the last two weeks, so only a few things were finished. Still, everything is on track as planned, because I finished the NJ algorithm one week ahead of the original schedule.</p>
<h2>Last Two Weeks(7.1-7.14)</h2>
<ul>
<li>Solved all problems about parsimony method with the help of Mark and redesigned all classes for this algorithm;</li>
<li>Implemented generic ParsimonyScorer accepting a scoring matrix for calculating parsimony score given a tree and an alignment.</li>
</ul>
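<p>One way to score a tree against a scoring matrix is the Sankoff dynamic programme. The sketch below assumes nested-tuple trees and a symmetric cost table built from a lower-triangular matrix; the cost values and the name <code>sankoff_site</code> are illustrative only and this is not the ParsimonyScorer code.</p>

```python
def sankoff_site(tree, states, alphabet, cost):
    """Return {state: minimal cost of the subtree if its root has that state}."""
    if isinstance(tree, str):
        # A leaf can only carry its observed state.
        return {s: (0 if s == states[tree] else float("inf")) for s in alphabet}
    child_tables = [sankoff_site(child, states, alphabet, cost) for child in tree]
    return {
        s: sum(min(cost[s][t] + table[t] for t in alphabet) for table in child_tables)
        for s in alphabet
    }


# Build a symmetric cost lookup from a lower-triangular matrix (illustrative values).
alphabet = ["A", "T", "C", "G"]
lower = [[0], [2.5, 0], [2.5, 1, 0], [1, 2.5, 2.5, 0]]
cost = {a: {} for a in alphabet}
for i, row in enumerate(lower):
    for j, value in enumerate(row):
        cost[alphabet[i]][alphabet[j]] = value
        cost[alphabet[j]][alphabet[i]] = value

tree = (("A", "B"), ("C", "D"))
states = {"A": "A", "B": "G", "C": "T", "D": "C"}
table = sankoff_site(tree, states, alphabet, cost)
site_score = min(table.values())
```

<p>The parsimony score of a whole alignment would be the sum of such per-column minima.</p>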
<h2>This week(7.15-7.21)</h2>
<ul>
<li>Write test for ParsimonyScorer;</li>
<li>Implement NNI(Nearest Neighbor Interchanges) algorithm.</li>
</ul></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Tue, 16 Jul 2013 08:10:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-07-16:en/google-summer-of-code-5.html</guid><category>GSOC</category><category>biopython</category><category>maximum parsimony</category></item><item><title>Google Summer of Code 4</title><link>http://www.yeyanbo.com/en/google-summer-of-code-4.html</link><description><h2>Last Week(6.24-6.30)</h2>
<ul>
<li>Implemented the UPGMA and NJ algorithms(<code>DistanceTreeConstructor</code>) by porting my Java code for those two.</li>
<li>Added test code in <code>test_TreeConstruction</code> for both algorithms</li>
</ul>
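<p>As a rough illustration of UPGMA clustering, the sketch below assumes distances are given as a dict keyed by taxon pairs and returns the topology as nested tuples. The function name <code>upgma</code> here is a standalone, hypothetical helper that ignores branch lengths; it is not the <code>DistanceTreeConstructor</code> method.</p>

```python
import itertools


def upgma(names, distances):
    """Cluster taxa by repeatedly merging the closest pair (topology only)."""
    sizes = {name: 1 for name in names}  # cluster -> number of leaves
    # Work on a private copy so the caller's distances are never modified.
    d = {frozenset(pair): value for pair, value in distances.items()}
    while len(sizes) > 1:
        a, b = min(itertools.combinations(sizes, 2), key=lambda pair: d[frozenset(pair)])
        size_a, size_b = sizes.pop(a), sizes.pop(b)
        merged = (a, b)
        for other in sizes:
            # Size-weighted average of the distances from the two merged clusters.
            d[frozenset((merged, other))] = (
                size_a * d[frozenset((a, other))] + size_b * d[frozenset((b, other))]
            ) / (size_a + size_b)
        sizes[merged] = size_a + size_b
    return next(iter(sizes))


distances = {
    ("A", "B"): 2, ("A", "C"): 6, ("B", "C"): 6,
    ("A", "D"): 10, ("B", "D"): 10, ("C", "D"): 10,
}
topology = upgma(["A", "B", "C", "D"], distances)
```

<p>NJ follows the same merge loop but picks the pair that minimizes a corrected distance rather than the raw one.</p>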
<h2>This week(7.1-7.7)</h2>
<ul>
<li>Get a clearer understanding of the parsimony methods for both DNA and protein sequences.</li>
<li>Design the parsimony score method and write documentation and tests for it;</li>
<li>Implement method to calculate the parsimony score for a given tree and an alignment;</li>
</ul>
<h2>DistanceTreeConstructor Usage</h2>
<div class="highlight"><pre> <span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">AlignIO</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">DistanceCalculator</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">DistanceTreeConstructor</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">TreeConstruction</span>
<span class="n">aln</span> <span class="o">=</span> <span class="n">AlignIO</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">&#39;TreeConstruction/msa.phy&#39;</span><span class="p">),</span> <span class="s">&#39;phylip&#39;</span><span class="p">)</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">DistanceCalculator</span><span class="p">(</span><span class="n">aln</span><span class="p">,</span> <span class="s">&#39;blosum62&#39;</span><span class="p">)</span>
<span class="n">dm</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">get_distance</span><span class="p">()</span>
<span class="n">constructor</span> <span class="o">=</span> <span class="n">DistanceTreeConstructor</span><span class="p">(</span><span class="n">dm</span><span class="p">)</span>
<span class="n">upgma_tree</span> <span class="o">=</span> <span class="n">constructor</span><span class="o">.</span><span class="n">upgma</span><span class="p">()</span>
<span class="n">nj_tree</span> <span class="o">=</span> <span class="n">constructor</span><span class="o">.</span><span class="n">nj</span><span class="p">()</span>
</pre></div></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 01 Jul 2013 17:20:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-07-01:en/google-summer-of-code-4.html</guid><category>GSOC</category><category>biopython</category><category>UPGMA</category><category>NJ</category><category>parsimony</category></item><item><title>Google Summer of Code 3</title><link>http://www.yeyanbo.com/en/google-summer-of-code-3.html</link><description><h2>First week summary</h2>
<p>Last week, I designed the <a href="https://github.com/lijax/biopython/blob/master/Bio/Phylo/TreeConstruction.py">TreeConstruction</a> module and implemented the <code>DistanceMatrix</code> and <code>DistanceCalculator</code> classes, almost exactly as planned. From the original <code>DistanceMatrix</code> plan, I extracted a <code>Matrix</code> base class so that it can be used for scoring matrices or be extended later.</p>
<p>Usage demo:</p>
<div class="highlight"><pre><span class="kn">from</span> <span class="nn">Bio</span> <span class="kn">import</span> <span class="n">AlignIO</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">DistanceMatrix</span>
<span class="kn">from</span> <span class="nn">Bio.Phylo.TreeConstruction</span> <span class="kn">import</span> <span class="n">DistanceCalculator</span>
<span class="c"># get a multiple alignment</span>
<span class="n">alignment</span> <span class="o">=</span> <span class="n">AlignIO</span><span class="o">.</span><span class="n">read</span><span class="p">(</span><span class="nb">open</span><span class="p">(</span><span class="s">&#39;msa.phy&#39;</span><span class="p">),</span> <span class="s">&#39;phylip&#39;</span><span class="p">)</span>
<span class="c"># construct a distance calculator from the alignment and the given scoring matrix name (DNA: identity, blastn, trans; Protein: blosum40/62/90, pam90/120/250)</span>
<span class="n">calculator</span> <span class="o">=</span> <span class="n">DistanceCalculator</span><span class="p">(</span><span class="n">alignment</span><span class="p">,</span> <span class="s">&#39;identity&#39;</span><span class="p">)</span>
<span class="c"># get the distance matrix</span>
<span class="n">dm</span> <span class="o">=</span> <span class="n">calculator</span><span class="o">.</span><span class="n">get_distance</span><span class="p">()</span>
<span class="c"># print a lower triangular format of the distance matrix</span>
<span class="k">print</span><span class="p">(</span><span class="nb">str</span><span class="p">(</span><span class="n">dm</span><span class="p">))</span>
<span class="c"># get the distance from sequence &#39;Alpha&#39; to &#39;Beta&#39;(the id from the SeqRecord of the MSA object)</span>
<span class="k">print</span><span class="p">(</span><span class="n">dm</span><span class="p">[</span><span class="s">&#39;Alpha&#39;</span><span class="p">,</span> <span class="s">&#39;Beta&#39;</span><span class="p">])</span>
<span class="c"># delete an element from the distance matrix</span>
<span class="k">del</span> <span class="n">dm</span><span class="p">[</span><span class="s">&#39;Alpha&#39;</span><span class="p">]</span>
<span class="c"># insert an element with its distances at position 1</span>
<span class="n">dm</span><span class="o">.</span><span class="n">insert</span><span class="p">(</span><span class="s">&#39;Alpha&#39;</span><span class="p">,</span> <span class="p">[</span><span class="mi">1</span><span class="p">,</span> <span class="mi">0</span><span class="p">,</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">4</span><span class="p">],</span> <span class="mi">1</span><span class="p">)</span>
</pre></div>
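<p>As a rough illustration of what an 'identity' model could compute, the sketch below assumes Python 3 and takes the distance to be the fraction of aligned positions that differ. This is an assumption for illustration, not necessarily the exact formula in <code>DistanceCalculator</code>, and the function names are hypothetical.</p>

```python
def identity_distance(seq1, seq2):
    """Fraction of aligned positions at which the two sequences differ."""
    if len(seq1) != len(seq2):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(a == b for a, b in zip(seq1, seq2))
    return 1 - matches / len(seq1)


def lower_triangle(sequences):
    """Pairwise distances in lower-triangular form, diagonal included."""
    return [
        [identity_distance(sequences[i], sequences[j]) for j in range(i + 1)]
        for i in range(len(sequences))
    ]


rows = lower_triangle(["ACGT", "ACGA", "TCGA"])
```

<p>The result uses the same lower-triangular layout that the distance matrix prints.</p>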
<p>A <a href="https://github.com/lijax/biopython/blob/master/Tests/test_TreeConstruction.py">unittest</a> was also written in the Tests directory.</p>
<p>I hope to get feedback that helps me improve my Python coding.</p>
<h2>Plan for this week</h2>
<p>Implement the UPGMA and NJ algorithms. This should be easy as I wrote both of them in Java before.</p>
<h2>Problems</h2>
<p>One common operation in both algorithms is deleting and inserting elements in the DistanceMatrix object. Because this modifies the matrix in place, any later operation on the original DistanceMatrix object after running either algorithm may produce unexpected errors. I think one solution is to use <code>deepcopy</code> to copy the DistanceMatrix object at the beginning of each algorithm, at the cost of being a little slower.</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Mon, 24 Jun 2013 08:35:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-06-24:en/google-summer-of-code-3.html</guid><category>GSOC</category><category>biopython</category><category>distance matrix</category><category>distance calculation</category></item><item><title>Google Summer of Code 2</title><link>http://www.yeyanbo.com/en/google-summer-of-code-2.html</link><description><p>The coding period will begin next Monday. It's time to work.</p>
<h2>Tree construction module design</h2>
<p>The first task of this project is to implement a tree construction module providing three basic tree construction algorithms (UPGMA, NJ and MP). I'll name this module TreeConstruction. The class design is as follows:</p>
<ol>
<li><code>TreeConstructor</code>: base class for all tree constructors.</li>
<li><code>DistanceTreeConstructor</code>: This class accepts a <code>DistanceMatrix</code> to create a constructor object and provides two methods, <code>upgma</code> and <code>nj</code>, to construct and return a Tree object. Though we could construct the distance tree directly from an <code>MSA</code>, I think it's better to separate different responsibilities into different classes or methods.</li>
<li><code>ParsimonyTreeConstructor</code>: This class accepts an <code>MSA</code> to create a constructor object and provides an <code>mp</code> method to construct and return a Tree object. Two helper methods, <code>__parsimony_score</code> and <code>__nni</code>, will be used to calculate the parsimony score and to perform Nearest Neighbor Interchanges when searching for the best tree.</li>
<li><code>DistanceMatrix</code>: This class accepts a name list and a lower triangular matrix to create the object. The built-in methods <code>__getitem__</code>, <code>__setitem__</code>, <code>__delitem__</code>, <code>__len__</code> and an <code>insert</code> method will be implemented to assist distance tree construction.</li>
<li><code>DistanceCalculator</code>: This class accepts an <code>MSA</code> to create the object. Two methods, <code>dna_distance</code> and <code>protein_distance</code>, will calculate DNA and protein distances respectively and return a <code>DistanceMatrix</code> object, and two helper methods, <code>dna_pair</code> and <code>protein_pair</code>, will calculate pairwise distances.</li>
</ol>
<h2>First week work plan</h2>
<ul>
<li>
<p>Implement the <code>DistanceMatrix</code> first so that the distance based method can be worked on later. For an object <code>dm</code> of the <code>DistanceMatrix</code>, the expected functions are:</p>
<ul>
<li><code>dm[1]</code>, <code>dm['name']</code>: to get or set the distances related to the taxon at index 1 or named 'name';</li>
<li><code>dm[1,2]</code>, <code>dm['name1','name2']</code>: to get or set the specified distance;</li>
<li><code>del dm[1]</code>, <code>del dm['name']</code>: to delete one branch.</li>
<li><code>dm.insert('name', distances)</code>: to insert a taxon with its related distances.</li>
<li>Those functions will be used in UPGMA and NJ algorithms.</li>
</ul>
</li>
<li>
<p>If there is enough time, try to implement <code>DistanceCalculator</code>. The works include:</p>
<ul>
<li>check and identify the <code>Alphabet</code> of the <code>MSA</code> (why is it <code>SingleLetterAlphabet()</code> no matter what the sequences are?);</li>
<li>choose and prepare scoring matrices for DNA and protein;</li>
<li>write distance methods for DNA and proteins.</li>
<li>write tests for distance calculation.</li>
</ul>
</li>
</ul>
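<p>The expected <code>DistanceMatrix</code> behaviour listed above can be sketched roughly as follows. The class name <code>SketchDistanceMatrix</code> is hypothetical, setting values is left out for brevity, and <code>insert</code> is assumed to take the distances in the final taxon order, including 0 for the new taxon itself.</p>

```python
class SketchDistanceMatrix:
    """Lower-triangular distance matrix addressed by index or by name."""

    def __init__(self, names, matrix):
        self.names = list(names)
        self.matrix = [list(row) for row in matrix]  # row i holds i + 1 values

    def _index(self, key):
        return key if isinstance(key, int) else self.names.index(key)

    def _get(self, i, j):
        # Only the lower triangle is stored, so swap indices when needed.
        return self.matrix[i][j] if i >= j else self.matrix[j][i]

    def __len__(self):
        return len(self.names)

    def __getitem__(self, key):
        if isinstance(key, tuple):
            i, j = (self._index(k) for k in key)
            return self._get(i, j)
        i = self._index(key)
        return [self._get(i, j) for j in range(len(self.names))]

    def __delitem__(self, key):
        i = self._index(key)
        del self.names[i]
        del self.matrix[i]
        for row in self.matrix[i:]:
            del row[i]

    def insert(self, name, distances, index=None):
        index = len(self.names) if index is None else index
        self.names.insert(index, name)
        self.matrix.insert(index, list(distances[:index + 1]))
        for offset, row in enumerate(self.matrix[index + 1:], start=index + 1):
            row.insert(index, distances[offset])


dm = SketchDistanceMatrix(["Beta", "Gamma", "Delta"], [[0], [3, 0], [5, 6, 0]])
dm.insert("Alpha", [1, 0, 2, 4], 1)
```

<p>Deleting 'Alpha' again restores the original three-taxon matrix, which is the delete-and-insert cycle that UPGMA and NJ rely on.</p>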
<h2>Problems and Challenges</h2>
<p>I'm sure the <code>DistanceMatrix</code> class can be completed this week, so it won't affect the work planned for the next few weeks.</p>
<p>For the <code>DistanceCalculator</code>, I expect test design and data preparation to take a lot of time.</p>
<p>One problem is how to identify the alphabet of the <code>MSA</code> in order to decide which distance method to use. Should the user specify it?</p>
<p>Another is which scoring matrices we should choose. Should we provide all of them and let the user select?</p>
<p>Maybe we can implement or improve the <code>DistanceCalculator</code> later if it takes too much time.</p>
<h2>Conclusion</h2>
<p>Work out the <code>DistanceMatrix</code> and try the <code>DistanceCalculator</code>.</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Sun, 16 Jun 2013 09:35:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-06-16:en/google-summer-of-code-2.html</guid><category>GSOC</category><category>biopython</category><category>phylo</category><category>tree construction</category><category>distance matrix</category></item><item><title>Google Summer of Code 1</title><link>http://www.yeyanbo.com/en/google-summer-of-code-1.html</link><description><p>I'm very excited to have been accepted for this year's <a href="http://www.google-melange.com/gsoc/homepage/google/gsoc2013">Google Summer of Code (GSOC)</a>. In recent days, I have been busy preparing my master's thesis and defense, so this news is a good stress reliever for me. The project I'm going to work on is "<a href="http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013#Phylogenetics_in_Biopython:_Filling_in_the_gaps">Phylogenetics in Biopython: Filling in the gaps</a>", which is to implement some phylogenetics algorithms for <a href="http://biopython.org/wiki/Main_Page">Biopython</a>. I believe it will be an exciting coding experience.</p>
<h2>Get to Know GSOC</h2>
<p>I first learned about GSOC from the <a href="http://www.biojava.org">Biojava</a> homepage when I was trying to use Biojava for my own bioinformatics work. As I thought most of the applicants and Biojava contributors would have a computer science background, I never had the courage to apply. Last September, I got the chance to meet Professor Allen and Karen when they were visiting our lab. Karen told us more about GSOC and <a href="http://www.nescent.org/">NESCent</a>, which had been a mentoring organization for several years. I must say this finally inspired me to apply to GSOC this year.</p>
<h2>Application</h2>
<p>The application went through the <a href="http://informatics.nescent.org/wiki/Phyloinformatics_Summer_of_Code_2013">Phyloinformatics Summer of Code</a> from NESCent. I originally wanted to apply for the project "Discovering links to ToLWeb content from a tree in the Open Tree of Life's software system". This project is based on several existing Java projects and also needs some knowledge of HTML, XML, JavaScript and Python. As my first programming language is Java and I know the other related languages and techniques, this project was a good fit for me. After the Biopython projects were added, I found the current project more suitable, because most of the algorithms in this project are implemented in <a href="https://github.com/bigwiv/BlastGraph">BlastGraph</a>, a piece of software I wrote in Java, so I'm very familiar with them. Also, the former project had another applicant, while this one did not have any. As each project can have only one student and every student can work on only one project, it seemed better to avoid the competition so that everyone has a higher chance of being selected. Another major reason to choose this project is that I want to improve my Python programming skills, since I have used Python far less than Java.</p>
<h2>Project Description</h2>
<p>As the name implies, this project is to implement some phylogenetic algorithms that are currently absent from the Biopython.Phylo package. In this package, some basic phylogenetics functions, such as tree operations, parsers for Newick, Nexus and PhyloXML, and wrappers for Phyml, Raxml and PAML, are already implemented. However, some important components remain to be filled in to better support phylogenetic workflows. These include simple tree construction algorithms, consensus tree searching, tree comparison and visualization. In this project, I will focus on the first two: tree construction and consensus tree searching. The tree construction part includes three algorithms: <a href="http://en.wikipedia.org/wiki/UPGMA">UPGMA</a>, <a href="http://en.wikipedia.org/wiki/Neighbor-joining">Neighbor Joining</a>, and <a href="http://en.wikipedia.org/wiki/Maximum_parsimony_(phylogenetics)">Maximum Parsimony</a>. The consensus tree part includes another three: Strict, Majority-rule and Adams Consensus. So after this project, there will be two separate modules providing those algorithms in the Biopython.Phylo package.</p>
<h2>Works for the Next Two Weeks</h2>
<p>The coding period starts on June 17. During the next two weeks, I will read the related source code in Biopython and try to design draft modules for both parts.</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Tue, 04 Jun 2013 21:13:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-06-04:en/google-summer-of-code-1.html</guid><category>necent</category><category>GSOC</category></item><item><title>Switch to Pelican</title><link>http://www.yeyanbo.com/en/switch-to-pelican.html</link><description><p>As I need to post my progress on the Google Summer of Code project, this GitHub blog is a good place to share it. My project is a Biopython project, so before starting the real coding it's better to play with some existing Python code. So yesterday I switched the blog generator from <a href="http://octopress.org/">Octopress</a> to <a href="http://getpelican.com/">Pelican</a>. Octopress is based on Ruby and is widely used by GitHub bloggers, but as I know little about Ruby, I find it complicated and hard to use. Pelican is pure Python, and things really got much easier. Actually, I think Pelican is easier to install and use than Octopress even for those who know little about either language. The comment system, DISQUS, has not been set up yet, as I forgot my DISQUS account. I will fix it later.</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Sat, 01 Jun 2013 21:00:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-06-01:en/switch-to-pelican.html</guid><category>python</category><category>pelican</category></item><item><title>Tools I used for markdown writing</title><link>http://www.yeyanbo.com/en/tools-i-used-for-markdown-writing.html</link><description><h2>Editors</h2>
<h3>Sublime Text and plugins on Linux</h3>
<p><a href="http://www.sublimetext.com/" title="Sublime Text">Sublime Text</a> is an awesome text editor that supports almost all programming languages. It runs on Windows and Mac as well as Linux. Some useful features users love are <em>Goto Anything</em>, <em>Multiple Selections</em>, <em>Command Palette</em> and <em>Split Editing</em>. Its most fantastic feature, though, is its easy and elegant <a href="http://www.sublimetext.com/docs/2/api_reference.html" title="plugin API">plugin system</a>, which uses JSON and Python, together with its open source <a href="http://wbond.net/sublime_packages/community" title="Sublime Text Plugin Community">plugin community</a>.</p>
<p>It supports markdown syntax highlighting, and several plugins make markdown writing much easier.</p>
<p>List of markdown plugins:</p>
<ul>
<li><a href="https://github.com/revolunet/sublimetext-markdown-preview" title="Markdown Preview">Markdown Preview</a>: preview in browser.</li>
<li><a href="http://ogom.github.com/sublimetext-markdown-slideshow" title="Markdown Slideshow">Markdown Slideshow</a>: converts markdown to an HTML5 slideshow.</li>
<li><a href="http://johnmacfarlane.net/pandoc/" title="Pandoc">Pandoc</a>: Pandoc integration in Sublime.</li>
<li><a href="https://github.com/phyllisstein/Pandown" title="Pandown">Pandown</a>: Another Pandoc integration with more output support.</li>
<li><a href="https://github.com/larlequin/PandocAcademic" title="Pandoc Academic">Pandoc Academic</a>: Another Pandoc integration with bibliography support.</li>
</ul>
<h3>Editors on other platforms</h3>
<ul>
<li>Windows: <a href="http://markdownpad.com/" title="MarkdownPad">MarkdownPad</a></li>
<li>Chrome: <a href="https://chrome.google.com/webstore/detail/made/oknndfeeopgpibecfjljjfanledpbkog" title="MaDe Editor for markdown">MaDe</a></li>
<li>Android: <a href="https://play.google.com/store/apps/details?id=com.jamesmc.writer" title="Writer">Writer</a> with Dropbox syncing.</li>
</ul>
<h2>Pandoc</h2>
<p><a href="http://johnmacfarlane.net/pandoc/" title="Pandoc">Pandoc</a> is a general markup converter. It can convert from one markup format to another, as well as to DOCX, PDF and EPUB documents. The diagram below shows the supported markup formats and conversions. Pandoc allows you to write a document in markdown and release it in any of the other supported formats.</p>
<p><img alt="PandocFormat" src="http://johnmacfarlane.net/pandoc/diagram.png" title="Pandoc Format" /></p>
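<p>As a small sketch of the "write in markdown, release in another format" workflow, the Python snippet below builds and runs a basic Pandoc command. The helper names pandoc_command and convert and the file names are hypothetical; the snippet assumes the pandoc executable is installed and on the PATH, and relies on Pandoc inferring the formats from the file extensions.</p>

```python
import subprocess


def pandoc_command(source, target):
    """Build the argument list for converting source to target with Pandoc.

    Pandoc infers the input and output formats from the file extensions.
    """
    return ["pandoc", source, "-o", target]


def convert(source, target):
    # Runs the pandoc executable, which must be installed and on the PATH.
    subprocess.run(pandoc_command(source, target), check=True)


if __name__ == "__main__":
    # Hypothetical file names, for illustration only.
    for target in ("post.docx", "post.epub", "post.html"):
        print(" ".join(pandoc_command("post.md", target)))
```

<p>Calling convert("post.md", "post.docx") would then produce a DOCX file next to the markdown source, provided post.md exists.</p>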
</description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Sun, 03 Mar 2013 21:13:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-03-03:en/tools-i-used-for-markdown-writing.html</guid><category>programing</category><category>markdown</category><category>Pandoc</category><category>Sublime Text</category></item><item><title>First Blog</title><link>http://www.yeyanbo.com/en/first-blog.html</link><description><h2>Start</h2>
<p>The first github blog post. Just a simple test.</p>
<h2>Installed octopress</h2>
<p>I installed octopress this afternoon. I thought it was a truly dynamic blogging system or CMS, just like wordpress, but I was wrong. Still, I must admit it's the perfect blogging system for programmers on github.</p>
<!-- more -->
<p>I'm not a ruby coder and I don't care how it works. I just want to use it to get familiar with github and markdown.</p>
<p>So let's start blogging on github with markdown. </p>
<h2>Test</h2>
<h3>Emphasis</h3>
<p>This is an <em>emphasis</em> <strong>test</strong>.</p>
<h3>List</h3>
<p>This is a list test.</p>
<ul>
<li>list1</li>
<li>list2</li>
<li>list3</li>
</ul>
<h3>Code block test</h3>
<p>{% include_code sample.py sample.py %}</p>
<h2>Works?</h2>
<p>Enough. Does it work?</p></description><dc:creator xmlns:dc="http://purl.org/dc/elements/1.1/">Yanbo Ye</dc:creator><pubDate>Sat, 02 Mar 2013 17:30:00 +0800</pubDate><guid>tag:www.yeyanbo.com,2013-03-02:en/first-blog.html</guid><category>octopress</category></item></channel></rss>