-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathplotlytabular.xml
217 lines (170 loc) · 9.23 KB
/
plotlytabular.xml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
<tool name="plotlytabular" id="plotlytabular" version="3.0">
<!--Source in git at: https://github.com/fubar2/galaxy_tf_overlay-->
<!--Created by [email protected] at 11/08/2023 17:40:15 using the Galaxy Tool Factory.-->
<description>Plotly plot generator for Galaxy tabular data.</description>
<requirements>
<requirement version="2.2.2" type="package">pandas</requirement>
<requirement version="5.22.0" type="package">plotly</requirement>
<requirement version="0.2.1" type="package">python-kaleido</requirement>
</requirements>
<stdio>
<exit_code range="1:" level="fatal"/>
</stdio>
<version_command><![CDATA[echo "3.0"]]></version_command>
<command><![CDATA[python
$runme
--input_tab
$input_tab
--htmlout
$htmlout
--xcol
'$xcol'
--ycol
'$ycol'
--colourcol
'$colourcol'
--hovercol
'$hovercol'
--title
'$title'
--header
'$header'
--image_type
'$outputimagetype']]></command>
<configfiles>
<configfile name="runme"><![CDATA[#raw
import argparse
import shutil
import sys
import plotly.express as px
import pandas as pd
# Ross Lazarus July 2023
# based on various plotly tutorials
MAXHTMLROWS = 5000 # empirically, browsers die at 10k so stop here for safety and give png
parser = argparse.ArgumentParser()
a = parser.add_argument
a('--input_tab',default='')
a('--header',default='')
a('--htmlout',default="test_run.html")
a('--xcol',default=3, type=int)
a('--ycol',default=2, type=int)
a('--colourcol',default=None, type=int)
a('--hovercol',default=None, type=int)
a('--title',default='Default plot title')
a('--image_type',default='short_html')
args = parser.parse_args()
isColour = False
isHover = False
if args.colourcol > 0:
isColour = True
if args.hovercol > 0:
isHover = True
df = pd.read_csv(args.input_tab, sep='\t')
MAXLEN=35
NCOLS = df.columns.size
NROWS = len(df.index)
defaultcols = ['col%d' % (x+1) for x in range(NCOLS)]
testcols = df.columns
if args.image_type in ['short_html', 'long_html']: # refuse to create browser crashing gob stopper html
if NROWS > MAXHTMLROWS:
sys.stderr.write('## CRITICAL USAGE ERROR (not a bug!): 5k+ rows (you supplied %d) in html breaks browsers, so please rerun the job using png format output.' % NROWS)
sys.exit(6)
if max(args.xcol-1, args.ycol-1) > NCOLS: # out of range
sys.stderr.write('## CRITICAL USAGE ERROR (not a bug!): xcol %d or ycol %d are greater than the number of columns found %d' %(args.xcol, args.ycol, NCOLS))
if isHover and isColour:
fig = px.scatter(df, x=df.columns[args.xcol-1], y=df.columns[args.ycol-1], color=df.columns[args.colourcol-1], hover_name=df.columns[args.hovercol-1])
elif isHover:
fig = px.scatter(df, x=df.columns[args.xcol-1], y=df.columns[args.ycol-1], hover_name=df.columns[args.hovercol-1])
elif isColour:
fig = px.scatter(df, x=df.columns[args.xcol-1], y=df.columns[args.ycol-1], color=df.columns[args.colourcol-1])
else:
fig = px.scatter(df, x=df.columns[args.xcol-1], y=df.columns[args.ycol-1], )
if args.title:
ftitle=dict(text=args.title, font=dict(size=50))
fig.update_layout(title=ftitle)
for scatter in fig.data:
scatter['x'] = [str(x)[:MAXLEN] + '..' if len(str(x)) > MAXLEN else x for x in scatter['x']]
scatter['y'] = [str(x)[:MAXLEN] + '..' if len(str(x)) > MAXLEN else x for x in scatter['y']]
if args.colourcol > 0:
sl = str(scatter['legendgroup'])
if len(sl) > MAXLEN:
scatter['legendgroup'] = sl[:MAXLEN]
if args.image_type == "short_html":
fig.write_html(args.htmlout, full_html=False, include_plotlyjs='cdn')
elif args.image_type == "long_html":
fig.write_html(args.htmlout)
elif args.image_type == "small_png":
ht = 768
wdth = 1024
fig.write_image('plotly.png', height=ht, width=wdth)
shutil.copyfile('plotly.png', args.htmlout)
else:
ht = 1200
wdth = 1920
fig.write_image('plotly.png', height=ht, width=wdth)
shutil.copyfile('plotly.png', args.htmlout)
#end raw]]></configfile>
</configfiles>
<inputs>
<param name="input_tab" type="data" optional="false" label="Tabular input file to plot. A header row is recommended." help="Must be tab delimited text. Binary excel will fail. If 5000+ rows, html output will fail, but png will work." format="tabular" multiple="false"/>
<param name="xcol" use_header_names="true" type="data_column" label="x axis for plot" help="Choose a column containing the x axis value" data_ref="input_tab"/>
<param name="ycol" use_header_names="true" type="data_column" label="y axis for plot" help="Choose a column containing the y axis value" data_ref="input_tab"/>
<param name="colourcol" use_header_names="true" type="data_column" label="colour values for each point" help="Choose a column containing the colour grouping class column" data_ref="input_tab"/>
<param name="hovercol" data_ref="input_tab" use_header_names="true" type="data_column" label="columname for hover string" help="A pop-up label can be used for html outputs when the mouse hovers near the point. Choose the column"/>
<param name="title" type="text" label="Title for the plot" help="Special characters will probably be escaped so do not use them"/>
<param name="header" type="text" value="" label="Use this comma delimited list of lables instead of the headers for this tabular file." help="Default is empyty, when headers will be used"/>
<param name="outputimagetype" type="select" label="Select the output format for this plot image. If over 5000 rows of data, HTML breaks browsers, so your job will fail. Use png only if more than 5k rows." help="Small and large png are not interactive but best for many (+10k) points. Stand-alone HTML includes 3MB of javascript. Short form HTML gets it the usual way so can be cut and paste into documents.">
<option value="short_html">Short HTML interactive format</option>
<option value="long_html">Long HTML for stand-alone viewing where network access to libraries is not available.</option>
<option value="large_png">Large (1920x1200) png image - not interactive so hover column ignored</option>
<option value="small_png">small (1024x768) png image - not interactive so hover column ignored</option>
</param>
</inputs>
<outputs>
<data name="htmlout" format="html" label="Plotlytabular $title on $input_tab.element_identifier" hidden="false">
<change_format>
<when input="outputimagetype" format="png" value="small_png"/>
<when input="outputimagetype" format="png" value="large_png"/>
</change_format>
</data>
</outputs>
<tests>
<test>
<output name="htmlout" value="htmlout_sample" compare="sim_size" delta="5000"/>
<param name="input_tab" value="input_tab_sample"/>
<param name="xcol" value="sepal_length"/>
<param name="ycol" value="sepal_width"/>
<param name="colourcol" value="petal_width"/>
<param name="hovercol" value="species"/>
<param name="title" value="Iris data"/>
<param name="header" value=""/>
<param name="outputimagetype" value="short_html"/>
</test>
</tests>
<help><![CDATA[
This is a generic version of the plotlyblast specific blastn Galaxy search output file plotter.
PNG images are not interactive but best for very large numbers of data points. Hover column will be ignored.
HTML interactive plots are best for a few thousand data points at most because
the hover information becomes uncontrollable with very dense points.
Using the shorter format HTML relies on internet access when viewed, and saves 3MB of javascript being embedded.
The long format is useful if potentially viewed offline.
.. class:: warningmark
Long strings in x and y tickmarks WILL BE TRUNCATED if they are too long - ".." is added to indicate truncation - otherwise some plots are squished.
.. class:: warningmark
Columns with very small scientific notation floats will need to be pre-scaled in a way that doesn't confuse plotly.express with their values.
----
This tool can plot an interactive scatter plot with a hover text column specified, that appears when hovering over each data point, to supply useful additional information.
It is only useful with a relatively small number of points when they can be distinguished. If many thousands, the density makes them relatively useless so use png output and
forget the hover text.
Column names are auto-generated as col1,...coln *unless* a comma separated list of column names is supplied as the header parameter, *or* pandas can
find the values supplied as parameters by the user in the first row of data. This sounds more complex than it is.
For example, using a Galaxy blastn output with 25 columns, the following comma delimited string supplied as the "header" parameter will match the names of each column.
qaccver,saccver,piden,length,mismatch,gapopen,qstart,qend,sstart,send,evalue,bitscore,sallseqid,score,nident,positive,gaps,ppos,qframe,sframe,qseq,sseq,qlen,slen,salltitles
When a header is supplied, the xcol and other column names must match one of those supplied column names.
So for example, xcol = "qaccver" for the blastn header example rather than xcol = "col1" when no header is supplied.
Relies on Plotly python code released under the MIT licence: https://github.com/plotly/plotly.py/blob/master/LICENSE.txt
]]></help>
<citations>
<citation type="doi">10.1093/bioinformatics/bts573</citation>
</citations>
</tool>