Support multiple output formats #92
Comments
Hey @fsmunoz, I'm sorry, I've been busy with other things here, and though I have read through this and the previous conversations (linked above) from a few years ago, I'm still not really sure what kind of solution you're looking for.

SASPy and SAS_Kernel are very different. SAS_Kernel is just a Jupyter extension, so it could have some Jupyter-specific things, like you have in your example above, and I suppose it would have to. SASPy isn't a Jupyter anything. It doesn't know whether it's running in Jupyter, or what specific things Jupyter can do as opposed to any other environment (other than render HTML), including the other notebooks or UIs that it also supports. It can run anywhere Python runs, so anything that is specific to Jupyter isn't really ideal.

As for returning something other than HTML5, which is what's returned by way of using ODS, it's possible to return other ODS output, but how that would get rendered in any UI isn't something I have a clear understanding of. And I don't know enough about ODS to be sure that changing it to return other things would work for everything either. That would have to be investigated. I tried having it return a PDF and tried to get it to render, but I wasn't successful with that.

The idea of having helper functions to convert HTML to other formats, using whatever packages can do that, seems like it could be less invasive to SASPy in general, as it would just be optional and dependent on having the right packages and a UI that works with them. In the thread from a few years ago I think you mentioned even just documenting examples of doing this; the saspy_examples repo would be a good place for that! Especially if these are Jupyter-specific things that don't necessarily work everywhere, having example functions that do work in that environment would be good for anyone to use. Of course, if they work in other UIs, then that's great too.

Sorry, that's as far as I've been able to get on this.
Hello @tomweber-sas, I'll try to be as to the point as possible, not least because I think that my excessive contextualisation might have muddied the main points somewhat.
Now: Using SAS Kernel only returns
I am aware, but since SAS Kernel makes a call to SASpy to get the output, when discussing adding other output formats it can be argued that SASpy could implement that at that level:
This is why I mentioned SASpy. I am aware it is different, but it's a dependency of SAS Kernel. Implementing the second option would also solve problems where Jupyter isn't used (like the original discussion, using
My example above only works because I have implemented the code in my linked fork. It's not something that can be done completely at the user level. Without changes to the SAS Kernel, SAS output can't be used when the target output is not HTML.
The rendering isn't a problem that either SASpy or SAS Kernel should care much about. Just as returning HTML assumes that it can be displayed (a reasonable assumption for Jupyter default use-case), returning other MIME types should make the same assumption. In my example, Quarto (and the same happens for The current support for Zeppelin is also related to this: Zeppelin can't display the iPython HTML object, which is why it has an option. My issue is a more general version of this.
Agreed, but in general the rendering isn't a problem: if I ask for PDF output, then I am taking ownership of the ability to render that with whatever approach I'm using. Having that possibility opens a lot of doors. Specifically, returning
See above for my take on the relative implications of each. I agree that it's less impactful, and would work for my use case (using Quarto, that runs SAS through the Jupyter kernel), although not for others (e.g. using
It's already in there: SASpy with R Markdown. Note that:
This user example would be possible without any specific programming if SASpy supported returning
Thanks for trying to clarify, I think I can try to do that as well. First, yes, I get that if saspy did the work, the sas_kernel would just get it, since it's really just a Jupyter extension wrapping the submit() method. It may need to be tweaked to know what to render, or rather how to render it.

On my side, some of my confusion is about what output you want. I see you're saying things like text/html, text/plain, image/png, text/image, text/md and other image/* formats, as well as mentioning PDF and Markdown. SASPy doesn't create any output. The output is created by SAS, ODS specifically. ODS supports a number of different output types, and it works with SAS to create output for whatever code is run. SASPy doesn't know what code you're submitting, or what output may or may not be generated, for the submit methods. It knows the code it's generating for its own methods, but the output is still created by SAS/ODS and not saspy. So from my side, I could return what ODS supports, but it doesn't support the things you're saying, other than maybe PDF. Here's the doc for ODS saying what it can produce.

HTML5 was the obvious choice for saspy since the different notebooks can actually render it, and the images are embedded in the HTML, so this works when connected to remote sessions. This is even good when not in a notebook (line-mode Python or a batch script), as the HTML can be written to files which can then be rendered by anything that supports HTML: just a browser, or any UI. Being able to know what output was created and how to access it is part of this too; for the analytic methods, for example, the output isn't simply one HTML document. They produce many different plots and graphs that can be individually accessed.

So, that's where my confusion comes in. saspy isn't in the business of creating or manipulating output to try to make it become various Jupyter-supported types.
Indeed this is an important question, and I've mentioned different things in different places. The answer is not a closed one, but there are two approaches I think:
The iPython documentation indicates the following as "typical":
Some of these do not make sense (JSON, for example), others are different image types,
That's clear, but as it's currently done, it passes
This would be the ideal, ambitious scenario. I would even drop several of them immediately if it helps: Excel, Doc, PPT, RTF, EPUB, these do not make much sense for this - but PostScript makes a lot of sense since it can be used instead of images to produce PDFs. We end up being back at:
This would be an amazing improvement - even half of them! Especially because I'm looking at a specific scenario, but having this expanded ability would almost surely allow other usages in the future.
This makes complete sense and it's the way every other kernel is done. I wouldn't change it as a default, the way I see it nothing should change in terms of how it works today for the Jupyter usage scenario.
Yes, and I have investigated ways of working with saved files, but it does require changes to the tools I'm using as examples because some output formats are not easily produced with a 100% HTML source format.
I haven't thought about this, tbh. This is a good point. I know that when using StatRep I explicitly select the output object that was produced in the previous step. I haven't run into a problem with my initial approach of getting an image out of the HTML, but that's because I have assumed I'm getting a single image inside the HTML.
You can easily try any of the ODS outputs and see if they get you what you want. When results='HTML', I submit ODS statements around the code being submitted. When results='TEXT', I don't. So you can code any ODS you want and see what you get by just doing the following. I tried it with PDF and couldn't get the PDF rendered, but again, I don't know anything about these formats with Jupyter, or about the packages that are needed to process and render any of that.
Searching for how to display this, I found and tried
but that didn't work, and I don't know why, and I didn't have enough time to get further on it. If any of these ODS outputs work as expected for the various cases, it wouldn't be hard to add support. But if they don't just work as needed and have to be manipulated to produce something that works, that's a different story. Give ODS a try and see if you can get the formats you need.
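As a rough illustration of "coding any ODS you want" around submitted code, here is a minimal sketch. The helper `wrap_with_ods`, the example PROC, and the path `/tmp/class.pdf` are hypothetical and not part of saspy; the path would have to be writable by the SAS session, which may be remote. Only the string building is executed here; the `submit` call is shown as a comment because it needs a live SAS connection.

```python
def wrap_with_ods(code, destination, path):
    """Surround SAS code with statements that open and close an ODS destination.

    Hypothetical helper for illustration only; saspy does not provide it.
    """
    return (
        f'ods {destination} file="{path}";\n'
        f"{code.rstrip()}\n"
        f"ods {destination} close;\n"
    )


# Example: route a PROC PRINT to the ODS PDF destination.
sas_code = wrap_with_ods(
    "proc print data=sashelp.class; run;", "pdf", "/tmp/class.pdf"
)
print(sas_code)

# With a live session (not created here), results="TEXT" keeps saspy from
# adding its own ODS statements around the submitted code:
#   import saspy
#   sas = saspy.SASsession()
#   result = sas.submit(sas_code, results="TEXT")
```

The PDF ends up on the SAS side, so getting it back to the client and rendering it is still a separate step.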
Is your feature request related to a problem? Please describe.
Jupyter notebooks support multiple output formats for cell execution; depending on what's desired, one or more different entries in the `data` section, identified by MIME type, will contain the output in the corresponding format.

This is useful in general, but becomes critical when using any document production system that uses Jupyter kernels as the engine to create different types of documents (i.e. most of the solutions that fall within the "literate programming" approach): they will pick HTML if the desired output is HTML, and pick e.g. a PNG representation of a plot if the output is a PDF.
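To make the selection step concrete, here is a minimal sketch of a cell output carrying several representations and of a consumer picking one per target. The payloads are placeholders rather than real SAS output, and the preference lists and the `pick_representation` helper are assumptions for illustration; real tools define their own ordering.

```python
# Hypothetical cell output as stored in an .ipynb file: a "data" dict keyed
# by MIME type. Payloads are placeholders, not real SAS output.
cell_output = {
    "output_type": "execute_result",
    "execution_count": 1,
    "metadata": {},
    "data": {
        "text/html": "<table><tr><td>42</td></tr></table>",
        "image/png": "iVBORw0KGgo=",  # base64 text; only a PNG signature here
        "text/plain": "42",
    },
}

# Assumed preference orders per target format.
PREFERENCES = {
    "html": ["text/html", "image/png", "text/plain"],
    "pdf": ["text/latex", "image/png", "text/plain"],
}


def pick_representation(output, target):
    """Return (mime_type, payload) for the first preferred MIME type present."""
    data = output.get("data", {})
    for mime in PREFERENCES[target]:
        if mime in data:
            return mime, data[mime]
    return None, None


html_choice = pick_representation(cell_output, "html")
pdf_choice = pick_representation(cell_output, "pdf")

# An output that only offers HTML leaves a PDF target with nothing usable.
html_only = {"data": {"text/html": "<svg></svg>"}}
pdf_from_html_only = pick_representation(html_only, "pdf")
```

With only a `text/html` entry, the PDF lookup comes back empty, which is the situation this issue is about.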
I will use Quarto as an example, but the idea is generic and applicable to other solutions. Consider the following source document:
This works without any change for HTML output, because `sas_kernel` uses `HTML(output)`, which automatically creates a `text/html` entry in the `outputs` array, and the HTML target makes use of HTML.

This doesn't work if we specify PDF as the output, because the toolchain (in the case of Quarto, using pandoc and LaTeX) will have no way to render the HTML, and there is no alternative representation:
The table works because Quarto has some automation that parses HTML tables and converts them to LaTeX, but it isn't able to convert the HTML plot into something that can be included in the PDF.
Quarto here is, and I must stress this, just an example: in general, the ability to have more outputs in the MIME bundle of the Jupyter cell outputs will be usable by any other tool.
Describe the solution you'd like
The solution I would like is different from the one I have prototyped: I think it would likely be better to change things at the SASpy level to make use of the extremely rich capabilities of ODS, allowing other output formats to be specified at that level, which would then be used in `sas_kernel`.

That said, I've quickly made something to show how this could work by making changes solely in `sas_kernel`, after studying the code and its use of MetaKernel: MetaKernel has some plumbing in place in the `_formatter` method to go through the methods of an object and create the necessary outputs. I've created a `SASOutput` class that implements `_repr_png_` and `_repr_latex_`.

(Consider this code an MVP and not something that I am proposing as a PR; this is to illustrate the possibilities more than anything.)
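For readers who want to see the general shape of the idea, here is a minimal, standalone sketch. It is not the code from the fork: the class name `HtmlWithPng` and the regular expression are hypothetical, and it assumes the plot is embedded in the HTML as a base64 `data:image/png` URI and that the first such image is the one wanted.

```python
import base64
import re

# Assumption: images are embedded in the HTML as base64 data URIs.
_PNG_DATA_URI = re.compile(r'src="data:image/png;base64,([A-Za-z0-9+/=\s]+)"')


class HtmlWithPng:
    """Hypothetical wrapper exposing HTML output plus a PNG representation."""

    def __init__(self, html):
        self.html = html

    def _repr_html_(self):
        # Existing behaviour: the HTML is passed through unchanged.
        return self.html

    def _repr_png_(self):
        # Raw PNG bytes of the first embedded image, or None when there is
        # none (a None result means "no such representation").
        match = _PNG_DATA_URI.search(self.html)
        if match is None:
            return None
        return base64.b64decode(match.group(1))


# Usage with a fake payload standing in for a real plot.
payload = base64.b64encode(b"\x89PNG\r\n\x1a\nfake").decode("ascii")
doc = HtmlWithPng('<p>x</p><img src="data:image/png;base64,' + payload + '">')
png_bytes = doc._repr_png_()
```

A display system that looks for `_repr_*_` methods can then offer both representations and let the consumer choose between them.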
This assumes that the input is HTML, which seems to always be the case in SASpy. With this, the previous PDF example works, because the `sas.ipynb` that is created by Quarto contains an `image/png` entry with the plot (it would also be output for "regular" Jupyter notebook usage, but Jupyter would prefer the HTML version).

The `.ipynb` will contain the different formats, so the existing behaviour (`text/html`) would be unchanged:

Describe alternatives you've considered
As mentioned, this approach makes use of how SASpy currently works, which seems to hardcode `ods html5` for non-text output (I could be completely wrong here; I'm basing my assertion on this documentation). The LaTeX parser above cuts out the returned LaTeX that is present in the middle of a lot of HTML code, for example. Making it possible to specify the desired output format through SASpy would likely be better, given some way of specifying the desired format. Currently, using things like `ods latex` in the cells will give the expected output in the middle of HTML code, but I haven't tested this extensively.

Ideally, we would also be able to pass additional formatting options down the line: things like the plot title and the image width are generally implemented in a consistent way in this sort of tool, so that the same syntax can be used. For Quarto, execution options support things like `fig-width` that are applied to R, Python, and Julia. This must be supported in Quarto itself, but it requires a way to pass that information.

Additional context
This is something that I've been using/following for a while and has several previous references, and there seems to be a growing interest.