It seems like that dude dug out one of his time-wasting projects and put some more work into it. This time in a public repo, even. And for some reason he wants to let the world know how and what he is doing. The journey starts here.
So, I am back at writing my Low Overhead Virtual Embedded Machine. From scratch. Everything I had was dumped (I still have the code, somewhere, but it is okay to start anew - I learned a lot during my previous attempts).
Why am I doing this? Well, that has a history. Basically, I am writing firmware at work for our IIoT devices. They are pretty versatile, so configuring them tends to be rather complicated. And still, I want them to be able to do more: react to situations depending on sensor data, prepare data read from sensors so that it can be transmitted with less overhead, etc. Right now, that would mean writing custom firmware for those customer cases (in C), deploying it to their devices - and maintaining those firmwares for years. And no one wants to pay for that! Nor do I care to do the maintaining.
What's the alternative? Add those features to the standard firmware and add more configuration features. Great. So it will be even more complicated. And every second time you have a new use case, you will find your current solution insufficient, so you need to modify your firmware again to include that one more special case. And make your config more powerful (please keep it backwards compatible while you're at it, thank you very much - remember, there are thousands of devices out there that still need to work with their configuration when the firmware update hits them).
And your config? You want to be able to react to triggers, and you want to react in any arbitrary way. And you want to be able to manipulate your data points in any way needed. So when you walk that road for some time, you will end up with a configuration that is basically a programming language, since that is the only thing powerful enough to do all that. And it will be a badly grown one, you can be sure about that!
So let's embrace that consequence and simply start with using a scripting language as a means of advanced configuration! We will end up there anyway, so let's cut some corners on the journey!
We are in need of a scripting language that runs on a very constrained device. Think of a microcontroller that has 352 kiB flash space and 191 kiB RAM for our complete firmware. And keep in mind that most of the behaviour of our device will not be implemented in the scripting language. There will be a number of hooks that give control to the user-supplied script, which will execute for a very short time, collect some data from sensors, act on it (maybe control actuators, but mostly generate data to be uploaded), and then return control to the firmware. And yeah, we will need to store the "script" somewhere on that device, so it would be great if it was not multiple kiB of program. I could use an SD-card in the device (so I guess I could store 1 TiB of script on the device if I needed to), but those are not that reliable, and they are an optional extension that could already have a different use.
There are many wheels out there, why oh why do you want to invent it again? Well, are there, though? Because that is what I thought. I started looking at what I know. Soo...
Lua is a tested scripting language to use in host languages like C. I first experimented with it when I was trying to write games for fun, back in the early 2000s. I was somewhat intrigued when I came upon it again some 10 years later while playing heavily modded Minecraft. In ComputerCraft you have a block that is a computer, which you can write programs for in Lua. It even has a little operating system where you store your files (edit them, horribly, in an editor on an in-game monitor), execute programs that you can store on in-game floppies to carry around. It was a horrible kind of fun to do just anything inside that world.
Lua was invented to solve a similar sounding problem: scripting in computer games. Level designers, story writers, etc. should not be bothered with having to write C code to achieve their tasks (and re-compiling during development is not the way). So yeah, that is, more or less, my problem. And you can even compile Lua to bytecode which is run in the interpreter. Neado!
But, oh, the interpreter... turns out, it is quite big! At least when you are working with embedded hardware. To quote the lua-users wiki:
Smaller footprint than Python. e.g. Look at the size of python22.dll, 824kb. A basic Lua engine, including parser/compiler/interpreter, but excluding standard libraries, weighs in at under 100kb.
That's fine and all, but still a bit much for me - to be fair, I would need neither parser nor compiler. Other sources give numbers like <300 kB - which is overkill. I did compile it for our architecture - and the VM alone, without any of our own code doing stuff, exceeded the flash size I had. This stackoverflow question quotes the eLua FAQ to recommend 256 kB flash and 64 kB RAM, which is too much for me - at the time of writing this, the eLua documentation seems partly offline, so that does not give me confidence either. A quote from an answer to that question:
I would recommend LUA (or eLUA http://www.eluaproject.net/ ). I've "ported" LUA to a Cortex-M3 a while back. From the top of my head it had a flash size of 60~100KB and needed about 20KB RAM to run. I did strip down to the bare essentials, but depending on your application, that might be enough. There's still room for optimization, especially about RAM requirements, but I doubt you can run it comfortable in 8KB.
Back then I found a post (which I cannot find again) that claimed you can get the footprint of the Java VM smaller than that of the Lua VM (if you cut the standard lib, which is part of Java and not of its VM). That sounds possible to me when you have a glimpse of how those languages work. But then again, you would not have any of the parts you are used to in Java. Also, there are some thoughts on how fitting that language is for my case; I'll have something about that later on.
So... to the JVM then? To be honest: I do not want to go there. It does not feel right! JVM does not mean Java, I know that. I could use the VM and create my own language that compiles to this highly optimised VM. I could use any of those many languages that already compile to Java bytecode. And yes, JVM does not equal Oracle; there are free open JVM implementations out there. I admit I did not try to find out how small that VM would be. But it just feels so wrong on so many levels. I simply cannot imagine the JVM is the tool for the task. As I teased for Lua before, more thoughts on this later.
I did not even try to find a solution for running JavaScript on the device. I am sure there are some. But there are also reasons against using this language. Once again, more on that later, when I reflect more on my use-case.
I do like Python. But it is pretty big. There are some abandoned projects like tinypy, which looks dead. And there is, of course, MicroPython.
MicroPython is packed full of advanced features such as an interactive prompt, arbitrary precision integers, closures, list comprehension, generators, exception handling and more. Yet it is compact enough to fit and run within just 256k of code space and 16k of RAM.
That 256k is a pretty big "just" for my liking. It is meant for the pyboard, which has an STM with 1024 kiB flash ROM and 192 kiB RAM. And that device will not have a main firmware "next to it". So again, not really my use-case.
I googled. I looked at quite a few of them. It never feels close to what I want. First of all, I found that "embedded scripting" is a term that most of the time does not mean "embedded" as in "embedded device". That's because it is the scripting language itself that is embedded in the host language (be it C, Java, Rust, or whatever). Lua is a prime example of that terminology problem. So what I am really looking for is an "embedded embedded scripting language". Good luck googling that!
There are projects that try to be what I am looking for. Few such projects seem to be in a state where I would be willing to use them in a commercial product. Think long-term maintainability here.
And, again, they often do not aim at my problem very well. They want some ease of usage, which is fine, but they tend to have a too-high-level approach for my liking. Yes, I will start to talk about what I mean, soon.
Maybe I should have taken a closer look at languages like Neko. But the first impression hinted at many of the problems I am trying to describe here.
No language was sticking out. I did not spend much time on any other language.
So, languages are never a good fit for what I want. They are hard to integrate into my existing system. They are too big. They are often not well maintained.
Is this already the end of my journey? It does not have to be. But it will be a very different journey, if I proceed.
This is why every existing scripting language is objectively bad! Sorry, I wanted to say: This is my problem and languages do not seem to be designed for it.
I was mentioning it, was I not? Languages do not seem to fit very well to my problem. What do I mean by that?
I am doing very low level stuff. I am pushing bytes, often even bits, around. Imagine receiving a bunch of raw bytes from a sensor attached via UART. You dump them in a buffer. The task for the script is now to parse a few specific bytes out of that buffer and make sense of them. Some are uint16 integers in little endian. Others are int32, spread over two uint16 BE registers that are not next to each other, and you need to combine the two uint16 BE values in LE order to get your value. This scenario is fictional, but much more likely than you would expect.
All this sounds horrible, and it is sometimes tricky, but of course you can do all this in any language that gives you access to bytes in any way. If you ever worked with LoRaWAN, you might have had to do such things in your network server (e.g. TTN), to parse your uploaded data from bytes into, say, JSON. On many network servers you can do so with your own scripts (hey, that's close to what I want to do). And they give you the language suited best for this kind of problem: JavaScript.
No, really. You are doing bit-manipulation on your bytes in a language where every number is stored as a float. You push your data around in JSON, a format that does not support byte arrays, so you have to communicate your bytes encoded in base64 or hex and store those inside strings. And you hope that the receiving end is able to decide if the data should be interpreted as a string or as hex or as base64 (and for hex strings, all of that can be possible at the same time).
That is a problem I have with most scripting languages I have encountered. You get a giant infrastructure supporting classes with multiple inheritance and polymorphism. You get on-the-fly code interpretation. You get asynchronous execution support, dynamic typing, garbage collection, and whatnot.
And I want to write a function that is called when needed and gets handed a few bytes. I want it to extract a few of those bytes, interpret them as a number, compare that number to a threshold, and, if the value exceeds said threshold, call a different function with a few bytes that are then sent back over some peripheral (but that is not for the script language to control; it just passes them to the system).
Those languages tend to have a huge set of features that I do not need (or even do not want to have), while lacking many features that would be useful to me. So all those features would have to be implemented by me somehow, anyway.
You see now, why I cannot find any language that I like?
Okay, okay. Let's say you bought my argumentation. Go ahead, hack together some scripting, knock yourself out. Just parse it in your firmware and execute it.
Yeah, I could do that. Simple syntax. Parse it on the fly. Store variables in some hashmap, execute functions by name, have them call functions supplied by the firmware to interact with the hardware. And you can just throw those scripts into your git repo. Easy peasy. Only it wouldn't be. That language would grow oh ever so horribly. And it would never be good. Ever tried to parse an expression like f = 3 * a + (b + 2) * 1.2? In C? And that expression is not even too complex. There would be so many parsing errors that only happen at runtime (on the device, remote, in the field, without any logging, let alone debugging). Besides: I do not want the complicated (read: big) parsing code on all of my devices. That is space (and execution time, which translates to power usage) spent on work that could be done once, on a more powerful device that I can monitor directly (that is: my laptop). Also: source code is long! I would need to store that on my device somewhere. And trying to write source code extra short makes it even worse.
So what is the solution here? We need a virtual machine that executes programs precompiled into bytecode. And we want that VM to be lightweight. If you design it carefully, a VM can be pretty small. What often bloats things up is the standard library, with all the tools you need to efficiently write programs. But I do have a mighty host language (C, mostly) that already has a huge library of functions ready to be used (and which are often used already and hence already inside my firmware). I only need to provide a wrapper that exposes them to my VM, and I can have them all: sine/cosine, logarithms, AES encryption, Ethernet. You name it, we got it (well, most of it... be sensible... we should at least be able to find an implementation somewhere...).
And the best part? I postpone the pain of having to design the language. If you have a solid VM that supports the operations you need to get your work done nicely, you can pretty much design a language any way you want. You just need a bytecode compiler. You can even have multiple languages, in case you have too much time on your hands. But more importantly: you can develop the language without needing to change your VM (if you know what you are doing and if you plan well enough). That means: no need to update the firmware on your devices every time the language advances. As long as your bytecode stays compatible.
Is it realistic to finish this project, maybe even, to build something good?
I highly doubt it. This is a huge project, if I make it all I want it to be. But at least I have learned quite a lot on the way so far. Why do you think I threw everything away (for the second time) and started on an empty board?
\ No newline at end of file
diff --git a/2022-06/NAV.html b/2022-06/NAV.html
new file mode 100644
index 0000000..d140e30
--- /dev/null
+++ b/2022-06/NAV.html
@@ -0,0 +1 @@
+ NAV - Lovem
\ No newline at end of file
diff --git a/2022-06/index.html b/2022-06/index.html
new file mode 100644
index 0000000..fb6922a
--- /dev/null
+++ b/2022-06/index.html
@@ -0,0 +1 @@
+ Journal entries from June 2022 - Lovem
Okay, okay. Let's say you bought my argumentation. Go ahead, hack together some scripting, knock yourself out. Just parse it in your firmware and execute it.
This is why every existing scripting language is objectively bad! Sorry, I wanted to say: This is my problem and languages do not seem to be designed for it.
I was mentioning it, was I not? Languages do not seem to fit very well on by problem. What do I mean by that?
There are many wheels out there, why o why do you want to invent it again? Well, are there, though? Because that is what I thought. I started looking at what I know. Soo...
[Lua][lua] is a tested scripting language to use in host languages like C. I first experimented with it when I was trying to write games for fun, back in the early 2000s. I was somewhat intrigued when I came upon it again some 10 years later while playing heavily modded Minecraft. In [ComputerCraft][computercraft] you have a block that is a computer, which you can write programs for in Lua. It even has a little operating system where you store your files (edit them, horribly, in an editor on an in-game monitor), execute programs that you can store on in-game floppies to carry around. It was a horrible kind of fun to do just anything inside that world.
It seems like that dude dug out one of his time-wasting projects and put some more work into it. This time in a public repo even. And for some reason he wants to let the world know, how and what he is doing. The journey starts here.
So, I am back at writing my Low Overhead Virtual Embedded Machine. From scratch. Everything I had was dumped (have the code still, somewhere, but it is okay to start anew - I learned during my previous attempts).
\ No newline at end of file
diff --git a/2022-06/lovem-again.html b/2022-06/lovem-again.html
new file mode 100644
index 0000000..0579003
--- /dev/null
+++ b/2022-06/lovem-again.html
@@ -0,0 +1,2 @@
+ Lovem again! - Lovem
It seems like that dude dug out one of his time-wasting projects and put some more work into it. This time in a public repo even. And for some reason he wants to let the world know, how and what he is doing. The journey starts here.
So, I am back at writing my Low Overhead Virtual Embedded Machine. From scratch. Everything I had was dumped (have the code still, somewhere, but it is okay to start anew - I learned during my previous attempts).
Why am I doing this? Well, that has a history. Basically, I am writing firmware at work for our IIoT devices. They are pretty versatile, so configuring them tends to be rather complicated. And still, I want them to be able to do more: react to situations depending on sensor data, prepare data read from sensors, so that it transmitted with less overhead, etc. Right now, that would mean writing custom firmware for those customer cases (in C) and deploy it for their devices - and maintain those firmwares over years. And no one wants to pay for that! Nor do I care to do the maintaining.
What's the alternative? Add those features to the standard firmware and add more configuration features. Great. So it will be even more complicated. And every second time you have a new use case, you will find your current solution insufficient, so you need to modify your firmware again to include that one more special case. And make your config more powerful (please keep it backwards compatible, while you at it, thank you very much - remember, there are thousands of devices out there, that still need to work with their configuration, when the firmware update hits them).
And your config? You want to be able to react to triggers, and you want to do react in any random way. And you want to be able to manipulate your data points in any way needed. So when you walk that road for some time, you will end up with a configuration that is basically a programming language, since that is the only thing powerful enough, to do all that. And it will be a badly grown one, you can be sure about that!
So let's embrace that consequence, and simply start with using a scripting language as means for advanced configuration! We will end there, cut some corners on the journey!
We are in need of a scripting language that runs on a very constrained device. Think of a microcontroller that has 352 kiB flash space and 191 kiB RAM for our complete firmware. And keep in mind that most of the behaviour of our device will not be implemented in the scripting language. There will be a number of hooks that should give control to the user supplied script, which will execute for a very short time, collect some data from sensors, act on them (maybe control actuators, but mostly generate data to be uploaded), and then return control to the firmware. And yeah, we will need to store the "script" somewhere on that device, so it would be great if it was not multiple kiB of program. I could use an SD-card in the dive (so I guess could store 1 TiB of script on the device if I needed), but those are not that reliable and are an optional extension that could already have a different use.
\ No newline at end of file
diff --git a/2022-06/script-or-virtual.html b/2022-06/script-or-virtual.html
new file mode 100644
index 0000000..1ed09cd
--- /dev/null
+++ b/2022-06/script-or-virtual.html
@@ -0,0 +1,2 @@
+ Script or virtual - Lovem
Okay, okay. Let's say you bought my argumentation. Go ahead, hack together some scripting, knock yourself out. Just parse it in your firmware and execute it.
Yeah, I could do that. Simple syntax. Parse it on the fly. Store variables in some hashmap, execute functions by name, have them call functions supplied by the firmware to interact with the hardware. And you can just throw those scripts into your git repo. Easy peasy. Only it wouldn't. But that language would grow oh ever so horribly. And it would never be good. Ever tried to parse an expression like f = 3 * a + (b + 2) * 1.2. In C? And that expression is not too complex even. There would be so many parsing errors that only happen at runtime (on the device, remote, in the field, without any logging, let alone debugging). Besides: I do not want the complicated (read: big) parsing code on all of my devices. That is space (and execution time, which translates to power usage) that could be done once, on a more powerful device that I can monitor directly (that is: my laptop). Also: source code is long! I will need to store that on my device somewhere. And trying to write source code extra short makes it even worse.
So what is the solution here? We need a virtual machine that executes programs precompiled into bytecode. And we want that VM to be lightweight. If you design it carefully, a VM can be pretty small. What bloats things up often is the standard library with all the tools you need to efficiently write programs. But I do have a mighty host language (C, mostly), that already has a huge library of functions ready to be used (and which are often used already and henceforth already inside my firmware). I only need to provide a wrapper, that exposes them to my VM, and I can have them all: sinus/cosinus, logarithms, AES-encryption, Ethernet. You name it, we got it (well, most of it... be sensible.. we should at least be able to find an implementation somewhere...).
And the best part? I postpone the pain of having to design the language. If you have a solid VM that supports the operations you need to get your work done nicely, you can pretty much design a language any way you want. You just need a bytecode compiler. You can even have multiple languages, in case you have too much time on your hands. But more important: you can develop the language without needing to change your VM (if you know what you do and if you plan well enough). That means: no need to update the firmware on your devices everytime the language advances. As long as your bytecode stays compatible.
Is it realistic to finish this project, maybe even, to build something good?
I highly doubt it. This is a huge project, if I make it all I want it to be. But at least I have learned quite a lot on the way so far. Why do you think I threw everything away (for the second time) and started on an empty board?
\ No newline at end of file
diff --git a/2022-06/that-use-case.html b/2022-06/that-use-case.html
new file mode 100644
index 0000000..c54aad0
--- /dev/null
+++ b/2022-06/that-use-case.html
@@ -0,0 +1,2 @@
+ That use-case I was talking about - Lovem
This is why every existing scripting language is objectively bad! Sorry, I wanted to say: This is my problem and languages do not seem to be designed for it.
I was mentioning it, was I not? Languages do not seem to fit very well on by problem. What do I mean by that?
I am doing very low level stuff. I am pushing bytes, often even bits around. Imagine receiving a bunch of raw bytes from a sensor attached via UART. You dump them in a buffer. The task for the script is now, to parse a few specific bytes out of that buffer, and make sense of them. Some are uint16 integers in little endian. Others are int32, spread over two uint16 BE registers, that are not next to each other, and you need to combine the two uint16 BE values in LE order to get your value. This scenario is fictional, but much more likely, than you would expect.
All this sound horrible, and it is sometimes tricky, but of course you can do all this in any language that gives you access to bytes in any way. If you ever worked with LoRaWAN, you might have had to do such things in your network server (e.g. TTN), to parse your uploaded data from bytes into, say, JSON. On many network servers you can do so with your own scripts (hey, that's close to what I want to do). And they give you the language suited best for this kind of problems: JavaScript.
No, really. You are doing bit-manipulation on your bytes in a language where every number is stored as a float. You push your data around in JSON, a format that does not support byte arrays, so you have to communicate your bytes encoded in base64 or hex and store those inside strings. And you hope that the receiving end is able to decide if the date should be interpreted as a string or as hex or as base64 (and for hex strings, all of that can be possible at the same time).
That is a problem, that I have with most scripting languages that I encountered. You get a giant infrastructure supporting classes with multiple inheritance support and polymorphism. You get on-the-go code interpreting. You get asynchronous execution support, dynamical typing, garbage collection, and whatnot.
And I want to write a function, that is called when needed, and gets handed a few bytes. I want it to extract a few of those bytes, interpret them as a number, compare that number to a threshold, and if the value exceeds said threshold, call a different function with a few bytes, that are then send back over some peripheral (but that is not for the script language to control, just pass them to the system).
Those languages tend to have a huge set of features that I do not need (or even to not want to have), while lacking many features that would be useful to me. So all that features would have to be implemented by me somehow, anyway.
You see now, why I cannot find any language that I like?
\ No newline at end of file
diff --git a/2022-06/we-need-another-wheel.html b/2022-06/we-need-another-wheel.html
new file mode 100644
index 0000000..5da8ca3
--- /dev/null
+++ b/2022-06/we-need-another-wheel.html
@@ -0,0 +1,2 @@
+ We need another wheel - Lovem
There are many wheels out there, why o why do you want to invent it again? Well, are there, though? Because that is what I thought. I started looking at what I know. Soo...
Lua is a tested scripting language to use in host languages like C. I first experimented with it when I was trying to write games for fun, back in the early 2000s. I was somewhat intrigued when I came upon it again some 10 years later while playing heavily modded Minecraft. In ComputerCraft you have a block that is a computer, which you can write programs for in Lua. It even has a little operating system where you store your files (edit them, horribly, in an editor on an in-game monitor), execute programs that you can store on in-game floppies to carry around. It was a horrible kind of fun to do just anything inside that world.
Lua was invented to solve a similar sounding problem: scripting in computer games. Level designers, story writers, etc. should not be bothered with having to write C-code to achieve their tasks (and re-compiling during developing those is not the way). So yeah, that is, more or less, my problem. And you can even compile Lua to byte code which is run in the interpreter. Neado!
But, oh, the interpreter... turn's out, it is quite big! At least when you are working with embedded hardware. To quote lua users:
Smaller footprint than Python. e.g. Look at the size of python22.dll, 824kb. A basic Lua engine, including parser/compiler/interpreter, but excluding standard libraries, weighs in at under 100kb.
That's fine and all, but still a bit much for me - to be fair, I would need neither parser nor compiler. Other sources give numbers like <300 kB - which is overkill. I did compile it for our architecture - and the VM alone, without any of our own code doing stuff, exceeded the flash size I had. This stackoverflow question quotes the eLua FAQ to recommend 256 kB flash and 64k kB RAM which is too much for me - at time of writing this, eLua documentation seems offline in parts, so that does not give me confidence either. Quote from an answer to that question:
I would recommend LUA (or eLUA http://www.eluaproject.net/ ). I've "ported" LUA to a Cortex-M3 a while back. From the top of my head it had a flash size of 60~100KB and needed about 20KB RAM to run. I did strip down to the bare essentials, but depending on your application, that might be enough. There's still room for optimization, especially about RAM requirements, but I doubt you can run it comfortable in 8KB.
Back then I found a post I cannot find again that claimed, you can get the footprint of the Java VM smaller than that of the Lua VM (if you cut standard lib, which is part of Java and not of its VM). That sounds possible to me, when you have a glimpse on how those languages work. But then again you would not have any of the parts you are used to in Java. Also, there are some thoughts on how fitting that language is for my case, I'll have something about that later on.
So... to the JVM then? To be honest: I do not want to go there. It does not feel right! JVM does not mean Java, I know that. I could use the VM and create my own language that compiles to this highly optimised VM. I could use any of those many languages that already compile to Java bytecode. And yes, JVM does not equal Oracle; there are free open JVM implementations out there. I admit I did not try to find out how small that VM would be. But it just feels so wrong on so many levels. I simply cannot imagine JVM is the tool for the task. As I teasered for Lua before, more thoughts on this later.
I did not even try to find a solution for running JavaScript on the device. I am sure there are some. But so there are reasons against using this language. Once again, more on that later, when I reflect more on my use-case.
I do like Python. But it is pretty big. There are some broken projects like tinypy. That looks dead. And there is, of course MicroPython.
MicroPython is packed full of advanced features such as an interactive prompt, arbitrary precision integers, closures, list comprehension, generators, exception handling and more. Yet it is compact enough to fit and run within just 256k of code space and 16k of RAM.
That 256k is a pretty big "just" for my liking. It is meant for the pyboard, having an STM with 1024 KiB flash ROM and 192 KiB RAM. And that device will not have a main firmware "next to it". So again, not really my use-case.
I googled. I looked at quite a few of them. It never feels close to what I want. First of all I found, that "embedded scripting" is a term that most of the time is not meant as in "embedded device". That's because the scripting language itself is what is embedded in the host language (be it C, Java, Rust, or whatever). Lua is a prime example on that terminology problem. So what I am really looking for is an "embedded embedded scripting language". Good luck on googling that!
There are projects that try to be what I am looking for. Few such projects seem to be in a state that I would by willing to use them in a commercial product. Think long term maintainability here.
And, again, they often do not aim at my problem very well. They want some ease of usage, which is fine, but they tend to have a too-high-level approach for my linking. Yes, I will start to talk about what I mean, soon.
Maybe I should have taken a closer look at languages like Neko. But the first impression was hinting at many of the problems I try to describe here.
No language was sticking out. I did not spend much time on any other language.
So, languages are never a good fit for what I want. They are hard to integrate into my existing system. They are too big. They are often not well maintained.
Is this already the end of my journey? It does not have to be. But it will be a very different journey, if I proceed.
July 2022 complete - Lovem
Since I am always focused on my work on lovem, I will never get sidetracked. Unrelated: I spent a few days on reworking the journal on this site.
So, no update on the core project today, sorry. I was very unhappy with my first solution for how the Journal entries were created. Way too much to do by hand – that is not what I learned programming for. But mkdocs is Python, and Python I can do. So I did. And now I can write my Journal entries (like this one) as plain Markdown files with very few metadata entries. And I get entries in the navigation and pages listing the whole month. I even included a whole-month, single-page version of the journal. I feel it is quite fancy. I will need to do a bit of work on the static content of the site, but one step at a time.
I want to write my Journal entries (aka blog posts) as nice standalone markdown files, one file per entry. I will need to include a bit of metadata, at least the release date/time. And I want the entries to look fancy without adding the fanciness to each file. Maybe I will be changing the layout later, hmm? And create those teaser pages for me, thank you very much.
I use a plugin called mkdocs-gen-files, by @oprypin, which creates additional mkdocs source files on the fly. It does not really put the files on disk, but they are parsed by mkdocs as if they were in the docs directory.
I have a directory journal next to my docs directory, where I put all my posts, in a single markdown file each. My script walks through that directory and processes each file. The content is modified a bit (to put in the card with the author's name and other metadata), and then put in a virtual file inside docs, so that the pages with the entries are created by mkdocs, as if I had them inside docs.
The script also generates two pages for each month: one that shows that month's posts as teasers, with a "continue reading" link, and a second one that shows all posts from a month on a single page, so that you can read them without changing pages all the time.
The remaining part is adding all the pages that the script creates to the navigation in a way that makes sense. The order is a critical part, being a central aspect of a journal or a log. For that I use another plugin by @oprypin: mkdocs-literate-nav. With it, you can control your navigation (completely or in parts) by adding markdown source files with lists of links. This goes together well with the gen-files plugin, because I can just create those navigation files with it in my script.
The plugins are a bit light on the documentation side. It took me a while to understand that you cannot do multiple layers of nested navigation in those files. That is not a problem, because you can always just add another nesting layer by adding more of those nav files as children. Also, what you can do in those files is very limited. I wanted to do some fancy things in the navigation (adding a second link in a single line with an alternative representation). I would guess that those limitations come from the way mkdocs itself handles the navigation, so that is okay. But a word on that would have been nice. And the error messages popping up did not help at all, because the actual error happens way later in the process, inside mkdocs itself, and is some weird side-effect problem.
If you want to take a look, see blogem.py. That will be the script in its current state. For the version of the script at the time of writing, see the permalink, the original blogem.py.
Reality strikes again, and code will be written from scratch once more. And the reason is this site.
You want me to get to the code. And I really should. I have written so much already, and I want to show it, but there is so much around it. And after I had written up a long text on how I started, I realised that I had no commits from the early stages. So I had to write it all again, slower, and with code that can be presented in this journal.
If you are reading this live (and no-one is, because I did not even tell anyone I am doing this), you can of course look at the code I was writing earlier, it exists. I put it in a branch too-early. But I will not give explanations for that. I am rewriting it on the master branch, and that is what will be shown and discussed in the journal. I advise you to wait for that.
Yes, it will take a while. As it looks now, it will be slow. But I have written some new posts on the new code already, and I think it is worth it. There will be more background before we get there. Next entry will be a longer one, so there is that.
So, how do you build a virtual machine? There are actually two quite different approaches:
Register Machine vs. Stack Machine
Let's take a look at those concepts first. This will be very brief and basic. You can, of course, also have some combination of those concepts, and not everything I say here is true for every implementation of virtual machine, but it will be close enough for this article.
Most physical computers are register machines. At least those you will be thinking of. You are most likely using one right now to read this article. Virtual register machines use the same concepts, but not in physical hardware; instead, they live inside another computer as software. This allows them to do some things a bit more flexibly than a real hardware machine could.
A register is nothing more than a dedicated place to store a portion of data where it can be accessed for direct manipulation. Registers are more or less variables of the machine's basic data type that have a fixed address, and that can be accessed and manipulated directly by the processing unit. Register machines use those to actually compute and change data. All other storage places are only that: places where data is put when it is not needed at the moment. Register machines have a multitude of registers, from very few (maybe 4 or 8 in simplistic designs) to hundreds or more in modern computers. The size of the registers often gives the architecture its name. E.g. in the x86-64 architecture, which most current CPUs by Intel and AMD implement, a register is 64 bits long.
The instructions for a register machine are encoded in code words. A code word is a bunch of bytes that tell the machine what to do in the next program step. For simple designs, code words are of a fixed length. This code word length is often longer than the register size. So a 16 bit architecture could have 32 bit instructions. The reason for this is that instructions consist of an operation code that defines what operation should be executed in the next step, but they also contain the arguments passed to that operation. Because the number and size of arguments needed differ between operations, decoding the instruction can be quite complicated. When you put multiple instructions together, you end up with a program. This representation of a computer program is called machine code. For a virtual machine it is also called bytecode, although I think this term fits better for stack machines (more on that later).
If you want to understand what I tried to describe here, read this really short article: Creating a Virtual Machine/Register VM in C. It builds a simplistic register VM in C (the whole thing is 87 lines long). It demonstrates the principles used in a register machine (fetch, decode, execute), and shows you what a register is and how it is used. You will understand how machine code is decoded and executed. The article only uses 16 bit code words and 16 bit data words (register size). If you know C, you should be able to understand what I am talking about in about an hour of reading and coding. If you ever wanted to understand how a computer works on the inside, this might be a nice place to start, before you read about an actual physical computer.
A register machine normally has multiple stacks it uses. This does not make it a stack machine, those are just needed to store data when it is not currently used.
So typical operations would be:

* "Take the number from register 0, take the number from register 1, add those two numbers together, write the result in register 0."
* "Take the lower 16 bits of this instruction and write them in register 2."
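The two example operations can be sketched in Rust. This is a toy illustration only, not lovem code; the register count, word sizes, and instruction layout are made up for the example:

```rust
fn main() {
    // A toy register machine: four 16-bit registers.
    let mut regs: [u16; 4] = [5, 23, 0, 0];

    // "Take the number from register 0, take the number from register 1,
    //  add those two numbers together, write the result in register 0."
    regs[0] = regs[0].wrapping_add(regs[1]);

    // "Take the lower 16 bits of this instruction and write them in register 2."
    // Here the upper bits would hold the opcode, the lower 16 bits an argument.
    let instruction: u32 = 0x1234_0064;
    regs[2] = (instruction & 0xffff) as u16;

    println!("{:?}", regs); // register 0 now holds 28, register 2 holds 100
}
```

Note how the second operation carries its argument inside the instruction word itself, which is exactly why code words tend to be longer than registers.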
Lua and Neko are virtual register machines (at least in current versions).
And then there are Stack Machines. They are, I think, easier to understand than register machines, but following a program during execution is more confusing, since the manipulated data is more complicated to follow.
A stack is just a pile of data. Data is portioned in fixed sizes, a portion is called a word. All you can normally do is put a word on top of the stack - we will call that operation a push, or you can take the word that is currently on top of the stack (if there is one) - we will call that a pop. No other direct manipulations of the stack are allowed (I say "direct manipulations", because indirectly there often are ways that this is done, but that is a detail for later).
Manipulation of data is done this way by the machine. If you want to add two numbers, say 5 and 23, you would write a program that does this:
Push the first number to the stack.
Push the second number to the stack.
Execute the "ADD" operation.
That operation will pop the two numbers from the stack, add them, and push their sum back on the stack (so that after the operation there will be one word less on the stack).
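The three steps above can be sketched with a plain Rust Vec standing in for the stack (an illustration only, not the actual VM code):

```rust
fn main() {
    // A Vec serves as the stack; i64 is our word type here.
    let mut stack: Vec<i64> = Vec::new();

    stack.push(5);  // Push the first number to the stack.
    stack.push(23); // Push the second number to the stack.

    // The "ADD" operation: pop two words, push their sum back.
    let b = stack.pop().unwrap();
    let a = stack.pop().unwrap();
    stack.push(a + b);

    // After ADD there is one word less on the stack than before it.
    println!("{:?}", stack); // the stack now holds just the sum, 28
}
```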
A stack machine will also typically have some additional place to store words when you do not need them on the stack. These places can relate to variables inside a program.
As you can see from the example above, instructions in a stack machine often do not need to have arguments. If data is to be manipulated, it is always on top of the stack. There is no need to address its location, as you would do in a register machine.
Because of this, the instructions for a stack machine are typically encoded in a single byte. This byte holds a number we will call opcode (short for operation code), that simply identifies the operation to execute. If your operation does need additional arguments, you write them to the bytes following your opcode byte (the oparg), so that the operation can read them from your program. This structure of single bytes encoding our program is why we call this representation bytecode.
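As a sketch, the little add program from above could be encoded like this (the opcode values here are just picked for this illustration):

```rust
// Hypothetical opcode values, chosen for this sketch only:
const PUSH_U8: u8 = 0x02; // opcode followed by one oparg byte
const ADD: u8 = 0x10;     // opcode with no oparg

fn main() {
    // "push 5, push 23, add" as bytecode - five single bytes:
    let pgm: [u8; 5] = [PUSH_U8, 5, PUSH_U8, 23, ADD];
    println!("program size: {} bytes", pgm.len());
}
```

Compare that to fixed 32 bit instruction words on a register machine: here each instruction costs a single byte, plus one oparg byte per push.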
The concept of a stack machine is easy to implement in software, but it is not so easy to do so in hardware. That is why your typical computer is a register machine. There are, however, a lot of historical examples of important physical stack machines.
The most famous example of a virtual stack machine is the Java VM. Java source code is compiled to bytecode that is executed inside a virtual machine, the JVM. This VM is so common that many newer programming languages compile to Java bytecode. It makes it possible to run programs written in those languages on any system that has a JVM; and that includes just about every major and many minor computer systems. A second example for a virtual stack machine is the Python VM.
Some random thoughts on register and stack machines
While writing this down, describing the two kinds of machines, I couldn't help but notice a curious fact:

* A register machine manipulates data inside addressable registers. When the data is not needed, it can be stored away in some kind of stack.
* A stack machine manipulates data inside a stack. When the data is not needed, it can be stored away in some kind of addressable spaces, not unlike registers.
It looks as if you just need both concepts to work efficiently.
So I have been talking a lot about VMs without doing anything concrete. Well that is not true, I have done quite a bit already, but I am still describing earlier steps. We will get there.
When I was looking around for a scripting language to use inside our embedded devices, I came across an article I mentioned in an earlier post: Creating a Virtual Machine/Register VM in C.
Reading it made me want to try working with a register machine, mainly because I have not done stuff like this since my early semesters. Never hurts to refresh rusty knowledge.
So I started designing a register VM, starting from that code, but more complex, with longer data words and longer instruction words, more registers, and so forth. For this project I came up with lovem as a working title. It has stuck until now, two approaches and a year later. I also started implementing some concepts I still want to add to lovem in my current approach, but that is for a later post to discuss.
I was experimenting with a quite complicated instruction word encoding. I was trying to fit everything in a few bits (32 of them, if I recall correctly) with varying instruction code length and quite long arguments. I wanted to include instructions on three registers, which takes up quite some bits to address. Of course, you can get away with two-register operations only - or if you are fancy you can even use a single address or even no address for most instructions. You will just end up with a lot of register swapping. I guess my rationale for having three addresses in an instruction was code size. For what I want to do, 32 bit instruction words feel quite long (4 bytes per instruction!). And every swap would mean another 4 bytes of program size. So I was trying to optimise for fewer operations by having more flexible instructions.
I do not even know if that rationale makes sense. I guess I would have needed to try different layouts to find out. Or maybe read more about the topic; other people have done similar things, I assume. But I never got that far. The experiment showed me that I do not want to build lovem as a register machine. I think building a clever register-based architecture for my goals would make it too complicated. I want simple. To reduce the VM's overhead, but also on principle. Complexity is the enemy.
I'm pretty sure that code still exists somewhere, but there is no sense in publishing it or even in me reading it again, so you will never see it. I think of it as a pre-study with a very useful conclusion: not a register machine.
So a stack machine it is! I have looked at a few during my research for lovem, looking at instruction sets and design ideas. It is not the first time I have been working with those. In a different project (around the same time I started work on the register-based machine), I was starting to implement a stack machine. That one had a different aim and therefore very different challenges. It was more of an object-oriented approach with dynamic program loading and calling code in different programs. It could do quite a few things already, but it will never be continued. I learned a bit about calling conventions and found out that it is not so simple when you want to switch between multiple programs and objects. That is where the project got too frustrating for me (and some external events made it obsolete, so that is okay). But I take it for a pre-study on stack machines and calling conventions. Not that I have developed a proven concept for it, but I know about the problems there...
I had a PoC for lovem as a stack machine back then, too (right after I ditched the register approach). That code won't be published either, but the attempt showed me, that I want to take that road for a serious approach on creating lovem.
I guess this concludes the prehistory of the lovem story. I am, for whatever reason, back on the project, currently with a decent amount of motivation. You never know how long that lasts, but right now I like the idea of continuing the development, while talking about the development process, sharing my thoughts on decisions I make. Next post should start on sharing newer thoughts.
Finally, I will be showing some source code. Not directly in the journal, but I will link you to GitHub, for a start.
I have written code. And this time, I (re-)started lovem in a public git repository, so you can see what I do, if you are interested. And I hope it puts enough pressure on me to stay on the project for a while.
In fact, there is quite a bit of code there already. I started coding, before writing any of this, and it went so well. I like how it feels. I was working any hour I could spare. When a friend asked me what I was doing, I started a somewhat complex backstory why I was doing it, instead of actually explaining anything of the stuff I was doing – and was interrupted quite early, so there was more to tell in me still. The next day, I sat down and started to write all of that down as a little story. I wanted to put it somewhere, so I started this journal to publish it. And I decided to do it in blog form, so I am publishing that background story bit by bit.
So, as of writing this, there is a lot of work completed on the VM. It is amazing what it can do for how little code there is. When this post goes public, there should be quite a lot more done...
I plan to continue sharing my thoughts while I work on the VM. So you will be able to follow my failures and see the attempts that I will be ditching later. I think the format of this journal can work out, but we will see how I like it over time. It will be behind on progress, as I want to take time to share things as they unfold. And this should help to produce a somewhat continuous publication stream. Git being what git is, it should support me in showing you the things I did back in time, using the power of commits.
As things are with blogs, my entries will be very different, depending on what I want to tell and on what I did. So far most posts were conceptual thinking, some research, and a lot of blabla, which I tell because it interests me myself. In the future, there should be concrete problems I find and solve in source code - or fail to solve.
My original first commit was way too late and contained way too much code. Also, I did not plan to show it to you like this, back then. So, as mentioned before, I rolled back and started again, with more commits. And I am keeping tags now, so that I have well-defined versions for my blog posts. That should make it easy for you to follow up, if you want to.
The new, artificial "first commit" is now a tag/release: v0.0.1-journey. You can view the code for any tag online, this one you will find under:
I think this will be a theme of this journal: linking you to what I did, when I am writing about it. And I will try to share my trails of thought leading to my decisions (and errors, as there will be). I will do that for v0.0.1-journey soon, don't worry; I will explain everything I did. But the next journal entry will be about some decisions again; mainly about the language I am using.
It is not the original initial commit, as I did commit way too late, and it was not suitable for writing a story about it. So I created a new, clean version, with just very simple concepts that I can explain in a single entry. In the next entry, that is.
If you are thinking: "What is that weird source code?", then you are in for a real treat (and a lot of pain), should you choose to follow up. The code you are seeing is written in Rust.
Why Rust? Because Rust! Writing Rust can feel so good! And for something like a VM, it is such a good choice. If you have never heard of the language (or heard of it, but never looked into it), it is hard to understand why that is. My advice: try it! Use it! Or read along with this journal, code along; you might like it.
When you start, chances are high that you will not like Rust. The compiler is a pedantic pain in the ass. But at the same time it is incredibly polite, trying to help you find out what you did wrong, and suggesting what you might want to do instead. And Rust really, really tries to keep you from shooting yourself in the foot. It tries to make common mistakes impossible, or at least hard to do – those mistakes that happen everywhere in C/C++ programs and their like. Yes, those mistakes that are the cause of the majority of all security problems and crashes. Buffer overruns, use after free, double free, memory leaks – to name just some common ones from the top of my head. And Rust does all it can to make those mistakes impossible during compilation! So it does not even add runtime overhead. That is so powerful!
And it is so painful. Half of the things you do when writing C/C++, you will not be able to do in Rust in the same way. Every piece of memory is owned. You can borrow it and return it, but it cannot be owned in two places at once. And if any part of the program has writing access to it, no other part may have any access. This makes some data structures complicated or impossible (there are ways around it), and you will have to think quite differently. But if you give in to that way of thinking, you can gain so much. Even peace of mind, as the coding world will look a lot saner inside Rust source code. This will, of course, come at the price that all code in other languages starts to feel dirty to you, but that is the way.
Also, there are a lot of ways to write code that you cannot add to a language that already exists. C and C++ will never be freed of their heritage; they will stay what they are, with all their pros and cons. Things are solved differently in Rust. Did I mention there is no NULL? And I have never missed it for a moment. Rust solves the problems other languages solve with NULL by using enums. That comes with certainty and safety all the way. There are no exceptions either. That problem is also solved by using enums. The way the language embraces those, they are a really powerful feature! And there are a lot more convenient ways of organising code that I keep missing in my daily C/C++ life.
I will not write an introduction to Rust here. At least not your typical "how to get started in Rust" intro. There are a lot of those out there, and I am already 10 posts into my Journal without programming. Maybe the Journal will become a different kind of Rust introduction, as it will try to take you along a real project, as it develops, from the beginning on. I will run into problems along the way and try to solve them in Rusty ways. This might be a good way to start thinking in Rust. But, to be honest, I have never finished a project in Rust, yet. I got quite a bit running and functional, and I think in some parts in a Rust-like way. But this is for me, as much as for anyone else, a learning project. I will do weird things. But the basics, I have worked with, yeah.
The initial learning curve will be steep! I try not to get too fancy in the first draft, so the code will not be good Rust there! So, if you are shocked at how bad my Rust is – it will be very different, soon. But I want to give everyone a fair chance to hop on without understanding all the concepts. The initial code should not be too hard to follow, if you know C/C++, I hope. Learning a new thing (writing a VM) in a new, quite different language is a mouthful, I know.
Yes, I did say that. And I do use those. It is not easy to change that when you have a certain amount of legacy code (and not much experience with the new language, which we do not really have, yet). But we do have a saying these days. Often, after a debugging session that lasted for hours, when we find the bug, understand it, and fix it, there is this realisation that fits in the sentence:
"Mit Rust wär' das nicht passiert." — "This would not have happened with Rust."
So, this will not happen to me with this project, because those things will not happen with Rust!
The first draft of source code, that will be our VM, explained.
I dumped some source code in front of you, and then I started to talk about programming languages. Time now to explain what I did and why. We only have 132 lines, including comments. We will go through all parts of it. And I will talk a little about how Rust's basic syntax works while I use it. Not too much, since it is not good Rust code yet, but enough to help you start. This will be a longer entry.
I swear, if I do not see some code in this post...
Nothing fancy, just a struct that will represent our Virtual Machine. Only three fields for now:
stack: Obviously our stack machine needs one of those. This will hold values during execution. I am using a Vector. That is nothing more than a chunk of memory that knows how much capacity it has and how many values are in it at the moment. It does support resizing, but I do not want to use that.
pc will be our program counter. That is a register holding the progress in the program during execution. It will always point at the instruction that is to be executed next.
op_cnt will be counting the number of operations executed. For now, I want that information out of curiosity, but later it will be useful for limiting execution time for programs.
usize and i64 are Rust's names for integer types. The language is very explicit in those terms (and very strict, as in every aspect). I will not give a real introduction to Rust for you (there are pages that do that), but I will try to start slowly and give you hints on the important things I introduce, so that you get the chance to learn about them in parallel to this journal. I hope that makes it easier to follow for Rust beginners. To readers that know Rust: please excuse the crude code here! I will make it more rusty, soon. Skip to the next post, if you cannot handle it.
We will also need a program that we will run in our VM. For a start, a crude array of bytes will do. The VM will be running bytecode after all. And that really is only that: a bunch of bytes that you will soon be able to understand.
```rust
// assign `pgm` to hold a program:
let pgm = [0x00 as u8, 0x01, 100, 0xff];
```
We will use a program that is a bit longer, but right now I wanted you to see a program that is actually nothing but a collection of bytes in Rust code. let declares and assigns a variable here, named pgm. It is an array of 4 bytes (u8 is an unsigned 8-bit integer - you might know it as uint8_t from other languages). And that variable will not be variable at all. By default, all variables in Rust are immutable. If you want to change it later, you have to declare it using the modifier mut.
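A tiny standalone example of that difference (not part of the VM code; the names here are just for illustration):

```rust
fn main() {
    // Immutable by default: this array cannot be changed after creation.
    let pgm = [0x00 as u8, 0x01, 100, 0xff];
    // pgm[0] = 0x02; // <- would not compile: cannot assign to immutable variable

    // With `mut`, changing the value is allowed:
    let mut count = 0usize;
    count += pgm.len();
    println!("{} bytes", count);
}
```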
There is no need to modify the program after creation; we just want to read it for execution. But our VM will have to be mutable, as it has changing internal state. Here is our complete main function, creating the (immutable) program and the (mutable) VM, and running the program. Of course, the run(...) method is still missing. And you will see the program we will be using (with some constants that I have not defined yet).
```rust
fn main() {
    // Create a program in bytecode.
    // We just hardcode the bytes in an array here:
    let pgm = [op::NOP, op::PUSH_U8, 100, op::PUSH_U8, 77, op::ADD, op::POP, 0xff];
    // Create our VM instance.
    let mut vm = VM {
        stack: Vec::with_capacity(100),
        pc: 0,
        op_cnt: 0,
    };
    // Execute the program in our VM:
    vm.run(&pgm);
}
```
So far we only have an initialized data structure and some bytes. Let's do something with it. Rust does not really use objects (and I think that is good). But it has associated functions that work on types, and methods that work on instances of types. We will write some methods for our VM struct. Let's start with the one for reading our program:
```rust
impl VM {
    /// Fetch the next byte from the bytecode, increase program counter, and return value.
    fn fetch_u8(&mut self, pgm: &[u8]) -> u8 {
        if self.pc >= pgm.len() {
            panic!("End of program exceeded");
        }
        let v = pgm[self.pc];
        self.pc += 1;
        v
    }
}
```
The fetch method works on our VM instance. The first parameter is &mut self – that tells us it works on an instance of the type VM. It works on a reference to the instance (indicated by the &), and it can modify the data (indicated by the mut). It also takes a reference to an array of u8s, which it will not be able to modify (no mut). It returns a u8.
What it does is simply read and return a byte from the program, and increase the VM's internal program counter by one, so that the next call to fetch will return the next byte. Simple.
So, what is that panic!(), you might ask? Well, if we reach that instruction, the program will start to panic, and then it will die. That is not a nice way to act. Do not worry, we will change that to something more reasonable when we start writing better Rust. And what about the naked v in the last line? It will have the function return the value of v.
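Just to hint at where this is going: here is a hedged sketch of a panic-free fetch, returning an Option instead. This is not the code in v0.0.1-journey; the function is simplified (no VM struct) for illustration:

```rust
/// Fetch a byte from the program, reporting failure with None instead of panicking.
fn fetch_u8(pc: &mut usize, pgm: &[u8]) -> Option<u8> {
    let v = *pgm.get(*pc)?; // `?` returns None early if pc is past the end
    *pc += 1;
    Some(v)
}

fn main() {
    let pgm = [0x02u8, 100];
    let mut pc = 0;
    assert_eq!(fetch_u8(&mut pc, &pgm), Some(0x02));
    assert_eq!(fetch_u8(&mut pc, &pgm), Some(100));
    assert_eq!(fetch_u8(&mut pc, &pgm), None); // end of program, no panic
    println!("survived the end of the program");
}
```

The caller then has to decide what to do when a None comes back, instead of the whole VM dying on the spot.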
Now, let's look at that run method, we were calling in main:
```rust
impl VM {
    /// Executes a program (encoded in bytecode).
    pub fn run(&mut self, pgm: &[u8]) {
        // initialise the VM to be in a clean start state:
        self.stack.clear();
        self.pc = 0;
        self.op_cnt = 0;

        // Loop going through the whole program, one instruction at a time.
        loop {
            // Log the vm's complete state, so we can follow what happens in console:
            println!("{:?}", self);
            // Fetch next opcode from program (increases program counter):
            let opcode = self.fetch_u8(pgm);
            // We count the number of instructions we execute:
            self.op_cnt += 1;
            // If we are done, break loop and stop execution:
            if opcode == op::FIN {
                break;
            }
            // Execute the current instruction (with the opcode we loaded already):
            self.execute_op(pgm, opcode);
        }
        // Execution terminated. Output the final state of the VM:
        println!("Terminated!");
        println!("{:?}", self);
    }
}
```
The comments should explain what is going on there. Initialise the VM, then loop over the program, fetching one instruction at a time and executing it, until we reach the end. And you might have noticed that our program will be very talkative. I added a lot of printlns that tell just about everything that happens during execution.
I guess it is time to look at those op:: constants I keep using.
```rust
/// Module holding the constants defining the opcodes for the VM.
pub mod op {
    /// opcode: Do nothing. No oparg.
    ///
    /// pop: 0, push: 0
    /// oparg: 0
    pub const NOP: u8 = 0x00;
    /// opcode: Pop value from stack and discard it.
    ///
    /// pop: 1, push: 0
    /// oparg: 0
    pub const POP: u8 = 0x01;
    /// opcode: Push immediate value to stack.
    ///
    /// pop: 0, push: 1
    /// oparg: 1B, u8 value to push
    pub const PUSH_U8: u8 = 0x02;
    /// opcode: Add top two values on stack.
    ///
    /// pop: 2, push: 1
    /// oparg: 0
    pub const ADD: u8 = 0x10;
    /// opcode: Terminate program.
    ///
    /// pop: 0, push: 0
    /// oparg: 0
    pub const FIN: u8 = 0xff;
}
```
Just 5 u8 constants there, grouped in a module as a namespace. And a lot of comments to explain them. We have 5 different operations for our VM. The only thing missing is some code, that actually executes those instructions:
```rust
impl VM {
    /// Executes an instruction, using the opcode passed.
    ///
    /// This might load more data from the program (opargs) and
    /// manipulate the stack (push, pop).
    fn execute_op(&mut self, pgm: &[u8], opcode: u8) {
        println!("Executing op 0x{:02x}", opcode);
        match opcode {
            op::NOP => {
                println!("  NOP");
                // do nothing
            },
            op::POP => {
                println!("  POP");
                let v = self.stack.pop().unwrap();
                println!("  dropping value {}", v);
            },
            op::PUSH_U8 => {
                println!("  PUSH_U8");
                let v = self.fetch_u8(pgm);
                println!("  value: {}", v);
                self.stack.push(v as i64);
            },
            op::ADD => {
                println!("  ADD");
                let a = self.stack.pop().unwrap();
                let b = self.stack.pop().unwrap();
                self.stack.push(a + b);
            },
            _ => {
                panic!("unknown opcode!");
            }
        }
    }
}
```
You can think of the match as a switch statement. It is much more than that, but here we use it as one. Each of our opcodes is handled individually. And we log a lot, so that we can read what is happening, when we run it. Ignore the unwrap() thingies for the time being. They are just there to try and ignore potential runtime errors. Again, not good Rust style, but, you know: later.
The four operations get more complex in what they do. Let's go through them one by one:
NOP – this does nothing, it just wastes bytecode and execution time. I have included it simply to be the most basic operation possible.
POP – this is our first modification of the stack. It simply discards the topmost value, decreasing the stack's size by one.
PUSH_U8 – this is the only operation that reads additional data from the program. It only reads a single byte (increasing the program counter by one), and puts it on top of the stack, increasing the stack's size by one. This is how you can get data from your program into the VM, to work with them. It is how numeric literals in your program are handled.
ADD – the only operation that works on data. It pops its two operands from the stack, adds them, and pushes the sum back on the stack. This is how data is manipulated in a stack machine. The operation reduces the stack's size by one effectively, but there need to be at least 2 values on it for it to be executed.
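The effects on the stack can be sketched with a plain `Vec<i64>`, which is what the VM uses internally. This is a standalone sketch (the helper `simulate` is hypothetical, not VM code):

```rust
/// Simulates the stack effects of: PUSH_U8 100, PUSH_U8 77, ADD.
fn simulate() -> Vec<i64> {
    let mut stack: Vec<i64> = Vec::new();
    stack.push(100); // PUSH_U8 100 (the byte is widened to i64)
    stack.push(77);  // PUSH_U8 77
    // ADD: pop two operands, push their sum
    let a = stack.pop().unwrap();
    let b = stack.pop().unwrap();
    stack.push(a + b);
    stack
}

fn main() {
    println!("{:?}", simulate()); // prints: [177]
}
```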
That is our complete VM so far, and it will execute a program if you compile and run it (which we will do in the next post).
The easy way to get the code and play with it is to clone the git repository and check out the tag v0.0.1-journey. If you did not understand any of that, you might want to do a tutorial on git before you continue reading. Anyway, here are some copy&paste commands you can hack into your bash prompt to do what I just told you to do. Use at your own risk, I'm not responsible for what you do to your system.
This will copy all the source code and its history from GitHub to your computer, and it will roll the source code back to the state we are looking at in this entry. The last command, cargo run, will compile and execute the program - that is, if Rust is installed and ready to run (and in the correct version). cargo is Rust's package manager; it handles dependencies and compiles your projects. I will not explain those things further, but now you know what to look for.
Now, that we have a VM, we will run a program on it.
So we built our very first VM and studied the code in detail. It is time to execute a program on it and look at its output. We will look at every single step the program takes. Aren't we lucky that our VM is so talkative during execution?
If you missed the code, look at the previous post, A VM.
It is quite talkative. And isn't it nice, how easy it is, to print the complete state of our VM in Rust? And it costs no overhead during runtime, as it is generated during compilation for us. Isn't that something?
So, what is happening there? Our program pgm looks like this:
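The array itself, reconstructed here with the opcode values inlined so the snippet stands alone (in the project it is written with `op::` paths):

```rust
// opcode values as defined in the op module
const NOP: u8 = 0x00;
const POP: u8 = 0x01;
const PUSH_U8: u8 = 0x02;
const ADD: u8 = 0x10;

// 8 bytes, 6 instructions; the trailing 0xff is FIN
const PGM: [u8; 8] = [NOP, PUSH_U8, 100, PUSH_U8, 77, ADD, POP, 0xff];

fn main() {
    println!("{:?}", PGM);
}
```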
Those are 8 bytes making up 6 instructions. Each instruction has a 1-byte opcode. Two of those instructions (the PUSH_U8) have one byte of argument each, making up the remaining two bytes of our program. Here they are listed:
```
NOP
PUSH_U8 [100]
PUSH_U8 [77]
ADD
POP
FIN
```
The NOP does not do anything. I just put it in front of the program to let you see fetching, decoding, and executing without any effects:
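The corresponding log output, reconstructed from the VM's println! calls (so take the exact formatting as approximate):

```
Executing op 0x00
 NOP
VM { stack: [], pc: 1, op_cnt: 1 }
```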
We just increased the program counter by one (we advance one byte in the bytecode), and the operation counter counts this executed instruction. Let's look at the next instruction, that is more interesting:
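For the first PUSH_U8, the log would show something like this (again a reconstruction from the print statements):

```
Executing op 0x02
 PUSH_U8
 value: 100
VM { stack: [100], pc: 3, op_cnt: 2 }
```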
Here the PC is increased by two. That happens, because we fetch an additional value from the bytecode. The op_cnt is only increased by one. And we now have our first value on the stack! It is the byte we read from the bytecode. Let's do that again:
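Pushing the second value and then executing the ADD gives roughly this log (reconstructed):

```
Executing op 0x02
 PUSH_U8
 value: 77
VM { stack: [100, 77], pc: 5, op_cnt: 3 }
Executing op 0x10
 ADD
VM { stack: [177], pc: 6, op_cnt: 4 }
```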
Now there is only one value left on the stack, and it is the sum of the two values we had. There happened quite a lot here. The two values we had before where both popped from the stack (so it was shortly empty). The add operation adds them, and pushes their sum back on the stack. So now there is one value on the stack, and it is the result of our adding operation.
What's next?
```
VM { stack: [177], pc: 6, op_cnt: 4 }
Executing op 0x01
 POP
 dropping value 177
VM { stack: [], pc: 7, op_cnt: 5 }
```
It is always nice to leave your workplace all tidied up, when you are done. We can do that by popping our result back from the stack, leaving it empty. And besides, our POP operation prints the value it drops. One more instruction to go:
So, we ran a program in a VM. Hooray, we are done. Only 132 lines of code, including excessive comments and logging. That was easy.
Well yeah - it doesn't do much. But you can understand the root principle that makes up a stack machine. It's that simple.
Go play around with it a bit. It is the best way to learn and to understand. I mean it! Write a longer program. What happens to the stack? Add another opcode – how about subtraction? Will your program execute at all? What happens, if it does not?
After we got our Proof of Concept running, we clean up our code and make it look like a respectable Rust program.
Did you play around with the program from the previous post? If you are new to Rust, you really should! At least mess around with our bytecode. You should find that our VM does not react well to errors, yet. It simply panics! That is no behaviour for a respectable Rust program.
We will make it more rusty, look at the enhanced version:
If you do not know your way around Rust, some of those things will be difficult to understand. It might be time to read up on some Rust, if you intend to follow my journey onwards. I will not explain everything here, but I will give you some leads right now, if you want to understand the things I did in that change.
The most important thing to understand for you will be Enums. Yeah, I know. That is what I thought at first learning Rust. "I know enums. Yeah, they are handy and useful, but what could be so interesting about them?"
Well, in fact, enums in Rust completely change the way you are writing code. They are such an important part of the language that they have an impact on just about every part of it.
It is obviously a datatype to communicate runtime errors of different natures. And I use it a bit like you would use exceptions in some other languages. Never mind the #[derive...] part for now. That is just for fancy debug output (and a bit more). Once you understand line 33: InvalidOperation(u8),, you are on the right track! To put it in easy terms: values of enums in Rust can hold additional values. And, as you see in our RuntimeError, not all values have to hold the same kind of additional value, or a value at all. This is what makes enums really powerful.
If you know what happens in the return type of fn push in line 70, you are golden. The Result type can communicate a value on success or an error condition on failure. The great difference to typical exceptions from other languages is that there is no special way to pass on the errors, as with exceptions that are thrown. It is just your normal return statement that is used. And this is done, you guessed it, with enums. If you want to read up on Result, try understanding Option first. I am using that in my code, even though you cannot see it.
If you are wondering now about the return of fn push, that does not have a return statement to be seen, you should find out why some of my lines do not have a semicolon ; at the end, while most do.
So, this is what will get you through a lot here. Try to understand those in the given order:
Option
Some(v) vs. None
Result<v, e>
Ok(v) vs. Err(e)
if let Some(v) =
match
Result<(), e>
Ok(())
unwrap()
?
Bonus: ok(), ok_or(), and their likes
If you understand each of those, and why I put them in the list, you are prepared to handle most Rust things I will be doing for quite a while. If you still have problems with parts of it, move on anyway. It gets better after a while, when you use them.
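To tie those pieces together, here is a minimal, hypothetical sketch (not the actual VM code) that uses `Option`, `ok_or()`, `Result<(), e>`, `Ok(())`, and `?` in the way the list suggests:

```rust
#[derive(Debug, PartialEq)]
enum RuntimeError {
    StackUnderflow,
}

/// Option -> Result: a missing value becomes an error via ok_or().
fn pop(stack: &mut Vec<i64>) -> Result<i64, RuntimeError> {
    stack.pop().ok_or(RuntimeError::StackUnderflow)
}

/// Result<(), e>: success carries no value, so we return Ok(()).
/// The ? operator passes any error on to the caller.
fn add(stack: &mut Vec<i64>) -> Result<(), RuntimeError> {
    let a = pop(stack)?;
    let b = pop(stack)?;
    stack.push(a + b);
    Ok(())
}

fn main() {
    let mut stack = vec![100, 77];
    match add(&mut stack) {
        Ok(()) => println!("stack: {:?}", stack),
        Err(e) => println!("error: {:?}", e),
    }
}
```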
After a few days of progress on the project itself, I spent a bit of time on the site again. We have the fancy link to our GitHub repo in the upper right corner now. But more importantly, I added support for comments on my entries. You can now react, ask questions, or share your thoughts.
I am using giscus.app (and, again, I copied that idea from @squidfunk and their site on mkdocs-material, which is what I did for this complete site, more or less). Giscus is an open source app that stores the comments completely inside GitHub discussions, so the content is stored along the lovem repository and at the one place where everything is stored already anyway. If you want to participate in the comments, you need to log in using your GitHub account. That is great, because I don't need to care about user management, nor about any database.
Feel free to use this entry to try out the new feature, because that is what I am gonna do!
We turn our project from a binary project into a library project.
So far, our lovem cargo project holds a single binary. That is not very useful for something that should be integrated into other projects. What we need is a library. How is that done? Simple: we rename our main.rs to lib.rs.
But wait: what about fn main()? We do not need that inside a library. But it would be nice to still have some code that we can execute, right? Well, no problem. Your cargo project can only hold a single library, but it can hold multiple binaries, each with its own fn main(). Just stuff them in the bin subdir.
While we are at it, I split the project up into multiple source files, to get it organised. It is still small, but it will grow soon. Here is what we are at now:
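The layout looks roughly like this (the exact file set is inferred from what this post discusses):

```
lovem/
├── Cargo.toml
└── src/
    ├── lib.rs
    ├── op.rs
    ├── vm.rs
    └── bin/
        └── test-run.rs
```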
The only real configuration in that file is edition = "2021". Rust has a major edition release every three years. These are used to introduce breaking changes. You have to specify the edition you use explicitly, and there are migration guides. We use the most recent one, 2021.
Rust manages projects by using default project layouts. That is why we need not write a lot into the Cargo.toml. The src directory holds our source code. The fact that it holds a lib.rs makes it a library, and lib.rs is the entry point. This is what is in it:
```rust
pub mod op;
pub mod vm;

// re-export main types
pub use crate::vm::VM;
```
Really not a lot. It declares the two modules op and vm and makes them public. So, whatever Rust project will be using our library will have access to those modules. The modules will be in the files op.rs and vm.rs. What a coincidence: those are exactly the remaining two source files in this directory!
The last line just re-exports a symbol from one of those submodules, so that programs using our library can access it more easily. We will be doing that in our binary.
Back in v0.0.2-journey, we already had a module called op to hold the opcodes. We had it stuffed in our main.rs. Now it lives in a separate file, so we do not have to scroll over it every time.
This holds the rest of our source code (except for fn main() which has no place in a lib). The only new thing, compared with our former main.rs is the first line:
```rust
use crate::op;
```
This simply pulls the module op into the namespace of this module, so that we can access our opcode constants as we did before. The rest remains the way we already know.
So how do we use our lib in a project? That is best illustrated by doing it. And we can do so inside our project itself, because we can add binaries. Just put a Rust source file with a fn main() inside the bin subdir. There we can write a binary as we would in a separate project, that can use the lib.
We did that in the file test-run.rs:
```rust
use lovem::{op, VM};

fn main() {
    // Create a program in bytecode.
    // We just hardcode the bytes in an array here:
    let pgm = [op::NOP, op::PUSH_U8, 100, op::PUSH_U8, 77, op::ADD, op::POP, 0xff];
    // Create our VM instance.
    let mut vm = VM::new(100);
    // Execute the program in our VM:
    match vm.run(&pgm) {
        Ok(_) => {
            println!("Execution successful.")
        }
        Err(e) => {
            println!("Error during execution: {:?}", e);
        }
    }
}
```
This is the fn main() function from our former main.rs. Instead of having all the functions and definitions, it just has this single line at the top:
```rust
use lovem::{op, VM};
```
Nothing too complicated. It tells the compiler, that our program uses the library called lovem (which is, of course, the one we are writing ourselves here). It also tells it to bring the two symbols op and VM from it into our namespace.
The op one is simply the module op defined in op.rs. Because lib.rs declares the module public, we can access it from here. VM does not refer to the module in vm.rs, as that module is called vm (in lower case). VM is actually the struct we defined in vm, that we use to hold the state of our Virtual Machine.
We could include the struct as lovem::vm::VM, which is its full path. But I find that a bit annoying, as VM is the main type of our whole library. We will always be using that. So I re-exported it in lib.rs. Remember the line pub use crate::vm::VM;? That's what it did.
So, how do we run our program now? Back in v0.0.2-journey we simply called cargo run. That actually still works, as long as we have exactly one binary.
But we can have multiple binaries inside our project. If we do, we need to tell cargo which it should run. That can easily be done:
```
cargo run --bin test-run
```
The parameter to --bin is the name of the file inside bin, without the .rs. And no configuration is needed anywhere, it works by convention of project layout.
What, homework again? Yeah, why not. If it fits, I might keep adding ideas for you to play around with. Doing things yourself is understanding. Stuff we just read, we tend to forget. So here is what might help you understand the project layout stuff I was writing about:
Add a second binary, that runs a different program in the VM (with different bytecode). You have all the knowledge to do so. And then run it with cargo.
In earlier posts I included explicit links to the source code at the time of writing. That got annoying to do really fast. So I added a new feature to my blogem.py that I use to write this journal. Entries like this, that are explaining a specific state of the source of lovem will have a tag from now on. This corresponds to a tag inside the git repository, as it did in earlier posts. You will find it in the card at the top of the post (where you see the publishing date and the author). It is prefixed with a little tag image. For this post it looks like this:
At the bottom of the entry (if you view it in the entry page, not in the "whole month" page), you will find it again with a list of links that help you access the source in different ways. The best way to work with the code, is to clone the repository and simply check out the tag. I also added a page on this site, explaining how you do that. You can find it under Source Code.
So, in future I will not be adding explicit links, only these implicit ones. And there will be a link to the explaining page at the bottom. This should be convenient for both you and me.
Many design decisions must be made for lovem. Here I talk about some of those in the current state.
I have shared and discussed source code in the recent posts. Now it is time again, to write about design decisions. I made a few of them for the code you saw. So far I have not been reasoning about those here, and some of you might have wondered already. Let's talk about them.
Let me remind you: lovem is a research project for myself. And an education project for myself as well. None of my choices at this stage are set in stone. I will make lots of mistakes that I will be changing later. I even choose some paths that I know I will be leaving again. I might just take any solution for a problem at this stage, as I do not know what the right choice is. So start somewhere, see where it goes. Some of those are deliberately weird or bad choices, but they make things clearer or simpler at this stage.
Let us address two of those choices you can find in the current source code.
I talked about register sizes defining architecture, back in What is a Virtual Machine anyway?. And then I went totally silent about that topic and just used i64 as type for my stack. Is that a good idea? I used it for simplicity. The idea goes back to when I was experimenting with using a register machine for lovem. Having a simple datatype that can handle big values seems simple. After all, other languages/VMs use some version of float as their single numeric datatype:
JavaScript
JavaScript Numbers are Always 64-bit Floating Point
Unlike many other programming languages, JavaScript does not define different types of numbers, like integers, short, long, floating-point etc.
JavaScript numbers are always stored as double precision floating point numbers, following the international IEEE 754 standard.
Well, reducing complexity is good. But having each little number you use in your programs eat up 8 bytes of memory does not sound low overhead to me. And that is, after all, the goal. So I guess, that will change in the future. But let's keep it for the time being. There will be some interesting things we will be doing in the near future; even if we might dump those features later. I already implemented them during the early phase (when I was not writing a public journal), so not adding them here would be insincere. Having 64 bit values is a part of our journey.
I have no glossary, yet, so you have to live with me inventing terms on the spot. I used that word in the source code already. What I mean by it, are the arguments to an instruction inside the bytecode, that follow the opcode and influence the operation. They are the arguments you give inside your program's code.
As of v0.0.3-journey we only have a single opcode that takes an oparg, and that is push_u8. You can see how there is a fetch_u8() instruction in the code that handles that operation, and none in the other operations. See execute_op.
So we have different behaviour depending on the opcode. push_u8 fetches an additional byte from the bytecode, the other opcodes do not. Existing VMs handle this differently. The Java VM, for example, has a dynamic number of opargs, too. They call them operands:
2.11. Instruction Set Summary
A Java Virtual Machine instruction consists of a one-byte opcode specifying the operation to be performed, followed by zero or more operands supplying arguments or data that are used by the operation. Many instructions have no operands and consist only of an opcode.
The Python VM, on the other hand, uses exactly one byte as oparg on all instructions:
The bytecode can be thought of as a series of instructions or a low-level program for the Python interpreter. After version 3.6, Python uses 2 bytes for each instruction. One byte is for the code of that instruction which is called an opcode, and one byte is reserved for its argument which is called the oparg.
[...]
Some instructions do not need an argument, so they ignore the byte after the opcode. The opcodes which have a value below a certain number ignore their argument. This value is stored in dis.HAVE_ARGUMENT and is currently equal to 90. So the opcodes >=dis.HAVE_ARGUMENT have an argument, and the opcodes < dis.HAVE_ARGUMENT ignore it.
That does remove some complexity. And it adds new complexity for opcodes with more than one oparg byte - they exist in Python and are handled with a special opcode that adds an additional oparg byte. I think it will make execution faster, as fetching can be done in advance. If you do not know how many bytes you need before you read your opcode, you cannot prefetch the next instructions.
For our goal, keeping the bytecode small is much more important than execution time. So I am pretty sure we will stick with the dynamic number of oparg bytes in lovem.
The basic operation of the VM is working. Let us add a few more opcodes, so that we can do calculations.
We have created a Rust library that holds our virtual stack machine. We can now add multiple executables to it, which makes it easier to write different programs and keep them (to mess around with the VM). We will add a few more opcodes to our repertoire, because only adding numbers is just plain boring.
I put some thought into what opcodes to introduce; but be advised that none of them are final. Not only is the VM experimental and in a very early state, I also introduce codes on purpose that I do not intend to keep. This is also a demonstration/introduction. So I add codes that are helpful at the time of writing, for experimenting. FIN is an example of a code that will most likely be removed at some point. But for now it is nice to have a simple way to explicitly terminate the program. It gives some confidence, when we reach that point, that our program works as intended, and that we did not mess up the bytecode.
Baby steps. No rush here. We had adding as a first example. We will introduce subtraction, multiplication, division, and modulo. Sounds like not much, but we will run into some complications anyway... Here is our addition to op.rs.
```rust
/// opcode: Subtract top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const SUB: u8 = 0x11;

/// opcode: Multiply top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const MUL: u8 = 0x12;

/// opcode: Divide top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const DIV: u8 = 0x13;

/// opcode: Calculate modulo of top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const MOD: u8 = 0x14;
```
Simple enough, those new codes: just copy and paste from ADD. But it turns out subtraction is not as easy as addition. Here is the handling code we used for ADD:
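The ADD arm pops its two operands and pushes the sum; for ADD the pop order is irrelevant, but for SUB it is not. A standalone sketch of the order issue (the helper `sub_result` is hypothetical, not the VM code):

```rust
/// Demonstrates why operand order matters for SUB.
/// Handlers pop the top of the stack first (a), then the value below (b).
fn sub_result(mut stack: Vec<i64>) -> i64 {
    let a = stack.pop().unwrap(); // topmost value, pushed last
    let b = stack.pop().unwrap();
    // By stack machine convention, "push 7, push 11, sub" computes 7 - 11,
    // so we need b - a, not a - b:
    b - a
}

fn main() {
    println!("{}", sub_result(vec![7, 11])); // prints: -4
}
```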
As my math teacher liked to say: "... dann fliegt die Schule in die Luft!" – If we do that, the school building will blow up. It is his way of dealing with the issue that pupils are told "you must never divide by zero", but are never given an understandable reason for it. So just own it, and provide a completely absurd one.
What happens if we keep it like this? Well, not much - until you write a program that divides by zero. Then, this will happen:
```
[...]
VM { stack: [4, 0], pc: 4, op_cnt: 2 }
Executing op 0x13
 DIV
thread 'main' panicked at 'attempt to divide by zero', src/vm.rs:142:31
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::panicking::panic
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:48:5
   3: lovem::vm::VM::execute_op
             at ./src/vm.rs:142:31
   4: lovem::vm::VM::run
             at ./src/vm.rs:85:13
   5: modulo::main
             at ./src/bin/modulo.rs:10:11
   6: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Process finished with exit code 101
```
Our program panics! I told you earlier, that this is not good behaviour. I introduced you to a lot of weird Rust stuff, just to avoid those. So, let us not re-introduce them now. So, what can we do instead?
Division by zero is a runtime error, for sure (at least in this numerical domain we are working with). But it should not be a runtime error in our virtual machine, it should be a runtime error in the program it is running. Luckily, we already have that mechanism in our VM. So let us add a new runtime error:
```rust
/// An error that happens during execution of a program inside the VM.
#[derive(Debug, Clone, PartialEq)]
pub enum RuntimeError {
    EndOfProgram,
    UnknownOpcode(u8),
    StackUnderflow,
    StackOverflow,
    DivisionByZero,
}
```
We add a check for the DIV and MOD handlers (modulo is a division as well). If we run that program dividing by zero again, we now get this:
```
[...]
VM { stack: [4, 0], pc: 4, op_cnt: 2 }
Executing op 0x13
 DIV
Error during execution: DivisionByZero

Process finished with exit code 0
```
Yes, it still fails. But only the execution of the bytecode fails, not the execution of our virtual machine. You can now handle the problem inside your Rust program in a way that fits your needs. Much better. In the next post, we will be using our new instructions in a fancy way, that works well with a stack machine.
Oh, not sure. Play around with it, I guess? As always. Feel free to write a calculation into a program and compare the results. It should work, unless I messed up again. You should, at some point, have written a program in bytecode yourself, so that you know how that feels.
We are using the design of a stack machine to efficiently execute some calculations.
The way stack machines work can be used in programs that execute calculations. We will look at it by implementing an example from the Wikipedia page about stack machines.
I will quote a lot of it here. You can see the full text of the article and its authors when you follow the Wikipedia permalink to the article.
Design
Most or all stack machine instructions assume that operands will be from the stack, and results placed in the stack. The stack easily holds more than two inputs or more than one result, so a rich set of operations can be computed. In stack machine code (sometimes called p-code), instructions will frequently have only an opcode commanding an operation, with no additional fields identifying a constant, register or memory cell, known as a zero address format. This greatly simplifies instruction decoding. Branches, load immediates, and load/store instructions require an argument field, but stack machines often arrange that the frequent cases of these still fit together with the opcode into a compact group of bits.
The instruction set carries out most ALU actions with postfix (reverse Polish notation) operations that work only on the expression stack, not on data registers or main memory cells. This can be very convenient for executing high-level languages, because most arithmetic expressions can be easily translated into postfix notation.
For example, consider the expression A*(B-C)+(D+E), written in reverse Polish notation as A B C - * D E + +. Compiling and running this on a simple imaginary stack machine would take the form:
```
            # stack contents (leftmost = top = most recent):
push A      # A
push B      # B A
push C      # C B A
subtract    # B-C A
multiply    # A*(B-C)
push D      # D A*(B-C)
push E      # E D A*(B-C)
add         # D+E A*(B-C)
add         # A*(B-C)+(D+E)
```
Well, I don't know about a "simple imaginary stack machine" - but as it happens to be, we have a very real simple stack machine at our disposal. You know where we will be going next!
The program from the Wikipedia article uses 5 variables A to E. We do not support any kind of variables, yet, but that isn't important here. We use immediates (literals from your program) to put some concrete values into the calculation. Let's just take some numbers, totally at random:
```rust
//! A small program demonstrating execution of arithmetics in our VM.
//!
//! For an explanation of what we are doing here, look at this wikipedia article:
//! https://en.wikipedia.org/w/index.php?title=Stack_machine&oldid=1097292883#Design
use lovem::{op, VM};

// A*(B-C)+(D+E)
// A B C - * D E + +
// A = 5, B = 7, C = 11, D = 13, E = 17
// 5 * (7 - 11) + (13 + 17) = 10

fn main() {
    // Create a program in bytecode.
    // We just hardcode the bytes in an array here:
    let pgm = [op::PUSH_U8, 5, op::PUSH_U8, 7, op::PUSH_U8, 11, op::SUB, op::MUL,
               op::PUSH_U8, 13, op::PUSH_U8, 17, op::ADD, op::ADD, op::POP, op::FIN];
    // Create our VM instance.
    let mut vm = VM::new(100);
    // Execute the program in our VM:
    match vm.run(&pgm) {
        Ok(_) => {
            println!("Execution successful.")
        }
        Err(e) => {
            println!("Error during execution: {:?}", e);
        }
    }
}
```
The comments spoil the result, but we want to check it calculates correctly, so that is okay. The program is the same as before: create a VM and run some hardcoded bytecode on it. Since the VM logs excessively, we will see what happens, when we run it. So the only new thing here is the bytecode program. I'll write it down in a more readable form:
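Reconstructed in the same style as the Wikipedia listing, with our numbers and the added cleanup at the end:

```
push_u8 5    # 5
push_u8 7    # 7 5
push_u8 11   # 11 7 5
sub          # -4 5
mul          # -20
push_u8 13   # 13 -20
push_u8 17   # 17 13 -20
add          # 30 -20
add          # 10
pop          #
fin
```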
To no-one's surprise, this code is the same as in the article - only with the variables replaced by numbers, and I added a pop and a fin at the end, to keep our program clean.
The output shows you the stack after every instruction. You can compare it to the stack contents in the Wikipedia listing, and you will find them identical (the order of the stack listing is switched, and of course you have numbers instead of arithmetic expressions with variables – but if you insert our numbers on the Wikipedia listing they should match).
Our PoC stack machine really can do what the imaginary one is claimed to do. That's nice.
You should really read the article on Reverse Polish Notation (permalink to article at time of writing). It will give you some background on why it is important, not least historically. The Z3, for example, arguably the first computer built by mankind, was using it.
All our programs have been linear so far. Let's build the base for jumping around.
In every program we have written so far, each instruction just advances the PC, until we reach the end. That is very linear. We will now introduce a new opcode that jumps to a different position in the program.
How do we implement that? That is actually quite easy. Do you remember what I said about the PC? It is a special register, that always points to the instruction in the bytecode, that is executed next. So all our operation needs to do is modify the PC. We will give that opcode an oparg of two bytes, so we can tell it, where to jump to. Here is our new opcode in op.rs:
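The constant could look like the following. Note that the concrete opcode value 0x21 is my assumption purely for illustration (I avoided 0x20, which IFEQ uses later):

```rust
/// opcode: Jump to a relative position in the bytecode.
///
/// pop: 0, push: 0
/// oparg: 2B, i16 relative jump
pub const GOTO: u8 = 0x21; // the numeric value is an assumption
```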
Now we have the dreaded goto. Don't be scared - on bytecode level, that is all well. We are not designing a high level language here, there will be gotos. But how do we fetch an i16 from our bytecode? So far we can only fetch u8. So we add some more fetching:
```rust
/// Reads the next byte from the bytecode, increases the program counter, and returns the byte.
fn fetch_u8(&mut self, pgm: &[u8]) -> Result<u8, RuntimeError> {
    if let Some(v) = pgm.get(self.pc) {
        self.pc += 1;
        Ok(*v)
    } else {
        Err(RuntimeError::EndOfProgram)
    }
}

/// Reads the next byte from the bytecode, increases the program counter, and returns the byte as i8.
fn fetch_i8(&mut self, pgm: &[u8]) -> Result<i8, RuntimeError> {
    if let Some(v) = pgm.get(self.pc) {
        self.pc += 1;
        Ok(*v as i8)
    } else {
        Err(RuntimeError::EndOfProgram)
    }
}

/// Reads the next two bytes from the bytecode, increases the program counter by two, and returns them as i16.
fn fetch_i16(&mut self, pgm: &[u8]) -> Result<i16, RuntimeError> {
    let hi = self.fetch_i8(pgm)? as i16;
    let lo = self.fetch_u8(pgm)? as i16;
    Ok(hi << 8 | lo)
}
```
We already know fn fetch_u8(). fn fetch_i8() does almost exactly the same thing, only that it casts that byte from u8 to i8. Simple enough. Casting in Rust has the beautiful syntax <value> as <type>.
So why do we need i8? Because we are building an i16 from an i8 and a u8. Just a bit of bit arithmetic. We can pass on potential EndOfProgram runtime errors easily with ? and Result. It allows us to write short but still easy-to-read code, I think. So now we can fetch the value we need for our jump. Let us write the handler for the opcode in fn execute_op() of vm.rs.
Yeah - Rust does not allow us to do calculations with different types of integers. We need to explicitly cast everything. Rust tries to avoid ambiguity, so no implicit conversions. And, to be honest, the compiler has a good point. We should care even more about that calculation; we want our VM to be robust. We change the handler to:
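A hedged sketch (not the actual lovem code) of how the PC update can be done with explicit casts and checked arithmetic instead of a raw cast:

```rust
/// Applies a relative jump to a program counter of type usize.
/// Returns None if the result would be negative or overflow.
fn relative_jump(pc: usize, d: i16) -> Option<usize> {
    if d < 0 {
        // unsigned_abs() gives the magnitude as u16, safe even for i16::MIN
        pc.checked_sub(d.unsigned_abs() as usize)
    } else {
        pc.checked_add(d as usize)
    }
}

fn main() {
    println!("{:?}", relative_jump(5, -5)); // prints: Some(0)
    println!("{:?}", relative_jump(0, -1)); // prints: None
}
```

Returning an Option here lets the caller turn an out-of-range jump into a proper RuntimeError instead of panicking.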
Now, let us write a new program that uses the goto opcode:
//! Create a VM and run a small bytecode program in it.
+//!
+//! This demonstrates the goto operation with an endless loop.
+uselovem::{op,VM};
+
+fnmain(){
+// Create a program in bytecode.
+// We just hardcode the bytes in an array here:
+letpgm=[op::PUSH_U8,123,op::GOTO,0xff,0xfb,op::FIN];
+// Create our VM instance.
+letmutvm=VM::new(100);
+// Execute the program in our VM:
+matchvm.run(&pgm){
+Ok(_)=>{
+println!("Execution successful.")
+}
+Err(e)=>{
+println!("Error during execution: {:?}",e);
+}
+}
+}
+
I will write that bytecode down in a more readable format again:
push_u8 123
goto -5
fin
Only 3 instructions. And the fin will never be reached. That 0xff, 0xfb after the op::GOTO is the 2 byte oparg: an i16 with the value -5. But why -5? When the goto is executed, we have already read both oparg bytes, so the PC points to the fin at index 5. Adding -5 to it will set the PC to 0. The next executed instruction will be the push_u8 once again. This is an endless loop. So will the program run forever? What do you think will happen? Let's try:
There is a push_u8 operation in our endless loop. So it will fill our stack until it is full! The program hits a runtime error after 200 executed instructions. Great, now we tested that, too.
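By the way, you can check the oparg encoding yourself with the same arithmetic fn fetch_i16() uses, applied to the two bytes that follow op::GOTO in the program above:

```rust
/// Build an i16 from a high byte (sign-extended) and a low byte,
/// exactly like fetch_i16() does in the VM.
fn i16_from_bytes(hi: u8, lo: u8) -> i16 {
    ((hi as i8 as i16) << 8) | (lo as i16)
}

fn main() {
    // the two oparg bytes from the goto program:
    assert_eq!(i16_from_bytes(0xff, 0xfb), -5);
    println!("ok");
}
```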
That is not very dynamic. We want to make decisions! We want to choose our path. What we want is branching. We will introduce a new opcode that decides which branch the execution of our program takes, based on a value at runtime. If this sounds unfamiliar to you, let me tell you what statement we want to introduce: it is the if statement.
So, how does that work? As mentioned, normally the PC is incremented on each byte we fetch from the bytecode. And the PC always points to the next instruction that will be executed. So if we want to change the path of execution, what we have to do is change the value of the PC.
An operation that simply changes the PC statically would be a GOTO statement. There is no branching involved in that; the path that will be executed is always clear. The if statement on the other hand only alters the PC if a certain condition is met.
/// opcode: Branch if top value is equal to zero.
///
/// pop: 1, push: 0
/// oparg: 2B, i16 relative jump
pub const IFEQ: u8 = 0x20;
Our new operation pops only one value. So what does it get compared to? That's easy: zero. If you need to compare two values to each other, just subtract them instead, and then you can compare with zero. That gives the same result.
And what kind of oparg does this operation take? A signed integer. That is the value that should be added to the PC, if our condition is met. This will result in a relative jump.
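To make that concrete, here is a self-contained sketch of what an IFEQ handler could do: pop one value, compare it with zero, and only then do the relative jump. This is not the actual lovem handler, just the idea, with assumed names:

```rust
// Sketch of an IFEQ handler: pop one value, branch only if it is zero.
// `Vm` and `ifeq` are assumed names for illustration.
struct Vm {
    stack: Vec<i64>,
    pc: usize,
}

impl Vm {
    /// Pop the top value; add `delta` to the PC only if that value is zero.
    fn ifeq(&mut self, delta: i16) {
        let v = self.stack.pop().expect("stack underflow");
        if v == 0 {
            // condition met: take the relative jump
            self.pc = (self.pc as i64 + delta as i64) as usize;
        }
        // condition not met: PC stays untouched, execution just continues
    }
}

fn main() {
    let mut vm = Vm { stack: vec![0], pc: 10 };
    vm.ifeq(-5);
    assert_eq!(vm.pc, 5); // popped 0: jump taken
    vm.stack.push(7);
    vm.ifeq(-5);
    assert_eq!(vm.pc, 5); // popped non-zero: PC untouched
    println!("ok");
}
```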
Same as always. Write some bytecode. Try some jumping around. Run into trouble! You can write a program that has a fin in the middle, but executes code that lies behind that instruction.
I have had it with these motherloving bytes in this motherloving bytecode!
By now you should have come to a realisation: writing bytecode sucks! It wasn't fun to begin with, but now that we introduce jumps in our code, we need to count how many bytes the jump takes – and that with instructions that have different numbers of bytes as opargs. Encoding negative numbers in bytes is also no fun. And just think about it: if you change your program (e.g. add a few instructions), you have to adjust those relative jumps! How horrible is that? Can't someone else do it? Well, yeah, of course. We invented a machine that can do annoying and monotonous tasks that require accuracy and that must be done over and over again. That machine is, of course, the computer.
Well, lucky us, that we know how to tell a computer what it should do. So let's write a program that writes bytecode for us. I am not talking about compiling a programming language into our VM; at least not yet, not for a long time. But something that lets us write those instructions in a way that is at least a bit more human friendly.
Maybe you remember that I already tried to write some of the bytecode programs I showed you in a more readable way, like this:
The listing up there looks a bit like assembler code. And for the earlier draft of lovem I already wrote a program that could translate those listings into bytecode. We will do that again, together. But this will take us some time (that is, multiple journal entries). We need to acquire some additional Rust skills for that. And there is so much to explain inside that assembler program itself.
Once again, I am making this up along the way. Yes, I have a plan, but I will just start to introduce syntax for the assembler, and it might not be ideal. That means I might change it all again later. Like the VM itself, our assembler will be experimental. You are welcome to give me ideas for the syntax; we do have the comments now, under each post, feel free to use them. There is the whole GitHub discussions page as well. And you can still find me on Twitter. Find the link at the bottom of this page.
The assembler will be a binary that you call with parameters. A typical command line tool, just like gcc or rustc. So what we need to do is learn how one writes a command line tool in Rust. One that can read files, because I plan to write assembly programs in text files. And I have no desire to start parsing command line arguments myself. Neither do I want to write an introduction on writing command line tools in Rust. All this has been done. So I kindly direct you to an online book:
That is where I got what I will be using here. They use a crate called clap, which seems to be the most used lib for building command line tools in Rust. It takes about 10 minutes to read. Finding out how to use the options of clap that I want took longer, but that will not be a thing for you, as I will just be using those options.
This is the first time we are using external crates in Rust. We need to add our dependencies to Cargo.toml, before we can use them:
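The dependency section could look roughly like this; the exact version numbers are my assumption (the attribute syntax used below matches the clap 3 derive API), so check the repository's Cargo.toml for the real ones:

```toml
[dependencies]
# assumed versions - check the repository's Cargo.toml for the real ones:
clap = { version = "3", features = ["derive"] }
anyhow = "1"
```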
Now let us start with the assembler. We create a new binary that will become our assembler: lovas.rs
//! An experimental assembler for lovem
use clap::Parser;
use anyhow::{Context, Result};

/// Struct used to declare the command line tool behaviour using clap.
///
/// This defines the arguments and options the tool provides. It is also used to
/// generate the instructions you get when calling it with `--help`.
#[derive(Parser, Debug)]
#[clap(name = "lovas",
    long_about = "An experimental assembler for lovem, the Low Overhead Virtual Embedded Machine.",
)]
struct Cli {
    #[clap(parse(from_os_str), help = "Path to assembler source file.")]
    source: std::path::PathBuf,
}

fn main() -> Result<()> {
    // read, validate, and evaluate command line parameters:
    let args = Cli::parse();
    // read complete source file into String:
    let content = std::fs::read_to_string(&args.source)
        .with_context(
            || format!("could not read file `{}`", args.source.as_path().display())
        )?;
    // For now, just print out all the lines in the file:
    for (n, line) in content.lines().enumerate() {
        println!("{:4}: '{}'", n + 1, line);
    }
    // We succeeded in our work, so return Ok() as a Result:
    Ok(())
}
As it happens with Rust, the code is very dense. I try to explain what I do inside the code using comments. This does not look like it does too much. Yet it does. You can call it using cargo run --bin lovas, as we learned earlier:
kratenko@jotun:~/git/lovem$ cargo run --bin lovas
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/lovas`
error: The following required arguments were not provided:
    <SOURCE>

USAGE:
    lovas <SOURCE>

For more information try --help
That is already a lot! It finds out that you did not supply a required argument and tells you so in a somewhat understandable error message. We did not write any of that. And it even tells you how to get help: add --help to your call.
Now if we use cargo to run our binary, we need to add an extra bit to the call, because we need to tell cargo where its own arguments end, and where the arguments to the called binary begin. This is done (as is custom) by adding --, to indicate the end of cargo's arguments. So if we want to pass --help to lovas, we can do it like this:
kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- --help
    Finished dev [unoptimized + debuginfo] target(s) in 0.02s
     Running `target/debug/lovas --help`
lovas
An experimental assembler for lovem, the Low Overhead Virtual Embedded Machine.

USAGE:
    lovas <SOURCE>

ARGS:
    <SOURCE>
            Path to assembler source file.

OPTIONS:
    -h, --help
            Print help information
How helpful! Also, now you can see why I added those two strings to our Cli struct; they show up in the help message.
It looks like we need to give it a file to read, if we want the program to succeed and not exit with an error. I did write a little assembly program that we can use: hallo-stack.lass. Our assembler will not do anything too useful with it, because we did not write an assembler, yet. It will simply print out the lines of the file, prefixed with the line number (the call to .enumerate() is what I use to count the lines while iterating over them).
Well - if you have not done so, read the book I linked. At least up until chapter 1.4, I guess; that is what we need for now.
And try to trigger some errors when calling lovas. What if the file you tell it to open does not exist? What if it cannot be read? Do you understand how those error messages propagate through the program and end up as a readable message in your console?
We introduce an API for assembly to our lovem library.
Last time, we built the frame of a command line program that will become our new assembler, lovas. It is time that we give that program the power to assemble.
lovas.rs is just the executable wrapper around the actual assembler, which will live inside the library. All lovas.rs does is supply the command line interface. And that CLI part does not belong in a library function. We got it nicely separated. And programs using the library can assemble source to bytecode themselves, without calling an external binary.
We alter lovas.rs a bit. The part that just printed out the source lines is gone. We replace it with a call to a new library function that can translate assembly code into bytecode:
fn main() -> Result<()> {
    // ... the same as before ...

    // run the assembler:
    match asm::assemble(&name, &content) {
        Ok(pgm) => {
            // we succeeded and now have a program with bytecode:
            println!("{:?}", pgm);
            Ok(())
        },
        Err(e) => {
            // Something went wrong during assembly.
            // Convert the error report, so that `anyhow` can do its magic
            // and display some helpful error message:
            Err(Error::from(e))
        },
    }
}
The important part is the call to asm::assemble(&name, &content). We created a new module asm inside our lib. It exposes only a single function assemble and a few types for error handling. There will be a lot to unpack inside that module.
The good news for us is: we do not need to restrain ourselves as much as we do in the VM itself. Resource usage is not really an issue here, because the assembler is not meant to run in a restricted environment. The idea of lovem is that you write your programs elsewhere, outside the restricted environment, and only run the compiled bytecode in the VM on the restricted device. And since the scope handled by the assembler will still be defined by that restricted device, we expect to only write relatively small and simple programs. With modern computers used for assembling, we can use as much memory as we want.
Oh, by the way... Yeah, I seem to stick to these short, cryptic names for the parts of lovem. VM, Pgm, op, asm - I kinda like it that way, and it goes well with the register names etc. That feels right for something as low-level as a VM. And I do my best to always document those things properly, so that your IDE of choice will always show you what each thing is.
I wrote a very basic assembler inside asm.rs, and it is already over 250 lines long. Quite a lot to unpack. As before, I try to explain as much as possible inside the source code itself, using comments. This makes it easier to follow, and you can even do so inside the source in the repo, without reading this blog.
There are four types that I introduce inside the mod:
/// Errors that can happen during assembly.
#[derive(Debug, Clone)]
pub enum AsmError {
    InvalidLine,
    UnknownInstruction(String),
    UnexpectedArgument,
    MissingArgument,
    InvalidArgument,
}

/// Report of failed assembly attempt.
///
/// Wraps the error that occurred during assembly and supplies information where it did.
#[derive(Debug)]
pub struct AsmErrorReport {
    /// Name of the program that failed to assemble.
    name: String,
    /// Line the error occurred at during assembly.
    line: usize,
    /// Error that occurred.
    error: AsmError,
}

/// A single instruction parsed from the line of an assembly program.
#[derive(Debug)]
struct AsmInstruction {
    /// Number of the line the instruction was read from.
    ///
    /// The number of the line the instruction was taken from, most likely
    /// from a source file. Line counting starts at 1.
    line_number: usize,
    /// Opcode defining which operation is to be executed.
    opcode: u8,
    /// Arguments used for execution of the operation.
    ///
    /// Zero or more bytes.
    oparg: Vec<u8>,
    /// Position inside bytecode (starting at 0).
    ///
    /// Number of bytes that come before this instruction in the program.
    pos: usize,
}

/// An assembler program during parsing/assembling.
#[derive(Debug)]
struct AsmPgm {
    /// Name of the program (just a string supplied by caller).
    name: String,
    /// Vector of parsed assembler instructions, in the order they appear in the source file.
    instructions: Vec<AsmInstruction>,
    /// Current line number during parsing.
    ///
    /// Used for error reporting.
    line_number: usize,
    /// Current position inside bytecode during parsing.
    ///
    /// Used to calculate the exact position an instruction will be at in the bytecode.
    text_pos: usize,
    /// The error that happened during parsing/assembling, if any.
    error: Option<AsmError>,
}
AsmError is easy enough to understand. We used the same idea for the RuntimeError inside the VM. When we run into an error while trying to assemble the program, we return Err(AsmError) instead of Ok(()), so that we can propagate what happened back to the caller. The nice thing is that with descriptive names for the enum values, and with the occasional embedded value (as in UnknownInstruction(String)), the debug representation of the AsmError alone is enough to make the user understand what error was detected.
AsmErrorReport is a little wrapper we use to add the information where we ran into an error. InvalidArgument is a nice hint how to fix your program - but if that program is 2000 lines long, then good luck. When you know the InvalidArgument happened in line 1337, you will find it much faster. Especially in an assembly language that never has more than a single instruction per line.
AsmInstruction is used to represent a single instruction inside a program. So each instance of this type will be linked to a specific line in the source file. If you don't remember what counts as an instruction in lovem (at least at the time of writing), let me repeat: an instruction consists of exactly one operation that is to be executed, which is identified by its opcode (a number from 0x00 to 0xff, stored in a single byte). Each instruction has zero or more bytes used as an argument, defining how the operation is to be executed. This argument is called oparg. We will also store the number of the line where we found the instruction inside the source code, and the position inside the bytecode where the instruction will be.
AsmPgm will represent the complete program during the assembly process. We collect the instructions we parse from the source there, in a Vector. And we hold the progress during parsing/assembling. This is not the type that will be returned to the caller; it is only used internally (as you can guess by the fact that it is not defined pub).
/// Parse assembly source code and turn it into a runnable program (or create report).
pub fn assemble(name: &str, content: &str) -> Result<Pgm, AsmErrorReport> {
    let asm_pgm = AsmPgm::parse(name, content);
    asm_pgm.to_program()
}
It will return an AsmErrorReport, if anything goes wrong and the assembling fails. If the assembler succeeds, it returns an instance of Pgm. Now where does that come from? Our VM takes programs in form of a &[u8]. That will be changed soon, and then it will run programs from a special type Pgm that might have a bit more than just bytecode. I added another new module to the library: pgm.rs. That one is tiny and only holds the new struct Pgm – which itself is basic. But we have a type that holds a program, now. I believe that will be beneficial to us later.
/// Holds a program to be executed in VM.
#[derive(Debug)]
pub struct Pgm {
    /// Some name identifying the program.
    pub name: String,
    /// Bytecode holding the program's instructions.
    pub text: Vec<u8>,
}
So what does the assembler do to create such a Pgm? We will start to go through that in the next entry. This has been enough for today.
So far we have read an assembly source file into a string, and we got to know some new data structures. It is time we use the one to fill the other. Let us start parsing.
What we know so far is this:
/// Parse assembly source code and turn it into a runnable program (or create report).
pub fn assemble(name: &str, content: &str) -> Result<Pgm, AsmErrorReport> {
    let asm_pgm = AsmPgm::parse(name, content);
    asm_pgm.to_program()
}
Our experimental assembler will begin using a simple syntax. Only one instruction per line, short opnames to identify the operation to be executed, optionally a single argument. I have written a short program: hallo-stack.lass.
push_u8 123
push_u8 200
add
pop
fin
Straightforward. And you know the syntax already from my human friendly listings of bytecode. Parsing that looks simple. We do want to allow adding whitespaces, though. And we want to allow comments, for sure. Our assembler needs to handle a bit of noise, as in noice.lass.
# This is an awesome program!
   push_u8 123
push_u8 200    # What are we using the # 200 for?


add
  pop


# let's end it here!
fin
Those two programs should be identical and produce the same bytecode.
The parse() function we call creates an empty instance of AsmPgm and then processes the source file line after line, filling the AsmPgm on the way.
/// Parse an assembly program from source into `AsmPgm` struct.
fn parse(name: &str, content: &str) -> AsmPgm {
    // create a new, clean instance to fill during parsing:
    let mut p = AsmPgm {
        name: String::from(name),
        instructions: vec![],
        line_number: 0,
        text_pos: 0,
        error: None,
    };
    // read the source, one line at a time, adding instructions:
    for (n, line) in content.lines().enumerate() {
        p.line_number = n + 1;
        let line = AsmPgm::clean_line(line);
        if let Err(e) = p.parse_line(line) {
            // Store error in program and abort parsing:
            p.error = Some(e);
            break;
        }
    }
    p
}
content.lines() gives us an iterator that we can use to handle each line of the String content in a for loop. We extend the iterator by calling enumerate() on it; that gives us a different iterator, which counts the values returned by the first iterator, and adds the number to it. So n will hold the line number and line will hold the line's content.
We always keep track of where we are in the source. Because the enumerate() starts counting at 0 (as things should be), we need to add 1. File lines start counting at 1. The first thing we do with the line is cleaning it. Then it gets processed by parse_line(line). If this produces an error, we will store that error and abort parsing. All our errors are fatal. The final line p returns the AsmPgm. We do not use a Result this time, but the AsmPgm can contain an error. Only if its error field is None, the parsing was successful.
/// Removes all noise from an assembler program's line.
fn clean_line(line: &str) -> String {
    // Remove comments:
    let line = if let Some(pair) = line.split_once("#") {
        pair.0
    } else {
        line
    };
    // Trim start and end:
    let line = line.trim();
    // Reduce all whitespaces to a single space (0x20):
    ANY_WHITESPACES.replace_all(line, " ").to_string()
}
We use multiple techniques to clean our input: splitting, trimming, regular expressions. When we are done, we only have lines as they look in hallo-stack.lass. The cleaned line can also be completely empty.
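To see what clean_line() produces, here is a std-only re-creation of it; the real function uses the ANY_WHITESPACES regex, but split_whitespace() has the same trimming and collapsing effect for a quick check:

```rust
/// Std-only stand-in for clean_line(): strip comments, trim,
/// and collapse all whitespace runs to a single space.
fn clean_line(line: &str) -> String {
    // Remove comments (everything from the first '#' on):
    let line = line.split_once('#').map_or(line, |pair| pair.0);
    // Trim and reduce all whitespace runs to a single space:
    line.split_whitespace().collect::<Vec<_>>().join(" ")
}

fn main() {
    assert_eq!(clean_line("   push_u8   123  # some comment"), "push_u8 123");
    // a comment-only line becomes completely empty:
    assert_eq!(clean_line("# comment only"), "");
    println!("ok");
}
```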
I want to add a word about that regexp in ANY_WHITESPACES. Where does it come from? I am using some more Rust magic there, and the crate lazy_static:
use lazy_static::lazy_static;
use regex::Regex;

// Regular expressions used by the assembler.
// lazy_static takes care that they are compiled only once and then reused.
lazy_static! {
    static ref ANY_WHITESPACES: Regex = regex::Regex::new(r"\s+").unwrap();
    static ref OP_LINE_RE: Regex = regex::Regex::new(r"^(\S+)(?: (.+))?$").unwrap();
}
I do not pretend to understand the macro magic that happens here. But what happens is that the regular expressions are compiled only once and then kept as some sort of global static immutable variable, that we can then use again and again all over the program as a reference. Static references are a convenient thing in Rust, if you remember what I told you about ownership. You can always have as many references to immutable static variables as you like, because there is nothing that can happen to them, and they exist throughout the complete runtime of the program.
/// Handles a single cleaned line from an assembly program.
fn parse_line(&mut self, line: String) -> Result<(), AsmError> {
    if line == "" {
        // empty line (or comment only) - skip
        return Ok(());
    }
    if let Some(caps) = OP_LINE_RE.captures(&line) {
        let opname = caps.get(1).unwrap().as_str();
        let parm = caps.get(2).map(|m| m.as_str());
        return self.parse_instruction(opname, parm);
    }
    Err(AsmError::InvalidLine)
}
parse_line() processes each line. Empty ones are just skipped. We use another regular expression to find out if they match our schema. Because we cleaned the line, the expression can be rather simple: r"^(\S+)(?: (.+))?$". We look for one or more non-whitespace chars for our opname. It can be followed by a single argument, which must consist of one or more chars, separated by a single space. That is our optional oparg. If the line fits, we found an instruction we can try to parse. That is the job of parse_instruction(). Everything that is neither empty nor an instruction is an error that we can simply return. It will abort the parsing, and the caller will know that there was an invalid line.
parse_instruction() can also run into an error. We use our tried pattern of returning a Result where the successful outcome does not carry any additional information (which is why we return Ok(())). The error case will return an AsmError that carries the reason for the error. And because of the Result type and Rust's mighty enum system, we can simply return what parse_instruction() returns to us.
The instruction itself will be handled in the next entry.
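Since lines are cleaned before they get here, the regex is essentially splitting on the first space. A std-only stand-in (the assembler itself uses OP_LINE_RE) shows what it extracts:

```rust
/// Std-only stand-in for what OP_LINE_RE captures from a cleaned line:
/// the opname, plus an optional single argument after one space.
fn split_instruction(line: &str) -> (&str, Option<&str>) {
    match line.split_once(' ') {
        Some((opname, parm)) => (opname, Some(parm)),
        None => (line, None),
    }
}

fn main() {
    assert_eq!(split_instruction("push_u8 123"), ("push_u8", Some("123")));
    // no space means no oparg:
    assert_eq!(split_instruction("fin"), ("fin", None));
    println!("ok");
}
```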
Don't let yourself be confused by fancy terms like register. You can think of it as a kind of snobbish variable with a special meaning. In computers sometimes stuff magically happens when you write to a register – but it should always be documented somewhere. ↩
Yeah, I know. The answer to the question "What was the first machine to qualify as a computer?", differs, depending on whom you ask – and also on the country you ask the question in. But the Z3 is a prominent candidate. ↩
PC: the Program Counter, a special register that points to the next instruction to be executed. ↩
The first draft of the source code that will be our VM, explained.
I dumped some source code in front of you, and then I started to talk about programming languages. Time now to explain what I did and why. We only have 132 lines, including comments. We will go through all parts of it. And I will talk a little about how Rust's basic syntax works while I use it. Not too much, since it is not good Rust code yet, but enough to help you start. This will be a longer entry.
I swear, if I do not see some code in this post...
Nothing fancy, just a struct that will represent our Virtual Machine. Only three fields for now:
stack: Obviously our stack machine would need one of those. This will hold values during execution. I am using a Vector. That is nothing more than a chunk of memory that knows how much capacity it has and how many values are in it at the moment. It does support resizing, but I do not want to use that.
pc will be our program counter. That is a register 1 holding the progress in the program during execution. It will always point at the instruction that is to be executed next.
op_cnt will be counting the number of operations executed. For now, I want that information out of curiosity, but later it will be useful for limiting execution time for programs.
usize and i64 are Rust's names for integer types. The language is very explicit in those terms (and very strict, as in every aspect). I will not give a real introduction to Rust (there are pages that do that), but I will try to start slowly and give you hints on the important things I introduce, so that you get the chance to learn about them in parallel to this journal. I hope that makes it easier to follow for Rust beginners. To readers that know Rust: please excuse the crude code here! I will make it more rusty, soon. Skip to the next post, if you cannot handle it.
We will also need a program that we will run in our VM. For the start, a crude array of bytes will do. The VM will be running bytecode after all. And that really is only that: a bunch of bytes that you will soon be able to understand.
// assign `pgm` to hold a program:
let pgm = [0x00 as u8, 0x01, 100, 0xff];
We will use a program that is a bit longer, but right now I wanted you to see a program that is actually nothing but a collection of bytes in Rust code. let declares and assigns a variable here, named pgm. It is an array of 4 bytes (u8 is an unsigned 8-bit integer - you might know it as uint8_t from other languages). And that variable will not be variable at all. By default, all variables in Rust are immutable. If you want to change it later, you have to declare it using the modifier mut.
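A two-line illustration of that difference (the names here are just for demonstration):

```rust
fn main() {
    let pgm = [0x00u8, 0x01, 100, 0xff]; // immutable: we only read from it
    let mut pc = 0usize;                 // mutable: we will advance it
    pc += 1;
    assert_eq!(pgm[pc], 0x01);
    println!("ok");
}
```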
There is no need to modify the program after creation, we just want to read it for execution. But our VM will have to be mutable, as it has changing internal state. Here is our complete main function, creating the (immutable) program and the (mutable) VM, and running the program. Of course, the run(...) method is still missing. And you will see the program, we will be using (with some constants that I did not define, yet).
fn main() {
    // Create a program in bytecode.
    // We just hardcode the bytes in an array here:
    let pgm = [op::NOP, op::PUSH_U8, 100, op::PUSH_U8, 77, op::ADD, op::POP, 0xff];
    // Create our VM instance.
    let mut vm = VM {
        stack: Vec::with_capacity(100),
        pc: 0,
        op_cnt: 0,
    };
    // Execute the program in our VM:
    vm.run(&pgm);
}
So far we only have an initialized data structure and some bytes. Let's do something with it. Rust does not really use objects (and I think that is good). But it has associated functions that work on types, and methods that work on instances of types. We will write some methods for our VM struct. Let's start with the one for reading our program:
impl VM {
    /// Fetch the next byte from the bytecode, increase program counter, and return value.
    fn fetch_u8(&mut self, pgm: &[u8]) -> u8 {
        if self.pc >= pgm.len() {
            panic!("End of program exceeded");
        }
        let v = pgm[self.pc];
        self.pc += 1;
        v
    }
}
The fetch method will work on our VM instance. The first parameter is &mut self – that tells us it works on an instance of the type VM. It works on a reference to the instance (indicated by the &), and it can modify the data (indicated by the mut). It also takes a reference to an array of u8s, which it will not be able to modify (no mut). It returns a u8.
What it does is simply read and return a byte from the program, and increase the VMs internal program counter by one, so that the next call to fetch will return the next byte. Simple.
So, what is that panic!() you might ask? Well, if we reach that instruction, it will start to panic, and then it will die. That is not a nice way to act. Do not worry, we will change that to something more reasonable, when we start writing better Rust. And what about the naked v in the last line? It will have the function return the value of v.
Now, let's look at that run method, we were calling in main:
impl VM {
    /// Executes a program (encoded in bytecode).
    pub fn run(&mut self, pgm: &[u8]) {
        // initialise the VM to be in a clean start state:
        self.stack.clear();
        self.pc = 0;
        self.op_cnt = 0;

        // Loop going through the whole program, one instruction at a time.
        loop {
            // Log the vm's complete state, so we can follow what happens in console:
            println!("{:?}", self);
            // Fetch next opcode from program (increases program counter):
            let opcode = self.fetch_u8(pgm);
            // We count the number of instructions we execute:
            self.op_cnt += 1;
            // If we are done, break loop and stop execution:
            if opcode == op::FIN {
                break;
            }
            // Execute the current instruction (with the opcode we loaded already):
            self.execute_op(pgm, opcode);
        }
        // Execution terminated. Output the final state of the VM:
        println!("Terminated!");
        println!("{:?}", self);
    }
}
The comments should explain what is going on there. Initialise the VM, then loop over the program, fetching one instruction at a time and executing it, until we reach the end. And you might have noticed that our program will be very talkative. I added a lot of printlns that tell just about everything that happens during execution.
I guess it is time to look at those op:: constants I keep using.
/// Module holding the constants defining the opcodes for the VM.
pub mod op {
    /// opcode: Do nothing. No oparg.
    ///
    /// pop: 0, push: 0
    /// oparg: 0
    pub const NOP: u8 = 0x00;
    /// opcode: Pop value from stack and discard it.
    ///
    /// pop: 1, push: 0
    /// oparg: 0
    pub const POP: u8 = 0x01;
    /// opcode: Push immediate value to stack.
    ///
    /// pop: 0, push: 1
    /// oparg: 1B, u8 value to push
    pub const PUSH_U8: u8 = 0x02;
    /// opcode: Add top two values on stack.
    ///
    /// pop: 2, push: 1
    /// oparg: 0
    pub const ADD: u8 = 0x10;
    /// opcode: Terminate program.
    ///
    /// pop: 0, push: 0
    /// oparg: 0
    pub const FIN: u8 = 0xff;
}
Just 5 u8 constants there, grouped in a module as a namespace. And a lot of comments to explain them. We have 5 different operations for our VM. The only thing missing is some code that actually executes those instructions:
impl VM {
    /// Executes an instruction, using the opcode passed.
    ///
    /// This might load more data from the program (opargs) and
    /// manipulate the stack (push, pop).
    fn execute_op(&mut self, pgm: &[u8], opcode: u8) {
        println!("Executing op 0x{:02x}", opcode);
        match opcode {
            op::NOP => {
                println!("  NOP");
                // do nothing
            },
            op::POP => {
                println!("  POP");
                let v = self.stack.pop().unwrap();
                println!("  dropping value {}", v);
            },
            op::PUSH_U8 => {
                println!("  PUSH_U8");
                let v = self.fetch_u8(pgm);
                println!("  value: {}", v);
                self.stack.push(v as i64);
            },
            op::ADD => {
                println!("  ADD");
                let a = self.stack.pop().unwrap();
                let b = self.stack.pop().unwrap();
                self.stack.push(a + b);
            },
            _ => {
                panic!("unknown opcode!");
            }
        }
    }
}
You can think of the match as a switch statement. It is much more than that, but here we use it as one. Each of our opcodes is handled individually. And we log a lot, so that we can read what is happening, when we run it. Ignore the unwrap() thingies for the time being. They are just there to try and ignore potential runtime errors. Again, not good Rust style, but, you know: later.
The four operations get more complex in what they do. Let's go through them one by one:
NOP – this does nothing, it just wastes bytecode and execution time. I have included it simply to be the most basic operation possible.
POP – this is our first modification of the stack. It simply discards the topmost value, decreasing the stack's size by one.
PUSH_U8 – this is the only operation that reads additional data from the program. It only reads a single byte (increasing the program counter by one), and puts it on top of the stack, increasing the stack's size by one. This is how you can get data from your program into the VM, to work with them. It is how numeric literals in your program are handled.
ADD – the only operation that works on data. It pops its two operands from the stack, adds them, and pushes the sum back on the stack. This is how data is manipulated in a stack machine. The operation reduces the stack's size by one effectively, but there need to be at least 2 values on it for it to be executed.
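To see those stack effects in one place, here is a tiny standalone sketch — not the lovem API, just a plain Vec<i64> standing in for the VM's stack — tracing what a program like `PUSH_U8 2, PUSH_U8 3, ADD, POP, FIN` does:

```rust
/// Trace the stack effects of: PUSH_U8 2, PUSH_U8 3, ADD, POP, FIN.
/// A plain Vec<i64> stands in for the VM's stack here.
fn trace_stack() -> Vec<i64> {
    let mut stack: Vec<i64> = Vec::new();
    stack.push(2);                  // PUSH_U8 2 -> [2]
    stack.push(3);                  // PUSH_U8 3 -> [2, 3]
    let a = stack.pop().unwrap();   // ADD pops its two operands...
    let b = stack.pop().unwrap();
    stack.push(a + b);              // ...and pushes the sum -> [5]
    let _ = stack.pop();            // POP discards the top -> []
    stack                           // FIN: execution ends
}

fn main() {
    println!("final stack: {:?}", trace_stack());
}
```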
That is our complete VM so far, and it will execute a program, if you compile and run it (which we will do in the next post).
The easy way to get the code and play with it is to clone the git repository and check out the tag v0.0.1-journey. If you did not understand any of that, you might want to do a tutorial on git before you continue reading. Anyway, here are some copy&paste commands you can hack into your bash prompt to do what I just told you to do. Use at your own risk, I'm not responsible for what you do to your system.
This will copy all source code and its history from GitHub to your computer, and it will roll the source code back to the state we are looking at in this entry. The last command, cargo run lovem, will compile and execute the program - that is, if Rust is installed and ready to run (and in the correct version). cargo is Rust's package manager; it handles dependencies and compiles your projects. I will not explain those things further, but now you know what to look for.
Don't let yourself be confused by fancy terms like register. You can think of it as a kind of snobbish variable with a special meaning. In computers sometimes stuff magically happens when you write to a register – but it should always be documented somewhere. ↩
\ No newline at end of file
diff --git a/2022-07/all-new-once-more.html b/2022-07/all-new-once-more.html
new file mode 100644
index 0000000..2f1fb1a
--- /dev/null
+++ b/2022-07/all-new-once-more.html
@@ -0,0 +1,2 @@
+ All new once more - Lovem
Reality strikes again, and code will be written from scratch once more. And the reason is this site.
You want me to get to the code. And I really should. I have written so much already, and I want to show it, but there is so much around it. And after I had written up a long text on how I started, I realised that I had no commits during the early state. So I had to write it all again, slower, and with code to be presentable in this journal.
If you are reading this live (and no-one is, because I did not even tell anyone I am doing this), you can of course look at the code I was writing earlier, it exists. I put it in a branch too-early. But I will not give explanations for that. I am rewriting it on the master branch, and that is what will be shown and discussed in the journal. I advise you to wait for that.
Yes, it will take a while. As it looks now, it will be slow. But I have written some new posts on the new code already, and I think it is worth it. There will be more background before we get there. Next entry will be a longer one, so there is that.
\ No newline at end of file
diff --git a/2022-07/assemble.html b/2022-07/assemble.html
new file mode 100644
index 0000000..3e6e169
--- /dev/null
+++ b/2022-07/assemble.html
@@ -0,0 +1,94 @@
+ Assemble! - Lovem
We introduce an API for assembly to our lovem library.
Last time, we built the frame of a command line program that will become our new assembler, lovas. It is time that we give that program the power to assemble.
lovas.rs is just the executable wrapper around the actual assembler, which will live inside the library. All lovas.rs does is supply the command line interface. And that CLI part does not belong in a library function. We got it nicely separated. And programs using the library can assemble source to bytecode themselves, without calling an external binary.
We alter lovas.rs a bit. The part that just printed out the source lines is gone. We replace it with a call to a new library function that can turn assembly code into bytecode:
fn main() -> Result<()> {
+    ... the same as before ...
+
+    // run the assembler:
+    match asm::assemble(&name, &content) {
+        Ok(pgm) => {
+            // we succeeded and now have a program with bytecode:
+            println!("{:?}", pgm);
+            Ok(())
+        },
+        Err(e) => {
+            // Something went wrong during assembly.
+            // Convert the error report, so that `anyhow` can do its magic
+            // and display some helpful error message:
+            Err(Error::from(e))
+        },
+    }
+}
+
The important part is the call to asm::assemble(&name, &content). We created a new module asm inside our lib. It exposes only a single function, assemble, and a few types for error handling. There will be a lot to unpack inside that module.
The good news for us is: we do not need to restrain ourselves as much as we do in the VM itself. Resource usage is not really an issue here, because the assembler is not meant to run in a restricted environment. The idea of lovem is that you write your programs elsewhere, outside the restricted environment, and only run the compiled bytecode in the VM on the restricted device. And since the scope handled by the assembler will still be defined by that restricted device, we expect to only write relatively small and simple programs. With the modern computers used for assembling, we can use as much memory as we want.
Oh, by the way... Yeah, I seem to stick to these short, cryptic names for the parts of lovem. VM, Pgm, op, asm - I kinda like it that way, and it goes well with the register names etc. That feels right for something as low-level as a VM. And I do my best to always document those things properly, so that your IDE of choice will always show you what each thing is.
I wrote a very basic assembler inside asm.rs, and it is already over 250 lines long. Quite a lot to unpack. As before, I try to explain as much as possible inside the source code itself, using comments. This makes it easier to follow, and you can even do so inside the source in the repo, without reading this blog.
There are four types that I introduce inside the mod:
/// Errors that can happen during assembly.
+#[derive(Debug, Clone)]
+pub enum AsmError {
+    InvalidLine,
+    UnknownInstruction(String),
+    UnexpectedArgument,
+    MissingArgument,
+    InvalidArgument,
+}
+
+/// Report of a failed assembly attempt.
+///
+/// Wraps the error that occurred during assembly and supplies information on where it did.
+#[derive(Debug)]
+pub struct AsmErrorReport {
+    /// Name of the program that failed to assemble.
+    name: String,
+    /// Line the error occurred on during assembly.
+    line: usize,
+    /// Error that occurred.
+    error: AsmError,
+}
+
+/// A single instruction parsed from the line of an assembly program.
+#[derive(Debug)]
+struct AsmInstruction {
+    /// Number of the line the instruction was read from.
+    ///
+    /// The number of the line the instruction was taken from, most likely
+    /// from a source file. Line counting starts at 1.
+    line_number: usize,
+    /// Opcode defining which operation is to be executed.
+    opcode: u8,
+    /// Arguments used for execution of the operation.
+    ///
+    /// Zero or more bytes.
+    oparg: Vec<u8>,
+    /// Position inside bytecode (starting at 0).
+    ///
+    /// Number of bytes that come before this instruction in the program.
+    pos: usize,
+}
+
+/// An assembler program during parsing/assembling.
+#[derive(Debug)]
+struct AsmPgm {
+    /// Name of the program (just a string supplied by the caller).
+    name: String,
+    /// Vector of parsed assembler instructions, in the order they appear in the source file.
+    instructions: Vec<AsmInstruction>,
+    /// Current line number during parsing.
+    ///
+    /// Used for error reporting.
+    line_number: usize,
+    /// Current position inside bytecode during parsing.
+    ///
+    /// Used to calculate the exact position an instruction will be at in the bytecode.
+    text_pos: usize,
+    /// The error that happened during parsing/assembling, if any.
+    error: Option<AsmError>,
+}
+
AsmError is easy enough to understand. We used the same idea for the RuntimeError inside the VM. When we run into an error while trying to assemble the program, we return Err(AsmError) instead of Ok(()), so that we can propagate what happened back to the caller. The nice thing is that with descriptive names for the enum values, and with the occasional embedded value (as in UnknownInstruction(String)), the debug representation of the AsmError alone is enough to make the user understand what error was detected.
AsmErrorReport is a little wrapper we use to add the information on where we ran into an error. InvalidArgument is a nice hint for how to fix your program - but if that program is 2000 lines long, then good luck. When you know the InvalidArgument happened in line 1337, you will find it much faster. Especially in an assembly language, which never has more than a single instruction per line.
AsmInstruction is used to represent a single instruction inside a program. So each instance of this type will be linked to a specific line in the source file. If you don't remember what counts as an instruction in lovem (at least at the time of writing), let me repeat: an instruction consists of exactly one operation that is to be executed, which is identified by its opcode (a number from 0x00 to 0xff stored in a single byte). Each instruction has zero or more bytes used as an argument, defining how the operation is to be executed. This argument is called the oparg. We will also store the number of the line we found our instruction on inside the source code, and the position inside the bytecode where the instruction will be.
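As a concrete illustration of that layout (one opcode byte, followed by zero or more oparg bytes), here is a sketch of how a tiny program serializes, using the opcode values from op.rs shown earlier (PUSH_U8 = 0x02, ADD = 0x10, FIN = 0xff). The `pos` of each instruction is just the number of bytes that come before it:

```rust
/// Serialize `push_u8 2; push_u8 3; add; fin` to bytecode:
/// each instruction is its opcode byte followed by its oparg bytes.
fn encode() -> Vec<u8> {
    let mut text = Vec::new();
    text.extend([0x02, 2]); // push_u8 2 -> pos 0, 2 bytes (opcode + 1B oparg)
    text.extend([0x02, 3]); // push_u8 3 -> pos 2
    text.push(0x10);        // add       -> pos 4, no oparg
    text.push(0xff);        // fin       -> pos 5, no oparg
    text
}

fn main() {
    println!("{:?}", encode());
}
```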
AsmPgm will represent the complete program during the assembly process. We collect the instructions we parse from the source in there, in a Vec. And we will hold the progress during parsing/assembling. This is not the type that will be returned to the caller; it is only used internally (as you can guess from the fact that it is not defined pub).
/// Parse assembly source code and turn it into a runnable program (or create report).
+pub fn assemble(name: &str, content: &str) -> Result<Pgm, AsmErrorReport> {
+    let asm_pgm = AsmPgm::parse(name, content);
+    asm_pgm.to_program()
+}
+}
+
It will return an AsmErrorReport, if anything goes wrong and the assembling fails. If the assembler succeeds, it returns an instance of Pgm. Now where does that come from? Our VM takes programs in form of a &[u8]. That will be changed soon, and then it will run programs from a special type Pgm that might have a bit more than just bytecode. I added another new module to the library: pgm.rs. That one is tiny and only holds the new struct Pgm – which itself is basic. But we have a type that holds a program, now. I believe that will be beneficial to us later.
/// Holds a program to be executed in VM.
+#[derive(Debug)]
+pub struct Pgm {
+    /// Some name identifying the program.
+    pub name: String,
+    /// Bytecode holding the program's instructions.
+    pub text: Vec<u8>,
+}
+
What is it that the assembler does to create such a Pgm? We will start to go through that in the next entry. This has been enough for today.
The source code for this post can be found under the tag v0.0.8-journey.
\ No newline at end of file
diff --git a/2022-07/becoming-social.html b/2022-07/becoming-social.html
new file mode 100644
index 0000000..1acb561
--- /dev/null
+++ b/2022-07/becoming-social.html
@@ -0,0 +1,2 @@
+ Becoming social - Lovem
After a few days of progress on the project itself, I spent a bit of time on the site again. We have the fancy link to our GitHub repo in the upper right corner now. But more importantly, I added support for comments on my entries. You can now react and ask questions or share your thoughts.
I am using giscus.app (and, again, I copied that idea from @squidfunk and their site on mkdocs-material, which is what I did for this complete site, more or less). Giscus is an open source app that stores the comments completely inside GitHub discussions, so the content is stored along the lovem repository and at the one place where everything is stored already anyway. If you want to participate in the comments, you need to log in using your GitHub account. That is great, because I don't need to care about user management, nor about any database.
Feel free to use this entry to try out the new feature, because that is what I am gonna do!
I have had it with these motherloving bytes in this motherloving bytecode!
By now you should have come to a realisation: writing bytecode sucks! It wasn't fun to begin with, but now that we introduce jumps in our code, we need to count how many bytes the jump takes – and that with instructions that have different numbers of bytes as opargs. Encoding negative numbers in bytes is also no fun. And just think about it: if you change your program (e.g. add a few instructions), you have to adjust those relative jumps! How horrible is that? Can't someone else do it? Well, yeah, of course. We invented a machine that can do annoying and monotone tasks that require accuracy and that must be done over and over again. That machine is, of course, the computer.
Well, lucky us, that we know how to tell a computer what it should do. So let's write a program, that writes bytecode for us. I am not talking about compiling a programming language into our VM; at least not yet, not for a long time. But something that lets us write those instructions in a way that is at least a bit more human friendly.
Maybe you remember that I already tried to write some of the bytecode programs I showed you in a more readable way, like this:
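The listing itself got lost on this page; it was in the style of the lovem assembly shown elsewhere in this journal, something like this (a reconstruction for illustration, not the original listing):

```
push_u8 123
push_u8 77
add
pop
fin
```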
The listing up there looks a bit like assembler code. And in the earlier draft of lovem I already wrote a program that could translate those listings into bytecode. We will do that again, together. But this will take us some time (that is, multiple journal entries). We need to acquire some additional Rust skills for that. And there is so much to explain inside that assembler program itself.
Once again, I am making this up along the way. Yes, I have a plan, but I will just start to introduce syntax for the assembler, and it might not be ideal. That means I might change it all again later. As the VM itself, our assembler will be experimental. You are welcome to give me ideas for the syntax; we do have the comments now, under each post, feel free to use them. There is the whole GitHub discussions page as well. And you can still find me on Twitter. Find the link at the bottom of this page.
The assembler will be a binary that you call with parameters. A typical command line tool, just like gcc or rustc. So what we need to do is learn how one writes a command line tool in Rust. One that can read files, because I plan to write assembly programs in text files. And I have no desire to start parsing command line arguments myself. Neither do I want to write an introduction on writing command line tools in Rust. All this has been done. So I kindly direct you to an online book:
That is where I got what I will be using here. They use a crate called clap, which seems to be the most used lib for building command line tools in Rust. It takes about 10 minutes to read. Finding out how to use the options of clap that I want took longer, but that will not be a thing for you, as I will just be using those options.
This is the first time we are using external crates in Rust. We need to add our dependencies to Cargo.toml, before we can use them:
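The relevant section of Cargo.toml might look something like this (the exact version numbers here are assumptions, not taken from the repository):

```toml
[dependencies]
# command line argument parsing, using the derive feature:
clap = { version = "3", features = ["derive"] }
# ergonomic error handling for applications:
anyhow = "1"
```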
Now let us start with the assembler. We create a new binary that will become our assembler: lovas.rs
//! An experimental assembler for lovem
+use clap::Parser;
+use anyhow::{Context, Result};
+
+/// Struct used to declare the command line tool behaviour using clap.
+///
+/// This defines the arguments and options the tool provides. It is also used to
+/// generate the instructions you get when calling it with `--help`.
+#[derive(Parser, Debug)]
+#[clap(name = "lovas",
+    long_about = "An experimental assembler for lovem, the Low Overhead Virtual Embedded Machine.",
+)]
+struct Cli {
+    #[clap(parse(from_os_str), help = "Path to assembler source file.")]
+    source: std::path::PathBuf,
+}
+
+fn main() -> Result<()> {
+    // read, validate, and evaluate command line parameters:
+    let args = Cli::parse();
+    // read complete source file into String:
+    let content = std::fs::read_to_string(&args.source)
+        .with_context(
+            || format!("could not read file `{}`", args.source.as_path().display().to_string())
+        )?;
+    // For now, just print out all the lines in the file:
+    for (n, line) in content.lines().enumerate() {
+        println!("{:4}: '{}'", n + 1, line);
+    }
+    // We succeeded in our work, so return Ok() as a Result:
+    Ok(())
+}
+
As it happens with Rust, the code is very dense. I try to explain what I do inside the code using comments. This does not look like it does too much. Yet it does. You can call it using cargo run --bin lovas, as we learned earlier:
kratenko@jotun:~/git/lovem$ cargo run --bin lovas
+
+ Finished dev [unoptimized + debuginfo] target(s) in 0.02s
+ Running `target/debug/lovas`
+error: The following required arguments were not provided:
+ <SOURCE>
+
+USAGE:
+ lovas <SOURCE>
+
+For more information try --help
+
That is already a lot! It finds out that you did not supply a required argument and tells you in a somewhat understandable error message. We did not write any of that. And it even directs you how to get help: add --help to your call.
Now, if we use cargo to run our binary, we need to add an extra bit to the call, because we need to tell cargo where its own arguments end, and where the arguments to the called binary begin. This is done (as is customary) by adding --, to indicate the end of cargo's arguments. So if we want to pass --help to lovas, we can do it like this:
kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- --help
+
+ Finished dev [unoptimized + debuginfo] target(s) in 0.02s
+ Running `target/debug/lovas --help`
+lovas
+An experimental assembler for lovem, the Low Overhead Virtual Embedded Machine.
+
+USAGE:
+ lovas <SOURCE>
+
+ARGS:
+ <SOURCE>
+ Path to assembler source file.
+
+OPTIONS:
+ -h, --help
+ Print help information
+
How helpful! Also, now you can see why I added those two strings to our Cli struct; they show up in the help message.
It looks like we need to give it a file to read, if we want the program to succeed and not exit with an error. I did write a little assembly program that we can use: hallo-stack.lass. Our assembler will not do anything too useful with it, because we did not write an assembler, yet. It will simply print out the lines of the file, prefixed with the line number (the call to .enumerate() is what I use to count the lines, while iterating over them).
Well - if you have not done so, read the book I linked. At least up until chapter 1.4, I guess; that is what we need for now.
And try to trigger some errors when calling lovas. What if the file you tell it to open does not exist? What if it cannot be read? Do you understand how those error messages propagate through the program and end up as a readable message in your console?
The source code for this post can be found under the tag v0.0.7-journey.
\ No newline at end of file
diff --git a/2022-07/early-vm-decisions.html b/2022-07/early-vm-decisions.html
new file mode 100644
index 0000000..ca1b6f9
--- /dev/null
+++ b/2022-07/early-vm-decisions.html
@@ -0,0 +1,2 @@
+ Early VM decisions - Lovem
Many design decisions must be made for lovem. Here I talk about some of those in the current state.
I have shared and discussed source code in the recent posts. Now it is time again, to write about design decisions. I made a few of them for the code you saw. So far I have not been reasoning about those here, and some of you might have wondered already. Let's talk about them.
Let me remind you: lovem is a research project for myself. And an education project for myself as well. None of my choices at this stage are set in stone. I will make lots of mistakes that I will have to change later. I even choose some paths that I know I will be leaving again. I might just take any solution for a problem at this stage, as I do not know what the right choice is. So start somewhere, see where it goes. Some of those are deliberately weird or bad choices, but they make things clearer or simpler at this stage.
Let us address two of those choices you can find in the current source code.
I talked about register sizes defining architecture, back in What is a Virtual Machine anyway?. And then I went totally silent about that topic and just used i64 as the type for my stack. Is that a good idea? I used it for simplicity. The idea goes back to when I was experimenting with using a register machine for lovem. Having a single datatype that can handle big values keeps things simple. After all, other languages/VMs use some version of float as their single numeric datatype:
JavaScript
JavaScript Numbers are Always 64-bit Floating Point
Unlike many other programming languages, JavaScript does not define different types of numbers, like integers, short, long, floating-point etc.
JavaScript numbers are always stored as double precision floating point numbers, following the international IEEE 754 standard.
Well, reducing complexity is good. But having each little number you use in your programs eat up 8 bytes of memory does not sound low overhead to me. And that is, after all, the goal. So I guess, that will change in the future. But let's keep it for the time being. There will be some interesting things we will be doing in the near future; even if we might dump those features later. I already implemented them during the early phase (when I was not writing a public journal), so not adding them here would be insincere. Having 64 bit values is a part of our journey.
I have no glossary yet, so you have to live with me inventing terms on the spot. I used that word in the source code already. What I mean by it are the arguments to an instruction inside the bytecode, that follow the opcode and influence the operation. They are the arguments you give inside your program's code.
As of v0.0.3-journey we only have a single opcode that takes an oparg, and that is push_u8. You can see how there is a fetch_u8() call in the code that handles that operation, and none in the other operations. See execute_op.
So we have different behaviour depending on the opcode. push_u8 fetches an additional byte from the bytecode, the other opcodes do not. Existing VMs handle this differently. The Java VM, for example, has a dynamic number of opargs, too. They call them operands:
2.11. Instruction Set Summary
A Java Virtual Machine instruction consists of a one-byte opcode specifying the operation to be performed, followed by zero or more operands supplying arguments or data that are used by the operation. Many instructions have no operands and consist only of an opcode.
The Python VM, on the other hand, uses exactly one byte as oparg on all instructions:
The bytecode can be thought of as a series of instructions or a low-level program for the Python interpreter. After version 3.6, Python uses 2 bytes for each instruction. One byte is for the code of that instruction which is called an opcode, and one byte is reserved for its argument which is called the oparg.
[...]
Some instructions do not need an argument, so they ignore the byte after the opcode. The opcodes which have a value below a certain number ignore their argument. This value is stored in dis.HAVE_ARGUMENT and is currently equal to 90. So the opcodes >=dis.HAVE_ARGUMENT have an argument, and the opcodes < dis.HAVE_ARGUMENT ignore it.
That does remove some complexity. And it adds new complexity: opcodes with more than one oparg byte exist in Python, and they are handled with a special opcode that adds an additional oparg byte. I think it will make execution faster, as fetching can be done in advance. If you do not know how many bytes you need before you read your opcode, you cannot prefetch the next instructions.
For our goal, keeping the bytecode small is much more important than execution time. So I am pretty sure we will stick with the dynamic number of oparg bytes in lovem.
The source code for this post can be found under the tag v0.0.3-journey.
\ No newline at end of file
diff --git a/2022-07/go-ahead-and-jump.html b/2022-07/go-ahead-and-jump.html
new file mode 100644
index 0000000..b8ed9e2
--- /dev/null
+++ b/2022-07/go-ahead-and-jump.html
@@ -0,0 +1,124 @@
+ Go ahead and jump! - Lovem
All our programs have been linear so far. Let's build the base for jumping around.
In every program we have written so far, each instruction just advances the PC, until we reach the end. That is very linear. We will now introduce a new opcode that jumps to a different position in the program.
How do we implement that? That is actually quite easy. Do you remember what I said about the PC? It is a special register that always points to the instruction in the bytecode that is executed next. So all our operation needs to do is modify the PC. We will give that opcode an oparg of two bytes, so we can tell it where to jump to. Here is our new opcode in op.rs:
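The opcode listing did not survive on this page. A sketch in the same style as the other constants in op.rs would look like this — note the concrete value 0x21 is an assumption for illustration, not taken from the repo:

```rust
/// opcode: Jump to a relative position in the bytecode.
///
/// pop: 0, push: 0
/// oparg: 2B, i16 relative jump
pub const GOTO: u8 = 0x21; // NOTE: the value 0x21 is assumed, not from op.rs

fn main() {
    println!("GOTO = 0x{:02x}", GOTO);
}
```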
Now we have the dreaded goto. Don't be scared - on bytecode level, that is all well. We are not designing a high level language here, there will be gotos. But how do we fetch an i16 from our bytecode? So far we can only fetch u8. So we add some more fetching:
/// Reads the next byte from the bytecode, increases the program counter, and returns the byte.
+fn fetch_u8(&mut self, pgm: &[u8]) -> Result<u8, RuntimeError> {
+    if let Some(v) = pgm.get(self.pc) {
+        self.pc += 1;
+        Ok(*v)
+    } else {
+        Err(RuntimeError::EndOfProgram)
+    }
+}
+
+/// Reads the next byte from the bytecode, increases the program counter, and returns the byte as i8.
+fn fetch_i8(&mut self, pgm: &[u8]) -> Result<i8, RuntimeError> {
+    if let Some(v) = pgm.get(self.pc) {
+        self.pc += 1;
+        Ok(*v as i8)
+    } else {
+        Err(RuntimeError::EndOfProgram)
+    }
+}
+
+/// Reads the next two bytes from the bytecode, increases the program counter by two, and returns them as i16.
+fn fetch_i16(&mut self, pgm: &[u8]) -> Result<i16, RuntimeError> {
+    let hi = self.fetch_i8(pgm)? as i16;
+    let lo = self.fetch_u8(pgm)? as i16;
+    Ok(hi << 8 | lo)
+}
+
We already know fn fetch_u8(). fn fetch_i8() does almost exactly the same thing, only that it casts that byte from u8 to i8. Simple enough. Casting in Rust has the beautiful syntax <value> as <type>.
So why do we need i8? Because we are building an i16 from an i8 and a u8. Just a bit of bit arithmetic. We can pass potential EndOfProgram runtime errors on easily with ? and Result. It allows us to write short but still easy-to-read code, I think. So now we can fetch the value we need for our jump. So let us write the handler for the opcode in fn execute_op() of vm.rs.
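To make that bit arithmetic concrete, here is a standalone sketch of how the two fetched bytes combine into an i16 — the high byte is read as i8, so its sign carries over into the result:

```rust
/// Combine a high byte (read as i8, so the sign carries over)
/// and a low byte (read as u8, zero-extended) into one i16,
/// the same way fetch_i16 does.
fn to_i16(hi: u8, lo: u8) -> i16 {
    let hi = (hi as i8) as i16; // sign-extend the high byte
    let lo = lo as i16;         // zero-extend the low byte
    hi << 8 | lo
}

fn main() {
    // the oparg bytes 0xff, 0xfb decode to -5:
    println!("{}", to_i16(0xff, 0xfb));
}
```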
Yeah - Rust does not allow us to do calculations with different types of integers. We need to explicitly cast everything. Rust tries to avoid ambiguity, so no implicit conversions. And, to be honest, the compiler has a good point. We should care even more about that calculation; we want our VM to be robust. We change the handler to:
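The handler listing did not survive on this page, so here is a standalone sketch of the core of it: the PC update done with checked arithmetic instead of a plain `+`, so that an out-of-range jump can become a runtime error rather than a panic. The function name and shape here are mine, not from vm.rs:

```rust
/// Sketch of the checked PC update a relative jump needs.
/// `pc` is the program counter after both oparg bytes were fetched,
/// `d` is the fetched i16 offset. Returns the new PC, or None when
/// the jump would leave the addressable range (underflow).
fn jump(pc: usize, d: i16) -> Option<usize> {
    pc.checked_add_signed(d as isize)
}

fn main() {
    // In the endless-loop example, pc is 5 after fetching `goto -5`:
    println!("{:?}", jump(5, -5));
    // jumping back past the start of the program must fail:
    println!("{:?}", jump(1, -5));
}
```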
Now, let us write a new program that uses the goto opcode:
//! Create a VM and run a small bytecode program in it.
+//!
+//! This demonstrates the goto operation with an endless loop.
+use lovem::{op, VM};
+
+fn main() {
+    // Create a program in bytecode.
+    // We just hardcode the bytes in an array here:
+    let pgm = [op::PUSH_U8, 123, op::GOTO, 0xff, 0xfb, op::FIN];
+    // Create our VM instance.
+    let mut vm = VM::new(100);
+    // Execute the program in our VM:
+    match vm.run(&pgm) {
+        Ok(_) => {
+            println!("Execution successful.")
+        }
+        Err(e) => {
+            println!("Error during execution: {:?}", e);
+        }
+    }
+}
+
I will write that bytecode down in a more readable format again:
push_u8 123
+goto -5
+fin
+
Only 3 instructions. And the fin will never be reached. That 0xff, 0xfb after the op::GOTO is the 2 byte oparg: an i16 with the value -5. But why -5? When the goto is executed, we have read both oparg bytes, so the PC points to the fin at index 5. So adding -5 to it will set the PC to 0. The next executed instruction will be the push_u8 once again. This is an endless loop. So will the program run forever? What do you think will happen? Let's try:
There is a push_u8 operation in our endless loop. So it will fill our stack until it is full! The program hits a runtime error after 200 executed instructions. Great, now we tested that, too.
That is not very dynamic. We want to make decisions! We want to choose our path. What we want is branching. We will introduce a new opcode that will decide which branch the execution of our program will take, based on a value at runtime. If this sounds unfamiliar to you, let me tell you what statement we want to introduce: it is the if statement.
So, how does that work? As mentioned, normally the PC is incremented on each byte we fetch from the bytecode. And the PC always points to the next instruction, that will be executed. So if we want to change the path of execution, what we have to do is change the value of the PC.
An operation that simply changes the PC statically would be a GOTO statement. But there is no branching involved in that; the path that will be executed is always clear. The if statement on the other hand only alters the PC if a certain condition is met.
/// opcode: Branch if top value is equal to zero.
+///
+/// pop: 1, push: 0
+/// oparg: 2B, i16 relative jump
+pub const IFEQ: u8 = 0x20;
+
Our new operation pops only one value. So what does it get compared to? That's easy: zero. If you need to compare two values to each other, just subtract them instead, and then you can compare with zero. That gives the same result.
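That subtract-then-compare trick as a two-line sketch — plain Rust standing in for what a subtraction op followed by ifeq would do on the stack:

```rust
/// ifeq can only compare against zero, so `a == b` is expressed
/// as `a - b` (what a subtraction op leaves on the stack) == 0.
fn equal_via_sub(a: i64, b: i64) -> bool {
    (a - b) == 0
}

fn main() {
    println!("{} {}", equal_via_sub(7, 7), equal_via_sub(7, 3));
}
```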
And what kind of oparg does this operation take? A signed integer. That is the value that should be added to the PC, if our condition is met. This will result in a relative jump.
Same as always: write some bytecode. Try some jumping around. Run into trouble! You can write a program that has a fin in the middle, but executes code that lies behind that instruction.
The source code for this post can be found under the tag v0.0.6-journey.
\ No newline at end of file
diff --git a/2022-07/index.html b/2022-07/index.html
new file mode 100644
index 0000000..7844ec7
--- /dev/null
+++ b/2022-07/index.html
@@ -0,0 +1 @@
+ Journal entries from July 2022 - Lovem
So far we have read an assembly source file into a string, and we got to know some new data structures. It is time we use the one to fill the other. Let us start parsing.
We introduce an API for assembly to our lovem library.
Last time, we built the frame of a command line program that will become our new assembler, lovas. It is time that we give that program the power to assemble.
I have had it with these motherloving bytes in this motherloving bytecode!
By now you should have come to a realisation: writing bytecode sucks! It wasn't fun to begin with, but now that we introduce jumps in our code, we need to count how many bytes the jump takes – and that with instructions that have different numbers of bytes as opargs. Encoding negative numbers in bytes is also no fun. And just think about it: if you change your program (e.g. add a few instructions), you have to adjust those relative jumps! How horrible is that? Can't someone else do it? Well, yeah, of course. We invented a machine that can do annoying and monotone tasks that require accuracy and that must be done over and over again. That machine is, of course, the computer.
All our programs have been linear so far. Let's build the base for jumping around.
In every program we have written so far, each instruction just advances the PC, until we reach the end. That is very linear. We will now introduce a new opcode that jumps to a different position in the program.
We are using the design of a stack machine to efficiently execute some calculations.
The way stack machines work can be used in programs that execute calculations. We will look at it by implementing an example from the Wikipedia page about stack machines.
The basic operation of the VM is working. Let us add a few more opcodes, so that we can do calculations.
We have created a Rust library that holds our virtual stack machine. We can now add multiple executables to it, which makes it easier to write different programs and keep them around (to mess with the VM). We will add a few more opcodes to our repertoire, because only adding numbers is just plain boring.
Many design decisions must be made for lovem. Here I talk about some of those in the current state.
I have shared and discussed source code in the recent posts. Now it is time to write about design decisions again. I made a few of them for the code you have seen, but so far I have not reasoned about them here, and some of you may have wondered already. Let's talk about them.
We turn our project from a binary project into a library project.
So far, our lovem cargo project holds a single binary. That is not very useful for something that should be integrated into other projects. What we need is a library. How is that done? Simple: we rename our main.rs to lib.rs.
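As a rough sketch of the resulting layout (cargo's conventional paths; modulo.rs appears later in this journal, any other bin file names would be your own):

```
src/
├── lib.rs        # the library: VM, opcodes, errors, ...
└── bin/          # every file here becomes its own executable
    └── modulo.rs # a test program, using the library via `use lovem::...;`
```

Each file under src/bin/ is compiled by cargo into its own binary that links against the library crate.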
After a few days of progress on the project itself, I spent a bit of time on the site again. We have the fancy link to our GitHub repo in the upper right corner now. But more importantly, I added support for comments on my entries. You can now react, ask questions, or share your thoughts.
After we got our Proof of Concept running, we clean up our code and make it look like a respectable Rust program.
Did you play around with the program from the previous post? If you are new to Rust, you really should! At least mess around with our bytecode. You will find that our VM does not react well to errors, yet. It simply panics! That is no behaviour for a respectable Rust program.
Now, that we have a VM, we will run a program on it.
So we built our very first VM and studied the code in detail. It is time to execute a program on it and look at its output. We will look at every single step the program takes. Aren't we lucky that our VM is so talkative during execution?
The first draft of the source code that will be our VM, explained.
I dumped some source code in front of you, and then I started to talk about programming languages. Time now to explain what I did and why. We only have 132 lines, including comments. We will go through all parts of it. And I will talk a little about how Rust's basic syntax works while I use it. Not too much, since it is not good Rust code yet, but enough to help you start. This will be a longer entry.
Finally, I will be showing some source code. Not directly in the journal, but I will link you to GitHub, for a start.
I have written code. And this time, I (re-)started lovem in a public git repository, so you can see what I do, if you are interested. And I hope it puts enough pressure on me to keep at the project for a while.
So I have been talking a lot about VMs without doing anything concrete. Well, that is not true; I have done quite a bit already, but I am still describing earlier steps. We will get there.
When I was looking around for a scripting language to use inside our embedded devices, I came across an article I mentioned in an earlier post: Creating a Virtual Machine/Register VM in C.
Reality strikes again, and code will be written from scratch once more. And the reason is this site.
You want me to get to the code. And I really should. I have written so much already, and I want to show it, but there is so much around it. And after I had written up a long text on how I started, I realised that I had no commits from that early stage. So I had to write it all again, slower, and with code that is presentable in this journal.
Since I am always focused on my work on lovem, I will never get sidetracked. Unrelated: I spent a few days on reworking the journal on this site.
So, no update on the core project today, sorry. I was very unhappy with my first solution for how the Journal entries were created. Way too much to do by hand – that is not what I learned programming for. But mkdocs is Python, and Python I can do. So I did. And now I can write my Journal entries (like this one) as plain Markdown files with very few metadata entries. And I get entries in the navigation and pages listing the whole month. I even included a single-page version of the journal covering a whole month. I feel it is quite fancy. I will need to do a bit of work on the static content of the site, but one step at a time.
It looks so weird - Lovem
It is not the original initial commit, as I did commit way too late, and it was not suitable for writing a story about it. So I created a new, clean version, with just very simple concepts that I can explain in a single entry. In the next entry, that is.
If you are thinking: "What is that weird source code?", then you are in for a real treat (and a lot of pain), should you choose to follow up. The code you are seeing is written in Rust.
Why Rust? Because Rust! Writing Rust can feel so good! And for something like a VM, it is such a good choice. If you have never heard of the language (or heard of it, but never looked into it), it is hard to understand why this is so. My advice: try it! Use it! Or read along with this journal, code along, and you might like it.
When you start, chances are high that you will not like Rust. The compiler is a pedantic pain in the ass. But at the same time it is incredibly polite, trying to help you find out what you did wrong and suggesting what you might want to do instead. And Rust really, really tries to keep you from shooting yourself in the foot. It tries to make common mistakes impossible, or at least hard to do – those mistakes that happen everywhere in C/C++ programs and their like. Yes, those mistakes that cause the majority of all security problems and crashes. Buffer overruns, use after free, double free, memory leaks – to name just some common ones from the top of my head. And Rust does all it can to rule those mistakes out at compile time! So it does not even add runtime overhead. That is so powerful!
And it is so painful. Half of the things you do when writing C/C++ you will not be able to do in Rust in the same way. Every piece of memory is owned. You can borrow it and return it, but it cannot be owned in two places at once. And if any part of the program has write access to it, no other part may have any access. This makes some data structures complicated or impossible (there are ways around it), and you will have to think quite differently. But if you give in to that way of thinking, you can gain so much. Even peace of mind, as the coding world looks a lot saner inside Rust source code. This will, of course, come at the price that all code in other languages starts to feel dirty to you, but that is the way.
Also, there are ways of writing code that you cannot retrofit onto a language that already exists. C and C++ will never be freed of their heritage; they will stay what they are, with all their pros and cons. Things are solved differently in Rust. Did I mention there is no NULL? And I have never missed it for a moment. Rust solves the problems other languages solve with NULL by using enums. That comes with certainty and safety all the way. There are no exceptions either; that problem is also solved by using enums. The way the language embraces them makes enums a really powerful feature! And there are a lot more convenient ways of organising code that I keep missing in my daily C/C++ life.
I will not write an introduction to Rust here. At least not your typical "how to get started in Rust" intro. There are a lot of those out there, and I am already 10 posts into my Journal without programming. Maybe the Journal will become a different kind of Rust introduction, as it tries to take you along a real project, as it develops, from the beginning on. I will run into problems along the way and try to solve them in Rusty ways. This might be a good way to start thinking in Rust. But, to be honest, I have never finished a project in Rust, yet. I got quite a bit running and functional, and I think, in some parts, in a Rust-like way. But this is a learning project for me as much as for anyone else. I will do weird things. The basics, though, I have worked with.
The initial learning curve will be steep! I try not to get too fancy in the first draft, so the code will not be good Rust at first! So, if you are shocked at how bad my Rust is: it will be very different, soon. But I want to give everyone a fair chance to hop on without understanding all the concepts. The initial code should not be too hard to follow if you know C/C++, I hope. Learning a new thing (writing a VM) in a new, quite different language is a mouthful, I know.
Yes, I did say that. And I do use those. It is not easy to change that when you have a certain amount of legacy code (and not much experience with the new language, which we do not really have, yet). But we do have a saying these days. Often, after a debugging session that lasted for hours, when we find the bug, understand it and fix it, there is this realisation that fits in the sentence:
"Mit Rust wär' das nicht passiert." — "This would not have happened with Rust."
So, this will not happen to me with this project, because those things will not happen with Rust!
Let there be source code - Lovem
Finally, I will be showing some source code. Not directly in the journal, but I will link you to GitHub, for a start.
I have written code. And this time, I (re-)started lovem in a public git repository, so you can see what I do, if you are interested. And I hope it puts enough pressure on me to keep at the project for a while.
In fact, there is quite a bit of code there already. I started coding before writing any of this, and it went so well. I like how it feels. I was working every hour I could spare. When a friend asked me what I was doing, I started into a somewhat complex backstory of why I was doing it, instead of actually explaining any of the stuff I was doing – and was interrupted quite early, so there was still more to tell. The next day, I sat down and started to write all of that down as a little story. I wanted to put it somewhere, so I started this journal to publish it. And I decided to do it in blog form, so I am publishing that background story bit by bit.
So, as of writing this, there is a lot of work completed on the VM. It is amazing what it can do for how little code there is. When this post goes public, there should be quite a lot more done...
I plan to continue sharing my thoughts while I work on the VM. So you will be able to follow my failures and see the attempts that I will be ditching later. I think the format of this journal can work out, but we will see how I like it over time. It will be behind on progress, as I want to take time to share things as they unfold. And this should help to produce a somewhat continuous publication stream. Git being what it is, it should support me in showing you the things I did back in time, using the power of commits.
As things are with blogs, my entries will be very different, depending on what I want to tell and on what I did. So far most posts were conceptual thinking, some research, and a lot of blabla, which I tell because it interests me. In the future, there should be concrete problems I find and solve in source code - or fail to solve.
My original first commit was way too late and contained way too much code. Also, I did not plan to show it to you like this, back then. So, as mentioned before, I rolled back and started again, with more commits. And I am keeping tags now, so that I have well-defined versions for my blog posts. That should make it easy for you to follow along, if you want to.
The new, artificial "first commit" is now a tag/release: v0.0.1-journey. You can view the code for any tag online, this one you will find under:
I think this will be a theme of this journal: linking you to what I did, when I am writing about it. And I will try to share my trains of thought leading to my decisions (and errors, as there will be). I will do that for v0.0.1-journey soon, don't worry; I will explain everything I did. But the next journal entry will be about some decisions again; mainly about the language I am using.
Making virtual a reality - Lovem
So I have been talking a lot about VMs without doing anything concrete. Well that is not true, I have done quite a bit already, but I am still describing earlier steps. We will get there.
When I was looking around for a scripting language to use inside our embedded devices, I came across an article I mentioned in an earlier post: Creating a Virtual Machine/Register VM in C.
Reading it made me want to try working with a register machine, mainly because I have not done stuff like this since my early semesters. Never hurts to refresh rusty knowledge.
So I started designing a register VM, starting from that code, but more complex, with longer data words and longer instruction words, more registers, and so forth. For this project I came up with lovem as a working title. The name has stuck until now, two approaches and a year later. I also started implementing some concepts I still want to add to lovem in my current approach, but that is for a later post to discuss.
I was experimenting with a quite complicated instruction word encoding. I was trying to fit everything into a few bits (32 of them, if I recall correctly) with varying instruction code lengths and quite long arguments. I wanted to include instructions on three registers, which takes up quite some bits to address. Of course, you can get away with two-register operations only - or, if you are fancy, you can even use a single address or no address at all for most instructions. You will just end up with a lot of register swapping. I guess my rationale for having three addresses in an instruction was code size. For what I want to do, 32-bit instruction words feel quite long (4 bytes per instruction!). And every swap would mean another 4 bytes of program size. So I was trying to optimise for fewer operations by having more flexible instructions.
I do not even know if that rationale makes sense. I guess I would have needed to try different layouts to find out. Or maybe read more about the topic; other people have done similar things, I assume. But I never got that far. The experiment showed me that I do not want to build lovem as a register machine. I think building a clever register-based architecture for my goals would make it too complicated. I want simple. To reduce the VM's overhead, but also on principle. Complexity is the enemy.
I'm pretty sure that code still exists somewhere, but there is no sense in publishing it or even in me reading it again, so you will never see it. I think of it as a pre-study with a very useful conclusion: not a register machine.
So a stack machine it is! I have looked at a few during my research for lovem, studying instruction sets and design ideas. It is not the first time I have worked with those. In a different project (around the same time I started work on the register-based machine), I was starting to implement a stack machine. That one had a different aim and therefore very different challenges. It was more of an object-oriented approach with dynamic program loading and calling code in different programs. It could do quite a few things already, but it will never be continued. I learned a bit about calling conventions and found out that it is not so simple when you want to switch between multiple programs and objects. That is where the project got too frustrating for me (and some external events made it obsolete, so that is okay). But I take it for a pre-study on stack machines and calling conventions. Not that I have developed a proven concept for it, but I know about the problems there...
I had a PoC for lovem as a stack machine back then, too (right after I ditched the register approach). That code won't be published either, but the attempt showed me that this is the road I want to take for a serious approach at creating lovem.
I guess this concludes the prehistory of the lovem story. I am, for whatever reason, back on the project, currently with a decent amount of motivation. You never know how long that lasts, but right now I like the idea of continuing the development, while talking about the development process, sharing my thoughts on decisions I make. Next post should start on sharing newer thoughts.
More operations - Lovem
The basic operation of the VM is working. Let us add a few more opcodes, so that we can do calculations.
We have created a Rust library that holds our virtual stack machine. We can now add multiple executables to it, which makes it easier to write different programs and keep them around (to mess with the VM). We will add a few more opcodes to our repertoire, because only adding numbers is just plain boring.
I put some thought into which opcodes to introduce; but be advised that none of them are final. Not only is the VM experimental and in a very early state, I also introduce codes on purpose that I do not intend to keep. This is also a demonstration/introduction. So I add codes that are helpful at the time of writing, for experimenting. FIN is an example of a code that will most likely be removed at some point. But for now it is nice to have a simple way to explicitly terminate the program. It gives some confidence, when we reach that point, that our program works as intended, and that we did not mess up the bytecode.
Baby steps. No rush here. We had addition as a first example. We will introduce subtraction, multiplication, division, and modulo. Sounds like not much, but we will run into some complications anyway... Here is our addition to op.rs.
/// opcode: Subtract top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const SUB: u8 = 0x11;

/// opcode: Multiply top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const MUL: u8 = 0x12;

/// opcode: Divide top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const DIV: u8 = 0x13;

/// opcode: Calculate modulo of top two values on stack.
///
/// pop: 2, push: 1
/// oparg: 0
pub const MOD: u8 = 0x14;
Simple enough, those new codes: just copy and paste from ADD. But it turns out subtraction is not as easy as addition. Here is the handling code we used for ADD:
As my math teacher liked to say: "... dann fliegt die Schule in die Luft!" – "If we do that, the school building will blow up." It was his way of dealing with the issue that pupils are told "you must never divide by zero", but are never given an understandable reason for it. So just own it, and provide a completely absurd one.
What happens if we keep it like this? Well, not much - until you write a program that divides by zero. Then, this will happen:
[...]
VM { stack: [4, 0], pc: 4, op_cnt: 2 }
Executing op 0x13
  DIV
thread 'main' panicked at 'attempt to divide by zero', src/vm.rs:142:31
stack backtrace:
   0: rust_begin_unwind
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/std/src/panicking.rs:584:5
   1: core::panicking::panic_fmt
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:143:14
   2: core::panicking::panic
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/panicking.rs:48:5
   3: lovem::vm::VM::execute_op
             at ./src/vm.rs:142:31
   4: lovem::vm::VM::run
             at ./src/vm.rs:85:13
   5: modulo::main
             at ./src/bin/modulo.rs:10:11
   6: core::ops::function::FnOnce::call_once
             at /rustc/fe5b13d681f25ee6474be29d748c65adcd91f69e/library/core/src/ops/function.rs:227:5
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

Process finished with exit code 101
Our program panics! I told you earlier that this is not good behaviour. I introduced you to a lot of weird Rust stuff just to avoid those panics. So let us not re-introduce them now. What can we do instead?
Division by zero is a runtime error, for sure (at least in the numerical domain we are working with). But it should not be a runtime error in our virtual machine; it should be a runtime error in the program it is running. Luckily, we already have that mechanism in our VM. So let us add a new runtime error:
/// An error that happens during execution of a program inside the VM.
#[derive(Debug, Clone, PartialEq)]
pub enum RuntimeError {
    EndOfProgram,
    UnknownOpcode(u8),
    StackUnderflow,
    StackOverflow,
    DivisionByZero,
}
We add a check to the DIV and MOD handlers (modulo is a division as well). If we run that program dividing by zero again, we now get this:
[...]
VM { stack: [4, 0], pc: 4, op_cnt: 2 }
Executing op 0x13
  DIV
Error during execution: DivisionByZero

Process finished with exit code 0
Yes, it still fails. But only the execution of the bytecode fails, not the execution of our virtual machine. You can now handle the problem inside your Rust program in a way that fits your needs. Much better. In the next post, we will be using our new instructions in a fancy way, that works well with a stack machine.
Oh, not sure. Play around with it, I guess? As always. Feel free to write a calculation into a program and compare the results. It should work, unless I messed up again. You should, at some point, have written a program in bytecode yourself, so that you know how that feels.
The source code for this post can be found under the tag v0.0.4-journey.
Parsing the source - Lovem
So far we have read an assembly source file into a string, and we got to know some new data structures. It is time we use the one to fill the other. Let us start parsing.
What we know so far is this:
/// Parse assembly source code and turn it into a runnable program (or create report).
pub fn assemble(name: &str, content: &str) -> Result<Pgm, AsmErrorReport> {
    let asm_pgm = AsmPgm::parse(name, content);
    asm_pgm.to_program()
}
Our experimental assembler starts with a simple syntax: only one instruction per line, a short opname identifying the operation to be executed, optionally followed by a single argument. I have written a short program: hallo-stack.lass.
push_u8 123
push_u8 200
add
pop
fin
Straightforward. And you know the syntax already from my human-friendly listings of bytecode. Parsing that looks simple. We do want to allow extra whitespace, though. And we want to allow comments, for sure. Our assembler needs to handle a bit of noise, as in noice.lass.
# This is an awesome program!
   push_u8 123
push_u8 200 # What are we using the # 200 for?


add
 pop


# let's end it here!
fin
Those two programs should be identical and produce the same bytecode.
The parse() function we call creates an empty instance of AsmPgm and then processes the source file line after line, filling the AsmPgm on the way.
/// Parse an assembly program from source into `AsmPgm` struct.
fn parse(name: &str, content: &str) -> AsmPgm {
    // create a new, clean instance to fill during parsing:
    let mut p = AsmPgm {
        name: String::from(name),
        instructions: vec![],
        line_number: 0,
        text_pos: 0,
        error: None,
    };
    // read the source, one line at a time, adding instructions:
    for (n, line) in content.lines().enumerate() {
        p.line_number = n + 1;
        let line = AsmPgm::clean_line(line);
        if let Err(e) = p.parse_line(line) {
            // Store error in program and abort parsing:
            p.error = Some(e);
            break;
        }
    }
    p
}
content.lines() gives us an iterator that we can use to handle each line of the String content in a for loop. We extend the iterator by calling enumerate() on it; that gives us a different iterator, which counts the values returned by the first iterator, and adds the number to it. So n will hold the line number and line will hold the line's content.
We always keep track of where we are in the source. Because enumerate() starts counting at 0 (as things should be), we need to add 1; file lines start counting at 1. The first thing we do with the line is clean it. Then it gets processed by parse_line(line). If this produces an error, we store that error and abort parsing. All our errors are fatal. The final line, p, returns the AsmPgm. We do not use a Result this time, but the AsmPgm can contain an error. Only if its error field is None was the parsing successful.
/// Removes all noise from an assembler program's line.
fn clean_line(line: &str) -> String {
    // Remove comments:
    let line = if let Some(pair) = line.split_once("#") {
        pair.0
    } else {
        line
    };
    // Trim start and end:
    let line = line.trim();
    // Reduce all whitespaces to a single space (0x20):
    ANY_WHITESPACES.replace_all(line, " ").to_string()
}
We use multiple techniques to clean our input: splitting, trimming, regular expressions. When we are done, we only have lines as they look in hallo-stack.lass. The cleaned line can also be completely empty.
I want to add a word about that regexp in ANY_WHITESPACES. Where does it come from? I am using some more Rust magic there, and the crate lazy_static:
use lazy_static::lazy_static;
use regex::Regex;

// Regular expressions used by the assembler.
// lazy_static takes care that they are compiled only once and then reused.
lazy_static! {
    static ref ANY_WHITESPACES: Regex = regex::Regex::new(r"\s+").unwrap();
    static ref OP_LINE_RE: Regex = regex::Regex::new(r"^(\S+)(?: (.+))?$").unwrap();
}
I do not pretend to understand the macro magic that happens here. But the effect is that the regular expressions are compiled only once and then kept as a kind of global static immutable variable that we can then use again and again all over the program as a reference. Static references are a convenient thing in Rust, if you remember what I told you about ownership. You can always have as many references to immutable static variables as you want, because nothing can happen to them, and they exist throughout the complete runtime of the program.
/// Handles a single cleaned line from an Assembly program.
fn parse_line(&mut self, line: String) -> Result<(), AsmError> {
    if line == "" {
        // empty line (or comment only) - skip
        return Ok(());
    }
    if let Some(caps) = OP_LINE_RE.captures(&line) {
        let opname = caps.get(1).unwrap().as_str();
        let parm = caps.get(2).map(|m| m.as_str());
        return self.parse_instruction(opname, parm);
    }
    Err(AsmError::InvalidLine)
}
parse_line() processes each line. Empty ones are just skipped. We use another regular expression to find out if a line matches our schema. Because we cleaned it, the expression can be rather simple: r"^(\S+)(?: (.+))?$". We look for one or more non-whitespace chars for our opname. It can be followed by a single argument, which must consist of one or more chars, separated by a single space. That is our optional oparg. If the line fits, we have found an instruction we can try to parse. That is the job of parse_instruction(). Everything that is neither empty nor an instruction is an error, which we can simply return. It will abort the parsing, and the caller will know that there was an invalid line.
parse_instruction() can also run into an error. We use our proven pattern of returning a Result where the successful outcome does not carry any additional information (which is why we return Ok(())). The error case returns an AsmError that carries the reason for the error. And because of the Result type and Rust's mighty enum system, we can simply pass on what parse_instruction() returns to us.
Parsing the instruction itself will be handled in the next entry.
The source code for this post can be found under the tag v0.0.8-journey.
We are using the design of a stack machine to efficiently execute some calculations.
The way stack machines work can be used in programs that execute calculations. We will look at it by implementing an example from the Wikipedia page about stack machines.
I will quote a lot of it here. You can see the full text of the article and its authors when you follow the Wikipedia permalink to the article.
Design
Most or all stack machine instructions assume that operands will be from the stack, and results placed in the stack. The stack easily holds more than two inputs or more than one result, so a rich set of operations can be computed. In stack machine code (sometimes called p-code), instructions will frequently have only an opcode commanding an operation, with no additional fields identifying a constant, register or memory cell, known as a zero address format[^1]. This greatly simplifies instruction decoding. Branches, load immediates, and load/store instructions require an argument field, but stack machines often arrange that the frequent cases of these still fit together with the opcode into a compact group of bits.
The instruction set carries out most ALU actions with postfix (reverse Polish notation) operations that work only on the expression stack, not on data registers or main memory cells. This can be very convenient for executing high-level languages, because most arithmetic expressions can be easily translated into postfix notation.
For example, consider the expression A*(B-C)+(D+E), written in reverse Polish notation as A B C - * D E + +. Compiling and running this on a simple imaginary stack machine would take the form:
# stack contents (leftmost = top = most recent):
push A      # A
push B      # B A
push C      # C B A
subtract    # B-C A
multiply    # A*(B-C)
push D      # D A*(B-C)
push E      # E D A*(B-C)
add         # D+E A*(B-C)
add         # A*(B-C)+(D+E)
Well, I don't know about a "simple imaginary stack machine" - but as it happens, we have a very real simple stack machine at our disposal. You know where we are going next!
The program from the Wikipedia article uses 5 variables, A to E. We do not support any kind of variables, yet, but that is not important here. We use immediates (literals in your program) to put some concrete values into the calculation. Let's just take some numbers, totally at random:
//! A small program demonstrating execution of arithmetics in our VM.
//!
//! For an explanation of what we are doing here, look at this wikipedia article:
//! https://en.wikipedia.org/w/index.php?title=Stack_machine&oldid=1097292883#Design
use lovem::{op, VM};

// A*(B-C)+(D+E)
// A B C - * D E + +
// A = 5, B = 7, C = 11, D = 13, E = 17
// 5 * (7 - 11) + (13 + 17) = 10
fn main() {
    // Create a program in bytecode.
    // We just hardcode the bytes in an array here:
    let pgm = [op::PUSH_U8, 5, op::PUSH_U8, 7, op::PUSH_U8, 11, op::SUB, op::MUL,
        op::PUSH_U8, 13, op::PUSH_U8, 17, op::ADD, op::ADD, op::POP, op::FIN];
    // Create our VM instance.
    let mut vm = VM::new(100);
    // Execute the program in our VM:
    match vm.run(&pgm) {
        Ok(_) => {
            println!("Execution successful.")
        }
        Err(e) => {
            println!("Error during execution: {:?}", e);
        }
    }
}
The comments spoil the result, but we want to check that it calculates correctly, so that is okay. The program works the same as before: create a VM and run some hardcoded bytecode on it. Since the VM logs excessively, we will see what happens when we run it. So the only new thing here is the bytecode program. I'll write it down in a more readable form:
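Transcribed from the byte array above, the program reads:

```
push_u8 5
push_u8 7
push_u8 11
sub
mul
push_u8 13
push_u8 17
add
add
pop
fin
```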
To no-ones surprise, this code is the same as in the article - only with the variables replaced by numbers, and I added a pop and a fin at the end, to keep our program clean.
The output shows you the stack after every instruction. You can compare it to the stack contents in the Wikipedia listing, and you will find them identical (the order of the stack listing is switched, and of course you have numbers instead of arithmetic expressions with variables – but if you insert our numbers into the Wikipedia listing, they should match).
Our PoC stack machine really can do what the imaginary one is claimed to do. That's nice.
You should really read the article on Reverse Polish Notation (permalink to article at time of writing). It will give you some background on why it is important, not least historically. The Z3, for example, arguably the first computer built by mankind², was using it.
The source code for this post can be found under the tag v0.0.5-journey.
Yeah, I know. The answer to the question "What was the first machine to qualify as a computer?", differs, depending on whom you ask – and also on the country you ask the question in. But the Z3 is a prominent candidate. ↩
Running our first program - Lovem
Now, that we have a VM, we will run a program on it.
So we built our very first VM and studied the code in detail. It is time to execute a program on it and look at its output. We will look at every single step the program takes. Aren't we lucky that our VM is so talkative during execution?
If you missed the code, look at the previous post, A VM.
It is quite talkative. And isn't it nice how easy it is to print the complete state of our VM in Rust? And it costs no overhead at runtime, as the code for it is generated for us during compilation. Isn't that something?
So, what is happening there? Our program pgm looks like this:
Those are 8 bytes that make up 6 instructions. Each instruction has a 1-byte opcode. Two of those instructions (the PUSH_U8 ones) have one byte of argument each, making up the remaining two bytes of our program. Here they are listed:
NOP
PUSH_U8 [100]
PUSH_U8 [77]
ADD
POP
FIN
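For illustration, here is that program as a hardcoded byte array. The opcode values used below are stand-ins: only POP = 0x01 (seen in the log output) and FIN = 0xff (the final byte of the program) are confirmed by the post; the other values are assumptions.

```rust
// Stand-in opcode values for illustration; only POP and FIN
// are confirmed by the post, the others are assumptions.
mod op {
    pub const NOP: u8 = 0x00;     // assumed
    pub const PUSH_U8: u8 = 0x02; // assumed
    pub const ADD: u8 = 0x10;     // assumed
    pub const POP: u8 = 0x01;     // seen in the log output
    pub const FIN: u8 = 0xff;     // the final byte of pgm
}

fn main() {
    // The 8 bytes of our program: 6 opcodes plus 2 argument bytes.
    let pgm = [op::NOP, op::PUSH_U8, 100, op::PUSH_U8, 77, op::ADD, op::POP, op::FIN];
    assert_eq!(pgm.len(), 8);
    println!("{:?}", pgm);
}
```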
The NOP does not do anything. I just put it in front of the program to let you see fetching, decoding, and executing without any effects:
We just increased the program counter by one (we advance one byte in the bytecode), and the operation counter counts this executed instruction. Let's look at the next instruction, which is more interesting:
Here the PC is increased by two. That happens because we fetch an additional value from the bytecode. The op_cnt is only increased by one. And we now have our first value on the stack! It is the byte we read from the bytecode. Let's do that again:
Now there is only one value left on the stack, and it is the sum of the two values we had. Quite a lot happened here. The two values we had before were both popped from the stack (so it was briefly empty). The add operation added them and pushed their sum back on the stack. So now there is one value on the stack, and it is the result of our adding operation.
What's next?
VM { stack: [177], pc: 6, op_cnt: 4 }
Executing op 0x01
  POP
  dropping value 177
VM { stack: [], pc: 7, op_cnt: 5 }
It is always nice to leave your workplace all tidied up when you are done. We can do that by popping our result back from the stack, leaving it empty. And as a bonus, our POP operation prints the value it drops. One more instruction to go:
So, we ran a program in a VM. Hooray, we are done. Only 132 lines of code, including excessive comments and logging. That was easy.
Well, yeah - it doesn't do much. But you can see the root principle that makes up a stack machine. It's that simple.
Go play around with it a bit. It is the best way to learn and to understand. I mean it! Write a longer program. What happens to the stack? Add another opcode – how about subtraction? Will your program execute at all? What happens, if it does not?
State of the Journal - Lovem
Since I am always focused on my work on lovem, I will never get sidetracked. Unrelated: I spent a few days on reworking the journal on this site.
So, no update on the core project today, sorry. I was very unhappy with my first solution for how the Journal entries were created. Way too much to do by hand – that is not what I learned programming for. But mkdocs is Python, and Python I can do. So I did. And now I can write my Journal entries (like this one) as plain Markdown files with very few metadata entries. And I get entries in the navigation and pages listing the whole month. I even included a single-page version of the journal that holds a whole month. I feel it is quite fancy. I will need to do a bit of work on the static content of the site, but one step at a time.
I want to write my Journal entries (aka blog posts) as nice standalone Markdown files, one file per entry. I will need to include a bit of metadata, at least the release date/time. And I want the entries to look fancy without adding the fanciness to each file. Maybe I will be changing the layout later, hmm? And create those teaser pages for me, thank you very much.
I use a plugin called mkdocs-gen-files, by @oprypin, that creates additional mkdocs source files on the fly. It does not really put the files on disk, but they are parsed by mkdocs, as if they were in the docs directory.
I have a directory journal next to my docs directory, where I put all my posts in a single Markdown file each. My script walks through that directory and processes each file. The content is modified a bit (to put in the card with the author's name and other metadata), and then put in a virtual file inside docs, so that the pages with the entries are created by mkdocs as if I had them inside docs.
The script also generates two pages for each month: one that shows that month's posts as teasers, with a "continue reading" link, and a second one that shows all posts from a month on a single page, so that you can read them without changing pages all the time.
The remaining part is adding all the pages, that the script creates, to the navigation in a way that makes sense. The order is a critical part, being a central aspect of a journal or a log. For that I use another plugin by @oprypin: mkdocs-literate-nav. With it, you can control your navigation (completely or in parts) by adding markdown source files with lists of links. This goes together well with the gen-files plugin, because I can just create that navigation files with it in my script.
The plugins are a bit light on the documentation side. It took me a while to understand that you cannot do multiple layers of nested navigation in those files. That is not a problem, because you can always add another nesting layer by adding more of those nav files as children. Also, what you can do in those files is very limited. I wanted to do some fancy things in the navigation (adding a second link in a single line with an alternative representation). I would guess that those limitations come from the way mkdocs itself handles the navigation, so that is okay. But a word on that would have been nice. And the error messages popping up did not help at all, because the actual error happens way later in the process inside mkdocs itself and is some weird side-effect problem.
If you want to take a look, see blogem.py. That will be the script in its current state. For the version of the script at the time of writing, see the permalink, the original blogem.py.
To the library! - Lovem
We turn our project from a binary project into a library project.
So far, our lovem cargo project holds a single binary. That is not very useful for something that should be integrated into other projects. What we need is a library. How is that done? Simple: we rename our main.rs to lib.rs.
But wait: what about fn main()? We do not need that inside a library. But it would be nice to still have some code that we can execute, right? Well, no problem. Your cargo project can only hold a single library, but it can hold multiple binaries, each with its own fn main(). Just put them in the bin subdir.
While we are at it, I split the project up into multiple source files, to keep it organised. It is still small, but it will grow soon. Here is where we are at now:
The only real configuration in that file is edition = "2021". Rust has a major edition release every three years. These are used to introduce breaking changes. You have to specify the edition you use explicitly, and there are migration guides. We use the most recent one, 2021.
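For reference, a minimal Cargo.toml along these lines might look like the sketch below. The version number is a stand-in, not the project's actual one; name and edition follow the post.

```toml
[package]
name = "lovem"
version = "0.1.0"   # stand-in version, not the project's actual one
edition = "2021"
```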
Rust manages projects by using default project layouts. That is why we need not write a lot into the Cargo.toml. The src directory holds our source code. The fact that it holds a lib.rs makes it a library, and lib.rs is the entry point. This is what is in it:
pub mod op;
pub mod vm;

// re-export main types
pub use crate::vm::VM;
Really not a lot. It declares the two modules op and vm and makes them public. So whatever Rust project uses our library will have access to those modules. The modules live in the files op.rs and vm.rs. What a coincidence: those are exactly the remaining two source files in this directory!
The last line just re-exports a symbol from one of those submodules, so that programs using our library can access it more easily. We will be doing that in our binary.
Back in v0.0.2-journey, we already had a module called op to hold the opcodes. We had it stuffed in our main.rs. Now it lives in a separate file, so we do not have to scroll over it every time.
This holds the rest of our source code (except for fn main(), which has no place in a lib). The only new thing, compared with our former main.rs, is the first line:
use crate::op;
This simply pulls the module op into the namespace of this module, so that we can access our opcode constants as we did before. The rest remains the way we already know.
So how do we use our lib in a project? That is best illustrated by doing it. And we can do so inside our project itself, because we can add binaries. Just put a Rust source file with a fn main() inside the bin subdir. There we can write a binary that uses the lib, just as we would in a separate project.
We did that in the file test-run.rs:
use lovem::{op, VM};

fn main() {
    // Create a program in bytecode.
    // We just hardcode the bytes in an array here:
    let pgm = [op::NOP, op::PUSH_U8, 100, op::PUSH_U8, 77, op::ADD, op::POP, 0xff];
    // Create our VM instance.
    let mut vm = VM::new(100);
    // Execute the program in our VM:
    match vm.run(&pgm) {
        Ok(_) => {
            println!("Execution successful.")
        }
        Err(e) => {
            println!("Error during execution: {:?}", e);
        }
    }
}
This is the fn main() function from our former main.rs. Instead of having all the functions and definitions, it just has this single line at the top:
use lovem::{op, VM};
Nothing too complicated. It tells the compiler that our program uses the library called lovem (which is, of course, the one we are writing ourselves here). It also tells it to bring the two symbols op and VM from it into our namespace.
The op one is simply the module op defined in op.rs. Because lib.rs declares the module public, we can access it from here. VM does not refer to the module in vm.rs, as that module is called vm (in lower case). VM is actually the struct we defined in vm, that we use to hold the state of our Virtual Machine.
We could include the struct as lovem::vm::VM, which is its full path. But I find that a bit annoying, as VM is the main type of our whole library. We will always be using it. So I re-exported it in lib.rs. Remember the line pub use crate::vm::VM;? That's what it did.
So, how do we run our program now? Back in v0.0.2-journey we simply called cargo run. That actually still works, as long as we have exactly one binary.
But we can have multiple binaries inside our project. If we do, we need to tell cargo which it should run. That can easily be done:
cargo run --bin test-run
The parameter to --bin is the name of the file inside bin, without the .rs. And no configuration is needed anywhere; it works by convention of project layout.
What, homework again? Yeah, why not. If it fits, I might keep adding ideas for you to play around with. Doing things yourself is understanding. Stuff we just read, we tend to forget. So here is what might help you understand the project layout stuff I was writing about:
Add a second binary, that runs a different program in the VM (with different bytecode). You have all the knowledge to do so. And then run it with cargo.
In earlier posts I included explicit links to the source code at the time of writing. That got annoying really fast. So I added a new feature to my blogem.py that I use to write this journal. Entries like this one, that explain a specific state of the source of lovem, will have a tag from now on. This corresponds to a tag inside the git repository, as it did in earlier posts. You will find it in the card at the top of the post (where you see the publishing date and the author). It is prefixed with a little tag image. For this post it looks like this:
At the bottom of the entry (if you view it in the entry page, not in the "whole month" page), you will find it again, with a list of links that help you access the source in different ways. The best way to work with the code is to clone the repository and simply check out the tag. I also added a page on this site explaining how to do that. You can find it under Source Code.
So, in the future I will not be adding explicit links, only these implicit ones. And there will be a link to the explanation page at the bottom. This should be convenient for both you and me.
The source code for this post can be found under the tag v0.0.3-journey.
Turn "fragile" into "rusty" - Lovem
After we got our Proof of Concept running, we clean up our code and make it look like a respectable Rust program.
Did you play around with the program from the previous post? If you are new to Rust, you really should! At least mess around with our bytecode. You should find that our VM does not react well to errors yet. It simply panics! That is no behaviour for a respectable Rust program.
We will make it more rusty, look at the enhanced version:
If you do not know your way around Rust, some of those things will be difficult to understand. It might be time to read up on some Rust, if you intend to follow my journey onwards. I will not explain everything here, but I will give you some leads right now, if you want to understand the things I did in that change.
The most important thing for you to understand will be Enums. Yeah, I know. That is what I thought when first learning Rust: "I know enums. Yeah, they are handy and useful, but what could be so interesting about them?"
Well, in fact, enums in Rust completely change the way you are writing code. They are such an important part of the language that they have an impact on just about every part of it.
It is obviously a datatype to communicate runtime errors of different nature. And I use it a bit like you would use exceptions in some other languages. Never mind the #[derive...] part for now. That is just for fancy debug output (and a bit more). Once you understand line 33: InvalidOperation(u8),, you are on the right track! To put it in easy terms: values of enums in Rust can hold additional values. And, as you see in our RuntimeError, not all variants have to hold the same kind of additional value, or a value at all. This is what makes enums really powerful.
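To illustrate, such an error enum might look like the sketch below. Only the InvalidOperation(u8) variant is taken from the actual code; the other variant names are assumptions.

```rust
// A sketch of a runtime-error enum; only InvalidOperation(u8) is
// taken from the post, the other variant names are assumptions.
#[derive(Debug, Clone, PartialEq)]
enum RuntimeError {
    StackUnderflow,           // no payload at all
    StackOverflow,            // no payload at all
    InvalidOperation(u8),     // carries the offending opcode byte
}

fn main() {
    // Variants can carry different payloads, or none:
    let e = RuntimeError::InvalidOperation(0x42);
    if let RuntimeError::InvalidOperation(opcode) = e {
        assert_eq!(opcode, 0x42);
        println!("unknown opcode: 0x{:02x}", opcode);
    }
}
```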
If you know what happens in the return type of fn push in line 70, you are golden. The Result type can communicate a value on success or an error condition on failure. The great difference to typical exceptions from other languages is that there is no special way to pass on the errors, as with exceptions that are thrown. It is just the normal return statement that is used. And this is done, you guessed it, with enums. If you want to read up on Result, try understanding Option first. I am using that in my code, even though you cannot see it.
If you are wondering about the return of fn push, which does not have a return statement anywhere to be seen, you should find out why some of my lines do not have a semicolon ; at the end, while most do.
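Here is a self-contained sketch of how such a push function can look. The Result return type matches what is described above; the struct layout and the exact body are assumptions for this illustration.

```rust
#[derive(Debug)]
enum RuntimeError {
    StackOverflow,
}

struct VM {
    stack: Vec<i64>,
    stack_size: usize,
}

impl VM {
    // The Result return type is what the post describes;
    // the body is a sketch, not the actual implementation.
    fn push(&mut self, v: i64) -> Result<(), RuntimeError> {
        if self.stack.len() >= self.stack_size {
            return Err(RuntimeError::StackOverflow);
        }
        self.stack.push(v);
        // The last expression without a semicolon is the return value:
        Ok(())
    }
}

fn main() {
    let mut vm = VM { stack: Vec::new(), stack_size: 2 };
    assert!(vm.push(1).is_ok());
    assert!(vm.push(2).is_ok());
    assert!(vm.push(3).is_err()); // stack is full now
    println!("push returned Err on a full stack, as expected");
}
```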
So, this is what will get you through a lot here. Try to understand those in the given order:
Option
Some(v) vs. None
Result<T, E>
Ok(v) vs. Err(e)
if let Some(v) =
match
Result<(), E>
Ok(())
unwrap()
?
Bonus: ok(), ok_or(), and their likes
If you understand each of those, and why I put them in the list, you are prepared to handle most of the Rust things I will be doing for quite a while. If you still have problems with parts of it, move on. It gets better after a while, once you use them.
What is a Virtual Machine anyway? - Lovem
So, how do you build a Virtual Machine? There are actually two quite different approaches:
Register Machine vs. Stack Machine
Let's take a look at those concepts first. This will be very brief and basic. You can, of course, also have some combination of those concepts, and not everything I say here is true for every implementation of a virtual machine, but it will be close enough for this article.
Most physical computers are register machines. At least those you will be thinking of. You are most likely using one right now to read this article. Virtual register machines use the same concepts, but not in physical hardware; they live inside another computer as software. This allows them to do some things a bit more flexibly than a real hardware machine could.
A register is nothing more than a dedicated place to store a portion of data where it can be accessed for direct manipulation. Registers are more or less variables of the machine's basic data type that have a fixed address and can be accessed and manipulated directly by the processing unit. Register machines use those to actually compute and change data. All other storage places are only that: places where data is put when it is not needed at the moment. Register machines have a multitude of registers, from a very few (maybe 4 or 8 in simplistic designs) to hundreds or more in modern computers. The size of the registers often gives the architecture its name. E.g. in the x86-64 architecture, which most current CPUs by Intel and AMD implement, a register is 64 bits long.
The instructions for a register machine are encoded in code words. A code word is a bunch of bytes that tell the machine what to do in the next program step. For simple designs, code words are of a fixed length. This code word length is often longer than the register size. So a 16 bit architecture could have 32 bit instructions. The reason for this is that instructions consist of an operation code that defines what operation should be executed in the next step, but they also contain the arguments passed to that operation. Because the number and size of arguments needed for an operation differ between operations, decoding the instruction can be quite complicated. When you put multiple instructions together, you end up with a program. This representation of a computer program is called machine code. For a virtual machine it is also called bytecode, although I think this term fits better for stack machines (more on that later).
If you want to understand what I tried to describe here, read this really short article: Creating a Virtual Machine/Register VM in C. It builds a simplistic register VM in C (the whole thing is 87 lines long). It demonstrates the principles used in a register machine (fetch, decode, execute), and shows you what a register is and how it is used. You will understand how machine code is decoded and executed. The article only uses 16 bit code words and 16 bit data words (register size). If you know C, you should be able to understand what I am talking about in about an hour of reading and coding. If you ever wanted to understand how a computer works on the inside, this might be a nice place to start, before you read about an actual physical computer.
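To make the fetch-decode-execute idea concrete, here is a minimal sketch of an imaginary 16-bit register machine in Rust. The instruction encoding and opcodes are made up for this sketch; they are not taken from the linked article.

```rust
// A minimal fetch-decode-execute loop for an imaginary 16-bit register
// machine. Encoding (made up): top 4 bits = opcode, next 4 bits = register,
// low 8 bits = immediate value.
fn run(program: &[u16]) -> [u16; 4] {
    let mut regs = [0u16; 4];
    let mut pc = 0usize;
    loop {
        let instr = program[pc];               // fetch
        let opcode = instr >> 12;              // decode: top 4 bits
        let r = ((instr >> 8) & 0xf) as usize; // target register
        let imm = instr & 0xff;                // immediate argument
        match opcode {                         // execute
            0 => break,                                    // halt
            1 => regs[r] = imm,                            // load immediate
            2 => regs[0] = regs[0].wrapping_add(regs[1]),  // r0 += r1
            _ => panic!("unknown opcode"),
        }
        pc += 1;
    }
    regs
}

fn main() {
    // load r0 = 5; load r1 = 23; add; halt
    let regs = run(&[0x1005, 0x1117, 0x2000, 0x0000]);
    assert_eq!(regs[0], 28);
    println!("r0 = {}", regs[0]);
}
```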
A register machine normally has multiple stacks it uses. This does not make it a stack machine; those stacks are just needed to store data when it is not currently being used.
So typical operations would be:
"Take the number from register 0, take the number from register 1, add those two numbers together, write the result in register 0."
"Take the lower 16 bits of this instruction and write them in register 2."
Lua and Neko are virtual register machines (at least in current versions).
And then there are Stack Machines. They are, I think, easier to understand than register machines, but following a program during execution is more confusing, since the data being manipulated is harder to track.
A stack is just a pile of data. Data is portioned in fixed sizes; a portion is called a word. All you can normally do is put a word on top of the stack (we will call that operation a push), or take the word that is currently on top of the stack, if there is one (we will call that a pop). No other direct manipulations of the stack are allowed (I say "direct manipulations", because indirectly there often are ways that this is done, but that is a detail for later).
Manipulation of data is done this way by the machine. If you want to add two numbers, say 5 and 23, you would write a program that does this:
Push the first number to the stack.
Push the second number to the stack.
Execute the "ADD" operation.
That operation will pop the two numbers from the stack, add them, and push their sum back on the stack (so that after the operation there will be one word less on the stack).
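The three steps above can be sketched in a few lines of Rust. This is a minimal model of the scheme, not our actual VM code.

```rust
// A minimal model of the push/push/add scheme described above:
fn add_on_stack(x: i64, y: i64) -> Vec<i64> {
    let mut stack: Vec<i64> = Vec::new();
    stack.push(x);  // push the first number
    stack.push(y);  // push the second number
    // the ADD operation: pop two words, push their sum
    let b = stack.pop().unwrap();
    let a = stack.pop().unwrap();
    stack.push(a + b);
    stack
}

fn main() {
    // adding 5 and 23 leaves a single word on the stack: 28
    let stack = add_on_stack(5, 23);
    assert_eq!(stack, vec![28]);
    println!("stack after ADD: {:?}", stack);
}
```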
A stack machine will also typically have some additional place to store words when you do not need them on the stack. These places can relate to variables inside a program.
As you can see from the example above, instructions in a stack machine often do not need to have arguments. If data is to be manipulated, it is always on top of the stack. There is no need to address its location, as you would do in a register machine.
Because of this, the instructions for a stack machine are typically encoded in a single byte. This byte holds a number we will call opcode (short for operation code), that simply identifies the operation to execute. If your operation does need additional arguments, you write them to the bytes following your opcode byte (the oparg), so that the operation can read them from your program. This structure of single bytes encoding our program is why we call this representation bytecode.
The concept of a stack machine is easy to implement in software, but it is not so easy to do so in hardware. That is why your typical computer is a register machine. There are, however, a lot of historical examples of important physical stack machines.
The most famous example of a virtual stack machine is the Java VM. Java source code is compiled to bytecode that is executed inside a virtual machine, the JVM. This VM is so common that many newer programming languages compile to Java bytecode. It makes it possible to run programs written in those languages on any system that has a JVM; and that includes just about every major and many minor computer systems. A second example of a stack machine is the Python VM.
Some random thoughts on register and stack machines
While writing this down, describing the two kinds of machines, I couldn't help but notice a curious fact:
A register machine manipulates data inside addressable registers. When the data is not needed, it can be stored away in some kind of stack.
A stack machine manipulates data inside a stack. When the data is not needed, it can be stored away in some kind of addressable space, not unlike registers.
It looks as if you just need both concepts to work efficiently.
August 2022 complete - Lovem
We took care of all the dirty work inside the assembler during the previous posts. We now have a cleanly parsed instruction with an optional argument that we can evaluate. Let us dive into parse_instruction():
/// Handles a single instruction of opcode and optional oparg parsed from Assembly file.
fn parse_instruction(&mut self, opname: &str, oparg: Option<&str>) -> Result<(), AsmError> {
    match opname {
        "nop" => self.parse_a0_instruction(op::NOP, oparg),
        "fin" => self.parse_a0_instruction(op::FIN, oparg),
        "pop" => self.parse_a0_instruction(op::POP, oparg),
        "add" => self.parse_a0_instruction(op::ADD, oparg),
        "sub" => self.parse_a0_instruction(op::SUB, oparg),
        "mul" => self.parse_a0_instruction(op::MUL, oparg),
        "div" => self.parse_a0_instruction(op::DIV, oparg),
        "mod" => self.parse_a0_instruction(op::MOD, oparg),
        "push_u8" => {
            let oparg = oparg.ok_or(AsmError::MissingArgument)?;
            let v = parse_int::parse::<u8>(oparg).or(Err(AsmError::InvalidArgument))?;
            self.push_a1_instruction(op::PUSH_U8, v)
        },
        "goto" => {
            let oparg = oparg.ok_or(AsmError::MissingArgument)?;
            let v = parse_int::parse::<i16>(oparg).or(Err(AsmError::InvalidArgument))?;
            let a = v.to_be_bytes();
            self.push_a2_instruction(op::GOTO, a[0], a[1])
        },
        _ => Err(AsmError::UnknownInstruction(String::from(opname)))
    }
}
+
That is a surprisingly simple function. It receives two parameters. opname is a &str that holds the opname of the instruction. oparg is either None, if there was no argument in the instruction, or it holds a non-empty string with whatever argument was present in the instruction.
The function only consists of a long match, that directly matches the opname against our known opnames. If there is no match, it returns a helpful error that even contains the unknown opname that was found.
The explicit branches look a bit weirder. That is because I do not like to repeat myself when writing code. And Rust tends to allow some very dense source code.
I decided to group the instructions into three categories, by the number of bytes an instruction uses as argument. An a0 instruction has zero bytes of oparg, a1 has one byte, and a2 has two bytes.
Most of our operations do not allow any argument at all. We want to make sure that there is none given in the instruction. And the only difference in handling those instructions inside the assembler is the byte that will be written to the bytecode. We can handle all of those with the same function: parse_a0_instruction():
/// Helper that parses an instruction with no oparg and pushes it.
fn parse_a0_instruction(&mut self, opcode: u8, oparg: Option<&str>) -> Result<(), AsmError> {
    if oparg.is_some() {
        Err(AsmError::UnexpectedArgument)
    } else {
        self.push_a0_instruction(opcode)
    }
}
+
If we did get an argument, we fail, since that is not allowed. And then we push a very basic instruction to the back of our program. We have helper functions to do that:
/// Adds a single instruction to the end of the AsmProgram.
fn push_instruction(&mut self, i: AsmInstruction) -> Result<(), AsmError> {
    self.text_pos += i.size();
    self.instructions.push(i);
    Ok(())
}

/// Helper that creates an instruction with 0 bytes of oparg and pushes it.
fn push_a0_instruction(&mut self, opcode: u8) -> Result<(), AsmError> {
    let i = AsmInstruction {
        line_number: self.line_number,
        opcode,
        oparg: vec![],
        pos: self.text_pos,
    };
    self.push_instruction(i)
}
+
We create a new instruction instance and add it. We also track the position of every instruction in the bytecode; that is why we update the program's current position in the bytecode for every instruction we add (stored in text_pos).
There is nothing we do with that information, yet. But we will need that information later.
We only have one operation that needs a single byte of oparg, and that is push_u8. We use that operation to push values on the stack, taken directly from the bytecode. u8 is the only type supported at the moment. That is not even a hard restriction; you can easily get any i64 value onto the stack by using basic arithmetics, and we have those.
Parsing numbers is no fun. It is hard. So we let someone else do it for us. The crate we are using is called parse_int. Go take a look at what it can do. It allows us to enter numbers easily in hexadecimal, octal, or binary notation. That is a really handy feature in source code! Thanks, Rust community! So how are we parsing push_u8?
First we make sure that we have an argument. If not, we fail. We can again use our handy ? syntax. Then we try to parse it into a u8, using parse_int. The syntax for that call takes some getting used to - I'm still waiting for me to get used to it. But if it works, we now have a valid u8. If it fails to parse, we quickly return with that failure information. If all goes well, we reach the third line, which calls our helper for adding a1 instructions. There is no big surprise in what that function does:
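The idea behind such prefix-aware number parsing can be sketched with the standard library alone. This illustrates the concept; it is not the parse_int crate's actual implementation.

```rust
// A rough sketch of what prefix-aware integer parsing does
// (an illustration of the concept, not the parse_int crate's code):
fn parse_u8(s: &str) -> Result<u8, std::num::ParseIntError> {
    if let Some(hex) = s.strip_prefix("0x") {
        u8::from_str_radix(hex, 16)
    } else if let Some(bin) = s.strip_prefix("0b") {
        u8::from_str_radix(bin, 2)
    } else if let Some(oct) = s.strip_prefix("0o") {
        u8::from_str_radix(oct, 8)
    } else {
        s.parse::<u8>()
    }
}

fn main() {
    // all three notations denote the same value, 123:
    assert_eq!(parse_u8("123"), Ok(123));
    assert_eq!(parse_u8("0x7b"), Ok(123));
    assert_eq!(parse_u8("0b1111011"), Ok(123));
    // out-of-range input fails, just like "push_u8 300" will:
    assert!(parse_u8("300").is_err());
    println!("all notations parse to 123");
}
```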
/// Helper that creates an instruction with 1 byte of oparg and pushes it.
fn push_a1_instruction(&mut self, opcode: u8, a0: u8) -> Result<(), AsmError> {
    let i = AsmInstruction {
        line_number: self.line_number,
        opcode,
        oparg: vec![a0],
        pos: self.text_pos,
    };
    self.push_instruction(i)
}
+
An interesting detail is that push_instruction() returns a Result, even though it can never fail! It always returns Ok(()). And if you look at push_a2_instruction(), you will see that it also always returns Ok(()). Why do we bother? Take a look at the handler for push_u8 again, in the context of the complete function parse_instruction(). That function returns a Result, and it can return Err(...). Because push_a1_instruction() has the same return type of Result, the calls integrate nicely with the layout of the complete function inside the match. For me, it gives the code a clean compactness.
This time we use parse_int to read an i16. Whether you like the ::<i16> syntax or not, at least you can see what it is for. We need to unpack the two bytes of the i16 after parsing, so that we can store them correctly in the bytecode. to_be_bytes() gives us an array (of size 2) that holds the bytes in big-endian byte order. to_le_bytes() is the little-endian counterpart. I generally prefer big endian when I can. And if you remember how we read the bytes in the VM, you can see that we are already using big endian there.
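A quick illustration of what to_be_bytes() does (this is standard library behaviour):

```rust
fn main() {
    let v: i16 = -2;
    let a = v.to_be_bytes();                    // big endian: most significant byte first
    assert_eq!(a, [0xff, 0xfe]);
    assert_eq!(v.to_le_bytes(), [0xfe, 0xff]);  // little endian counterpart
    assert_eq!(i16::from_be_bytes(a), -2);      // the bytes reassemble to the value
    println!("{:02x?}", a);
}
```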
There is nothing new in the push_a2_instruction() function, only one additional byte.
/// Helper that creates an instruction with 2 bytes of oparg and pushes it.
fn push_a2_instruction(&mut self, opcode: u8, a0: u8, a1: u8) -> Result<(), AsmError> {
    let i = AsmInstruction {
        line_number: self.line_number,
        opcode,
        oparg: vec![a0, a1],
        pos: self.text_pos,
    };
    self.push_instruction(i)
}
+
We have now parsed the complete program source into the AsmPgm structure. Or we have failed to do so, in which case there is an error stored in AsmPgm. Either way, you have now seen all the code that does the parsing. The next journal entry will finally produce the bytecode we are longing for.
Our new assembler is almost done assembling. Over the last entries we learned how the program parses the assembly sourcecode and produces a list of parsed instructions. What we now need to do, is turn that into bytes.
The error part is straightforward. A small detail is the clone() call for name and error. We need to do that, because we cannot move ownership of those values (they must still exist in the AsmPgm instance). And we cannot use references. There is no need to clone the line number; as an integer type it can simply be copied.
The success part isn't complex either. We create a Vector of bytes and push all bytes into it: for each instruction the opcode and the opargs (of which there can be zero). We have our bytecode now! Wrap it inside our new Pgm type, and we are done.
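That flattening step, reduced to a sketch (Ins and to_bytecode are invented names, not the actual AsmPgm code):

```rust
// Flatten parsed instructions into the final bytecode:
// for each instruction, the opcode first, then its oparg bytes.
struct Ins {
    opcode: u8,
    oparg: Vec<u8>,
}

fn to_bytecode(instructions: &[Ins]) -> Vec<u8> {
    let mut text = Vec::new();
    for i in instructions {
        text.push(i.opcode);
        text.extend_from_slice(&i.oparg);
    }
    text
}
```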
What happens, if our program has errors? Easy to find out, I included a broken program: syntax-error.lass
push_u8 123
push_u8 300
add
pop
fin

Have you found the problem? Will the assembler?
kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- pgm/syntax-error.lass
    Finished dev [unoptimized + debuginfo] target(s) in 0.04s
     Running `target/debug/lovas pgm/syntax-error.lass`
Error: assembly failed in line 2 of program 'pgm/syntax-error.lass'

Caused by:
    InvalidArgument

It does find the error. Using the parse_int crate already pays off. And the error message really tells us what is wrong and where. We get a lot of value for the little code we have written.
There does not really seem to be a point in storing all that information inside AsmPgm. We could easily have created the bytecode directly. That would have been a lot easier. And if you have run the code yourself, you will have been bombarded with compiler warnings about unread fields.
We will be needing that information soon, and it was easiest to build it like this right away. But let us just enjoy our new assembler for now.
Okay, before we leave for today, one more thing that you might have spotted. What's with those impl blocks?
impl Display for AsmError {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        write!(f, "{:?}", self)
    }
}

impl error::Error for AsmError {
}

impl Display for AsmErrorReport {
    fn fmt(&self, f: &mut Formatter<'_>) -> std::fmt::Result {
        write!(f, "assembly failed in line {} of program '{}'", self.line, self.name)
    }
}

impl error::Error for AsmErrorReport {
    fn source(&self) -> Option<&(dyn error::Error + 'static)> {
        Some(&self.error)
    }
}

That is the price we have to pay when we want to use Rust magic. Rust's answer to writing generic code that can be applied to different types (that might not exist at the time of writing) are traits. A function can accept a trait as a type. If you implement that trait for your type, you can use that function. That is a very simplified introduction.
A trait defines specific functions you have to write for a type. That is what we do here. We implement the trait std::error::Error for our AsmError and AsmErrorReport. To do so, we must also implement the trait std::fmt::Display (because the Error trait says so).
There is not much we do there. Types implementing the Display trait can be printed using println!("{}", value). What the println! macro does is just calling that fmt method we define. The trait Debug does a similar thing, but for use with println!("{:?}", value). We can use any value with those constructs that implements the Display trait (for "{}") or the Debug trait (for "{:?}").
The Debug trait we let the compiler implement (derive) for us. That is what the line #[derive(Debug)] does. And for our Display trait we are lazy and just use the function that was created by #[derive(Debug)].
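A tiny standalone example of that division of labor (Point is a made-up toy type, not part of lovem): Debug is derived, Display is written by hand.

```rust
use std::fmt::{self, Display, Formatter};

// Debug comes for free from the derive; Display we implement ourselves.
#[derive(Debug)]
struct Point {
    x: i32,
    y: i32,
}

impl Display for Point {
    fn fmt(&self, f: &mut Formatter<'_>) -> fmt::Result {
        write!(f, "({}, {})", self.x, self.y)
    }
}
```

Now `println!("{}", p)` uses our fmt, while `println!("{:?}", p)` uses the derived one.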
The Error trait lets you implement a source() method, that is used to get a nested Error inside your Error, that was its cause. Think of exception stacks, only that we do not have exceptions, of course. That is exactly what we want for AsmErrorReport; it is, after all, a wrapper for AsmError. AsmError on the other hand does not have a nested error, so we do not implement the source() method. The empty impl error::Error for AsmError block is still needed. If you remove it, the Error trait will not be implemented for AsmError.
Cool story, but why do we do all this? This is what enables us to use the magic of anyhow in our lovas.rs. We can use AsmError and AsmErrorReport (wrapped in an Err()) as return values for our main function. It returns anyhow::Result<()>. And when there is an error returned by it, an error message is created and printed for us. With this we can easily create useful error messages in the error type itself, at the place where we understand what errors exist and what they mean. And we need to do it in that one place only. Every program that uses our library (as lovas.rs does) benefits from that without any extra work, without even knowing which error types the library can return.
We will extend our assembler to do something useful, finally: execute our programs on lovem.
We have created ourselves an assembler in ~300 lines of code. And it has a command line interface, an API to be used in a program, and even useful error reporting. That is cool! But what do we do with the bytecode? It just gets dumped to the console. That is not very useful. We could copy/paste it into one of our example binaries... This is not what we wanted. So let us enhance our assembler.
We add some features to lovas.rs. A new command line parameter --run, that takes no arguments. If you add that flag to the call, lovas will take the assembled program (if there are no errors), create an instance of the VM and run the program on it. Thanks to clap, that is really easy to do. We add another field to our Cli struct. Actually, while we are at it, we add four new parameters:
#[clap(short, long, help = "Run the assembled program in lovem.")]
run: bool,

#[clap(long, help = "Enable tracing log when running lovem.")]
trace: bool,

#[clap(long, help = "Output the program to stdout.")]
print: bool,

#[clap(long, default_value_t = 100, help = "Setting the stack size for lovem when running the program.")]
stack_size: usize,

And we change what we do with a successfully created program, depending on our new flag:
// run the assembler:
match asm::assemble(&name, &content) {
    Ok(pgm) => {
        if args.print {
            println!("{:?}", pgm);
        }
        // we succeeded and now have a program with bytecode:
        if args.run {
            // lovas was called with `--run`, so create a VM and execute program:
            run(&pgm, &args)?
        }
        Ok(())
    },
    Err(e) => {
        // Something went wrong during assembly.
        // Convert the error report, so that `anyhow` can do its magic
        // and display some helpful error message:
        Err(Error::from(e))
    },
}

Just printing the program to stdout is not a very useful default behaviour for an assembler. It might still come in handy if you want to see what you are executing, so we make it optional and leave it for the caller to decide with the --print flag. If the --run flag is set, we call run(). So what does run() do?
/// Executes a program in a freshly created lovem VM.
fn run(pgm: &Pgm, args: &Cli) -> Result<()> {
    // Create our VM instance.
    let mut vm = VM::new(args.stack_size);
    vm.trace = args.trace;
    let start = Instant::now();
    let outcome = vm.run(&pgm.text);
    let duration = start.elapsed();
    match outcome {
        Ok(_) => {
            // Execution successful, program terminated:
            eprintln!("Terminated.\nRuntime={:?}\nop_cnt={}, pc={}, stack-depth={}, watermark={}",
                      duration,
                      vm.op_cnt, vm.pc, vm.stack.len(), vm.watermark
            );
            Ok(())
        },
        Err(e) => {
            // Runtime error. Error will be printed on return of main.
            eprintln!("Runtime error!\nRuntime={:?}\nop_cnt={}, pc={}, stack-depth={}, watermark={}",
                      duration, vm.op_cnt, vm.pc, vm.stack.len(), vm.watermark);
            Err(Error::from(e))
        }
    }
}

We create a VM instance, and we run the program on it. If there is a RuntimeError, we return it, just as we did with the AsmErrorReport. Back in our examples, we created a VM with a stack size of 100 - simply because we needed a number there. 100 is still the default, but now you can choose the stack size when calling lovas.
When we were running a program in our VM, we always got a lot of output during execution. That is nice for understanding what a stack machine does, but in general it is not a good idea for a VM to do that. It can be very beneficial if you run into a problem with your program, so it should be an easily available tool for debugging. That is why I removed all those log messages from lovem, but I left some in that can be activated by setting vm.trace = true. That is what we added the new command line parameter --trace for. You can now control if you want to see it.
There is some output by lovas after the execution. It reports if the run was successfully terminated (by executing a fin instruction), or if there was a RuntimeError. In both cases it will show you the time the execution took (wallclock time), as well as the number of instructions executed by the VM, the final position of the program counter, the number of values on the stack at termination, and the highest number of values on the stack at any time during execution (the watermark). This can give you some quick insight on what your program did and maybe where it ran into trouble.
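Keeping a watermark is cheap bookkeeping on every push. A minimal sketch of the idea (field and type names invented here, not necessarily lovem's actual vm.rs code):

```rust
// Track the greatest stack depth ever reached during a run.
struct Stack {
    values: Vec<i64>,
    watermark: usize,
}

impl Stack {
    fn push(&mut self, v: i64) {
        self.values.push(v);
        // the watermark only ever grows; popping does not lower it
        self.watermark = self.watermark.max(self.values.len());
    }
}
```

Because popping never lowers it, the watermark tells you the peak stack usage even when the stack is nearly empty at termination.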
All this led to some changes to vm.rs, but nothing that should give you any problems to understand. Remember that we have the power of git at our disposal, so you can easily find out what changed in a file between two releases. You could do that for vm.rs with this handy link:
We have written a few example programs so far. Each is its own binary in src/bin/, and all of them consist of the same Rust code of creating a VM and running a program. Only the bytecode changed between them.
I got rid of all of those (except for the most basic one) and translated the programs into assembly programs that live in pgm/. You can now execute those using lovas, like this:
Remember to add --trace to the call, or you won't see very much. It has become a lot easier to play around with the VM. No more writing bytecode by hand!
You might have noticed that I changed the filename extension that I use for the assembly programs from .lass to .lva. There are multiple reasons, but the main one is, that I thought Lass could be a nice name for a programming language, when I will finally come to writing one for lovem. So I want to reserve the extension for that possible future.
The diagnostic information given after the execution can be interesting, when you mess around. Let us play a bit with the program endless-stack.lva.
# This program runs in an endless loop, but it will push a new value to the stack on every iteration.
# It will inevitably lead to a stack overrun at some point and crash the program.
push_u8 123
goto -5
fin

The program will fill the stack until it is full, and then it will crash:
After 201 executed instructions it crashes. The stack depth at the time of the crash is 100. That is the complete stack; the next instruction tried to push the 101st value, which must fail. Instruction number 201 did cause the crash. That makes sense, if you follow the execution in your head. And the program counter is on 2. The last instruction executed will be the one before that, which would be at 0. That is the push_u8 instruction. There is no surprise that the watermark is at 100. That is the highest possible value for it and also the current value of our stack depth.
As we can now easily change the stack size, let us try what happens with a bigger stack:
So now the stack overflows at 150 values, of course. And it takes 301 instructions to fill it. Runtime was longer, but only by about 15%. I would not have expected the full 50% rise anyway, as there is overhead for starting the program.
There is, of course, a lot of output, that I cut out. What is interesting is the change in execution time. I ran this inside the CLion IDE by JetBrains. The console there will not be a very fast console, as it does a lot with that output coming through. But the impact of the logging is enormous! The runtime until we hit our stack overflow is more than 1000 times longer! The exact numbers don't mean anything; we are running unoptimised Rust code with debuginfo, and the bottleneck is the console. But it is still fascinating to see.
We add a feature to our assembler that we overlooked before.
Over the last few entries we created ourselves a really useful little assembler program. I hope you played around with it and enjoyed not having to write bytecode directly. If you did, you should have noticed that I left out a really important detail. Remember when I was complaining about how bad writing bytecode is? And that it got even worse when we introduced jumps? Yeah, I did not solve that problem at all. If anything, I made it worse, because you still have to count the relative bytes to your destination, but you do not see those bytes any longer. You just have to know how many bytes each instruction will produce.
There was so much already going on in that assembler program, that I did not want to introduce more complexity up front. Let's fix that now: we will introduce a way to give a position inside your program a name, so that you can goto that name later. And in good tradition, we will call these names labels.
The traditional way of defining labels in assembly is by writing them first thing on a line, followed by a colon :. Take a look at this little program, label.lva. It is neither good style, nor does it do anything useful, but it shows us labels:
# A small demonstration of how labels work with goto.
push_u8 1
goto coda

back:
 push_u8 3
 fin

 coda: push_u8 2
goto back

There are two labels defined here: back in line 5, and coda in line 9. A label definition is a short string that is directly followed by a colon :. We restrict it to letters, numbers, and underscore, with a letter at the front. For the curious, the regex is: ^[A-Za-z][0-9A-Za-z_]{0,31}$. As you can see in the example, there can be an optional instruction in the same line as the label definition. Now, how will our assembler parse those?
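The rule that the regex ^[A-Za-z][0-9A-Za-z_]{0,31}$ expresses can also be written as a plain function, which may make it easier to read (a sketch for illustration; the assembler itself uses a compiled regex named VALID_LABEL):

```rust
// A valid label: an ASCII letter, followed by up to 31 more
// letters, digits, or underscores.
fn is_valid_label(s: &str) -> bool {
    let mut chars = s.chars();
    match chars.next() {
        Some(c) if c.is_ascii_alphabetic() => {}
        _ => return false,
    }
    s.len() <= 32 && chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
}
```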
First of all, I did a little restructuring inside asm.rs, because I did not like how the parsing was done inside an associated function that also created the AsmPgm instance. That seemed messed up. After the change, the fn assemble() creates the instance itself and then calls a method on it to parse the source code. Here is the new version:
/// Parse assembly source code and turn it into a runnable program (or create report).
pub fn assemble(name: &str, content: &str) -> Result<Pgm, AsmErrorReport> {
    // create a new, clean instance to fill during parsing:
    let mut asm_pgm = AsmPgm {
        name: String::from(name),
        instructions: vec![],
        line_number: 0,
        text_pos: 0,
        error: None,
        labels: Default::default(),
    };
    // evaluate the source code:
    asm_pgm.process_assembly(content);
    // convert to Pgm instance if successful, or to Error Report, if assembly failed:
    asm_pgm.to_program()
}

And there is no problem with us changing the code like this. The only public function inside asm.rs is that pub fn assemble(). All methods of AsmPgm are private and therefore internal details. Not that it would matter at this stage of development, but it demonstrates how separation of public API and internal implementation works.
What is also new in that function is a new field inside AsmPgm: labels.
/// An assembler program during parsing/assembling.
#[derive(Debug)]
struct AsmPgm {
    ...
    /// A map storing label definitions by name with their position in bytecode.
    labels: HashMap<String, usize>,
}

It is a HashMap (aka. associative array in other languages). This is where we put all label definitions we find, while parsing the source file. It maps the label's name to its position inside the bytecode. Here we can look up where to jump, for a goto that wants to jump to a label.
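The usage pattern is simple: insert a position when a label definition is found, look it up when a goto refers to it. A sketch (the positions 4 and 7 are invented for illustration, not the real offsets of label.lva):

```rust
use std::collections::HashMap;

// Build a label map the way the parser would while walking the source.
fn build_labels() -> HashMap<String, usize> {
    let mut labels = HashMap::new();
    labels.insert(String::from("back"), 4);
    labels.insert(String::from("coda"), 7);
    labels
}
```

A lookup with `labels.get(name)` returns an Option, so an undefined label naturally shows up as None and can be turned into an assembly error.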
fn process(&mut self, content: &str) -> Result<(), AsmError> {
    // Go over complete source, extracting instructions. Some will have their opargs
    // left empty (with placeholders).
    self.parse(content)?;
    self.update_instructions()
}

/// Process assembly source code. Must be used with "empty" AsmPgm.
fn process_assembly(&mut self, content: &str) {
    // this function is just a wrapper around `process()`, so that I can use the
    // return magic and don't need to write the error check twice.
    if let Err(e) = self.process(content) {
        self.error = Some(e);
    }
}

The important part is that we have two steps now. We parse the complete source, as before. The second run is needed to write the actual relative jump addresses into the instructions. We do not know them during parsing, at least not for forward jumps.
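Once a label's bytecode position is known, the oparg of a jump is a small calculation. This sketch assumes offsets are relative to the end of the jump instruction, which matches the endless-stack example: `goto -5`, sitting at position 2 with a length of 3 bytes, jumps back to position 0 (the function name is invented for illustration):

```rust
// Compute the relative i16 oparg for a jump to a resolved label position,
// measured from the byte right after the jump instruction.
fn relative_offset(label_pos: usize, jump_pos: usize, jump_len: usize) -> i16 {
    (label_pos as i64 - (jump_pos + jump_len) as i64) as i16
}
```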
/// Parses and extracts optional label definition from line.
///
/// Looks for a colon ':'. If one exists, the part before the first colon will be
/// seen as the name for a label, that is defined on this line. Instructions inside
/// the program that execute jumps can refer to these labels as a destination.
/// Lines containing a label definition may also contain an instruction and/or a comment.
/// This can return `AsmError::InvalidLabel` if the part before the colon is not a valid
/// label name, or `AsmError::DuplicateLabel` if a label name is reused.
/// If a label could be parsed, it will be stored to the `AsmPgm`.
/// On success, the line without the label definition is returned, so that it can be
/// used to extract an instruction. This will be the complete line, if there was no
/// label definition.
fn parse_label_definition<'a>(&mut self, line: &'a str) -> Result<&'a str, AsmError> {
    if let Some((label, rest)) = line.split_once(":") {
        let label = label.trim_start();
        if VALID_LABEL.is_match(label) {
            if self.labels.contains_key(label) {
                Err(AsmError::DuplicateLabel(String::from(label)))
            } else {
                self.labels.insert(String::from(label), self.text_pos);
                Ok(rest)
            }
        } else {
            Err(AsmError::InvalidLabel(String::from(label)))
        }
    } else {
        Ok(line)
    }
}

The method is trying to find a label definition in the line, and if so, handles it. We use our trusted Result<> return pattern to communicate potential errors. But instead of Ok(()), which is the empty okay value, we return a &str on success. This is because there might also be an instruction in the line. If we find a label definition, it returns the line after the colon. If there is none, it returns the complete line it got. This gives us the lines as we used to get them before we introduced labels. Great. But what is that weird 'a that shows up everywhere in that highlighted line?
Yeah, this is where it becomes rusty, again. I said, in an early post, that you would hate the Rust compiler and its pedantic error messages. The thing Rust is most pedantic about, is ownership and access to values you do not own. We are working with references to Strings here. A &str references the bytes inside that String directly (a &str need not reference a String, but it does here). We did that before, where is the problem now? This is the first time we are returning a &str.
When you are using references, Rust makes sure that the value you are referencing exists at least as long as the reference exists. That is easy for functions, as long as you drop every reference you have when you are done. But in this function, we return a reference to the parameter we got. Rust cannot allow that without some special care. When I remove the 'a parts of the method, I get a compilation error:
error[E0623]: lifetime mismatch
   --> src/asm.rs:277:21
    |
269 |     fn parse_label_definition(&mut self, line: &str) -> Result<&str, AsmError> {
    |                                                ----     ----------------------
    |                                                |
    |                                                this parameter and the return type are declared with different lifetimes...
...
277 |                         Ok(rest)
    |                            ^^^^^^^^ ...but data from `line` is returned here
    |
    = note: each elided lifetime in input position becomes a distinct lifetime
help: consider introducing a named lifetime parameter and update trait if needed
    |
269 |     fn parse_label_definition<'a>(&'a mut self, line: &'a str) -> Result<&str, AsmError> {
    |                              ++++  ++                  ++

The compiler tells me, that I messed up the lifetimes. It even proposes a change that introduces lifetime parameters (but gets it slightly wrong). What do we do with the 'a?
Well, we introduce a lifetime parameter called 'a. The syntax for that is the apostrophe, which looked weird to me at the start, but it is so lightweight that I came to like it. It is customary to just call your lifetimes 'a, 'b, ... – they normally don't have a long scope anyway. The thing we are telling the compiler with this parameter is this: the lifetime of the returned &str is dependent on the lifetime of the parameter line: &str. So whenever the reference the function is called with runs out of scope, the reference that was returned must be out of scope as well.
This is a concept that is new to many programmers when they learn Rust. I think what we do here demonstrates it quite well. Let us look at what happens for line 9 of our assembly program:
Our function receives a reference to a String holding that line: " coda: push_u8 2". It finds the label coda and stores it inside self.labels. Its work is done, but there might be more to this line. It returns a reference to a substring of it (&str are actually slices; they can reference just a part of a String's data). That is what we return: a reference to part of the data inside the String, starting at the first char after the colon, so it looks like this: " push_u8 2". It is not a copy, it is the same area inside the computer's memory! So if you want to make certain that there are no accesses to memory after its content has run out of scope (use after free, or use of a local variable after it has run out of scope), you must not allow access to it, unless you are sure the value still exists. And this is what Rust does. This is what makes Rust a secure language. Many bugs and exploits in the world exist because most languages do not check this, but leave the responsibility to the programmer. And the really cool thing about Rust is: it does this completely at compile time, as you can see by the fact that we got a compiler error.
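The borrowing at the heart of the method can be reduced to a free function. Note that with only one reference parameter the lifetime can even be elided; it is the extra &mut self in the method that makes the explicit 'a necessary there (strip_label is a made-up name for this sketch):

```rust
// Split off an optional "label:" prefix, returning the rest of the line
// and the trimmed label, both as borrowed slices of the input.
fn strip_label(line: &str) -> (&str, Option<&str>) {
    match line.split_once(':') {
        Some((label, rest)) => (rest, Some(label.trim_start())),
        None => (line, None),
    }
}
```

Both returned slices point into the same memory as `line`; nothing is copied.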
The way we call our function is not a problem at all:
Our initial line comes from line 228. It is already a reference, because content.lines() gives us references to the memory inside of content. The String variable that holds (and owns) the data lives inside lovas.rs:
// read complete source file into String:
let content = std::fs::read_to_string(&args.source)
    .with_context(
        || format!("could not read file `{}`", &name)
    )?;
// run the assembler:
match asm::assemble(&name, &content) {

We do not copy any of those bytes along the way. The first time we do that is in clean_line(). Returning a &str will not work there, because we actually modify the contents of the string, by replacing characters inside it. Have you ever tried to work with in-place "substrings" in C (I mean char arrays, like char *str) without modifying the contents (placing \0 bytes)? It is not fun. In Rust, it can be, if you understand lifetime restrictions.
If you run into problems with your &str inside a Rust program, there is often an easy way to get around that. You can simply create a new String from your &str, as we do in clean_line(). That will copy the bytes. For our program, that would have been no problem at all. Cloning a few bytes of source code for every line during assembly would cost us next to nothing. You would not notice it in execution time. But things are different when you need to quickly handle long substrings in a program. Think of a diagnostic job on a busy server. And remember that Strings will be created on the heap. That is a complexity that you sometimes want to avoid. When programming microcontrollers, there is a chance that you do not even have a memory allocator at your disposal. And microcontrollers are what we are aiming for in our project. There are already some parts of lovem that we will need to change because of that. But that is a story for another time. I just thought that this was a nice little example to introduce you to lifetime parameters. We will need them at some point...
This is a long entry already. You can look at the complete state of the assembler directly in the sourcecode. You should know how to find the tags inside the repo by now. But I want to execute our new program, using the labels, before I end this. Here it is again:
The program has three push_u8 operations. If you executed them in the order of the source code, they would push [1, 3, 2] to the stack. But because of the goto instructions, they are not executed in that order. You can see the jumps in the trace, and you can see that the stack at termination holds the values in this order: [1, 2, 3].
Not much of a program, but it shows you, how our new labels work. And finally: no more counting bytes!
Our assembler gives us a lot of convenience for testing features of our VM. So let us start doing interesting stuff with it. We do have support for jumps already, but as it is now, save for an endless loop, there is absolutely no reason to use it, yet. All our programs run their predetermined way. If you look again at label.lva, you can see that none of those gotos introduce any dynamic. We could just ditch them and reorder the rest. It would do the same, only more efficiently. They simply tangle up our linear code, without removing its linearity.
Today we will introduce branches to our VM. A branch is a point in a program from which there are multiple possible paths to take. Two paths, normally. Which of those paths is taken is decided at runtime by looking at the state of the program. For us that means that we look at the value on top of the stack. How does it work?
We already introduced the goto operation. What we will add now, works exactly the same way, but only if a certain condition is met. And, yes, we will call that operation if. But if what? How about if equal?
So we get the new opname ifeq, that pops a value from the stack and only executes its jump when that value is equal. Equal to what, you want to know? How about if it is equal to zero. If you want to compare it to a different number, it is easy to subtract that number from your value before you compare it to zero, and you achieve what you need.
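Inside the VM's dispatch loop, such a handler could look roughly like this (a sketch with invented names and an assumed i64 value type; the real code lives in vm.rs):

```rust
// `ifeq`: pop the top of the stack; if it is zero, jump by the
// relative offset, exactly like goto would.
fn ifeq_branch(stack: &mut Vec<i64>, pc: &mut usize, offset: i16) -> Result<(), &'static str> {
    let v = stack.pop().ok_or("stack underflow")?;
    if v == 0 {
        // condition met: adjust the program counter
        *pc = (*pc as i64 + offset as i64) as usize;
    }
    Ok(())
}
```

Note that the value is consumed either way; only the jump is conditional.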
/// opcode: Pop value from stack and push it back, twice.
///
/// pop: 1, push: 2
/// oparg: 0
pub const DUP: u8 = 0x03;

This one simply duplicates the topmost value of the stack, so that there will be another copy of it on top. We will use that often when testing values with an if, if we still need the value after testing it. The if will consume the topmost value.
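As a sketch, the effect of DUP on the stack (helper name invented; reading the top value and pushing a copy is equivalent to popping it and pushing it back twice):

```rust
// Duplicate the topmost stack value, failing on an empty stack.
fn dup(stack: &mut Vec<i64>) -> Result<(), &'static str> {
    let v = *stack.last().ok_or("stack underflow")?;
    stack.push(v);
    Ok(())
}
```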
And that is all we need to change on our assembler. The way we have written it, it is easy to introduce new operations, when they share the same syntax in assembly and in bytecode as existing ones.
# Demonstrate the conditional jump (a branch)
# The program has a loop that it executes thrice, before it terminates.
    push_u8 3
loop:
    push_u8 1
    sub
    dup
    ifgt loop
    pop
    fin

Nice! This is basically a for-loop. Granted, it does not do anything but loop, but you can see how the program counts down from 3 to 0 and after the third time it reaches line 8, it stops jumping back to loop: and advances to the end.
We can increase the number in line 3, and the number of runs increases with it. If we change it to 200, we get this (I ditched the --trace for this).
It takes about half a second to execute; over 4000000 operations were executed. And the stack never held more than 2 values, as you can see by the watermark. We are programming!
Wait a second! Our only way of getting values on the stack is push_u8. That can only push a u8, so only values 0 - 255. How did I push that 1000000 there?
+ Assembling bytes - Lovem
Our new assembler is almost done assembling. Over the last entries we learned how the program parses the assembly sourcecode and produces a list of parsed instructions. What we now need to do, is turn that into bytes.
The error part is straightforward. A small detail is the clone() call for name and error. We need to do that, because we cannot move ownership of those values (they must still exist in the AsmPgm instance). And we cannot use references. There is no need to clone the line number; as an integer type it can simply be copied.
The success part isn't complex either. We create a Vector of bytes and push all bytes into it: for each instruction the opcode and the opargs (which there can be zero). We have our bytecode now! Wrap it inside our new Pgm type, and we are done.
What happens, if our program has errors? Easy to find out, I included a broken program: syntax-error.lass
push_u8 123
+push_u8 300
+add
+pop
+fin
+
Have you found the problem? Will the assembler?
kratenko@jotun:~/git/lovem$ cargo run --bin lovas -- pgm/syntax-error.lass
+
+ Finished dev [unoptimized + debuginfo] target(s) in 0.04s
+ Running `target/debug/lovas pgm/syntax-error.lass`
+Error: assembly failed in line 2 of program 'pgm/syntax-error.lass'
+
+Caused by:
+ InvalidArgument
+
It does find the error. Using the parse_int create already pays. And the error message really tells us, what is wrong and where. We get a lot of value for very few code we have written.
There does not really seem to be a point of storing all that information inside AsmPgm. We could easily have created the bytecode directly. That would have been a lot easier. And if you have run the code yourself, you will have been bombarded with compiler warnings about unread fields.
We will be needing that information soon, and it was easiest to build it like this right away. But let us just enjoy our new assembler for now.
Okay, before we leave for today, one more thing that you might have spotted. What's with that impl blocks?
implDisplayforAsmError{
+fnfmt(&self,f: &mutFormatter<'_>)-> std::fmt::Result{
+write!(f,"{:?}",self)
+}
+}
+
+implerror::ErrorforAsmError{
+}
+
+implDisplayforAsmErrorReport{
+fnfmt(&self,f: &mutFormatter<'_>)-> std::fmt::Result{
+write!(f,"assembly failed in line {} of program '{}'",self.line,self.name)
+}
+}
+
+implerror::ErrorforAsmErrorReport{
+fnsource(&self)-> Option<&(dynerror::Error+'static)>{
+Some(&self.error)
+}
+}
+
That is the price we have to pay when we want to use Rust magic. Rust's answer to writing generic code that can be applied to different types (that might not exist at the time of writing) are traits. A function can accept a trait as a type. If you implement that trait for your type, you can use that function. That is a very simplified introduction.
A trait defines specific functions you have to write for a type. That is what we do here. We implement the trait std::error::Error for our AsmError and AsmErrorReport. To do so, we must also implement the trait std::fmt::Display (because the Error trait says so).
There is not much we do there. Types implementing the Display trait can be printed using println!("{}", value). What the println! macro does is just calling that fmt method we define. The trait Debug does a similar thing, but for use with println!("{:?}", value). We can use any value with those constructs that implements the Display trait (for "{}") or the Debug trait (for "{:?}").
The Debug trait we let the compiler implement (derive) for us. That is what the line #[derive(Debug)] does. And for our Display trait we are lazy and just use the function that was created by #[derive(Debug)].
The Error trait lets you implement a source() method, that is used to get a nested Error inside your Error, that was its cause. Think of exception stacks, only that we do not have exceptions, of course. That is exactly what we want for AsmErrorReport; it is, after all, a wrapper for AsmError. AsmError on the other hand does not have a nested error, so we do not implement the source() method. The empty impl error::Error for AsmError block is still needed. If you remove it, the Error trait will not be implemented for AsmError.
Cool story, but why do we do all this? This is what enables us to use the magic of anyhow in our lovas.rs. We can use AsmError and AsmErrorReport (wrapped in an Err()) as return values for our main function. It returns anyhow::Result<()>. And when there is an error returned by it, an error message is created and printed for us. With this we can easily create useful error messages in the error type itself, at the place where we understand what errors exist and what they mean. And we need to do it in that one place only. Every program that uses our library (as lovas.rs does) benefits from that without any extra work, or even without knowing which error types can be returned by the library.
The source code for this post can be found under the tag v0.0.8-journey.
Handling instructions - Lovem
We took care of all the dirty work inside the assembler during the previous posts. We now have a cleanly parsed instruction with an optional argument that we can evaluate. Let us dive into parse_instruction():
/// Handles a single instruction of opcode and optional oparg parsed from Assembly file.
fn parse_instruction(&mut self, opname: &str, oparg: Option<&str>) -> Result<(), AsmError> {
    match opname {
        "nop" => self.parse_a0_instruction(op::NOP, oparg),
        "fin" => self.parse_a0_instruction(op::FIN, oparg),
        "pop" => self.parse_a0_instruction(op::POP, oparg),
        "add" => self.parse_a0_instruction(op::ADD, oparg),
        "sub" => self.parse_a0_instruction(op::SUB, oparg),
        "mul" => self.parse_a0_instruction(op::MUL, oparg),
        "div" => self.parse_a0_instruction(op::DIV, oparg),
        "mod" => self.parse_a0_instruction(op::MOD, oparg),
        "push_u8" => {
            let oparg = oparg.ok_or(AsmError::MissingArgument)?;
            let v = parse_int::parse::<u8>(oparg).or(Err(AsmError::InvalidArgument))?;
            self.push_a1_instruction(op::PUSH_U8, v)
        },
        "goto" => {
            let oparg = oparg.ok_or(AsmError::MissingArgument)?;
            let v = parse_int::parse::<i16>(oparg).or(Err(AsmError::InvalidArgument))?;
            let a = v.to_be_bytes();
            self.push_a2_instruction(op::GOTO, a[0], a[1])
        },
        _ => Err(AsmError::UnknownInstruction(String::from(opname))),
    }
}
That is a surprisingly simple function. It receives two parameters. opname is a &str that holds the opname of the instruction. oparg is either None, if there was no argument in the instruction, or it holds a non-empty string with whatever argument was present in the instruction.
The function only consists of a long match, that directly matches the opname against our known opnames. If there is no match, it returns a helpful error that even contains the unknown opname that was found.
The explicit branches look a bit weirder. That is because I do not like to repeat myself when writing code. And Rust tends to allow some very dense source code.
I decided to group the instructions into three categories, by the number of bytes an instruction uses as argument. An a0 instruction has zero bytes of oparg, a1 has one byte, and a2 has two bytes.
Most of our operations do not allow any argument at all. We want to make sure that there is none given in the instruction. And the only difference in handling those instructions inside the assembler is the byte that will be written to the bytecode. We can handle all of those with the same function: parse_a0_instruction():
/// Helper that parses an instruction with no oparg and pushes it.
fn parse_a0_instruction(&mut self, opcode: u8, oparg: Option<&str>) -> Result<(), AsmError> {
    if oparg.is_some() {
        Err(AsmError::UnexpectedArgument)
    } else {
        self.push_a0_instruction(opcode)
    }
}
If we did get an argument, we fail, since that is not allowed. And then we push a very basic instruction to the back of our program. We have helper functions to do that:
/// Adds a single instruction to the end of the AsmProgram.
fn push_instruction(&mut self, i: AsmInstruction) -> Result<(), AsmError> {
    self.text_pos += i.size();
    self.instructions.push(i);
    Ok(())
}

/// Helper that creates an instruction with 0 bytes of oparg and pushes it.
fn push_a0_instruction(&mut self, opcode: u8) -> Result<(), AsmError> {
    let i = AsmInstruction {
        line_number: self.line_number,
        opcode,
        oparg: vec![],
        pos: self.text_pos,
    };
    self.push_instruction(i)
}
We create a new instruction instance and add it. We also track the position of every instruction in the bytecode; that is why we update the program's current position in the bytecode (stored in text_pos) for every instruction we add.
There is nothing we do with that information, yet. But we will need that information later.
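To sketch that bookkeeping: each instruction's pos is the value of text_pos before the instruction is appended, and text_pos then advances by the instruction's size. The helper below is hypothetical, and the sizes are assumed for illustration (one opcode byte plus 0, 1, or 2 bytes of oparg); the real values come from AsmInstruction::size():

```rust
// Hypothetical model: given instruction sizes, compute each instruction's
// position in the bytecode and the total length so far.
fn layout(sizes: &[usize]) -> (Vec<usize>, usize) {
    let mut text_pos = 0;
    let mut positions = Vec::new();
    for &size in sizes {
        positions.push(text_pos); // pos of this instruction
        text_pos += size;         // advance past opcode + oparg bytes
    }
    (positions, text_pos)
}
```

An a0, an a1, and an a2 instruction in a row would sit at positions 0, 1, and 3, with text_pos ending up at 6.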
We only have one operation that needs a single byte of oparg, and that is push_u8. We use that operation to push values on the stack, taken directly from the bytecode. u8 is the only type supported at the moment. That is not even a hard restriction; you can easily get any i64 value onto the stack by using basic arithmetic, and we have that.
Parsing numbers is no fun. It is hard. So we let someone else do it for us. The crate we are using is called parse_int. Go take a look at what it can do. It allows us to enter numbers easily in hexadecimal, octal, or binary notation. That is a really handy feature in source code! Thanks, Rust community! So how are we parsing push_u8?
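parse_int is a third-party crate, so as a hedged illustration of what it buys us, here is a rough std-only stand-in (parse_u8 is hypothetical; the real crate supports more notations, like octal, binary, and underscores):

```rust
// Simplified stand-in for parse_int::parse::<u8>(): decimal and 0x-hex only.
fn parse_u8(s: &str) -> Option<u8> {
    if let Some(hex) = s.strip_prefix("0x") {
        // interpret the rest as hexadecimal digits:
        u8::from_str_radix(hex, 16).ok()
    } else {
        // plain decimal:
        s.parse().ok()
    }
}
```

With this, "0x2a" and "42" both yield the same byte, while out-of-range or malformed input fails.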
First we make sure that we have an argument. If not, we fail. We can again use our handy ? syntax. Then we try to parse it into a u8, using parse_int. The syntax for that call takes some getting used to - I'm still waiting to get used to it myself. But if it works, we now have a valid u8. If it fails to parse, we quickly return with that failure information. If all goes well, we reach the third line, which calls our helper for adding a1 instructions. There is no big surprise in what that function does:
/// Helper that creates an instruction with 1 byte of oparg and pushes it.
fn push_a1_instruction(&mut self, opcode: u8, a0: u8) -> Result<(), AsmError> {
    let i = AsmInstruction {
        line_number: self.line_number,
        opcode,
        oparg: vec![a0],
        pos: self.text_pos,
    };
    self.push_instruction(i)
}
An interesting detail is that push_instruction() returns a Result, even though it can never fail! It always returns Ok(()). And if you look at push_a2_instruction(), you will see that it also always returns Ok(()). Why do we bother? Take a look at the handler for push_u8 again, in context of the complete function parse_instruction(). That function returns a Result, and it can return Err(...). Because push_a1_instruction() has the same return type of Result, the calls integrate nicely with the layout of the complete function inside the match. For me, it gives the code a clean compactness.
This time we use parse_int to read an i16. Whether you like the ::<i16> syntax or not, at least you can see what it is for. We need to unpack the two bytes of the i16 after parsing, so that we can store them correctly in the bytecode. to_be_bytes() gives us an array (of size 2) that holds the bytes in big endian byte order. to_le_bytes() is the little endian counterpart. I generally prefer big endian, when I can. And if you remember how we read the bytes in the VM, you can see that we are already using big endian there.
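A quick sketch of that encoding (encode_goto_offset is a hypothetical helper; the assembler just calls to_be_bytes() inline). For a negative offset the two's complement representation is preserved, most significant byte first, and the VM can reassemble the value losslessly:

```rust
// Encode a relative jump offset the way the assembler does: big endian.
fn encode_goto_offset(v: i16) -> [u8; 2] {
    // most significant byte first, matching how the VM reads its oparg
    v.to_be_bytes()
}
```

`goto -5` thus ends up as the bytes 0xFF 0xFB in the bytecode, and i16::from_be_bytes() on the VM side gets back -5.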
There is nothing new in the push_a2_instruction() function, only one additional byte.
/// Helper that creates an instruction with 2 bytes of oparg and pushes it.
fn push_a2_instruction(&mut self, opcode: u8, a0: u8, a1: u8) -> Result<(), AsmError> {
    let i = AsmInstruction {
        line_number: self.line_number,
        opcode,
        oparg: vec![a0, a1],
        pos: self.text_pos,
    };
    self.push_instruction(i)
}
We have now parsed the complete program source into the AsmPgm structure. Or we have failed to do so, in which case there is an Error stored in AsmPgm. Either way, you have now seen all the code that does the parsing. Next journal entry will finally produce the bytecode we are longing for.
The source code for this post can be found under the tag v0.0.8-journey.
Journal entries from August 2022 - Lovem
Our assembler gives us a lot of convenience for testing features of our VM. So let us start doing interesting stuff with it. We do have support for jumps already, but as it is now, save for an endless loop, there is absolutely no reason to use it, yet. All our programs run their predetermined way. If you look again at label.lva, you can see that none of those gotos introduce any dynamic. We could just ditch them and reorder the rest. It would do the same, only more efficiently. They simply tangle up our linear code, without removing its linearity.
We add a feature to our assembler that we overlooked before.
Over the last few entries we created ourselves a really useful little assembler program. I hope you played around with it and enjoyed not having to write bytecode directly. If you did, you should have noticed that I left out a really important detail. Remember when I was complaining about how bad writing bytecode is? And that it got even worse, when we introduced jumps? Yeah, I did not solve that problem at all. If anything, I made it worse, because you still have to count the relative bytes to your destination, but you do not see those bytes any longer. You just have to know how many bytes each instruction will produce.
We will extend our assembler to do something useful, finally: execute our programs on lovem.
We have created ourselves an assembler in ~300 lines of code. And it has a command line interface, an API to be used in a program, and even useful error reporting. That is cool! But what do we do with the bytecode? It just dumps them to the console. That is not very useful. We could copy/paste that into one of our example binaries... This is not what we wanted. So let us enhance our assembler.
Our new assembler is almost done assembling. Over the last entries we learned how the program parses the assembly sourcecode and produces a list of parsed instructions. What we now need to do, is turn that into bytes.
We took care of all the dirty work inside the assembler during the previous posts. We now have a cleanly parsed instruction with an optional argument that we can evaluate. Let us dive into parse_instruction():
We will extend our assembler to do something useful, finally: execute our programs on lovem.
We have created ourselves an assembler in ~300 lines of code. And it has a command line interface, an API to be used in a program, and even useful error reporting. That is cool! But what do we do with the bytecode? It just dumps them to the console. That is not very useful. We could copy/paste that into one of our example binaries... This is not what we wanted. So let us enhance our assembler.
We add some features to lovas.rs. A new command line parameter --run, that takes no arguments. If you add that flag to the call, lovas will take the assembled program (if there are no errors), create an instance of the VM and run the program on it. Thanks to clap, that is really easy to do. We add another field to our Cli struct. Actually, while we are at it, we add four new parameters:
#[clap(short, long, help = "Run the assembled program in lovem.")]
run: bool,

#[clap(long, help = "Enable tracing log when running lovem.")]
trace: bool,

#[clap(long, help = "Output the program to stdout.")]
print: bool,

#[clap(long, default_value_t = 100, help = "Setting the stack size for lovem when running the program.")]
stack_size: usize,
And we change what we do with a successfully created program, depending on our new flag:
// run the assembler:
match asm::assemble(&name, &content) {
    Ok(pgm) => {
        if args.print {
            println!("{:?}", pgm);
        }
        // we succeeded and now have a program with bytecode:
        if args.run {
            // lovas was called with `--run`, so create a VM and execute program:
            run(&pgm, &args)?
        }
        Ok(())
    },
    Err(e) => {
        // Something went wrong during assembly.
        // Convert the error report, so that `anyhow` can do its magic
        // and display some helpful error message:
        Err(Error::from(e))
    },
}
Just printing the program to stdout is not a very useful default behaviour for an assembler. It might still come in handy, if you want to see what you are executing, so we make it optional and leave it for the caller to decide with the --print flag. If the --run flag is set, we call run(). So what does run() do?
/// Executes a program in a freshly created lovem VM.
fn run(pgm: &Pgm, args: &Cli) -> Result<()> {
    // Create our VM instance.
    let mut vm = VM::new(args.stack_size);
    vm.trace = args.trace;
    let start = Instant::now();
    let outcome = vm.run(&pgm.text);
    let duration = start.elapsed();
    match outcome {
        Ok(_) => {
            // Execution successful, program terminated:
            eprintln!("Terminated.\nRuntime={:?}\nop_cnt={}, pc={}, stack-depth={}, watermark={}",
                      duration,
                      vm.op_cnt, vm.pc, vm.stack.len(), vm.watermark
            );
            Ok(())
        },
        Err(e) => {
            // Runtime error. Error will be printed on return of main.
            eprintln!("Runtime error!\nRuntime={:?}\nop_cnt={}, pc={}, stack-depth={}, watermark={}",
                      duration, vm.op_cnt, vm.pc, vm.stack.len(), vm.watermark);
            Err(Error::from(e))
        }
    }
}
We create a VM instance, and we run the program on it. If there is a RuntimeError, we return it, just as we did with the AsmErrorReport. Back in our examples, we created a VM with a stack size of 100 - simply because we needed a number there. 100 is still the default, but now you can choose the stack size, when calling lovas. If you do
When we were running a program in our VM, we always got a lot of output during execution. That is nice for understanding what a stack machine does, but in general it is not a good idea for a VM to do that. It can be very beneficial if you run into a problem with your program, so it is an easily available tool for debugging. That is why I removed all those log messages from lovem, but I left some in that can be activated, if you set vm.trace = true. That is what we added the new command line parameter --trace for. You can now control if you want to see it.
There is some output by lovas after the execution. It reports whether the run terminated successfully (by executing a fin instruction), or whether there was a RuntimeError. In both cases it will show you the time the execution took (wallclock time), as well as the number of instructions executed by the VM, the final position of the program counter, the number of values on the stack at termination, and the highest number of values on the stack at any time during execution (the watermark). This can give you some quick insight on what your program did and maybe where it ran into trouble.
All this led to some changes in vm.rs, but nothing that should give you any problems to understand. Remember that we have the power of git at our disposal, so you can easily find out what changed in a file between two releases. You could do that for vm.rs with this handy link:
We have written a few example programs so far. Each is its own binary in src/bin/, and all of them consist of the same Rust code of creating a VM and running a program. Only the bytecode changed between them.
I got rid of all of those (except for the most basic one) and translated the programs into assembly programs that live in pgm/. You can now execute those using lovas, like this:
Remember to add --trace to the call, or you won't see very much. It has become a lot easier to play around with the VM. No more writing bytecode by hand!
You might have noticed that I changed the filename extension that I use for the assembly programs from .lass to .lva. There are multiple reasons, but the main one is that I thought Lass could be a nice name for a programming language, when I finally come to writing one for lovem. So I want to reserve the extension for that possible future.
The diagnostic information given after the execution can be interesting, when you mess around. Let us play a bit with the program endless-stack.lva.
# This program runs in an endless loop, but it will push a new value to the stack on every iteration.
# It will inevitably lead to a stack overrun at some point and crash the program.
push_u8 123
goto -5
fin
The program will fill the stack until it is full, and then it will crash:
After 201 executed instructions it crashes. The stack depth at the time of the crash is 100. That is the complete stack; the next instruction tried to push the 101st value, which must fail. Instruction number 201 did cause the crash. That makes sense, if you follow the execution in your head. And the program counter is on 2. The last instruction executed will be the one before that, which would be at 0. That is the push_u8 instruction. There is no surprise that the watermark is at 100. That is the highest possible value for it and also the current value of our stack depth.
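If you want to double-check that arithmetic: every completed loop iteration costs a push_u8 and a goto, plus one final push that fails on the full stack. A small sketch (ops_until_overrun is a hypothetical model of the instruction count, not VM code):

```rust
// Count instructions for the endless-stack program until the stack overruns.
fn ops_until_overrun(stack_size: usize) -> usize {
    let mut op_cnt = 0;
    let mut stack_depth = 0;
    loop {
        op_cnt += 1; // push_u8 123
        if stack_depth == stack_size {
            return op_cnt; // this push fails: stack overrun
        }
        stack_depth += 1;
        op_cnt += 1; // goto -5
    }
}
```

For a stack of 100 that yields exactly 201 instructions, matching the output above.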
As we can now easily change the stack size, let us try what happens with a bigger stack:
So now the stack overflows at 150 values, of course. And it takes 301 instructions to fill it. Runtime was longer, but only by about 15%. I would not have expected a full 50% rise anyway, as there is overhead for starting the program.
There is, of course, a lot of output, that I cut out. What is interesting is the change in execution time. I ran this inside the CLion IDE by JetBrains. The console there will not be a very fast console, as it does a lot with that output coming through. But the impact of the logging is enormous! The runtime until we hit our stack overflow is more than 1000 times longer! The exact numbers don't mean anything; we are running unoptimised Rust code with debuginfo, and the bottleneck is the console. But it is still fascinating to see.
The source code for this post can be found under the tag v0.0.9-journey.
What if? - Lovem
Our assembler gives us a lot of convenience for testing features of our VM. So let us start doing interesting stuff with it. We do have support for jumps already, but as it is now, save for an endless loop, there is absolutely no reason to use it, yet. All our programs run their predetermined way. If you look again at label.lva, you can see that none of those gotos introduce any dynamic. We could just ditch them and reorder the rest. It would do the same, only more efficiently. They simply tangle up our linear code, without removing its linearity.
Today we will introduce branches to our VM. A branch is a point in a program from which there are multiple possible paths to take. Two paths, normally. Which of those paths is taken is decided at runtime by looking at the state of the program. For us that means that we look at the value on top of the stack. How does it work?
We already introduced the goto operation. What we will add now, works exactly the same way, but only if a certain condition is met. And, yes, we will call that operation if. But if what? How about if equal?
So we get the new opname ifeq, that pops a value from the stack and only executes its jump when that value is equal. Equal to what, you want to know? How about if it is equal to zero. If you want to compare it to a different number, it is easy to subtract that number from your value before you compare it to zero, and you achieve what you need.
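Inside the VM, a handler for such an operation could look roughly like this. This is a sketch with assumed names (stack, pc, offset); lovem's actual relative-jump handling may differ in detail:

```rust
// Hypothetical sketch of a conditional relative jump like ifeq.
fn ifeq(stack: &mut Vec<i64>, pc: &mut usize, offset: i16) {
    // the condition value is consumed, whether we jump or not:
    let v = stack.pop().expect("stack underrun");
    if v == 0 {
        // relative jump; the offset may be negative for backward jumps:
        *pc = (*pc as i64 + offset as i64) as usize;
    }
}
```

A zero on top of the stack moves the program counter by the offset; any other value just falls through to the next instruction.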
/// opcode: Pop value from stack and push it back, twice.
///
/// pop: 1, push: 2
/// oparg: 0
pub const DUP: u8 = 0x03;
This one simply duplicates the value on top of the stack, so that there will be another copy of it on top. We will use that often when testing values with an if, if we still need the value after testing it. The if will consume the topmost value.
And that is all we need to change on our assembler. The way we have written it, it is easy to introduce new operations, when they share the same syntax in assembly and in bytecode as existing ones.
# Demonstrate the conditional jump (a branch)
# The program has a loop that it executes thrice, before it terminates.
    push_u8 3
loop:
    push_u8 1
    sub
    dup
    ifgt loop
    pop
    fin
Nice! This is basically a for-loop. Granted, it does not do anything but loop, but you can see how the program counts down from 3 to 0 and after the third time it reaches line 8, it stops jumping back to loop: and advances to the end.
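The same control flow, written as plain Rust (count_down is a hypothetical helper mirroring the assembly loop above, not lovem code):

```rust
// How often does the loop body run for a given start value?
fn count_down(start: i64) -> i64 {
    let mut v = start; // push_u8 3
    let mut iterations = 0;
    loop {
        v -= 1; // push_u8 1; sub
        iterations += 1;
        if v <= 0 {
            break; // dup; ifgt loop: only jump back while v > 0
        }
    }
    iterations
}
```

Starting at 3, the body runs three times and the counter ends at 0, which the final pop then discards.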
We can increase the number in line 3, and the number of runs increases with it. If we change it to 1000000, we get this (I ditched the --trace for this).
It takes about half a second to execute, and over 4000000 operations were executed. And the stack never held more than 2 values, as you can see by the watermark. We are programming!
Wait a second! Our only way of getting values on the stack is push_u8. That can only push a u8, so only values 0 - 255. How did I push that 1000000 there?
The source code for this post can be found under the tag v0.0.11-journey.
You labeled me, I'll label you - Lovem
We add a feature to our assembler that we overlooked before.
Over the last few entries we created ourselves a really useful little assembler program. I hope you played around with it and enjoyed not having to write bytecode directly. If you did, you should have noticed that I left out a really important detail. Remember when I was complaining about how bad writing bytecode is? And that it got even worse, when we introduced jumps? Yeah, I did not solve that problem at all. If anything, I made it worse, because you still have to count the relative bytes to your destination, but you do not see those bytes any longer. You just have to know how many bytes each instruction will produce.
There was so much going on in that assembler program already, that I did not want to introduce more complexity up front. Let's fix that now: we will introduce a way to give a position inside your program a name, so that you can goto that name later. And in good tradition, we will call these names labels.
The traditional way of defining labels in assembly is by writing them first thing on a line, followed by a colon :. Take a look at this little program, label.lva. It is neither good style, nor does it do anything useful, but it shows us labels:
# A small demonstration of how labels work with goto.
    push_u8 1
    goto coda

back:
    push_u8 3
    fin

coda:   push_u8 2
    goto back
There are two labels defined here: back in line 5, and coda in line 9. A label definition is a short string that is directly followed by a colon :. We restrict it to letters, numbers, and underscore, with a letter at the front. For the curious, the regex is: ^[A-Za-z][0-9A-Za-z_]{0,31}$. As you can see in the example, there can be an optional instruction in the same line as the label definition. Now, how will our assembler parse those?
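lovem checks this with a precompiled regex. As a hedged std-only sketch of the same rule (is_valid_label is hypothetical, not the actual code), without pulling in the regex crate:

```rust
// Equivalent of ^[A-Za-z][0-9A-Za-z_]{0,31}$ in plain std Rust.
fn is_valid_label(s: &str) -> bool {
    let mut chars = s.chars();
    // first char must be a letter:
    let starts_ok = matches!(chars.next(), Some(c) if c.is_ascii_alphabetic());
    // remaining chars: letters, digits, or underscore:
    let rest_ok = chars.all(|c| c.is_ascii_alphanumeric() || c == '_');
    // at most 32 characters in total (1 + up to 31):
    starts_ok && rest_ok && s.len() <= 32
}
```

So back and loop_2 pass, while 2fast, an empty string, or anything with a space fails.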
First of all, I did a little restructuring inside asm.rs, because I did not like how the parsing was done inside an associated function that also created the AsmPgm instance. That seemed messed up. After the change, fn assemble() creates the instance itself and then calls a method on it to parse the source code. Here is the new version:
/// Parse assembly source code and turn it into a runnable program (or create report).
pub fn assemble(name: &str, content: &str) -> Result<Pgm, AsmErrorReport> {
    // create a new, clean instance to fill during parsing:
    let mut asm_pgm = AsmPgm {
        name: String::from(name),
        instructions: vec![],
        line_number: 0,
        text_pos: 0,
        error: None,
        labels: Default::default(),
    };
    // evaluate the source code:
    asm_pgm.process_assembly(content);
    // convert to Pgm instance if successful, or to Error Report, if assembly failed:
    asm_pgm.to_program()
}
And there is no problem with us changing the code like this. The only public function inside asm.rs is that pub fn assemble(). All methods of AsmPgm are private and therefore internal detail. Not that it would matter at this state of development, but it demonstrates how separation of public API and internal implementation works.
What is also new in that function is a new field inside AsmPgm: labels.
/// An assembler program during parsing/assembling.
#[derive(Debug)]
struct AsmPgm {
    ...
    /// A map storing label definitions by name with their position in bytecode.
    labels: HashMap<String, usize>,
}
It is a HashMap (aka. associative array in other languages). This is where we put all label definitions we find, while parsing the source file. It maps the label's name to its position inside the bytecode. Here we can look up where to jump, for a goto that wants to jump to a label.
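In miniature, the map behaves like this. The positions 5 and 8 are made up for the sketch; the real values depend on the program being assembled:

```rust
use std::collections::HashMap;

// Sketch of the label map: filled while parsing, queried when resolving gotos.
fn build_labels() -> HashMap<String, usize> {
    let mut labels = HashMap::new();
    // hypothetical positions: label name maps to position in the bytecode
    labels.insert(String::from("back"), 5);
    labels.insert(String::from("coda"), 8);
    labels
}
```

A goto coda can then look up its destination with labels.get("coda"), and an unknown name simply yields None, which the assembler can turn into an error.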
fn process(&mut self, content: &str) -> Result<(), AsmError> {
    // Go over complete source, extracting instructions. Some will have their opargs
    // left empty (with placeholders).
    self.parse(content)?;
    self.update_instructions()
}

/// Process assembly source code. Must be used with "empty" AsmPgm.
fn process_assembly(&mut self, content: &str) {
    // this function is just a wrapper around `process()`, so that I can use the
    // return magic and don't need to write the error check twice.
    if let Err(e) = self.process(content) {
        self.error = Some(e);
    }
}
The important part is that we have two steps now. We parse the complete source, as before. The second run is needed to write the actual relative jump address to the instructions. We do not know them during parsing, at least not for jumps forward.
/// Parses and extracts optional label definition from line.
///
/// Looks for a colon ':'. If one exists, the part before the first colon will be
/// seen as the name for a label, that is defined on this line. Instructions inside
/// the program that execute jumps can refer to these labels as a destination.
/// Lines containing a label definition may also contain an instruction and/or a comment.
/// This can return `AsmError::InvalidLabel` if the part before the colon is not a valid
/// label name, or `AsmError::DuplicateLabel` if a label name is reused.
/// If a label could be parsed, it will be stored to the `AsmPgm`.
/// On success, the line without the label definition is returned, so that it can be
/// used to extract an instruction. This will be the complete line, if there was no
/// label definition.
fn parse_label_definition<'a>(&mut self, line: &'a str) -> Result<&'a str, AsmError> {
    if let Some((label, rest)) = line.split_once(":") {
        let label = label.trim_start();
        if VALID_LABEL.is_match(label) {
            if self.labels.contains_key(label) {
                Err(AsmError::DuplicateLabel(String::from(label)))
            } else {
                self.labels.insert(String::from(label), self.text_pos);
                Ok(rest)
            }
        } else {
            Err(AsmError::InvalidLabel(String::from(label)))
        }
    } else {
        Ok(line)
    }
}
The method tries to find a label definition in the line and, if it finds one, handles it. We use our trusted Result<> returning to communicate potential errors. But instead of Ok(()), which is the empty okay value, we return a &str on success. This is because there might also be an instruction in the line. If we find a label definition, it returns the line after the colon. If there is none, it returns the complete line it got. This gives us the lines as we used to get them before we introduced labels. Great. But what is that weird 'a that shows up in that highlighted line everywhere?
Yeah, this is where it becomes rusty, again. I said, in an early post, that you would hate the Rust compiler and its pedantic error messages. The thing Rust is most pedantic about, is ownership and access to values you do not own. We are working with references to Strings here. A &str references the bytes inside that String directly (a &str need not reference a String, but it does here). We did that before, where is the problem now? This is the first time we are returning a &str.
When you are using references, Rust makes sure that the value you are referencing exists at least as long as the reference exists. That is easy for functions, as long as you drop every reference you have when you are done. But in this function, we return a reference to the parameter we got. Rust cannot allow that without some special care. When I remove the 'a parts of the method, I get a compilation error:
error[E0623]: lifetime mismatch
   --> src/asm.rs:277:21
    |
269 |     fn parse_label_definition(&mut self, line: &str) -> Result<&str, AsmError> {
    |                                                ----     ----------------------
    |                                                |
    |                                                this parameter and the return type are declared with different lifetimes...
...
277 |                 Ok(rest)
    |                 ^^^^^^^^ ...but data from `line` is returned here
    |
    = note: each elided lifetime in input position becomes a distinct lifetime
help: consider introducing a named lifetime parameter and update trait if needed
    |
269 |     fn parse_label_definition<'a>(&'a mut self, line: &'a str) -> Result<&str, AsmError> {
    |                              ++++  ++                  ++
The compiler tells me, that I messed up the lifetimes. It even proposes a change that introduces lifetime parameters (but gets it slightly wrong). What do we do with the 'a?
Well, we introduce a lifetime parameter called 'a. The syntax for that is the apostrophe, which looked weird to me at the start, but it is so lightweight that I came to like it. It is customary to just call your lifetimes 'a, 'b, ... – they normally don't have a long scope anyway. The thing we are telling the compiler with this parameter is this: the lifetime of the returned &str is dependent on the lifetime of the parameter line: &str. So whenever the reference the function is called with runs out of scope, the reference that was returned must be out of scope as well.
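Boiled down to a toy function (after_colon is hypothetical, but it has the same lifetime relationship as parse_label_definition):

```rust
// 'a ties the returned &str to the input: the result borrows from `line`.
fn after_colon<'a>(line: &'a str) -> &'a str {
    match line.split_once(':') {
        // a label was found: return everything after the colon
        Some((_, rest)) => rest,
        // no colon: hand the whole line back
        None => line,
    }
}
```

The caller can only use the returned slice as long as the line it was cut from is still alive; the compiler enforces exactly that.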
This is a concept that is new to many programmers when they learn Rust. I think what we do here demonstrates it quite well. Let us look at what happens for line 9 of our assembly program:
Our function receives a reference to a String holding that line: " coda: push_u8 2". It finds the label coda and stores it inside self.labels. Its work is done, but there might be more to this line. It returns a reference to a substring of it (&str are actually slices; they can reference only a part of a String's data). That is what we return: a reference to part of the data inside the String, starting at the first char after the colon, so it looks like this: " push_u8 2". It is not a copy, it is the same area inside the computer's memory! So if you want to make certain that there are no accesses to memory after its content has run out of scope (use after free, or use of a local variable after it runs out of scope), you must not allow access to it, unless you are sure the value still exists. And this is what Rust does. This is what makes Rust a secure language. Many bugs and exploits in the world exist because most languages do not check this, but leave the responsibility to the programmer. And the really cool thing about Rust is, it does this completely at compile time, as you can see by the fact that we got a compiler error.
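You can even observe that no copy happens. This hypothetical helper reports where the returned slice sits inside the original line's memory, using plain pointer comparison:

```rust
// Byte offset of the rest-slice inside the original line's allocation.
fn rest_offset(line: &str) -> usize {
    // the part after the colon, or the whole line if there is no colon:
    let rest = line.split_once(':').map(|(_, r)| r).unwrap_or(line);
    // `rest` points into `line`'s memory, so the pointers differ only by an offset
    rest.as_ptr() as usize - line.as_ptr() as usize
}
```

For " coda: push_u8 2" the slice starts 6 bytes into the line (right after the colon), and for a line without a label definition the offset is 0: same memory, no copy.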
The way we call our function is not a problem at all:
Our initial line comes from line 228. It is already a reference, because content.lines() gives us references to the memory inside of content. That content is itself a reference; the String variable that holds (and owns) the data lives inside lovas.rs:
// read complete source file into String:
let content = std::fs::read_to_string(&args.source)
    .with_context(
        || format!("could not read file `{}`", &name)
    )?;
// run the assembler:
match asm::assemble(&name, &content) {
We do not copy any of those bytes along the way. The first time we do is in clean_line(). Returning a &str will not work there, because we actually modify the contents of the string, by replacing characters inside it. Have you ever tried to work with in-place "substrings" in C (I mean char arrays, like char *str), without modifying the contents (placing \0 bytes)? It is not fun. In Rust, it can be, if you understand lifetime restrictions.
If you run into problems with your &str inside a Rust program, there is often an easy way to get around them. You can simply create a new String from your &str, as we do in clean_line(). That will copy the bytes. For our program, that would have been no problem at all. Cloning a few bytes of source code for every line during assembly would cost us next to nothing; you would not notice it in execution time. But things are different when you need to quickly handle long substrings in a program. Think of a diagnostic job on a busy server. And remember that Strings are created on the heap. That is a complexity that you sometimes want to avoid. When programming microcontrollers, there is a chance that you do not even have a memory allocator at your disposal. And microcontrollers are what we are aiming for in our project. There are already some parts of lovem that we will need to change because of that. But that is a story for another time. I just thought that this was a nice little example to introduce you to lifetime parameters. We will need them at some point...
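A minimal sketch of that trade-off (the line content is made up): the borrow stays on the original buffer, while creating a String copies the bytes onto the heap.

```rust
fn main() {
    let line = String::from("  coda: push_u8 2");
    // Borrowed: a view into `line`'s memory, valid only while `line` lives.
    let slice: &str = line.trim_start();
    // Owned: allocates a fresh String on the heap, independent of `line`.
    let owned: String = slice.to_string();
    println!("{} / {}", slice, owned);
}
```

For a handful of bytes per source line the allocation is negligible; on an allocator-less microcontroller target, the borrowed form is the only one available.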
This is a long entry already. You can look at the complete state of the assembler directly in the source code. You should know how to find the tags inside the repo by now. But I want to execute our new program, using the labels, before I end this. Here it is again:
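A program with the behavior described below could look like this (a sketch; the label names are illustrative, not necessarily the ones from the repo):

```
    push_u8 1
    goto two
three:
    push_u8 3
    fin
two:
    push_u8 2
    goto three
```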
The program has three push_u8 operations. If you executed them in the order of the source code, they would push [1, 3, 2] to the stack. But because of the goto instructions, they are not executed in that order. You can see the jumps in the trace, and you can see that the stack at termination holds the values in this order: [1, 2, 3].
Not much of a program, but it shows you how our new labels work. And finally: no more counting bytes!
February 2024 complete - Lovem

It has been a while... - Lovem
Yes, I have been gone for a while. More than a year, in fact. The project – lovem – is not dead, however. In fact, I even have multiple posts already written that I just need to publish. So let's start doing that. After this short intermission, I will publish an additional entry in the journey that will take us further along the path to creating our VM.
To be quite honest – I dated this entry back to yesterday. The reason is that my journal, as I currently run it, does not really support multiple entries on the same day. Yes, I could simply add a time to the publication date, but that breaks continuity. And I don't plan to normally release multiple entries on the same day, as I want to keep the pace not too high. One post every two or three days is what I aim for, just the way I used to have it.
A few things have changed in the meantime. For reasons that I have no desire to explain, I have removed the link to my Twitter account from the journal and replaced it with a link to my Mastodon account. You can find me there under @kratenko@chaos.social. I also used to announce new entries over Twitter. I will move that over to Mastodon as well. We will see how that goes.
But now let's get back to the journey. We will next implement a simple feature that allows the VM to limit the processing time of a program – which can be very useful, especially when running user-supplied code inside the machine. Building an endless loop in a Turing-complete (or not even) language is quite easy. Having an embedded device stuck in an endless loop is often a catastrophe...
Stop right there, that's far enough! - Lovem

We introduce an optional execution limit to our VM.
Since we have goto, we can write looping programs. With if* we have potentially looping programs as well. Both of these open up the potential for endless loops. There are situations in which endless loops are required. But often they are something to be avoided.
# Looping a looooong time.
# This program will not run forever, but you will not see it terminate either.
    push_u8 0
loop:
    push_u8 1
    add
    dup
    ifgt loop
    pop
    fin
Someone messed up the loop condition there. If you run this program, it will run for a long time. We start at zero and add to the value until our number is smaller than 0. Sounds impossible to reach for normal people; programmers will know better. Eventually we will reach the integer overflow, and our signed integer will wrap around from its highest possible value to the lowest possible one. But do remember what type we currently use to store our values: i64. So how big is that highest number?
9223372036854775807
Is that a lot? That depends. Last entry I had my program loop for 1 million rounds. It took my modern laptop about half a second. So reaching that number should take 9223372036854.775807 times as long, that is around 4611686018427 seconds or just about 146135 years. Is that a lot?
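That estimate can be checked with a few lines of integer arithmetic (assuming the measured rate from last entry of 1 million instructions per half second):

```rust
fn main() {
    // i64::MAX is the value our loop has to count up to before wrapping.
    let instructions: u128 = i64::MAX as u128; // 9223372036854775807
    // Measured: 1_000_000 instructions take about 0.5 s on the laptop.
    let seconds = instructions / 1_000_000 / 2;
    // Julian year: 365.25 days of 86400 seconds each.
    let years = seconds / 31_557_600;
    println!("{} seconds, roughly {} years", seconds, years);
    // prints: 4611686018427 seconds, roughly 146135 years
}
```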
Oh, and by the way, the Rust professionals reading this will have spotted a potentially false claim there. When we run our program in debug mode, there will be no integer wraparound; instead, the program will panic. If we build our Rust program in release mode, we will have integer wraparound, and will (theoretically) eventually reach the end of our loop. But that is beside the point.
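The difference can be made explicit in code: the wrapping_* methods of Rust's integer types give defined wraparound behavior in both debug and release builds (a small sketch, unrelated to the VM's actual implementation):

```rust
fn main() {
    let max: i64 = i64::MAX;
    // Plain `max + 1` panics in debug builds and wraps in release builds.
    // wrapping_add spells out the wraparound explicitly, in every build:
    let wrapped = max.wrapping_add(1);
    println!("{} + 1 wraps to {}", max, wrapped);
    // prints: 9223372036854775807 + 1 wraps to -9223372036854775808
}
```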
The reason I started writing lovem is that I need an embeddable lightweight VM to execute programmable handlers when certain events occur on my constrained embedded devices. So we are talking about some form of user-generated content that is executed as a program! We can never trust those programs to be solid. We need a way to limit execution in some way, so that the device has the possibility to terminate those programs. There is an easy way to achieve that with what we already have: we put a limit on the number of operations the VM will execute.
// Loop going through the whole program, one instruction at a time.
loop {
    // Log the vm's complete state, so we can follow what happens in console:
    if self.trace {
        println!("{:?}", self);
    }
    // Fetch next opcode from program (increases program counter):
    let opcode = self.fetch_u8(pgm)?;
    // Limit execution by number of instructions that will be executed:
    if self.instruction_limit != 0 && self.op_cnt >= self.instruction_limit {
        return Err(RuntimeError::InstructionLimitExceeded);
    }
    // We count the number of instructions we execute:
    self.op_cnt += 1;
    // If we are done, break loop and stop execution:
    if opcode == op::FIN {
        break;
    }
    // Execute the current instruction (with the opcode we loaded already):
    self.execute_op(pgm, opcode)?;
}
And of course we also add that new RuntimeError::InstructionLimitExceeded and a new field pub instruction_limit: usize, to our VM struct.
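Condensed into a self-contained sketch, the new pieces might fit together like this (the VM is reduced to the two fields that matter here; the names are the ones from the text, the check_limit helper is made up for illustration):

```rust
#[derive(Debug, PartialEq)]
pub enum RuntimeError {
    InstructionLimitExceeded,
    // ... the VM's other runtime errors ...
}

pub struct VM {
    /// Maximum number of instructions to execute; 0 means unlimited.
    pub instruction_limit: usize,
    /// Number of instructions executed so far.
    pub op_cnt: usize,
}

impl VM {
    /// Check the limit before executing the next instruction.
    fn check_limit(&self) -> Result<(), RuntimeError> {
        if self.instruction_limit != 0 && self.op_cnt >= self.instruction_limit {
            return Err(RuntimeError::InstructionLimitExceeded);
        }
        Ok(())
    }
}

fn main() {
    let vm = VM { instruction_limit: 10, op_cnt: 10 };
    assert_eq!(vm.check_limit(), Err(RuntimeError::InstructionLimitExceeded));
    let unlimited = VM { instruction_limit: 0, op_cnt: 1_000_000 };
    assert!(unlimited.check_limit().is_ok());
    println!("limit checks behave as expected");
}
```

Treating 0 as "unlimited" keeps the limit optional without needing an Option<usize> in the struct.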
#[clap(long, default_value_t = 1000000, help = "Limit max number of instructions allowed for execution. 0 for unlimited.")]
instruction_limit: usize,
And we need to pass that to the VM in the run() function:
/// Executes a program in a freshly created lovem VM.
fn run(pgm: &Pgm, args: &Cli) -> Result<()> {
    // Create our VM instance.
    let mut vm = VM::new(args.stack_size);
    vm.trace = args.trace;
    vm.instruction_limit = args.instruction_limit;
    let start = Instant::now();
    let outcome = vm.run(&pgm.text);
    let duration = start.elapsed();
    ...
And, well, that's it. We now have an optional execution limit that defaults to 1 million instructions.
const parent = el.parentElement!.closest(\".md-annotation\")\n if (parent instanceof HTMLElement)\n parent.focus()\n else\n getActiveElement()?.blur()\n }\n })\n\n /* Open and focus annotation on location target */\n target$\n .pipe(\n takeUntil(done$),\n filter(target => target === tooltip),\n delay(125)\n )\n .subscribe(() => el.focus())\n\n /* Create and return component */\n return watchAnnotation(el, container)\n .pipe(\n tap(state => push$.next(state)),\n finalize(() => push$.complete()),\n map(state => ({ ref: el, ...state }))\n )\n })\n}\n", "/*\n * Copyright (c) 2016-2022 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 
IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport {\n EMPTY,\n Observable,\n Subject,\n defer,\n finalize,\n merge,\n share,\n takeLast,\n takeUntil\n} from \"rxjs\"\n\nimport {\n getElement,\n getElements,\n getOptionalElement\n} from \"~/browser\"\nimport { renderAnnotation } from \"~/templates\"\n\nimport { Component } from \"../../../_\"\nimport {\n Annotation,\n mountAnnotation\n} from \"../_\"\n\n/* ----------------------------------------------------------------------------\n * Helper types\n * ------------------------------------------------------------------------- */\n\n/**\n * Mount options\n */\ninterface MountOptions {\n target$: Observable /* Location target observable */\n print$: Observable /* Media print observable */\n}\n\n/* ----------------------------------------------------------------------------\n * Helper functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Find all annotation markers in the given code block\n *\n * @param container - Containing element\n *\n * @returns Annotation markers\n */\nfunction findAnnotationMarkers(container: HTMLElement): Text[] {\n const markers: Text[] = []\n for (const el of getElements(\".c, .c1, .cm\", container)) {\n const nodes: Text[] = []\n\n /* Find all text nodes in current element */\n const it = document.createNodeIterator(el, NodeFilter.SHOW_TEXT)\n for (let node = it.nextNode(); node; node = it.nextNode())\n nodes.push(node as Text)\n\n /* Find all markers in each text node */\n for (let text of nodes) {\n let match: RegExpExecArray | null\n\n /* Split text at marker and add to list */\n while ((match = /(\\(\\d+\\))(!)?/.exec(text.textContent!))) {\n const [, id, force] = match\n if (typeof force === 
\"undefined\") {\n const marker = text.splitText(match.index)\n text = marker.splitText(id.length)\n markers.push(marker)\n\n /* Replace entire text with marker */\n } else {\n text.textContent = id\n markers.push(text)\n break\n }\n }\n }\n }\n return markers\n}\n\n/**\n * Swap the child nodes of two elements\n *\n * @param source - Source element\n * @param target - Target element\n */\nfunction swap(source: HTMLElement, target: HTMLElement): void {\n target.append(...Array.from(source.childNodes))\n}\n\n/* ----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Mount annotation list\n *\n * This function analyzes the containing code block and checks for markers\n * referring to elements in the given annotation list. If no markers are found,\n * the list is left untouched. Otherwise, list elements are rendered as\n * annotations inside the code block.\n *\n * @param el - Annotation list element\n * @param container - Containing element\n * @param options - Options\n *\n * @returns Annotation component observable\n */\nexport function mountAnnotationList(\n el: HTMLElement, container: HTMLElement, { target$, print$ }: MountOptions\n): Observable> {\n\n /* Compute prefix for tooltip anchors */\n const parent = container.closest(\"[id]\")\n const prefix = parent?.id\n\n /* Find and replace all markers with empty annotations */\n const annotations = new Map()\n for (const marker of findAnnotationMarkers(container)) {\n const [, id] = marker.textContent!.match(/\\((\\d+)\\)/)!\n if (getOptionalElement(`li:nth-child(${id})`, el)) {\n annotations.set(id, renderAnnotation(id, prefix))\n marker.replaceWith(annotations.get(id)!)\n }\n }\n\n /* Keep list if there are no annotations to render */\n if (annotations.size === 0)\n return EMPTY\n\n /* Mount component on subscription */\n return defer(() => {\n const done$ = new Subject()\n\n /* Retrieve 
container pairs for swapping */\n const pairs: [HTMLElement, HTMLElement][] = []\n for (const [id, annotation] of annotations)\n pairs.push([\n getElement(\".md-typeset\", annotation),\n getElement(`li:nth-child(${id})`, el)\n ])\n\n /* Handle print mode - see https://bit.ly/3rgPdpt */\n print$\n .pipe(\n takeUntil(done$.pipe(takeLast(1)))\n )\n .subscribe(active => {\n el.hidden = !active\n\n /* Show annotations in code block or list (print) */\n for (const [inner, child] of pairs)\n if (!active)\n swap(child, inner)\n else\n swap(inner, child)\n })\n\n /* Create and return component */\n return merge(...[...annotations]\n .map(([, annotation]) => (\n mountAnnotation(annotation, container, { target$ })\n ))\n )\n .pipe(\n finalize(() => done$.complete()),\n share()\n )\n })\n}\n", "/*\n * Copyright (c) 2016-2022 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 
IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport {\n Observable,\n map,\n of,\n shareReplay,\n tap\n} from \"rxjs\"\n\nimport { watchScript } from \"~/browser\"\nimport { h } from \"~/utilities\"\n\nimport { Component } from \"../../../_\"\n\nimport themeCSS from \"./index.css\"\n\n/* ----------------------------------------------------------------------------\n * Types\n * ------------------------------------------------------------------------- */\n\n/**\n * Mermaid diagram\n */\nexport interface Mermaid {}\n\n/* ----------------------------------------------------------------------------\n * Data\n * ------------------------------------------------------------------------- */\n\n/**\n * Mermaid instance observable\n */\nlet mermaid$: Observable\n\n/**\n * Global sequence number for diagrams\n */\nlet sequence = 0\n\n/* ----------------------------------------------------------------------------\n * Helper functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Fetch Mermaid script\n *\n * @returns Mermaid scripts observable\n */\nfunction fetchScripts(): Observable {\n return typeof mermaid === \"undefined\" || mermaid instanceof Element\n ? 
watchScript(\"https://unpkg.com/mermaid@9.1.7/dist/mermaid.min.js\")\n : of(undefined)\n}\n\n/* ----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Mount Mermaid diagram\n *\n * @param el - Code block element\n *\n * @returns Mermaid diagram component observable\n */\nexport function mountMermaid(\n el: HTMLElement\n): Observable> {\n el.classList.remove(\"mermaid\") // Hack: mitigate https://bit.ly/3CiN6Du\n mermaid$ ||= fetchScripts()\n .pipe(\n tap(() => mermaid.initialize({\n startOnLoad: false,\n themeCSS,\n sequence: {\n actorFontSize: \"16px\", // Hack: mitigate https://bit.ly/3y0NEi3\n messageFontSize: \"16px\",\n noteFontSize: \"16px\"\n }\n })),\n map(() => undefined),\n shareReplay(1)\n )\n\n /* Render diagram */\n mermaid$.subscribe(() => {\n el.classList.add(\"mermaid\") // Hack: mitigate https://bit.ly/3CiN6Du\n const id = `__mermaid_${sequence++}`\n const host = h(\"div\", { class: \"mermaid\" })\n mermaid.mermaidAPI.render(id, el.textContent, (svg: string) => {\n\n /* Create a shadow root and inject diagram */\n const shadow = host.attachShadow({ mode: \"closed\" })\n shadow.innerHTML = svg\n\n /* Replace code block with diagram */\n el.replaceWith(host)\n })\n })\n\n /* Create and return component */\n return mermaid$\n .pipe(\n map(() => ({ ref: el }))\n )\n}\n", "/*\n * Copyright (c) 2016-2022 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission 
notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport {\n Observable,\n Subject,\n defer,\n filter,\n finalize,\n map,\n merge,\n tap\n} from \"rxjs\"\n\nimport { Component } from \"../../_\"\n\n/* ----------------------------------------------------------------------------\n * Types\n * ------------------------------------------------------------------------- */\n\n/**\n * Details\n */\nexport interface Details {\n action: \"open\" | \"close\" /* Details state */\n reveal?: boolean /* Details is revealed */\n}\n\n/* ----------------------------------------------------------------------------\n * Helper types\n * ------------------------------------------------------------------------- */\n\n/**\n * Watch options\n */\ninterface WatchOptions {\n target$: Observable /* Location target observable */\n print$: Observable /* Media print observable */\n}\n\n/**\n * Mount options\n */\ninterface MountOptions {\n target$: Observable /* Location target observable */\n print$: Observable /* Media print observable */\n}\n\n/* ----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Watch details\n *\n * @param el - Details element\n * @param options - Options\n *\n * @returns Details observable\n */\nexport function watchDetails(\n el: HTMLDetailsElement, { target$, print$ }: WatchOptions\n): Observable {\n let open = 
true\n return merge(\n\n /* Open and focus details on location target */\n target$\n .pipe(\n map(target => target.closest(\"details:not([open])\")!),\n filter(details => el === details),\n map(() => ({\n action: \"open\", reveal: true\n }) as Details)\n ),\n\n /* Open details on print and close afterwards */\n print$\n .pipe(\n filter(active => active || !open),\n tap(() => open = el.open),\n map(active => ({\n action: active ? \"open\" : \"close\"\n }) as Details)\n )\n )\n}\n\n/**\n * Mount details\n *\n * This function ensures that `details` tags are opened on anchor jumps and\n * prior to printing, so the whole content of the page is visible.\n *\n * @param el - Details element\n * @param options - Options\n *\n * @returns Details component observable\n */\nexport function mountDetails(\n el: HTMLDetailsElement, options: MountOptions\n): Observable> {\n return defer(() => {\n const push$ = new Subject()\n push$.subscribe(({ action, reveal }) => {\n el.toggleAttribute(\"open\", action === \"open\")\n if (reveal)\n el.scrollIntoView()\n })\n\n /* Create and return component */\n return watchDetails(el, options)\n .pipe(\n tap(state => push$.next(state)),\n finalize(() => push$.complete()),\n map(state => ({ ref: el, ...state }))\n )\n })\n}\n", "/*\n * Copyright (c) 2016-2022 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport { Observable, of } from \"rxjs\"\n\nimport { renderTable } from \"~/templates\"\nimport { h } from \"~/utilities\"\n\nimport { Component } from \"../../_\"\n\n/* ----------------------------------------------------------------------------\n * Types\n * ------------------------------------------------------------------------- */\n\n/**\n * Data table\n */\nexport interface DataTable {}\n\n/* ----------------------------------------------------------------------------\n * Data\n * ------------------------------------------------------------------------- */\n\n/**\n * Sentinel for replacement\n */\nconst sentinel = h(\"table\")\n\n/* ----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Mount data table\n *\n * This function wraps a data table in another scrollable container, so it can\n * be smoothly scrolled on smaller screen sizes and won't break the layout.\n *\n * @param el - Data table element\n *\n * @returns Data table component observable\n */\nexport function mountDataTable(\n el: HTMLElement\n): Observable> {\n el.replaceWith(sentinel)\n sentinel.replaceWith(renderTable(el))\n\n /* Create and return component */\n return of({ ref: el })\n}\n", "/*\n * Copyright (c) 2016-2022 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including 
without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport {\n Observable,\n Subject,\n animationFrameScheduler,\n asyncScheduler,\n auditTime,\n combineLatest,\n defer,\n finalize,\n fromEvent,\n map,\n merge,\n skip,\n startWith,\n subscribeOn,\n takeLast,\n takeUntil,\n tap,\n withLatestFrom\n} from \"rxjs\"\n\nimport { feature } from \"~/_\"\nimport {\n Viewport,\n getElement,\n getElementContentOffset,\n getElementContentSize,\n getElementOffset,\n getElementSize,\n getElements,\n watchElementContentOffset,\n watchElementSize\n} from \"~/browser\"\nimport { renderTabbedControl } from \"~/templates\"\n\nimport { Component } from \"../../_\"\n\n/* ----------------------------------------------------------------------------\n * Types\n * ------------------------------------------------------------------------- */\n\n/**\n * Content tabs\n */\nexport interface ContentTabs {\n active: HTMLLabelElement /* Active tab label */\n}\n\n/* ----------------------------------------------------------------------------\n * Helper types\n * ------------------------------------------------------------------------- */\n\n/**\n * Mount options\n */\ninterface 
MountOptions {\n viewport$: Observable /* Viewport observable */\n}\n\n/* ----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Watch content tabs\n *\n * @param el - Content tabs element\n *\n * @returns Content tabs observable\n */\nexport function watchContentTabs(\n el: HTMLElement\n): Observable {\n const inputs = getElements(\":scope > input\", el)\n const initial = inputs.find(input => input.checked) || inputs[0]\n return merge(...inputs.map(input => fromEvent(input, \"change\")\n .pipe(\n map(() => getElement(`label[for=\"${input.id}\"]`))\n )\n ))\n .pipe(\n startWith(getElement(`label[for=\"${initial.id}\"]`)),\n map(active => ({ active }))\n )\n}\n\n/**\n * Mount content tabs\n *\n * This function scrolls the active tab into view. While this functionality is\n * provided by browsers as part of `scrollInfoView`, browsers will always also\n * scroll the vertical axis, which we do not want. 
Thus, we decided to provide\n * this functionality ourselves.\n *\n * @param el - Content tabs element\n * @param options - Options\n *\n * @returns Content tabs component observable\n */\nexport function mountContentTabs(\n el: HTMLElement, { viewport$ }: MountOptions\n): Observable> {\n\n /* Render content tab previous button for pagination */\n const prev = renderTabbedControl(\"prev\")\n el.append(prev)\n\n /* Render content tab next button for pagination */\n const next = renderTabbedControl(\"next\")\n el.append(next)\n\n /* Mount component on subscription */\n const container = getElement(\".tabbed-labels\", el)\n return defer(() => {\n const push$ = new Subject()\n const done$ = push$.pipe(takeLast(1))\n combineLatest([push$, watchElementSize(el)])\n .pipe(\n auditTime(1, animationFrameScheduler),\n takeUntil(done$)\n )\n .subscribe({\n\n /* Handle emission */\n next([{ active }, size]) {\n const offset = getElementOffset(active)\n const { width } = getElementSize(active)\n\n /* Set tab indicator offset and width */\n el.style.setProperty(\"--md-indicator-x\", `${offset.x}px`)\n el.style.setProperty(\"--md-indicator-width\", `${width}px`)\n\n /* Scroll container to active content tab */\n const content = getElementContentOffset(container)\n if (\n offset.x < content.x ||\n offset.x + width > content.x + size.width\n )\n container.scrollTo({\n left: Math.max(0, offset.x - 16),\n behavior: \"smooth\"\n })\n },\n\n /* Handle complete */\n complete() {\n el.style.removeProperty(\"--md-indicator-x\")\n el.style.removeProperty(\"--md-indicator-width\")\n }\n })\n\n /* Hide content tab buttons on borders */\n combineLatest([\n watchElementContentOffset(container),\n watchElementSize(container)\n ])\n .pipe(\n takeUntil(done$)\n )\n .subscribe(([offset, size]) => {\n const content = getElementContentSize(container)\n prev.hidden = offset.x < 16\n next.hidden = offset.x > content.width - size.width - 16\n })\n\n /* Paginate content tab container on click */\n 
merge(\n fromEvent(prev, \"click\").pipe(map(() => -1)),\n fromEvent(next, \"click\").pipe(map(() => +1))\n )\n .pipe(\n takeUntil(done$)\n )\n .subscribe(direction => {\n const { width } = getElementSize(container)\n container.scrollBy({\n left: width * direction,\n behavior: \"smooth\"\n })\n })\n\n /* Set up linking of content tabs, if enabled */\n if (feature(\"content.tabs.link\"))\n push$.pipe(\n skip(1),\n withLatestFrom(viewport$)\n )\n .subscribe(([{ active }, { offset }]) => {\n const tab = active.innerText.trim()\n if (active.hasAttribute(\"data-md-switching\")) {\n active.removeAttribute(\"data-md-switching\")\n\n /* Determine viewport offset of active tab */\n } else {\n const y = el.offsetTop - offset.y\n\n /* Passively activate other tabs */\n for (const set of getElements(\"[data-tabs]\"))\n for (const input of getElements(\n \":scope > input\", set\n )) {\n const label = getElement(`label[for=\"${input.id}\"]`)\n if (\n label !== active &&\n label.innerText.trim() === tab\n ) {\n label.setAttribute(\"data-md-switching\", \"\")\n input.click()\n break\n }\n }\n\n /* Bring active tab into view */\n window.scrollTo({\n top: el.offsetTop - y\n })\n\n /* Persist active tabs in local storage */\n const tabs = __md_get(\"__tabs\") || []\n __md_set(\"__tabs\", [...new Set([tab, ...tabs])])\n }\n })\n\n /* Create and return component */\n return watchContentTabs(el)\n .pipe(\n tap(state => push$.next(state)),\n finalize(() => push$.complete()),\n map(state => ({ ref: el, ...state }))\n )\n })\n .pipe(\n subscribeOn(asyncScheduler)\n )\n}\n", "/*\n * Copyright (c) 2016-2022 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit 
persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport { Observable, merge } from \"rxjs\"\n\nimport { Viewport, getElements } from \"~/browser\"\n\nimport { Component } from \"../../_\"\nimport { Annotation } from \"../annotation\"\nimport {\n CodeBlock,\n Mermaid,\n mountCodeBlock,\n mountMermaid\n} from \"../code\"\nimport {\n Details,\n mountDetails\n} from \"../details\"\nimport {\n DataTable,\n mountDataTable\n} from \"../table\"\nimport {\n ContentTabs,\n mountContentTabs\n} from \"../tabs\"\n\n/* ----------------------------------------------------------------------------\n * Types\n * ------------------------------------------------------------------------- */\n\n/**\n * Content\n */\nexport type Content =\n | Annotation\n | ContentTabs\n | CodeBlock\n | Mermaid\n | DataTable\n | Details\n\n/* ----------------------------------------------------------------------------\n * Helper types\n * ------------------------------------------------------------------------- */\n\n/**\n * Mount options\n */\ninterface MountOptions {\n viewport$: Observable /* Viewport observable */\n target$: Observable /* Location target observable */\n print$: Observable /* Media print observable */\n}\n\n/* ----------------------------------------------------------------------------\n * Functions\n * 
------------------------------------------------------------------------- */\n\n/**\n * Mount content\n *\n * This function mounts all components that are found in the content of the\n * actual article, including code blocks, data tables and details.\n *\n * @param el - Content element\n * @param options - Options\n *\n * @returns Content component observable\n */\nexport function mountContent(\n el: HTMLElement, { viewport$, target$, print$ }: MountOptions\n): Observable> {\n return merge(\n\n /* Code blocks */\n ...getElements(\"pre:not(.mermaid) > code\", el)\n .map(child => mountCodeBlock(child, { target$, print$ })),\n\n /* Mermaid diagrams */\n ...getElements(\"pre.mermaid\", el)\n .map(child => mountMermaid(child)),\n\n /* Data tables */\n ...getElements(\"table:not([class])\", el)\n .map(child => mountDataTable(child)),\n\n /* Details */\n ...getElements(\"details\", el)\n .map(child => mountDetails(child, { target$, print$ })),\n\n /* Content tabs */\n ...getElements(\"[data-tabs]\", el)\n .map(child => mountContentTabs(child, { viewport$ }))\n )\n}\n", "/*\n * Copyright (c) 2016-2022 Martin Donath \n *\n * Permission is hereby granted, free of charge, to any person obtaining a copy\n * of this software and associated documentation files (the \"Software\"), to\n * deal in the Software without restriction, including without limitation the\n * rights to use, copy, modify, merge, publish, distribute, sublicense, and/or\n * sell copies of the Software, and to permit persons to whom the Software is\n * furnished to do so, subject to the following conditions:\n *\n * The above copyright notice and this permission notice shall be included in\n * all copies or substantial portions of the Software.\n *\n * THE SOFTWARE IS PROVIDED \"AS IS\", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR\n * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,\n * FITNESS FOR A PARTICULAR PURPOSE AND NON-INFRINGEMENT. 
IN NO EVENT SHALL THE\n * AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER\n * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING\n * FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS\n * IN THE SOFTWARE.\n */\n\nimport {\n Observable,\n Subject,\n defer,\n delay,\n finalize,\n map,\n merge,\n of,\n switchMap,\n tap\n} from \"rxjs\"\n\nimport { getElement } from \"~/browser\"\n\nimport { Component } from \"../_\"\n\n/* ----------------------------------------------------------------------------\n * Types\n * ------------------------------------------------------------------------- */\n\n/**\n * Dialog\n */\nexport interface Dialog {\n message: string /* Dialog message */\n active: boolean /* Dialog is active */\n}\n\n/* ----------------------------------------------------------------------------\n * Helper types\n * ------------------------------------------------------------------------- */\n\n/**\n * Watch options\n */\ninterface WatchOptions {\n alert$: Subject /* Alert subject */\n}\n\n/**\n * Mount options\n */\ninterface MountOptions {\n alert$: Subject /* Alert subject */\n}\n\n/* ----------------------------------------------------------------------------\n * Functions\n * ------------------------------------------------------------------------- */\n\n/**\n * Watch dialog\n *\n * @param _el - Dialog element\n * @param options - Options\n *\n * @returns Dialog observable\n */\nexport function watchDialog(\n _el: HTMLElement, { alert$ }: WatchOptions\n): Observable