Bobbee is a new programming language for text and markup language processing that is currently in development. This is intended to become its web site. At present it just has a few notes about the language.
Currently the language supports:
Text pattern matching using both expression forms (as in Snobol, Icon and OmniMark) and "regex" forms (as in Java, Perl and XSLT).
Markup language processing in both tree (as in original XSLT) and serial (as in OmniMark) forms, and a mixture of the two. Currently supported markup languages are XML, SGML, MicroXML, JSON and HTML, plus a few minimal testing and example markup languages.
Support for Invisible XML is in the works — and will be completed when I finish some debugging on other parts of the language.
Multiple different markup languages can be processed, each with its own rules, within a single program. As well, support for any new markup language can be easily added to the language.
A large part of the language's features are user-defined, i.e. implemented using the Bobbee language itself: markup languages, text processing, additional markup features, and language configuring properties.
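To make the tree/serial distinction in the list above concrete, here is a small sketch in Java rather than Bobbee (Bobbee's own syntax isn't shown on this page, so nothing here should be read as Bobbee code). It counts elements in the same XML text twice: once by building a whole DOM tree and querying it, and once by reacting to SAX events as the parser moves through the input. The class and method names are made up for illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

public class TreeVsSerial {
    private static InputStream stream(String xml) {
        return new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8));
    }

    // Tree form: parse the whole document into memory, then query the tree.
    public static int countByTree(String xml, String name) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(stream(xml));
        return doc.getElementsByTagName(name).getLength();
    }

    // Serial form: handle each start tag at the moment the parser reaches it.
    public static int countSerially(String xml, String name) throws Exception {
        int[] count = {0};
        SAXParserFactory.newInstance().newSAXParser().parse(stream(xml),
                new DefaultHandler() {
                    @Override
                    public void startElement(String uri, String localName,
                                             String qName, Attributes atts) {
                        if (qName.equals(name)) {
                            count[0]++;
                        }
                    }
                });
        return count[0];
    }

    public static void main(String[] args) throws Exception {
        String xml = "<doc><p>one</p><p>two</p><note>three</note></doc>";
        System.out.println("tree:   " + countByTree(xml, "p"));
        System.out.println("serial: " + countSerially(xml, "p"));
    }
}
```

Both methods give the same count. The practical difference is that the tree form needs the whole document in memory before it can answer, while the serial form can start working as soon as the first tag arrives.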
The language design and its user manual have been complete for a few years now, and the implementation is largely complete. However, final debugging has taken a really long time and looks like it'll take quite a while yet, partly because it's a necessarily slow process, and partly because I'm really too old to be doing this kind of thing. Nonetheless, I really want it to be complete and well debugged before I release it, so that I'm not stuck with a lot of debugging at an even older age. The other thing that still needs a lot of work is documenting the implementation, so others can take over its maintenance at some point.
I really expected the language to be completed by now, but there's been something of a hiatus in the progress of the Bobbee language implementation:
I initially intended the implementation to work on as wide a range of Java versions as possible, but when I looked into upgrading to the Java 13 platform I found that it didn't work because, with current Java releases, the Java library isn't available to non-Java languages: its classes are hidden away somewhere, and I can't find where. As a result, I went back to Java 8.
Sticking with Java 8 seems to be what the Scala language implementation has done. That is an unfortunate approach, and is contrary to Bobbee's original goal of working on a wide range of JVM versions. But the fact that the Scala folk went this way, and the fact that they seem to know what they are doing, means that I should do the same thing.
The requirement of version 8 and later versions of the JVM that all methods have a Stack Map Frame adds an amazing amount of time to implementing a language using the JVM. It's really slowed things down.
A further slow-down in the progress of implementing the language has been that in the last year or so, I have had medical problems that needed to be dealt with, which is not a big surprise given that I'm in my mid-70s. I've had a kidney removed, which to the surprise of the surgeon, had a massive tumor that wasn't cancer. I've had a lot of radiation on my prostate, which was because of cancer. And I've had some skin cancer removed from my head and leg. Both of my parents died with cancer problems. So I feel lucky because it currently looks like my problems have been dealt with. The progress in medicine over the last few decades has been really wonderful. It's why I'm still here, and why I'm still able to be working on this programming language.
Given my age, my medical problems, the fact that I'm working on this language alone and not with a group of helpers, and given that this language is richer than any other language in the markup/text-processing language field, it's not surprising that it's taking a long time. An interesting description of how long a new programming language takes is Building a Language Takes Time.
Just by way of a side-note, when I first started this project, I was seriously considering the Mono Project as the basis for the implementation rather than the Java platform, but went with Java because it works on more platforms, and because Mono is a public implementation of a Microsoft product, which sadly displeases many people, its advantages notwithstanding. Given what's happening with Java these days I'm thinking I made a mistake going that way, but oh well!
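A note on the earlier point about the Java library seeming to be hidden in current Java releases: since Java 9 the platform classes are no longer shipped as an rt.jar file. They are stored in a run-time image that can be read through the JDK's built-in jrt:/ file system. The following is a minimal sketch, assuming that this is indeed the obstacle described above, and assuming the compiler only needs the raw class-file bytes of library classes. The class name JrtPeek and its method are made up for illustration, and the code needs Java 9 or later to run.

```java
import java.net.URI;
import java.nio.file.FileSystem;
import java.nio.file.FileSystems;
import java.nio.file.Files;
import java.nio.file.Path;

public class JrtPeek {
    // Reads the class-file bytes of a platform class from the run-time image.
    // module is e.g. "java.base"; internalName is e.g. "java/lang/Object".
    public static byte[] readPlatformClass(String module, String internalName)
            throws Exception {
        FileSystem jrt = FileSystems.getFileSystem(URI.create("jrt:/"));
        Path classFile = jrt.getPath("modules", module, internalName + ".class");
        return Files.readAllBytes(classFile);
    }

    public static void main(String[] args) throws Exception {
        byte[] bytes = readPlatformClass("java.base", "java/lang/Object");
        // Every class file starts with the four magic bytes CA FE BA BE.
        System.out.printf("read %d bytes, magic %02X%02X%02X%02X%n",
                bytes.length,
                bytes[0] & 0xFF, bytes[1] & 0xFF, bytes[2] & 0xFF, bytes[3] & 0xFF);
    }
}
```

This only shows where the library classes can be found. Whether it fits into the Bobbee implementation depends on how the compiler looks up classes, which isn't described here.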
I attended the Balisage 2019 conference in Rockville, MD, and presented a paper about the Bobbee language. These are the presentation materials:
The slide show I used during the presentation.
The paper I wrote for the conference web site.
For interest, here are the files that I used to create the paper, translating it from text to XML, and then from XML to HTML:
The original text file I wrote.
The text-to-XML program that converts the text file to the XML file I submitted to the conference.
The resulting XML file.
Two XML-to-HTML programs I wrote to confirm that the XML file looked right: one an XML tree processing program and the other an XML serial processing program.
And finally, the resulting HTML files: tree processing and serial processing. Just to note: they are identical. Which was the point of the test. (This is not the paper on the Balisage web site. The Balisage folk produced that from the same XML file using other tools.)
In addition, I wrote a six-page overview of BML (Basic Markup Language), a minimal markup language that was used to test Bobbee's markup language processing features. I intended to display it at the Balisage 2019 conference, but didn't, because there wasn't any wall space available for posters. So here it is. (I've actually produced a more useful minimal markup language recently, that I hope is more acceptable to the folk who don't think BML is a markup language, and I'll be using that as an example of how to implement markup languages. I'll post it here once the current phase of debugging is complete.)
Together with a whole bunch of other folk, I attended the Balisage 2020, Balisage 2021, Balisage 2022 and Balisage 2023 conferences virtually, and will be attending future Balisage conferences virtually or in person. I'm hoping to present the finished language at Balisage in the not-too-distant future, depending on how well the language debugging and the implementation documentation go.
For those interested in the history of programming languages, I've designed and led the development of a number of programming languages in the past, most of which have addressed text and markup processing, and two of which have had significant use, HUGO and OmniMark:
HUGO was used to help typeset documents at the (no longer existing) Canadian Government Printing Office (CGPO) from the late 1970's up to the mid-1980's, when early word processors replaced it. It was used to typeset our Hansard and the Bills of Canada, amongst other government documents.
There are a couple of ancient documents describing HUGO: the description of the original language implementation at CGPO, and of a later implementation of a subset of the language for a Vancouver real-estate company (which also seems to be no longer in business).
OmniMark is used for text and markup language processing, and it is still in wide use. OmniMark was largely based on HUGO, using its rule-based features, its expression pattern-matching language (based on the earlier Snobol4 and Icon languages) and its multi-phase processing model. Oddly, unlike the web sites of most other programming languages, the OmniMark web site says nothing about who designed it (i.e. me), what earlier languages it is based on, or its own history.
One interesting consequence of implementing a variety of markup languages using the same tools is that one can compare how complex it is to implement them. The size of the source code for each one (together with some comments) is a good rough measure of complexity. Here are the sizes of the implementations in bytes:
Size (bytes) | File | Notes |
13329 | bml.bj | My first attempt to find how small a markup language could be. |
12166 | vsml.bj | The next attempt at a minimal/simple markup language, looking a lot more like a conventional markup language. |
4548 | mml.bj | The current simple markup language suitable as an example. It's been simplified by not reporting any errors. |
21234 | microxml.bj | |
27170 | html.bj* | Just the HTML parsing code, without its tables. |
27794 | json.bj | |
55233 | ixml.bj | It's incomplete right now, so it will grow. |
99299 | xml.bj | |
179122 | sgml.bj | |
209170 | html.bj | The HTML parser, including all the tables that describe HTML elements. |
If you've got a reason to contact me about this bobbee.org website or about the new language, please email me at info@bobbee.org.
Latest update: August 18, 2024.