Introduction to CellSpeak

The case for CellSpeak

Do we really need another programming platform?

Stick to what you have

There are many good reasons not to switch to a new programming platform.

Learning a new programming language is in itself not so difficult, but becoming efficient in it usually requires time and practice. Building a user base that one can turn to for questions, best practices and so forth also takes time.

A language is usually embedded in an ecosystem that supports it: IDEs, debuggers, libraries etc. These tools are substantial contributors to the productivity that can be reached in a given language.

And finally, when there is a significant body of legacy software, switching to a new language could mean that otherwise perfectly functioning software needs to be modified, ported or re-written.

Knowledge base, tools and legacy software: three good reasons to stick to your favourite programming language.

Language innovation

Many of the languages that we are still using today have survived for decades, despite the many digital revolutions. C dates back to 1972 and C++ to 1985, while Fortran was first used in 1957.

But new languages have been developed and deployed - with varying success - for a number of reasons.

Software development is hard. Software is hard to build, hard to test, hard to debug and hard to maintain. Software development needs all the help it can get to make reliable, robust products.

The current programming environment is one of growing complexity, scale and interconnectedness, far beyond what we have seen in the past, and it is therefore no surprise that the search for adequate tools for software development is as alive as ever.

And in that search languages matter.

Language, framework or library

Missing features or functionality in a programming language are often addressed by providing libraries or a framework for that language that implement them. And often that is the best approach: adding 3D capabilities, access to databases and the like is well handled by providing specialised libraries.

There are however situations where the benefits of this approach are limited. For example, adding messaging to a language through a library obviously adds that functionality, but at the same time does not allow the language to benefit from all the features that messaging can bring.

There simply are features that have to be embedded in the language to be effective. The fact that today we are no longer programming the single-CPU stand-alone computer - still the model of most of our current programming languages - has to find its way into a programming platform as a central concept, and not as an afterthought that is then addressed as well as possible using concurrency measures, communication protocols, call-back frameworks and so forth.

Also, the benefits of being able to stick to a language, by using frameworks and libraries to stretch the functionality of the language, are often reduced by the fact that the problems of learning and porting are just transferred to these extensions, which are often complicated and specific to the OS on which the software is to be deployed.

Why CellSpeak

CellSpeak addresses a number of problem areas in the current programming environment:

  • Monoliths
    Applications are often still large, self-contained pieces of functionality that are difficult to update and test. Software components are not really there yet.

  • Dependency
    Software often contains dependencies, both direct and indirect, that are hard to oversee and avoid. Changes to one part of an application often require rebuilding the entire application or, worse, have unexpected side effects in another part of the application.

  • Parallel
    In most languages you have to set up parallelism yourself. Shared data has to be protected by locks or semaphores, threads have to be scheduled and managed, etc. Debugging is difficult at best.

  • Distributed
    Many applications run on distributed systems. And yet shifting parts of an application from one system to another, or scaling up an application - let alone doing it dynamically - is not trivial.

  • Event driven asynchronous
    Most of our applications are event driven, but most of our programming languages are linear. Applications therefore often implement their own message loops, event handlers and callbacks - difficult to test, difficult to maintain.

CellSpeak addresses these challenges by proposing a platform built around two concepts: Cells and Messaging. Obviously messaging and data isolation - the basis of the actor model - are not new. Most notably, Erlang/OTP has applied this model for over thirty years. CellSpeak revisits this model at a moment in time when it is more relevant than ever, builds its design around an intuitive syntax and speed, and also includes many modern-day language concepts.

Cells are the building blocks in CellSpeak. Cells are independent entities - data held by a cell cannot be read or changed by another cell - and if a cell has work to do, that work can be scheduled and executed independently of all other cells, without requiring any extra programming or protection in the cell itself.

Cells communicate via messages. A message is determined by its name and signature. The signature takes into account the generic type and structure of the data, but does not rely on user defined types, thus further reducing dependency. Cells that respect the interface - i.e. the message name and signature - are able to communicate irrespective of the types and other internals used by the cells. Hence, dependency can be managed and cells can be maintained and updated independently of other cells in the application.

As cells communicate via messages, the cells can run on different cores, but they can also run on different systems. The CellSpeak communication layer takes care of the delivery of the messages using available protocols and infrastructure. This is built into the Virtual Machine and is transparent to the application. Of course, message switching between cells running on the same computer - which is very fast - will be faster than between cells running on remote systems, and this possibly has to be taken into account in the design of a system, but the sending of messages and the handling of incoming messages are exactly the same in all cases.
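To make the idea of a structural signature more concrete, here is a minimal sketch written in Python rather than in CellSpeak. The indicator letters, the helper functions signature and message_key, and the two record types are all assumptions made for this illustration; they are not CellSpeak's actual encoding. The sketch only shows that two cells using differently named types can still agree on a message, as long as the message name and the structure of the data match.

```python
from dataclasses import dataclass, fields, is_dataclass


def signature(value) -> str:
    """Build a signature from generic type indicators only (hypothetical encoding)."""
    if isinstance(value, bool):          # checked before int: bool is a subclass of int
        return "b"
    if isinstance(value, int):
        return "i"
    if isinstance(value, float):
        return "f"
    if isinstance(value, str):
        return "s"
    if isinstance(value, (list, tuple)):
        # Array indicator followed by the element signature (empty arrays have none).
        inner = signature(value[0]) if value else ""
        return "[" + inner + "]"
    if is_dataclass(value):
        # Structure indicator: field types in order, without type or field names.
        parts = (signature(getattr(value, f.name)) for f in fields(value))
        return "{" + "".join(parts) + "}"
    raise TypeError(f"unsupported payload type: {type(value).__name__}")


def message_key(name, *args) -> str:
    """A message is identified by its name plus the signature of its data."""
    return name + "(" + ",".join(signature(a) for a in args) + ")"


@dataclass
class Point:         # type used inside the sending cell
    x: float
    y: float


@dataclass
class Coordinate:    # unrelated type used inside the receiving cell
    east: float
    north: float


sent = message_key("MoveTo", Point(1.0, 2.0))
expected = message_key("MoveTo", Coordinate(3.0, 4.0))
print(sent, expected, sent == expected)
```

Both keys are built from the same name and the same structure (a record of two floats), so the receiving cell would accept the message even though it has never seen the sender's Point type.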
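The point that sending looks the same whether the receiver is local or remote can be illustrated with a small sketch, again in Python and not in CellSpeak. The names MessageSwitch, LocalRoute and RemoteRoute are hypothetical, and the remote route does not open a network connection: it only simulates one by encoding the message to bytes and decoding it again. How the CellSpeak Virtual Machine actually routes messages is not described here.

```python
import json


class Cell:
    """A cell that simply records the messages it receives."""

    def __init__(self):
        self.inbox = []

    def receive(self, name, payload):
        self.inbox.append((name, payload))


class LocalRoute:
    """Delivery to a cell in the same process: hand the message over directly."""

    def __init__(self, cell):
        self.cell = cell

    def deliver(self, name, payload):
        self.cell.receive(name, payload)


class RemoteRoute:
    """Stand-in for a network link: the message is serialized and decoded again."""

    def __init__(self, cell):
        self.cell = cell

    def deliver(self, name, payload):
        wire = json.dumps({"name": name, "payload": payload}).encode("utf-8")
        decoded = json.loads(wire.decode("utf-8"))
        self.cell.receive(decoded["name"], decoded["payload"])


class MessageSwitch:
    """Knows where each cell lives; senders only know an address."""

    def __init__(self):
        self.routes = {}

    def register(self, address, route):
        self.routes[address] = route

    def send(self, address, name, payload):
        self.routes[address].deliver(name, payload)


switch = MessageSwitch()
near, far = Cell(), Cell()
switch.register("near", LocalRoute(near))
switch.register("far", RemoteRoute(far))

# The sending code is identical for both destinations.
for address in ("near", "far"):
    switch.send(address, "Reading", [21.5, 22.0])
```

Moving a cell to another machine then amounts to registering a different route for its address; the sending and receiving code stays as it is.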

There is no intrinsic limit to the size of a cell, or to the number of cells that have to make up an application, or to the number of messages or interfaces it should serve. That is all up to the design of the application. What CellSpeak will take care of is that the messages between the cells are delivered in the fastest way possible, and that cells that have work to do will be scheduled for execution.

CellSpeak is designed to be able to use external libraries, e.g. libraries written in C/C++, ensuring in this way that it has access to almost all existing functionality that is also available to other languages. It also makes it easy to write mixed-language applications, for example when porting an existing application.

Reasons for using CellSpeak

Programs written in CellSpeak are easier to write, debug and scale. CellSpeak allows you to write software components and assemble these components into applications in a way that would be much harder to achieve using other languages or frameworks.

Implement MicroServices

Cells can be big or small and cells can have one interface or many - it all depends on the needs and the design of the application. But cells are well suited to build applications as a collection of microservices, where each cell - or group of cells - of the application offers a well defined functionality to the other cells via a clear messaging interface.

There could be a cell that handles the user interaction, or a cell that interfaces to the database, a cell to fetch data over the internet and so forth. This approach to the design of an application has many advantages: cells can be debugged and improved independently of other cells, even extra functionality can be added in the form of new messages the cell reacts to, without affecting the cells that use the service.

If a given cell has too much work to do, additional cells can be created to take over some of the work - completely transparently to other cells. A cell can also easily delegate tasks to other cells, for example more specialised ones.
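As a rough illustration of offloading work without the clients noticing, the following Python sketch replaces a single service cell by a front cell that passes the work on to helpers. The class names, the "Square" message and the round-robin choice are assumptions for this example; replies are modelled as plain return values to keep the sketch short, whereas real cells would reply with messages.

```python
from itertools import cycle


class SquareCell:
    """A cell offering one service: it answers 'Square' messages."""

    def __init__(self):
        self.handled = 0

    def receive(self, name, value):
        if name != "Square":
            return None              # messages without a handler are ignored
        self.handled += 1
        return value * value


class FrontCell:
    """Same interface as SquareCell, but the work is passed on to helper cells."""

    def __init__(self, helper_count):
        self.helpers = [SquareCell() for _ in range(helper_count)]
        self._turn = cycle(self.helpers)   # simple round-robin choice

    def receive(self, name, value):
        return next(self._turn).receive(name, value)


def client(service):
    # The client only knows the message interface, not who does the work.
    return [service.receive("Square", n) for n in range(6)]


single = client(SquareCell())
front = FrontCell(helper_count=3)
spread = client(front)
print(single == spread, [helper.handled for helper in front.helpers])
```

The client function is unchanged in both cases, which is the sense in which the extra cells are transparent to the rest of the application.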

Note that a cell can deliver its services to one application or to several applications at the same time.

Because of the cell-based structure of applications, failure modes are granular - an application fails at the level of the cell without bringing down the entire application. Recovery tactics can be designed at the level of the cell as well.

Instantly Parallel

An application written in CellSpeak is automatically parallel. Cells do not share data and communicate only via messages, so cells can be executed as soon as they have work to do, i.e. as soon as they have messages in their message queue.

The scheduling and buffering of messages is taken care of by the CellSpeak virtual machine - the VM has a pool of worker threads that will be allocated to cells that have work to do.
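The scheduling principle can be sketched as follows, in Python and under stated assumptions: the Cell and Scheduler classes, the ready queue and the scheduled flag are illustrative choices, not the design of the CellSpeak VM. The sketch shows a fixed pool of worker threads serving any number of cells, with each cell processed by at most one worker at a time so that its private data needs no locking in the handler.

```python
import queue
import threading
from collections import deque


class Cell:
    def __init__(self, name):
        self.name = name
        self.mailbox = deque()
        self.lock = threading.Lock()  # guards the mailbox and the scheduled flag
        self.scheduled = False        # True while the cell is queued or running
        self.total = 0                # private state, only touched by handle()

    def handle(self, amount):
        self.total += amount


class Scheduler:
    def __init__(self, workers=4):
        self.ready = queue.Queue()    # cells that have messages waiting
        self.threads = [threading.Thread(target=self._work, daemon=True)
                        for _ in range(workers)]
        for thread in self.threads:
            thread.start()

    def send(self, cell, message):
        with cell.lock:
            cell.mailbox.append(message)
            if cell.scheduled:
                return                # a worker already owns this cell
            cell.scheduled = True
        self.ready.put(cell)

    def _work(self):
        while True:
            cell = self.ready.get()
            if cell is None:          # shutdown marker
                return
            while True:
                with cell.lock:
                    if not cell.mailbox:
                        cell.scheduled = False
                        break
                    message = cell.mailbox.popleft()
                cell.handle(message)  # runs outside the lock, one worker per cell
            self.ready.task_done()

    def drain(self):
        self.ready.join()             # wait until every scheduled cell is idle again

    def shutdown(self):
        for _ in self.threads:
            self.ready.put(None)
        for thread in self.threads:
            thread.join()


def run_demo(cell_count=3, messages=1000):
    scheduler = Scheduler(workers=4)
    cells = [Cell(f"cell-{i}") for i in range(cell_count)]
    for _ in range(messages):
        for cell in cells:
            scheduler.send(cell, 1)
    scheduler.drain()
    scheduler.shutdown()
    return [cell.total for cell in cells]
```

Because a cell is handed to a worker only when its scheduled flag flips from False to True, two workers never run the same cell concurrently, while different cells are free to run in parallel.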

Distribute Transparently

Because of the message switching, cells do not have to run on the same machine. Machines can connect via whatever protocol - e.g. TCP/IP, either open or encrypted - and cells can then be created on whichever machine is best suited to run them. This allows for better load balancing or for delegating functionality to specific machines. The logic and structure of the application are not affected by this redistribution of the cells of an application.

Managed Dependency

Cells have no access to data inside other cells and messages are identified based on the name and the signature of the message. The signature of the message is made of generic type indicators and of structural indicators (array, structure), but it does not contain user-defined type names. This means that if the internals of a cell change, it does not affect other cells. Only the code for the changed cell will have to be recompiled and swapped to update the application. It is even possible to stop a single cell in a running application and replace it by a new implementation.

Cells can have a strong dependency, most notably because a cell design can be derived from an ancestor design via inheritance, but this is a conscious design decision. It is possible for a designer to avoid all forms of 'accidental' dependencies when building an application.

Changes in the interface of a cell, i.e. the messages it handles or the signature of these messages, get signalled by the compiler so a designer can correct if necessary, but will not cause the application to crash. Messages for which no handler exists are simply discarded.

Scales Easily

Scaling up an application written in a traditional language often requires considerable re-engineering of the application. If an application has to be scaled up to handle, for example, hundreds of connections or users instead of one, chances are that it will have to be overhauled completely.

Scaling up in CellSpeak is much more straightforward because of its cell-based structure. And if the hardware on which the application is running is not sufficient anymore, the application can be distributed over several systems. The CellSpeak Virtual Machine handles the tasks of knowing where cells run and what protocol to use to communicate between cells.

Writing scalable, distributed applications is difficult, but the mechanisms in CellSpeak make it simpler and significantly reduce the boilerplate code normally required to create these types of applications.

Expressive

The CellSpeak language itself is a modern, strongly typed language that has all the features that you can expect of such a language.

The structures and statements that are available to write the code that makes up the functionality of a cell are designed to reduce boilerplate code and to reduce the risk of accidental coding errors. All types can have methods, and arrays carry size information, making it easier and less error prone to write loops. Vectors and matrices are standard types. Constant and variable functions are first-class citizens. Functions can return multiple values and can use closures.

CellSpeak also has pointers - but only to records - to allow building 'interesting' structures, like lists and trees.

CellSpeak uses scratchpad memory - a fast form of allocated memory, well suited to the message handling structure of the language. Details on all these features of the language itself can be found in the documentation.

Performance

High performance was a design goal for CellSpeak from the start. There are four factors in the design of CellSpeak that contribute to its performance:

First of all there is the message switch. The message switch uses different strategies to send a message depending on whether the message is small or large, whether it stays on-board or goes off-board and so forth. Small on-board messages - comparable to function calls - are switched at a rate in the order of millions per second. The selection of the message handler for a message received by a cell is also very fast, as it uses a single hash value, calculated at compile time from the message name and parameter signature.

The second component that is important with respect to the performance of CellSpeak is the scheduler. Cells do not each have their own separate thread; instead the Virtual Machine has a pool of worker threads that are allocated to cells that have work to do - switching between cells in a thread is fast.

The third factor that contributes to the performance of CellSpeak is the native code compiler. The VM will compile CellSpeak bytecode to native processor code before execution and will also use available processor-specific instructions - e.g. for vector and matrix handling - to speed up execution. That compilation process in itself is also fast, because mapping CellSpeak bytecode instructions to native instructions is mostly straightforward.
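Handler selection through a single precomputed hash could look roughly like the Python sketch below. The choice of CRC-32 (via the standard zlib module), the string layout that is hashed and the Cell class are assumptions made for illustration; the hash function CellSpeak actually uses is not specified here, and a real implementation would also have to deal with hash collisions. In line with the behaviour described under Managed Dependency, a message without a matching handler is simply discarded.

```python
import zlib


def message_hash(name: str, signature: str) -> int:
    """One integer per (name, signature) pair; stands in for a compile-time hash."""
    return zlib.crc32(f"{name}({signature})".encode("utf-8"))


class Cell:
    def __init__(self):
        self.handlers = {}     # hash value -> handler function
        self.discarded = 0     # count of messages that had no handler

    def on(self, name, signature, handler):
        # Done once, when the cell is built, not on every message.
        self.handlers[message_hash(name, signature)] = handler

    def receive(self, key, *payload):
        handler = self.handlers.get(key)   # a single lookup selects the handler
        if handler is None:
            self.discarded += 1            # unknown message: discard it
            return None
        return handler(*payload)


cell = Cell()
cell.on("Add", "ii", lambda a, b: a + b)

ADD_INTS = message_hash("Add", "ii")       # computed once by the sender
ADD_FLOATS = message_hash("Add", "ff")     # same name, different signature

result = cell.receive(ADD_INTS, 2, 3)
ignored = cell.receive(ADD_FLOATS, 2.0, 3.0)
```

The sender carries only the integer key, so the receiving cell never compares message names or walks through parameter lists at run time.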

The fourth factor that contributes to the performance of CellSpeak is the tight integration with software written in C/C++.

Easier to Test

Cells can be instantiated independently from other cells and their interfaces can be tested without requiring a specific test harness.

Granular Failure Modes

When an exception occurs that is not handled by the application, the worst that will happen is that the cell where the problem occurred is stopped. The application can then take the necessary steps - e.g. relaunching that cell - but the important point is, that a fatal error in one cell does not automatically bring down the entire application.
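The following Python sketch illustrates this kind of cell-level failure and recovery. The Cell and Application classes, the factory functions and the relaunch_failed method are hypothetical names for this example; they do not describe the recovery facilities CellSpeak itself offers. The sketch shows an unhandled exception stopping one cell while another keeps working, followed by a relaunch of the failed cell.

```python
class Cell:
    def __init__(self, name, handler):
        self.name = name
        self.handler = handler
        self.running = True
        self.failure = None

    def deliver(self, message):
        if not self.running:
            return None               # a stopped cell no longer processes messages
        try:
            return self.handler(message)
        except Exception as error:    # unhandled exception: stop this cell only
            self.running = False
            self.failure = error
            return None


class Application:
    def __init__(self):
        self.cells = {}
        self.factories = {}           # remembers how to build each cell again

    def launch(self, name, factory):
        self.factories[name] = factory
        self.cells[name] = Cell(name, factory())

    def send(self, name, message):
        return self.cells[name].deliver(message)

    def relaunch_failed(self):
        restarted = []
        for name, cell in list(self.cells.items()):
            if not cell.running:
                self.cells[name] = Cell(name, self.factories[name]())
                restarted.append(name)
        return restarted


app = Application()
app.launch("divider", lambda: (lambda x: 100 / x))
app.launch("shouter", lambda: (lambda s: s.upper()))

app.send("divider", 0)                       # raises ZeroDivisionError inside the cell
still_up = app.send("shouter", "still alive")
restarted = app.relaunch_failed()
after_restart = app.send("divider", 5)
```

The failure stays confined to the divider cell; the shouter cell is never affected, and the application decides when and how to bring the failed cell back.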

Supported

In order to give you a head start with projects in CellSpeak, CellSpeak is fully supported with training and assistance for project implementation. And if a project requires it, the CellSpeak core or libraries can be customized.

Applications

CellSpeak is a general purpose programming language that will benefit many applications. The cell and message based structure of a CellSpeak program makes it an excellent choice for building truly modular software. But obviously there are a number of applications that can benefit even more from the parallel, distributed nature of CellSpeak - some examples:

Internet Of Things

IoT architectures consist of many interconnected devices, often with onboard intelligence, reachable via interfaces that can have a large or a very small bandwidth. Often calculations to condense data streams have to be done at the edge of the network. Software updates to thousands of devices can be required.

CellSpeak is very well equipped to handle these types of challenges. The complex problems of interconnectivity - setting up, buffering, recovery etc. - are all handled at the level of the language.

In CellSpeak it is also easy to decide where to handle the data - and to change that - without fundamentally altering the architecture of the application.

The granular nature of a CellSpeak application makes it easy to make targeted updates to an application, a definite plus if thousands of remote devices have to be updated.

Distributed Control

Systems that have distributed control - e.g. robotics - are also prime candidates to be programmed in CellSpeak: the interconnected processors and micro-controllers that make up the control environment can be programmed as a single application. CellSpeak also makes it possible to make far better use of the computing power available on multi-core devices.

Large Scale Simulations

CellSpeak allows you to build simulations consisting of as many cells as required, running on as many machines as required, with the flexibility to stop, update and restart running cells as required.

Simulations can be built on limited hardware and grown as required without changing the fundamental architecture of the application.

When running large simulations it is also important to have good granular recovery mechanisms: if an exception occurs in a CellSpeak application, only the cell where the exception occurs is affected. Because of this, the recovery of a running simulation written in CellSpeak is much more manageable.

Graphics

By their nature, graphics programs have a lot of interaction - either between the elements that make up the environment or with the users manipulating the environment. The structure of CellSpeak programs is an excellent fit for these types of programs.

Messaging between cells that run on the same machine is very fast: as the examples included in the distribution show, fast enough to be used in demanding graphics applications.

Data Mining

Data mining is often done on an array of interconnected machines, where the ability to distribute software routines on the fly and to recover from unexpected events - loss of communication, hardware failures, etc. - in a graceful way is very important.

CellSpeak makes it possible to maximize the use of the available processing power and to assign that processing power in a flexible way. Failures in the network only affect the cells running at the failed location and allow for specific and targeted recovery without a catastrophic collapse of the entire application.