I would like to take a minute to share this month’s update with everyone. Since we have a good deal of backlog to make up for, and the holidays are already here, I thought I’d share it a bit early this December.
A lot has been going on, and with the backlog this is sure to be a long update, so please bear with me. For starters, I’d like to talk about some new initiatives we have begun. For more technical details, keep an eye on the dANN-Announce and dANN-Dev mailing lists, where they will be discussed in greater detail.
In this edition:
- Working Towards An AI Engine
- AIDE: Automated Inference Detection Engine
- Semantic Syncleus Wiki
- New Team Members
Working Towards An AI Engine
dANN’s original goal was to be a full AI framework that allows many dissimilar AI algorithms to integrate with each other in a flexible, modular, and seamless fashion. Since that original vision, dANN has grown into a diverse collection of AI algorithms, but it still lacks the engine itself, which will automate the process of connecting these algorithms. While dANN is graph based (as in graph theory) and lends itself well to interconnecting its components, it relies on the implementer to actually build the framework that determines how to connect these components and process data through them. So the next stage in the project is, of course, an engine that automates some of this process in an extensible way. While this was originally intended to be part of dANN itself, we recently decided it would be more useful to separate out the various components of the library and write the engine so that it uses and depends on dANN but is separate from dANN itself. In practice this serves the same purpose but gives developers more control over which functionality they need or want.
To realize this vision fully, we have begun working on a few initiatives leading up to the full engine. For starters, we are restructuring our code to better support a plug-in architecture, where third-party AI libraries and algorithms can be integrated into the framework and interconnected in much the same way dANN components currently are. This poses some unique challenges, as each AI library can be quite different from the others and may be written in any number of languages. The plug-in layer is currently being designed as an abstraction of our graph model, which plug-ins will extend. We have been in touch with several AI projects to help ensure our framework can accommodate their software in a sane way, including OpenCog, Neuroph, Encog, and Automenta, to name a few.
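To make the idea of an abstraction of the graph model that plug-ins extend a little more concrete, here is a minimal Python sketch (dANN itself is Java) of the general shape such a plug-in layer might take. The names GraphNode, PLUGIN_REGISTRY, register_plugin, and create_node are hypothetical and are not part of dANN’s actual API; bridging libraries written in other languages is out of scope here.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict, List, Type


class GraphNode(ABC):
    """Hypothetical contract the graph model exposes to every plug-in."""

    @abstractmethod
    def activate(self, inputs: List[float]) -> float:
        """Combine the signals arriving on incoming edges into one output."""


# Hypothetical registry: plug-in name -> node class contributed by that plug-in.
PLUGIN_REGISTRY: Dict[str, Type[GraphNode]] = {}


def register_plugin(name: str) -> Callable[[Type[GraphNode]], Type[GraphNode]]:
    """Class decorator an adapter for a third-party library would use."""
    def decorator(cls: Type[GraphNode]) -> Type[GraphNode]:
        if name in PLUGIN_REGISTRY:
            raise ValueError(f"plug-in {name!r} is already registered")
        PLUGIN_REGISTRY[name] = cls
        return cls
    return decorator


@register_plugin("threshold-neuron")
class ThresholdNeuron(GraphNode):
    """Stand-in for a node wrapped from some external neural network library."""

    def __init__(self, threshold: float = 0.5) -> None:
        self.threshold = threshold

    def activate(self, inputs: List[float]) -> float:
        return 1.0 if sum(inputs) >= self.threshold else 0.0


@register_plugin("averager")
class Averager(GraphNode):
    """Stand-in for a node contributed by a second, unrelated plug-in."""

    def activate(self, inputs: List[float]) -> float:
        return sum(inputs) / len(inputs) if inputs else 0.0


def create_node(name: str, **params: float) -> GraphNode:
    """The engine only looks nodes up by plug-in name, never by concrete class."""
    return PLUGIN_REGISTRY[name](**params)
```

For example, create_node("threshold-neuron", threshold=1.0).activate([0.4, 0.7]) returns 1.0. The design choice being illustrated is that the engine depends only on GraphNode, so supporting a new library would mean writing one adapter class rather than changing the engine.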
The other initiative toward this goal is the communications abstraction layer itself. The engine is specifically designed around mesh/cloud concepts because of the massively parallel and interconnected nature of most AI algorithms. This means we need to be able to run the engine across thousands or even millions of computers over an intranet or the internet to form a single coherent AI “brain”. Not all uses of the engine will be so grand, and it will still be useful in smaller setups involving dozens of computers or even a single one, but this remains a major design consideration. Our current design models place very specific demands on how the underlying communication operates: communication between nodes needs to be spatially and contextually aware, so generic clustering and cloud solutions don’t work. The communication considerations alone therefore constitute a project of their own, with considerable work yet to be done.
For that reason we have begun work on a project unofficially code-named “Hermes” to act as a communications abstraction layer that fits these needs. Hermes allows protocol features (such as reliability, packet ordering, and packet checksumming) to be added piece by piece on top of any generic transport mechanism (like TCP or UDP). Because of this design, it gives full access to the low-level features of the underlying transport protocol, letting you extend and enhance it for a project’s needs by interconnecting various modules. Without going into too much detail, this library is being written specifically with our AI Engine in mind, but as the design is currently structured it should be immensely useful to any communications-based application that needs low-level access to the transport protocol while layering certain transport features on top of it.
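The following is a minimal Python sketch of the layering idea, not Hermes’s actual design: each protocol feature is a small wrapper with the same send/receive interface as the raw transport beneath it, so features can be stacked in whatever combination a project needs. Transport, LoopbackTransport, ChecksumLayer, and OrderingLayer are hypothetical names, and LoopbackTransport is an in-memory stand-in for an unreliable datagram transport such as UDP.

```python
import struct
import zlib
from collections import deque
from typing import Deque, Dict, Optional


class Transport:
    """Shared interface for raw transports and for the feature layers stacked on them."""

    def send(self, payload: bytes) -> None:
        raise NotImplementedError

    def receive(self) -> Optional[bytes]:
        raise NotImplementedError


class LoopbackTransport(Transport):
    """In-memory stand-in for an unreliable datagram transport such as UDP."""

    def __init__(self) -> None:
        self.queue: Deque[bytes] = deque()

    def send(self, payload: bytes) -> None:
        self.queue.append(payload)

    def receive(self) -> Optional[bytes]:
        return self.queue.popleft() if self.queue else None


class ChecksumLayer(Transport):
    """Appends a CRC-32 to each packet and silently drops packets that fail the check."""

    def __init__(self, lower: Transport) -> None:
        self.lower = lower

    def send(self, payload: bytes) -> None:
        self.lower.send(payload + struct.pack("!I", zlib.crc32(payload)))

    def receive(self) -> Optional[bytes]:
        while (packet := self.lower.receive()) is not None:
            if len(packet) < 4:
                continue  # too short to carry a checksum; treat as corrupt
            body, (crc,) = packet[:-4], struct.unpack("!I", packet[-4:])
            if zlib.crc32(body) == crc:
                return body
        return None


class OrderingLayer(Transport):
    """Prefixes a sequence number and hands packets up strictly in the order sent."""

    def __init__(self, lower: Transport) -> None:
        self.lower = lower
        self.next_send = 0
        self.next_expected = 0
        self.pending: Dict[int, bytes] = {}

    def send(self, payload: bytes) -> None:
        self.lower.send(struct.pack("!I", self.next_send) + payload)
        self.next_send += 1

    def receive(self) -> Optional[bytes]:
        while self.next_expected not in self.pending:
            packet = self.lower.receive()
            if packet is None:
                return None  # the next in-order packet has not arrived yet
            if len(packet) < 4:
                continue  # too short to carry a sequence number
            (seq,) = struct.unpack("!I", packet[:4])
            if seq >= self.next_expected:  # ignore copies of already-delivered packets
                self.pending[seq] = packet[4:]
        payload = self.pending.pop(self.next_expected)
        self.next_expected += 1
        return payload
```

Stacking OrderingLayer(ChecksumLayer(LoopbackTransport())) gives checksummed, in-order delivery over a transport that guarantees neither. Reliability (acknowledgements and retransmission) would be one more layer in the same style and is omitted from this sketch.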
Once these two initiatives have matured, been released, and are considered relatively stable in terms of design, we can build the actual engine that brings it all together. The engine will consist of a collection of plug-ins representing both AI algorithms and Evolutionary Algorithms. The AI algorithms will act as the processing components and interconnect with each other. Not only the algorithms as a whole but even their individual components can be interconnected, allowing, for example, a hybrid of artificial neurons and Bayesian network nodes to co-exist within the same graph. By using various Evolutionary Algorithm plug-ins (which can themselves be combined and hybridized), the engine will have the tools to automatically discover and evolve new AI algorithms by dynamically constructing and testing new hybridizations of preexisting AI algorithm components.
Many of the necessary components have already been completed and published as part of dANN. For example, the Hyperassociative Map is a graph drawing algorithm specifically designed to handle the distribution of a graph’s nodes within the distributed cloud. Another example is the Genetic Wavelets Algorithm, which will act as an Evolutionary Algorithm for the final engine.
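As a rough illustration of neurons and Bayesian network nodes co-existing in one graph, the Python sketch below evaluates a tiny hybrid graph in topological order. It assumes a noisy-OR node (a common Bayesian network building block) and treats parent values as independent probabilities; the graph, node names, and weights are made up for the example and do not reflect the engine’s eventual design.

```python
import math
from typing import Callable, Dict, List, Tuple

Inputs = List[Tuple[float, float]]  # (parent value, edge weight or causal strength)


def neuron(inputs: Inputs, bias: float = 0.0) -> float:
    """Artificial neuron: logistic sigmoid of the biased, weighted input sum."""
    total = bias + sum(value * weight for value, weight in inputs)
    return 1.0 / (1.0 + math.exp(-total))


def noisy_or(inputs: Inputs, leak: float = 0.0) -> float:
    """Noisy-OR node: each parent independently turns the child on with
    probability P(parent) * strength; `leak` covers causes outside the graph."""
    p_off = 1.0 - leak
    for p_parent, strength in inputs:
        p_off *= 1.0 - p_parent * strength
    return 1.0 - p_off


NODE_KINDS: Dict[str, Callable[..., float]] = {"neuron": neuron, "noisy_or": noisy_or}

# Hypothetical hybrid graph: name -> (kind, [(parent, weight), ...], extra params).
# Entries are listed in topological order, so parents are always evaluated first.
GRAPH = {
    "rain": ("input", [], {}),
    "sprinkler": ("input", [], {}),
    "wet_grass": ("noisy_or", [("rain", 0.9), ("sprinkler", 0.8)], {}),
    "slip_risk": ("neuron", [("wet_grass", 6.0)], {"bias": -3.0}),
}


def evaluate(evidence: Dict[str, float]) -> Dict[str, float]:
    """Propagate input evidence through every non-input node of GRAPH."""
    values = dict(evidence)
    for name, (kind, parents, params) in GRAPH.items():
        if kind != "input":
            inputs = [(values[parent], weight) for parent, weight in parents]
            values[name] = NODE_KINDS[kind](inputs, **params)
    return values
```

With evidence {"rain": 1.0, "sprinkler": 0.0}, wet_grass comes out at about 0.9 and slip_risk at roughly 0.92. An Evolutionary Algorithm plug-in could then mutate a structure like GRAPH (swapping node kinds, adding edges, changing weights) and keep the variants that score best on some task.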
AIDE: Automated Inference Detection Engine
The most recent addition to the Syncleus open-source project collection is the AIDE project. We recently finished our first version (likely to be named version 0.1 or 1.0 upon release). It is currently at RC1, and while the source and application haven’t been posted on our website just yet, they should be within the next month. It is fully functional and has been tested against real-world data, producing significantly useful inference data from the current data set. Our current use case is a real-world company with a massive database; with their cooperation, we hope to gather a number of statistics on how much AIDE has improved their company’s use of data. You can check out the prerelease screenshots and a description of its current feature set at the AIDE project page. We also expect to release it as open source under the same conditions as all our other products, so it will be free to download and use, and you will be welcome to modify the code for your own use under the terms of the license.
Let me explain a bit about what AIDE is intended for and how it might be useful. AIDE will process any third-party database, regardless of content, schema, database system (such as MySQL or Oracle), or structure. As long as the proper drivers exist (and they do for virtually all databases), AIDE can process it. The only caveat is that for each new database you need to write a configuration file that tells AIDE a little about the database it is interfacing with. In future versions this configuration file will be simplified, or at least stripped of most database-specific components. Once AIDE is configured, it can be told to discover inference (influence) among the various data in the database. The user can choose how narrow a scope AIDE should apply: it can scan the entire database or just certain tables, columns, or rows. This reduces the time needed for discovery and returns only the information the user is concerned with. Once the discovery process is complete, AIDE reports how much each data point affects the others, and by what percentage. For example, it might tell you that because last winter was very snowy, every time it snows this year your sales of snow tires will go up by 327%. Or that if you feel tired and have a red rash, your chance of having Lyme disease is 87%. All of this is displayed using a collection of tables, pie charts, heat maps, etc., and can be exported in CSV format (XML and other formats coming soon).
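AIDE itself uses a dynamic Bayesian network built on dANN, so the following is not its algorithm; it is a deliberately simple Python sketch of the kind of numbers described above, estimated by plain frequency counting. conditional_probability gives figures like the 87% example, and influence_percent gives figures like the 327% example, read here (as an assumption) as the relative increase (P(target | evidence) / P(target) - 1) × 100. The rows are toy data and the column names are hypothetical.

```python
from typing import Dict, List

Row = Dict[str, bool]


def conditional_probability(rows: List[Row], target: str, given: Dict[str, bool]) -> float:
    """Estimate P(target is True | every column in `given` has the stated value)."""
    matching = [row for row in rows if all(row[col] == val for col, val in given.items())]
    if not matching:
        raise ValueError("no rows match the given evidence")
    return sum(row[target] for row in matching) / len(matching)


def influence_percent(rows: List[Row], target: str, given: Dict[str, bool]) -> float:
    """Relative change in P(target) once the evidence is known, as a percentage."""
    baseline = conditional_probability(rows, target, {})
    if baseline == 0:
        raise ValueError(f"{target!r} never occurs, so its change is undefined")
    return (conditional_probability(rows, target, given) / baseline - 1.0) * 100.0


def record(tired: bool, rash: bool, disease: bool) -> Row:
    return {"tired": tired, "rash": rash, "disease": disease}


# Toy data: 8 hypothetical patient records.
ROWS = (
    [record(True, True, True)] * 3
    + [record(True, True, False)]
    + [record(True, False, False)] * 2
    + [record(False, False, False)] * 2
)
```

In this toy data P(disease) is 3/8 and P(disease | tired, rash) is 3/4, so influence_percent(ROWS, "disease", {"tired": True, "rash": True}) reports a 100% increase. Restricting `given` and the columns considered corresponds loosely to narrowing AIDE’s scope to certain tables, columns, or rows.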
Also feel free to sign up for the AIDE, AIDE-Dev or AIDE-Announce mailing lists.
Check out the link I mentioned for more detail; there is a lot to this project, but for simplicity’s sake let me just include the bullet-point list of features from that page:
- Dynamic Bayesian Network AI utilizing the dANN library.
- Works with almost every Database server.
- Database-schema neutral: it doesn’t care how your tables and columns are organized.
- SOAP web-service based protocol for easy development of custom clients.
- Works with any firewall that allows HTTP (web site) traffic.
- Platform-independent: it will work on almost any operating system or hardware.
- Extensive visualizations to clearly display inference patterns in your data.
- Results can be exported in CSV format.
- Distributed clustering for faster AI and excellent scaling.
- HTTP authentication to allow for user privileges or to run the server privately.
- Extensive logging in your choice of XML, HTML, plain text, and other formats.
- Optional fault recovery to keep critical processes running after a failure.
- Filters to improve relevant matches and speed up processing time.
Semantic Syncleus Wiki
While we have always had a wiki (http://wiki.syncleus.com) to act as a place for our developers to collaborate and to host documentation and tutorials, we have decided to take our platform in an exciting new direction. It will continue to host our development space as always, with a few additions. We have added a number of new extensions to the wiki, including a few custom ones, that add semantics to the data (RDF, OWL, SPARQL, etc.). All of this supports the wiki’s new mission statement.
Semantics allow the data on the wiki to be understood more easily by computers, and allow computers to more easily contribute to or modify some of the existing data. It works very much like a normal wiki, where one page links to several others as it references those concepts in its text. The difference is that meaning is attached to each link. So if the page on “Apple Tree” links to the page on “Fruit Tree”, the semantic data will indicate that the link exists because an “Apple Tree” is a “kind of” “Fruit Tree”. A computer then has some sense of context. All of this is represented in the RDF/OWL format, which the wiki can be exported to.
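The Apple Tree example can be sketched as subject-predicate-object triples, the same shape RDF uses. In real RDF each term would be a URI and a “kind of” relation between classes would typically be expressed with something like rdfs:subClassOf; the plain strings, the extra “Tree” triple, and the kinds_of helper below are hypothetical simplifications in Python. The point is that a program can follow typed links transitively and conclude that an Apple Tree is a Tree even though no page says so directly.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Hypothetical triples extracted from wiki pages; strings stand in for URIs.
TRIPLES: Set[Triple] = {
    ("Apple Tree", "kind of", "Fruit Tree"),
    ("Fruit Tree", "kind of", "Tree"),
    ("Apple Tree", "produces", "Apple"),
}


def kinds_of(subject: str, triples: Set[Triple]) -> Set[str]:
    """Return every class `subject` belongs to by following "kind of" links transitively."""
    found: Set[str] = set()
    frontier = [subject]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            # Only "kind of" links count; the `found` check also stops cycles.
            if s == current and p == "kind of" and o not in found:
                found.add(o)
                frontier.append(o)
    return found
```

Here kinds_of("Apple Tree", TRIPLES) yields {"Fruit Tree", "Tree"}, while the “produces” link is ignored because it carries a different meaning.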
The intention is to make the current Syncleus project documentation semantic and also to host several other spaces on the wiki for general knowledge (much like Wikipedia) that will likewise be written semantically. This will provide a general knowledge base that future AIs can use as a data set to train and learn from. Since it is hosted publicly on the internet, any AI developer will be welcome to pull data from the wiki to train their AI, and can even do so selectively based on the topics or namespaces most useful to the AI being developed. In this sense the wiki will serve a secondary purpose as a collection of training data for AI, both general and specific. Because it will be maintained by humans, the data should be of higher quality, and subjective information (for example, people’s likes and dislikes) can also be incorporated. Since our own documentation will still be hosted and made semantic, this also opens exciting new possibilities for an AI to be aware of its own design and possibly be better able to intelligently and autonomously improve its own code base.
In addition, we intend to host various competitions for AI bot design, using the wiki as the primary point of interaction. In these competitions, custom AI bots will learn from the wiki and generate information in specially reserved areas of the wiki. For example, one idea is to host an area of the wiki consisting of recipes, including comments indicating which users like or dislike certain recipes, as well as recipes contributed by users themselves. In this competition concept, the AI bots would learn from the recipes and from individual users’ likes and dislikes and formulate their own recipes based on this knowledge. The bots would also have access to general information pages like “pineapple” to get information about the individual ingredients, so they can learn concepts such as pineapples and grapes both being sweet (something you can’t learn from a recipe alone). The AI could also suggest recipes to specific users based on their likes and dislikes, as well as modifications to a specific recipe that might suit a particular user’s tastes better than the original. The winning AI bot could then be determined by how many successful suggestions it made and how highly its recipes and suggestions were rated. Through these competitions we hope to foster advancement in AI as well as encourage the growth of the knowledge base contained within the wiki.
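One very simple strategy a competing bot might start from is sketched below in Python with made-up data: learn which ingredient traits a user likes from the recipes they rated, using traits taken from general pages like “pineapple”, then suggest the unrated recipe whose traits score best. INGREDIENT_TRAITS, RECIPES, recipe_traits, trait_preferences, and suggest are all hypothetical; real bots would read this information from the wiki and could be far more sophisticated.

```python
from collections import Counter
from typing import Dict, List, Set

# Hypothetical traits a bot might learn from general pages such as "pineapple".
INGREDIENT_TRAITS: Dict[str, Set[str]] = {
    "pineapple": {"sweet", "fruit"},
    "grape": {"sweet", "fruit"},
    "chili": {"spicy"},
    "cheese": {"savory"},
    "ham": {"savory", "meat"},
}

# Hypothetical recipe pages: recipe name -> ingredients.
RECIPES: Dict[str, List[str]] = {
    "pineapple upside-down cake": ["pineapple"],
    "grape sorbet": ["grape"],
    "chili cheese dip": ["chili", "cheese"],
    "ham and cheese toastie": ["ham", "cheese"],
}


def recipe_traits(recipe: str) -> List[str]:
    """Every trait of every ingredient in the recipe (repeats kept as extra weight)."""
    return [trait for ing in RECIPES[recipe] for trait in INGREDIENT_TRAITS[ing]]


def trait_preferences(likes: Set[str], dislikes: Set[str]) -> Counter:
    """+1 per trait occurrence in a liked recipe, -1 per occurrence in a disliked one."""
    prefs: Counter = Counter()
    for recipe in likes:
        prefs.update(recipe_traits(recipe))
    for recipe in dislikes:
        prefs.subtract(recipe_traits(recipe))
    return prefs


def suggest(likes: Set[str], dislikes: Set[str]) -> str:
    """Pick the unrated recipe whose traits the user is predicted to like most."""
    prefs = trait_preferences(likes, dislikes)
    unrated = [r for r in RECIPES if r not in likes and r not in dislikes]
    if not unrated:
        raise ValueError("the user has already rated every recipe")

    def score(recipe: str) -> float:
        traits = recipe_traits(recipe)
        return sum(prefs[t] for t in traits) / len(traits)

    return max(unrated, key=score)
```

A user who liked the pineapple upside-down cake and disliked the chili cheese dip is pointed to the grape sorbet, even though those recipes share no ingredient; the connection comes only from the general knowledge that pineapples and grapes are both sweet fruit.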
New Team Members
We have had several new team members and contributors recently:
- Freeone3000 (James) has contributed significantly to AIDE as a new member, helping write the majority of the UI.
- Hoijui has recently contributed several commits to the dANN project, helping us finally move from Ant to Maven.
- Of course, many of the usual faces are still around making contributions. We are always very thankful for all the contributions we see; we couldn’t do this without you guys.