Semantic Web
Using Wikis as Pre-Packaged Knowledge Bases
AI^3 - Mon, 07/26/2010 - 06:31
TechWiki DocWiki While Also Discovering Hidden Publication and Collaboration Potentials
A few weeks back I completed a three-part introductory series to what Structured Dynamics calls a ‘total open solution‘. A total open solution as we defined it is comprised of software, structure, methods and documentation. When provided in toto, these components provide all of the necessary parts for an organization to adopt new open source solutions on its own (or with the choice of its own consultants and contractors). A total open solution fulfills SD’s mantra that, “We’re successful when we’re not needed.”
Two of the four legs to this total open solution are provided by documentation and methods. These two parts can be seen as a knowledge base that instructs users on how to select, install, maintain and manage the solution at hand.
Today, SD is releasing publicly for the first time two complementary knowledge bases for these purposes: TechWiki, which is the technical and software documentation complement, in this case based around SD’s Open Semantic Framework and its associated open source software projects; and DocWiki, the process methodology and project management complement that extends this basis, in this case based around the Citizen Dan local community open data appliance.
All of the software supporting these initiatives is open source. And, all of the content in the knowledge bases is freely available under a Creative Commons 3.0 license with attribution.
Mindset and Objectives
In setting out the design of these knowledge bases, our mindset was to enable single-point authoring of document content, while promoting easy collaboration and rollback of versions. Thus, the design objectives became:
- A full document management system
- Multiple author support
- Authors to document in a single, canonical form
- Collaboration support
- Mixing-and-matching of content from multiple pages and articles to re-purpose for different documents, and
- Excellent version/revision control.
Assuming these objectives could be met, we then had three other objectives on our wish list:
- Single source publishing: publish in multiple formats (HTML, PDF, doc, csv, RTF?)
- Separate theming of output products for different users, preferably using CSS, and
- Single-click export of the existing knowledge base, followed by easy user modification.
Our initial investigations looked at conventional content and document management systems, matched with version control systems or SVNs. Somewhat surprisingly, though, we found the Mediawiki platform to fulfill all of our objectives. Mediawiki, as detailed below, has evolved to become a very mature and capable documentation platform.
While most of us know Mediawiki as a kind of organic authoring and content platform — as it is used on Wikipedia and many other leading wikis — we also found it perfect for our specific knowledge base purposes. To our knowledge, no one has yet set up and deployed Mediawiki in the specific pre-packaged knowledge base manner as described herein.
TechWiki v DocWiki
TechWiki is a Mediawiki instance designed to support the collaborative creation of technical knowledge bases. The TechWiki design is specifically geared to produce high-quality, comprehensive technical documentation associated with the OpenStructs open source software. This knowledge base is meant to be the go-to source for any and all documentation for the codes, and includes information regarding:
- Coding and code development
- Systems configurations and architectures
- Installation
- Set-up and maintenance
- Best practices in these areas
- Technical background information, and
- Links to external resources.
As of today, TechWiki contains 187 articles under 56 categories, with a further 293 images. The knowledge base is growing daily.
DocWiki is a sibling Mediawiki instance that contains all TechWiki material, but has a broader purpose. Its role is to be a complete knowledge base for a given installation of an Open Semantic Framework (in the current case, Citizen Dan). As such, it needs to include much of the technical information in the TechWiki, but also extends that in the following areas:
- Relation and discussion of the approach viz. other information development initiatives
- Use of a common information management framework and vocabulary (MIKE2.0)
- A five-phased, incremental approach to deployment and use
- Specific tasks, activities and phases under which this deployment takes place, including staff roles, governance and outcome measurement
- Supporting background material useful for executive management and outside audiences.
The methodology portions of the DocWiki are drawn from the broader MIKE2.0 (Method for Integrated Knowledge Environments) approach. I have previously written about this open source methodology championed by Bearing Point and Deloitte.
As of today, DocWiki contains 357 articles and 394 structured tasks in 70 activity areas under 77 categories. Another 115 images support this content. This knowledge base, too, is growing daily.
Both of these knowledge bases are open source and may be exported and installed locally. Then, users may revise and modify and extend that pre-packaged information in any way they see fit.
Basic Wiki Overview
The basic design of these systems is geared to collaboration and embeds what we think are really responsive work flows. These extend from supporting initial idea noodling to full-blown public documentation. The inherent design of the system also supports single-source publishing and book or PDF creation from the material that is there. Here is the basic overview of the design:
[Figure: basic overview of the wiki design]
Mediawiki provides the standard authoring and collaboration environment. There is a choice of editing methods. As content is created, it is organized in a standard way and stored in the knowledge base. The Mediawiki API supports the export of information in either XHTML or XML, which in turn allows the information to be used in external apps (including other Mediawiki instances) or for various single-source publication purposes. The Collection extension is one means by which PDFs or even entire books (that is, multi-page documents with potentially chapters, etc.) may be created. Use of a well-designed CSS ensures that outputs can be readily styled and themed for different purposes or audiences.
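As a rough illustration of the XML export path, here is a minimal Ruby sketch that builds an export request against the Mediawiki API. The endpoint `wiki.example.org` and the page titles are placeholders rather than the real TechWiki or DocWiki addresses, and the sketch assumes the `export` and `exportnowrap` parameters of `action=query` are enabled on the target wiki.

```ruby
require 'uri'

# Hypothetical endpoint; substitute the api.php URL of your own wiki instance.
API_ENDPOINT = 'http://wiki.example.org/api.php'

# Build a URL asking the Mediawiki API to return the named pages as export XML.
def export_url(titles, endpoint = API_ENDPOINT)
  params = {
    'action'       => 'query',
    'titles'       => titles.join('|'), # the API separates multiple titles with "|"
    'export'       => 1,                # include the XML dump of the requested pages
    'exportnowrap' => 1                 # return the bare export XML, without the API wrapper
  }
  "#{endpoint}?#{URI.encode_www_form(params)}"
end

puts export_url(['Main Page', 'Installation'])
```

Fetching that URL (for example with `Net::HTTP`) should yield XML in the same format that a wiki's import function accepts.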
As wikis designed from the get-go to be reusable, and then downloaded and installed locally, it is important that we maintain quality and consistency across content. (After download, users are free to do with it as they wish, but it is important the initial database be clean and coherent.) The overall interaction with the content thus occurs via one of three levels: 1) simple reading, which is publicly available without limitation to any visitor, including source inspection and export; 2) editing and authoring, which is limited to approved contributors; and 3) draft authoring and noodling, which is limited to the group in #2 but for which the in-progress content is not publicly viewable. Built-in access rights in the system enable these distinctions.
Features and Benefits
Besides meeting all of the objectives noted at the opening of this post, these wikis (knowledge bases) also have these specific features:
- Relatively complete (and growing) knowledge base content
- Book, PDF, or XHTML publishing
- Single-click exports and imports
- Easy branding and modification of the knowledge bases for local use (via the XML export files)
- Pre-designed, standard categorization systems for easy content migration
- Written guidance on use and best practices
- Ability to keep content in-development “hidden” from public viewing
- Controlled, assisted means for assigning categories to content
- Direct incorporation of external content
- Efficient multi-category search and filtering
- Choice of regular wikitext, WikED or rich-text editing
- Standard embeddable CSS objects
- Semantic and readily themed CSS for local use and for specialty publications
- Standard templates
- Sharable and editable images (SVG inclusion in process)
- Code highlighting capabilities (GeSHi, for TechWiki)
- Pre-designed systems for roles, tasks and activities (DocWiki)
- Semantic Mediawiki support and forms (DocWiki)
- Guided navigation and context (DocWiki).
Many of these features come from the standard extensions in the TechWiki/DocWiki packages.
The net benefits from this design are easily shared and modified knowledge bases that users and organizations may either contribute to for the broader benefit of the OpenStructs community, or download and install with simple modifications for local use and extension. There is actually no new software in this approach, just proper attention to packaging, design, standardization and workflow.
A Smooth Workflow
Via the sharing of extensions, categories and CSS, it is quite easy to have multiple instances or authoring environments in this design. For Structured Dynamics, that begins with our own internal wiki. Many notes are taken and collected there, some of a proprietary nature and the majority not intended or suitable for public release.
Content that has developed to the point of release, however, can be simply tagged using conventions in the workflow. Then, with a single Export command, the relevant content is sent to an XML file. (This document can itself be edited, for example by changing all ‘TechWiki’ references to something like ‘My Content Site’; see further here.)
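The kind of edit described in that parenthetical can be scripted. The following is only a sketch: the sample XML is a cut-down, made-up fragment rather than a real export file, and a plain string substitution like this will also rename any page titles that contain the old name.

```ruby
# Replace every occurrence of one site name with another in exported wiki XML.
def rebrand(xml, old_name, new_name)
  xml.gsub(old_name, new_name)
end

# Made-up fragment standing in for a real export file.
sample = <<~XML
  <mediawiki>
    <page>
      <title>TechWiki:About</title>
      <revision><text>Welcome to the TechWiki.</text></revision>
    </page>
  </mediawiki>
XML

puts rebrand(sample, 'TechWiki', 'My Content Site')
```

For a real export you would read the file with `File.read`, apply `rebrand`, and write the result back out before importing it.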
Depending on the nature of the content, this exported content may then be imported with a single Import command to either the TechWiki or DocWiki sites. (Note: Import does require admin rights.) A simple migration may also occur from the TechWiki to the DocWiki. Also, of course, initial authoring may begin at any of the sites, with collaborators an explicit feature of the TechWiki or DocWiki versions.
Any DocWiki can also be specifically configured for different domains and instance types. In terms of our current example, we are using Citizen Dan, but that could be any such Open Semantic Framework instance type:
[Figure: DocWiki configuration for different domains and instance types]
Under this design, then, the workflow suggests that technical content authoring and revision take place within the TechWiki, process and methodology revision in the DocWiki. Moreover, most DocWikis are likely to be installed locally, such that once installed, their own content would likely morph into local methods and steps.
So long as page titles are kept the same, newer information can be updated on any target wiki at any time. Prior versions are kept in the version history and can be reinstated. Alternatively, if local content is clearly diverging yet updates of the initial source material are still desired, the local content need only be saved under a new title to preserve it from import overwrites.
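A toy model may make the title-matching rule concrete. This is not how Mediawiki stores pages internally; it simply treats a wiki as a hash from page title to a list of revisions, and the page names are invented for the example.

```ruby
# Toy model: each page title maps to its list of revisions, oldest first.
def import_pages(wiki, incoming)
  incoming.each do |title, text|
    (wiki[title] ||= []) << text # same title: add a new revision, keep the old ones
  end
  wiki
end

wiki = { 'Installation' => ['local edits to the install guide'] }

# Preserve diverging local content by saving it under a new title before importing.
wiki['Installation (local)'] = wiki['Installation'].dup

import_pages(wiki, { 'Installation' => 'upstream v2' })

puts wiki['Installation'].last          # newest revision now comes from the import
puts wiki['Installation (local)'].last  # local copy is untouched by the import
```

The earlier local revision is still present in the history of 'Installation', which mirrors the point that prior versions can be reinstated.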
Where Is It Going from Here?
We are really excited by this design and have already seen benefits in our own internal work and documentation. We see, for example, easier management of documentation and content, permanent (canonical) URLs for specific content items, and greater consistency and common language across all projects and documentation. Also, when all documentation is consolidated into one point with a coherent organizational and category structure, documentation gaps and inconsistencies also become apparent and can readily be fixed.
Now, with the release of these systems to the OpenStructs (Open Semantic Framework) and Citizen Dan communities, we hope to see broader contributions and expansion of the content. We encourage you to check on these two sites periodically to see how the content volume continues to grow! And, we welcome all project contributors to join in and help expand these knowledge bases!
We think this general design and approach — especially in relation to a total open solution mindset — has much to recommend it for other open source projects. We think these systems, now that we have designed and worked out the workflows, are amazingly simple to set up and maintain. We welcome other projects to adopt this approach for their own. Let us know if we can be of assistance, and we welcome ideas for improvement!
Categories: Semantic Web
Another Milestone in Semantic Enterprise Awareness
AI^3 - Thu, 07/15/2010 - 21:06
Cisco Video is a Good Starting Intro for Management
Like the seminal linked data publication by PricewaterhouseCoopers of about a year ago (see “PWC Dedicates Quarterly Technology Forecast to Linked Data“, May 29, 2009), a video released by Cisco yesterday is another signal of the emergence of the semantic enterprise.
The Cisco tech brief on The Semantic Enterprise is a quite accessible — but a bit eerie — seven-minute introduction. The video was prepared by Cisco’s Internet Business Solutions Group (IBSG), with Shaun Kirby, its Director of Innovations Architectures, as the narrator:
YouTube: http://www.youtube.com/watch?v=3lUzs2I8BKI
Well, as for being eerie, when the video first came up, I thought I was looking at an advanced, next generation avatar, perhaps a reincarnation of Douglas Adams’ Hyperland. Maybe this semantic stuff was closer at hand than we thought!
But, as it turned out, that first blush was only a reaction to how the video was shot. As it gets rolling, the Cisco video is extremely well done and informative. It is a great intro for sharing with management when contemplating your own moves to becoming a semantic enterprise.
I suggest you first view — and then bookmark — this one.
Categories: Semantic Web
Setting Http Proxy when using the selenium-webdriver gem and Firefox
Chris Lowis Blog - Wed, 07/14/2010 - 00:00
I've been using the selenium-webdriver ruby gem to do some automated Cucumber testing of our application. Sadly, I'm sat behind a firewall which requires an HTTP proxy to access the outside world. Using the Chrome WebDriver bridge worked fine as it inherits the system preferences on OS X. However with Firefox I had to perform some trickery to get it to work:
    require 'rubygems'
    require 'selenium-webdriver'
    include Selenium

    # Create a fresh Firefox profile and point it at the HTTP proxy
    profile = WebDriver::Firefox::Profile.new
    profile["network.proxy.type"] = 1   # 1 = manual proxy configuration
    profile["network.proxy.http"] = "www-cache.at.my.work.co.uk"
    profile["network.proxy.http_port"] = 80

    # Start Firefox with that profile and load a page through the proxy
    driver = WebDriver.for(:firefox, :profile => profile)
    driver.navigate.to "http://google.com"

The []= method on Profile is setting the options that you find by typing 'about:config' in the Firefox URL bar. Take a look there in a working Firefox profile to see what variables you should set for your webdriver-specific profile in the code above.
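If you need more than the plain HTTP setting, the same idea can be wrapped in a small helper. This is a sketch rather than part of the original recipe: `proxy_prefs` is a name made up here, and it assumes you also want HTTPS traffic proxied (`network.proxy.ssl`, `network.proxy.ssl_port`) and a bypass list (`network.proxy.no_proxies_on`), which are preference names you can confirm in about:config.

```ruby
# Build the about:config preferences for a manual proxy as a plain Hash.
# proxy_prefs is a hypothetical helper, not part of selenium-webdriver.
def proxy_prefs(host, port, bypass = ['localhost', '127.0.0.1'])
  {
    'network.proxy.type'          => 1, # 1 = manual proxy configuration
    'network.proxy.http'          => host,
    'network.proxy.http_port'     => port,
    'network.proxy.ssl'           => host, # assumption: HTTPS uses the same proxy
    'network.proxy.ssl_port'      => port,
    'network.proxy.no_proxies_on' => bypass.join(', ')
  }
end

prefs = proxy_prefs('www-cache.at.my.work.co.uk', 80)
prefs.each { |name, value| puts "#{name} = #{value.inspect}" }

# With selenium-webdriver loaded, each pair could be applied to a profile:
#   prefs.each { |name, value| profile[name] = value }
```

Keeping the preferences in a Hash means the helper can be checked without launching a browser.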
Categories: Semantic Web
‘Pay as You Benefit’: A New Enterprise IT Strategy
AI^3 - Tue, 07/13/2010 - 04:57
Using Incremental, Low-risk Semantic and Open World Approaches
OK. So, you’re looking at your garage … or your bedroom closet … or your office and its files. They are a mess, and you can’t find anything and you can’t stuff anything more into the nooks, cubbies, crannies or cabinets. What do you do?
Well, when you finally get fed up and have a rainy day or some other excuse, you tackle the mess. Maybe you grab a big mug of coffee to prepare for the pending battle. Maybe you strip down to comfort clothes. Then, if you’re like me, you begin to organize stuff into piles. Labeled piles and throwaway piles and any other piles that can provide a means to start bringing order to the chaos.
In the semantic Web world, there is a phrase coined by Jim Hendler that captures this approach: A little semantics goes a long way [1]. A little semantics, just like your labeled piles, helps to bring order to information chaos.
Mind you, this is not fancy or expensive stuff. In the case of my office, it is colored sheets of paper labeled with Magic Markers as “Taxes” or “Internal” or “Blog Posts” or whatever. Then, I begin sifting and distributing. In the case of the semantic world, these are classifying things into like categories and simply relating them to other categories with simple relationships, such as “is Part Of” or “is Narrower Than”.
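To show how little machinery "a little semantics" needs, here is a sketch of labeled piles as subject-relationship-object statements. The item names and the `isPartOf` and `isNarrowerThan` spellings are invented for illustration; they are not drawn from any particular vocabulary.

```ruby
# Each statement is a (subject, relationship, object) triple.
TRIPLES = [
  ['Tax Receipts 2009', 'isPartOf',       'Taxes'],
  ['Draft Posts',       'isNarrowerThan', 'Blog Posts'],
  ['Blog Posts',        'isPartOf',       'Internal']
]

# Find every subject that stands in the given relationship to the given object.
def related(triples, relationship, object)
  triples.select { |_, rel, obj| rel == relationship && obj == object }.map(&:first)
end

puts related(TRIPLES, 'isPartOf', 'Taxes').inspect
```

New piles or relationships can be added one statement at a time, without redefining anything already there.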
Of course, I could have approached my mess in a different way. I could have hired an efficiency expert to come in, interview me and all of my employees and colleagues, gotten a written analysis and report, and then committed to a multi-week project to completely store and place every single last piece of paper in my office or organize every rake and set of abandoned golf clubs in my garage. When done, I would have shelled out much money and I suspect still not have been able to find anything.
Sort of sounds like the traditional way IT does its business, doesn’t it? To clean up their information messes, enterprises need to find a better strategy.
I’m not too long from having returned from the SemTech conference, which overall was quite an excellent show. But despite its emphasis on semantic technologies and their usefulness to businesses and enterprises, I found one critical theme unspoken: the ability of semantic approaches to change how enterprise IT actually does business. New ways have got to be found to clean up the many and growing information piles emerging all around us.
The Changing Nature of IT
IT is, and has been, going through a fundamental set of changes for decades. In the last decade, these changes have led to lowered relative spending, a shift in spending priorities toward services, less innovation, and less productivity. Some data and observations by researchers and analysts document these trends.
The following chart, using US Bureau of Economic Analysis data [2], shows the clear 50-year trend in declining hardware costs for enterprises, mostly resulting from the observation known as Moore’s Law. These massive hardware cost reductions (logarithmic scale) have also resulted in lower prices for IT as a whole. In 2008, for example, total relative IT prices were about two-thirds what they were a mere decade earlier:
Source: M.K. Bergman and Bureau of Economic Analysis [2]
In contrast, relative prices for software and services have remained remarkably flat over this entire period, including for the past decade. This is somewhat surprising given the emergence of packaged software and more recently open source. However, relative percentage expenditures for custom software and software developed in-house have also remained strong over the past decade [3].
The mid- to late-1990s represented the high-water mark on many bases for enterprise IT, expenditures and vendors. Roughly in 1997 or so, the number of public enterprise software vendors peaked as did venture funding [4] and relative expenditures for IT in relation to GDP. There was a major uptick in relation to preparing for Y2K and a major downtick due to the dot-com bubble, and then of course the past two years or so have seen a global economic downturn. But, as the figure below shows (red), the long-term trend tends to suggest a relative plateau for IT expenditures in relation to GDP somewhat around 2000:
Source: M.K. Bergman and Bureau of Economic Analysis [2]
Yet, like the first chart, software seems to be bucking this trend (blue lines above). Though perhaps the rate of growth in expenditures for software is slowing a bit, it is still on a growth upslope, especially in relation to overall IT expenditures. The next chart, in fact, specifically compares software expenditures to total IT expenditures. Software expenditures are some 40% higher in relation to total IT than they were a mere decade ago:
Source: M.K. Bergman and Bureau of Economic Analysis [2]
The mix of these software expenditures is also changing in major ways while stagnating in others.
The changing aspect is coming about from the shift of expenditures from license and maintenance fees to services. A number of software vendors began to see revenues from services overcome that from licensing in the 1990s. By the early 2000s, this was true for the enterprise software sector as a whole [4]. Today, service revenues account for 70% or so of aggregate sector revenues. Combined with the emergence of open source and other alternatives such as software as a service (SaaS), I think it fair to say that the era of proprietary software with exceedingly high margins from monopoly rents is over [5].
The stagnating aspect occurs in how the software expenditures are applied. According to Gartner, in the US, more than 70% of IT expenditures are devoted to simply running existing systems, with only about 11% of budgets devoted to innovation; other parts of the world spend nearly double on innovation and much lower for operations [6]. This relative lack of support for innovation and high percentages for running existing systems has held true for about a decade. Meanwhile, IT’s contribution to US productivity has been declining since 2001 [7].
What is the Cause for IT’s Ills?
Last year, PricewaterhouseCoopers published a major report with the provocative title, “Why Isn’t IT Spending Creating More Value?” [7]. The 42-page report covered many of the aspects above. Among other factors, the PWC authors speculated that:
“As consumption of IT increases and as technologies change and advance, businesses have been left to cobble together disparate software and hardware systems and tools. The end result? Unchecked IT spending, unneeded complexity, redundant systems, underutilized hardware and data centers, the need for expensive IT security, and, inevitably, diminishing returns from IT. In short, low levels of IT productivity create conditions for an IT cost crisis.” [7]
I suppose one could add to this litany other factors such as the growth and emergence of the Internet, sector consolidations through mergers and acquisitions, the rise of open source and alternatives such as SaaS, etc.
But which of these are causes? Which are symptoms? And which might only be consequences or coincident?
To be sure, all recognize the explosion of digital data and information, with sources and formats springing up faster than Whack-a-Mole. It is such an evident and ubiquitous phenomenon that pointing to it as a cause appears on the face of it quite obvious. Also obvious is that these new sources carry with them a diversity of systems and tools. While not categorically stated as such, it appears that PWC fingers the difficulties of “cobbling” these systems together as the root cause for low productivity and thus the IT cost crisis.
I agree totally that these are symptoms of what we see in IT’s current circumstance. I would even say these factors are a proximate cause to these ills. But I disagree they are the root cause. To discover that root, I believe, we must look deeper to mindset and assumptions.
Closed World Mindset as the Root Cause
There are some phenomena that are so obvious that they are easily missed. Not seeing your fingertip six inches in front of your eyes is one of these. We aren’t used to focusing on things so near at hand.
So, let’s look for a moment at the closed world assumption (CWA), a key underpinning to most standard relational data systems and enterprise schema and logics. CWA is the logic assumption that what is not currently known to be true, is false. If CWA is not directly familiar to you that is understandable; it is an implied assumption of these systems and logics. As such, it is not often inspected directly and therefore not often questioned [8].
With regard to standard IT systems, the closed world assumption has two important aspects:
- The assumption is that the information domain at hand is complete [9], and
- The related negation as failure, which assumes every predicate to be false that cannot be proved to be true.
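The difference between the two assumptions can be sketched in a few lines of Ruby. The facts and names below are invented, and real open world reasoners are considerably richer than this (they can, for instance, conclude that something is false from explicit statements); the point is only the treatment of what has not been stated.

```ruby
# A tiny, made-up knowledge base of (subject, relationship, object) facts.
FACTS = [
  ['Alice', 'worksIn', 'Finance'],
  ['Bob',   'worksIn', 'Legal']
]

# Closed world: anything not stated is taken to be false (negation as failure).
def closed_world?(facts, statement)
  facts.include?(statement)
end

# Open world: anything not stated is simply unknown, not false.
def open_world(facts, statement)
  facts.include?(statement) ? :true : :unknown
end

question = ['Carol', 'worksIn', 'Finance']
puts closed_world?(FACTS, question).inspect
puts open_world(FACTS, question).inspect
```

Under the closed world reading, the missing statement about Carol is an answer ("no"); under the open world reading it is a gap that later information may fill.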
On the face of them, these assumptions seem tame enough. And, indeed, there are some enterprise data systems that absolutely rely on them for efficient processing and completion times, such as most transaction systems. CWA is absolutely the appropriate design for such applications.
However, for knowledge management or representation applications — that is, applications which involve combining or using heterogeneous data or information from multiple data sources, which are exactly the same sources requiring information “cobbling” noted above by PWC — there are two very critical implications of the closed-world assumption (CWA):
- Efforts or projects can not be undertaken incrementally; if done in pieces, each piece must be complete and consistent, which is expensive to scope and do
- To be consistent and explicit, the predicates (properties or relationships) must also be complex to model the “reality” of the system, which is also expensive to scope and do [10].
The net effect, which I have argued before, most notably in a major piece about the open world assumption [11], is that typical projects with a knowledge management aspect have become costly, take very long to complete, often fail, and require much planning and coordination. These facts have been true for three decades as enterprises have attempted to extract knowledge from their electronic information using closed world approaches based on relational systems. And, as recognized by PWC, these problems are only getting worse with growth in diversity and scope of systems.
The implications of closed world v. open world approaches are absolutely at the root of the causes leading to declining productivity, low innovation, significant failures and increasing costs — all exacerbated with more data and more systems — now characterizing traditional enterprise IT. Moreover, it is not a problem for open world systems to link to and incorporate closed world approaches. With open world, there is no need for Hobson’s choices. Unfortunately, such is not true when one begins with a closed world premise.
Incremental is Good: Pay as You Go
As best as I can tell, Alon Halevy was the first to use the phrase “pay as you go” in 2006 to describe the incremental aspect of the open world approach in relation to the semantic Web [12]. The “pay as you go” phrase had been applied earlier to data management and storage and had also been used to describe phone calling plans.
Incremental concepts and “agility” have been popular topics for the past five to ten years in IT, most often related to software development. And, while “incremental” sounds good in relation to enterprise projects, especially of a knowledge management or information integration/federation nature, the actual methodologies put forward were anything but incremental in their conceptual underpinnings.
Unfortunately, the “pay as you go” phrase has been (and still is) largely confined to incremental, open world approaches involving the semantic Web. How this approach might apply and benefit enterprises has yet to be articulated. Nonetheless, I like the phrase, and I think it evokes the right mindset. In fact, I think with linked data and many other aspects of the current semantic Web we are seeing such approaches come to fruition. Inch-by-inch, brick-by-brick, data on the Web is getting exposed and interlinked. “Pay as you go” is incremental, and that is good.
Purposeful is Better: Pay as You Benefit
Yet the idea of “pay as you benefit” is more purposeful, able to be planned and implemented, and founded on standard enterprise cost-benefit principles. I think it is a better (and more nuanced) expression of the “pay as you go” mindset in an enterprise setting. What it means is you can start small and be incomplete. You can target any domain or department or scope that is most useful and illustrative for your organization. You can deploy your first stand-ups as proofs-of-concept or sandboxes. And, you can build on each prior step with each subsequent one.
One of the reasons we (Structured Dynamics) embraced the MIKE2.0 methodology [13] was its inherent incremental character. (Government deployments often call them “spirals”.) In general, the five phases of MIKE2.0 can be represented as follows:
[Figure: the five phases of MIKE2.0]
It is specifically during the fifth phase, testing and improvement, that quantitative and qualitative benefits from the current increment are calculated and documented. This evolving methodology is where the enterprise can assess the results of its prior investment and scope and budget for the next one. These can be quick, rapid increments, or more involved ones, depending on the schedule, prior results and risk profile of the enterprise (or department) at that time.
Much is made of “incremental” or “agile” deployments within enterprises, but the nature of the traditional data system (and its closed world assumption) can act to undermine these laudable steps. The inherent nature of an open world approach, matched with methodologies and best practices, can work wonderfully with KM-related projects.
Quite Simply a Different Way to Do Business
We see in our current IT circumstances a number of embedded practices and assumptions. We have been assuming control and completeness, the closed world opposite to the open world approach. We have thus embraced and promoted “global” or enterprise-wide solutions: be they desktop operating systems or browsers or expensive enterprise-level proprietary software solutions. This scope leads to immense hurdle rates and risks: we had better get our choices right up front, because if we don’t, the department or enterprise is at risk. We have an inward focus about our own resources, our own networks, our own systems. Meanwhile, when we look outward, we wonder how all of these new Web companies can grow and expand so rapidly in comparison to us.
Clearly, we are seeing shifts to more services than products, more open source, more outsourcing, and more software as a service. Yet, because of the legacy of decades-long commitments from prior IT investment and the failures of many hyped “solutions” such as ERP or BI or data warehousing or a dozen others, we also see a decline and a reluctance for IT to embrace new and transforming approaches. Our prior choices were practically tantamount to “betting the enterprise.” What if our new approaches fail as so many of their predecessors did? In a demanding, competitive environment can we afford to make such wrong choices again with such immense implications?
Yet, now that information technology is a given, it only seems natural that its role becomes an integral part of the enterprise, and not a special function. Like procurement, IT has matured to become a support function. Businesses should not succeed or fail based on the types of pencils and paper stock they use; nor should they depend on the software support choices that IT makes. Enterprises are now past the need to get “computerized”; they are thoroughly so. But our understanding of IT’s role and position has not evolved with its own success.
The first whiffs of these challenges to IT’s initial hegemony came from the departmental introduction of PCs and local networks in the early 1980s. It has continued with desktop software, spreadsheets and Web portals and sites. Large, mature companies awoke in horror in the last decade to discover they had hundreds — sometimes thousands — of Web sites and content dissemination points over which IT had little or no control. Such is the nature of entropy, and it is a fact for any organization of any size.
So, now, with strategies such as “pay as you benefit,” there is no longer an excuse not to innovate. There is not a justification to put off testing and discovering benefits that the open world and semantic approaches can bring to your organization. There is now a basis to make the case and set the affordable budgets within desirable timelines for becoming a semantic enterprise.
Mindsets and expectations do require some adjustment. For example, not everything will be known or modeled in early phases. But, is that also not true in any “real” real world? We’re not talking high-throughput transaction systems here, but beginning to pull together and link the information that is important to your organization strategically.
Remember the intro statement that “a little semantics goes a long way”? Well, that truth — and it is true — when combined with incremental deployment firmly tied to demonstrable results, promises quite simply a different way to do business. Never before have enterprises had working and winnable approaches such as this to test and innovate and learn and discover. Jump on in; the water is clear and warm.
And, oh, as to that mess in your closet or garage? Well, if you adhere to CWA, you will need to define a place for everything to go before you can start cleaning things up. I say: forget those false hurdles. If you really want to make a dent in the mess, grab a broom and start cleaning.
[1] Jim Hendler, “a little semantics goes a long way.” See http://www.cs.rpi.edu/~hendler/LittleSemanticsWeb.html.
[2] All starting data is for the United States only and comes from the U.S. Bureau of Economic Analysis, U.S. Department of Commerce. The data tables were downloaded from the BEA Web site at http://www.bea.gov/national/nipaweb/SelectTable.asp. GDP data is from Section 1; enterprise private investment data from Section 5. For reasons as described in the text, all relative BEA numbers were re-adjusted from a 2005 baseline to 1997 based on absolute figures. Software figures and expenditures include packaged software, custom software and software developed in-house, but exclude software bundled or included within hardware.
[3] Data not shown; see the “Software Investment and Prices, by Type” data on the BEA Web page http://www.bea.gov/national/info_comm_tech.htm.
[4] Michael A. Cusumano, 2008. “The Changing Software Business: Moving from Products to Services,” Massachusetts Institute of Technology, in Computer, Vol 41 (1): 20-27, January 2008. See http://www.iae.univ-lille1.fr/SitesProjets/bmcommunity/Research/cusumano.pdf. This shift has occurred despite the recognition that potential gross margins from software packages can exceed 90% due to zero costs of reproduction. As Cusumano notes in a rule, “99 percent of zero is zero: The great profit opportunity from software products becomes theoretical and not practical” if not sold. Also, another interesting observation made by Cusumano is that, in the shift to services, vendors with both low percentages and high percentages of services, or what he calls the “sweet spots”, show higher contributions to profitability than vendors in the middle. He posits that low percentage vendors are getting mostly profitable maintenance fees, while those above 60% in services show profitability due to learning more replicable and systematic processes and approaches for service delivery.
[5] While we may occasionally see some vendors successfully buck this trend, I suspect these will only occur for established vendors with established platform advantages or for isolated applications where the innovating vendors have a significant first-mover advantage.
[6] Gartner calls the innovation category “transform”; see Gartner, Incorporated, 2009. “IT Software and Services, 2007-2010,” see http://www.slideshare.net/rsink/gartner-report-it-spending-2010. Also, see Jed Rubin and Howard Rubin, 2006. “Worldwide IT Benchmark Service New Trends & Findings for 2007: Strategic Performance Management and Measurement,” from Gartner Consulting Worldwide IT Benchmark Service; see http://www.gartner.com/teleconferences/attributes/attr_161183_115.pdf.
[7] PricewaterhouseCoopers, 2009. “Why Isn’t IT Spending Creating More Value?”, see http://www.pwc.com/en_US/us/increasing-it-effectiveness/assets/it_spending_creating_value.pdf.
[8] Though relational database systems did not begin with an understanding of CWA, but rather Edgar Codd’s 12 rules, the understandings of these were formulated later by Raymond Reiter. Reiter first described the basis of CWA in 1978, and then provided an axiomatization of relational databases and their deductive generalizations and basis in CWA in 1984; see http://prism.cs.umd.edu/papers/Min02:reiter_memoriam/memoriam-tplp.pdf.
[9] Relational database systems also assume unique names for objects, which, while not perhaps the best design for federated systems, can be overcome in other ways.
[10] For semantics-related projects there is a corollary problem to the use of CWA which is the need for upfront agreement on what all predicates “mean”, which is difficult if not impossible in reality when different perspectives are the explicit purpose for the integration.
[11] See M. K. Bergman, 2009. The Open World Assumption: Elephant in the Room, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[12] This was also the first instance (I believe) of Alon coining the “dataspace” term. First use of the “pay as you go” phrase was, Alon Halevy, Michael Franklin, and David Maier, 2006. “Principles of Dataspace Systems,” in Proceedings of ACM Symposium on Principles of Database Systems, pp: 1-9. See also the slides accompanying that talk, Alon Halevy, 2006. “Principles of Dataspace Systems (PODS),” June 26, 2006; see http://www.cs.washington.edu/homes/alon/files/pods06-keynote.ppt, 2006. More explicitly the next year see Jayant Madhavan, Shirley Cohen, Xin (Luna) Dong, Alon Y. Halevy, Shawn R. Jeffery, David Ko, and Cong Yu, 2007. “Web-scale Data Integration: You Can Afford to Pay as You Go.” in 3rd Conf. on Innovative Data Systems Research (CIDR), pp 342-350, see http://research.yahoo.com/files/paygo.pdf. The term has been picked up by many others, notably Rada Chirkova, Dongfeng Chen, Fereidoon Sadri and Timo J. Salo, 2007. “Pay-As-You-Go Information Integration: The Semantic Model Approach,” see ftp://ftp.csc.ncsu.edu/pub/tech/2007/TR-2007-30.pdf; and most recently papers by Gerhard Weikum on RDF-3X; see http://domino.mpi-inf.mpg.de/internet/reports.nsf/c125634c000710cec125613300585c64/70e8f906d8090e6bc125757f00448ec9!OpenDocument&ExpandSection=-1.
[13] See M.K. Bergman, 2010. “MIKE2.0: Open Source Information Development in the Enterprise,” AI3 Blog posting, February 23, 2010; and M.K. Bergman, 2010. “Open SEAS: A Framework to Transition to a Semantic Enterprise,” AI3 Blog posting, March 1, 2010.
Categories: Semantic Web
Consolidating a Coherent Message with OSF
AI^3 - Tue, 07/06/2010 - 07:52
Release of Semantic Components Adds Final Layer, Leads to Streamlined Sites
Yesterday Fred Giasson announced the release of code associated with Structured Dynamics’ open source semantic components (also called sComponents). A semantic component is an ontology-driven component, or widget, based on Flex. Such a component takes record descriptions, ontologies and target attributes/types as inputs and then outputs some (possibly interactive) visualizations of the records.
Though not all layers are by any means complete, from an architectural standpoint the release of these semantic components provides the last and missing layer to complete our open semantic framework. Completing this layer now also enables Structured Dynamics to rationalize its open source Web sites and various groups and mailing lists associated with them.
The OSF “Semantic Muffin”
We first announced the open semantic framework (OSF) a couple of weeks back. Refer to that original post for more description of the general design [1]. However, we can show this framework with the semantic components layer as illustrated by what some have called the “semantic muffin”:
The OSF stack consists of these layers, moving from existing assets upward through increasing semantics and usability:
- Existing assets — any and all existing information and data assets, ranging from unstructured to structured. Preserving and leveraging those assets is a key premise
- scones / irON — this layer is for general conversion of non-RDF data and data schema to RDF (via irON or RDFizers) or for information extraction of subject concepts or named entities (scones)
- structWSF — is the pivotal Web services framework layer, and provides the standard, common interface by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack
- Semantic components — the highlighted layer in the “semantic muffin”; in essence, this is the visualization and data interaction layer in the OSF stack; see more below
- Ontologies — are the layer containing the structured assets “driving” the system; this includes the concepts and relationships of the domain at hand, and administrative ontologies that guide how the user interfaces or widgets in the system should behave, and
- conStruct — is the content management system (CMS) layer based on Drupal and the thinnest layer with respect to OSF; this optional layer provides the theming, user rights and permissions, or other functionality drawn from Drupal’s 6500 third-party modules.
Not all of these layers are required in a given deployment and their adoption need not be sequential or absolutely depend on prior layers. Nonetheless, they do layer and interact with one another in the general manner shown.
The Semantic Components Layer
Current semantic components, or widgets, include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable to any domain without modification.
Though Fred’s post goes into more detail — with subsequent posts to get into the technical nuances of the semantic components — the main idea of these components is shown by the diagram below.
These various semantic components get embedded in a layout canvas for the Web page. By interacting with the various components, new queries are generated (most often as SPARQL queries) to the various structWSF Web services endpoints. The result of these requests is to generate a structured results set, which includes various types and attributes.
An internal ontology that embodies the desired behavior and display options (SCO, the Semantic Component Ontology) is matched with these types and attributes to generate the formal instructions to the semantic components. These instructions are presented via the sControl component, which determines the widgets (individual components, with multiples possible depending on the inputs) that need to be invoked and displayed on the layout canvas. Here is a picture of the general workflow:
New interactions with the resulting displays and components cause the iteration path to be generated anew, again starting a new cycle of queries and results sets. As these pathways and associated display components get created, they can be named and made persistent for later re-use or within dashboard invocations.
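The matching step in this workflow can be sketched in a few lines of Python. This is only an illustration of the idea, not the sControl implementation: the `DISPLAY_RULES` table is a hypothetical stand-in for the Semantic Component Ontology, and the type names, widget names and record layout are all assumptions made for the example.

```python
# Hypothetical stand-in for SCO: maps an attribute type found in a
# results set to the widgets able to display it.
DISPLAY_RULES = {
    "geo:Point": ["map"],
    "xsd:decimal": ["bar chart", "pie chart", "linear chart"],
    "xsd:string": ["tabular template"],
}


def select_widgets(results_set):
    """Return the widgets to invoke for a results set.

    `results_set` is a list of records; each record maps an attribute
    name to a (type, value) pair.
    """
    widgets = []
    for record in results_set:
        for _attribute, (attr_type, _value) in record.items():
            for widget in DISPLAY_RULES.get(attr_type, []):
                if widget not in widgets:  # keep first-seen order, no repeats
                    widgets.append(widget)
    return widgets


results = [
    {
        "location": ("geo:Point", (41.66, -91.53)),
        "population": ("xsd:decimal", 67862),
    },
]
print(select_widgets(results))
```

A new user interaction would produce a new results set, and calling `select_widgets` again would yield the widgets for the next cycle.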
Consolidating and Rationalizing Web Sites and Mailing Lists
As the release of the semantic components drew near, it was apparent that releases of previous layers had led to some fragmentation of Web sites and mailing lists. The umbrella nature of the open semantic framework enabled us to consolidate and rationalize these resources.
Our first change was to consolidate all OSF-related material under the existing OpenStructs.org Web site. It already contained the links and background material to structWSF and irON. To that, we added the conStruct and OSF material as well. This consolidation also allowed us to retire the previous conStruct Web site, which now re-directs to OpenStructs.
We also had fragmentation in user groups and mailing lists. Besides shared materials, these had many shared members. The Google groups for irON, structWSF and conStruct were thus archived and re-directed to the new Open Semantic Framework Google group and mailing list. Personal notices of the change and invites have been issued to all members of the earlier groups. For those interested in development work and interchange with other developers on any of these OSF layers, please now direct your membership and attention to the OSF group.
There has also been a reinvigoration of the developers’ community Web site at http://community.openstructs.org/. It remains the location for all central developer resources, including bug and issue tracking and links to SVNs.
Actual code SVN repositories are unchanged. These code repositories may be found at:
We hope you find these consolidations helpful. And, of course, we welcome new participants and contributors!
[1] An alternative view of this layer diagram is shown by the general Structured Dynamics product stack and architecture.
Categories: Semantic Web
Growl notifications from Ruby on OS X
Chris Lowis Blog - Tue, 07/06/2010 - 00:00
I wanted to generate Growl notifications from a Ruby script on OS X and had a bit of trouble getting it to work. To save you the same hassle, here are the steps I took:
Install growl
Install Growl if you haven't already. I used version 1.2.
Set some growl preferences
In the Growl preference pane (find it under System Preferences):
- under network check "Listen for incoming notifications" and "Allow remote application registration"
- restart growl
There are a couple of different Ruby bindings for Growl, but ruby-growl worked for me:

    gem install ruby-growl

Send a growl notification from your Ruby script
You can now send notifications from Ruby:

    require 'rubygems'
    require 'ruby-growl'

    # Register an application named "ruby-growl" with the local Growl
    # instance, declaring the one notification type it will send.
    g = Growl.new "127.0.0.1", "ruby-growl", ["ruby-growl Notification"]

    # Send a notification of that type with a title and a message.
    g.notify "ruby-growl Notification", "It Came From Ruby-Growl", "Greetings!"

Categories: Semantic Web
A Personal Thanks
AI^3 - Fri, 07/02/2010 - 16:44
As of July 1, Daily Readership Passed 3000
As of yesterday, the readership on this AI3 blog passed 3000 daily for the first time. It has been steadily inching upward, and finally passed that minor milestone. Thank you!
I’ve been writing this blog for five years now, with some 400 total posts, or about 1.5 blog posts per week. I know my style is toward longer articles and less frequent posting, most often of a fairly detailed or technical nature. And, while I have a Twitter account, I do not bleat. My style is for more meaty discussions. Perhaps it betrays my age.
The real growth in this blog, however, has come about with my conscious attempt to write for the enterprise audience. RDF, the semantic enterprise, linked data and ontologies need a bridge from the technical community to the one of practitioners. Much progress and uptake has been occurring with these business and government audiences.
At the recent SemTech meeting, I was taken aside by many individuals noting my blog posts and thanking me for the thought and effort behind them. Thank you for noticing, and reading, and you are welcome. We need more translation of semantic topics and technologies to pragmatic terms.
If you have been following the standard W3C and SemWeb mailing lists recently, you will have noticed an anxiety and a continuation of the fractious nature of this “community”. In part this comes about because there are efforts afoot to revisit the RDF specs. But, mostly, I think, it is the ongoing nature of many in this group to snatch defeat from the jaws of victory. The search by some for perfection and insistence on parochial needs and preferences can give a pettiness to this “community” that is unbecoming.
Many of us have abandoned those forums for those reasons. As for myself, I will continue to evangelize to the buying market and keep the gaze pointing outward. There is a wealth of need for tools, techniques, methods, documentation, structures, and narratives. Thanks to all of you, the readership of this blog, for continuing to affirm this value.
So, in the great scheme of things, the readership of this blog is quite small in comparison with the big boys. On the other hand, very few individuals have higher numbers, and all of this for a fairly esoteric area. I think this proves there is a market and a need out there for semantic solutions.
Thanks again! And, for those in the United States, have a most enjoyable 4th of July holiday!
Categories: Semantic Web
Domain-specific Instantiations Based on the Open Semantic Framework
AI^3 - Thu, 06/17/2010 - 17:56
Structured Dynamics Completes Design Phase; Citizen Dan First Exemplar
Structured Dynamics has been in a fervent (and, we believe, fruitful) design phase for the past 18 months. All of the working parts related to how to embrace becoming a semantic enterprise have now been defined and designed. Actual tools and components accompany many of these parts and have been deployed.
Recently, I have been speaking and blogging much about rationale, process, mindset and approach for how to bring semantics into the organization. But, prior to now, we have not spoken much about the overall design behind our approach. Today, as we complete our design phase and introduce our first exemplar instance of it — Citizen Dan [1] — we are finally in a position to describe this overall approach.
We term our approach the open semantic framework, or OSF. The open semantic framework is a combination of a layered architecture and modular software. It represents the software component of the four-component total open solution, recently described in a three-part series. I return to this topic in the conclusion of this post.
Revisiting Design Objectives
Over the past nine months, I have been focusing my writing largely on the semantic enterprise, with more specificity regarding our Open SEAS (Semantic Enterprise Adoption and Solutions) initiative. In bits and pieces, these writings have tended to reflect a number of objectives:
- Leverage existing information assets (data + structure) as much as possible
- Develop incrementally, and validate and justify as you go
- Emphasize, where possible, open standards and open software
- Employ Web-oriented architectures
- Adopt an open-world approach that acknowledges that information is most often incomplete; the approach is a key enabler for incremental deployments
- Use URIs as object identifiers, and use linked data where practical
- Embrace any data format found in the wild, but use RDF as the ultimate integration data model
- Design architectures and APIs that avoid “lock-in” and support multiple tools options across the stack
- Provide systems and capabilities that put all information sources — text, media, semi-structured and conventional databases — on an equal footing
- Promote designs that bring the ability to create useful results into the hands of users and decisionmakers; relegate IT to a support role.
To date, the result of these design objectives is perhaps best captured in my Seven Pillars of the Open Semantic Enterprise posting, as well as our general discussions regarding adaptive ontologies. Yet, still, these writings have been somewhat piecemeal. What this document attempts to do is to place all of these perspectives into a single, coherent whole.
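The open-world objective in the list above is easier to see with a small example. The sketch below contrasts a closed-world and an open-world answer to the same question; the triples, the `NEGATED` set and the function names are invented for illustration and do not come from any OSF component.

```python
# Known assertions as (subject, predicate, object) triples.
FACTS = {
    ("acme", "supplies", "widgets"),
    ("acme", "locatedIn", "Iowa"),
}

# Assertions explicitly known to be false.
NEGATED = {
    ("acme", "supplies", "gadgets"),
}


def ask_closed_world(triple):
    """Closed world: anything not asserted is taken to be false."""
    return triple in FACTS


def ask_open_world(triple):
    """Open world: True, False, or None when the answer is simply not known."""
    if triple in FACTS:
        return True
    if triple in NEGATED:
        return False
    return None


query = ("acme", "supplies", "sprockets")
print(ask_closed_world(query))  # False
print(ask_open_world(query))    # None
```

Because a missing assertion is treated as unknown instead of false, new facts can be added later without invalidating earlier answers, which is what makes incremental deployment workable.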
The Incremental Layers of the Open Semantic Framework
Structured Dynamics has been a strong advocate for layered architectures, with clear APIs between layers as appropriate. But these layers are not “laminates” that completely cover the layer below, nor are they all necessary in every deployment. Depending on the circumstance, some layers may be superfluous, and layers may be added incrementally.
In this manner, then, the open semantic framework is perhaps more akin to a pearl than to a laminate or cocoon. Each subsequent layer does not “embed” the layer prior to it, and some layers actually may inter-operate with multiple layers below or above them (this is notably true for the “ontologies” layer, which has interactions up and down the stack).
Nonetheless, we can envision this pearl of the open semantic framework and its layers as follows:
Others have termed this the “semantic muffin” or even “semantic muppet” or “semantic blob”. Whatever (hehe). The real idea is that layers may accrete (as in the growth of a pearl) and occur over time and be uneven. Each layer, though, does have a role to play (though it may not be needed in a given deployment), and does act to augment existing information assets in the transition to a semantic framework. Beginning at the core, each of these layers — with external references as appropriate for more details — is described below.
Existing Assets Layer
The open semantic framework is premised on leveraging existing information assets. Sure, once the framework is in place, new information can be brought into it in a more direct, semantic manner. But, the real thrust and benefit of this framework is to provide an incremental pathway for finally inter-operating and federating prior decades of data, structure and information assets.
These information assets may reside inside or outside the enterprise. They may (and DO!) exist in many formats and are described by many schema. They may come from internal transaction systems or warehouses, or may exist externally on the Web or at supplier or partner sites. These information assets may span from conventional databases and relational data systems to XML interchange standards, Web pages and standard internal text or documents. In short, there is NO information asset that is not amenable to inclusion in this framework.
The Information Transformation (scones/irON) Layer
The information transformation layer provides either: 1) extraction of concepts and entities as structured metadata from source text or documents; or 2) conversion of existing data assets to interoperable form. As implemented by Structured Dynamics, the extractions are conducted by either scones (Subject Concept or Named EntitieS) or third-party utilities, and the conversions occur via irON (instance record Object Notation) or third-party “RDFizers”.
Depending on the source, the net result of the transformation is to produce interoperable data and information that can be ingested and used by other layers in the framework.
Though not strictly analogous, this layer bears some resemblance to the ETL (extract, transform, load) utilities used in many enterprise information integration applications. Unlike those conventional systems, this information transformation layer also may capture and represent some of the source schema.
In all cases, however, these transformations are relatively simple and get parsed against the available structure (the ontologies, schema and entity reference lists) in the system to generate the semantic metadata (tags).
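As a rough illustration of parsing text against an entity reference list to produce tags, consider the naive sketch below. It is not how scones works internally; the entity list, the example.org identifiers and the `tag_text` function are assumptions made purely for the example.

```python
import re

# Hypothetical entity reference list: surface form -> identifier.
ENTITY_LIST = {
    "iowa city": "http://example.org/entity/iowa-city",
    "johnson county": "http://example.org/entity/johnson-county",
}


def tag_text(text, entity_list=ENTITY_LIST):
    """Return sorted (surface form, identifier) tags found in `text`."""
    lowered = text.lower()
    tags = []
    for label, uri in entity_list.items():
        # Whole-word, case-insensitive match of the label.
        if re.search(r"\b" + re.escape(label) + r"\b", lowered):
            tags.append((label, uri))
    return sorted(tags)


for tag in tag_text("The Iowa City council met with Johnson County staff."):
    print(tag)
```

The point of the sketch is only that the tags come from structure already in the system (here, a lookup table) and not from the text alone.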
At this point, the extracted structure is generally at the level of instance records, or the ABox, with simple assertions of attribute-value pairs for specific records [2]. Little schema transformation or mapping occurs at this layer (if such is needed, that occurs at the structWSF layer; see next). Actual federation or interoperation occurs at later layers based on the TBox structures [2].
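To make the idea of instance-level (ABox) assertions concrete, here is a minimal sketch that turns one flat source record into attribute-value assertions about a single subject. It does not use irON or any RDF library; the `record_to_triples` function, the `attr/` path and the example.org base URI are hypothetical.

```python
def record_to_triples(record, base_uri, id_field="id"):
    """Turn one flat record (a dict) into (subject, predicate, object) triples."""
    subject = base_uri + str(record[id_field])
    triples = []
    for attribute, value in record.items():
        if attribute == id_field or value in (None, ""):
            continue  # the id becomes the subject; empty values assert nothing
        triples.append((subject, base_uri + "attr/" + attribute, value))
    return triples


row = {"id": "n42", "name": "Northside", "population": 5120, "mayor": ""}
for triple in record_to_triples(row, "http://example.org/"):
    print(triple)
```

Note that no schema mapping happens here: each attribute is carried over as-is, which mirrors the statement that little schema transformation occurs at this layer.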
This modular portion of the framework is explicitly designed with APIs to allow third-party tools to be plugged in and substituted.
The structWSF Layer
The major workhorse of the open semantic framework is the structWSF (Web services framework) layer. structWSF is the most complicated of the OSF layers and has many supporting software packages and capabilities. The structWSF layer provides the standard, common interface (“canonical”) layer by which existing information assets get represented and presented to the outside world and to other layers in the OSF stack.
structWSF is a platform-independent Web services framework for accessing and exposing structured RDF data. Its central organizing perspective is that of the dataset. These datasets contain instance records, with the structural relationships amongst the data and their attributes and concepts defined via ontologies (schema with accompanying vocabularies; see below).
The structWSF middleware framework is generally RESTful in design and is based on HTTP and Web protocols and open standards. The current structWSF framework comes packaged with a baseline set of about twenty Web services covering CRUD, browse, search, import and export. All Web services are exposed via APIs and SPARQL endpoints. Each request to an individual Web service returns an HTTP status and optionally a document of resultsets. Each results document can be serialized in many ways, and may be expressed as either RDF or pure XML. An internal representation, structXML [3], is used for internal communications across all structWSF Web services and with other layers.
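A request to such a RESTful SPARQL endpoint might be assembled along the following lines. This sketch only builds the request and does not send it; the endpoint address and the `query` and `dataset` parameter names are assumptions for illustration, so check the structWSF documentation for the actual interface.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint address; a real structWSF instance defines its own.
ENDPOINT = "http://example.org/ws/sparql/"


def build_sparql_request(query, dataset, accept="application/rdf+xml"):
    """Build (but do not send) an HTTP GET request carrying a SPARQL query."""
    params = urlencode({"query": query, "dataset": dataset})
    # The Accept header asks for a particular serialization of the results.
    return Request(ENDPOINT + "?" + params, headers={"Accept": accept})


request = build_sparql_request(
    "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10",
    "http://example.org/datasets/demo/",
)
print(request.get_method(), request.full_url)
```

Passing the built request to `urllib.request.urlopen` would perform the call and expose the HTTP status and the results document.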
structWSF has a central service that governs access rights and permissions. These rights occur at the level of the dataset, which gives immense flexibility to how data may be accessed, read, modified, created or deleted (or not). Datasets within a given structWSF instance may be accessed directly via API or via SPARQL queries to the instance’s endpoint. Depending on rights and query, results sets may be returned from a given structWSF instance in an infinite variety of ways.
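Dataset-level rights can be pictured as a lookup keyed by requester and dataset. The sketch below is an assumption-laden illustration and not the structWSF access service: the `ACCESS` table, the use of an address to identify the requester and the `is_allowed` function are all hypothetical.

```python
# Hypothetical access records: (requester, dataset) -> permitted CRUD operations.
ACCESS = {
    ("192.0.2.10", "http://example.org/datasets/budget/"): {"read"},
    ("192.0.2.20", "http://example.org/datasets/budget/"): {
        "create", "read", "update", "delete",
    },
}


def is_allowed(requester, dataset, operation):
    """True only if `operation` was explicitly granted on `dataset`."""
    return operation in ACCESS.get((requester, dataset), set())


print(is_allowed("192.0.2.10", "http://example.org/datasets/budget/", "read"))
print(is_allowed("192.0.2.10", "http://example.org/datasets/budget/", "delete"))
```

Because the key includes the dataset, the same requester can hold different rights on different datasets within one instance.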
This latter capability is the essential interface for subsequent layers in the open semantic framework stack. Depending on those subsequent components, pre-staged data and results sets may be returned for an essentially limitless variety of purposes.
Each structWSF instance also has a unique Web address that enables one or a multitude of instances to communicate and share with one another. This simple, but elegant, method enables structWSF instances to participate or not in potentially global or restricted local networks and collaboration environments. This is currently the largest untapped potential of structWSF with respect to its existing deployments.
The Semantic Components Layer
The newest layer in the stack is the semantic components layer. This layer takes results sets (most often generated by a specific query or data slice request) from one or more structWSF instances and then presents that information via a variety of data visualization or data presentation widgets (what we specifically call ‘semantic components’ due to their design [4]). The operation and sensitivity of these display components are themselves driven by a presentation and data analysis (including statistics) ontology.
Current display widgets include: filter; tabular templates (similar to infoboxes); maps; bar, pie or linear charts; relationship (concept) browser; story and text annotator and viewer; workbench for creating structured views; and dashboard for presenting pre-defined views and component arrangements. These are generic tools that respond to the structures and data fed to them, adaptable without modification to any domain.
As presently implemented by Structured Dynamics, this layer consists either of Flex data visualization components or structured data display templates based on Smarty. The inherent design allows for updates to other bases (such as HTML5). The layer may also be swapped out or substituted with third-party capabilities.
The strength and power of this system is governed by its own ontology, the Semantic Component Ontology (SCO) (see next).
This is an extremely flexible layer in the open semantic framework stack. Expect an ongoing series of explanatory blog posts and online resources in the upcoming weeks to explain this innovative capability.
The Ontologies Layer
The ontologies layer actually refers to all structured assets driving the system. As such, this layer might be considered the “brain” (though rather simply specified!) of the open semantic framework.
At a true schema or TBox level [2], the ontologies layer represents the concepts and relationships of the domain at hand. This layer also hosts the specific local entities and prominent things (people, places, events, etc.) useful for extracting local and domain-specific relevance. However, those views are also supplemented with some administrative ontologies (two examples are SCO and irON) that guide how the user interfaces or widgets in the system should behave.
The concept level represents the “world view” of the specific instantiation of the open semantic framework at hand. This conceptual (TBox) view provides the structural organization of information, inferencing capabilities, and navigation, faceting and explorer structure. The entity (ABox) view provides tagging for prominent individuals and instances important to the domain at hand, and guides the structure behind data visualizations of attribute or indicator data.
The administrative level uses simple roles and relationships for attributes and indicators to inform the framework as to how and with what widget to display information. For example, a “type” of information that is geographically related can be instructed to use the map component as an option for display. Whether some information is used for totals, comparison purposes, or other specifications useful to data visualization and graphing may also be specified.
The language and relationships (predicates or properties) of these administrative ontologies are simple and straightforward. It is, for example, relatively easy to define data display functions at the broad dataset and attributes level. Simple determinations drive how results sets and their associated results types may be displayed, no matter what datasets or slices may be generated as a result of the queries or requests fed to the system.
The structure in these layers can be replaced by other structures for other instantiations and circumstances. Indeed, all other layers in the open semantic framework can remain relatively fixed while tailoring the instance to new domains solely via this layer. The ontologies layer is what gives any given instantiation of OSF — such as Citizen Dan — its unique focus and scope.
The Content Management System (conStruct) Layer
The thinnest layer (that is, least substantial with respect to this framework) is the content management system (CMS) layer. In its current form, the open semantic framework uses the Drupal CMS via our conStruct plug-in modules. The design of the framework, however, has explicitly accommodated the possibility that other CMSs may substitute for this role.
The CMS layer is optional if structWSF endpoints are sufficient or if simple Web pages hosting semantic components are deemed as adequate. Very small organizations or deployments may reasonably choose to have no CMS layer at all.
However, for most sites or portals with more than a few active users, it is desirable to have broad flexibility in theming (“skinning”), user rights and permissions, or other functionality. These are the roles of the CMS layer. Drupal, for example, is presently supported by more than 4500 third-party modules covering every conceivable function, from polling to blogs and rating systems and bulletin boards.
For such generalized portals or collaboration environments, it makes sense to adopt and install a flexible CMS system, such as Drupal. Much of the user experience and functional environment can be provided through such means.
The open semantic framework is thus designed to reside easily in a CMS while also providing the hooks to take advantage of the generalized user rights and functionality of the CMS. In this manner, the open semantic framework is able to stay focused on its structured data and interoperability purposes, while still gaining the advantages of rich-featured content management systems.
The OSF is a Web-oriented Architecture
With its inherent open-world orientation [5] and distributed and collaborative potential, the open semantic framework was designed from the outset to be Web-capable and Web-oriented:
[Figure: the open semantic framework as a Web-oriented architecture]
A Web-oriented architecture (WOA) has a number of understood requirements, to which the open semantic framework adheres. Specifically, these design considerations support the framework as being part of WOA:
- Data and objects are all identified with Web addresses (URIs)
- Data is generally exposed (and universally available) as linked data
- SPARQL endpoints and APIs are generally RESTful in design
- The overall architecture is modular, with inherent decentralized and distributed aspects
- All display and visualization aspects are cross-browser ready and capable.
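To make the RESTful point concrete, the following Python sketch builds a plain HTTP GET address for a SPARQL query, using the `query` parameter defined by the SPARQL protocol. The endpoint address `http://example.org/sparql` and the helper name `sparql_get_url` are placeholders for illustration, not actual OSF or structWSF addresses.

```python
from urllib.parse import urlencode


def sparql_get_url(endpoint, query):
    """Build a GET URL that carries a SPARQL query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query})


# Placeholder endpoint; a real deployment would publish its own address.
EXAMPLE_ENDPOINT = "http://example.org/sparql"
EXAMPLE_QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5"

example_url = sparql_get_url(EXAMPLE_ENDPOINT, EXAMPLE_QUERY)
```

The request is itself just a Web address, so it can be bookmarked, linked and cached like any other resource identified by a URI.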
Citizen Dan is our first exemplar instance of this open semantic framework. The details page for the project goes into some of Citizen Dan’s functionality and capabilities.
Citizen Dan is specifically geared to local governments and localities, with an emphasis on community indicator systems (CIS). CIS have become a popular way of measuring and tracking measures of local economic and social well-being; they are closely related to sustainability and how to measure it as used in many economic and environmental domains.
However, in the context of this post, what is really interesting about Citizen Dan is that its semantic framework is a completely open and generic one. The same set of tools and capabilities described on its details page can be applied to any domain that needs to manage and understand its own information. That information can range from unstructured text or documents to conventional structured databases.
What changes from domain to domain are the data structures (the ontologies, schema and entity reference lists; see above) that are fed to this open semantic framework. By swapping in new structures, what is called Citizen Dan in one instance can morph to become Curriculum Carla in, say, the education instance or Doctor Doolittle in the veterinary science instance [6].
We can illustrate these multiple instances as follows:
[Figure: multiple domain instances of the open semantic framework]
What this figure illustrates is that even a branded expression of the framework, such as Citizen Dan, is merely an instance of that framework. And, actually, when it is expressed in such a packaged manner, we can more accurately call the standard and bundled suite of generic functions and accompanying structure of Citizen Dan an instantiation of the open semantic framework:
in·stan·ti·ate \in-ˈstan(t)-shē-āt\ (transitive verb) is to:
- (transitive) to represent an abstract concept by a concrete instance
- (transitive, object-oriented computing) to create an object (an instance) of a specific class
in·stan·ti·a·tion \in-ˈstan(t)-shē-ā-shən\ (noun) [7]
By replacing the structure bases, and by tailoring the function suite appropriate to a given market and use, we can create many instantiations of the open semantic framework for different domains and markets. In this manner, Citizen Dan can be seen as an early exemplar of the framework, but not as a definer and limiter to it.
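The object-oriented sense of the definition maps quite directly onto this idea. The Python sketch below is only an analogy under stated assumptions: the class names, the component list and the `EDU-placeholder` ontology label are invented for illustration, while MUNI is the ontology name associated with Citizen Dan. It shows one generic framework yielding different instances purely by swapping the structure and trimming the function suite.

```python
from dataclasses import dataclass

# Hypothetical generic function suite shared by every instantiation.
GENERIC_COMPONENTS = ("map", "chart", "table")


@dataclass(frozen=True)
class DomainStructure:
    """The swappable part: a branded name, an ontology, and any dropped components."""
    name: str
    ontology: str
    dropped_components: tuple = ()


@dataclass(frozen=True)
class FrameworkInstance:
    """One instantiation: the generic framework plus one domain structure."""
    structure: DomainStructure

    def components(self):
        # Keep the generic suite minus whatever this domain chose to drop.
        dropped = self.structure.dropped_components
        return tuple(c for c in GENERIC_COMPONENTS if c not in dropped)

    def describe(self):
        parts = ", ".join(self.components())
        return f"{self.structure.name}: {self.structure.ontology} structure with {parts}"


citizen_dan = FrameworkInstance(DomainStructure("Citizen Dan", "MUNI"))
curriculum_carla = FrameworkInstance(
    DomainStructure("Curriculum Carla", "EDU-placeholder", dropped_components=("map",))
)
```

Nothing in `FrameworkInstance` changes between the two instances; only the structure handed to it differs.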
OSF is the Software Leg to a ‘Total Open Solution’
So far, this discussion has focused solely on considerations of software and architecture. While we see the open semantic framework as powerful and highly useful in itself, software alone is inadequate to achieve acceptance and success in the enterprise (as we noted in our most recent posts). The very forces that are compelling enterprises to look at new options are also the ones that pose difficult hurdle rates for acceptance of open source.
To address this issue, we have developed a four-legged foundation to what we termed the total open solution. The solution involves software, structure, documentation and methods (or best practices). Each of these connects and relates to the others.
The open semantic framework is clearly the software (and architecture) leg to this foundation. Again, however, what is interesting is that the mere swapping out of the structure can also make the system relatively ready for other domains.
We see these relationships in the following diagram, which also shows that the DocWiki portions of the solution embody the documentation (aside from code-level comments) and methods legs of the foundation:
[Figure: the four legs of the total open solution, with the DocWiki covering documentation and methods]
Differences between domains may also lead to differences as to which components are included or not in that domain’s desired instantiation.
The hugely important point implied by the diagram above, however, is how nearly universal the content and methods in the DocWiki may be to other domains. Because the deltas between domains largely result from structure and from which specific functional components are included or not, it becomes clear that most documentation and practices shared with the DocWiki will be applicable across domains. Sure, the use cases and some of the specific terminology may change, but we can also now see a high degree of re-usability of documentation and knowledge base across markets. This realization makes the usefulness and leverage of the DocWiki even higher.
A Common Language and Framework for Moving Forward
Developing “common language” by which to describe and convey things (especially new things like semantics that also have strong technical aspects) is tough, very tough. We are only now beginning on this process; we look to many in the community and elsewhere to help define informative and evocative terminology.
Per the original design objectives above, Structured Dynamics has approached the challenge of the semantic enterprise in what we think is both a pragmatic and a new way. The insistence on preserving and respecting existing information assets, matched with the opportunities and different mindsets arising from an open-world approach [5], has necessitated thinking through new designs and developing new concepts. Any time such new thinking and concepts occur, new language and new metaphors must accompany them.
While certainly there are components and various software packages that populate and comprise an open semantic framework, the framework is also just as importantly a world view or way to think about information, information development, and its architecture. For example, a pivotal concept is that an open semantic framework is built around generic tools responsive to the information structures fed to them. This realization shifts the locus of emphasis from software development per se to creating, managing and adapting data and information structures. While this democratizes the information development process and is more inclusive of all knowledge workers, it also imposes needs for new toolsets and business processes. We are only at the nascent stages of understanding and learning about these differences.
Similarly, a development approach that is inherently incremental and leverages (rather than replaces or displaces) existing information assets means IT projects need to be considered in a new light. Small projects with more emphasis on tangible and demonstrable benefits will alter budgets, lower risks, and place a need for quicker turnaround. Like the architecture of the open semantic framework itself, projects based on OSF are also more distributed, decentralized and modular.
With such decentralization also comes the need for mechanisms and systems to overcome vendor “lock-in” and proprietary systems. A key thrust in support of what we have called the total open solution and its mixture of documentation and methods to accompany software and structure is specifically targeted at this issue. Tools and means for collaboration and concurrent contributions are another possible answer. Prior software practices in agile development and version control will see extensions to all manner of information development across the enterprise.
We are proud of our design work and proof-testing with clients over the past 18 months. We believe the open semantic framework and its implications to be a fundamental shift in how organizations need to think about their information development, existing information assets, and IT budgets and processes. We know widescale adoption is not yet at hand — enterprises are justifiably conservative when it comes to new thinking. But, given global competition and tight pocketbooks, the open semantic framework is a formulation to which enterprises and governments should pay very close attention.
[1] Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.
[2] Structured Dynamics’ best practices approach makes explicit splits between the “ABox” (for instance data) and “TBox” (for ontology schema) in accordance with our working definition for description logics, a fundamental underpinning for how we use RDF: “Description logics and their semantics traditionally split concepts and their relationships from the different treatment of instances and their attributes and roles, expressed as fact assertions. The concept split is known as the TBox (for terminological knowledge, the basis for T in TBox) and represents the schema or taxonomy of the domain at hand. The TBox is the structural and intensional component of conceptual relationships. The second split of instances is known as the ABox (for assertions, the basis for A in ABox) and describes the attributes of instances (and individuals), the roles between instances, and other assertions about instances regarding their class membership with the TBox concepts.”
[3] A subsequent post will document this rather straightforward XML schema.
[4] Contact Structured Dynamics for an early sneak peek. The Citizen Dan application will be publicly released as an online sandbox and demo by the end of summer 2010.
[5] See M. K. Bergman, 2009. The Open World Assumption: Elephant in the Room, December 21, 2009. The open world assumption (OWA) generally asserts that the lack of a given assertion or fact being available does not imply whether that possible assertion is true or false: it simply is not known. In other words, lack of knowledge does not imply falsity. Another way to say it is that everything is permitted until it is prohibited. OWA lends itself to incremental and incomplete approaches to various modeling problems.
[6] Of course, things are not always so simple as this. The CMS layer gives the open semantic framework the ready ability to change themes and layouts (“skins”), not to mention the breadth and specifics of what ancillary site functionality might be provided. Moreover, the module basis of the open semantic framework also means that entire clusters of functionality might be dropped from a given instantiation (or added to it!) without violating or negating this framework.
[7] Dictionary references are from Merriam-Webster and Wiktionary.
Brown Bag Lunch: Structure Paves the Way to the Semantic Web
AI^3 - Fri, 06/11/2010 - 06:55
How Shall We Measure Progress Over the Past Three Years?
For a dozen years, my career has been centered on Internet search, dynamic content and the deep Web. For the past few years, I have been somewhat obsessed by two topics.
The first topic, a conviction really, is that implicit structure needs to be extracted from Web content to enable it to be disambiguated, organized, shared and re-purposed. The second topic, more an open question as a former academic married to a professor, is what might replace editorial selections and peer review to establish the authoritativeness of content. These topics naturally steer one to the semantic Web.
A Millennial Perspective
The semantic Web, by whatever name it comes to be called, is an inevitability. History tells us that as information content grows, so do the mechanisms for organizing and managing it. Over human history, innovations such as writing systems, alphabetization, pagination, tables of contents, indexes, concordances, reference look-ups, classification systems, tables, figures, and statistics have emerged in parallel with content growth [19].
When the Lycos search engine, one of the first profitable Internet ventures, was publicly released in 1994, it indexed a mere 54,000 pages [1]. When Google wowed us with its page-ranking algorithm in 1998, it soon replaced my then favorite search engine, AltaVista. Now, tens of billions of indexed documents later, I often find Google’s results to be overwhelming dross — unfortunately true again for all of the major search engines. Faceted browsing, vertical search, and Web 2.0’s tagging and folksonomies demonstrate humanity’s natural penchant to fight this entropy, efforts that will next continue with the semantic Web and then mechanisms unforeseen to manage the chaos of burgeoning content.
An awful lot of hot air has been expelled over the false dichotomy of whether the semantic Web will fail or is on the verge of nirvana. Arguments extend from the epistemological versus ontological (classically defined) to Web 3.0 versus SemWeb or Web services (WS*) versus REST (Representational State Transfer). My RSS feed reader points to at least one such dust up every week.
Some set the difficulties of resolving semantic heterogeneities as absolutes, leading to an illogical and false rejection of semantic Web objectives. In contrast, some advocates set equally divisive arguments for semantic Web purity by insisting on formal ontologies and description logics. Meanwhile, studied leaks about “stealth” semantic Web ventures mean you should grab your wallet while simultaneously shaking your head.
A Decades-Long Perspective
My mental image of the semantic Web is a road from here to some achievable destination, say, Detroit. Parts of the road are well paved; indeed, portions are already superhighways with controlled on-ramps and off-ramps. Other portions are two lanes, some with way too many traffic lights and some with dangerous intersections. A few small portions remain unpaved gravel and rough going.
Wreck in Nebraska during the 1919 Transcontinental Motor Convoy
A lack of perspective makes things appear either too close or too far away. The automobile isn’t yet a century old as a mass-produced item. It wasn’t until 1919 that the US Army Transcontinental Motor Convoy made the first automobile trip across the United States.
The 3,200 mile route roughly followed today’s Lincoln Highway, US 30, from Washington, D.C. to San Francisco. The convoy took 62 days and 250 recorded accidents to complete the trip (see figure), half on dirt roads at an average speed of 6 miles per hour. A tank officer on that trip later observed Germany’s autobahns during World War II. When he subsequently became President Dwight D. Eisenhower, he proposed and then signed the Interstate Highway Act.
That was 50 years ago. Today, the US is crisscrossed with 50,000 miles of interstates, which have completely remade the nation’s economy and culture [2].
Today’s Perspective
Like the interstate system in its early years, today’s semantic Web lets you link together a complete trip, but the going isn’t as smooth or as fast as it could be. Nevertheless, making the trip is doable and keeps improving day by day, month by month.
My view of what’s required to smooth the road begins with extracting structure and meaningful information according to understandable schema from mostly uncharacterized content. Then we store the now-structured content as RDF triples that can be further managed and manipulated at scale. By necessity, the journey embraces tools and requirements that, individually, might not constitute semantic Web technology as some strictly define it. These tools and requirements are nonetheless integral to reaching the destination. We are well into that journey’s first leg, what I and others are calling the structured Web.
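A small sketch may help show what "storing the now-structured content as RDF triples" amounts to. The Python below uses plain tuples rather than a real triple store or RDF library, and the URIs, field names and helper functions are all made up for illustration: extracted fields become (subject, predicate, object) statements, which can then be queried by pattern.

```python
def record_to_triples(subject_uri, record, vocab_base):
    """Turn a flat dict of extracted fields into (subject, predicate, object) triples."""
    return [
        (subject_uri, vocab_base + key, value)
        for key, value in sorted(record.items())
    ]


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


# Hypothetical structure extracted from an uncharacterized Web page.
extracted = {"title": "Lincoln Highway", "length_miles": 3200}
triples = record_to_triples(
    "http://example.org/road/lincoln-highway",
    extracted,
    "http://example.org/vocab#",
)
```

Once content is in this uniform three-part form, records from quite different sources can sit in the same store and be managed with the same tools, which is what makes the canonical form useful at scale.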
For the past six months or so I have been researching and assembling as many semantic Web and related tools as I can find [3]. That Sweet Tools listing now exceeds 500 tools [4] (with its presentation using the nifty lightweight Exhibit publication system from MIT’s Simile program [5]). I’ve come to understand the importance of many ancillary tool sets to the entire semantic Web highway, such as natural language processing and information extraction. I’ve also found new categories of pragmatic tools that embody semantic Web and data mediation processes but don’t label themselves as such.
In its entirety, the Sweet Tools listing provides a pretty good picture of the semantic Web’s state. It’s a surprisingly robust picture — though with some notable potholes — and includes impressive open source options in all categories. Content publishing, indexing, and retrieval at massive scales are largely solved problems. We also have the infrastructure, languages, and (yes!) standards for tying this content together meaningfully at the data and object levels.
I also think a degree of consensus has emerged on RDF as the canonical data model for semantic information. RDF triple stores are rapidly improving toward industrial strength, and RESTful designs enable massive scalability, as terabyte- and petabyte-scale full-text indexes prove.
Powerful and flexible middleware options, such as those from OpenLink [6], can transform and integrate diverse file formats with a variety of back ends. The World Wide Web Consortium’s GRDDL standard [7] and related tools, plus various “RDF-izers” from Massachusetts Institute of Technology and elsewhere [8], largely provide the conversion infrastructure for getting Web data into that canonical RDF form. Sure, some of these converters are still research-grade, but getting them to operational capabilities at scale now appears trivial.
Things start getting shakier when trying to structure information into a semantic formalism. Controlled vocabularies and ontologies range broadly and remain a contentious area. Publishers and authors perhaps have too many choices: from straight Atom or RSS feeds and feeds with tags to informal folksonomies and then Outline Processor Markup Language [9] or microformats [10]. From there, the formalism increases further to include the standard RDF ontologies such as SIOC (Semantically-Interlinked Online Communities), SKOS (Simple Knowledge Organization System), DOAP (Description of a Project), and FOAF (Friend of a Friend) [11] and the still greater formalism of OWL’s various dialects [12].
Arguing which of these is the theoretical best method is doomed to failure, except possibly in a bounded enterprise environment. We live in the real world, where multiple options will always have their advocates and their applications.
If we compare the semantic Web to the US interstate highway system, we’re still in the early stages of a journey that will remake our economy and culture. Many potholes on the road to the semantic Web exist. One ready task is to transform existing structure to RDF. Another priority is to refine tools to extract structure and meaningful information from uncharacterized content.
All of us should welcome whatever structure we can add to our information base, no matter where it comes from or how it’s done. The sooner we can embrace content in any of these formats and convert it into canonical RDF form, the sooner we can move on to needed developments in semantic mediation, some of the roughest road on the journey.
Potholes on the Semantic Highway
Semantic mediation requires appropriate structured content. Many potholes on the road to the semantic Web exist because the content lacks structured markup; others arise because existing structure requires transformation. We need improved ways to address both problems. We also need more intuitive means for applying schema to structure. Some have referred to these issues as “who pays the tax.”
Recent experience with social software and collaboration proves that a portion of the Internet user community is willing to tag and characterize content. Furthermore, we can readily leverage that resulting structure, and free riders are welcomed. The real pothole is the lack of easy — even fun — data extractors and “structurizers.” But we’re tantalizingly close.
Tools such as Solvent and Sifter from MIT’s Simile program [13] and Marmite from Carnegie Mellon University [14] are showing the way to match DOM (document object model) inspectors with automated structure extractors. DBpedia, the alpha version of Freebase, and System One now provide large-scale, open Web data sets in RDF [15], including all of Wikipedia. Browser extensions such as Zotero [16] are showing how to integrate structure management into acceptable user interfaces, as are services such as Zoominfo [17]. Yet we still lack easy means to design the differing structures suitable for a plenitude of destinations.
Amazingly, a compelling road map for how all these pieces could truly fit together is also incomplete. How do we actually get from here to Detroit? Within specific components, architectural understandings are sometimes OK (although documentation is usually awful for open source projects, as most of the current tools are). Until our community better documents that vision, attracting new contributors will be needlessly slower, thus delaying the benefits of network effects.
So, let’s create a road map and get on with paving the gaps and filling the potholes. It’s not a matter of standards or technology; we have those in abundance. Let’s stop the silly squabbles and commit to the journey in earnest. The structured Web’s ability to reach Hyperland [18], Douglas Adams’s prescient 1990 forecast of the semantic Web, now looks to be no further away than Detroit.
This Friday brown bag leftover was first placed into the AI3 refrigerator about three years ago on May 3, 2007. The piece was my answer to a request by Jim Hendler to pen some thoughts on the semantic Web, based on, I believe, what he thought might be a pragmatic perspective combining Internet business with Web science. The formal piece appeared as a guest editorial in the May/June 2007 issue of IEEE Intelligent Systems. What appears above is unaltered from my original posting (aside from some minor formatting clean-up and, sorry to say, the fact that some of the projects are now defunct).
[1] Chris Sherman, “Happy Birthday, Lycos!,” Search Engine Watch, August 14, 2002. See http://searchenginewatch.com/showPage.html?page=2160551.
[2] David A. Pfeiffer, “Ike’s Interstates at 50: Anniversary of the Highway System Recalls Eisenhower’s Role as Catalyst,” Prologue Magazine, National Archives, Summer 2006, Vol. 38, No. 2. See http://www.archives.gov/publications/prologue/2006/summer/interstates.html.
[3] The mention of specific tool names is meant to be illustrative and not necessarily a recommendation.
[4] Sweet Tools (SemWeb) listing; see http://www.mkbergman.com/new-version-sweet-tools-sem-web/.
[5] See http://simile.mit.edu/exhibit/.
[6] OpenLink Software’s Virtuoso and Data Spaces products; see http://www.openlinksw.com/.
[7] W3C’s Gleaning Resource Descriptions from Dialects of Languages (GRDDL, pronounced “griddle”). See http://www.w3.org/2004/01/rdxh/spec.
[8] See http://simile.mit.edu/wiki/RDFizers.
[9] Outline Processor Markup Language (OPML); see http://www.opml.org/.
[10] Microformats; see http://microformats.org/.
[11] DOAP (Description of a Project), FOAF (Friend of a Friend), SIOC (Semantically-Interlinked Online Communities) and SKOS (Simple Knowledge Organization System).
[12] W3C’s Web Ontology Language (OWL). See http://www.w3.org/TR/owl-features/.
[13] Solvent (http://simile.mit.edu/wiki/Solvent) and Sifter (http://simile.mit.edu/wiki/Sifter) are from MIT’s Simile program.
[14] Marmite (http://www.cs.cmu.edu/~jasonh/projects/marmite/) is from Carnegie Mellon University.
[15] DBpedia (http://dbpedia.org/docs/) and Freebase (in alpha, by invitation only at http://www.freebase.com/) are two of the first large-scale open datasets on the Web; Wikipedia has also been converted to RDF by System One (http://labs.systemone.at/wikipedia3).
[16] Zotero is produced by George Mason University’s Center for History and New Media; see http://www.zotero.org.
[17] ZoomInfo (http://www.zoominfo.com/) provides online structured search of companies and people, plus broader services to enterprises.
[18] The late Douglas Adams, of Doctor Who and The Hitchhiker’s Guide to the Galaxy fame, produced a TV program for BBC2 presaging the Internet called Hyperland. This 50-min video can be seen in five parts via YouTube at Part 1 of 5, 2 of 5, 3 of 5, 4 of 5 and 5 of 5.
[19] Since I first wrote this piece, I have systematized these developments in my Timeline of Information History.
Listening to the Enterprise: Total Open Solutions, Part 3
AI^3 - Tue, 06/01/2010 - 04:08
Introducing the Open Source ‘DocWiki’ System
In the first part to this series, we began with the argument that open source software alone was not sufficient to meet the required acceptance factors in the enterprise. As a guiding way to create the right mindset around these issues we shared the saying that we have adopted at Structured Dynamics that, “We’re successful when we are not needed.”
In the second part of this series we described the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we termed this a total open solution.
Now, in this third and concluding part to our series, we introduce the open source documentation and methodology system called ‘DocWiki’. It complements the base open source software, in the process completing the conditions for a total open solution.
Though we call this system ‘DocWiki’, it is not meant to be a brand or particular product description for what Structured Dynamics is offering. Rather, ‘DocWiki’ is merely a placeholder name for a generic, open source system and knowledge base that can be downloaded, installed, branded, modified and extended in whatever way the user sees fit. ‘DocWiki’ is a baseline documentation and methodology “starter kit” that can be dressed up in new clothes or packaged and named in whatever manner is best suited to a given deployment.
In describing the major components of this ‘DocWiki’ system we will again use our Citizen Dan initiative [1] as we did in Part 2. This gives us a real use case, though the same approach is applicable to any open source information management initiative by enterprises.
We call the specific version of the ‘DocWiki’ used in the case of Citizen Dan the ‘CIS DocWiki’ (for community indicator systems), specific to the domain and local government focus of Citizen Dan. Similarly, the structured vocabulary and ontology that guides the system is the MUNI ontology. For other information development initiatives, the specific content of these components would be swapped out for ones appropriate to that initiative.
Overview of the ‘DocWiki’ System
A number of desires and objectives intersected to guide the design of the ‘DocWiki’ system. We wanted:
- A consolidated knowledge base with complete, turnkey implementation content
- A collaborative document authoring system with authoring tools comfortable to most knowledge workers
- A version control system to enable rollbacks and restoration of prior official versions
- A system that would enable and facilitate the collection and import of relevant content; in our own case, that included widely distributed internal content in many forms and locations plus relevant external content (such as defined items in Wikipedia)
- A document management framework that would allow existing content to be mixed, combined and re-purposed for different uses, from training to marketing collateral
- A single source publishing system that would allow content to be published as paper documents, PDFs, Web pages and the like
- A system that could be easily themed, skinned and branded, tailored for any given deployment or circumstance, and
- A system built entirely from open source components and with content that had no restrictions on use or re-use.
In first formulating this design, our assumption was the major building blocks would be an open source document management system linked with some form of version control. Though we think such a formulation could work OK, our exposure to the MIKE2.0 methodology actually caused us to re-look at and re-think a wiki-based approach. Ultimately the trump card that decided the design for us was familiarity and ease-of-use.
The resulting architecture of the full ‘DocWiki’ system is shown below:
[Figure: architecture of the full ‘DocWiki’ system]
What is cool about this design is that a single software install with a few extensions (Mediawiki, the Wikipedia software, plus some standard extensions and judicious use of Semantic Mediawiki) and a single loadable database are all that is required to transfer and install the ‘DocWiki’ system.
To better describe this system, we will focus on three major interconnecting pieces in this architectural diagram: the knowledge base; the vocabulary and structure (ontology); and the authoring and publishing system (wiki).
The ‘DocWiki’ Knowledge Base
The pre-loaded content for the ‘DocWiki’ system comes from its knowledge base. This is provided as a text-exported MySQL database that can be modified en masse before loading (such as substituting ‘YourName’ for ‘DocWiki’). The exemplar upon which this knowledge base is modeled is the MIKE2.0 framework.
MIKE2.0 (Method for an Integrated Knowledge Environment) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.
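The en masse substitution mentioned above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function name `rebrand_dump` and the sample SQL line are hypothetical, and nothing here reflects the actual layout of the DocWiki export. It replaces whole-word occurrences of the placeholder name in the text of a database dump before that dump is loaded.

```python
import re


def rebrand_dump(sql_text, old_name, new_name):
    """Replace whole-word occurrences of old_name in a text SQL export."""
    pattern = re.compile(r"\b" + re.escape(old_name) + r"\b")
    # A function replacement keeps new_name literal (no backslash escapes).
    return pattern.sub(lambda _match: new_name, sql_text)


# Hypothetical line from a text-exported database.
sample = "INSERT INTO page VALUES ('DocWiki:Home','About DocWiki');"
rebranded = rebrand_dump(sample, "DocWiki", "YourName")
```

A blanket substitution like this can also touch identifiers or stored values that should stay unchanged, so it would be prudent to review the differences before loading the modified dump.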
MIKE2.0 has a generalized methodology and set of templates applicable to initiatives, the phases, activities and tasks to undertake them, and supporting assets. Supporting assets can range from glossaries and definition of terms and concepts to very specific technical documents or background material. The entire system is logical and applies a consistent design and organizational structure and categories.
For our purposes, we wanted a complete, turnkey content knowledge base. This meant that we needed to accommodate all forms of project management and guidance, ranging from specific “how-to” and technical discussions to the entire suite of background and supporting material. The scope of this knowledge content is defined as what a new person assigned a lead or implementation responsibility would need to read or master.
As a destination site MIKE2.0 is quite broad: it embraces the ability to model virtually any information management initiative. This makes MIKE2.0 an invaluable source of structure and methodology guidance, but also results in it being quite limited in the specific how-tos associated with any given initiative. I have earlier spoken about the structure of MIKE2.0 and in particular its applicability to the semantic enterprise.
The strength of MIKE2.0, however, is that its structure can be grabbed and quickly applied to form an organizational and structural basis for filling out the knowledge base for any specific information development initiative. And, that is exactly what we did with the ‘CIS DocWiki.’
MIKE2.0 hosts and maintains its project-related structure in Mediawiki (with some extensions). Combined with its templates, this provides a rapid-start baseline for beginning to tailor and flesh out the specific details for a given information management initiative. Thus, after copying broad aspects of the MIKE2.0 system into the incipient ‘DocWiki’, it was relatively straightforward to let the existing structure and templates of MIKE2.0 guide next steps.
As of today’s date, the ‘CIS DocWiki’ contains about 300 substantive articles, a complete activity and tasking structure, and various re-usable templates based on Semantic Mediawiki for structured and consistent access and retrieval. New tasks and structure can be readily added to the system. Existing structure or content can be deleted or marked as archive for non-display. We are still gathering all requisite content pieces, and anticipate by first public release that the baseline knowledge base will include 2x to 3x the scale of its current content.
For new ‘CIS DocWiki’ (or Citizen Dan-based) deployments, this means the knowledge base can be completely modified and extended for local circumstances. The set-up of the Mediawiki instance is separate from the loading or modification of the knowledge base, which means the look-and-feel of the entire system, not to mention user rights and permissions, can also be readily tailored for local requirements.
The core content of the ‘CIS DocWiki’ and its basis in a set structure and methodology (derived from MIKE2.0) means that the knowledge base is also adaptable for other broader information development areas, especially in the semantic enterprise or semantic government arenas. Thus, while Structured Dynamics is first releasing the ‘CIS DocWiki’ in the context of Citizen Dan and semantic government, we also are developing a parallel instance for the Open SEAS approach to the semantic enterprise.
The approach taken here is somewhat different than the standard wiki use. As experts, we are basically sole authoring (with contributions from selected collaborators and our clients) the starting basis of the knowledge base. Unlike many wikis, this enables us to be quite consistent in content, style, and organization. Such an approach allows us to present a coherent and complete starting content and methodology foundation. However, once delivered and installed for a given deployment, its users are then free to extend and change this knowledge foundation in the standard wiki manner. Whether those subsequent extensions are free-form or more tightly controlled and managed is the choice of the new deployment’s administrators.
The Supporting MUNI Structure
Strictly speaking, the vocabularies and structures (including, of course, ontologies) that drive our semantic government or semantic enterprise offerings are also part of the knowledge base. And, in fact, many of these aspects, especially related to the actual operating of the instances, are included as part of the standard knowledge base.
However, the applicable domain ontology itself is separately maintained. Descriptions of how to use and modify such ontologies are part of the general ‘DocWiki’ knowledge base, but the ontology is not. This arm’s length-separation is done to acknowledge that the ontology has independent use and value apart from the knowledge base or the software (Citizen Dan, in this case) that is the focus of it.
In the Citizen Dan instance, this structure is the MUNI ontology. MUNI is a general local government domain ontology that can find use in a broad array of circumstances, with or without Citizen Dan. Thus, like other ontologies developed and maintained by Structured Dynamics, such as BIBO (the Bibliographic Ontology), the ontology itself and its documentation, discussion forums and use cases are maintained separately.
The first release of MUNI is still under development and will be released this summer.
The Wiki/Publication Portion of ‘DocWiki’
The software framework that hosts and manages all of this content is the Mediawiki software, originally developed for Wikipedia. This framework is supported by a number of standard extensions packaged with the ‘DocWiki’ distribution. One of the more notable extensions is Semantic Mediawiki. Mediawiki also is the wiki framework underlying MIKE2.0, so content sharing between the systems is straightforward.
The Collaborative Wiki Portion
The first use of the ‘DocWiki’ is to add new content to the knowledge base and to modify or extend what is provided in the baseline. For straight authoring, ‘DocWiki’ offers the standard wikitext basis for content entry and editing, as well as the WikED enhanced editor and the FCKEditor WYSIWYG rich-text editor. Each of these may be turned on or off at will.
All of the baseline content is fully organized and categorized via a standard structure. Pre-existing templates aid in entering new content in specific areas consistently or in providing standard administrative ways of tagging content for completeness or need for editorial attention. Tasks and concepts, in particular, follow set ways of entry and description. These set templates, some forms-based and some derived from Semantic Mediawiki, are also tied into automatic internal scripts for listing and organizing various items. So long as new material is entered properly, it will be reflected in various stats and listings. Unlike sole reliance on Semantic Mediawiki, the ‘DocWiki’ approach is a mix of standard wiki categories and semantic types. Both are used for effective organization of the knowledge base.
Besides the knowledge base of domain content and “how-to”, the system also comes pre-packaged with many wiki “how-to” and best practices guidance for using the system effectively and consistently. Of course, a given deployment may or may not enforce all of these practices. A poorly administered instance, for example, could degenerate fairly quickly and lose the native structure and organization of the baseline system.
As with standard wikis, there is a history of prior page revisions that gives the system rollback and version control. Mediawiki also has a pretty good user access and permissions framework, covering access, reading, editing and uploads.
Besides the standard and required extensions, ‘DocWiki’ also comes packaged with the necessary settings and configuration files to operate “out-of-the-box” in its designed baseline mode. Of course, these settings, too, can be changed and modified by site administrators, and ‘DocWiki’ also includes guidance on how to do that.
The Publication Portion
A little known but highly useful part of the Mediawiki API allows direct export of XHTML content [2]. Then, with minor XSLT conversion templates, it is possible to strip out wiki-specific conventions (such as the editing of individual sections) or to create straight XML versions. When this is combined with the use of internal ‘DocWiki’ CSS style sheets that impose some clean and semantic style identifiers, a common canonical output basis for content is possible.
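To make the export-and-strip step concrete, here is a minimal Python sketch. It is not the XSLT approach described above: it swaps in a small parser-based filter from the Python standard library to show the same idea. The `render_url`, `EditLinkStripper` and `strip_edit_links` names are hypothetical, and the CSS class names used to spot section-edit links (`editsection`, `mw-editsection`) are assumptions that may differ between Mediawiki versions.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode


def render_url(index_php: str, title: str) -> str:
    """Build a URL that requests only the rendered page body (action=render)."""
    return index_php + "?" + urlencode({"title": title, "action": "render"})


class EditLinkStripper(HTMLParser):
    """Copy tags and text through, dropping <span> elements that mark
    section-edit links. Comments and declarations are not copied."""

    EDIT_CLASSES = {"editsection", "mw-editsection"}  # assumed class names

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.skip_depth = 0  # greater than 0 while inside a dropped span

    def handle_starttag(self, tag, attrs):
        if self.skip_depth:
            if tag == "span":
                self.skip_depth += 1  # track nested spans inside the dropped one
            return
        classes = (dict(attrs).get("class") or "").split()
        if tag == "span" and self.EDIT_CLASSES.intersection(classes):
            self.skip_depth = 1
            return
        self.out.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        if not self.skip_depth:
            self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if self.skip_depth:
            if tag == "span":
                self.skip_depth -= 1
            return
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self.skip_depth:
            self.out.append(data)

    def handle_entityref(self, name):
        if not self.skip_depth:
            self.out.append(f"&{name};")

    def handle_charref(self, name):
        if not self.skip_depth:
            self.out.append(f"&#{name};")


def strip_edit_links(xhtml: str) -> str:
    """Return the markup with section-edit link spans removed."""
    parser = EditLinkStripper()
    parser.feed(xhtml)
    parser.close()
    return "".join(parser.out)
```

In use, the text fetched from `render_url(...)` would be passed through `strip_edit_links` before any site-specific CSS is applied. The fetch itself is left out so that the sketch stays independent of any particular wiki installation.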
From that point, a given deployment may use its own CSS styles to theme output content. Output Web pages (XHTML) or XML files then can be processed using existing and accurate utilities to produce PDF or *.doc documents. Then, with systems such as OpenOffice, an even wider variety of document formats can be produced. These facilities mean that the ‘DocWiki’ can also act as a single-source publishing environment.
In its initial release, re-purposing ‘DocWiki’ content into other presentations (for example, combining sections from multiple pages into a new document as opposed to re-using existing pages as is) will require creating new wiki pages and then cutting-and-pasting the desired content. However, it should also be noted that both DocBook and DITA have been applied to Mediawiki installations [3]. It should be possible to enable a more flexible re-purposing framework for ‘DocWiki’ moving into the future.
When Available
The ‘CIS DocWiki’ is meant to accompany the first release of Citizen Dan, likely by the end of summer. The MUNI ontology will also be released roughly at the same time. At release, the ‘CIS DocWiki’ is anticipated to have on the order of 500-800 baseline content and “how to” articles.
Depending on time availability and other commitments, Structured Dynamics will also be using this information to build a semantic government composite offering to MIKE2.0. We will be contributing this new offering for free, similar to what we have done earlier for a semantic enterprise offering.
Subsequent to those events, we will then be modifying the ‘CIS DocWiki’ for the semantic enterprise domain. Much of the necessary content will have already been assembled for the ‘CIS DocWiki’.
Conclusions and Applicability
Paradoxically, while developing such knowledge bases and systems such as ‘DocWiki’ appears to be extra work, from our standpoint as developers it is useful and efficient. Structured Dynamics already researches and assembles much material and tries to “document as it goes.” Having the ‘DocWiki’ framework not only provides a consistent and coherent way to organize that information, but it also helps to point out essential gaps in our offerings.
The ‘DocWiki’ delivers the methods, documentation and portions of the structure to a total open solution. The ‘DocWiki’ is the primary means — along with software development and accompanying code-level and API documentation, of course — for us to fulfill our mantra that “We’re successful when we are not needed.” As we pointed out in Part 1 of this series, we really think such an attitude is ultimately a self-interested one. The better we can address the acceptance factors in the enterprise for our offerings, the more opportunities we will gain.
We would like to think that other enlightened open source software developers, especially those in the semantic space but certainly not limited to them, will see the wisdom of this four-legged foundation to total open solutions. Up until now, pragmatic guidance for what it takes to create a complete open source offering to businesses and enterprises has been lacking.
The tools, methods, and workflows all exist for making total open solutions real today. All of the pieces are themselves open source. There are many useful guides for best practices across the pipeline. It is just that — prior to this — no one apparently took the time to assemble and articulate them. We think this three-part series and some of the “how to” guidance in the ‘DocWiki’ system can help fix this oversight.
Ultimately, with wider adoption by developers, goaded in part by demands of the marketplace for them, we would hope that additional innovations and ideas may be forthcoming to improve the industry’s ability to offer total open source solutions. Adding just a small bit of attentive effort to how we organize and package what we know is but a small price to pay for greater acceptance and success.
[1] Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.
[2] Clean XHTML can be generated directly from the Mediawiki API. This can be done directly via URL with the action=render command. See for example: http://www.mediawiki.org/wiki/API:Parsing_wikitext.
[3] For example, there are a number of paths to migrate from HTML or XHTML to DocBook; see http://wiki.docbook.org/topic/Html2DocBook. But, there is a specific project that also goes directly from Mediawiki; see http://code.google.com/p/gwtwiki/wiki/Mediawiki2Docbook.
Categories: Semantic Web
Open-source rails apps to study and learn from
Chris Lowis Blog - Mon, 05/31/2010 - 00:00
If, like me, you learn best by studying other code, here are some open-source rails apps with something to teach. They're a great resource to study and improve your own code, or to use as the starting point for your own applications. All of these applications use Rails 2 but as they are open-source they are a great opportunity to get your feet wet in the Rails 3 world by helping to port them across.
Update: Thanks to the comments here and over on Hacker News, I've added two more applications and made some comments on the non-standard nature of the Loved by Less application code.
Gemcutter
A site familiar to all Ruby developers is Gemcutter - the recent replacement for the venerable Rubyforge. I wasn't aware that the source code of the site was open source until danieldon pointed it out to me, but you can study the code on github. The Gemcutter source is written in a very modern way featuring examples of using Cucumber for integration testing, a simple Rack middleware example and an interesting routes.rb file showing how to cleanly version an API.
Features
- A very modern Rails application featuring the latest best practices.
- Examples of metal and rack middleware.
Spot Us is a website that allows individuals to commission freelance journalists. It has many social-network type features which make the source code interesting reading.
The comprehensive test suite is written with rspec and the ruby-like templating system HAML is used for the views. It's also an example of how to use jQuery in a rails app instead of the default prototype through the use of the jrails plugin.
Features
- Haml
- rspec
- jquery with the jrails plugin.
Loved By Less is an open-source social network. I think many Rails freelancers have been asked to create a site with social network features - profiles, 'friending', photo sharing etc. Less Everything were no different and created this open-source platform to help ease the pain of getting started. The source code is available on github. It's unlikely that simply cloning the app will give you everything you need, but it's an amazing resource to read through and take the bits you need for your project.
The Loved By Less source code is starting to show its age a bit, and does include some pretty extensive monkey patching which may confuse a newcomer. While I don't think this is bad in itself (it shows the flexibility of Ruby, and also the dangers of not following Rails conventions), it's something to be aware of. I think porting this to Rails 3 would give newcomers to Rails a great resource to learn from.
Features
- Search with Thinking Sphinx
- Flickr integration
- Comprehensive email support
- Paperclip for attachments
If you're working on a Software as a Service application, take a look at Saasy. It's a template application designed to do billing and authentication tasks 'so you don't have to'.
Studying the code base, you'll see examples of using SSL security and ActiveMerchant. I was also interested in the accounts model which uses the acts_as_state_machine plugin to simplify some of the logic.
Features
- acts_as_state_machine
- ActiveMerchant
Simply Agile by Andrew Bruce is an open source agile software project management application. Think of a simple version of Pivotal Tracker. You can add story cards to a backlog, and then assign each card to a sprint. Each time a card is completed you can drag and drop it to the appropriate column in the Simply Agile task board. Andrew has a very dedicated approach to test-driven development and as a result the application is well covered by cucumber and rspec tests. The javascript in this application is written in an unobtrusive style and is a good example of using jQuery for "progressive enhancement". The app is fully functional in browsers without javascript, but where javascript is available so is the drag-and-drop interface.
Features
- Cucumber integration tests
- rspec coverage
- Drag and drop interface powered by jQuery.
Typo is "the oldest and most powerful Ruby on Rails blogware" and is still under active development. While the 15 minute blog is still a fantastic demonstration of how powerful Rails is for rapid application development, anyone who has decided to take the project further and create their own fully-featured blog system will know how much extra work is needed. From comment systems, to anti-spam protection, restful routing to upload support there are many features that need to be added. Thankfully Typo has already implemented these and many other features, and the source code is available to help you learn how to do the same.
Features
- rspec for testing
- internationalisation support
- email notification
- admin and public-facing interfaces
Open source rails is a gallery of open-source rails projects. I selected this project for this list because, although the code itself isn't well documented or tested, in many ways this illustrates a great strength of the Rails framework. It is a good project to study to realise how much can be achieved with a small amount of code and a handful of powerful plugins (not that I'd recommend starting your next Rails project without tests, of course!).
Features
- OpenID integration
- Paperclip for file uploads
- A simple set of routes.
I hope you enjoyed this selection of Rails projects. This is far from a definitive list, and I encourage you to read some of the suggestions of other apps in the comments below and in the discussion over on Hacker News. With the upcoming Rails 3 release I think it's a good time to reflect on how far the Rails framework has come, and also to remember to cater for people who are learning Rails for the first time - studying open-source code is a great way to learn.
Do you have a favourite open-source Rails project? Share it in the comments below.
Categories: Semantic Web
Listening to the Enterprise: Total Open Solutions, Part 2
AI^3 - Tue, 05/25/2010 - 15:24
The Four Legs to a Stable Open Source Solution
In the first part to this series, we put forward the argument that incomplete provision of important support factors was limiting the adoption of open source software in the enterprise. We can liken the absence of these factors to having a chair with one or more absent or broken legs.
This second part of the series goes into the four legs of a stable, open source solution. These four legs are software, structure, methods and documentation. When all four are provided, we can term this a total open solution.
These considerations are not simply a matter of idle curiosity. New approaches and new methods are required for enterprises to modernize their IT systems while adding new capabilities and preserving sunk assets. Extending and modernizing existing IT is often not in the self-interests of the original supplying vendors. And enterprises are well aware that IT commitments can extend for decades.
While the benefits and capabilities of open source software become apparent by the day, rates of open source software adoption lag in enterprises. We have seen entire Internet-based businesses arise and get huge in just a few short years. But it is the rare existing enterprise that has committed to and embraced similar Web-oriented architectures and IT strategies [1].
The enterprise IT ecosystem is evolving to become an unhealthy one. New software vendors have generally abandoned enterprises as a market. Much more action takes place with consumer apps and Internet plays, often premised on ad-based revenues or buzz and traffic as attractors for acquisition. Existing middle-tier enterprise vendors are themselves being gobbled up and disappearing. I’m sure all observers would agree that IT software and services are increasingly dominated by a shrinking slate of vendors. I suspect most observers — myself included — would argue that enterprise-based IT innovation is also on the wane.
The argument posed in the first part of this series is that such atrophy should not be unexpected. The current state of open source software is not addressing the realities of enterprise IT needs.
And that is where the other legs of the total open solution come in. In their entirety, they amount to a form of capacity building for the enterprise [2]. It is not simply enough to put forward buzzwords matched with open source software packages. Exciting innovations in social networks, collaboration, semantic enterprise, mobile apps, REST, Web-oriented architectures, information extraction, linked data and a hundred others are being validated on the Internet. But until the full spectrum of success and adoption factors gets addressed, enterprises will not embrace these new innovations as central to their business.
As we describe these four legs to the total open solution, we will sometimes point to our Citizen Dan initiative [3]. That is not because of some universal applicability of the system to the enterprise; indeed Citizen Dan is mostly targeted to local communities and municipalities. But, Citizen Dan does represent the first instance known to us where each of these total open solution success factors is being explicitly recognized and developed. We think the approach has some transferability to the broader enterprise.
Let’s now discuss these four legs in turn.
Leg One: Software
Of course, the genesis of this series is grounded in open source software and what it needs to do in order to find broader enterprise acceptance. Clearly that is the first leg amongst the four to be discussed. We also have acknowledged that, generally, best-of-breed open source software is also better documented at the code level, and has documented APIs. We will return to this topic under Leg Four below.
Open source software useful to the enterprise is often a combination of individual open source packages. Some successful vendors of open source to the enterprise in fact began as packagers and documenters of multiple packages. Red Hat for Linux or Alfresco in document management or Pentaho in business intelligence come to mind, as examples.
In the case of Citizen Dan, here are the open source packages presently contained in its offering: Linux (Ubuntu), Apache, MySQL, PHP (these comprising the LAMP stack), Drupal, a variety of third-party Drupal modules, Virtuoso, Solr, ARC2, Smarty, Yahoo UI, TinyMCE, Axiis, Flex, ClearMaps, irON, conStruct, structWSF, and some others. Such combinations of packages are not unusual in open source settings, since new value-add typically comes from extensions to existing systems or unique ways to combine or package them. For example, the installation guide for structWSF alone is quite comprehensive with multiple configuration and test scripts.
Thus, besides direct software, it is also critical that configuration, settings, installation guidance and the like be addressed to enable relatively straightforward set-up. This is an area of frequent weakness. Targeting it directly is a not-so-secret factor for how some vendors have begun to achieve some success with the enterprise market.
Leg Two: Structure
All software works on data. While some data is unstructured (such as plain text) and some is semi-structured (such as HTML or Web pages that mix markup with text), the objective of information extraction or natural language processing is to extract the “structure” from such sources. Once extracted, such structure can interoperate on a common footing with the structured data common to standard databases.
Thus, we use “structure” to denote the concepts and their relationships (the “schema” or “ontology”) and the indicators and data (attributes and values) to describe them, and the “entities” (distinct individuals or nameable instances) that populate them. In other words, “structure” refers to all of the schema (concepts + relationships) + data + attributes + indicators + records that make up the information upon which software can operate.
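One way to picture this definition is as a small data model. The Python sketch below is only an illustration under stated assumptions: the class names (`Concept`, `Entity`, `Structure`) and the example concepts are hypothetical, and they are not drawn from MUNI or any other actual ontology.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A schema-level concept plus its relationships to other concepts."""
    name: str
    # relationship name -> names of the related concepts
    relations: dict[str, list[str]] = field(default_factory=dict)


@dataclass
class Entity:
    """A distinct, nameable instance (a record) of some concept."""
    identifier: str
    concept: str
    # attribute or indicator name -> value
    attributes: dict[str, object] = field(default_factory=dict)


@dataclass
class Structure:
    """Schema (concepts + relationships) together with the entities that populate it."""
    concepts: dict[str, Concept] = field(default_factory=dict)
    entities: list[Entity] = field(default_factory=list)

    def add_concept(self, concept: Concept) -> None:
        self.concepts[concept.name] = concept

    def add_entity(self, entity: Entity) -> None:
        # An entity must be an instance of a concept the schema already knows.
        if entity.concept not in self.concepts:
            raise KeyError(f"unknown concept: {entity.concept}")
        self.entities.append(entity)

    def instances_of(self, concept_name: str) -> list[Entity]:
        return [e for e in self.entities if e.concept == concept_name]


# Made-up example values, for illustration only.
structure = Structure()
structure.add_concept(Concept("Municipality"))
structure.add_concept(Concept("Department", relations={"partOf": ["Municipality"]}))
structure.add_entity(Entity("example-town", "Municipality", {"population": 1000}))
```

The point of the sketch is simply that schema, attributes and records sit in one structure that software can operate on, whatever serialization is used to store or exchange it.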
Structure exists in many forms and serializations. Generally, software represents its internal information in one or a few canonical storage and manipulation formats, though that same software may also be able to import (ingest) or export its information and data in many different external formats.
In our semantic enterprise work, especially with its premise in ontology-driven applications using adaptive ontologies, structure is an absolutely essential construct. But, frankly, no information technology system exists that does not also depend on structure to a greater or lesser extent.
The interplay between software and structure is one source of expertise that vendors guard closely and use to competitive advantage. In years past, proprietary software could partially hide the bases for performance or algorithmic advantages. Expert knowledge and intimate familiarity with these systems was the other basis for keeping these advantages closely held.
It is perhaps not too surprising given this history, then, that the software industry really has very little emphasis or discussion on the interaction between software and structure. But, if software is being brought in as open source, where is the accompanying expertise or guidance for how data structure can be used to gain full advantage? The same acquired knowledge that, say, accompanied the growth of relational databases in such areas as schema development, materialized views or (de)normalization now needs to be made explicit and exposed for all sorts of open source systems.
In the realm of the semantic enterprise we are seeing attempts at this via open source ontologies and greater emphasis on APIs and documentation of same. Citizen Dan, for example, will be first publicly released with an accompanying MUNI ontology as a reference schema and starting point. Descriptions and methods for how to obtain indicator data and relevant attribute and entity information for the domain will also accompany it.
As open source software continues to emphasize semantics and interoperability, exemplar structures and best practices will need to be an essential part of the technology transfer. Just as the “secrets” of much software began to be opened up via open source, so too must the locked-up expertise of experts and practitioners in how to effectively structure data be exposed.
Leg Three: Methods
The need for structure explication and guidance is but one unique slice of a much broader need to expose methods and best practices surrounding a given information management initiative. The reason that any open source software might be adopted in the first place is based on the hope for some improved information management process.
Recently I have been touting MIKE2.0, the first open source, replicable and extensible framework for organizing and managing information in the enterprise. MIKE2.0 (Method for an Integrated Knowledge Environment) provides a comprehensive methodology that can be applied across a number of different projects within the information management space. It can be applied to any type of information development.
MIKE2.0 provides an organized way to describe the why, when, how and who of information management projects. Via standard templates and structures, MIKE2.0 provides a consistent basis to describe and manage these projects, and in a way that helps promote their interoperability and consistency across the enterprise.
MIKE2.0 and its forthcoming extensions, one of which we have developed for the semantic enterprise and are now extending into the semantic government in the context of Citizen Dan, are exciting because they provide a systematic approach and guidance for how (and for what!) to document new projects and initiatives. What MIKE2.0 represents is the first time that the embedded, proprietary expertise of traditional IT consultants has been exposed for broader use and extension.
The real premise behind any approach like MIKE2.0 or variants is to codify the expertise and knowledge that was previously locked up by experts and practitioners. The framework in MIKE2.0 provides a structure by which knowledge bases of background information can be assembled to accompany an open source project. This structure extends from initial evaluation and design all the way through operation and end of life.
The ‘CIS DocWiki’ that is being developed to accompany Citizen Dan is such an example of a MIKE2.0-informed knowledge base. At present, the CIS DocWiki has more than 300 specific articles useful to community indicator systems for local governments, and a complete deployment and maintenance methodology. By public release, it will likely be 2-3 times that size. All of this will be downloadable and installable as a wiki, and as open source content, ready for branding and modification for any local circumstance. CIS DocWiki is a natural methods and documentation complement to the Citizen Dan software and its MUNI structure. Release is scheduled for summer.
As we will focus on in Part 3 of this series, we are combining a MIKE2.0 organizational approach with a documentation and single-source publication platform to fulfill the method and documentary aspects of projects. It was really through the advantages gained by the combination of these pieces that we began to see the inadequacy of many current open source projects for the enterprise.
Leg Four: Documentation
This series began in part with a recognition that superior open source projects are often the better documented ones. But, even there, documentation is often restricted to code-level documentation or perhaps APIs.
As the material above suggests, documentation needs to extend well beyond software. We need documentation of structure, methods, best practices, use cases, background information, deployment and management, and changing needs over the lifetime of the system. And, as we have also seen in Part 1, the lifetime of that system might be measured in decades.
Documentation is no equal to paid partners and their expertise. But, documentation can be cheaper, and if that documentation is sufficient, might be a means for changing the equation in how IT projects are solicited, acquired and managed.
Today, enterprises appear to be stuck between two difficult choices: 1) the traditional vendor lock-in approach with high costs and low innovation; or 2) open source with minimal documentation and vendor knowledge and little assurance of support longevity.
These trade-offs look pretty unpalatable.
Documentation alone, even as extended into the other legs of the solution, is not prima facie going to be a deal maker. But, its absence, I submit, is a deal breaker. Just as open source itself has taken some years to build basic comfort in the enterprise, so too a concerted attack on all acceptance factors may be necessary before actual wide adoption occurs.
We hope the ‘CIS DocWiki’ platform noted for Citizen Dan will be an exemplar for this combination of documentation and methodology. It is a single-source publishing platform that allows the entire knowledge base behind a given IT initiative to be used for collaboration, operational, training or collateral purposes. And all of this is based on open source software.
Software vendors need to recognize these documentation factors and build their ventures for success. Yes, writing code and producing software is a lot more fun and rewarding than (yeech) documentation. But, unless our current generation of vendors that is committed to open source and its benefits takes its markets seriously — and thus commits to the serious efforts these markets demand — we will continue to see minimal uptake of open source in the enterprise.
An Interacting Whole Greater than the Sum of its Parts
Each of these four legs of a total open solution can interact with and reinforce the other parts. Once one begins to see the problem of open source adoption in the enterprise as a holistic one, a new systems-level perspective emerges.
Enterprises know full well that software is only one means to address an information management problem, and only a first step at that. Traditional vendors to the enterprise also understand this, which is why through their embedded systems and built-up expertise they have been able to perpetuate what often amounts to a monopoly position.
Pressures are building for an earthquake in the IT landscape. Enterprises are on an anvil of global competition and limited resources. Existing IT systems are not up to the task but too expensive and embedded to abandon. Traditional vendors have near monopoly positions and little incentive to innovate. New software vendors don’t have the expertise and gravitas to handle enterprise-scale challenges. Meanwhile, the rest of the globe is leapfrogging embedded systems with agile, Web-based systems.
The true innovation that is occurring is all based around open source, nurtured by the global computing platform of the Internet, and fueled by countless individuals able to compete on downward-spiraling cost bases. But on so many levels, open source as presently constituted, either fails or poses too many risks to the commercial enterprise.
The Internet itself was the basis of a paradigm shift, but I think we are only now seeing its manifestation at the enterprise level. We are also now seeing global reordering and changes of the economic order. How will companies respond? How will their IT systems adapt? And what will new vendors need to do and recognize in order to thrive in this changing environment?
I’m not sure I have found the language or rhetoric to convey what I see coming, and coming soon. I know open source is part of it; I know enterprises need it; and I know what is presently being offered does not meet the test.
As I noted in our first part, the mantra that we use at Structured Dynamics to express this challenge is, “We’re Successful When We’re Not Needed”. I think the essence behind this statement is that premises of dependency or proprietary advantage will not survive the jet streams of change that are blowing away the old order.
Sound like too much hyperbole? Actually, my own gut feeling is that it is not nearly enough.
In any case, windy rhetoric always falls short if there are no actionable next steps. In these first two parts of this series, I have tried to present the ingredients that need to go into the cake. In the third part I try to offer a new, and complementary, open source means for bringing stability to the foundation.
In all cases, though, I think these challenges are permanent ones and do not lend themselves to facile solutions. Four legs, or seven foundations, or twelve steps are all just too simplistic for dealing with the global and complex tsunamis blowing away the old order.
One really does not need to lick a finger to sense the direction of these winds of change. It is coming, and coming hard, and all of it is from the direction of open source. What enterprises do, and what the vendors who want to serve them do, is perhaps less clear. I think open source offers a way out of the box in which enterprise IT is currently stuck. But, at present, I also think that most open source options do not have the necessary legs to stand on.
[1] Notable exceptions to this are the consumer-facing aspects of some businesses, such as automobiles or personal care or fashion products. These businesses are leading the way into some of the “build your own” or “design your own” uses of modern Web technology.
[2] In the 1970s the major term for this approach was “technology transfer.”
[3] Citizen Dan is an open source system for aggregating different indicator data concerning local, community well-being. Information sources may include the Web, real-time feeds, government datasets, municipal government information systems, or crowdsourced data. Information can range from standard structured data to local narratives, including from minutes and reports, contributed stories, blogs or news outlets. The ‘raw’ input data can come in essentially any format, which is then converted to a standard form with consistent semantics. See current details with screenshots.
Categories: Semantic Web
SD Gets New Logo, Look
AI^3 - Thu, 05/20/2010 - 21:03
Growth Demanded a Professional Upgrade
Structured Dynamics today updated its image with a new logo and new color schemes on its Web pages and collateral. Other upgrades to various SD product logos and other adjustments are also being made.
Fred Giasson and I formed the company rapidly back in November 2008. We had other fish to fry, namely starting work with customers coming out of the gate, and (literally) grabbed a toss-off logo that had been lying in the drawer to start the company. That worked well in the early days, but we increasingly felt our image looked tired and distinctly “non-dynamic.”
So we commissioned a competition a few weeks back and left the next steps to the professionals. The winning design is shown above. We had many good options to choose from, and we will be working with some of the other finalist designers for some of our other product designs. The first in that series is the Citizen Dan logo.
So, with growth and presence it feels good to now have a professional look as well. We’re proud that we continue to be able to fully self-fund the company and look to walk arm-in-arm with this logo for quite some time to come!
Categories: Semantic Web
Listening to the Enterprise: Total Open Solutions, Part 1
AI^3 - Wed, 05/12/2010 - 15:22
“We’re Successful When We’re Not Needed”
Structured Dynamics has been engaged in open source software development for some time. Inevitably in each of our engagements we are asked about the viability of open source software, its longevity, and what the business model is behind it. Of course, I appreciate our customers seemingly asking about how we are doing and how successful we are. But I suspect there is more behind this questioning than simply good will for our prospects.
Besides the general fact that most of us know (of hundreds of thousands of open source projects, only a minuscule number get traction), I think there are broader undercurrents in these questions. Open source alone, even with good code documentation, is not enough to ensure long-term success.
When open source broke on the scene a decade or so ago [1], the first enterprise concerns were based around code quality and possible “enterprise-level” risks: security, scalability, and the fact that much open source was itself LAMP-based. As comfort grew about major open source foundations — Linux, MySQL, Apache, the scripting languages of PHP, Perl and Python (that is the very building blocks of the LAMP stack) — concerns shifted to licensing and the possible “viral” effects of some licenses to compromise existing proprietary systems.
Today, of course, we see hugely successful open source projects in all conceivable venues. Granted, most open source projects get very little traction. Only a few standouts from the hundreds of thousands of open source projects on big venues like SourceForge and Google Code or their smaller brethren are used or known. But, still, in virtually every domain or application area, there are 2-3 standouts that get the lion’s share of attention, downloads and use.
I think it fair to argue that well-documented open source code generally out-competes poorly documented code. In most circumstances, well-documented open source is a contributor to the virtuous circle of community input and effort. Indeed, it is a truism that most open source projects have very few code committers. If there is a big community, it is largely devoted to documentation and assistance to newbies on various forums.
We see some successful open source projects, many paradoxically backed by venture capital, that employ the “package and document” strategy. Here, existing open source pieces are cobbled together as more easily installed comprehensive applications with closer to professional grade documentation and support. Examples like Alfresco or Pentaho come to mind. A related strategy is the “keystone” one where platform players such as Drupal, WordPress, Joomla or the like offer plug-in architectures and established user bases to attract legions of third-party developers [2].
OK, So What Has This to Do with the Enterprise?
I think if we stand back and look at this trajectory we can see where it is pointing. And, where it is pointing also helps define what the success factors for open source may be moving forward.
Two decades ago most large software vendors made on average 75% to 80% of their revenues from software licences and maintenance fees; quite the opposite is true today [3]. The successful vendors have moved into consulting and services. One only needs look to three of the largest providers of enterprise software of the past two decades — IBM, Oracle and HP — to see evidence of this trend.
How is it that proprietary software with its 15% to 20% or more annual maintenance fees has been so smoothly and profitably replaced with services?
These suppliers are experienced hands in the enterprise and know what any seasoned IT manager knows: the total lifecycle costs of software and IT reside in maintenance, training, uptime and adaptation. Once installed and deployed, these systems assume a life of their own, with actual use lifetimes that can approach two to three decades.
This reality is, in part, behind my standard exhortation about respecting and leveraging existing IT assets, and why Structured Dynamics has such a commitment to semantic technology deployment in the enterprise that is layered onto existing systems. But, this very same truism can also bring insight into the acceptable (or not) factors facing open source.
Great code — even if well documented — is not alone the mousetrap that leads the world to the door. Listen to the enterprise: lifecycle costs and longevity of use are facts.
But what I am saying here is not really all that earthshaking. These truths are available to anyone with some experience. What is possibly galling to enterprises is a pair of smug positions taken by new market entrants. The first, which is really naïve, is the moral superiority of open source or open data or any such silly artificial distinctions. That might work in the halls of academia, but carries no water with the enterprise. The second, more cynically based, is to wrap one’s business in the patina of open source while engaging in the “wink-wink” knowledge that only the developer of that open source is in a position to offer longer term support.
Enterprises are not stupid and understand this. So, what IT manager or CIO is going to bet their future software infrastructure on a start-up with immature code, generally poor code documentation or APIs, and definitely no clear clue about their business?
The Slow Squeeze
Yet, that being said, neither enterprises nor vendors nor software innovators that want to work with them can escape the inexorable force of open source. While it has many guises from cloud computing to social software or software as a service or a hundred other terms, the slow squeeze is happening. Big vendors know this; that is why there has been the rush to services. Start-up vendors see this; that is why most have gone to consumer apps and ad-based revenue models. And enterprises know this, which is why most are doing nothing other than treading water because the way out of the squeeze is not apparent.
The purpose of this three-part series is to look at these issues from many angles. What might the absolute pervasiveness of open source mean to traditional IT functions? How can strategic and meaningful change be effected via these new IT realities in the enterprise? And, how can software developers and vendors desirous of engaging in large-scale initiatives with enterprises find meaningful business models?
Lead-in to the Series: a Total Open Solution
And, after we answer those questions, we will rest for a day.
But, no, seriously, these are serious questions.
There is no doubt open source is here to stay, yet its maturity demands new thinking and perspectives. Just as enterprises have known that software is only the beginning of decades-long IT commitments and (sometimes) headaches, the purveyors and users of open source should recognize the acceptance factors facing broad enterprise adoption and reliance.
Open source offers the wonderful prospect of avoiding vendor “lock-in”. But, if the full spectrum of software use and adoption is also not so covered, all we have done is to unlock the initial selection and install of the software. Where do we turn for modifications? for updates? for integration with other packages? for ongoing training and maintenance? And, whatever we do, have we done so by making bets on some ephemeral start-up? (We know how IBM will answer that question.)
The first generation of open source has been a substitute for upfront proprietary licenses. After that, support has been a roll of the dice. Sure, broadly accepted open source software provides some solace because of more players and more attention, but how does this square with the prospect of decades of need?
The perverse reality in these questions is that most all early open source vendors are being gobbled up or co-opted by the existing big vendors. The reward of successful market entry is often a great sucking sound to perpetuate existing concentrations of market presence. In the end, how are enterprises benefiting?
Now, on the face of it, I think it neither positive nor negative whether an early open source firm with some initial traction is gobbled up by a big player or not. After all, small fish tend to be eaten by big fish.
But two real questions arise in my mind: One, how does this gobbling fix the current dysfunction of enterprise IT? And, two, what is a poor new open source vendor to do?
The answer to these questions resides in the concerns and anxieties that caused them to be raised in the first place. Enterprises don’t like “lock-in” but like even less seeing stranded investments. For open source to be successful it needs to adopt a strategy that actively extends its traditional basis in open code. It needs to embrace complete documentation, provision of the methods and systems necessary for independent maintenance, and total lifecycle commitments. In short, open source needs to transition from code to systems.
We call this approach the total open solution. It involves — in addition to the software, of course — recipes, methods, and complete documentation useful for full-life deployments. So, vendors, do you want to be an enterprise player with open source? Then, embrace the full spectrum of realities that face the enterprise.
“We’re Successful When We’re Not Needed”
The actual mantra that we use to express this challenge is, “We’re Successful When We’re Not Needed”. This simple mental image helps define gaps and tells us what we need to do moving forward.
The basic premise is that any taint of lock-in or not being attentive to the enterprise customer is a potential point of failure. If we can see and avoid those points and put in place systems or whatever to overcome them, then we have increased comfort in our open source offerings.
Like good open source software, this is ultimately a self-interest position to take. If we can increase comfort in the marketplace that they can adopt and sustain our efforts without us, they will adopt them to a greater degree. And, once adopted, and when extensions or new capabilities are needed, then as initial developers with a complete grasp on the entire lifecycle challenges we become a natural possible hire. Granted, that hiring is by no means guaranteed. In fact, we benefit when there are many able players available.
In the remaining two parts of this series we will discuss all of the components that make up a total open solution and present a collaboration platform for delivering the methods and documentation portions. We’re pretty sure we don’t yet have it fully right. But, we’re also pretty sure we don’t have it wrong.
[1] Of course, stalwart open source applications such as Linux and MySQL and even the open source movement extend back about twenty years. But, it was only about a decade ago that real traction and visibility in the enterprise began.
[2] BTW, with regard to the latter, I think it notable that no semantic technology player has played or attracted third parties to any notable extent. That is possibly a topic for a later blog post!
[3] I first wrote about this five years ago (and updated it a year later), with analysis of many public vendors. See M.K. Bergman, Redux: Enterprise Software Licensing on Life Support, June 2, 2006.
Categories: Semantic Web
Two Presentations at SemTech 2010
AI^3 - Mon, 05/10/2010 - 05:06
We’re Speaking on Rich Interfaces and MIKE2.0
As I reported about a year ago after my first attendance, I think the Semantic Technology Conference is the best venue going for pragmatic discussion of semantic approaches in the enterprise. I’m really pleased that I will be attending again this year. The conference (#SemTech) will be held at the Hilton Union Square in downtown San Francisco on June 21-25, 2010. Now in its sixth year and the largest of its kind, it is again projected to attract 1500 attendees or so.
I will be presenting two papers this year, covering rather dramatically different topics. Such is the business of a young company like Structured Dynamics that wears many hats!
Rich User Interfaces for Semantic Technologies
A really exciting presentation for us is, Sizzle for the Steak: Rich, Visual Interfaces for Ontology-driven Apps, on Wed, June 23 in the 2:00 PM – 3:00 PM session.
A nagging gap in the semantic technology stack is acceptable — better still, compelling — user experiences. After our exile for a couple of years doing essential infrastructure work, we have been unshackled over the past year or so to innovate on user interfaces for semantic technologies.
Our unique approach uses adaptive ontologies to drive rich Internet applications (RIAs) through what we call “semantic components.” This framework is unbelievably flexible and powerful and can seamlessly interact with our structWSF Web services framework and its conStruct Drupal implementations.
We will be showing these rich user interfaces for the first time in this session. We will show concept explorers, “slicer-and-dicers”, dashboards, information extraction and annotation, mapping, data visualization and ontology management. Get your visualization any way you’d like, and for any slice you’d like!
While we will focus on the sizzle and demos, we will also explain a bit of the technology that is working behind Oz’s curtain.
Methods for Becoming a Semantic Enterprise
A more informal, interactive F2F discussion will be, MIKE2.0 for the Semantic Enterprise, on Thurs, June 24 in the 4:45 PM – 5:45 PM slot.
MIKE2.0 (Method for an Integrated Knowledge Environment) is an open source methodology for enterprise information management that is coupled with a collaborative framework for information development. It is oriented around a variety of solution “offerings”, ranging from the comprehensive and the composite to specific practices and technologies. A couple of months back, I gave an overview of MIKE2.0 that was pretty popular.
We have been instrumental in adding a semantic enterprise component to MIKE2.0, with our specific version of it called Open SEAS. In this Face-to-Face session, experts and desirous practitioners will join together to discuss how to effectively leverage this framework. While I will intro and facilitate, expect many other MIKE2.0 aficionados to participate.
This is perhaps a new concept to many, but what is exciting about MIKE2.0 is that it provides a methodology and documentation complement to technology alone. When combined with that technology, all pieces comprise what might be called a total open solution. I personally think it is the next logical step beyond open source.
Hope to See You There!
So, if you have not already made plans, consider adjusting your schedule today. And, contact me in advance (mike at structureddynamics dot com) if you’ll be there. We’d love to chat!
Categories: Semantic Web
The Bipolar Disorder of Linked Data
AI^3 - Thu, 04/29/2010 - 00:12
An Acceptance of Its Natural Role is the Prozac Substitute
There has been a bit of a manic-depressive character on the Web waves of late with respect to linked data. On the one hand, we have seen huzzahs and celebrations from the likes of ReadWriteWeb and SemanticWeb.com and, just concluded, the Linked Data on the Web (LDOW) workshop at WWW2010. This treatment has tended to tout the coming of the linked data era and to seek ideas about possible, cool linked data apps [1]. This rise in visibility has been accompanied by much manic and excited discussion on various mailing lists.
On the other hand, we have seen much wringing of hands and gnashing of teeth for why linked data is not being used more and why the broader issue of the semantic Web is not seeing more uptake. This depressive “call to arms” has sometimes felt like ravings with blame being given to the poor state of apps and user interfaces to badly linked data to the difficulty of publishing same. Actually using linked data for anything productive (other than single sources like DBpedia) still appears to be an issue.
Meanwhile, among others, Kingsley Idehen, a ubiquitous voice on the Twitter #linkeddata channel, has been promoting the separation of the identity of linked data from the notion of the semantic Web. He is also trying to change the narrative away from the association of linked data with RDF, instead advocating “Data 3.0” and the entity-attribute-value (EAV) model understanding of structured data.
As someone less engaged in these topics since my own statements about linked data over the past couple of years [2], I have my own distanced-yet-still-biased view of what all of this crisis of confidence is about. I think I have a diagnosis for what may be causing this bipolar disorder of linked data [3].
The Semantic Web Boogie Man
A fairly universal response from enterprise prospects when raising the topic of the semantic Web is, “That was a big deal about a decade ago, wasn’t it? It didn’t seem to go anywhere.” And, actually, I think both proponents and keen observers agree with this general sentiment. We have seen the original advocate, Tim Berners-Lee, float the Giant Global Graph balloon, and now Linked Data. Others have touted Web 3.0 or Web of Data or, frankly, dozens of alternatives. Linked data, which began as a set of techniques for publishing RDF, has emerged as a potential marketing hook and saviour for the tainted original semantic Web term.
And therein, I think, lies the rub and the answer to the bipolar disorder.
If one looks at the original principles for putting linked data on the Web or subsequent interpretations, it is clear that linked data (lower case) is merely a set of techniques. Useful techniques, for sure; but really a simple approach to exposing data using the Web with URLs as the naming convention for objects and their relationships. These techniques provide (1) methods to access data on the Web and (2) ways to specify the relationships that link the data (resources). The first part is mechanistic and not really of further concern here. And, while any predicate can be used to specify a data (resource) relationship, that relationship should also be discoverable with a URL (dereferenceable) to qualify as linked data. Then, to actually be semantically useful, that relationship (predicate) should also have a precise definition and be part of a coherent schema. (Note, this last sentence is actually not part of the “standard” principles for linked data, which itself is a problem.)
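A minimal sketch of the distinction drawn above, in Python with only the standard library. The triples, the example.org URLs and the helper names are hypothetical illustrations; they are not taken from any linked data specification or from Structured Dynamics' software. The check covers only the naming precondition (subject and predicate are HTTP URLs, so a client could at least try to look them up). It fetches nothing, and it says nothing about whether the predicate has a precise definition or belongs to a coherent schema.

```python
from urllib.parse import urlparse

# Hypothetical (subject, predicate, object) triples for illustration only.
TRIPLES = [
    ("http://example.org/id/town-a", "http://example.org/ontology/partOf", "http://example.org/id/county-b"),
    ("http://example.org/id/town-a", "partOf", "http://example.org/id/county-b"),
]

def is_http_url(name: str) -> bool:
    """True when the name is an absolute http(s) URL that a client could try to look up."""
    parts = urlparse(name)
    return parts.scheme in ("http", "https") and bool(parts.netloc)

def linkable(triple) -> bool:
    """A triple is a linked data candidate only if its subject and predicate are both URLs."""
    subject, predicate, _ = triple
    return is_http_url(subject) and is_http_url(predicate)

# One flag per triple: the second triple uses a bare string as its predicate.
RESULTS = [linkable(t) for t in TRIPLES]
```

The second triple states the same relationship as the first, but with a bare string for the predicate there is nothing to look up, so the relationship carries no discoverable meaning.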
When used right, these techniques can be powerful and useful. But, poor choices or execution in how relationships are specified often lead to saying little or nothing about semantics. Most linked data uses a woefully small vocabulary of data relationships, with an even smaller set ever used for setting linkages across existing linked data sets [4]. Linked data techniques are a part of the foundation to overall best practices, but not the total foundation. As I have argued for some time, linked data alone does not speak to issues of context nor coherence.
To speak semantically, linked data is not a synonym for the semantic Web nor is it the sameAs the semantic Web. But, many proponents have tried to characterize it as such. The general tenor is to blow the horns hard anytime some large data set is “exposed” as linked data. (No matter whether the data is incoherent, lacks a schema, or is even poorly described and defined.) Heralding such events, followed by no apparent usefulness to the data, causes confusion to reign supreme and disappointment to naturally occur.
The semantic Web (or semantic enterprise or semantic government or similar expressions) is a vision and an ideal. It is also a fairly complete one that potentially embraces machines and agents working in the background to serve us and make us more productive. There is an entire stack of languages and techniques and methods that enable schema to be described and non-conforming data to be interoperated. Now, of course this ideal is still a work in progress. Does that make it a failure?
Well, maybe so, if one sees the semantic Web as marketing or branding. But, who said we had to present it or understand it as such?
The issue is not one of marketing and branding, but the lack of benefits. Now, maybe I have it all wrong, but it seems to me that the argument needs to start with what “linked data” and the “semantic Web” can do for me. What I actually call it is secondary. Rejecting the branding of the semantic Web for linked data or Web 3.0 or any other somesuch is still dressing the emperor in new clothes.
A Nicely Progressing Continuum, Thank You!
For a couple of years now I have tried in various posts to present linked data in a broader framework of structured and semantic Web data. I first tried to capture this continuum in a diagram from July 2007:
Document Web
- Document-centric
- Document resources
- Unstructured data and semi-structured data
- HTML
- URL-centric
- circa 1993
Structured Web
- Data-centric
- Structured data
- Semi-structured data and structured data
- XML, JSON, RDF, etc
- URI-centric
- circa 2003
Linked Data
- Data-centric
- Linked data
- Semi-structured data and structured data
- RDF, RDF-S
- URI-centric
- circa 2006
Semantic Web
- Data-centric
- Linked data
- Semi-structured data and structured data
- RDF, RDF-S, OWL
- URI-centric
- circa ???
Now, three years later, I think the transitional phase of linked data is reaching an end. OK, we have figured out one useful way to publish large datasets staged for possible interoperability. Sure, we have billions of triples and assertions floating out there. But what are we to do with them? And, is any of it any good?
The Reality of a Heterogeneous World
I think Kingsley is right in one sense to point to EAV and structured data. We, too, have not met a structured data format we did not like. There are hundreds of attribute-value pair models of even more generic nature that also belong to the conversation.
One of my most popular posts on this blog has been, ‘Structs’: Naïve Data Formats and the ABox, from January 2009. Today, we have a multitude of popular structured data formats from XML to JSON and even spreadsheets (CSV). Each form has its advocates, place and reasons for existence and popularity (or not). This inherent diversity is a fact and fixture of any discussion of data. It is a major reason why we developed the irON (instance record and object notation) non-RDF vocabulary to provide a bridge from such forms to RDF, which is accessible on the Web via URIs. irON clearly shows that entities can be usefully described and consumed in either RDF or non-RDF serialized forms.
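To make the idea of a bridge from attribute-value forms to RDF-style triples concrete, here is a small Python sketch using only the standard library. It is an illustration of the general EAV-to-triple mapping, not irON's actual notation or tooling; the namespace, the record, and the function name are all hypothetical.

```python
# Hypothetical namespace used to mint URIs for subjects and predicates.
BASE = "http://example.org/"

def record_to_triples(record: dict, id_key: str = "id") -> list:
    """Turn one attribute-value record into (subject, predicate, object) triples."""
    subject = BASE + "id/" + str(record[id_key])
    triples = []
    for attribute, value in record.items():
        if attribute == id_key:
            continue  # the identifier becomes the subject, not a triple of its own
        predicate = BASE + "attr/" + attribute
        # A list value becomes one triple per item; a scalar becomes a single triple.
        values = value if isinstance(value, list) else [value]
        for item in values:
            triples.append((subject, predicate, str(item)))
    return triples

# A made-up record of the kind that might arrive as JSON or as a spreadsheet row.
RECORD = {"id": "town-a", "name": "Town A", "tags": ["community", "indicator"]}
TRIPLES = record_to_triples(RECORD)
```

The same entity description survives the trip in either direction: the record is the naïve, instance-level form, and the triples are the same facts with URIs attached to the subject and attributes.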
Though RDF and linked data are great forms for expressing this structured information, other forms can convey the same meaning as well. Of the billions of linked data triples exposed to date, surely more than 99% are of this instance-level, “ABox” type of data [5]. And, more telling, of all of the structured data that is publicly obtainable on the Web, my wild guess is that less than 0.0000000001% of that is even linked RDF data [6].
Neither linked data nor RDF alone will, today or in the near future, play a pivotal or essential role for instance data. The real contribution from RDF and the semantic Web will come from connecting things together, from interoperation and federation and conjoining. This is the province of the TBox and is a role barely touched by linked data. Publishing data as linked data helps tremendously in simplifying ingest and guiding the eventual connections, but the making of those connections, testing for their quality and reliability, are steps beyond the linked data ken or purpose.
Promoting Linked Data to its Level of Incompetence
It seems, then, that we see two different forces and perspectives at work, each contributing in its own way to today’s bipolar nature of linked data.
On the manic side, we see the celebration for the release of each large, linked data set. This perspective seems to care most about volumes and numbers, with less interest in how and whether the data is of quality or useful. This perspective seems to believe “post the data, and the public will come.” This same perspective is also quite parochial with respect to the unsuitability of non-linked data, be it microdata, microformats or any of the older junk.
On the depressed side, linked data has been seen as a more palatable packaging for the disappointments and perceived failures or slow adoption of the earlier semantic Web phrasing. When this perspective sees the lack of structure, defensible connections and other quality problems with linked data as it presently exists, despair and frustration ensue.
But both of these perspectives very much miss the mark. Linked data will never become the universal technique for publishing structured data, and should not be expected to be such. Numbers are never a substitute for quality. And linked data lacks the standards, scope and investment made in the semantic Web to date. Be patient; don’t despair; structured data and the growth of semantics and useful metadata is proceeding just fine.
Unrealistic expectations or wrong roles and metrics simply confuse the public. We are fortunate that most potential buyers do not frequent the community’s various mailing lists. Reduced expectations and an understanding of linked data’s natural role is perhaps the best way to bring back balance.
Linked Data’s Natural Role
We have consciously moved our communications focus from speaking internally to the community to reaching out to the broader enterprise public. There is much education, clarification and dialog that is now needed with the buying public. The time has moved past software demos and toys to workable, pragmatic platforms, and the methodologies and documentation necessary to support them. This particular missive speaking to the founding community is (perhaps many will say Hurray!) likely to become even more rare as we continue to focus outward.
As Structured Dynamics has stated many times, we are committed to linked data, presenting our information as such, and providing better tools for producing and consuming it. We have made it one of the seven foundations to our technology stack and methodology.
But, linked data on its own is inadequate as an interoperability standard. Many practitioners don’t publish it right, characterize it right, or link to it right. That does not negate its benefits, but it does make it a poor candidate to install on the semantic Web throne.
Linked data based on RDF is perhaps the first citizen amongst all structured data citizens. It is an expressive and readily consumed means for publishing and relating structured instance data and one that can be easily interoperated. It is a natural citizen of the Web.
If we can accept and communicate linked data for these strengths, for what it naturally is — a useful set of techniques and best practices for enabling data that can be easily consumed — we can rest easy at night and not go crazy. Otherwise, bring on the Prozac.
[1] Actually, in my opinion, the suggested listing of apps from these discussions is distinctly unimpressive and not compelling. As argued in the main body of the post, I think this is because linked data is really just a technique or best practice, and not a basis alone for enabling compelling apps. As initial developers of such apps as the UMBEL concept explorer or Dataviewer, Structured Dynamics understands the use of linked data and has a defensible basis to comment on applications. Our own applications intimately integrate linked data, but only as one of seven foundations.
[2] Here are some of my relevant posts over the past year discussing the role of linked data: Moving Beyond Linked Data (Sept. 20, 2009); Fresh Perspectives on the Semantic Enterprise (Sept. 28, 2009); The Law of Linked Data (Oct. 11, 2009); When Linked Data Rules Fail (Nov. 16, 2009).
[3] The current bipolar discussion reminds me of the “Six Phases of a Project,” a copy of which has been a permanent fixture on my office wall:
- Enthusiasm
- Disillusionment
- Panic
- Search for the guilty
- Punishment of the innocent
- Honors & praise for the non-participants.
Categories: Semantic Web
Brown Bag Lunch: Historical Origins of the Knowledge Economy
AI^3 - Fri, 04/23/2010 - 07:48
A Reprise AI3 Post from Four Years Ago
In 2002 Joel Mokyr, an economic historian from Northwestern University, wrote a book that should be read by anyone interested in knowledge and its role in economic growth. The Gifts of Athena: Historical Origins of the Knowledge Economy is a sweeping and comprehensive account of the period from 1760 (in what Mokyr calls the “Industrial Enlightenment”) through the Industrial Revolution beginning roughly in 1820 and then continuing through the end of the 19th century.
The book (and related expansions by Mokyr available as separate PDFs on the Internet) should be considered as the definitive reference on this topic to date. The book contains 40 pages of references to all of the leading papers and writers on diverse technologies from mining to manufacturing to health and the household. The scope of subject coverage, granted mostly focused on western Europe and America, is truly impressive.
Mokyr deals with ‘useful knowledge,’ acknowledging Simon Kuznets’ phrase. Mokyr argues that the growth of recent centuries was driven by the accumulation of knowledge and the declining costs of access to it. Mokyr helps to break past logjams that have attempted to link single factors such as the growth in science or the growth in certain technologies (such as the steam engine or electricity) as the key drivers of the massive increases in economic growth that coincided with the era now known as the Industrial Revolution.
Mokyr cracks some of these prior impasses by picking up on ideas first articulated through Michael Polanyi’s “tacit knowing” (among other recent philosophers interested in the nature and definition of knowledge). Mokyr’s own schema posits propositional knowledge, which he defines as the science, beliefs or the epistemic base of knowledge, and which he labels omega (Ω), in combination with prescriptive knowledge, which are the techniques (“recipes”), and which he labels lambda (λ). Mokyr notes that an addition to omega (Ω) is a discovery, while an addition to lambda (λ) is an invention.
One of Mokyr’s key points is that both knowledge types reinforce one another and, of course, the Industrial Revolution was a period of unprecedented growth in such knowledge. Another key point, easily overlooked when “discoveries” are seemingly more noteworthy, is that techniques and practical applications of knowledge can provide a multiplier effect and are equivalently important. For example, in addition to his main case studies of the factory, health and the household, he says:
The inventions of writing, paper, and printing not only greatly reduced access costs but also materially affected human cognition, including the way people thought about their environment.
Mokyr also correctly notes how the accumulation of knowledge in science and the epistemic base promotes productivity and still-more efficient discovery mechanisms:
The range of experimentation possibilities that needs to be searched over is far larger if the searcher knows nothing about the natural principles at work. To paraphrase Pasteur’s famous aphorism once more, fortune may sometimes favor unprepared minds, but only for a short while. It is in this respect that the width of the epistemic base makes the big difference.
In my own opinion, I think Mokyr starts to get closer to the mark when he discusses knowledge “storage”, access costs and multiplier effects from basic knowledge-based technologies or techniques. Like some other recent writers, he also tries to find analogies with evolutionary biology. For example:
Much like DNA, useful knowledge does not exist by itself; it has to be “carried” by people or in storage devices. Unlike DNA, however, carriers can acquire and shed knowledge so that the selection process is quite different. This difference raises the question of how it is transmitted over time, and whether it can actually shrink as well as expand.
One of the real advantages of this book is to move forward a re-think of the “great man” or “great event” approach to history. There are indeed complicated forces at work. I think Mokyr summarizes well this transition when he states:
A century ago, historians of technology felt that individual inventors were the main actors that brought about the Industrial Revolution. Such heroic interpretations were discarded in favor of views that emphasized deeper economic and social factors such as institutions, incentives, demand, and factor prices. It seems, however, that the crucial elements were neither brilliant individuals nor the impersonal forces governing the masses, but a small group of at most a few thousand people who formed a creative community based on the exchange of knowledge. Engineers, mechanics, chemists, physicians, and natural philosophers formed circles in which access to knowledge was the primary objective. Paired with the appreciation that such knowledge could be the base of ever-expanding prosperity, these elite networks were indispensable, even if individual members were not. Theories that link education and human capital to technological progress need to stress the importance of these small creative communities jointly with wider phenomena such as literacy rates and universal schooling.
There is so much to like and to be impressed by in this book and in Mokyr's later writings. I have two criticisms. First, I found the pseudo-science of his knowledge labels confusing (I kept having to mentally translate the omega symbol) and I disliked the naming distinctions between propositional and prescriptive, even though I think the concepts are spot on.
My second criticism, a more major one, is that Mokyr notes, but does not adequately pursue, “In the decades after 1815, a veritable explosion of technical literature took place. Comprehensive technical compendia appeared in every industrial field.” Statements such as these, and there are many in the book, hint at perhaps some fundamental drivers.
Mokyr has provided the raw grist for answering his starting question of why such massive economic growth occurred in conjunction with the era of the Industrial Revolution. He has made many insights and posited new factors to explain this salutary discontinuity from all prior human history. But, in this reviewer’s opinion, he still leaves the why tantalizingly close but still unanswered. The fixity of information and growing storehouses because of declining production and access costs remain too poorly explored.
This Friday brown bag leftover was first placed into the AI3 refrigerator about four years ago on July 6, 2006. It was part of a series of book reviews I was doing at that time getting at the importance of bulk paper production as a key enabler of economic growth. No changes have been made to the original posting.
Categories: Semantic Web
Unreluctantly Cutting the Tether
AI^3 - Wed, 04/14/2010 - 04:40
You’ve Got to be Crazy to Look to an Ad-based Revenue Model
OK. After an experiment of more than three years, I have just now canceled my Google AdSense participation. (Which, Google, by the way, makes almost impossible to do: Finding the cancel link is hard enough; but who remembers the day they first signed up for ads and how many impressions they got that day? Both are required to get a cancellation request approved. Give me a break. It is worse than banks claiming small digits from bank interest for their own income!!)
Despite my sub-title, I never did expect to make much (or, really, any) money from Google ads. When I first signed up for it in Dec 2006, I stated I was doing it to find out how this ad-based business really works.
Well, from my standpoint, it does not work well; actually, not well at all.
Over the years I have seen visits on this site climb to nearly 3 K per day, and other nice growth factors. Perhaps if I were really focused on ad revenue I would have rotated stuff, tried alternative placements, yada yada. But, mostly, I was just trying to see who made out in this ad game.
It is certainly not the standard blog. I think my stats put me somewhere in the top 1% of all sites visited, but even that is not enough to even pay my monthly server charges (now higher with Amazon EC2).
Yet, in recent months, I have noticed some vendors have specifically targeted advertising on my blog and there also has been an increase in full banner ads (away from the standard, unobtrusive link Google ads of years past). Maybe they know something I don’t and they are winning, but my monthly ad income has dropped or remained flat.
And, then, I began to get full panel flashing ads on my site that just screamed Hit me! Hit me! WTF. It was the last straw. Where did the unobtrusive link stuff go? Screw it; I can afford to pay my own monthly chump change.
This is probably not the time or place to discuss business models on the Web, but the woeful state of ad-based revenue is apparent. My goodness, I’m getting tired of ReadWriteWeb, as an example and one of the biggest at that, shilling with repeats and big ads with stories for their prominent advertisers each weekend. And, they are one of the few ad winners!
My honest guess is that fewer than 1/10 of 1% of Web sites with advertising make enough to cover their bandwidth and server costs. How do you spell s-m-a-r-t?
So, the experiment is over. I will now think a bit about how I can reclaim that valuable Web page space from my former charitable contribution to the Google cafeteria. Bring on the sushi!
Categories: Semantic Web
Brown Bag Lunch: Methods for Semantic Discovery, Annotation and Mediation
AI^3 - Fri, 04/09/2010 - 15:34
Mediating semantic heterogeneities requires tools and automation (or semi-automation) at scale. But existing tools are still crude and lack across-the-board integration. This is one of the next challenges in getting more widespread acceptance of the semantic Web.
In earlier posts, I described the significant progress in climbing the data federation pyramid, today’s evolution in emphasis to the semantic Web, and the 40 or so sources of semantic heterogeneity. We now transition to an overview of how one goes about providing these semantics and resolving these heterogeneities.
Why the Need for Tools and Automation?
In an excellent recent overview of semantic Web progress, Paul Warren points out:[1]
Although knowledge workers no doubt believe in the value of annotating their documents, the pressure to create metadata isn’t present. In fact, the pressure of time will work in a counter direction. Annotation’s benefits accrue to other workers; the knowledge creator only benefits if a community of knowledge workers abides by the same rules. . . . Developing semiautomatic tools for learning ontologies and extracting metadata is a key research area . . . .Having to move out of a user’s typical working environment to ‘do knowledge management’ will act as a disincentive, whether the user is creating or retrieving knowledge.
Of course, even assuming that ontologies are created and semantics and metadata are added to content, there still remain the nasty problems of resolving heterogeneities (semantic mediation) and efficiently storing and retrieving the metadata and semantic relationships.
Putting all of this process in place requires the infrastructure in the form of tools and automation and proper incentives and rewards for users and suppliers to conform to it.
Areas Requiring Tools and Automation
In his paper, Warren repeatedly points to the need for “semi-automatic” methods to make the semantic Web a reality. He makes fully a dozen such references, in addition to multiple references to the need for “reasoning algorithms.” In any case, here are some of the areas noted by Warren needing “semi-automatic” methods:
- Assign authoritativeness
- Learn ontologies
- Infer better search requests
- Mediate ontologies (semantic resolution)
- Support visualization
- Assign collaborations
- Infer relationships
- Extract entities
- Create ontologies
- Maintain and evolve ontologies
- Create taxonomies
- Infer trust
- Analyze links
- etc.
In a different vein, SemWebCentral lists these clusters of semantic Web-related tasks, each of which also requires tools:[2]
- Create an ontology — use a text or graphical ontology editor to create the ontology, which is then validated. The resulting ontology can then be viewed with a browser before being published
- Disambiguate data – generate a mapping between multiple ontologies to identify where classes and properties are the same
- Expose a relational database as OWL — an editor is first used to create the ontologies that represent the database schema, then the ontologies are validated, translated to OWL and then the generated OWL is validated
- Intelligently query distributed data – repository and again able to be queried
- Manually create data from an ontology — a user would use an editor to create new OWL data based on existing ontologies, which is then validated and browsable
- Programmatically interact with OWL content — custom programs can view, create, and modify OWL content with an API
- Query non-OWL data — via an annotation tool, create OWL metadata from non-OWL content
- Visualize semantic data — view semantic data in a custom visualizer.
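As one illustration of the "expose a relational database" task in the list above, here is a minimal Python sketch that walks a table and emits subject-predicate-object triples. It is only a sketch under stated assumptions: the `http://example.org/db/` namespace, the `tool` table and the `table_to_triples` helper are hypothetical, and the output is plain tuples rather than validated OWL, which the SemWebCentral workflow would produce with dedicated editors and validators.

```python
import sqlite3

# Hypothetical namespace used only for this illustration.
BASE = "http://example.org/db/"


def table_to_triples(conn, table, key_column):
    """Turn each row of `table` into (subject, predicate, object) triples."""
    cursor = conn.execute(f'SELECT * FROM "{table}"')
    columns = [description[0] for description in cursor.description]
    triples = []
    for row in cursor:
        record = dict(zip(columns, row))
        subject = f"{BASE}{table}/{record[key_column]}"
        # Each row is typed by a class named after its table.
        triples.append((subject, "rdf:type", f"{BASE}{table}"))
        for column, value in record.items():
            if column == key_column or value is None:
                continue  # the key is already in the subject; NULLs carry no statement
            triples.append((subject, f"{BASE}{table}#{column}", value))
    return triples


# A tiny in-memory database standing in for an enterprise schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tool (id INTEGER PRIMARY KEY, name TEXT, license TEXT)")
conn.executemany(
    "INSERT INTO tool VALUES (?, ?, ?)",
    [(1, "Protege", "open source"), (2, "Endeca", None)],
)
triples = table_to_triples(conn, "tool", "id")
for triple in triples:
    print(triple)
```

Even this toy version shows why tooling matters: every table, column and key needs a naming decision, and those decisions are exactly what later has to be reconciled against other ontologies.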
With some ontologies approaching tens to hundreds of thousands to millions of triples, viewing, annotating and reconciling at scale can be daunting tasks, the efforts behind which would never be undertaken without useful tools and automation.
A Workflow Perspective Helps Frame the Challenge
A 2005 paper by Izza, Vincent and Burlat (among many other excellent ones) at the first International Conference on Interoperability of Enterprise Software and Applications (INTEROP-ESA) provides a very readable overview of the role of semantics and ontologies in enterprise integration.[3] Besides proposing a fairly compelling unified framework, the authors also present a useful workflow perspective emphasizing Web services (WS), also applicable to semantics in general, that helps frame this challenge:
Generic Semantic Integration Workflow (adapted from [3])
For existing data and documents, the workflow begins with information extraction or annotation of semantics and metadata (#1) in accordance with a reference ontology. Newly found information via harvesting must also be integrated; however, external information or services may come bearing their own ontologies, in which case some form of semantic mediation is required.
Of course, this is a generic workflow, and depending on the interoperation task, different flows and steps may be required. Indeed, the overall workflow can vary by perspective and researcher, with semantic resolution workflow modeling a prime area of current investigations. (As one alternative among scores, see for example Cardoso and Sheth.[4])
Matching and Mapping Semantic Heterogeneities
Semantic mediation is a process of matching schemas and mapping attributes and values, often with intermediate transformations (such as unit or language conversions) also required. The general problem of schema integration is not new, with one prior reference going back as early as 1986.[5] According to Alon Halevy:[6]
As would be expected, people have tried building semi-automated schema-matching systems by employing a variety of heuristics. The process of reconciling semantic heterogeneity typically involves two steps. In the first, called schema matching, we find correspondences between pairs (or larger sets) of elements of the two schemas that refer to the same concepts or objects in the real world. In the second step, we build on these correspondences to create the actual schema mapping expressions.
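The two steps Halevy describes can be sketched in a few lines of Python. This is an illustrative sketch, not any particular tool's method: the field names, the hand-written `correspondences` table and the `map_record` helper are all hypothetical. Step one records which elements of the two schemas refer to the same concept; step two attaches the transformation (here a unit conversion and a language-code lookup) that turns a source value into a target value.

```python
# Step 1 (schema matching): correspondences between elements of two schemas.
# These pairs are hand-written here; a matcher would normally propose them.
correspondences = {
    "full_name": "name",
    "height_in": "height_cm",
    "lang": "language",
}

# Step 2 (schema mapping): a transformation for each correspondence that needs one.
transforms = {
    "height_in": lambda inches: round(inches * 2.54, 1),  # unit conversion
    "lang": lambda code: {"en": "English", "fr": "French"}.get(code, code),
}


def map_record(source_record):
    """Rewrite a source-schema record into the target schema."""
    target = {}
    for source_field, value in source_record.items():
        target_field = correspondences.get(source_field)
        if target_field is None:
            continue  # no correspondence found, so the field is dropped
        convert = transforms.get(source_field, lambda v: v)
        target[target_field] = convert(value)
    return target


mapped = map_record(
    {"full_name": "Ada Lovelace", "height_in": 65, "lang": "en", "shoe": 7}
)
print(mapped)
```

Keeping the correspondences separate from the transformations mirrors the two-step split: the first table can be proposed by a matcher and reviewed by a person, while the second encodes the mapping expressions built on top of it.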
The issues of matching and mapping have been addressed in many tools, notably commercial ones from MetaMatrix,[7] and open source and academic projects such as Piazza, [8] SIMILE, [9] and the WSMX (Web service modeling execution environment) protocol from DERI. [10] [11] A superb description of the challenges in reconciling the vocabularies of different data sources is also found in the thesis by Dr. AnHai Doan, which won the 2003 ACM’s Prestigious Doctoral Dissertation Award.[12]
What all of these efforts have found is the inability to completely automate the mediation process. The current state-of-the-art is to reconcile what is largely unambiguous automatically, and then prompt analysts or subject matter experts to decide the questionable matches. These are known as “semi-automated” systems and the user interface and data presentation and workflow become as important as the underlying matching and mapping algorithms. According to the WSMX project, there is always a trade-off between how accurate these mappings are and the degree of automation that can be offered.
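A semi-automated matcher of the kind described above can be sketched as a simple triage: accept high-confidence matches automatically, queue the ambiguous ones for an analyst, and set aside the rest. The sketch below uses string similarity from Python's standard `difflib` module as a stand-in heuristic; the two thresholds, the field names and the `triage` function are assumptions for illustration and do not come from any of the projects cited.

```python
from difflib import SequenceMatcher

# Assumed thresholds; a deployment would tune these against reviewed matches.
AUTO_ACCEPT = 0.85
NEEDS_REVIEW = 0.50


def similarity(a, b):
    """Case-insensitive string similarity between two field names (0.0 to 1.0)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triage(source_fields, target_fields):
    """Split candidate matches into accepted, needs-review and unmatched."""
    accepted, review, unmatched = {}, {}, []
    for source in source_fields:
        best = max(target_fields, key=lambda target: similarity(source, target))
        score = similarity(source, best)
        if score >= AUTO_ACCEPT:
            accepted[source] = best  # unambiguous: reconcile automatically
        elif score >= NEEDS_REVIEW:
            review[source] = (best, round(score, 2))  # questionable: ask an analyst
        else:
            unmatched.append(source)  # no plausible candidate found
    return accepted, review, unmatched


accepted, review, unmatched = triage(
    ["PostalCode", "surname", "dob"],
    ["postal_code", "last_name", "birth_date"],
)
print("accepted:", accepted)
print("review:", review)
print("unmatched:", unmatched)
```

Raising `AUTO_ACCEPT` trades automation for accuracy, which is the trade-off the WSMX project points to. Note also that a purely lexical heuristic leaves `dob` unmatched even though a person would pair it with `birth_date` at once, which is why the review queue and its user interface matter as much as the algorithm.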
Also a Need for Efficient Semantic Data Stores
Once all of these reconciliations take place there is the (often undiscussed) need to index, store and retrieve these semantics and their relationships at scale, particularly for enterprise deployments. This is a topic I have addressed many times from the standpoint of scalability, more scalability, and comparisons of database and relational technologies, but it is also not a new topic in the general community.
As Stonebraker and Hellerstein note in their retrospective covering 35 years of development in databases,[13] some of the first post-relational data models were typically called semantic data models, including those of Smith and Smith in 1977[14] and Hammer and McLeod in 1981.[15] Perhaps what is different now is our ability to address some of the fundamental issues.
At any rate, this subsection is included here because of the hidden importance of database foundations. It is therefore a topic often addressed in this series.
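To make the index, store and retrieve requirement concrete, here is a toy in-memory triple store in Python. It is a sketch under stated assumptions, not how Kowari, Sesame or any other store listed in this post is implemented: the `TripleStore` class and the `ex:` identifiers are hypothetical. It keeps one index per triple position so that a pattern query with any combination of fixed terms avoids a full scan.

```python
from collections import defaultdict


class TripleStore:
    """Toy store with one index per position: subject, predicate, object."""

    def __init__(self):
        self._triples = set()
        self._by_position = [defaultdict(set), defaultdict(set), defaultdict(set)]

    def add(self, subject, predicate, obj):
        triple = (subject, predicate, obj)
        if triple in self._triples:
            return  # already stored
        self._triples.add(triple)
        for index, term in zip(self._by_position, triple):
            index[term].add(triple)

    def match(self, subject=None, predicate=None, obj=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        candidates = None
        for index, term in zip(self._by_position, (subject, predicate, obj)):
            if term is None:
                continue
            found = index.get(term, set())
            candidates = found if candidates is None else candidates & found
        if candidates is None:
            candidates = self._triples  # no constraints: everything matches
        return sorted(candidates)


store = TripleStore()
store.add("ex:Protege", "rdf:type", "ex:OntologyEditor")
store.add("ex:SWOOP", "rdf:type", "ex:OntologyEditor")
store.add("ex:Protege", "ex:writtenIn", "ex:Java")

editors = store.match(predicate="rdf:type", obj="ex:OntologyEditor")
print(editors)
```

The cost of this design is that every triple is held several times over, which is one reason storage and retrieval become a real engineering concern once triple counts reach the millions.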
A Partial Listing of Semantic Web Tools
In all of these areas, there is a growing, but still spotty, set of tools for conducting these semantic tasks. SemWebCentral, the open source tools resource center, for example, lists many tools and whether they interact or not with one another (the general answer is often No).[16] Protégé also has a fairly long list of plugins, but unfortunately it is not well organized.[17]
In the table below, I begin to compile a partial listing of semantic Web tools, with more than 50 listed. Though a few are commercial, most are open source. Also, for the open source tools, only the most prominent ones are listed (Sourceforge, for example, has about 200 projects listed with some relation to the semantic Web, though most are minor or not yet in alpha release).
| NAME | URL | DESCRIPTION |
| --- | --- | --- |
| Almo | http://ontoware.org/projects/almo | An ontology-based workflow engine in Java |
| Altova SemanticWorks | http://www.altova.com/products_semanticworks.html | Visual RDF and OWL editor that auto-generates RDF/XML or nTriples based on visual ontology design |
| Bibster | http://bibster.semanticweb.org/ | A semantics-based bibliographic peer-to-peer system |
| cwm | http://www.w3.org/2000/10/swap/doc/cwm.html | A general purpose data processor for the semantic Web |
| Deep Query Manager | http://www.brightplanet.com/products/dqm_overview.asp | Search federator from deep Web sources |
| DOSE | https://sourceforge.net/projects/dose | A distributed platform for semantic annotation |
| ekoss.org | http://www.ekoss.org/ | A collaborative knowledge sharing environment where model developers can submit advertisements |
| Endeca | http://www.endeca.com | Facet-based content organizer and search platform |
| FOAM | http://ontoware.org/projects/map | Framework for ontology alignment and mapping |
| Gnowsis | http://www.gnowsis.org/ | A semantic desktop environment |
| GrOWL | http://ecoinformatics.uvm.edu/technologies/growl-knowledge-modeler.html | Open source graphical ontology browser and editor |
| HAWK | http://swat.cse.lehigh.edu/projects/index.html#hawk | OWL repository framework and toolkit |
| HELENOS | http://ontoware.org/projects/artemis | A knowledge discovery workbench for the semantic Web |
| Jambalaya | http://www.thechiselgroup.org/jambalaya | Protégé plug-in for visualizing ontologies |
| Jastor | http://jastor.sourceforge.net/ | Open source Java code generator that emits Java Beans from ontologies |
| Jena | http://jena.sourceforge.net/ | Open source ontology API written in Java |
| KAON | http://kaon.semanticweb.org/ | Open source ontology management infrastructure |
| Kazuki | http://projects.semwebcentral.org/projects/kazuki/ | Generates a Java API for working with OWL instance data directly from a set of OWL ontologies |
| Kowari | http://www.kowari.org/ | Open source database for RDF and OWL |
| LuMriX | http://www.lumrix.net/xmlsearch.php | A commercial search engine using semantic Web technologies |
| MetaMatrix | http://www.metamatrix.com/ | Semantic vocabulary mediation and other tools |
| Metatomix | http://www.metatomix.com/ | Commercial semantic toolkits and editors |
| MindRaider | http://mindraider.sourceforge.net/index.html | Open source semantic Web outline editor |
| Model Futures OWL Editor | http://www.modelfutures.com/OwlEditor.html | Simple OWL tools, featuring UML (XMI), ErWin, thesaurus and imports |
| Net OWL | http://www.netowl.com/ | Entity extraction engine from SRA International |
| Nokia Semantic Web Server | https://sourceforge.net/projects/sws-uriqa | An RDF based knowledge portal for publishing both authoritative and third party descriptions of URI denoted resources |
| OntoEdit/OntoStudio | http://ontoedit.com/ | Engineering environment for ontologies |
| OntoMat Annotizer | http://annotation.semanticweb.org/ontomat | Interactive Web page OWL and semantic annotator tool |
| Oyster | http://ontoware.org/projects/oyster | Peer-to-peer system for storing and sharing ontology metadata |
| Piggy Bank | http://simile.mit.edu/piggy-bank/ | A Firefox-based semantic Web browser |
| Pike | http://pike.ida.liu.se/ | A dynamic programming (scripting) language similar to Java and C for the semantic Web |
| pOWL | http://powl.sourceforge.net/index.php | Semantic Web development platform |
| Protégé | http://protege.stanford.edu/ | Open source visual ontology editor written in Java with many plug-in tools |
| RACER Project | https://sourceforge.net/projects/racerproject | A collection of projects and tools to be used with the semantic reasoning engine RacerPro |
| RDFReactor | http://rdfreactor.ontoware.org/ | Access RDF from Java using inferencing |
| Redland | http://librdf.org/ | Open source software libraries supporting RDF |
| RelationalOWL | https://sourceforge.net/projects/relational-owl | Automatically extracts the semantics of virtually any relational database and transforms this information automatically into RDF/OWL |
| Semantical | http://semantical.org/ | Open source semantic Web search engine |
| SemanticWorks | http://www.altova.com/products_semanticworks.html | SemanticWorks RDF/OWL Editor |
| Semantic Mediawiki | https://sourceforge.net/projects/semediawiki | Semantic extension to the MediaWiki wiki |
| Semantic Net Generator | https://sourceforge.net/projects/semantag | Utility for generating topic maps automatically |
| Sesame | http://www.openrdf.org/ | An open source RDF database with support for RDF Schema inferencing and querying |
| SMART | http://web.ict.nsc.ru/smart/index.phtml?lang=en | System for Managing Applications based on RDF Technology |
| SMORE | http://www.mindswap.org/2005/SMORE/ | OWL markup for HTML pages |
| SPARQL | http://www.w3.org/TR/rdf-sparql-query/ | Query language for RDF |
| SWCLOS | http://iswc2004.semanticweb.org/demos/32/ | A semantic Web processor using Lisp |
| Swoogle | http://swoogle.umbc.edu/ | A semantic Web search engine with 1.5 M resources |
| SWOOP | http://www.mindswap.org/2004/SWOOP/ | A lightweight ontology editor |
| Turtle | http://www.ilrt.bris.ac.uk/discovery/2004/01/turtle/ | Terse RDF “Triple” language |
| WSMO Studio | https://sourceforge.net/projects/wsmostudio | A semantic Web service editor compliant with WSMO as a set of Eclipse plug-ins |
| WSMT Toolkit | https://sourceforge.net/projects/wsmt | The Web Service Modeling Toolkit (WSMT) is a collection of tools for use with the Web Service Modeling Ontology (WSMO), the Web Service Modeling Language (WSML) and the Web Service Execution Environment (WSMX) |
| WSMX | https://sourceforge.net/projects/wsmx/ | Execution environment for dynamic use of semantic Web services |

Tools Still Crude, Integration Not Compelling
Individually, there are some impressive and capable tools on this list. Generally, however, the interfaces are not intuitive, integration between tools is lacking, and there is no clear case for why and how standard analysts should embrace them. In the semantic Web, we have yet to see an application of the magnitude of the first Mosaic browser that made HTML and the World Wide Web compelling.
It is perhaps likely that a similar “killer app” may not be forthcoming for the semantic Web. But it is important to remember just how entwined tools are to accelerating acceptance and growth of new standards and protocols.
This Friday brown bag leftover was first placed into the AI3 refrigerator about four years ago on June 12, 2006. It was the follow-on to last week’s Brown Bag Lunch posting. It is also the first attempt I made at assembling semantic Web- and -related tools, which has now grown into the 800+ Sweet Tools listing. No changes have been made to the original posting.
[1] Paul Warren, “Knowledge Management and the Semantic Web: From Scenario to Technology,” IEEE Intelligent Systems, vol. 21, no. 1, 2006, pp. 53-59. See http://dsonline.computer.org/portal/site/dsonline/menuitem.9ed3d9924aeb0dcd82ccc6716bbe36ec/index.jsp?&pName=dso_level1&path=dsonline/2006/02&file=x1war.xml&xsl=article.xsl&
[2] See http://www.semwebcentral.org/index.jsp?page=workflows. [Link now missing.]
[3] Said Izza, Lucien Vincent and Patrick Burlat, “A Unified Framework for Enterprise Integration: An Ontology-Driven Service-Oriented Approach,” pp. 78-89, in Pre-proceedings of the First International Conference on Interoperability of Enterprise Software and Applications (INTEROP-ESA’2005), Geneva, Switzerland, February 23 – 25, 2005, 618 pp. See http://interop-esa05.unige.ch/INTEROP/Proceedings/Interop-ESAScientific/OneFile/InteropESAproceedings.pdf.
[4] Jorge Cardoso and Amit Sheth, “Semantic Web Processes: Semantics Enabled Annotation, Discovery, Composition and Orchestration of Web Scale Processes,” in the 4th International Conference on Web Information Systems Engineering (WISE 2003), December 10-12, 2003, Rome, Italy. See http://lsdis.cs.uga.edu/lib/presentations/WISE2003-Tutorial.pdf.
[5] C. Batini, M. Lenzerini, and S.B. Navathe, “A Comparative Analysis of Methodologies for Database Schema Integration,” in ACM Computing Survey, 18(4):323-364, 1986.
[6] Alon Halevy, “Why Your Data Won’t Mix,” ACM Queue vol. 3, no. 8, October 2005. See http://www.acmqueue.org/modules.php?name=Content&pa=showpage&pid=336.
[7] Chuck Moser, Semantic Interoperability: Automatically Resolving Vocabularies, presented at the 4th Semantic Interoperability Conference, February 10, 2006. See http://colab.cim3.net/file/work/SICoP/2006-02-09/Presentations/CMosher02102006.ppt.
[8] Alon Y. Halevy, Zachary G. Ives, Peter Mork and Igor Tatarinov, “Piazza: Data Management Infrastructure for Semantic Web Applications,” Journal of Web Semantics, Vol. 1 No. 2, February 2004, pp. 155-175. See http://www.cis.upenn.edu/~zives/research/piazza-www03.pdf.
[9] Stefano Mazzocchi, Stephen Garland, Ryan Lee, “SIMILE: Practical Metadata for the Semantic Web,” January 26, 2005. See http://www.xml.com/pub/a/2005/01/26/simile.html.
[10] Adrian Mocan, Ed., “WSMX Data Mediation,” in WSMX Working Draft, W3C Organization, 11 October 2005. See http://www.wsmo.org/TR/d13/d13.3/v0.2/20051011.
[11] J. Madhavan, P. A. Bernstein, P. Domingos and A. Y. Halevy, “Representing and Reasoning About Mappings Between Domain Models,” in the Eighteenth National Conference on Artificial Intelligence, pp. 80-86, Edmonton, Alberta, Canada, July 28-August 01, 2002.
[12] AnHai Doan, Learning to Map between Structured Representations of Data, Ph.D. Thesis to the Computer Science & Engineering Department, University of Washington, 2002, 133 pp. See http://anhai.cs.uiuc.edu/home/thesis/anhai-thesis.pdf.
[13] Michael Stonebraker and Joey Hellerstein, “What Goes Around Comes Around,” in Joseph M. Hellerstein and Michael Stonebraker, editors, Readings in Database Systems, Fourth Edition, pp. 2-41, The MIT Press, Cambridge, MA, 2005. See http://mitpress.mit.edu/books/chapters/0262693143chapm1.pdf.
[14] John Miles Smith and Diane C. P. Smith, “Database Abstractions: Aggregation and Generalization,” ACM Transactions on Database Systems 2(2): 105-133, 1977.
[15] Michael Hammer and Dennis McLeod, “Database Description with SDM: A Semantic Database Model,” ACM Transactions on Database Systems 6(3): 351-386, 1981.
[16] See http://www.semwebcentral.org/index.jsp?page=home.
[17] See http://protege.cim3.net/cgi-bin/wiki.pl?ProtegePluginsLibraryByType.
Categories: Semantic Web





