Approximate Subgraph Matching Algorithm (ASM) |
The subgraph matching problem (subgraph isomorphism) is NP-complete. Previously, we designed an exact subgraph matching (ESM) algorithm for dependency graphs using a backtracking approach (http://esmalgorithm.sourceforge.net). We further designed an approximate subgraph matching (ASM) algorithm that is capable of detecting approximate subgraph matching based on a subgraph distance. Assume that the graph G and the subgraph Gs have m and n vertices, and km and kn edges respectively, the total worst-case algorithm complexity is O(m^n * n(n-1)/2 * km * log m).
We have demonstrated the successful usage of our algorithm in different biomedical information extraction (relation and event) applications: BioNLP 2011 and 2013 shared tasks on event extraction in molecular biology and cancer genetics, and Protein-Residue association detection.
The ASM result natually subsumes the ESM result, i.e., approximate subgraph matching includes exact subgraph matching. However, if you look for exact subgraph isomorphism or graph isomorphism, please consider the ESM implementation (http://esmalgorithm.sourceforge.net), which is more efficient in complexity.
If you use our ASM implementation to support academic research, please cite the following paper:
Haibin Liu, Lawrence Hunter, Vlado Keselj, and Karin Verspoor. Approximate Subgraph Matching-based Literature Mining for Biomedical Events and Relations. PLOS ONE, 8:4 e60954, 2013.
Source code and resources pertaining to the ASM 1.0 release are available here
Java API documentation can be found at http://asmalgorithm.sourceforge.net/apidocs/
Version 1.0 of the ASM is available via a Maven repository. If you use Maven as your build tool, you can add the ASM as a dependency by adding the following to your pom.xml file:
<dependency> <groupId>edu.ucdenver.ccp</groupId> <artifactId>asm</artifactId> <version>1.0</version> </dependency> <repository> <id>bionlp-sourceforge</id> <url>http://svn.code.sf.net/p/bionlp/code/repo/</url> </repository>