### Abstract

Subgraph querying has wide applications in various fields such as cheminformatics and bioinformatics. Given a query graph, q, a subgraph-querying algorithm retrieves all graphs, D(q), which have q as a subgraph, from a graph database, D. Subgraph querying is costly because it uses subgraph isomorphism tests, which are NP-complete. Graph indices are commonly used to improve the performance of subgraph querying in graph databases. Subgraph-querying algorithms first construct a candidate answer set by filtering out a set of false answers and then verify each candidate graph using subgraph isomorphism tests. To build graph indices, various kinds of substructure (subgraph, subtree, or path) features have been proposed with the goal of maximizing the filtering rate. Each of them works with a specifically designed index structure, for example, discriminative and frequent subgraph features work with gIndex, δ-TCFG features work with FG-index, etc. We propose Lindex, a graph index, which indexes subgraphs contained in database graphs. Nodes in Lindex represent key-value pairs where the key is a subgraph in a database and the value is a list of database graphs containing the key. We propose two heuristics that are used in the construction of Lindex that allows us to determine answers to subgraph queries conducting less subgraph isomorphism tests. Consequently, Lindex improves subgraph-querying efficiency. In addition, Lindex is compatible with any choice of features. Empirically, we demonstrate that Lindex used in conjunction with subgraph indexing features proposed in previous works outperforms other specifically designed index structures. As a novel index structure, Lindex (1) is effective in filtering false graphs (2) provides fast index lookups, (3) is fast with respect to index construction and maintenance, and (4) can be constructed using any set of substructure index features. These four properties result in a fast and scalable subgraph-querying infrastructure. We substantiate the benefits of Lindex and its disk-resident variation Lindex+ theoretically and empirically.

Original language | English |
---|---|

Pages (from-to) | 229-252 |

Number of pages | 24 |

Journal | VLDB Journal |

Volume | 22 |

Issue number | 2 |

DOIs | |

Publication status | Published - 2013 |

Externally published | Yes |

### Fingerprint

### ASJC Scopus subject areas

- Hardware and Architecture
- Information Systems

### Cite this

*VLDB Journal*,

*22*(2), 229-252. https://doi.org/10.1007/s00778-012-0284-8

**Lindex : A lattice-based index for graph databases.** / Yuan, Dayu; Mitra, Prasenjit.

Research output: Contribution to journal › Article

*VLDB Journal*, vol. 22, no. 2, pp. 229-252. https://doi.org/10.1007/s00778-012-0284-8

}

TY - JOUR

T1 - Lindex

T2 - A lattice-based index for graph databases

AU - Yuan, Dayu

AU - Mitra, Prasenjit

PY - 2013

Y1 - 2013

N2 - Subgraph querying has wide applications in various fields such as cheminformatics and bioinformatics. Given a query graph, q, a subgraph-querying algorithm retrieves all graphs, D(q), which have q as a subgraph, from a graph database, D. Subgraph querying is costly because it uses subgraph isomorphism tests, which are NP-complete. Graph indices are commonly used to improve the performance of subgraph querying in graph databases. Subgraph-querying algorithms first construct a candidate answer set by filtering out a set of false answers and then verify each candidate graph using subgraph isomorphism tests. To build graph indices, various kinds of substructure (subgraph, subtree, or path) features have been proposed with the goal of maximizing the filtering rate. Each of them works with a specifically designed index structure, for example, discriminative and frequent subgraph features work with gIndex, δ-TCFG features work with FG-index, etc. We propose Lindex, a graph index, which indexes subgraphs contained in database graphs. Nodes in Lindex represent key-value pairs where the key is a subgraph in a database and the value is a list of database graphs containing the key. We propose two heuristics that are used in the construction of Lindex that allows us to determine answers to subgraph queries conducting less subgraph isomorphism tests. Consequently, Lindex improves subgraph-querying efficiency. In addition, Lindex is compatible with any choice of features. Empirically, we demonstrate that Lindex used in conjunction with subgraph indexing features proposed in previous works outperforms other specifically designed index structures. As a novel index structure, Lindex (1) is effective in filtering false graphs (2) provides fast index lookups, (3) is fast with respect to index construction and maintenance, and (4) can be constructed using any set of substructure index features. These four properties result in a fast and scalable subgraph-querying infrastructure. We substantiate the benefits of Lindex and its disk-resident variation Lindex+ theoretically and empirically.

AB - Subgraph querying has wide applications in various fields such as cheminformatics and bioinformatics. Given a query graph, q, a subgraph-querying algorithm retrieves all graphs, D(q), which have q as a subgraph, from a graph database, D. Subgraph querying is costly because it uses subgraph isomorphism tests, which are NP-complete. Graph indices are commonly used to improve the performance of subgraph querying in graph databases. Subgraph-querying algorithms first construct a candidate answer set by filtering out a set of false answers and then verify each candidate graph using subgraph isomorphism tests. To build graph indices, various kinds of substructure (subgraph, subtree, or path) features have been proposed with the goal of maximizing the filtering rate. Each of them works with a specifically designed index structure, for example, discriminative and frequent subgraph features work with gIndex, δ-TCFG features work with FG-index, etc. We propose Lindex, a graph index, which indexes subgraphs contained in database graphs. Nodes in Lindex represent key-value pairs where the key is a subgraph in a database and the value is a list of database graphs containing the key. We propose two heuristics that are used in the construction of Lindex that allows us to determine answers to subgraph queries conducting less subgraph isomorphism tests. Consequently, Lindex improves subgraph-querying efficiency. In addition, Lindex is compatible with any choice of features. Empirically, we demonstrate that Lindex used in conjunction with subgraph indexing features proposed in previous works outperforms other specifically designed index structures. As a novel index structure, Lindex (1) is effective in filtering false graphs (2) provides fast index lookups, (3) is fast with respect to index construction and maintenance, and (4) can be constructed using any set of substructure index features. These four properties result in a fast and scalable subgraph-querying infrastructure. We substantiate the benefits of Lindex and its disk-resident variation Lindex+ theoretically and empirically.

UR - http://www.scopus.com/inward/record.url?scp=84875405898&partnerID=8YFLogxK

UR - http://www.scopus.com/inward/citedby.url?scp=84875405898&partnerID=8YFLogxK

U2 - 10.1007/s00778-012-0284-8

DO - 10.1007/s00778-012-0284-8

M3 - Article

AN - SCOPUS:84875405898

VL - 22

SP - 229

EP - 252

JO - VLDB Journal

JF - VLDB Journal

SN - 1066-8888

IS - 2

ER -