public class Miner
extends java.lang.Object
implements java.lang.Runnable
Modifier and Type | Field | Description |
---|---|---|
static int | ALLEXTS | flag for generating all extensions |
static int | AROMATIZE | flag for converting Kekulé representations |
protected moss.RepoElem[] | bins | the repository of processed substructures (hash table) |
protected long | canonic | for benchmarking: canonical form pruning counter |
static int | CHAINEXT | flag for extensions by chains |
protected long | chains | for benchmarking: invalid chains counter |
static int | CLASSES | flag for using node equivalence classes |
static int | CLOSED | flag for restriction to closed fragments |
static int | CLOSERINGS | flag for filtering open rings |
protected CanonicalForm | cnf | the canonical form and restricted extension generator |
protected int[] | cnts | the numbers of graphs in focus and complement |
protected Recoder | coder | the recoder for the node types |
protected int | comp | the maximum support in the complement as an absolute value |
static java.lang.String | COPYRIGHT | the copyright information for this program |
protected NamedGraph | curr | the current insertion point for the focus |
static int | DEFAULT | default search mode flags: edge extensions, embeddings, canonical form and full perfect extension pruning |
static java.lang.String | DESCRIPTION | the program description |
protected long | duplic | for benchmarking: duplicate fragments counter |
static int | EDGEEXT | flag for extensions by single edges |
protected long | embcmp | for benchmarking: the number of comparisons with embeddings |
protected long | embcnt | for benchmarking: the number of created embeddings |
protected int | emblvl | the level at which to switch to embeddings |
protected long | equiv | for benchmarking: equivalent frag. |
static int | EQVARS | flag for extensions by equivalent variants of rings |
protected Graph | exseed | the node types that are excluded as seeds |
protected Graph | extype | the excluded node types |
protected double | fcomp | the maximum support in the complement as a fraction |
protected Fragment | frag | the initial fragment (embedded seed structure) |
protected long | fragcmp | for benchmarking: the number of fragment comparisons |
protected long | fragcnt | for benchmarking: the number of created fragments |
protected double | fsupp | the minimum support in the focus as a fraction |
protected NamedGraph | graphs | the list of graphs to mine (database) |
protected int | group | the group for graphs with a value below the threshold |
protected long | invalid | for benchmarking: invalid fragments counter |
protected long | isocnt | for benchmarking: the number of isomorphism tests |
protected java.io.PrintStream | log | stream to write progress messages to |
static int | LOGIC | flag for conversion to logic representation |
protected long | lowsupp | for benchmarking: insufficient support pruning counter |
protected int[] | masks | the masks for nodes and edges |
protected int | max | the maximum size of substructures to report (number of nodes) |
protected int | maxdep | for benchmarking: the maximum depth of the search tree |
protected int | maxepg | the maximum number of embeddings per graph |
static int | MERGERINGS | flag for merging ring extensions with the same first edge |
protected int | min | the minimum size of substructures to report (number of nodes) |
protected int | mode | the search mode flags |
protected long | nodecnt | for benchmarking: the number of search tree nodes |
protected long | nonclsd | for benchmarking: non-closed fragments counter |
protected CanonicalForm | norm | the canonical form for normalizing the output |
static int | NORMFORM | flag for normalized substructure output |
static int | NOSTATS | flag for no search statistics output |
protected long | openrgs | for benchmarking: open ring fragments counter |
static int | ORBITS | flag for extension filtering with node orbits |
protected long | perfect | for benchmarking: perfect extension pruning counter |
static int | PR_CANONIC | flag for canonical form pruning |
static int | PR_EQUIV | flag for equivalent sibling extension pruning |
static int | PR_PARTIAL | flag for partial perfect extension pruning |
static int | PR_PERFECT | flag for full perfect extension pruning |
static int | PR_UNCLOSE | flag for pruning fragments with unclosable rings |
protected GraphReader | reader | the graph data set file reader |
protected int | recnt | the size of the repository (number of substructures) |
protected long | repcnt | for benchmarking: the number of repository accesses |
protected int | rgmax | the maximum size of rings (number of nodes/edges) |
protected int | rgmin | the minimum size of rings (number of nodes/edges) |
static int | RINGEXT | flag for extensions by rings |
protected long | ringord | for benchmarking: ring order pruning counter |
protected Graph | seed | the seed structure to start the search from |
protected int | subcnt | the number of reported substructures |
protected int | supp | the minimum support in the focus as an absolute value |
protected NamedGraph | tail | the tail of the list of graphs (insertion point for complement) |
protected double | thresh | the threshold for the split into focus and complement |
static int | TRANSFORM | flag for conversion to another description format |
protected int | type | the type of support to use |
static int | UNEMBED | flag for unembedding siblings of the current search tree nodes |
static int | VERBOSE | flag for verbose reporting |
static java.lang.String | VERSION | the version of this program |
protected Notation | vntn | the notation for verbose output |
protected java.io.Writer | wrids | the identifier file writer |
protected GraphWriter | writer | the substructure file writer |
Constructor | Description |
---|---|
Miner() | Create an empty miner with default parameter settings. |
Modifier and Type | Method | Description |
---|---|---|
void | abort() | Abort the miner (if running as a thread). |
void | addGraph(NamedGraph graph) | Add a graph to the database. |
int | embed() | Embed the seed structure into all graphs. |
int | getCurrent() | Get the substructures that have been found up to now. |
java.lang.Throwable | getError() | Get the error status of the search process. |
void | init(java.lang.String[] args) | Initialize the miner from command line arguments. |
static void | main(java.lang.String[] args) | Command line invocation of the molecular substructure miner. |
protected void | mine() | Preprocess the graphs, embed the seed, and start the search. |
protected boolean | report(Fragment frag) | Check and report a found fragment/substructure. |
void | run() | Run the miner and clean up after the search finished. |
void | setCnF(CanonicalForm cnf) | Set the canonical form. |
void | setEmbed(int level, int maxepg) | Set the embeddings parameters. |
void | setExcluded(Graph extype, Graph exseed) | Set the excluded nodes and excluded seeds. |
void | setExcluded(java.lang.String extype, java.lang.String exseed, java.lang.String format) | Set the excluded nodes and excluded seeds. |
void | setGrouping(double thresh, boolean invert) | Set the grouping parameters. |
void | setInput(GraphReader reader) | Set the input reader. |
void | setInput(java.lang.String fname, java.lang.String format) | Set the input reader. |
void | setLimits(double supp, double comp) | Set the support limits. |
void | setLog(java.io.PrintStream stream) | Sets the stream to which progress messages are written. |
void | setMasks(int node, int edge, int ringnode, int ringedge) | Set the node and edge masks. |
void | setMode(int mode) | Set the search mode. |
void | setOutput(GraphWriter writer) | Set the output writer. |
void | setOutput(GraphWriter writer, java.io.Writer wrids) | Set the output writers. |
void | setOutput(java.lang.String fname, java.lang.String format) | Set the output writer. |
void | setOutput(java.lang.String fn_sub, java.lang.String format, java.lang.String fn_ids) | Set the output writers. |
void | setRingSizes(int min, int max) | Set the minimum and maximum ring size. |
void | setSeed(Graph seed) | Set the seed structure to start the search from. |
void | setSeed(java.lang.String desc, java.lang.String format) | Set the seed structure to start the search from. |
void | setSizes(int min, int max) | Set the minimum and maximum fragment size. |
void | setType(int type) | Set the support type. |
void | stats() | Print statistics about the search. |
protected void | term() | Clean up after the search finished or was aborted. |
void | writeGraphs() | Write all graphs of the database. |
public static final java.lang.String DESCRIPTION
public static final java.lang.String VERSION
public static final java.lang.String COPYRIGHT
public static final int EDGEEXT
public static final int RINGEXT
public static final int CHAINEXT
public static final int EQVARS
public static final int ORBITS
public static final int CLASSES
public static final int ALLEXTS
public static final int CLOSED
public static final int CLOSERINGS
public static final int MERGERINGS
public static final int PR_UNCLOSE
public static final int PR_PARTIAL
public static final int PR_PERFECT
public static final int PR_EQUIV
public static final int PR_CANONIC
public static final int UNEMBED
public static final int NORMFORM
public static final int VERBOSE
public static final int AROMATIZE
public static final int TRANSFORM
public static final int LOGIC
public static final int NOSTATS
public static final int DEFAULT
protected int mode
protected int type
protected double fsupp
protected int supp
protected double fcomp
protected int comp
protected int min
protected int max
protected int rgmin
protected int rgmax
protected int[] masks
protected Recoder coder
protected Graph seed
protected Graph extype
protected Graph exseed
protected NamedGraph graphs
protected NamedGraph curr
protected NamedGraph tail
protected int[] cnts
protected int emblvl
protected Fragment frag
protected int maxepg
protected moss.RepoElem[] bins
protected int recnt
protected CanonicalForm cnf
protected CanonicalForm norm
protected int subcnt
protected GraphReader reader
protected double thresh
protected int group
protected GraphWriter writer
protected java.io.Writer wrids
protected Notation vntn
protected java.io.PrintStream log
protected int maxdep
protected long nodecnt
protected long fragcnt
protected long embcnt
protected long lowsupp
protected long perfect
protected long equiv
protected long ringord
protected long canonic
protected long duplic
protected long nonclsd
protected long openrgs
protected long chains
protected long invalid
protected long repcnt
protected long fragcmp
protected long isocnt
protected long embcmp
public Miner()
public void setMode(int mode)
The search mode is a combination of the search mode flags, e.g. RINGEXT or PR_CANONIC.
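As a minimal sketch of how such a flag combination might be built and queried, assuming the flags are single-bit int constants combined with bitwise OR (the usual convention for int flags, not confirmed by this page). The numeric values and the helper names combine and isSet are hypothetical; the real constants are the ones defined in Miner.

```java
// Sketch only: the numeric flag values below are placeholders chosen for
// illustration; the real constants are defined in the Miner class.
final class ModeFlagsSketch {
    static final int EDGEEXT    = 0x0001; // hypothetical value
    static final int RINGEXT    = 0x0002; // hypothetical value
    static final int PR_CANONIC = 0x0004; // hypothetical value

    /** Combine any number of flags into one mode word with bitwise OR. */
    static int combine(int... flags) {
        int mode = 0;
        for (int f : flags) mode |= f;
        return mode;
    }

    /** Check whether all bits of the given flag are set in the mode word. */
    static boolean isSet(int mode, int flag) {
        return (mode & flag) == flag;
    }

    public static void main(String[] args) {
        int mode = combine(RINGEXT, PR_CANONIC);
        System.out.println("ring extensions: " + isSet(mode, RINGEXT));
        System.out.println("edge extensions: " + isSet(mode, EDGEEXT));
    }
}
```

Under the same assumption, a call on a real miner would presumably pass the combined value directly, along the lines of setMode(Miner.RINGEXT | Miner.PR_CANONIC).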
mode - the search mode
public void setType(int type)
Constants for support types are defined in the class Fragment.
type - the support type to use
See also: Fragment
public void setLimits(double supp, double comp)
Positive values are fractions of the focus or complement set, negative values are absolute numbers.
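The sign convention can be illustrated with a small, self-contained sketch. The helper toAbsolute is hypothetical and not part of Miner, and rounding a fractional limit up to the next whole graph is an assumption made here for illustration; the page does not say how fractions are converted.

```java
// Sketch only: toAbsolute is a hypothetical helper, not a Miner method.
final class SupportLimitSketch {
    /**
     * Interpret a limit as described above: a negative value is an absolute
     * number of graphs, a non-negative value is a fraction of the set.
     * Rounding the fraction up is an assumption of this sketch.
     */
    static int toAbsolute(double limit, int setSize) {
        if (limit < 0) return (int) Math.round(-limit);
        return (int) Math.ceil(limit * setSize);
    }

    public static void main(String[] args) {
        System.out.println(toAbsolute(0.5, 10)); // half of a 10-graph focus
        System.out.println(toAbsolute(-3, 10));  // three graphs, whatever the set size
    }
}
```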
supp - the minimum support in the focus
comp - the maximum support in the complement
public void setSizes(int min, int max)
min - the minimum fragment size (number of nodes)
max - the maximum fragment size (number of nodes)
public void setRingSizes(int min, int max)
min - the minimum ring size (number of nodes/edges)
max - the maximum ring size (number of nodes/edges)
public void setMasks(int node, int edge, int ringnode, int ringedge)
node - the mask for nodes outside (marked) rings
edge - the mask for edges outside (marked) rings
ringnode - the mask for nodes in (marked) rings
ringedge - the mask for edges in (marked) rings
public void setEmbed(int level, int maxepg)
Restricting the maximum number of embeddings per graph can reduce the amount of memory needed in the search, but slows down the operation (sometimes considerably).
level - the level at which to switch to embeddings
maxepg - the maximum number of embeddings per graph
public void setExcluded(Graph extype, Graph exseed)
Excluded nodes are completely removed from the search, that is, no substructure containing such a node will be reported. Nodes that are only excluded as seeds may appear in reported fragments, but are not used as seeds. This can be useful, for example, when carbon is the most frequent element and one is not interested in fragments containing only carbon nodes.
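The difference between the two exclusion sets can be sketched as two separate checks. This is an illustration only: node types are modelled as plain strings rather than Graph objects, and the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.Collection;

// Sketch only: strings stand in for node types; Miner itself takes Graph
// objects (or graph descriptions) for both exclusion sets.
final class ExclusionSketch {
    /** A fragment is reportable only if it contains no excluded node type. */
    static boolean mayReport(Collection<String> fragmentTypes,
                             Collection<String> excluded) {
        for (String t : fragmentTypes)
            if (excluded.contains(t)) return false;
        return true;
    }

    /** A node type may start the search only if it is in neither set. */
    static boolean mayUseAsSeed(String type, Collection<String> excluded,
                                Collection<String> excludedSeeds) {
        return !excluded.contains(type) && !excludedSeeds.contains(type);
    }

    public static void main(String[] args) {
        Collection<String> excluded = Arrays.asList("H");
        Collection<String> excludedSeeds = Arrays.asList("C");
        // Carbon may appear in a reported fragment ...
        System.out.println(mayReport(Arrays.asList("C", "N"), excluded));
        // ... but is not used to start the search.
        System.out.println(mayUseAsSeed("C", excluded, excludedSeeds));
    }
}
```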
extype - the node types to exclude from the search
exseed - the node types to exclude as seeds
public void setExcluded(java.lang.String extype, java.lang.String exseed, java.lang.String format) throws java.io.IOException
The arguments extype and exseed are parsed as graph descriptions in the notation given by the argument format.
extype - the description of the excluded nodes
exseed - the description of the nodes to exclude as seeds
format - the format of the descriptions
java.io.IOException
public void setSeed(Graph seed) throws java.io.IOException
seed - the seed structure for the search
java.io.IOException
public void setSeed(java.lang.String desc, java.lang.String format) throws java.io.IOException
The argument desc is parsed as a graph description in the notation given by the argument format.
desc - the description of the seed structure
format - the format of the seed description
java.io.IOException
public void setGrouping(double thresh, boolean invert)
If invert == false, all graphs having an associated value smaller than the threshold thresh are placed into the focus and all other graphs are the complement. If invert == true, this split is inverted, that is, all graphs having an associated value no less than the threshold thresh are placed into the focus and all other graphs are the complement.
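The split rule above reduces to a single comparison. A minimal sketch, in which the class and method names are hypothetical and not part of Miner:

```java
// Sketch only: inFocus is a hypothetical helper restating the rule above.
final class GroupingSketch {
    /**
     * Without inversion, values strictly below the threshold go to the focus.
     * With inversion, values no less than the threshold go to the focus.
     */
    static boolean inFocus(double value, double thresh, boolean invert) {
        boolean below = value < thresh;
        return invert ? !below : below;
    }

    public static void main(String[] args) {
        System.out.println(inFocus(0.4, 0.5, false)); // below threshold, not inverted
        System.out.println(inFocus(0.5, 0.5, true));  // at threshold, inverted
    }
}
```

Note that a value exactly equal to the threshold lands in the complement when invert is false and in the focus when invert is true.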
thresh - the threshold for the grouping
invert - whether to invert the grouping
public void setLog(java.io.PrintStream stream)
By default all messages are written to System.err.
stream - the stream to write to
public void setInput(GraphReader reader)
reader - the reader from which to read the graphs
public void setInput(java.lang.String fname, java.lang.String format) throws java.io.IOException
fname - the name of the input data file
format - the format of the input data
java.io.IOException
public void setOutput(GraphWriter writer)
writer - the writer to write the found substructures
public void setOutput(GraphWriter writer, java.io.Writer wrids)
writer - the writer to write the found substructures
wrids - the writer to write the graph identifiers
public void setOutput(java.lang.String fname, java.lang.String format) throws java.io.IOException
fname - the name of the file for the found substructures
format - the format for the output
java.io.IOException
public void setOutput(java.lang.String fn_sub, java.lang.String format, java.lang.String fn_ids) throws java.io.IOException
fn_sub - the name of the file for the found fragments
fn_ids - the name of the file for the graph identifiers
format - the format for the output
java.io.IOException
public void setCnF(CanonicalForm cnf)
cnf - the canonical form to set
public void addGraph(NamedGraph graph)
When the graph is added, its group is evaluated and it is added to the list in such a way that all focus graphs are at the beginning of the list and all complement graphs at the end. Hence the group of a graph must not be changed after it has been added to a miner. Note that the order in which the graphs are added is preserved in the focus and the complement lists.
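The ordering guarantee can be pictured with a small stand-in list. This sketch assumes nothing about the real implementation, which keeps a linked list with the insertion points curr and tail; it uses an ArrayList and an index instead, and all names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: an ArrayList-based stand-in for the graph list, showing the
// "focus first, complement last, insertion order preserved" behaviour.
final class FocusFirstListSketch {
    private final List<String> items = new ArrayList<>();
    private int focusCount = 0; // index at which the complement part begins

    /** Add a named item to the end of its own part of the list. */
    void add(String name, boolean focus) {
        if (focus) items.add(focusCount++, name); // end of the focus part
        else       items.add(name);               // end of the complement part
    }

    /** Return a copy of the current list contents. */
    List<String> snapshot() {
        return new ArrayList<>(items);
    }

    public static void main(String[] args) {
        FocusFirstListSketch db = new FocusFirstListSketch();
        db.add("a", true);
        db.add("x", false);
        db.add("b", true);
        db.add("y", false);
        System.out.println(db.snapshot());
    }
}
```

Because the position of an item is fixed at insertion time from its group, changing the group afterwards would leave it in the wrong part of the list, which is why the text above forbids it.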
graph - the graph to add
public int embed()
protected boolean report(Fragment frag) throws java.io.IOException
In order to be actually reported (written to the output file), the fragment must be valid (Fragment.isValid()), meet the maximum support requirement for the complement part of the database, be closed (Fragment.isClosed()), and must not have open rings if only fragments with closed rings are to be reported.
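Read literally, the conditions above form a chain of checks. The following sketch restates them with plain booleans and integers; it is not the body of report(), the parameter names are hypothetical, and it follows the wording above in treating closedness as always required.

```java
// Sketch only: plain values stand in for the Fragment queries named above.
final class ReportCheckSketch {
    static boolean shouldReport(boolean valid, boolean closed,
                                int compSupport, int maxCompSupport,
                                boolean hasOpenRings, boolean closedRingsOnly) {
        if (!valid) return false;                           // Fragment.isValid()
        if (compSupport > maxCompSupport) return false;     // complement limit
        if (!closed) return false;                          // Fragment.isClosed()
        if (closedRingsOnly && hasOpenRings) return false;  // open ring filter
        return true;
    }

    public static void main(String[] args) {
        // Valid, closed, within the complement limit, no open rings.
        System.out.println(shouldReport(true, true, 2, 5, false, true));
        // Same fragment, but too frequent in the complement.
        System.out.println(shouldReport(true, true, 6, 5, false, true));
    }
}
```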
frag - the fragment to report
java.io.IOException
public void writeGraphs() throws java.io.IOException
java.io.IOException
public void init(java.lang.String[] args) throws java.io.IOException
args - the command line arguments
java.io.IOException
protected void mine() throws java.io.IOException
java.io.IOException
protected void term() throws java.io.IOException
java.io.IOException
public void run()
run in interface java.lang.Runnable
public void abort()
public int getCurrent()
This function enables progress reporting by another thread. It is used in the graphical user interface (class MoSS).
If the return value is negative, it indicates the number of graphs that have been loaded, otherwise the number of substructures that have been found.
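A monitoring thread could decode the return value roughly as follows. The sketch assumes that a negative value is the negated graph count, which is the natural reading of the sentence above but is not stated explicitly; the helper describe is hypothetical.

```java
// Sketch only: describe is a hypothetical helper for a progress display.
final class ProgressSketch {
    /** Turn a getCurrent()-style value into a human-readable message. */
    static String describe(int current) {
        if (current < 0) // assumed: the magnitude is the number of graphs
            return "loading: " + (-current) + " graphs";
        return "searching: " + current + " substructures";
    }

    public static void main(String[] args) {
        System.out.println(describe(-120));
        System.out.println(describe(37));
    }
}
```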
public java.lang.Throwable getError()
With this function it can be checked, after the search with the run() method has terminated, whether an error occurred in the search. Note that an external abort with the function abort() does not trigger an exception to be thrown.
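The pattern described here (run in a thread, then inspect the stored error) can be sketched with a stand-in Runnable. The class below is not Miner; it only mimics the contract stated above, and the simulated failure and loop are assumptions for illustration.

```java
import java.io.IOException;

// Sketch only: a stand-in worker that stores any error raised in run()
// instead of propagating it, and treats an abort as a normal stop.
final class ErrorCaptureSketch implements Runnable {
    private final boolean failing;    // whether to simulate an I/O failure
    private volatile boolean stop;    // set by abort()
    private volatile Throwable error; // null while no error has occurred

    ErrorCaptureSketch(boolean failing) { this.failing = failing; }

    void abort() { stop = true; }

    Throwable getError() { return error; }

    @Override
    public void run() {
        try {
            for (int step = 0; step < 3 && !stop; step++) {
                if (failing && step == 1)
                    throw new IOException("simulated I/O failure");
            }
        } catch (Throwable t) {
            error = t; // remember the failure for the caller
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ErrorCaptureSketch worker = new ErrorCaptureSketch(true);
        Thread thread = new Thread(worker);
        thread.start();
        thread.join(); // wait until the worker has terminated
        Throwable e = worker.getError();
        System.out.println(e == null ? "search succeeded"
                                     : "search failed: " + e.getMessage());
    }
}
```

Since an abort leaves the error unset in this sketch, a caller that needs to tell an aborted run from a completed one has to track the abort request itself.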
null if the search was successful
public void stats()
public static void main(java.lang.String[] args)
args - the command line arguments