public abstract class ExpansionPattern extends Object
Modifier and Type | Field and Description |
---|---|
protected static AbbrevEP | abbrev |
protected static CompositeEP | composite |
protected static CurrencyEP | currency |
protected static DateEP | date |
protected static DurationEP | duration |
protected static MeasureEP | measure |
protected static MultiWordEP | multiword |
protected static NetEP | net |
protected static NumberEP | number |
protected static SpecialCharEP | specialChar |
protected static TelephoneEP | telephone |
protected static TimeEP | time |
Constructor and Description |
---|
ExpansionPattern() |
Modifier and Type | Method and Description |
---|---|
protected boolean | allowMultipleTokens(): Whether patterns of this type can be composed of several tokens. |
static List<ExpansionPattern> | allPatterns() |
protected abstract int | canDealWith(String input, int typeCode): Decide whether we can expand a string according to type typeCode. |
protected boolean | doesFullExpansion(): Indicates whether this module performs a full expansion of the input, or whether other patterns should be applied after this one. |
protected abstract List<Element> | expand(List<Element> tokens, String text, int typeCode): Subclasses do their expansion in this method. |
static ExpansionPattern | getPattern(String typeString) |
static String | getSplitAtChars(): A string containing the characters at which a token should be split into parts before any preprocessing patterns are applied. |
protected boolean | isCandidate(Element t) |
abstract List<String> | knownTypes(): Returns the types known by this ExpansionPattern. |
protected List<Element> | makeNewTokens(Document doc, String newText): The default way to create new token DOM elements from whitespace-separated tokens in a string. |
protected List<Element> | makeNewTokens(Document doc, String newText, boolean createMtu, String origText) |
protected List<Element> | makeNewTokens(Document doc, String newText, boolean createMtu, String origText, boolean forceAccents) |
void | match(Element sayas, String typeString): Try to match and expand the entirety of tokens enclosed by the say-as tag sayas. |
protected abstract int | match(String input, int typeCode): Subclasses do their matching in this method. |
boolean | process(Element t, List<Element> expanded): Try to match this pattern starting at token t. |
abstract Pattern | reMatchingChars(): Returns the regular expression object matching any of the chars occurring in the pattern. |
protected void | replaceTokens(List<Element> oldTokens, List<Element> newTokens) |
static Pattern | reSplitAtChars(): A regular expression matching the characters at which a token should be split into parts before any preprocessing patterns are applied. |
protected void | slowDown(Element e): Enclose token in a <prosody rate="..."> tag in order to slow the spelling down, and in a <phonology> tag in order to enforce precise pronunciation. |
protected void | slowDown(Element first, Element last): Enclose the elements' closest common ancestor. |
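
The typeCode convention shared by canDealWith and match (an index into knownTypes(), with 0 as the general type and -1 signalling failure) can be illustrated with a small stand-alone sketch. The class TypeCodeSketch, its type names, and its regular expressions are hypothetical and not part of this API; the sketch only shows how a general type code might resolve to a more specific subtype.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

public class TypeCodeSketch {
    // Hypothetical type names; index 0 plays the role of the general type.
    static final List<String> KNOWN_TYPES =
            Arrays.asList("number", "number:ordinal", "number:digits");

    // Hypothetical recognisers for the two specific subtypes.
    static final Pattern ORDINAL = Pattern.compile("\\d+\\.");
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns the index of the type actually matched, or -1 on failure.
    static int match(String input, int typeCode) {
        switch (typeCode) {
            case 0: // general type: may resolve to a more specific subtype
                if (ORDINAL.matcher(input).matches()) return 1;
                if (DIGITS.matcher(input).matches()) return 2;
                return -1;
            case 1:
                return ORDINAL.matcher(input).matches() ? 1 : -1;
            case 2:
                return DIGITS.matcher(input).matches() ? 2 : -1;
            default: // unknown type code
                return -1;
        }
    }
}
```

With these assumed types, a caller that passes the general code 0 gets back the index of whichever subtype matched, and can look up its name with KNOWN_TYPES.get(index).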
protected static MultiWordEP multiword
protected static CompositeEP composite
protected static NetEP net
protected static DateEP date
protected static TimeEP time
protected static DurationEP duration
protected static CurrencyEP currency
protected static MeasureEP measure
protected static TelephoneEP telephone
protected static NumberEP number
protected static AbbrevEP abbrev
protected static SpecialCharEP specialChar
public static List<ExpansionPattern> allPatterns()
public static ExpansionPattern getPattern(String typeString)
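public static Pattern reSplitAtChars()
A regular expression matching the characters at which a token should be split into parts before any preprocessing patterns are applied.
See also: SpecialCharEP.getRESplitAtChars()
public static String getSplitAtChars()
A string containing the characters at which a token should be split into parts before any preprocessing patterns are applied.
See also: SpecialCharEP.splitAtChars()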
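As a rough illustration of splitting a token at a set of characters, the following stand-alone sketch builds a regular expression from a string of split characters and cuts a token into parts. The class SplitAtCharsSketch, the example characters, and the choice to keep each split character as its own part are assumptions made for illustration; the actual character set is defined by SpecialCharEP and is not listed here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SplitAtCharsSketch {
    // Build an alternation that matches any single character of 'chars'.
    // Assumes 'chars' is non-empty.
    static Pattern buildSplitPattern(String chars) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < chars.length(); i++) {
            if (i > 0) sb.append('|');
            // Quote each character so regex metacharacters are taken literally.
            sb.append(Pattern.quote(String.valueOf(chars.charAt(i))));
        }
        return Pattern.compile(sb.toString());
    }

    // Split 'token' at every match, keeping each split character as a separate part
    // (an assumption of this sketch).
    static List<String> split(String token, Pattern splitAt) {
        List<String> parts = new ArrayList<>();
        Matcher m = splitAt.matcher(token);
        int last = 0;
        while (m.find()) {
            if (m.start() > last) parts.add(token.substring(last, m.start()));
            parts.add(m.group());
            last = m.end();
        }
        if (last < token.length()) parts.add(token.substring(last));
        return parts;
    }
}
```

For example, with the hypothetical split characters "/-", the token "km/h" would be cut into the parts "km", "/" and "h".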
protected boolean allowMultipleTokens()
protected boolean doesFullExpansion()
public abstract List<String> knownTypes()
Returns the types known by this ExpansionPattern. These are possible values of the type attribute of the say-as element, as defined in MaryXML.dtd. Each subclass needs to override this to return something meaningful.
public abstract Pattern reMatchingChars()
public boolean process(Element t, List<Element> expanded)
Try to match this pattern starting at token t. If successful, replace the matched tokens with their expanded form.
Parameters:
t - the element to expand. After processing, this Element will still exist and be a valid Element, but possibly with different content, and possibly enclosed by an <mtu> element. In addition, <t> may have new right-hand neighbors.
expanded - an empty list into which the expanded Elements are placed if an expansion occurred. The list will remain empty if no expansion was performed. Elements placed in the list are not guaranteed to be only t elements, but may be elements enclosing the expanded t elements, such as mtu elements, as well as non-t empty elements (such as boundary elements). If the list is non-empty, it is guaranteed to contain (either directly or as descendants of the list items) at least one t element.
protected boolean isCandidate(Element t)
public void match(Element sayas, String typeString) throws DOMException
Try to match and expand the entirety of tokens enclosed by the say-as tag sayas. The type of data to expand is given by typeString. If the tokens can be matched according to that type, they are expanded. Throws DOMException if the tag name of sayas is not "say-as".
Parameters:
sayas - sayas
typeString - typeString
Throws:
DOMException - DOMException
protected abstract int canDealWith(String input, int typeCode)
Decide whether we can expand a string according to type typeCode. This is important in cases where a particular expansion is requested via a say-as element. As a default, reply that a string can be expanded if it would be matched by the pattern recogniser. Subclasses may wish to override this with less strict requirements. Returns the type as which it can be expanded, or -1 if expansion is not possible.
Parameters:
input - input
typeCode - typeCode
protected abstract int match(String input, int typeCode)
Subclasses do their matching in this method.
Parameters:
input - the String to be matched.
typeCode - the index in knownTypes to match with.
Returns:
the type actually matched (if typeCode is a general type (typeCode == 0), it may have matched with a more specific subtype). On failure, -1 is returned.
protected abstract List<Element> expand(List<Element> tokens, String text, int typeCode)
Subclasses do their expansion in this method.
Parameters:
tokens - a list of token Elements to be replaced with their expanded form. The expanded forms are inserted into the DOM tree at the same positions as the tokens in the list tokens. If there are more new tokens than old tokens, the rest are inserted as siblings at the position of the last old token.
text - the String to be expanded.
typeCode - the index in knownTypes that this string has matched with before.
protected List<Element> makeNewTokens(Document doc, String newText)
The default way to create new token DOM elements from whitespace-separated tokens in a string. Tokens are expected to have the form graph or graph[phon], where the optional phon, if present, is set as the value of the sampa attribute of the t element.
All expansion patterns that do not require any special attribute settings should create their new tokens using this method. Returns a list of token elements created from Document doc, but not yet attached in the tree.
Parameters:
doc - doc
newText - newText
protected List<Element> makeNewTokens(Document doc, String newText, boolean createMtu, String origText)
protected List<Element> makeNewTokens(Document doc, String newText, boolean createMtu, String origText, boolean forceAccents)
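The graph or graph[phon] token form described for makeNewTokens(Document doc, String newText) can be sketched with the JDK's DOM API alone. This is a minimal stand-alone approximation, not the class's actual implementation: the class NewTokensSketch, its regular expression, the helper newDocument(), and the decisions to skip malformed parts and to ignore XML namespaces are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

public class NewTokensSketch {
    // graph, optionally followed by [phon]; assumes phon contains no whitespace or brackets.
    private static final Pattern TOKEN =
            Pattern.compile("([^\\[\\]\\s]+)(?:\\[([^\\]]*)\\])?");

    // Create one detached <t> element per whitespace-separated token in newText.
    static List<Element> makeNewTokens(Document doc, String newText) {
        List<Element> tokens = new ArrayList<>();
        for (String part : newText.trim().split("\\s+")) {
            Matcher m = TOKEN.matcher(part);
            if (!m.matches()) continue; // this sketch skips empty or malformed parts
            Element t = doc.createElement("t"); // namespace handling omitted
            t.setTextContent(m.group(1));
            if (m.group(2) != null) {
                t.setAttribute("sampa", m.group(2));
            }
            tokens.add(t); // created from doc, but not attached to the tree
        }
        return tokens;
    }

    // Hypothetical helper: an empty DOM document to create elements from.
    static Document newDocument() {
        try {
            return DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        } catch (ParserConfigurationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Under these assumptions, the input "zum Beispiel[baI-Spi:l]" would yield two detached t elements, the second carrying a sampa attribute with the bracketed transcription as its value.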
protected void slowDown(Element e)
Parameters:
e - e
Copyright © 2000–2016 DFKI GmbH. All rights reserved.