public class Preprocess extends InternalModule
Can process the following formats:
May include:
Modifier and Type | Field and Description |
---|---|
protected String | cardinalRule |
protected String | ordinalRule |
protected String | yearRule |
Fields inherited from class InternalModule: logger, state
Fields inherited from interface MaryModule: MODULE_OFFLINE, MODULE_RUNNING
Constructor and Description |
---|
Preprocess() |
Modifier and Type | Method and Description |
---|---|
protected void | expand(Document doc): Processes a document in MaryXML format, expanding tokens into words that can be phonemised. |
protected String | expandAbbreviation(String abbrev, boolean isCapital) |
protected String | expandAcronym(String acronym) |
protected String | expandConsonants(String consonants): Adds a space between each character of a string. |
protected String | expandDate(String date) |
protected String | expandDuration(String duration) |
protected String | expandHashtag(String hashtag) |
protected String | expandMoney(String money, String currency) |
protected String | expandNumber(double number) |
protected String | expandNumberS(String numberS): Expands a digit followed by an "s". |
protected String | expandOrdinal(double number) |
protected String | expandRange(String range) |
protected String | expandRealNumber(String number) |
protected String | expandTime(String time, boolean isNextTokenTime) |
protected String | expandURL(String email): Partially expands a URL string by splitting it on "@", "/" and ".". |
protected String | expandWordNumber(String wordnumseq) |
protected String | expandYear(double number) |
protected String | expandYearBCAD(String year) |
protected static String | getOrdinalRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf): Tries to extract the rule name for "expand ordinal" from the given RuleBasedNumberFormat. |
protected static String | getYearRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf): Tries to extract the rule name for "expand year" from the given RuleBasedNumberFormat. |
static Map<Object,Object> | loadAbbrevMap() |
MaryData | process(MaryData d): Performs this module's processing on abstract "MaryData" input d. |
protected String | splitContraction(String contraction) |
Methods inherited from class InternalModule: getInputType, getLocale, getOutputType, getState, inputType, name, outputType, powerOnSelfTest, shutdown, startup
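The two purely string-level helpers described in the summary, expandConsonants and expandURL, can be illustrated with a minimal stand-alone sketch. The class name StringExpanders is hypothetical, and reading each delimiter aloud as "at", "slash" or "dot" is an assumption: the summary only says the URL is split by @, / and ., and the real methods may behave differently.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketches of two string-level expanders; not the MaryTTS source.
final class StringExpanders {
    private StringExpanders() {}

    // Puts a single space between consecutive characters (code points): "BBC" -> "B B C".
    static String expandConsonants(String consonants) {
        StringBuilder sb = new StringBuilder();
        consonants.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.appendCodePoint(cp);
        });
        return sb.toString();
    }

    // Splits on '@', '/' and '.'; speaking each delimiter as "at", "slash" or "dot"
    // is an assumption made for this illustration.
    static String expandURL(String url) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < url.length(); i++) {
            char c = url.charAt(i);
            String spoken = delimiterWord(c);
            if (spoken == null) {
                current.append(c); // ordinary character: keep building the current chunk
                continue;
            }
            if (current.length() > 0) {
                words.add(current.toString());
                current.setLength(0);
            }
            words.add(spoken);
        }
        if (current.length() > 0) {
            words.add(current.toString());
        }
        return String.join(" ", words);
    }

    // Maps a delimiter to its spoken form, or returns null for any other character.
    private static String delimiterWord(char c) {
        switch (c) {
            case '@': return "at";
            case '/': return "slash";
            case '.': return "dot";
            default: return null;
        }
    }
}
```

With these assumptions, `StringExpanders.expandURL("john.doe@example.com")` returns "john dot doe at example dot com", and `expandConsonants("BBC")` returns "B B C".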
protected final String cardinalRule
protected final String ordinalRule
protected final String yearRule
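These three fields are final, which fits the thread-safety advice in the process() description that follows: in server mode, process() runs on several threads at once, so shared state should be read-only. A minimal sketch of that pattern, using a hypothetical ReadOnlyExpander class (not part of MaryTTS) whose only shared state is an unmodifiable lookup table built in the constructor, while each call works only on local variables:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical module illustrating read-only shared state; not the MaryTTS source.
final class ReadOnlyExpander {
    // Built once in the constructor and never mutated, so threads may share it.
    private final Map<String, String> table;

    ReadOnlyExpander(Map<String, String> entries) {
        this.table = Collections.unmodifiableMap(new HashMap<>(entries));
    }

    // Uses only the read-only table and local variables, so concurrent calls do not interfere.
    String process(String sentence) {
        StringBuilder out = new StringBuilder();
        for (String token : sentence.trim().split("\\s+")) {
            if (out.length() > 0) {
                out.append(' ');
            }
            out.append(table.getOrDefault(token, token));
        }
        return out.toString();
    }

    // Calls process() from several threads at once, as a server might.
    List<String> processAll(List<String> sentences, int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String s : sentences) {
                futures.add(pool.submit(() -> process(s)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> f : futures) {
                results.add(f.get()); // results stay in input order
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e.getCause());
        } finally {
            pool.shutdown();
        }
    }
}
```

Because nothing shared is written after construction, processAll returns the same results as calling process() sequentially, whatever the thread interleaving.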
public MaryData process(MaryData d) throws Exception
Description copied from class: InternalModule
Performs this module's processing on abstract "MaryData" input d. Subclasses need to make sure that the process() method is thread-safe, because in server mode it will be called from different threads at the same time. A sensible way to do this seems to be not to use any global or static variables, or to use them read-only.
Specified by: process in interface MaryModule
Overrides: process in class InternalModule
Parameters:
d - the input data
Returns:
a MaryData object of type outputType() encapsulating the processing result. (In InternalModule itself, this method just returns its input; subclasses should override it.)
Throws:
Exception - if processing fails

protected void expand(Document doc) throws ParseException, IOException, MaryConfigurationException
Parameters:
doc - the MaryXML document to process
Throws:
ParseException - parse exception
IOException - I/O exception
MaryConfigurationException - MARY configuration exception

protected String expandNumber(double number)
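expand(Document) is summarized as turning MaryXML tokens into words that can be phonemised. The following is a minimal DOM sketch of such a pass under stated assumptions: tokens are `t` elements whose text is the token, and expandToken is a hypothetical rule that spells out all-uppercase tokens letter by letter. The class TokenPass and its methods are illustrative only; how the real module chooses among expandNumber, expandDate and the other expanders is not shown.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical token pass over a MaryXML-like DOM; not the MaryTTS source.
final class TokenPass {
    private TokenPass() {}

    // Parses an XML string into a DOM Document (helper for the usage example).
    static Document parse(String xml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("could not parse XML", e);
        }
    }

    // Rewrites the text of every <t> (token) element in place; the element name is an assumption.
    static void expand(Document doc) {
        NodeList tokens = doc.getElementsByTagName("t");
        for (int i = 0; i < tokens.getLength(); i++) {
            Element t = (Element) tokens.item(i);
            t.setTextContent(expandToken(t.getTextContent().trim()));
        }
    }

    // Hypothetical rule: spell out all-uppercase tokens letter by letter.
    static String expandToken(String token) {
        if (token.length() > 1 && token.chars().allMatch(Character::isUpperCase)) {
            return String.join(" ", token.split(""));
        }
        return token;
    }
}
```

Under these assumptions, a document containing `<t>BBC</t><t>news</t>` ends up with the token texts "B B C" and "news".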
protected String expandOrdinal(double number)
protected String expandYear(double number)

protected String expandURL(String email)
Parameters:
email - the URL or e-mail address to expand

protected String expandConsonants(String consonants)
Parameters:
consonants - the consonant string to expand

protected String expandNumberS(String numberS)
Parameters:
numberS - the token to expand: a digit followed by an "s"

protected String expandAbbreviation(String abbrev, boolean isCapital)
Parameters:
abbrev - the token to be expanded
isCapital - whether the following token begins with a capital letter

protected String expandDate(String date) throws ParseException
Throws:
ParseException
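loadAbbrevMap() returns a Map<Object,Object> and may throw IOException, which is consistent with (though not proof of) loading a java.util.Properties file, since Properties extends Hashtable<Object,Object>. The sketch below assumes that format, with hypothetical `abbreviation=expansion` lines, and shows a plain lookup of the kind expandAbbreviation might perform. The class AbbrevTable is hypothetical, and the isCapital flag is left out because its effect is not documented here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Map;
import java.util.Properties;

// Hypothetical abbreviation table; the key=value file format is an assumption.
final class AbbrevTable {
    private AbbrevTable() {}

    // Reads "abbreviation=expansion" lines; Properties is itself a Map<Object,Object>.
    static Map<Object, Object> load(Reader source) throws IOException {
        Properties props = new Properties();
        props.load(source);
        return props;
    }

    // Convenience wrapper for in-memory text, where an IOException is not expected.
    static Map<Object, Object> loadFromString(String text) {
        try {
            return load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Returns the expansion if the token is a known abbreviation, else the token unchanged.
    static String expand(Map<Object, Object> abbrevs, String token) {
        Object expansion = abbrevs.get(token);
        return expansion != null ? expansion.toString() : token;
    }
}
```

Falling back to the unchanged token keeps unknown abbreviations intact for later stages instead of failing.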

protected String expandTime(String time, boolean isNextTokenTime)
Parameters:
time - the token to be expanded
isNextTokenTime - whether the following token contains "am" or "pm"

protected static String getOrdinalRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf)
Tries to extract the rule name for "expand ordinal" from the given RuleBasedNumberFormat. The rule name is locale-sensitive, but usually starts with "%spellout-ordinal".
Parameters:
rbnf - the RuleBasedNumberFormat from which to extract the rule name

protected static String getYearRuleName(com.ibm.icu.text.RuleBasedNumberFormat rbnf)
Tries to extract the rule name for "expand year" from the given RuleBasedNumberFormat. The rule name is locale-sensitive, but usually starts with "%spellout-numbering-year".
Parameters:
rbnf - the RuleBasedNumberFormat from which to extract the rule name

public static Map<Object,Object> loadAbbrevMap() throws IOException
Throws:
IOException
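getOrdinalRuleName and getYearRuleName look for locale-sensitive rule names that usually start with "%spellout-ordinal" and "%spellout-numbering-year". One plausible approach, assumed here rather than taken from the source, is a prefix search over the rule set names returned by ICU4J's RuleBasedNumberFormat.getRuleSetNames(). The hypothetical RuleNames helper takes the names as a plain String[] so that it runs without ICU.

```java
import java.util.Arrays;
import java.util.Optional;

// Hypothetical helper; with ICU4J the names would come from RuleBasedNumberFormat.getRuleSetNames().
final class RuleNames {
    private RuleNames() {}

    // Returns the first rule set name starting with the given prefix, if any.
    // Order matters: a prefix such as "%spellout-ordinal" can also match
    // longer variant names that some locales define.
    static Optional<String> findRule(String[] ruleSetNames, String prefix) {
        return Arrays.stream(ruleSetNames)
                .filter(name -> name.startsWith(prefix))
                .findFirst();
    }
}
```

With ICU4J available, the array would come from rbnf.getRuleSetNames(), and the chosen name could be passed to rbnf.format(number, ruleName). How the real methods handle a locale with no matching rule is not documented here; this sketch returns an empty Optional.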
Copyright © 2000–2016 DFKI GmbH. All rights reserved.