public class ProsodyGeneric extends InternalModule

Modifier and Type | Field and Description |
---|---|
protected boolean | accentedSyllables |
protected String | accentPriorities |
protected boolean | applyParagraphDeclination |
protected HashMap<String,Object> | listMap |
protected static Pattern | nextPlusXAttributesPattern |
protected static Pattern | nextPlusXTextPattern |
protected String | paragraphDeclination |
protected static Pattern | previousMinusXAttributesPattern |
protected static Pattern | previousMinusXTextPattern |
protected Properties | priorities |
protected String | syllableAccents |
protected HashMap<String,String> | toBI2ContourMap |
protected String | tobiPredFilename |
protected HashMap<String,Element> | tobiPredMap |

Fields inherited from class InternalModule: logger, state
Fields inherited from interface MaryModule: MODULE_OFFLINE, MODULE_RUNNING

Constructor and Description |
---|
ProsodyGeneric() |
ProsodyGeneric(Locale locale) |
ProsodyGeneric(Locale locale, String propertyPrefix) |
ProsodyGeneric(MaryDataType inputType, MaryDataType outputType, Locale locale, String tobipredFileName, String accentPriorities, String syllableAccents, String paragraphDeclination) |
ProsodyGeneric(String locale) |
ProsodyGeneric(String locale, String propertyPrefix) |

Modifier and Type | Method and Description |
---|---|
protected boolean | applyRules(Node n): Verify whether this Node has a parent preventing the application of intonation rules. |
protected void | buildListMap() |
protected boolean | checkAttributes(Element currentRulePart, Element token): Checks a rule part with tag "attributes": tests whether the MaryXML attributes and values of the current token are the same as in the rule. |
protected boolean | checkAttributesOfOtherToken(String tag, Element currentRulePart, int position, NodeList tokens): Checks a rule part with tag "nextAttributes", "previousAttributes", "nextPlusXAttributes" or "previousMinusXAttributes": tests whether the MaryXML attributes and values of a token other than the current one are the same as in the rule. |
protected boolean | checkFolTokens(Element currentRulePart, int position, NodeList tokens): Checks a rule part with tag "folTokens", which currently has only the "num" attribute: tests whether the number of tokens following the current token is the same as the value of the num attribute. |
protected boolean | checkFolWords(Element currentRulePart, int position, NodeList tokens): Checks a rule part with tag "folWords", which currently has only the "num" attribute: tests whether the number of words following the current token is the same as the value of the num attribute. |
protected boolean | checkList(String currentVal, String tokenValue): Checks if tokenValue is contained in list. |
protected boolean | checkPrevTokens(Element currentRulePart, int position, NodeList tokens): Checks a rule part with tag "prevTokens", which currently has only the "num" attribute: tests whether the number of tokens preceding the current token is the same as the value of the num attribute. |
protected boolean | checkPrevWords(Element currentRulePart, int position, NodeList tokens): Checks a rule part with tag "prevWords", which currently has only the "num" attribute: tests whether the number of words preceding the current token is the same as the value of the num attribute. |
protected boolean | checkProsodicPosition(Element currentRulePart, String prosodicPositionType): Checks a rule part with tag "prosodicPosition", which currently has only the "type" attribute: tests whether the prosodic position of a token is the same as the value of the type attribute in the rule. Values: prenuclear, nuclearParagraphFinal, nuclearParagraphNonFinal, postnuclear. |
protected boolean | checkRulePart(Element currentRulePart, Element token, NodeList tokens, int position, String sentenceType, String specialPositionType, String tokenText): Checks the condition of a rule part. |
protected boolean | checkSentence(Element currentRulePart, String sentenceType): Checks a rule part with tag "sentence", which currently has only the "type" attribute: tests whether the sentence type of a token is the same as the value of the type attribute in the rule. |
protected boolean | checkSpecialPosition(Element currentRulePart, String specialPositionType): Checks a rule part with tag "specialPosition", which currently has only the "type" attribute: tests whether the specialPosition value of a token is the same as the value of the type attribute in the rule. Values: endofvorfeld, endofpar (end of paragraph). |
protected boolean | checkText(Element currentRulePart, String tokenText): Checks a rule part with tag "text", which currently has only the "word" attribute: tests whether the text of a token is the same as the value of the word attribute in the rule. |
protected boolean | checkTextOfOtherToken(String tag, Element currentRulePart, int position, NodeList tokens): Checks a rule part with tag "nextText", "previousText", "nextPlusXText" or "previousMinusXText", which currently has only the "word" attribute: tests whether the text of a token is the same as the value of the word attribute in the rule. |
protected void | copyAccentsToSyllables(Document doc): Go through all tokens in a document, and copy any accents to the first accented syllable. |
protected void | getAccentPosition(Element token, NodeList tokens, int position, String sentenceType, String specialPositionType): Checks whether the token receives an accent. The information is contained in the accentposition part of the rules in the XML file. The token attribute "accent" receives the value "tone", "force" (force accent, German "Druckakzent") or "" (no accent). |
protected boolean | getAccentShape(Element token, NodeList tokens, int position, String sentenceType, String specialPositionType, boolean nucleusAssigned): Determines accent types: tokens with accent="tone" receive an accent type (e.g. "L+H*"); accent="force" becomes "*". The relevant information is contained in the accentshape part of the rules in the XML file. |
protected Element | getBoundary(Element token, NodeList tokens, int position, String sentenceType, String specialPositionType, boolean invalidXML, Element firstTokenInPhrase): Checks whether a boundary is to be inserted after the current token. The information is contained in the boundaries part of the rules in the XML file. |
protected String | getForceAccent(Element token): Check whether token is enclosed by a <prosody> element containing an attribute force-accent. |
protected String | getSentenceType(NodeList tokens): Determines the sentence type. Values: decl, excl, interrog, interrogYN or interrogW. |
protected Element | insertBoundary(Element token, String tone, int bi): Insert a boundary after token, with the given tone and breakindex. |
protected Element | insertMajorBoundary(NodeList tokens, int i, Element firstToken, String tone, int breakindex): Insert a major boundary after token number i in tokens. |
protected boolean | insertPhraseNode(Element first, Element last): Insert a phrase element, enclosing the first and last element, into the tree. |
protected boolean | isPunctuation(Element token): Verify whether a given token is punctuation. |
protected void | loadTobiPredRules() |
MaryData | process(MaryData d): Perform this module's processing on abstract "MaryData" input d. |
protected void | processSentence(Element sentence) |
protected Object | readListFromResource(String resourceName): Read a list from an external file. |
protected void | setAccent(Element token, String accent): Assign an accent to the given token. |
void | startup(): Allow the module to start up, performing whatever is necessary to become operational. |

Methods inherited from class InternalModule: getInputType, getLocale, getOutputType, getState, inputType, name, outputType, powerOnSelfTest, shutdown
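
The list condition handled by checkList (a condition of the form INLIST: or !INLIST: followed by a list name, tested against a token value) can be illustrated with a small stand-alone sketch. This is not the MaryTTS implementation: the class ListConditionSketch, its addList helper, and the choice to store each list as a set of strings are assumptions made for illustration only.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in, not part of MaryTTS.
class ListConditionSketch {
    // Assumption: every named list is held in memory as a set of strings.
    private final Map<String, Set<String>> lists = new HashMap<>();

    // Hypothetical helper that registers a named list.
    void addList(String name, String... entries) {
        lists.put(name, new HashSet<>(Arrays.asList(entries)));
    }

    // Evaluates "INLIST:name" (value must be in the list) or
    // "!INLIST:name" (value must not be in the list).
    boolean checkList(String condition, String tokenValue) {
        boolean negated = condition.startsWith("!INLIST:");
        if (!negated && !condition.startsWith("INLIST:")) {
            throw new IllegalArgumentException("Not a list condition: " + condition);
        }
        String listName = condition.substring(condition.indexOf(':') + 1);
        Set<String> list = lists.get(listName);
        if (list == null) {
            throw new IllegalArgumentException("Unknown list: " + listName);
        }
        boolean contained = list.contains(tokenValue);
        return negated ? !contained : contained;
    }

    public static void main(String[] args) {
        ListConditionSketch sketch = new ListConditionSketch();
        sketch.addList("functionWords", "the", "a", "of");
        System.out.println(sketch.checkList("INLIST:functionWords", "the"));
        System.out.println(sketch.checkList("!INLIST:functionWords", "the"));
    }
}
```

The sketch rejects unknown list names with an exception; how the real module reacts to a missing list is not stated in this documentation.
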
protected String paragraphDeclination
protected boolean applyParagraphDeclination
protected String syllableAccents
protected boolean accentedSyllables
protected String accentPriorities
protected Properties priorities
protected String tobiPredFilename
protected static final Pattern nextPlusXTextPattern
protected static final Pattern previousMinusXTextPattern
protected static final Pattern nextPlusXAttributesPattern
protected static final Pattern previousMinusXAttributesPattern
public ProsodyGeneric()
public ProsodyGeneric(MaryDataType inputType, MaryDataType outputType, Locale locale, String tobipredFileName, String accentPriorities, String syllableAccents, String paragraphDeclination)
public ProsodyGeneric(String locale)
public ProsodyGeneric(Locale locale)
public void startup() throws Exception
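Description copied from interface: MaryModule
Allow the module to start up, performing whatever is necessary to become operational.
Specified by: startup in interface MaryModule
Overrides: startup in class InternalModule
Throws:
Exception - Exception
protected void loadTobiPredRules() throws FactoryConfigurationError, ParserConfigurationException, SAXException, IOException, NoSuchPropertyException, MaryConfigurationException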
protected void buildListMap() throws IOException
Throws: IOException
protected Object readListFromResource(String resourceName) throws IOException
Read a list from an external file. This implementation reads text files (file names ending in .txt). Subclasses may override this method to provide additional file formats. They must make sure that checkList() can deal with all list formats.
Parameters:
resourceName - resource file in classpath from which to read the list; suffix identifies list format.
Throws:
IllegalArgumentException - if the suffix of resourceName cannot be identified as a list file format.
IOException - if the resource given in resourceName cannot be found or read from
public MaryData process(MaryData d) throws Exception
Description copied from class: InternalModule
Perform this module's processing on abstract "MaryData" input d. Subclasses need to make sure that the process() method is thread-safe, because in server mode it will be called from different threads at the same time. A sensible way to do this is to avoid global or static variables, or to use them read-only.
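
The thread-safety advice above can be sketched as follows. The class ThreadSafeModuleSketch is a hypothetical stand-in that does not extend the real InternalModule, and the rule keys and values it loads are made-up placeholders. It only shows the pattern: shared state is filled once at startup and then only read, while each call to process() keeps its working data in local variables.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for a module; not the MaryTTS API.
class ThreadSafeModuleSketch {
    // Shared state: assigned once in startup(), read-only afterwards.
    private volatile Map<String, String> rules = Collections.emptyMap();

    void startup() {
        Map<String, String> loaded = new HashMap<>();
        loaded.put("decl", "L-%");      // placeholder values for illustration
        loaded.put("interrog", "H-%");
        rules = Collections.unmodifiableMap(loaded);
    }

    // Only local variables are written here, so concurrent calls
    // from several threads cannot interfere with each other.
    String process(String sentenceType) {
        String tone = rules.get(sentenceType);
        StringBuilder result = new StringBuilder(sentenceType);
        result.append(" -> ").append(tone == null ? "none" : tone);
        return result.toString();
    }

    public static void main(String[] args) throws InterruptedException {
        ThreadSafeModuleSketch module = new ThreadSafeModuleSketch();
        module.startup();
        Runnable task = () -> System.out.println(module.process("decl"));
        Thread a = new Thread(task);
        Thread b = new Thread(task);
        a.start();
        b.start();
        a.join();
        b.join();
    }
}
```

A per-call scratch field on the module object would break this pattern, because two threads could overwrite each other's intermediate results.
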
Specified by: process in interface MaryModule
Overrides: process in class InternalModule
Parameters:
d - d
Returns:
A MaryData object of type outputType() encapsulating the processing result. This method just returns its input. Subclasses should override this.
Throws:
Exception - Exception
protected void processSentence(Element sentence)
protected void getAccentPosition(Element token, NodeList tokens, int position, String sentenceType, String specialPositionType)
Parameters:
token - current token
tokens - list of all tokens in sentence
position - position in token list
sentenceType - declarative, exclamative or interrogative
specialPositionType - end of vorfeld or end of paragraph
protected boolean getAccentShape(Element token, NodeList tokens, int position, String sentenceType, String specialPositionType, boolean nucleusAssigned)
Parameters:
token - current token
tokens - list of all tokens in sentence
position - position
sentenceType - declarative, exclamative or interrogative
specialPositionType - position in sentence
nucleusAssigned - whether the nuclear accent is already assigned
protected Element getBoundary(Element token, NodeList tokens, int position, String sentenceType, String specialPositionType, boolean invalidXML, Element firstTokenInPhrase)
Parameters:
token - current token
tokens - list of tokens in sentence
position - position in token list
sentenceType - declarative, exclamative or interrogative
specialPositionType - endofvorfeld if sentence has vorfeld and the next token is a finite verb or end of paragraph
invalidXML - true if xml structure allows boundary insertion
firstTokenInPhrase - begin of intonation phrase
protected boolean checkRulePart(Element currentRulePart, Element token, NodeList tokens, int position, String sentenceType, String specialPositionType, String tokenText)
Parameters:
currentRulePart - currentRulePart
token - current token
tokens - list of all tokens
position - position in token list
sentenceType - declarative, exclamative or interrogative
specialPositionType - special position in sentence (end of vorfeld) or text (end of paragraph)
tokenText - text of token
protected boolean checkText(Element currentRulePart, String tokenText)
Parameters:
currentRulePart - currentRulePart
tokenText - tokenText
protected boolean checkTextOfOtherToken(String tag, Element currentRulePart, int position, NodeList tokens)
Parameters:
tag - tag
currentRulePart - currentRulePart
position - position
tokens - tokens
protected boolean checkFolTokens(Element currentRulePart, int position, NodeList tokens)
Parameters:
currentRulePart - currentRulePart
position - position
tokens - tokens
protected boolean checkPrevTokens(Element currentRulePart, int position, NodeList tokens)
Parameters:
currentRulePart - currentRulePart
position - position
tokens - tokens
protected boolean checkFolWords(Element currentRulePart, int position, NodeList tokens)
Parameters:
currentRulePart - currentRulePart
position - position
tokens - tokens
protected boolean checkPrevWords(Element currentRulePart, int position, NodeList tokens)
Parameters:
currentRulePart - currentRulePart
position - position
tokens - tokens
protected boolean checkSentence(Element currentRulePart, String sentenceType)
Parameters:
currentRulePart - currentRulePart
sentenceType - sentenceType
protected boolean checkSpecialPosition(Element currentRulePart, String specialPositionType)
Parameters:
currentRulePart - currentRulePart
specialPositionType - specialPositionType
protected boolean checkProsodicPosition(Element currentRulePart, String prosodicPositionType)
Parameters:
currentRulePart - currentRulePart
prosodicPositionType - prosodicPositionType
protected boolean checkAttributes(Element currentRulePart, Element token)
Parameters:
currentRulePart - currentRulePart
token - token
protected boolean checkAttributesOfOtherToken(String tag, Element currentRulePart, int position, NodeList tokens)
Parameters:
tag - tag
currentRulePart - currentRulePart
position - position
tokens - tokens
protected boolean checkList(String currentVal, String tokenValue)
Parameters:
currentVal - the condition to check; can be either INLIST: or !INLIST: followed by the list name to check.
tokenValue - value to look up in the list
protected String getSentenceType(NodeList tokens)
Parameters:
tokens - tokens
protected void setAccent(Element token, String accent)
Parameters:
token - a token element
accent - the accent string to assign.
protected Element insertBoundary(Element token, String tone, int bi)
Parameters:
token - token
tone - tone
bi - bi
protected Element insertMajorBoundary(NodeList tokens, int i, Element firstToken, String tone, int breakindex)
Insert a major boundary after token number i in tokens. Also inserts a phrase tag at the appropriate position.
Parameters:
tokens - tokens
i - i
firstToken - firstToken
tone - tone
breakindex - breakindex
protected boolean insertPhraseNode(Element first, Element last)
Parameters:
first - first
last - last
protected boolean applyRules(Node n)
Parameters:
n - n
Returns:
true if rules are to be applied, false otherwise.
protected void copyAccentsToSyllables(Document doc)
Parameters:
doc - doc
protected String getForceAccent(Element token)
Check whether token is enclosed by a <prosody> element containing an attribute force-accent.
Parameters:
token - token
Returns:
the value of the force-accent attribute, if one exists, or the empty string otherwise.
protected boolean isPunctuation(Element token)
Parameters:
token - the t element to be tested.
Copyright © 2000–2016 DFKI GmbH. All rights reserved.