public class JPhonemiser extends InternalModule
Modifier and Type | Field and Description |
---|---|
protected AllophoneSet | allophoneSet |
protected FSTLookup | lexicon |
protected TrainedLTS | lts |
protected Pattern | punctuationPosRegex |
protected boolean | removeTrailingOneFromPhones |
protected Pattern | unpronounceablePosRegex |
protected Map<String,List<String>> | userdict |
Fields inherited from class InternalModule: logger, state
Fields inherited from interface MaryModule: MODULE_OFFLINE, MODULE_RUNNING
Constructor and Description |
---|
JPhonemiser(String propertyPrefix) |
JPhonemiser(String componentName, MaryDataType inputType, MaryDataType outputType, String allophonesProperty, String userdictProperty, String lexiconProperty, String ltsProperty): Constructor providing the individual filenames of files that are required. |
JPhonemiser(String componentName, MaryDataType inputType, MaryDataType outputType, String allophonesProperty, String userdictProperty, String lexiconProperty, String ltsProperty, String removetrailingonefromphonesProperty): Constructor providing the individual filenames of files that are required. |
Modifier and Type | Method and Description |
---|---|
AllophoneSet | getAllophoneSet(): Access the allophone set underlying this phonemiser. |
boolean | isPosPunctuation(String pos): Based on the regex compiled in setPunctuationPosRegex(), determine whether a given POS string is classified as punctuation. |
boolean | isUnpronounceable(String pos) |
String | lexiconLookup(String text, String pos): Look a given text up in the (standard) lexicon. |
boolean | maybePronounceable(String text, String pos): Determine whether a token should be pronounceable, based on text and POS tag. |
String | phonemise(String text, String pos, StringBuilder g2pMethod): Phonemise the word text. |
MaryData | process(MaryData d): Perform this module's processing on abstract "MaryData" input d. |
protected Map<String,List<String>> | readLexicon(String lexiconFilename): Read a lexicon. |
protected void | setPh(Element t, String ph) |
protected void | setPunctuationPosRegex(): Compile a regex pattern used to determine whether tokens are processed as punctuation or not, based on whether their pos attribute matches the pattern. |
protected void | setUnpronounceablePosRegex(): Compile a regex pattern used to determine whether tokens are processed as unpronounceable or not, based on whether their pos attribute matches the pattern. |
void | startup(): Allow the module to start up, performing whatever is necessary to become operational. |
String | userdictLookup(String text, String pos): Look a given text up in the userdict. |
Methods inherited from class InternalModule: getInputType, getLocale, getOutputType, getState, inputType, name, outputType, powerOnSelfTest, shutdown
protected FSTLookup lexicon
protected TrainedLTS lts
protected boolean removeTrailingOneFromPhones
protected AllophoneSet allophoneSet
protected Pattern punctuationPosRegex
protected Pattern unpronounceablePosRegex
public JPhonemiser(String propertyPrefix) throws IOException, MaryConfigurationException
public JPhonemiser(String componentName, MaryDataType inputType, MaryDataType outputType, String allophonesProperty, String userdictProperty, String lexiconProperty, String ltsProperty) throws IOException, MaryConfigurationException
Parameters:
componentName - componentName
inputType - inputType
outputType - outputType
allophonesProperty - allophonesProperty
userdictProperty - userdictProperty
lexiconProperty - lexiconProperty
ltsProperty - ltsProperty
Throws:
IOException - IOException
MaryConfigurationException - MaryConfigurationException

public JPhonemiser(String componentName, MaryDataType inputType, MaryDataType outputType, String allophonesProperty, String userdictProperty, String lexiconProperty, String ltsProperty, String removetrailingonefromphonesProperty) throws IOException, MaryConfigurationException
Parameters:
componentName - componentName
inputType - inputType
outputType - outputType
allophonesProperty - allophonesProperty
userdictProperty - userdictProperty
lexiconProperty - lexiconProperty
ltsProperty - ltsProperty
removetrailingonefromphonesProperty - removetrailingonefromphonesProperty
Throws:
IOException - IOException
MaryConfigurationException - MaryConfigurationException

public void startup() throws Exception
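Description copied from interface: MaryModule
Allow the module to start up, performing whatever is necessary to become operational.
Specified by:
startup in interface MaryModule
Overrides:
startup in class InternalModule
Throws:
Exception - Exception

public MaryData process(MaryData d) throws Exception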
Description copied from class: InternalModule
Perform this module's processing on abstract "MaryData" input d. Subclasses need to make sure that the process() method is thread-safe, because in server mode it will be called from different threads at the same time. A sensible way to do this seems to be not to use any global or static variables, or to use them read-only.
Specified by:
process in interface MaryModule
Overrides:
process in class InternalModule
Parameters:
d - d
Returns:
a MaryData object of type outputType() encapsulating the processing result. This method just returns its input. Subclasses should override this.
Throws:
Exception - Exception

public String phonemise(String text, String pos, StringBuilder g2pMethod)
Parameters:
text - the textual (graphemic) form of a word.
pos - the part-of-speech of the word
g2pMethod - This is an awkward way to return a second String parameter via a StringBuilder. If a phonemisation of the text is found, this parameter will be filled with the method of phonemisation ("lexicon", ... "rules").

public String lexiconLookup(String text, String pos)
Parameters:
text - text
pos - pos

public String userdictLookup(String text, String pos)
Parameters:
text - text
pos - pos

public AllophoneSet getAllophoneSet()
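
The g2pMethod parameter of phonemise(...) acts as an out-parameter: the return value carries the transcription, and the StringBuilder carries the name of the method that produced it. The following is a minimal sketch of that pattern only, not the actual JPhonemiser implementation. The class G2pMethodSketch, the two in-memory maps, the spellOut placeholder, the "userdict" label, and the lookup order (user dictionary, then lexicon, then rules) are all assumptions made for illustration; only the "lexicon" and "rules" labels come from the description above.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch of a lookup cascade that reports its method through a StringBuilder. */
final class G2pMethodSketch {
    // Hypothetical stand-ins for the user dictionary and the standard lexicon.
    static final Map<String, String> USERDICT = new HashMap<>();
    static final Map<String, String> LEXICON = new HashMap<>();

    static {
        USERDICT.put("mary", "m E r i");
        LEXICON.put("hello", "h @ l @U");
    }

    /**
     * Returns a transcription for text, or null if none is found.
     * On success, appends the name of the method used to g2pMethod.
     * The pos argument mirrors the documented signature but is ignored here.
     */
    static String phonemise(String text, String pos, StringBuilder g2pMethod) {
        String result = USERDICT.get(text);
        if (result != null) {
            g2pMethod.append("userdict"); // assumed label
            return result;
        }
        result = LEXICON.get(text);
        if (result != null) {
            g2pMethod.append("lexicon");
            return result;
        }
        result = spellOut(text);
        if (result != null) {
            g2pMethod.append("rules");
        }
        return result;
    }

    /** Placeholder for letter-to-sound rules: one symbol per character, null for empty input. */
    static String spellOut(String text) {
        if (text.isEmpty()) {
            return null;
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            if (i > 0) {
                sb.append(' ');
            }
            sb.append(text.charAt(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        StringBuilder method = new StringBuilder();
        String phones = phonemise("hello", "UH", method);
        System.out.println(phones + " via " + method); // prints "h @ l @U via lexicon"
    }
}
```

A caller should pass a fresh, empty StringBuilder for each word; if the builder is still empty after the call, no phonemisation was found.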
protected Map<String,List<String>> readLexicon(String lexiconFilename) throws IOException
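Parameters:
lexiconFilename - lexiconFilename
Throws:
IOException - IOException

protected void setPunctuationPosRegex()
Compile a regex pattern used to determine whether tokens are processed as punctuation or not, based on whether their pos attribute matches the pattern.

protected void setUnpronounceablePosRegex()
Compile a regex pattern used to determine whether tokens are processed as unpronounceable or not, based on whether their pos attribute matches the pattern.

public boolean isPosPunctuation(String pos)
Based on the regex compiled in setPunctuationPosRegex(), determine whether a given POS string is classified as punctuation.
Parameters:
pos - the POS tag
Throws:
NullPointerException - if the regex pattern is null (because it hasn't been set during module startup)

public boolean isUnpronounceable(String pos)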
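
The POS checks above depend on a Pattern that is compiled once and then only read, which is also why calling them before the pattern is set fails with a NullPointerException. A minimal sketch of that behaviour follows, using only java.util.regex. The class PosRegexSketch, the method compilePunctuationPosRegex(String), and the example regex are hypothetical; the real setPunctuationPosRegex() takes no arguments, and where it obtains its regex is not stated here.

```java
import java.util.regex.Pattern;

/** Sketch of a POS check backed by a pattern that is compiled once at startup. */
final class PosRegexSketch {
    // Stays null until compilePunctuationPosRegex is called.
    private Pattern punctuationPosRegex;

    /** Hypothetical setter that takes the regex directly. */
    void compilePunctuationPosRegex(String regex) {
        punctuationPosRegex = Pattern.compile(regex);
    }

    /** True if the whole POS string matches the compiled pattern. */
    boolean isPosPunctuation(String pos) {
        // Throws NullPointerException if the pattern has not been compiled yet.
        return punctuationPosRegex.matcher(pos).matches();
    }

    public static void main(String[] args) {
        PosRegexSketch sketch = new PosRegexSketch();
        try {
            sketch.isPosPunctuation(".");
        } catch (NullPointerException e) {
            System.out.println("pattern not set yet");
        }
        // Assumed example regex: the literal tag $PUNCT or a single punctuation character.
        sketch.compilePunctuationPosRegex("\\$PUNCT|[.,:]");
        System.out.println(sketch.isPosPunctuation("$PUNCT")); // prints "true"
        System.out.println(sketch.isPosPunctuation("NN"));     // prints "false"
    }
}
```

Because the compiled Pattern is immutable and only read after startup, a check of this shape can be called from several threads at once, in line with the thread-safety advice given for process().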
public boolean maybePronounceable(String text, String pos)
Parameters:
text - the text of the token
pos - the POS tag of the token

Copyright © 2000–2016 DFKI GmbH. All rights reserved.