public class LexiconCreator extends Object
Each entry in the plain-text lexicon is expected to have the following format:
grapheme | ' a l - l o - p h o n e s | (optional-part-of-speech)
The allophones must belong to the allophone set passed to the constructor. The file is expected to be encoded in UTF-8. Subclasses of LexiconCreator can
override prepareLexicon() to provide data in this format.
See also: AllophoneSet
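As an illustration of the entry format described above, the following is a minimal, self-contained sketch that assembles entries and writes them as UTF-8. It does not use any MaryTTS classes. The class name `LexiconFormatSketch`, its helper methods, and the sample transcriptions are hypothetical; real transcriptions must use symbols from the allophone set in use, and whether the part of speech is written with parentheses is not settled by the description here, so the sketch writes the field exactly as given.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of MaryTTS: builds lines of the form
// "grapheme | allophones | part-of-speech" and writes them as UTF-8.
class LexiconFormatSketch {

    /** Joins the fields of one entry with " | "; the part of speech is optional. */
    static String formatEntry(String grapheme, String allophones, String partOfSpeech) {
        String line = grapheme + " | " + allophones;
        if (partOfSpeech != null && !partOfSpeech.isEmpty()) {
            line += " | " + partOfSpeech;
        }
        return line;
    }

    /** Writes the entries, one per line, using the UTF-8 encoding the lexicon expects. */
    static void writeLexicon(Path target, List<String> lines) throws IOException {
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        List<String> lines = new ArrayList<>();
        // Placeholder transcriptions: the symbols are assumptions for illustration only.
        lines.add(formatEntry("hello", "h @ - ' l @U", null));
        lines.add(formatEntry("record", "' r E - k O r d", "noun"));

        Path target = Files.createTempFile("lexicon-sketch", ".txt");
        writeLexicon(target, lines);

        // Read the file back to show what was written.
        for (String line : Files.readAllLines(target, StandardCharsets.UTF_8)) {
            System.out.println(line);
        }
        Files.delete(target);
    }
}
```

A subclass that overrides prepareLexicon() could produce its lexicon file in a similar way before the FST and LTS files are compiled.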
Modifier and Type | Field and Description |
---|---|
protected AllophoneSet | allophoneSet |
protected int | context |
protected boolean | convertToLowercase |
protected String | fstFilename |
protected String | lexiconFilename |
protected org.apache.log4j.Logger | logger |
protected String | ltsFilename |
protected boolean | predictStress |
Constructor and Description |
---|
LexiconCreator(AllophoneSet allophoneSet, String lexiconFilename, String fstFilename, String ltsFilename) Initialise a new lexicon creator. |
LexiconCreator(AllophoneSet allophoneSet, String lexiconFilename, String fstFilename, String ltsFilename, boolean convertToLowercase, boolean predictStress, int context) Initialise a new lexicon creator. |
Modifier and Type | Method and Description |
---|---|
protected void | compileFST() |
protected void | compileLTS() |
void | createLexicon() |
static void | main(String[] args) |
protected void | prepareLexicon() This base implementation does nothing. |
protected void | testFST() |
protected void | testLTS() |
protected org.apache.log4j.Logger logger
protected AllophoneSet allophoneSet
protected String lexiconFilename
protected String fstFilename
protected String ltsFilename
protected boolean convertToLowercase
protected boolean predictStress
protected int context
public LexiconCreator(AllophoneSet allophoneSet, String lexiconFilename, String fstFilename, String ltsFilename)
Parameters:
allophoneSet - specifies the set of phonetic symbols that can be used in the lexicon, and provides the locale of the lexicon
lexiconFilename - where to find the plain-text lexicon
fstFilename - where to create the compressed lexicon FST file
ltsFilename - where to create the letter-to-sound prediction tree
public LexiconCreator(AllophoneSet allophoneSet, String lexiconFilename, String fstFilename, String ltsFilename, boolean convertToLowercase, boolean predictStress, int context)
Parameters:
allophoneSet - specifies the set of phonetic symbols that can be used in the lexicon, and provides the locale of the lexicon
lexiconFilename - where to find the plain-text lexicon
fstFilename - where to create the compressed lexicon FST file
ltsFilename - where to create the letter-to-sound prediction tree
convertToLowercase - if true, letter-to-sound rules built with this lexicon creator will convert graphemes to lowercase before prediction, using the locale given in the allophone set
predictStress - if true, letter-to-sound rules will predict stress
context - the number of characters to the left and to the right of the current character that will be used as predictive features
protected void prepareLexicon() throws IOException
Throws: IOException
protected void compileFST() throws IOException
Throws: IOException
protected void testFST() throws IOException
Throws: IOException
protected void compileLTS() throws IOException
Throws: IOException
protected void testLTS() throws IOException, MaryConfigurationException
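The `context` constructor parameter is described as the number of characters to the left and to the right of the current character used as predictive features. The sketch below illustrates that idea only: for each letter of a word it extracts a window of `2 * context + 1` characters. The class `ContextWindowSketch`, the method `window`, and the padding character `_` are hypothetical; how LexiconCreator actually encodes features or handles word boundaries is not stated here.

```java
// Hypothetical illustration, not part of MaryTTS: extracts the characters
// surrounding each letter of a word, as a symmetric context window.
class ContextWindowSketch {

    // Assumed padding symbol for positions before the start or after the end of the word.
    static final char PAD = '_';

    /** Returns the characters from position - context to position + context, padded at the edges. */
    static String window(String grapheme, int position, int context) {
        StringBuilder sb = new StringBuilder();
        for (int i = position - context; i <= position + context; i++) {
            boolean inside = i >= 0 && i < grapheme.length();
            sb.append(inside ? grapheme.charAt(i) : PAD);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String word = "lexicon";
        int context = 2;
        // Print the window centred on each letter of the word.
        for (int pos = 0; pos < word.length(); pos++) {
            System.out.println(word.charAt(pos) + " : " + window(word, pos, context));
        }
    }
}
```

A larger `context` gives each prediction more surrounding letters to work with, at the cost of more features per character.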
Copyright © 2000–2016 DFKI GmbH. All rights reserved.