public class DBHandler extends Object
Constructor and Description |
---|
DBHandler(String localeVal)
The constructor loads the database driver.
|
Modifier and Type | Method and Description |
---|---|
void |
addLocalePrefixToWikipediaTables()
Rename the Wikipedia tables by adding the locale prefix: locale_text, locale_page and locale_revision.
|
boolean |
askIfDeletingTable(String table)
Ask the user if the table should be deleted
|
boolean |
checkWikipediaTables()
Check if the tables locale_text, locale_page and locale_revision exist.
|
void |
closeDBConnection() |
void |
createAndLoadWikipediaTables(String textFile,
String pageFile,
String revisionFile)
This function creates text, page and revision tables loading them from text files.
|
void |
createDataBaseSelectionTable()
Creates dbselectionTable
|
boolean |
createDBConnection(String host,
String db,
String user,
String passwd)
The createDBConnection method creates the database connection. |
boolean |
createEmptyWikipediaTables()
This function creates empty text, page and revision tables (without locale prefix).
|
void |
createSelectedSentencesTable(String stopCriterion,
String featDefFileName,
String covDefConfigFileName)
Creates a selectedSentencesTable.
|
void |
createTablesDescriptionTable()
This table contains information about the tables in the DB, especially the selected sentences tables.
|
void |
createWikipediaCleanTextTable() |
void |
deleteWikipediaTables()
Delete the Wikipedia tables: locale_text, locale_page and locale_revision tables.
|
String |
getCleanText(int id) |
String |
getCleanTextTableName() |
String |
getDBSelectionSentence(int id)
Get a sentence from a locale_dbselection table.
|
String |
getDBselectionTableName() |
byte[] |
getFeatures(int id) |
byte[][] |
getFeaturesBulk(int[] ids)
Bulk load a set of features as identified by their IDs.
|
HashMap<Integer,byte[]> |
getFeaturesSet(int ini,
int end,
int[] idList) |
int[] |
getIdListOfSelectedSentences(String actualTableName,
String condition)
Get a list of ids from a selected sentences table.
NOTE: use the actual table name: locale + tableName + selectedSentences |
int[] |
getIdListOfType(String table,
String condition)
Get a list of ids.
|
String[] |
getIds(String field,
String table)
Get a list of ids from field in table.
|
Pair<int[],byte[][]> |
getIdsAndFeatureVectors(String table,
String condition)
For the set of sentences identified by table and condition, retrieve from MySQL both the sentence ids and the corresponding features.
|
ArrayList<String> |
getListOfTables()
Get the list of tables for this locale
|
HashMap<String,Integer> |
getMostFrequentWords(int numWords,
int maxFrequency)
Get the most frequent words and their frequencies in a HashMap.
|
ArrayList<String> |
getMostFrequentWordsArray(int numWords,
int maxFrequency)
Get the most frequent words sorted by frequency (descending) in an ArrayList.
|
int |
getNumberOfReliableSentences() |
int |
getNumberOfWords(int maxFrequency)
Get number of words in the wordList table.
|
String |
getSelectedSentence(String tableName,
int id)
Get a
|
String |
getSelectedSentencesTableName() |
String[] |
getTableDescription(String tableName)
Get the description of the table tableName.
|
String |
getTextFromWikiPage(String id,
int minPageLength,
StringBuilder old_id,
PrintWriter pw) |
int[] |
getUnprocessedTextIds()
This function will select just the unprocessed cleanText records.
|
String |
getWordListTableName() |
void |
insertCleanText(String text,
String page_id,
String text_id) |
void |
insertSelectedSentence(int dbselection_id,
boolean unwanted)
Using the dbselection_id, first get the sentence and then insert it into the locale_selectedSentences table.
|
void |
insertSentence(String sentence,
byte[] features,
boolean reliable,
boolean unknownWords,
boolean strangeSymbols,
int cleanText_id)
Insert a processed sentence into dbselection.
|
void |
insertWordList(HashMap<String,Integer> wordList)
Creates a wordList table; if one already exists, deletes it and creates a new one to insert the current wordList.
|
void |
loadPagesWithMWDumper(String xmlFile,
String lang,
String host,
String db,
String user,
String passwd)
Use mwdumper for extracting pages from an XML Wikipedia dump file.
|
static void |
main(String[] args) |
String |
mysqlEscapeCharacters(String str)
The following characters should be escaped: \0 An ASCII 0 (NUL) character.
|
void |
printWordList(String fileName,
String order,
int numWords,
int maxFrequency) |
void |
setDBTable(String table) |
void |
setSelectedSentencesTableName(String name)
By default the name of the selected sentences table is "locale + _selectedSentences"; with this function the name can be changed, while the locale prefix and the suffix "_selectedSentences" are kept.
|
void |
setSentenceRecord(int id,
String field,
boolean fieldValue)
Set a sentence record field to true/false in the dbselection table.
|
void |
setTableDescription(String tableName,
String description,
String stopCriterion,
String featuresDefinitionFileName,
String covDefConfigFileName)
Set a description for table = name; it checks whether the tablesDescription table exists and, if not, creates it.
|
void |
setUnwantedSentenceRecord(String actualTableName,
int id,
boolean fieldValue)
This function sets the unwanted field to true/false in both the dbselection table and the selectedSentencesTable table.
|
void |
setWordListTable(String table) |
boolean |
tableExist(String tableName) |
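
The summary entry for mysqlEscapeCharacters names only the NUL character before it breaks off. The following is a minimal sketch of this kind of escaping, assuming the escape sequences the MySQL manual lists for string literals (\0, \', \", \b, \n, \r, \t, \Z and \\). The class name MysqlEscapeSketch and the method name escape are hypothetical, and this is not necessarily how DBHandler implements it.

```java
// Hypothetical helper, not part of DBHandler: escapes characters that are
// special inside a MySQL string literal.
final class MysqlEscapeSketch {

    static String escape(String str) {
        if (str == null) {
            return null;
        }
        StringBuilder sb = new StringBuilder(str.length() + 16);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            switch (c) {
                case '\0': sb.append("\\0");  break; // ASCII 0 (NUL)
                case '\'': sb.append("\\'");  break; // single quote
                case '"':  sb.append("\\\""); break; // double quote
                case '\b': sb.append("\\b");  break; // backspace
                case '\n': sb.append("\\n");  break; // newline
                case '\r': sb.append("\\r");  break; // carriage return
                case '\t': sb.append("\\t");  break; // tab
                case 26:   sb.append("\\Z");  break; // ASCII 26 (Control+Z)
                case '\\': sb.append("\\\\"); break; // backslash
                default:   sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("O'Reilly"));
    }
}
```

When values are passed to the database through JDBC, binding them with a java.sql.PreparedStatement is usually a safer alternative to manual escaping.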
public DBHandler(String localeVal)
localeVal - database language.
public void setSelectedSentencesTableName(String name)
name - name
public String getSelectedSentencesTableName()
public String getCleanTextTableName()
public String getWordListTableName()
public String getDBselectionTableName()
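The naming rule for selected sentences tables (default "locale + _selectedSentences", custom names of the form locale_tableName_selectedSentences) can be sketched as follows. The class TableNamesSketch, its method names and the locale value en_US are hypothetical and only illustrate the documented convention.

```java
// Hypothetical illustration of the table naming convention described in this page.
final class TableNamesSketch {

    private static final String SUFFIX = "_selectedSentences";

    /** Default name: locale + "_selectedSentences". */
    static String defaultSelectedSentencesTable(String locale) {
        return locale + SUFFIX;
    }

    /** Custom name: the locale prefix and the suffix are kept around the given name. */
    static String selectedSentencesTable(String locale, String name) {
        return locale + "_" + name + SUFFIX;
    }

    public static void main(String[] args) {
        System.out.println(defaultSelectedSentencesTable("en_US"));
        System.out.println(selectedSentencesTable("en_US", "test"));
    }
}
```

A name built this way is what methods taking an actualTableName argument, such as getIdListOfSelectedSentences, expect.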
public boolean createDBConnection(String host, String db, String user, String passwd)
The createDBConnection method creates the database connection.
host - a String value. The database host, e.g. 'localhost'.
db - a String value. The database to connect to.
user - a String value. Database user that has access to the database.
passwd - a String value. The 'secret' password.
public void loadPagesWithMWDumper(String xmlFile, String lang, String host, String db, String user, String passwd) throws Exception
xmlFile - xml dump file
lang - locale language
host - host
db - db
user - user
passwd - passwd
Exception - Exception
public boolean askIfDeletingTable(String table)
table - table
public void createTablesDescriptionTable()
public void createDataBaseSelectionTable()
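For orientation, a connection method with the same shape as createDBConnection(host, db, user, passwd) could look roughly like the sketch below. It assumes a MySQL JDBC driver on the classpath and the usual jdbc:mysql://host/db URL form; the class ConnectionSketch and the helper jdbcUrl are hypothetical, and DBHandler's real implementation may differ.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;

// Hypothetical sketch, not DBHandler itself.
final class ConnectionSketch {

    private Connection cn; // null until a connection has been opened

    /** Builds a MySQL JDBC URL such as jdbc:mysql://localhost/wiki. */
    static String jdbcUrl(String host, String db) {
        return "jdbc:mysql://" + host + "/" + db;
    }

    /** Returns true if the connection was opened, false otherwise. */
    boolean createDBConnection(String host, String db, String user, String passwd) {
        try {
            cn = DriverManager.getConnection(jdbcUrl(host, db), user, passwd);
            return true;
        } catch (SQLException e) {
            System.err.println("Could not connect: " + e.getMessage());
            return false;
        }
    }

    /** Closes the connection if one is open. */
    void closeDBConnection() {
        try {
            if (cn != null) {
                cn.close();
            }
        } catch (SQLException e) {
            System.err.println("Could not close connection: " + e.getMessage());
        }
    }
}
```

Returning a boolean instead of throwing mirrors the documented signature and lets the caller decide how to react to a failed connection.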
public void createSelectedSentencesTable(String stopCriterion, String featDefFileName, String covDefConfigFileName)
stopCriterion - stopCriterion
featDefFileName - featDefFileName
covDefConfigFileName - covDefConfigFileName
public void createAndLoadWikipediaTables(String textFile, String pageFile, String revisionFile)
textFile - textFile
pageFile - pageFile
revisionFile - revisionFile
public boolean createEmptyWikipediaTables()
public void addLocalePrefixToWikipediaTables()
public void deleteWikipediaTables()
public boolean checkWikipediaTables()
public void createWikipediaCleanTextTable()
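The renaming performed by addLocalePrefixToWikipediaTables (text, page and revision becoming locale_text, locale_page and locale_revision) can be illustrated with a small statement builder. This is a sketch assuming MySQL's RENAME TABLE syntax; the class WikipediaTablesSketch is hypothetical, and whether DBHandler issues one statement or several is not stated here.

```java
import java.util.StringJoiner;

// Hypothetical sketch of the locale-prefix renaming of the Wikipedia tables.
final class WikipediaTablesSketch {

    // Unprefixed table names, as created by createEmptyWikipediaTables().
    static final String[] BASE_TABLES = {"text", "page", "revision"};

    /** Returns the prefixed name, e.g. en_US_text. */
    static String prefixed(String locale, String baseTable) {
        return locale + "_" + baseTable;
    }

    /** Builds a single RENAME TABLE statement covering all three tables. */
    static String renameStatement(String locale) {
        StringJoiner sj = new StringJoiner(", ", "RENAME TABLE ", ";");
        for (String base : BASE_TABLES) {
            sj.add(base + " TO " + prefixed(locale, base));
        }
        return sj.toString();
    }

    public static void main(String[] args) {
        System.out.println(renameStatement("en_US"));
    }
}
```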
public boolean tableExist(String tableName)
tableName - tableName
public String[] getIds(String field, String table)
field - field
table - table
public int[] getUnprocessedTextIds()
public ArrayList<String> getListOfTables()
public void setDBTable(String table)
public void setWordListTable(String table)
public void insertSentence(String sentence, byte[] features, boolean reliable, boolean unknownWords, boolean strangeSymbols, int cleanText_id)
sentence - text of the sentence.
features - features if the sentence is reliable.
reliable - true/false.
unknownWords - true/false.
strangeSymbols - true/false.
cleanText_id - the id of the cleanText this sentence comes from.
public void insertSelectedSentence(int dbselection_id, boolean unwanted)
dbselection_id - dbselection_id
unwanted - unwanted
public void insertWordList(HashMap<String,Integer> wordList)
wordList - wordList
public void closeDBConnection()
public int getNumberOfReliableSentences()
public int[] getIdListOfType(String table, String condition)
table - cleanText, wordList, dbselection (no need to add locale). NOTE: this function does not work for the selectedSentences table; for that table use the function getIdListOfSelectedSentences.
condition - reliable, unknownWords, strangeSymbols, selected, unwanted = true/false (combinations are possible: "reliable=true and unwanted=false"); or condition=null for querying without condition.
public Pair<int[],byte[][]> getIdsAndFeatureVectors(String table, String condition)
table - table
condition - condition
public int[] getIdListOfSelectedSentences(String actualTableName, String condition)
actualTableName - = locale_tableName_selectedSentences
condition - unwanted=true/false
public int getNumberOfWords(int maxFrequency)
maxFrequency - max frequency of a word to be considered in the list; if maxFrequency=0 it will retrieve all the words with frequency≥1.
public HashMap<String,Integer> getMostFrequentWords(int numWords, int maxFrequency)
numWords - max number of words to retrieve; if numWords=0 then it will retrieve all the words in the list in descending order of frequency.
maxFrequency - max frequency of a word to be considered in the list; if maxFrequency=0 it will retrieve all the words with frequency≥1.
public ArrayList<String> getMostFrequentWordsArray(int numWords, int maxFrequency)
numWords - max number of words to retrieve; if numWords=0 then it will retrieve all the words in the list in descending order of frequency.
maxFrequency - max frequency of a word to be considered in the list; if maxFrequency=0 it will retrieve all the words with frequency≥1.
public void printWordList(String fileName, String order, int numWords, int maxFrequency)
fileName - file to write the list
order - word or frequency
numWords - max number of words; if numWords=0 then it will retrieve all the words in the list.
maxFrequency - max frequency of a word to be considered in the list; if maxFrequency=0 it will retrieve all the words with frequency≥1.
public String getDBSelectionSentence(int id)
id - dbselection (no need to add locale)
public String getSelectedSentence(String tableName, int id)
tableName - tableName
id - id
public String getTextFromWikiPage(String id, int minPageLength, StringBuilder old_id, PrintWriter pw)
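The table and condition arguments of getIdListOfType (a table name without locale, plus an optional condition such as "reliable=true and unwanted=false") suggest a query of roughly the following shape. This is only a sketch: the class IdListQuerySketch, the column name id and the way the locale prefix is added are assumptions, not taken from DBHandler.

```java
// Hypothetical sketch of how table and condition could be turned into a query.
final class IdListQuerySketch {

    /**
     * @param locale    locale prefix, added here because the caller passes the bare table name
     * @param table     cleanText, wordList or dbselection
     * @param condition e.g. "reliable=true and unwanted=false", or null for no condition
     */
    static String idQuery(String locale, String table, String condition) {
        String sql = "SELECT id FROM " + locale + "_" + table; // "id" column is an assumption
        if (condition != null) {
            sql += " WHERE " + condition;
        }
        return sql + ";";
    }

    public static void main(String[] args) {
        System.out.println(idQuery("en_US", "dbselection", "reliable=true and unwanted=false"));
        System.out.println(idQuery("en_US", "cleanText", null));
    }
}
```

Because the condition is concatenated as raw text, a method like this should only be given conditions written by the developer, never text from untrusted input.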
public String getCleanText(int id)
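The numWords behaviour documented for getMostFrequentWordsArray (descending order of frequency, with numWords=0 meaning all words) can be illustrated in memory. The sketch below is hypothetical: it works on a HashMap rather than the wordList table, breaks ties alphabetically (an added assumption), and leaves out the maxFrequency filter because its exact direction is not spelled out here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory illustration; DBHandler reads from the wordList table instead.
final class FrequentWordsSketch {

    static ArrayList<String> mostFrequent(HashMap<String, Integer> wordList, int numWords) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(wordList.entrySet());
        // Sort by frequency, highest first; ties are ordered alphabetically (assumption).
        entries.sort((a, b) -> {
            int byFrequency = Integer.compare(b.getValue(), a.getValue());
            return byFrequency != 0 ? byFrequency : a.getKey().compareTo(b.getKey());
        });
        // numWords = 0 means "no limit".
        int limit = (numWords == 0) ? entries.size() : Math.min(numWords, entries.size());
        ArrayList<String> result = new ArrayList<>();
        for (int i = 0; i < limit; i++) {
            result.add(entries.get(i).getKey());
        }
        return result;
    }

    public static void main(String[] args) {
        HashMap<String, Integer> words = new HashMap<>();
        words.put("the", 10);
        words.put("a", 7);
        words.put("cat", 2);
        System.out.println(mostFrequent(words, 2));
    }
}
```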
public void setSentenceRecord(int id, String field, boolean fieldValue)
id - id
field - reliable, unknownWords, strangeSymbols, selected or unwanted = true/false
fieldValue - true/false (as string)
public void setUnwantedSentenceRecord(String actualTableName, int id, boolean fieldValue)
actualTableName - including locale and _selectedSentences
id - id in dbselection table
fieldValue - true/false
public void setTableDescription(String tableName, String description, String stopCriterion, String featuresDefinitionFileName, String covDefConfigFileName)
tableName - the name of the table; it cannot be null
description - if no description, set to null
stopCriterion - if no stopCriterion, set to null
featuresDefinitionFileName - if no featuresDefinitionFileName, set to null
covDefConfigFileName - if no covDefConfigFileName, set to null
public String[] getTableDescription(String tableName)
tableName - tableName
public byte[] getFeatures(int id)
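Since setSentenceRecord accepts only a fixed set of field names (reliable, unknownWords, strangeSymbols, selected, unwanted), a caller-side check before building the update can catch typos early. The sketch below is hypothetical: the class SentenceRecordSketch, the id column and the generated SQL text are assumptions, with only the field names and the locale_dbselection table name taken from this page.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of validating the field name before updating a sentence record.
final class SentenceRecordSketch {

    // Field names listed for setSentenceRecord.
    static final List<String> FIELDS =
            Arrays.asList("reliable", "unknownWords", "strangeSymbols", "selected", "unwanted");

    static String updateStatement(String locale, int id, String field, boolean fieldValue) {
        if (!FIELDS.contains(field)) {
            throw new IllegalArgumentException("Unknown field: " + field);
        }
        // The "id" column name is an assumption.
        return "UPDATE " + locale + "_dbselection SET " + field + "=" + fieldValue
                + " WHERE id=" + id + ";";
    }

    public static void main(String[] args) {
        System.out.println(updateStatement("en_US", 42, "reliable", true));
    }
}
```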
public byte[][] getFeaturesBulk(int[] ids)
ids - a sorted array of feature IDs.
public String mysqlEscapeCharacters(String str)
str - str
Copyright © 2000–2016 DFKI GmbH. All rights reserved.