public class FeatureDefinition extends Object
| Modifier and Type | Field and Description |
|---|---|
| static String | BYTEFEATURES |
| static String | CONTINUOUSFEATURES |
| static String | EDGEFEATURE |
| static String | EDGEFEATURE_END |
| static String | EDGEFEATURE_START |
| static String | FEATURESIMILARITY |
| static String | NULLVALUE |
| static String | SHORTFEATURES |
| static char | WEIGHT_SEPARATOR |
| Constructor and Description |
|---|
| FeatureDefinition(BufferedReader input, boolean readWeights): Create a feature definition object, reading textual data from the given BufferedReader. |
| FeatureDefinition(ByteBuffer bb): Create a feature definition object, reading binary data from the given byte buffer. |
| FeatureDefinition(DataInput input): Create a feature definition object, reading binary data from the given DataInput. |
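The detail entry for the text constructor states that weights, when read, are normalized so that they sum to one. A minimal sketch of that kind of normalization follows. `WeightNormalizer` and `normalize` are hypothetical names used only for illustration and are not part of `FeatureDefinition`; the handling of weights that sum to zero is likewise an assumption, not documented behavior.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of MaryTTS: rescales weights so that they sum to one. */
final class WeightNormalizer {
    private WeightNormalizer() {}

    static float[] normalize(float[] weights) {
        // Accumulate in double to limit rounding error.
        double sum = 0.0;
        for (float w : weights) {
            sum += w;
        }
        float[] result = new float[weights.length];
        if (sum == 0.0) {
            // Assumption: when the weights sum to zero, return all zeros rather than divide by zero.
            return result;
        }
        for (int i = 0; i < weights.length; i++) {
            result[i] = (float) (weights[i] / sum);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] normalized = normalize(new float[] {2f, 1f, 1f});
        System.out.println(Arrays.toString(normalized)); // prints [0.5, 0.25, 0.25]
    }
}
```

The input array is left untouched and a new array is returned, so the raw weights remain available to the caller.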
| Modifier and Type | Method and Description |
|---|---|
| boolean | contains(FeatureDefinition other): Determine whether this FeatureDefinition is a superset of, or equal to, another FeatureDefinition. |
| FeatureVector | createEdgeFeatureVector(int unitIndex, boolean start): Create a feature vector that marks a start or end of a unit. |
| static int | diff(FeatureVector v1, FeatureVector v2): Compares two feature vectors in terms of how many discrete features they have in common. |
| boolean | equals(Object obj): Determine whether two feature definitions are equal, regarding both the actual feature definitions and the weights. |
| boolean | featureEquals(FeatureDefinition other): Determine whether two feature definitions are equal, with respect to number, names, and possible values of the three kinds of features (byte-valued, short-valued, continuous). |
| String | featureEqualsAnalyse(FeatureDefinition other): An extension of featureEquals(FeatureDefinition). |
| void | generateAllDotDescForWagon(PrintWriter out): Export this feature definition in the "all.desc" format which can be read by wagon. |
| void | generateAllDotDescForWagon(PrintWriter out, Set<String> featuresToIgnore): Export this feature definition in the "all.desc" format which can be read by wagon. |
| void | generateFeatureWeightsFile(PrintWriter out): Print this feature definition plus weights to a .txt file. |
| String[] | getByteFeatureNameArray(): Get the names of the byte features. |
| String[] | getContinuousFeatureNameArray(): Get the names of the continuous features. |
| int | getFeatureIndex(String featureName): Translate between a feature name and a feature index. |
| int[] | getFeatureIndexArray(String[] featureName): Translate between an array of feature names and an array of feature indexes. |
| String | getFeatureName(int index): Translate between a feature index and a feature name. |
| String[] | getFeatureNameArray(): Get the names of all features. |
| String[] | getFeatureNameArray(int[] index): Translate between an array of feature indexes and an array of feature names. |
| String | getFeatureNames(): List all feature names, separated by white space, in their order of definition. |
| byte | getFeatureValueAsByte(int featureIndex, String value): For the feature with the given index number, translate its String value to its byte value. |
| byte | getFeatureValueAsByte(String featureName, String value): For the feature with the given name, translate its String value to its byte value. |
| short | getFeatureValueAsShort(int featureIndex, String value): For the feature with the given index number, translate its String value to its short value. |
| short | getFeatureValueAsShort(String featureName, String value): For the feature with the given name, translate its String value to its short value. |
| String | getFeatureValueAsString(int featureIndex, int value): For the feature with the given index number, translate its byte or short value to its String value. |
| String | getFeatureValueAsString(String featureName, FeatureVector fv): Simple access to string-based features. |
| float[] | getFeatureWeights() |
| int | getNumberOfByteFeatures(): Get the number of byte features. |
| int | getNumberOfContinuousFeatures(): Get the number of continuous features. |
| int | getNumberOfFeatures(): Get the total number of features. |
| int | getNumberOfShortFeatures(): Get the number of short features. |
| int | getNumberOfValues(int featureIndex): Get the number of possible values for the feature with the given index number. |
| String[] | getPossibleValues(int featureIndex): Get the list of possible String values for the feature with the given index number. |
| String[] | getShortFeatureNameArray(): Get the names of the short features. |
| float | getSimilarity(int featureIndex, byte i, byte j): Get the similarity between two feature values. |
| float | getWeight(int featureIndex): For the feature with the given index, return the weight. |
| String | getWeightFunctionName(int featureIndex): Get the name of any weighting function associated with the given feature index. |
| boolean | hasFeature(String name): Indicate whether the feature definition contains the feature with the given name. |
| boolean | hasFeatureValue(int featureIndex, String featureValue): Query a feature as identified by the given featureIndex as to whether the given featureValue is a known value of that feature. |
| boolean | hasFeatureValue(String featureName, String featureValue): Query a feature as identified by the given featureName as to whether the given featureValue is a known value of that feature. |
| boolean | hasSimilarityMatrix(int featureIndex): Return true if the feature with the given index has a similarity matrix. |
| boolean | hasSimilarityMatrix(String featureName): Return true if the feature with the given name has a similarity matrix. |
| boolean | isByteFeature(int index): Determine whether the feature with the given index number is a byte feature. |
| boolean | isByteFeature(String featureName): Determine whether the feature with the given name is a byte feature. |
| boolean | isContinuousFeature(int index): Determine whether the feature with the given index number is a continuous feature. |
| boolean | isContinuousFeature(String featureName): Determine whether the feature with the given name is a continuous feature. |
| boolean | isShortFeature(int index): Determine whether the feature with the given index number is a short feature. |
| boolean | isShortFeature(String featureName): Determine whether the feature with the given name is a short feature. |
| FeatureVector | readFeatureVector(int currentUnitIndex, ByteBuffer bb): Create a feature vector consistent with this feature definition by reading the data from the byte buffer. |
| FeatureVector | readFeatureVector(int currentUnitIndex, DataInput input): Create a feature vector consistent with this feature definition by reading the data from the given input. |
| FeatureDefinition | subset(String[] featureNamesToDrop): Create a new FeatureDefinition that contains a subset of the features in this. |
| String | toFeatureString(FeatureVector fv): Convert a feature vector into a String representation. |
| FeatureVector | toFeatureVector(int unitIndex, byte[] bytes, short[] shorts, float[] floats) |
| FeatureVector | toFeatureVector(int unitIndex, String featureString): Create a feature vector consistent with this feature definition by reading the data from a String representation. |
| void | writeBinaryTo(DataOutput out): Write this feature definition in binary format to the given output. |
| void | writeTo(PrintWriter out, boolean writeWeights): Export this feature definition in the text format which can also be read by this class. |
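Several methods in the summary above translate between the String form of a discrete feature value and its byte or short code (getFeatureValueAsByte, getFeatureValueAsString, getPossibleValues, hasFeatureValue). The sketch below shows one way such a lookup could work for a single byte-valued feature. `DiscreteValueTable` and its methods are hypothetical names, the sample values are made up, and the idea that a value's code is its position in the list of possible values is an assumption made for illustration; this page does not confirm it.

```java
import java.util.Arrays;

/** Hypothetical illustration, not part of MaryTTS: value table for one byte-valued feature. */
final class DiscreteValueTable {
    private final String[] values;

    DiscreteValueTable(String... values) {
        // A byte code used as a non-negative position can address at most 128 values (0..127).
        if (values.length > Byte.MAX_VALUE + 1) {
            throw new IllegalArgumentException("Too many values for a byte-valued feature");
        }
        this.values = values.clone();
    }

    int getNumberOfValues() {
        return values.length;
    }

    boolean hasValue(String value) {
        return Arrays.asList(values).contains(value);
    }

    /** Assumption: the byte code of a value is its position in the value list. */
    byte asByte(String value) {
        int pos = Arrays.asList(values).indexOf(value);
        if (pos < 0) {
            throw new IllegalArgumentException("Not a legal value: " + value);
        }
        return (byte) pos;
    }

    String asString(int code) {
        if (code < 0 || code >= values.length) {
            throw new IndexOutOfBoundsException("Not a legal value code: " + code);
        }
        return values[code];
    }

    public static void main(String[] args) {
        DiscreteValueTable table = new DiscreteValueTable("0", "a", "e", "i");
        System.out.println(table.asByte("e"));   // prints 2
        System.out.println(table.asString(3));   // prints i
        System.out.println(table.hasValue("u")); // prints false
    }
}
```

The exception types mirror the ones this page documents for the corresponding lookups: IllegalArgumentException for an unknown String value and IndexOutOfBoundsException for a numeric value outside the legal range.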
public static final String BYTEFEATURES
public static final String SHORTFEATURES
public static final String CONTINUOUSFEATURES
public static final String FEATURESIMILARITY
public static final char WEIGHT_SEPARATOR
public static final String EDGEFEATURE
public static final String EDGEFEATURE_START
public static final String EDGEFEATURE_END
public static final String NULLVALUE
public FeatureDefinition(BufferedReader input, boolean readWeights) throws IOException
input - a BufferedReader from which a textual feature definition can be read.
readWeights - a boolean indicating whether or not to read weights from input. If weights are read, they will be normalized so that they sum to one.
IOException - if a reading problem occurs.
public FeatureDefinition(DataInput input) throws IOException
input - a DataInputStream or a RandomAccessFile from which a binary feature definition can be read.
IOException - if a reading problem occurs.
public FeatureDefinition(ByteBuffer bb) throws IOException
bb - a byte buffer from which a binary feature definition can be read.
IOException - if a reading problem occurs.
public void writeBinaryTo(DataOutput out) throws IOException
out - a DataOutputStream or RandomAccessFile to which the FeatureDefinition should be written.
IOException - if a problem occurs while writing.
public int getNumberOfFeatures()
public int getNumberOfByteFeatures()
public int getNumberOfShortFeatures()
public int getNumberOfContinuousFeatures()
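The four counts above suggest that the total number of features is split into byte, short, and continuous features, and the bounds documented for getNumberOfValues (featureIndex below getNumberOfByteFeatures() + getNumberOfShortFeatures()) suggest that the discrete features occupy the lowest indexes. The sketch below classifies an index on that basis. `FeatureIndexLayout` and `Kind` are hypothetical names, and the ordering byte, then short, then continuous is an assumption this page does not state outright.

```java
/** Hypothetical illustration, not part of MaryTTS: maps a feature index to its kind. */
final class FeatureIndexLayout {
    enum Kind { BYTE, SHORT, CONTINUOUS }

    private final int numByte;
    private final int numShort;
    private final int numContinuous;

    FeatureIndexLayout(int numByte, int numShort, int numContinuous) {
        this.numByte = numByte;
        this.numShort = numShort;
        this.numContinuous = numContinuous;
    }

    int total() {
        return numByte + numShort + numContinuous;
    }

    /** Assumption: indexes run over byte features first, then short, then continuous. */
    Kind kindOf(int index) {
        if (index < 0 || index >= total()) {
            throw new IndexOutOfBoundsException("Feature index out of range: " + index);
        }
        if (index < numByte) {
            return Kind.BYTE;
        }
        if (index < numByte + numShort) {
            return Kind.SHORT;
        }
        return Kind.CONTINUOUS;
    }

    public static void main(String[] args) {
        FeatureIndexLayout layout = new FeatureIndexLayout(3, 1, 2);
        System.out.println(layout.total());   // prints 6
        System.out.println(layout.kindOf(3)); // prints SHORT
    }
}
```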
public float getWeight(int featureIndex)
featureIndex - the index number of the feature.
public float[] getFeatureWeights()
public String getWeightFunctionName(int featureIndex)
featureIndex - the index number of the feature.
public String getFeatureName(int index)
index - a feature index, as could be used to access a feature value in a FeatureVector.
IndexOutOfBoundsException - if index < 0 or index > getNumberOfFeatures()
public String[] getFeatureNameArray(int[] index)
index - an array of feature indexes, as could be used to access a feature value in a FeatureVector.
IndexOutOfBoundsException - if any of the indexes is < 0 or > getNumberOfFeatures()
public String[] getFeatureNameArray()
public String[] getByteFeatureNameArray()
public String[] getShortFeatureNameArray()
public String[] getContinuousFeatureNameArray()
public String getFeatureNames()
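The name and index accessors above (getFeatureName, getFeatureNameArray, getFeatureNames, together with getFeatureIndex and hasFeature) describe a two-way mapping between feature names and positions. A minimal sketch of such a mapping follows. `FeatureNameIndex` is a hypothetical class written for illustration and is not the MaryTTS implementation. Apart from "next_next_phone", which this page gives as an example, the feature names are made up.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical illustration, not part of MaryTTS: two-way name/index lookup. */
final class FeatureNameIndex {
    private final String[] names;
    private final Map<String, Integer> indexByName = new HashMap<>();

    FeatureNameIndex(String... names) {
        this.names = names.clone();
        for (int i = 0; i < names.length; i++) {
            indexByName.put(names[i], i);
        }
    }

    boolean hasFeature(String name) {
        return indexByName.containsKey(name);
    }

    int getFeatureIndex(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            // Mirrors the documented behavior: an unknown name is an IllegalArgumentException.
            throw new IllegalArgumentException("Unknown feature name: " + name);
        }
        return index;
    }

    String getFeatureName(int index) {
        // An invalid index raises ArrayIndexOutOfBoundsException, a kind of IndexOutOfBoundsException.
        return names[index];
    }

    /** All names in their order of definition, separated by single spaces. */
    String getFeatureNames() {
        return String.join(" ", names);
    }

    public static void main(String[] args) {
        FeatureNameIndex index = new FeatureNameIndex("phone", "next_phone", "next_next_phone");
        System.out.println(index.getFeatureIndex("next_next_phone")); // prints 2
        System.out.println(index.getFeatureName(0));                  // prints phone
        System.out.println(index.getFeatureNames());                  // prints phone next_phone next_next_phone
    }
}
```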
public boolean hasFeature(String name)
name - the feature name in question, e.g. "next_next_phone"
public boolean hasFeatureValue(String featureName, String featureValue)
featureName - the name of the feature.
featureValue - the feature value to look up.
public boolean hasFeatureValue(int featureIndex, String featureValue)
featureIndex - the index number of the feature.
featureValue - the feature value to look up.
public boolean isByteFeature(String featureName)
featureName - the name of the feature.
public boolean isByteFeature(int index)
index - the index number of the feature.
public boolean isShortFeature(String featureName)
featureName - the name of the feature.
public boolean isShortFeature(int index)
index - the index number of the feature.
public boolean isContinuousFeature(String featureName)
featureName - the name of the feature.
public boolean isContinuousFeature(int index)
index - the index number of the feature.
public boolean hasSimilarityMatrix(int featureIndex)
featureIndex - the index number of the feature.
public boolean hasSimilarityMatrix(String featureName)
featureName - the name of the feature.
public float getSimilarity(int featureIndex, byte i, byte j)
featureIndex - the index number of the feature.
i - the first feature value.
j - the second feature value.
public int getFeatureIndex(String featureName)
featureName - a valid feature name
IllegalArgumentException - if the feature name is unknown.
public int[] getFeatureIndexArray(String[] featureName)
featureName - an array of valid feature names
IllegalArgumentException - if one of the feature names is unknown.
public int getNumberOfValues(int featureIndex)
featureIndex - the index number of the feature.
IndexOutOfBoundsException - if featureIndex < 0 or featureIndex ≥ getNumberOfByteFeatures() + getNumberOfShortFeatures().
public String[] getPossibleValues(int featureIndex)
featureIndex - the index number of the feature.
IndexOutOfBoundsException - if featureIndex < 0 or featureIndex ≥ getNumberOfByteFeatures() + getNumberOfShortFeatures().
public String getFeatureValueAsString(int featureIndex, int value)
featureIndex - the index number of the feature.
value - the feature value. This must be in the range of acceptable values for the given feature.
IndexOutOfBoundsException - if featureIndex < 0 or featureIndex ≥ getNumberOfByteFeatures() + getNumberOfShortFeatures()
IndexOutOfBoundsException - if value is not a legal value for this feature
public String getFeatureValueAsString(String featureName, FeatureVector fv)
featureName - the name of the feature.
fv - the feature vector to read the value from.
public byte getFeatureValueAsByte(String featureName, String value)
featureName - the name of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
IllegalArgumentException - if featureName is not a valid feature name, or if featureName is not a byte-valued feature.
IllegalArgumentException - if value is not a legal value for this feature
public byte getFeatureValueAsByte(int featureIndex, String value)
featureIndex - the index number of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
IllegalArgumentException - if featureIndex does not identify a valid feature, or if that feature is not a byte-valued feature.
IllegalArgumentException - if value is not a legal value for this feature
public short getFeatureValueAsShort(String featureName, String value)
featureName - the name of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
IllegalArgumentException - if featureName is not a valid feature name, or if featureName is not a short-valued feature.
IllegalArgumentException - if value is not a legal value for this feature
public short getFeatureValueAsShort(int featureIndex, String value)
featureIndex - the index number of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
IllegalArgumentException - if featureIndex does not identify a valid feature, or if that feature is not a short-valued feature.
IllegalArgumentException - if value is not a legal value for this feature
public boolean featureEquals(FeatureDefinition other)
other - the feature definition to compare to
public String featureEqualsAnalyse(FeatureDefinition other)
other - the feature definition to compare to
public boolean equals(Object obj)
Overrides: equals in class Object
obj - the feature definition to compare to
See also: featureEquals(FeatureDefinition)
public boolean contains(FeatureDefinition other)
Specifically,
other - the FeatureDefinition to compare against
public FeatureDefinition subset(String[] featureNamesToDrop)
featureNamesToDrop - array of Strings containing the names of the features to drop from the new FeatureDefinition
public FeatureVector toFeatureVector(int unitIndex, String featureString)
unitIndex - an index number to assign to the feature vector
featureString - the string representation of a feature vector.
IllegalArgumentException - if the feature values listed are not consistent with the feature definition.
See also: toFeatureString(FeatureVector)
public FeatureVector toFeatureVector(int unitIndex, byte[] bytes, short[] shorts, float[] floats)
public FeatureVector readFeatureVector(int currentUnitIndex, DataInput input) throws IOException
currentUnitIndex - the index of the current unit.
input - a DataInputStream or RandomAccessFile to read the feature values from.
IOException - if a reading problem occurs.
public FeatureVector readFeatureVector(int currentUnitIndex, ByteBuffer bb) throws IOException
currentUnitIndex - the index of the current unit.
bb - a byte buffer to read the feature values from.
IOException - if a reading problem occurs.
public FeatureVector createEdgeFeatureVector(int unitIndex, boolean start)
unitIndex - index of the unit
start - true creates a start vector, false creates an end vector.
public String toFeatureString(FeatureVector fv)
fv - a feature vector which must be consistent with this feature definition.
IllegalArgumentException - if the feature vector is not consistent with this feature definition
IndexOutOfBoundsException - if any value of the feature vector is not consistent with this feature definition
public void writeTo(PrintWriter out, boolean writeWeights)
out - the destination of the data
writeWeights - whether to write weights before every line
public void generateAllDotDescForWagon(PrintWriter out)
out - the destination of the data
public void generateAllDotDescForWagon(PrintWriter out, Set<String> featuresToIgnore)
out - the destination of the data
featuresToIgnore - a set of Strings containing the names of features that wagon should ignore. Can be null.
public void generateFeatureWeightsFile(PrintWriter out)
out - the destination of the data
public static int diff(FeatureVector v1, FeatureVector v2)
v1 - A feature vector.
v2 - Another feature vector to compare v1 with.
Copyright © 2000–2016 DFKI GmbH. All rights reserved.