public class FeatureDefinition extends Object
Modifier and Type | Field and Description |
---|---|
static String | BYTEFEATURES |
static String | CONTINUOUSFEATURES |
static String | EDGEFEATURE |
static String | EDGEFEATURE_END |
static String | EDGEFEATURE_START |
static String | FEATURESIMILARITY |
static String | NULLVALUE |
static String | SHORTFEATURES |
static char | WEIGHT_SEPARATOR |
Constructor and Description |
---|
FeatureDefinition(BufferedReader input, boolean readWeights): Create a feature definition object, reading textual data from the given BufferedReader. |
FeatureDefinition(ByteBuffer bb): Create a feature definition object, reading binary data from the given byte buffer. |
FeatureDefinition(DataInput input): Create a feature definition object, reading binary data from the given DataInput. |
Modifier and Type | Method and Description |
---|---|
boolean | contains(FeatureDefinition other): Determine whether this FeatureDefinition is a superset of, or equal to, another FeatureDefinition. |
FeatureVector | createEdgeFeatureVector(int unitIndex, boolean start): Create a feature vector that marks the start or end of a unit. |
static int | diff(FeatureVector v1, FeatureVector v2): Compare two feature vectors in terms of how many discrete features they have in common. |
boolean | equals(Object obj): Determine whether two feature definitions are equal, regarding both the actual feature definitions and the weights. |
boolean | featureEquals(FeatureDefinition other): Determine whether two feature definitions are equal with respect to the number, names, and possible values of the three kinds of features (byte-valued, short-valued, continuous). |
String | featureEqualsAnalyse(FeatureDefinition other): An extension of featureEquals(FeatureDefinition) that reports the result of the comparison as a String. |
void | generateAllDotDescForWagon(PrintWriter out): Export this feature definition in the "all.desc" format, which can be read by wagon. |
void | generateAllDotDescForWagon(PrintWriter out, Set<String> featuresToIgnore): Export this feature definition in the "all.desc" format, which can be read by wagon. |
void | generateFeatureWeightsFile(PrintWriter out): Print this feature definition plus weights to a .txt file. |
String[] | getByteFeatureNameArray(): Get the names of the byte features. |
String[] | getContinuousFeatureNameArray(): Get the names of the continuous features. |
int | getFeatureIndex(String featureName): Translate between a feature name and a feature index. |
int[] | getFeatureIndexArray(String[] featureName): Translate between an array of feature names and an array of feature indexes. |
String | getFeatureName(int index): Translate between a feature index and a feature name. |
String[] | getFeatureNameArray(): Get the names of all features. |
String[] | getFeatureNameArray(int[] index): Translate between an array of feature indexes and an array of feature names. |
String | getFeatureNames(): List all feature names, separated by white space, in their order of definition. |
byte | getFeatureValueAsByte(int featureIndex, String value): For the feature with the given index number, translate its String value to its byte value. |
byte | getFeatureValueAsByte(String featureName, String value): For the feature with the given name, translate its String value to its byte value. |
short | getFeatureValueAsShort(int featureIndex, String value): For the feature with the given index number, translate its String value to its short value. |
short | getFeatureValueAsShort(String featureName, String value): For the feature with the given name, translate its String value to its short value. |
String | getFeatureValueAsString(int featureIndex, int value): For the feature with the given index number, translate its byte or short value to its String value. |
String | getFeatureValueAsString(String featureName, FeatureVector fv): Simple access to string-based features. |
float[] | getFeatureWeights() |
int | getNumberOfByteFeatures(): Get the number of byte features. |
int | getNumberOfContinuousFeatures(): Get the number of continuous features. |
int | getNumberOfFeatures(): Get the total number of features. |
int | getNumberOfShortFeatures(): Get the number of short features. |
int | getNumberOfValues(int featureIndex): Get the number of possible values for the feature with the given index number. |
String[] | getPossibleValues(int featureIndex): Get the list of possible String values for the feature with the given index number. |
String[] | getShortFeatureNameArray(): Get the names of the short features. |
float | getSimilarity(int featureIndex, byte i, byte j): Get the similarity between two values of the feature with the given index number. |
float | getWeight(int featureIndex): For the feature with the given index number, return the weight. |
String | getWeightFunctionName(int featureIndex): Get the name of any weighting function associated with the given feature index. |
boolean | hasFeature(String name): Indicate whether the feature definition contains the feature with the given name. |
boolean | hasFeatureValue(int featureIndex, String featureValue): Query the feature identified by the given featureIndex as to whether the given featureValue is a known value of that feature. |
boolean | hasFeatureValue(String featureName, String featureValue): Query the feature identified by the given featureName as to whether the given featureValue is a known value of that feature. |
boolean | hasSimilarityMatrix(int featureIndex): Return true if the feature with the given index number has a similarity matrix. |
boolean | hasSimilarityMatrix(String featureName): Return true if the feature with the given name has a similarity matrix. |
boolean | isByteFeature(int index): Determine whether the feature with the given index number is a byte feature. |
boolean | isByteFeature(String featureName): Determine whether the feature with the given name is a byte feature. |
boolean | isContinuousFeature(int index): Determine whether the feature with the given index number is a continuous feature. |
boolean | isContinuousFeature(String featureName): Determine whether the feature with the given name is a continuous feature. |
boolean | isShortFeature(int index): Determine whether the feature with the given index number is a short feature. |
boolean | isShortFeature(String featureName): Determine whether the feature with the given name is a short feature. |
FeatureVector | readFeatureVector(int currentUnitIndex, ByteBuffer bb): Create a feature vector consistent with this feature definition by reading the data from the byte buffer. |
FeatureVector | readFeatureVector(int currentUnitIndex, DataInput input): Create a feature vector consistent with this feature definition by reading the data from the given input. |
FeatureDefinition | subset(String[] featureNamesToDrop): Create a new FeatureDefinition that contains a subset of the features in this one. |
String | toFeatureString(FeatureVector fv): Convert a feature vector into a String representation. |
FeatureVector | toFeatureVector(int unitIndex, byte[] bytes, short[] shorts, float[] floats) |
FeatureVector | toFeatureVector(int unitIndex, String featureString): Create a feature vector consistent with this feature definition by reading the data from a String representation. |
void | writeBinaryTo(DataOutput out): Write this feature definition in binary format to the given output. |
void | writeTo(PrintWriter out, boolean writeWeights): Export this feature definition in the text format, which can also be read by this class. |
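The summary describes diff(FeatureVector, FeatureVector) only as comparing two vectors "in terms of how many discrete features they have in common". A minimal sketch of that idea follows, assuming a vector can be reduced to its byte-valued and short-valued arrays and that the result counts the discrete positions whose values differ (the number of shared features is then the total minus this count). FeatureDiffSketch is a hypothetical class, not the MARY TTS implementation; continuous features are ignored because the summary mentions only discrete ones.

```java
/** Hypothetical illustration; not part of the MARY TTS API. */
final class FeatureDiffSketch {

    private FeatureDiffSketch() {
    }

    /**
     * Counts the discrete (byte- and short-valued) positions at which two
     * vectors disagree. Both vectors are assumed to follow the same feature
     * definition, so their arrays must have equal lengths.
     */
    static int countDiscreteDifferences(byte[] bytes1, short[] shorts1,
                                        byte[] bytes2, short[] shorts2) {
        if (bytes1.length != bytes2.length || shorts1.length != shorts2.length) {
            throw new IllegalArgumentException("vectors follow different feature definitions");
        }
        int differences = 0;
        for (int i = 0; i < bytes1.length; i++) {
            if (bytes1[i] != bytes2[i]) {
                differences++;
            }
        }
        for (int i = 0; i < shorts1.length; i++) {
            if (shorts1[i] != shorts2[i]) {
                differences++;
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        byte[] bytesA = {1, 2, 3};
        byte[] bytesB = {1, 5, 3};
        short[] shortsA = {10, 20};
        short[] shortsB = {10, 21};
        int different = countDiscreteDifferences(bytesA, shortsA, bytesB, shortsB);
        int total = bytesA.length + shortsA.length;
        System.out.println("different: " + different + ", shared: " + (total - different));
    }
}
```

With the values in main, two of the five discrete positions differ and three are shared.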
public static final String BYTEFEATURES
public static final String SHORTFEATURES
public static final String CONTINUOUSFEATURES
public static final String FEATURESIMILARITY
public static final char WEIGHT_SEPARATOR
public static final String EDGEFEATURE
public static final String EDGEFEATURE_START
public static final String EDGEFEATURE_END
public static final String NULLVALUE
public FeatureDefinition(BufferedReader input, boolean readWeights) throws IOException
Parameters:
input - a BufferedReader from which a textual feature definition can be read.
readWeights - a boolean indicating whether or not to read weights from input. If weights are read, they will be normalized so that they sum to one.
Throws:
IOException - if a reading problem occurs.

public FeatureDefinition(DataInput input) throws IOException
Parameters:
input - a DataInputStream or a RandomAccessFile from which a binary feature definition can be read.
Throws:
IOException - if a reading problem occurs.

public FeatureDefinition(ByteBuffer bb) throws IOException
Parameters:
bb - a byte buffer from which a binary feature definition can be read.
Throws:
IOException - if a reading problem occurs.

public void writeBinaryTo(DataOutput out) throws IOException
Parameters:
out - a DataOutputStream or RandomAccessFile to which the FeatureDefinition should be written.
Throws:
IOException - if a problem occurs while writing.

public int getNumberOfFeatures()

public int getNumberOfByteFeatures()

public int getNumberOfShortFeatures()

public int getNumberOfContinuousFeatures()
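The documented bounds for getNumberOfValues(int) and getPossibleValues(int), which accept only indexes below getNumberOfByteFeatures() + getNumberOfShortFeatures(), suggest that discrete features occupy the lowest indexes and continuous features follow. The sketch below answers isByteFeature(int), isShortFeature(int), and isContinuousFeature(int) from the three counts alone, assuming byte features come first, then short features, then continuous features (the order in which featureEquals lists the three kinds). IndexLayoutSketch is a hypothetical class, not part of the MARY TTS API.

```java
/** Hypothetical illustration of a feature index layout; not the MARY TTS API. */
final class IndexLayoutSketch {

    enum Kind { BYTE, SHORT, CONTINUOUS }

    private final int numByte;
    private final int numShort;
    private final int numContinuous;

    IndexLayoutSketch(int numByte, int numShort, int numContinuous) {
        this.numByte = numByte;
        this.numShort = numShort;
        this.numContinuous = numContinuous;
    }

    /** Total number of features, mirroring getNumberOfFeatures(). */
    int numberOfFeatures() {
        return numByte + numShort + numContinuous;
    }

    /** Classifies an index, assuming the order byte, short, continuous. */
    Kind kindOf(int index) {
        if (index < 0 || index >= numberOfFeatures()) {
            throw new IndexOutOfBoundsException("no feature with index " + index);
        }
        if (index < numByte) {
            return Kind.BYTE;
        }
        if (index < numByte + numShort) {
            return Kind.SHORT;
        }
        return Kind.CONTINUOUS;
    }

    /** Position of a feature within the value array of its own kind. */
    int positionWithinKind(int index) {
        Kind kind = kindOf(index);
        if (kind == Kind.BYTE) {
            return index;
        }
        if (kind == Kind.SHORT) {
            return index - numByte;
        }
        return index - numByte - numShort;
    }

    public static void main(String[] args) {
        IndexLayoutSketch layout = new IndexLayoutSketch(3, 2, 1);
        for (int i = 0; i < layout.numberOfFeatures(); i++) {
            System.out.println(i + " -> " + layout.kindOf(i) + "[" + layout.positionWithinKind(i) + "]");
        }
    }
}
```

Under this assumed layout, a single global index is enough to pick both the value array (byte, short, or float) and the slot within it.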
public float getWeight(int featureIndex)
Parameters:
featureIndex - the index number of the feature.

public float[] getFeatureWeights()
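The BufferedReader constructor states that, when weights are read, they are normalized so that they sum to one; getWeight(int) and getFeatureWeights() then expose the result. A minimal sketch of that normalization step, assuming plain division of non-negative raw weights by their sum. WeightNormalizationSketch is a hypothetical class, and parsing the weights from the text format is not shown.

```java
import java.util.Arrays;

/** Hypothetical illustration; not the MARY TTS implementation. */
final class WeightNormalizationSketch {

    private WeightNormalizationSketch() {
    }

    /** Returns a copy of the weights scaled so that they sum to one. */
    static float[] normalize(float[] rawWeights) {
        double sum = 0;
        for (float w : rawWeights) {
            if (w < 0) {
                throw new IllegalArgumentException("negative weight: " + w);
            }
            sum += w;
        }
        if (sum == 0) {
            throw new IllegalArgumentException("weights must not all be zero");
        }
        float[] normalized = new float[rawWeights.length];
        for (int i = 0; i < rawWeights.length; i++) {
            // accumulate and divide in double precision, then narrow to float
            normalized[i] = (float) (rawWeights[i] / sum);
        }
        return normalized;
    }

    public static void main(String[] args) {
        float[] weights = normalize(new float[] {2f, 1f, 1f});
        System.out.println(Arrays.toString(weights));
    }
}
```

Summing in double precision keeps rounding error small when there are many features; the returned array is a fresh copy, so the raw weights stay unchanged.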
public String getWeightFunctionName(int featureIndex)
Parameters:
featureIndex - the index number of the feature.

public String getFeatureName(int index)
Parameters:
index - a feature index, as could be used to access a feature value in a FeatureVector.
Throws:
IndexOutOfBoundsException - if index < 0 or index ≥ getNumberOfFeatures().

public String[] getFeatureNameArray(int[] index)
Parameters:
index - an array of feature indexes, as could be used to access feature values in a FeatureVector.
Throws:
IndexOutOfBoundsException - if any of the indexes is < 0 or ≥ getNumberOfFeatures().

public String[] getFeatureNameArray()

public String[] getByteFeatureNameArray()

public String[] getShortFeatureNameArray()

public String[] getContinuousFeatureNameArray()

public String getFeatureNames()
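getFeatureIndex(String) rejects unknown names with an IllegalArgumentException, getFeatureName(int) rejects out-of-range indexes with an IndexOutOfBoundsException, and getFeatureNames() lists all names separated by white space in their order of definition. A sketch of one way to provide these translations, assuming the names are kept in definition order in a list plus a hash map from name to index. FeatureNameIndexSketch is a hypothetical class, not the MARY TTS API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration; not the MARY TTS implementation. */
final class FeatureNameIndexSketch {

    private final List<String> names; // kept in definition order
    private final Map<String, Integer> indexByName = new HashMap<>();

    FeatureNameIndexSketch(List<String> namesInDefinitionOrder) {
        this.names = new ArrayList<>(namesInDefinitionOrder);
        for (int i = 0; i < names.size(); i++) {
            if (indexByName.put(names.get(i), i) != null) {
                throw new IllegalArgumentException("duplicate feature name: " + names.get(i));
            }
        }
    }

    /** Name to index; unknown names are rejected, as getFeatureIndex documents. */
    int featureIndex(String name) {
        Integer index = indexByName.get(name);
        if (index == null) {
            throw new IllegalArgumentException("unknown feature: " + name);
        }
        return index;
    }

    /** Index to name; List.get throws IndexOutOfBoundsException for bad indexes. */
    String featureName(int index) {
        return names.get(index);
    }

    /** Array version, mirroring getFeatureIndexArray(String[]). */
    int[] featureIndexArray(String[] featureNames) {
        int[] result = new int[featureNames.length];
        for (int i = 0; i < featureNames.length; i++) {
            result[i] = featureIndex(featureNames[i]);
        }
        return result;
    }

    /** All names separated by single spaces, in definition order. */
    String featureNames() {
        return String.join(" ", names);
    }
}
```

The map makes name lookup constant-time on average, while the list preserves definition order for index lookup and for the space-separated listing. With the hypothetical names "phone", "next_phone", and "next_next_phone", featureIndex("next_next_phone") returns 2.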
public boolean hasFeature(String name)
Parameters:
name - the feature name in question, e.g. "next_next_phone".

public boolean hasFeatureValue(String featureName, String featureValue)
Parameters:
featureName - the name of the feature.
featureValue - the value to look up among the known values of that feature.

public boolean hasFeatureValue(int featureIndex, String featureValue)
Parameters:
featureIndex - the index number of the feature.
featureValue - the value to look up among the known values of that feature.

public boolean isByteFeature(String featureName)
Parameters:
featureName - the name of the feature.

public boolean isByteFeature(int index)
Parameters:
index - the index number of the feature.

public boolean isShortFeature(String featureName)
Parameters:
featureName - the name of the feature.

public boolean isShortFeature(int index)
Parameters:
index - the index number of the feature.

public boolean isContinuousFeature(String featureName)
Parameters:
featureName - the name of the feature.

public boolean isContinuousFeature(int index)
Parameters:
index - the index number of the feature.

public boolean hasSimilarityMatrix(int featureIndex)
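Parameters:
featureIndex - the index number of the feature.

public boolean hasSimilarityMatrix(String featureName)
Parameters:
featureName - the name of the feature.

public float getSimilarity(int featureIndex, byte i, byte j)
Parameters:
featureIndex - the index number of the feature.
i - the first feature value, as a byte.
j - the second feature value, as a byte.

public int getFeatureIndex(String featureName)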
Parameters:
featureName - a valid feature name.
Throws:
IllegalArgumentException - if the feature name is unknown.

public int[] getFeatureIndexArray(String[] featureName)
Parameters:
featureName - an array of valid feature names.
Throws:
IllegalArgumentException - if one of the feature names is unknown.

public int getNumberOfValues(int featureIndex)
Parameters:
featureIndex - the index number of the feature.
Throws:
IndexOutOfBoundsException - if featureIndex < 0 or featureIndex ≥ getNumberOfByteFeatures() + getNumberOfShortFeatures().

public String[] getPossibleValues(int featureIndex)
Parameters:
featureIndex - the index number of the feature.
Throws:
IndexOutOfBoundsException - if featureIndex < 0 or featureIndex ≥ getNumberOfByteFeatures() + getNumberOfShortFeatures().

public String getFeatureValueAsString(int featureIndex, int value)
Parameters:
featureIndex - the index number of the feature.
value - the feature value. This must be in the range of acceptable values for the given feature.
Throws:
IndexOutOfBoundsException - if featureIndex < 0 or featureIndex ≥ getNumberOfByteFeatures() + getNumberOfShortFeatures().
IndexOutOfBoundsException - if value is not a legal value for this feature.

public String getFeatureValueAsString(String featureName, FeatureVector fv)
Parameters:
featureName - the name of the feature.
fv - the feature vector from which to read the value.

public byte getFeatureValueAsByte(String featureName, String value)
Parameters:
featureName - the name of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Throws:
IllegalArgumentException - if featureName is not a valid feature name, or if featureName is not a byte-valued feature.
IllegalArgumentException - if value is not a legal value for this feature.

public byte getFeatureValueAsByte(int featureIndex, String value)
Parameters:
featureIndex - the index number of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Throws:
IllegalArgumentException - if featureIndex is not a valid feature index, or if the feature it identifies is not byte-valued.
IllegalArgumentException - if value is not a legal value for this feature.

public short getFeatureValueAsShort(String featureName, String value)
Parameters:
featureName - the name of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Throws:
IllegalArgumentException - if featureName is not a valid feature name, or if featureName is not a short-valued feature.
IllegalArgumentException - if value is not a legal value for this feature.

public short getFeatureValueAsShort(int featureIndex, String value)
Parameters:
featureIndex - the index number of the feature.
value - the feature value. This must be among the acceptable values for the given feature.
Throws:
IllegalArgumentException - if featureIndex is not a valid feature index, or if the feature it identifies is not short-valued.
IllegalArgumentException - if value is not a legal value for this feature.

public boolean featureEquals(FeatureDefinition other)
Parameters:
other - the feature definition to compare to.

public String featureEqualsAnalyse(FeatureDefinition other)
Parameters:
other - the feature definition to compare to.

public boolean equals(Object obj)
Overrides:
equals in class Object
Parameters:
obj - the feature definition to compare to.
See Also:
featureEquals(FeatureDefinition)
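equals(Object) is documented as comparing both the feature definitions and the weights, while featureEquals(FeatureDefinition) compares only the number, names, and possible values of the features. A reduced sketch of that distinction, assuming a definition can be modeled as an ordered list of feature names, a parallel list of possible values, and a weight array. DefinitionEqualitySketch is a hypothetical class, not the MARY TTS implementation; because it overrides equals, it also overrides hashCode, as the contract of java.lang.Object requires.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;

/** Hypothetical illustration; not the MARY TTS implementation. */
final class DefinitionEqualitySketch {

    private final List<String> names;
    private final List<List<String>> possibleValues;
    private final float[] weights;

    DefinitionEqualitySketch(List<String> names, List<List<String>> possibleValues, float[] weights) {
        this.names = new ArrayList<>(names);
        this.possibleValues = new ArrayList<>(possibleValues);
        this.weights = weights.clone();
    }

    /** Compares the number, names, and possible values of the features; ignores weights. */
    boolean featureEquals(DefinitionEqualitySketch other) {
        return names.equals(other.names) && possibleValues.equals(other.possibleValues);
    }

    /** Compares the feature definitions and the weights. */
    @Override
    public boolean equals(Object obj) {
        if (this == obj) {
            return true;
        }
        if (!(obj instanceof DefinitionEqualitySketch)) {
            return false;
        }
        DefinitionEqualitySketch other = (DefinitionEqualitySketch) obj;
        return featureEquals(other) && Arrays.equals(weights, other.weights);
    }

    /** Consistent with equals: built from the same three components. */
    @Override
    public int hashCode() {
        return Objects.hash(names, possibleValues, Arrays.hashCode(weights));
    }
}
```

Two definitions that differ only in their weights are therefore feature-equal but not equal.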
public boolean contains(FeatureDefinition other)
Determine whether this FeatureDefinition is a superset of, or equal to, another FeatureDefinition. Specifically,
Parameters:
other - the FeatureDefinition to test against.

public FeatureDefinition subset(String[] featureNamesToDrop)
Parameters:
featureNamesToDrop - an array of Strings containing the names of the features to drop from the new FeatureDefinition.

public FeatureVector toFeatureVector(int unitIndex, String featureString)
Parameters:
unitIndex - an index number to assign to the feature vector.
featureString - the string representation of a feature vector.
Throws:
IllegalArgumentException - if the feature values listed are not consistent with the feature definition.
See Also:
toFeatureString(FeatureVector)
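toFeatureVector(int, String) and toFeatureString(FeatureVector) convert between a feature vector and a String, while getFeatureValueAsByte and getFeatureValueAsString translate each discrete value between its String form and its numeric code. A sketch of such a round trip for byte-valued features only, under two assumptions this page does not state: the numeric code of a value is its position in the list that getPossibleValues(int) returns, and values in the String form are separated by white space. FeatureStringSketch and the example values are hypothetical; short-valued and continuous features are omitted.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical illustration; not the MARY TTS implementation. */
final class FeatureStringSketch {

    // possibleValues.get(f) lists the values of byte feature f; a value's
    // code is assumed to be its position in that list.
    private final List<List<String>> possibleValues;

    FeatureStringSketch(List<List<String>> possibleValues) {
        this.possibleValues = possibleValues;
    }

    /** String value to byte code; unknown values are rejected. */
    byte valueAsByte(int featureIndex, String value) {
        int code = possibleValues.get(featureIndex).indexOf(value);
        // codes above Byte.MAX_VALUE would not fit a signed byte in this sketch
        if (code < 0 || code > Byte.MAX_VALUE) {
            throw new IllegalArgumentException("illegal value '" + value + "' for feature " + featureIndex);
        }
        return (byte) code;
    }

    /** Byte code back to its String value. */
    String valueAsString(int featureIndex, byte code) {
        return possibleValues.get(featureIndex).get(code);
    }

    /** Parses white-space-separated values into byte codes. */
    byte[] parse(String featureString) {
        String[] tokens = featureString.trim().split("\\s+");
        if (tokens.length != possibleValues.size()) {
            throw new IllegalArgumentException("expected " + possibleValues.size() + " values, got " + tokens.length);
        }
        byte[] codes = new byte[tokens.length];
        for (int i = 0; i < tokens.length; i++) {
            codes[i] = valueAsByte(i, tokens[i]);
        }
        return codes;
    }

    /** Formats byte codes as white-space-separated String values. */
    String format(byte[] codes) {
        List<String> values = new ArrayList<>();
        for (int i = 0; i < codes.length; i++) {
            values.add(valueAsString(i, codes[i]));
        }
        return String.join(" ", values);
    }
}
```

With possible values ["0", "a", "b"] and ["0", "1"], parsing "b 1" yields the codes {2, 1}, and formatting those codes gives back "b 1". A value outside a feature's list is rejected with an IllegalArgumentException, in the spirit of the documented behavior of toFeatureVector(int, String).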
public FeatureVector toFeatureVector(int unitIndex, byte[] bytes, short[] shorts, float[] floats)
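toFeatureVector(int, byte[], short[], float[]) builds a vector directly from the three value arrays, and createEdgeFeatureVector(int, boolean) returns a vector that marks the start or end of a unit. A sketch of how such edge arrays might be filled, under assumptions this page does not confirm: the edge marker is a byte-valued feature (the EDGEFEATURE field suggests such a feature exists), it takes a start or end code (presumably related to EDGEFEATURE_START and EDGEFEATURE_END), every other discrete feature takes a null code (presumably related to NULLVALUE), and continuous features are zero. EdgeVectorSketch and all its parameters are hypothetical; the codes are passed in rather than looked up from the real constants.

```java
import java.util.Arrays;

/** Hypothetical illustration; not the MARY TTS implementation. */
final class EdgeVectorSketch {

    final byte[] bytes;
    final short[] shorts;
    final float[] floats;

    private EdgeVectorSketch(byte[] bytes, short[] shorts, float[] floats) {
        this.bytes = bytes;
        this.shorts = shorts;
        this.floats = floats;
    }

    /**
     * Builds the three value arrays of an edge vector. The caller chooses
     * edgeCode: the assumed start code for a start vector, or the assumed
     * end code for an end vector.
     */
    static EdgeVectorSketch create(int numByte, int numShort, int numContinuous,
                                   int edgeFeatureIndex, byte nullByteCode,
                                   short nullShortCode, byte edgeCode) {
        if (edgeFeatureIndex < 0 || edgeFeatureIndex >= numByte) {
            throw new IllegalArgumentException("edge feature must be byte-valued in this sketch");
        }
        byte[] bytes = new byte[numByte];
        Arrays.fill(bytes, nullByteCode);
        bytes[edgeFeatureIndex] = edgeCode; // only the edge marker is non-null
        short[] shorts = new short[numShort];
        Arrays.fill(shorts, nullShortCode);
        float[] floats = new float[numContinuous]; // Java initializes these to 0.0f
        return new EdgeVectorSketch(bytes, shorts, floats);
    }
}
```

The resulting arrays have the same shape as those toFeatureVector(int, byte[], short[], float[]) accepts, which is why the sketch returns them separately rather than as a single combined structure.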
public FeatureVector readFeatureVector(int currentUnitIndex, DataInput input) throws IOException
Parameters:
currentUnitIndex - the unit index to assign to the new feature vector.
input - a DataInputStream or RandomAccessFile to read the feature values from.
Throws:
IOException - if a reading problem occurs.

public FeatureVector readFeatureVector(int currentUnitIndex, ByteBuffer bb) throws IOException
Parameters:
currentUnitIndex - the unit index to assign to the new feature vector.
bb - a byte buffer to read the feature values from.
Throws:
IOException - if a reading problem occurs.

public FeatureVector createEdgeFeatureVector(int unitIndex, boolean start)
Parameters:
unitIndex - the index of the unit.
start - true creates a start vector, false creates an end vector.

public String toFeatureString(FeatureVector fv)
Parameters:
fv - a feature vector which must be consistent with this feature definition.
Throws:
IllegalArgumentException - if the feature vector is not consistent with this feature definition.
IndexOutOfBoundsException - if any value of the feature vector is not consistent with this feature definition.

public void writeTo(PrintWriter out, boolean writeWeights)
Parameters:
out - the destination of the data.
writeWeights - whether to write weights before every line.

public void generateAllDotDescForWagon(PrintWriter out)
Parameters:
out - the destination of the data.

public void generateAllDotDescForWagon(PrintWriter out, Set<String> featuresToIgnore)
Parameters:
out - the destination of the data.
featuresToIgnore - a set of Strings containing the names of features that wagon should ignore. Can be null.

public void generateFeatureWeightsFile(PrintWriter out)
Parameters:
out - the destination of the data.

public static int diff(FeatureVector v1, FeatureVector v2)
Parameters:
v1 - a feature vector.
v2 - another feature vector to compare v1 with.

Copyright © 2000–2016 DFKI GmbH. All rights reserved.