Name Classifier

Usage

The program consists of two modules: RDF_processor and a model. It also contains two demonstration scripts: build_model.py, which creates a model from serialized hash arrays or RDF triples, and predict.py, which provides predictions from a stored model.

The model file contains a prebuilt model that the predict script can use.

`RDF_processor`

`RDF_processor.parse_identifiers(ident_file, object)`

Constructs and stores a set of <subjects> from ident_file with RDF:type == object using the Redland Python bindings. The format of the RDF file should be:

<subject> RDF:type <object>.

ident_file: Name of the RDF turtle file to be parsed.
object: URI of the <object> to match.
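
The selection step can be illustrated without Redland. The sketch below assumes the triples have already been parsed into (subject, predicate, object) tuples. The function name `collect_subjects` and the example URIs are hypothetical and are not part of this project.

```python
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def collect_subjects(triples, target_object):
    """Return the set of subjects whose rdf:type equals target_object."""
    return {
        subject
        for subject, predicate, obj in triples
        if predicate == RDF_TYPE and obj == target_object
    }


# Hypothetical example data, not taken from the project's sample files.
triples = [
    ("http://example.org/a", RDF_TYPE, "http://example.org/Person"),
    ("http://example.org/b", RDF_TYPE, "http://example.org/Place"),
    ("http://example.org/c", RDF_TYPE, "http://example.org/Person"),
]
people = collect_subjects(triples, "http://example.org/Person")
```

Storing the result as a set keeps later membership checks cheap when names are mapped against it.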

`RDF_processor.map(map_file, balance=True)`

Constructs and stores an array of <object> strings for FOAF:Name predicates, and a corresponding identifier array indicating whether each <subject> is present in the stored set of subjects.

The format of the RDF file should be:

<subject> FOAF:Name <object>.

map_file: Name of the RDF turtle file to be parsed and mapped.
balance: If True, balances the arrays by downsampling the more prevalent category.
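
A minimal sketch of the labelling and balancing idea, assuming the name triples are available as (subject, name) pairs and that an identifier is 1 when the subject is in the stored set and 0 otherwise. The helpers `label_names` and `balance_by_downsampling`, the 0/1 encoding and the example data are assumptions for illustration only.

```python
import random


def label_names(name_pairs, known_subjects):
    """Build parallel lists: name strings and 0/1 membership identifiers."""
    names = [name for _, name in name_pairs]
    idents = [1 if subject in known_subjects else 0 for subject, _ in name_pairs]
    return names, idents


def balance_by_downsampling(names, idents, seed=0):
    """Randomly drop items from the larger class until both classes match."""
    rng = random.Random(seed)
    pos = [i for i, y in enumerate(idents) if y == 1]
    neg = [i for i, y in enumerate(idents) if y == 0]
    keep_n = min(len(pos), len(neg))
    keep = sorted(rng.sample(pos, keep_n) + rng.sample(neg, keep_n))
    return [names[i] for i in keep], [idents[i] for i in keep]


# Hypothetical example data.
pairs = [("s1", "Ada Lovelace"), ("s2", "London"), ("s3", "Paris"), ("s4", "Rome")]
names, idents = label_names(pairs, known_subjects={"s1"})
bal_names, bal_idents = balance_by_downsampling(names, idents)
```

Downsampling discards data from the larger class, which keeps a classifier from simply predicting the majority category.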

`RDF_processor.hash(mapping_size=1000)`

Tokenises the array of object strings and hashes the tokens using mmh3 to create and store a SciPy DOK (dictionary-of-keys) sparse matrix.

mapping_size: Bound on the hash values, which fall in the range [-mapping_size, +mapping_size].
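
The hashing trick described here can be sketched with the standard library alone. This sketch swaps in `zlib.crc32` for mmh3 and a plain dict keyed by (row, column) for the SciPy sparse matrix. The whitespace tokeniser, the shift of hash values from [-mapping_size, +mapping_size] to non-negative column indices, and the function names are all assumptions, not the project's implementation.

```python
import zlib


def signed_bucket(token, mapping_size=1000):
    """Map a token to an integer in [-mapping_size, +mapping_size]."""
    h = zlib.crc32(token.encode("utf-8")) & 0xFFFFFFFF  # force unsigned 32-bit
    return h % (2 * mapping_size + 1) - mapping_size


def hash_names(names, mapping_size=1000):
    """Return a dict-of-keys sparse matrix: {(row, column): token count}."""
    matrix = {}
    for row, name in enumerate(names):
        for token in name.lower().split():
            # Shift the signed bucket so it can serve as a column index.
            col = signed_bucket(token, mapping_size) + mapping_size
            matrix[(row, col)] = matrix.get((row, col), 0) + 1
    return matrix


features = hash_names(["Ada Lovelace", "Ada Ada"], mapping_size=1000)
```

Because the number of columns is fixed by mapping_size, unseen names can be hashed at prediction time without a stored vocabulary, at the cost of occasional collisions.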

`RDF_processor.shuffle()`

Shuffles the subject, features and identifier arrays.
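
Assuming the three arrays are aligned row by row (which the description implies but does not state), one way to shuffle them together is to apply a single random permutation to all three. `shuffle_together` is a hypothetical helper that works on plain lists rather than the stored arrays.

```python
import random


def shuffle_together(subjects, features, identifiers, seed=None):
    """Apply one shared random permutation so rows stay aligned."""
    order = list(range(len(subjects)))
    random.Random(seed).shuffle(order)
    return (
        [subjects[i] for i in order],
        [features[i] for i in order],
        [identifiers[i] for i in order],
    )


# Hypothetical example data.
subj, feat, ident = shuffle_together(
    ["Ada", "London", "Grace"], [[1, 0], [0, 1], [1, 1]], [1, 0, 1], seed=42
)
```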

`RDF_processor.get_features()`

Returns the current feature array.

`RDF_processor.get_identifiers()`

Returns the array of identifiers.

`RDF_processor.get_subject()`

Returns the array of subject strings.

`log_reg`

`log_reg.log_reg(size, batch_size=1000, alpha=0.2, C=0.0)`

size: Number of features in the array to be modelled
batch_size: The maximum size of each batch
alpha: The learning rate for mini-batch gradient descent
C: The L2 regularization term

`log_reg.fit(X, Y)`

Fits dataset X to target Y by minimizing the logistic cost function using Mini-batch Gradient Descent with L2 regularization.

X: The array to be fitted, of shape (n_samples, n_features).
Y: The target array for X, of shape (n_samples,).
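
The fitting procedure can be sketched in plain Python. This is a minimal sketch under stated assumptions, not the project's code: it uses dense lists instead of sparse matrices, adds a hypothetical `epochs` parameter, keeps an unregularised bias as the last weight, and treats `C` as a multiplier on the weights in the gradient.

```python
import math


def sigmoid(z):
    """Numerically safe logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, Y, batch_size=1000, alpha=0.2, C=0.0, epochs=200):
    """Mini-batch gradient descent on the L2-regularised logistic cost.

    X is a list of dense feature rows, Y a list of 0/1 targets.
    Returns the weights; the last entry is the bias.
    """
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)
    for _ in range(epochs):
        for start in range(0, len(X), batch_size):
            xb = X[start:start + batch_size]
            yb = Y[start:start + batch_size]
            grad = [0.0] * len(w)
            for row, y in zip(xb, yb):
                # zip stops at the end of row, so the bias is added separately.
                z = sum(wj * xj for wj, xj in zip(w, row)) + w[-1]
                err = sigmoid(z) - y
                for j, xj in enumerate(row):
                    grad[j] += err * xj
                grad[-1] += err
            m = float(len(xb))
            for j in range(n_features):
                w[j] -= alpha * (grad[j] + C * w[j]) / m
            w[-1] -= alpha * grad[-1] / m
    return w


# Hypothetical, linearly separable toy data.
X = [[-2.0], [-1.0], [1.0], [2.0]]
Y = [0, 0, 1, 1]
weights = fit_logistic(X, Y, batch_size=2, alpha=0.2, C=0.0, epochs=500)
```

Each update uses only one batch, so memory use is bounded by batch_size rather than by the size of the dataset.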

`log_reg.predict(X)`

Predicts the value of X using the fitted model.

X: Value to be predicted
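
For a logistic model, prediction usually means passing the weighted sum through the logistic function and thresholding the result. The sketch below assumes a 0.5 threshold and 0/1 labels. Whether this project returns labels or probabilities is not stated, and `predict_one` and its arguments are hypothetical.

```python
import math


def predict_one(weights, bias, x, threshold=0.5):
    """Return 1 if the logistic probability reaches the threshold, else 0."""
    z = sum(w * xi for w, xi in zip(weights, x)) + bias
    z = max(-60.0, min(60.0, z))  # clamp to keep math.exp in range
    probability = 1.0 / (1.0 + math.exp(-z))
    return 1 if probability >= threshold else 0


label_pos = predict_one([2.0, -1.0], 0.5, [1.0, 0.0])  # z = 2.5
label_neg = predict_one([2.0, -1.0], 0.5, [0.0, 3.0])  # z = -2.5
```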

`log_reg.score(X, Y)`

Returns the mean accuracy (the fraction of correct predictions) for X against targets Y on the fitted model.

X: The array to be predicted
Y: The targets to be compared against
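
The score is the share of predictions that match their targets. A minimal sketch, assuming both are equal-length sequences of labels; `accuracy` is a hypothetical helper name.

```python
def accuracy(predictions, targets):
    """Fraction of predictions that equal their target."""
    if len(predictions) != len(targets):
        raise ValueError("predictions and targets must have the same length")
    correct = sum(1 for p, t in zip(predictions, targets) if p == t)
    return correct / float(len(targets))


rate = accuracy([1, 0, 1, 1], [1, 0, 0, 1])  # 3 of 4 match
```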

Requirements

Python 2.7
numpy
scipy
mmh3
Redland Python bindings
cPickle

