Small project that cleans a text, computes its entropy and creates new texts from it.

laz08/Natural-language-entropy

Natural language entropy study

About

This small project is the product of a class assignment, written around February 2016.

It consists of only two files:

  • EntropyUtils.py, which is the core and has all the important methods.
  • tester.py, which provides a direct way to call the methods defined in EntropyUtils.py from the terminal.

How it works

Almost every method defined in EntropyUtils has a brief explanatory comment above its header.

The most interesting functions are listed below.

Clean text

  • cleanInputText(text) cleans an input text by removing its special characters.
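
The README does not spell out which characters count as "special", so the following is only a minimal sketch of one plausible cleaning step. The function name `clean_text_sketch` and its rules (lowercase, strip accents, keep only a-z and single spaces) are assumptions for illustration, not the behaviour of cleanInputText itself.

```python
import re
import unicodedata


def clean_text_sketch(text: str) -> str:
    """Hypothetical cleaner: lowercase, strip accents, keep a-z and spaces."""
    # Decompose accented characters, e.g. "é" becomes "e" plus a combining mark.
    decomposed = unicodedata.normalize("NFKD", text.lower())
    # Drop the combining marks so only the base letters remain.
    no_accents = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Collapse every run of non-letter characters into a single space.
    letters_only = re.sub(r"[^a-z]+", " ", no_accents)
    return letters_only.strip()


print(clean_text_sketch("Héllo, World!  42"))
```

Restricting the alphabet this way keeps the entropy figures comparable between texts, since punctuation and digits no longer contribute symbols.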

Entropies computation

  • getEntropy(text) computes the entropy for a variable X whose values are letters from a given text. That is, H(X).
  • getJointEntropy(text) computes the joint entropy of a pair of variables X and Y. That is, H(X, Y).
  • getConditionalEntropyOfLetter(text) computes the entropy of a variable Y, whose value is a random letter from the given text, conditioned on a specific letter x[i]. That is, H(Y|x[i]).
  • getConditionalEntropy(text) computes the conditional entropy of two variables from the given text. That is, H(X|Y).
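
As a rough illustration of the quantities listed above, here is a minimal sketch that estimates them from letter and letter-pair counts. The function names are hypothetical, and it assumes that the pair (Y, X) is made of consecutive letters, with Y the first letter and X the one that follows; the README does not state how EntropyUtils.py defines the pair.

```python
from collections import Counter
from math import log2


def entropy_from_counts(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def letter_entropy(text):
    """H(X): entropy of a single letter drawn from the text."""
    return entropy_from_counts(Counter(text))


def joint_entropy(text):
    """H(X, Y): entropy of consecutive letter pairs (an assumption)."""
    return entropy_from_counts(Counter(zip(text, text[1:])))


def conditional_entropy(text):
    """H(X|Y) = H(X, Y) - H(Y), with Y the first letter of each pair."""
    first_letters = Counter(text[:-1])
    return joint_entropy(text) - entropy_from_counts(first_letters)


print(letter_entropy("abab"))       # two equally likely letters
print(conditional_entropy("abab"))  # the next letter is fully determined
```

In "abab" each letter always determines its successor, so the conditional entropy is zero even though the single-letter entropy is one bit.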

You can also check the functions using checkProposition(text).
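
The README does not say which proposition checkProposition verifies. One standard identity that ties these quantities together is the chain rule, H(X, Y) = H(Y) + H(X|Y). The sketch below checks it numerically by computing H(X|Y) directly as a weighted average of per-letter entropies H(X|Y = y). It is an assumption about what such a check could look like, with hypothetical names throughout.

```python
from collections import Counter, defaultdict
from math import isclose, log2


def entropy_from_counts(counts):
    """Shannon entropy in bits of the empirical distribution given by counts."""
    total = sum(counts.values())
    return -sum((c / total) * log2(c / total) for c in counts.values())


def check_chain_rule(text):
    """Return True if H(X, Y) - H(Y) matches the directly computed H(X|Y)."""
    pairs = Counter(zip(text, text[1:]))
    # Group the counts of following letters by the first letter of each pair.
    followers = defaultdict(Counter)
    for (first, second), n in pairs.items():
        followers[first][second] += n
    total = sum(pairs.values())
    # Direct definition: sum over y of p(y) * H(X | Y = y).
    direct = sum(
        (sum(c.values()) / total) * entropy_from_counts(c)
        for c in followers.values()
    )
    # Chain rule: H(X|Y) = H(X, Y) - H(Y), with Y's marginal taken from the pairs.
    first_counts = Counter({y: sum(c.values()) for y, c in followers.items()})
    via_chain = entropy_from_counts(pairs) - entropy_from_counts(first_counts)
    return isclose(direct, via_chain, abs_tol=1e-9)


print(check_chain_rule("the quick brown fox jumps over the lazy dog"))
```

Each inner term entropy_from_counts(c) is the per-letter quantity H(X|Y = y), which corresponds to the H(Y|x[i]) form mentioned in the list, up to the naming of the variables.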

New texts creator

You can create two kinds of random text, using the given text as a model:

  • createNewTextSameLetterFreq(text) creates a text with the same letter frequencies as the original text.
  • createNewTextSameJointEntropy(text) creates a text with the same joint entropy as the original text.
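
The two generators above could be approximated as follows. This is a sketch under stated assumptions, not the repository's implementation: the first function samples letters independently according to their observed frequencies, and the second walks a first-order Markov chain built from consecutive letter pairs, which is one common way to approximate the pair statistics of the source. All names and the optional `length` and `rng` parameters are hypothetical.

```python
import random
from collections import Counter, defaultdict


def new_text_same_letter_freq(text, length=None, rng=random):
    """Sample letters independently, weighted by their frequency in text."""
    if not text:
        return ""
    counts = Counter(text)
    letters = list(counts)
    weights = [counts[ch] for ch in letters]
    k = len(text) if length is None else length
    return "".join(rng.choices(letters, weights=weights, k=k))


def new_text_from_pairs(text, length=None, rng=random):
    """Generate text by following observed letter-to-letter transitions."""
    k = len(text) if length is None else length
    if not text or k == 0:
        return ""
    followers = defaultdict(Counter)
    for first, second in zip(text, text[1:]):
        followers[first][second] += 1
    current = rng.choice(text)
    out = [current]
    while len(out) < k:
        options = followers.get(current)
        if options:
            # Pick the next letter in proportion to how often it followed.
            current = rng.choices(list(options), weights=list(options.values()))[0]
        else:
            # Dead end (a letter never followed by anything): restart at random.
            current = rng.choice(text)
        out.append(current)
    return "".join(out)


rng = random.Random(0)  # fixed seed so repeated runs give the same text
print(new_text_same_letter_freq("abracadabra", rng=rng))
print(new_text_from_pairs("abracadabra", rng=rng))
```

Because both functions sample at random, the generated text matches the source's frequencies or pair statistics only approximately, with the match improving as the text gets longer.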
