Pre-weighted words (optional; comma-separated):
We'll try to grab the text as fast as we can.
The more text there is, the more time it will take.
— The algorithm cleans up the input text so that it can be analyzed.
— Then, it finds the frequency of each word in the cleaned-up text.
— Each word is assigned a score based on a simple TF-IDF analysis.
— Each sentence is then given a score based on the scores of the words it contains.
— The sentence scores are then normalized by length, so that longer sentences aren't favored and shorter ones aren't penalized.
— The sentences are sorted by their scores.
— Finally, the program spits out the results in whichever output format was asked for.
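The steps above can be sketched in plain Python. This is a minimal, illustrative version, not the tool's actual code: it uses a naive regex sentence splitter where the real program uses NLTK, a tiny hypothetical stopword set, and it interprets "simple TF-IDF" as term frequency over the whole text combined with an inverse frequency computed by treating each sentence as a document (one reasonable reading of the description, not a confirmed detail).

```python
import math
import re
from collections import Counter

# A tiny illustrative stopword list; the real tool likely uses NLTK's.
STOPWORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it", "that"}

def summarize(text, n=2):
    """Return the n highest-scoring sentences, in their original order."""
    # 1. Split into sentences (naive; stands in for a real tokenizer).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def words(s):
        # 2. Clean up: lowercase, keep only word characters, drop stopwords.
        return [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOPWORDS]

    sent_words = [words(s) for s in sentences]

    # 3. Term frequency of each word across the whole text.
    tf = Counter(w for ws in sent_words for w in ws)

    # 4. Inverse "document" frequency, treating each sentence as a document.
    n_sents = len(sentences)
    df = Counter()
    for ws in sent_words:
        df.update(set(ws))
    idf = {w: math.log(n_sents / df[w]) + 1.0 for w in df}

    # 5. Score each sentence by its words' TF-IDF scores, normalized by
    #    length so longer sentences aren't favored.
    def score(ws):
        if not ws:
            return 0.0
        return sum(tf[w] * idf[w] for w in ws) / len(ws)

    # 6. Sort by score, keep the top n, and restore original order.
    ranked = sorted(range(n_sents), key=lambda i: score(sent_words[i]), reverse=True)
    return [sentences[i] for i in sorted(ranked[:n])]
```

Returning the winners in their original order (the final `sorted`) keeps the extracted sentences readable as a mini-summary rather than a ranked list.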
The program tries to pick out the sentences of an input text that are most representative of the text as a whole; in other words, to find the essence of a text.
Project Gutenberg is an excellent resource for full books in the public domain.
Try pasting text from any of these links into the input box above:
Python, web.py, NLTK, jQuery, 1140.css, sexybuttons, vim, and Adobe Photoshop.
I'm interested in computational linguistics. It's interesting to consider what exactly makes a sentence important, and whether it's even possible to find an objective measure of 'meaningfulness'.