Ali Arsalan Kazmi
England and Wales Police & Crime Commissioner Elections, 2012
Norfolk Policing Survey: "Making a Difference"
Analysis of survey
Everyday language is a part of the human organism and is no less complicated than it.
Natural Language
Synonymy
Polysemy
Domain specific words
Words influenced by History/Culture
Obtained from Dr. Beatriz's lecture on Text Mining
Objective: Apply operations to 'reshape' data into a format suitable for Text Mining.
Standardisation
Stopwords Removal
Thesaurus
This is a vital issue and more bobbies must be on the beat. Increase the beat.
Police needs to solve the problem. Car parks have been taken over by bicycles! Please increase Police beat too!
I am a cyclist and I am proud of it!
After preprocessing:
vital issue more police must be on beat increase beat
police needs solve problem car parks taken over by bicycle please increase police beat
bicyclist proud
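The three steps (standardisation, stopword removal, thesaurus) can be sketched in Python. The stopword list and thesaurus below are hypothetical. They were chosen only so that the sketch reproduces the three processed documents above. A real pipeline would use a standard stopword list and a domain-specific thesaurus.

```python
import re

# Hypothetical stopword list: only the words needed to reproduce the
# example above; a real pipeline would use a standard, larger list.
STOPWORDS = {"this", "is", "a", "and", "the", "to", "have", "been", "too",
             "i", "am", "of", "it"}

# Hypothetical thesaurus: maps variants and synonyms onto one canonical term.
THESAURUS = {"bobbies": "police", "bicycles": "bicycle", "cyclist": "bicyclist"}


def preprocess(text):
    # Standardisation: lower-case and keep only alphabetic tokens.
    tokens = re.findall(r"[a-z]+", text.lower())
    # Stopword removal.
    tokens = [t for t in tokens if t not in STOPWORDS]
    # Thesaurus: replace each token by its canonical form, if it has one.
    tokens = [THESAURUS.get(t, t) for t in tokens]
    return " ".join(tokens)


documents = [
    "This is a vital issue and more bobbies must be on the beat. Increase the beat.",
    "Police needs to solve the problem. Car parks have been taken over by bicycles! "
    "Please increase Police beat too!",
    "I am a cyclist and I am proud of it!",
]

for doc in documents:
    print(preprocess(doc))
```

Stopword removal runs before the thesaurus so that the mapping only sees content words.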
Text Preprocessing objectives:
Text Preprocessing Techniques:
Objective: Apply operations to generate representations of Textual data.
Binary Term-Document Matrix (1 = term occurs in the document):

Words | Document 1 | Document 2 | Document 3 |
---|---|---|---|
beat | 1 | 1 | 0 |
bicycle | 0 | 1 | 0 |
bicyclist | 0 | 0 | 1 |
car | 0 | 1 | 0 |
increase | 1 | 1 | 0 |
issue | 1 | 0 | 0 |
... | ... | ... | ... |
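A binary matrix records only whether a term occurs in a document. Here is a minimal sketch that builds one from the preprocessed documents. It stores the matrix as a dict mapping each term to one 0/1 entry per document. That representation is an illustrative choice, not something the slides prescribe.

```python
# Preprocessed documents from the earlier slide.
docs = [
    "vital issue more police must be on beat increase beat",
    "police needs solve problem car parks taken over by bicycle please increase police beat",
    "bicyclist proud",
]


def binary_matrix(docs):
    """Return {term: [0/1 per document]}, 1 if the term occurs in the document."""
    token_sets = [set(d.split()) for d in docs]
    vocabulary = sorted(set().union(*token_sets))
    return {t: [1 if t in s else 0 for s in token_sets] for t in vocabulary}


m = binary_matrix(docs)
for term in ["beat", "bicycle", "bicyclist", "car", "increase", "issue"]:
    print(term, m[term])
```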
Term Frequency Term-Document Matrix (number of occurrences of each term):

Words | Document 1 | Document 2 | Document 3 |
---|---|---|---|
beat | 2 | 1 | 0 |
bicycle | 0 | 1 | 0 |
bicyclist | 0 | 0 | 1 |
car | 0 | 1 | 0 |
increase | 1 | 1 | 0 |
issue | 1 | 0 | 0 |
... | ... | ... | ... |
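Replacing presence/absence with raw counts gives the term-frequency matrix. The sketch below makes the same representational assumption as the binary case, using `collections.Counter` for the per-document counts.

```python
from collections import Counter

# Preprocessed documents from the earlier slide.
docs = [
    "vital issue more police must be on beat increase beat",
    "police needs solve problem car parks taken over by bicycle please increase police beat",
    "bicyclist proud",
]


def tf_matrix(docs):
    """Return {term: [count per document]}."""
    counts = [Counter(d.split()) for d in docs]
    vocabulary = sorted(set().union(*counts))
    # Counter returns 0 for a missing term without inserting it.
    return {t: [c[t] for c in counts] for t in vocabulary}


m = tf_matrix(docs)
for term in ["beat", "bicycle", "bicyclist", "car", "increase", "issue"]:
    print(term, m[term])
```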
TF-IDF Term-Document Matrix (term frequency × log10(3 / number of documents containing the term)):

Words | Document 1 | Document 2 | Document 3 |
---|---|---|---|
beat | 0.3522 | 0.1761 | 0 |
bicycle | 0 | 0.4771 | 0 |
bicyclist | 0 | 0 | 0.4771 |
car | 0 | 0.4771 | 0 |
increase | 0.1761 | 0.1761 | 0 |
issue | 0.4771 | 0 | 0 |
... | ... | ... | ... |
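The entries in this table are consistent with raw term frequency multiplied by a base-10 inverse document frequency, log10(N/df), with N = 3 documents. For example, bicycle occurs once in one document: 1 × log10(3/1) ≈ 0.4771. The sketch below assumes exactly that weighting. Other TF-IDF variants, such as log-scaled tf, smoothed idf, or length normalisation, give different numbers.

```python
import math
from collections import Counter

# Preprocessed documents from the earlier slide.
docs = [
    "vital issue more police must be on beat increase beat",
    "police needs solve problem car parks taken over by bicycle please increase police beat",
    "bicyclist proud",
]


def tfidf_matrix(docs):
    """Return {term: [tf * log10(N / df) per document]}."""
    counts = [Counter(d.split()) for d in docs]
    n_docs = len(docs)
    vocabulary = sorted(set().union(*counts))
    matrix = {}
    for t in vocabulary:
        df = sum(1 for c in counts if t in c)  # documents containing t
        idf = math.log10(n_docs / df)          # 0 if t occurs in every document
        matrix[t] = [c[t] * idf for c in counts]
    return matrix


m = tfidf_matrix(docs)
for term in ["beat", "bicycle", "bicyclist", "car", "increase", "issue"]:
    print(term, [round(v, 4) for v in m[term]])
```

A term that occurs in every document gets an idf of zero, so it carries no weight however often it appears.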
Term-Term Matrix:

Words | beat | bicycle | bicyclist | car | increase | issue | ... |
---|---|---|---|---|---|---|---|
beat | 5 | 1 | 0 | 1 | 3 | 2 | ... |
bicycle | 1 | 1 | 0 | 1 | 1 | 0 | ... |
bicyclist | 0 | 0 | 1 | 0 | 0 | 0 | ... |
car | 1 | 1 | 0 | 1 | 1 | 0 | ... |
increase | 3 | 1 | 0 | 1 | 2 | 1 | ... |
issue | 2 | 0 | 0 | 0 | 1 | 1 | ... |
... | ... | ... | ... | ... | ... | ... | ... |
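This term-term matrix appears consistent with multiplying the term-frequency matrix by its transpose. For example, beat-beat = 2² + 1² = 5 and beat-increase = 2·1 + 1·1 = 3. That is an inference from the numbers, not something the slides state. A minimal sketch under that assumption:

```python
from collections import Counter

# Preprocessed documents from the earlier slide.
docs = [
    "vital issue more police must be on beat increase beat",
    "police needs solve problem car parks taken over by bicycle please increase police beat",
    "bicyclist proud",
]


def term_term_matrix(docs):
    """Entry (s, t) = sum over documents of tf(s, d) * tf(t, d), i.e. TF x TF^T."""
    counts = [Counter(d.split()) for d in docs]
    vocabulary = sorted(set().union(*counts))
    return {
        s: {t: sum(c[s] * c[t] for c in counts) for t in vocabulary}
        for s in vocabulary
    }


m = term_term_matrix(docs)
terms = ["beat", "bicycle", "bicyclist", "car", "increase", "issue"]
for s in terms:
    print(s, [m[s][t] for t in terms])
```

Because the product of a matrix with its own transpose is symmetric, entry (s, t) always equals entry (t, s).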
Word # | Words | Document 1 | Document 2 | Document 3 | ... |
---|---|---|---|---|---|
1 | beat | 2 | 1 | 0 | ... |
2 | bicycle | 0 | 1 | 0 | ... |
3 | bicyclist | 0 | 0 | 1 | ... |
4 | car | 0 | 1 | 0 | ... |
5 | increase | 1 | 1 | 0 | ... |
6 | issue | 1 | 0 | 0 | ... |
... | ... | ... | ... | ... | ... |
4573 | zeal | 1 | 0 | 0 | ... |
Which terms to keep depends on the Text Mining operation to be applied and the type of data in the Term-Document Matrix. Common pruning criteria:
Term Frequency lower bounds
Term Frequency × Inverse Document Frequency lower bounds
Preset sparsity level for terms
etc.
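Two of these criteria are sketched below: a term-frequency lower bound and a preset sparsity level. Sparsity is taken here to mean the share of documents in which a term does not occur. The function name and its parameters are hypothetical. A TF-IDF lower bound could be applied in the same way to the rows of a TF-IDF matrix.

```python
def prune_terms(tf, min_total=1, max_sparsity=1.0):
    """Hypothetical helper: keep terms whose total frequency is at least
    `min_total` and whose sparsity (share of documents where the term is
    absent) is at most `max_sparsity`."""
    kept = {}
    for term, row in tf.items():
        sparsity = sum(1 for x in row if x == 0) / len(row)
        if sum(row) >= min_total and sparsity <= max_sparsity:
            kept[term] = row
    return kept


# Term-frequency rows for the six terms shown in the tables above.
tf = {
    "beat": [2, 1, 0],
    "bicycle": [0, 1, 0],
    "bicyclist": [0, 0, 1],
    "car": [0, 1, 0],
    "increase": [1, 1, 0],
    "issue": [1, 0, 0],
}

print(sorted(prune_terms(tf, min_total=2)))       # term-frequency lower bound
print(sorted(prune_terms(tf, max_sparsity=0.5)))  # preset sparsity level
```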
Organise data
Simplify data
Clustering algorithms uncover relationships inherent in data by detecting statistical patterns
Usually represent patterns by using distance measures
Place data in a Euclidean space
Use similarity measures to identify groups and their intra-group and inter-group linkages
Similar data tend to lie close to each other
But
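Distance and similarity can be computed directly on term-frequency vectors. The sketch below is minimal and rests on two assumptions. First, each document is represented by its counts for only the six terms shown in the tables, because the rest of the vocabulary is elided. Second, single-linkage agglomerative clustering stands in as one example of a distance-based algorithm; the slides do not say which algorithm was used.

```python
import math
from itertools import combinations


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def cosine_similarity(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def single_linkage(points, k):
    """Repeatedly merge the two clusters whose closest members are nearest,
    until only k clusters remain."""
    clusters = [[name] for name in points]
    while len(clusters) > k:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            d = min(euclidean(points[a], points[b])
                    for a in clusters[i] for b in clusters[j])
            if best is None or d < best[0]:
                best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return clusters


# Counts for: beat, bicycle, bicyclist, car, increase, issue.
docs = {
    "Document 1": [2, 0, 0, 0, 1, 1],
    "Document 2": [1, 1, 0, 1, 1, 0],
    "Document 3": [0, 0, 1, 0, 0, 0],
}

print(euclidean(docs["Document 1"], docs["Document 2"]))
print(cosine_similarity(docs["Document 1"], docs["Document 2"]))
print(single_linkage(docs, 2))
```

Documents 1 and 2 share terms (beat, increase) and lie closest together, so they are merged first, while Document 3 shares no terms with either.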
Clustering Objectives:
After Clustering:
Clustering Graphics:
Supervised Approach
Unsupervised Approach
Unsupervised + Heuristics Approach
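One common unsupervised approach is lexicon-based scoring. Each word carries a polarity taken from a predefined word list, and a document's score is the sum of those polarities. No labelled training data is needed. The tiny lexicon below is hypothetical and only shows the mechanics; practical work would use an established sentiment lexicon.

```python
import re

# Hypothetical polarity lexicon, for illustration only.
LEXICON = {"proud": 1, "safe": 1, "good": 1, "thanks": 1,
           "problem": -1, "crime": -1, "unsafe": -1, "poor": -1}


def sentiment(text):
    """Sum word polarities and map the total to a label."""
    tokens = re.findall(r"[a-z]+", text.lower())
    score = sum(LEXICON.get(t, 0) for t in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"


documents = [
    "This is a vital issue and more bobbies must be on the beat. Increase the beat.",
    "Police needs to solve the problem. Car parks have been taken over by bicycles! "
    "Please increase Police beat too!",
    "I am a cyclist and I am proud of it!",
]

for doc in documents:
    print(sentiment(doc))
```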
Generally:
Results of Sentiment Analysis + Phrase clouds = Negative Phrase Cloud
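One way to build a negative phrase cloud is to keep only the documents labelled negative and count multi-word phrases in them. A cloud renderer then sizes each phrase by its count. In this sketch the labels are hypothetical, standing in for the output of a sentiment step, and phrases are assumed to be adjacent word pairs (bigrams). The input is the preprocessed documents from earlier.

```python
from collections import Counter


def bigram_counts(docs):
    """Count adjacent word pairs across all documents."""
    counts = Counter()
    for d in docs:
        tokens = d.split()
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical (preprocessed document, sentiment label) pairs.
labelled = [
    ("vital issue more police must be on beat increase beat", "neutral"),
    ("police needs solve problem car parks taken over by bicycle please increase police beat", "negative"),
    ("bicyclist proud", "positive"),
]

negative_docs = [text for text, label in labelled if label == "negative"]
phrases = bigram_counts(negative_docs)
for (w1, w2), n in phrases.most_common(5):
    print(f"{w1} {w2}: {n}")
```

With a single negative document every phrase occurs once. On a full survey, repeated complaints would surface as the largest phrases in the cloud.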
Utilise: