Ali Arsalan Kazmi (Data Analyst | Consultant)



I am a data analyst, a lifelong student of Philosophy and Logic, and an appreciator of the Arts, Literature, and Culture; in short, a researcher. Sometimes I may also be found behind books on Psychology and Sociology.

As a data analyst, I have a passion for creating 360° views of data, harnessing the newfound power of computers to make data analyses efficient and timely. Of particular interest to me are automating analyses, obtaining data from unconventional sources (Web, Social Media, etc.), generating effective visualisations, and making research reproducible.

My analytics repertoire includes:

In Philosophy, I am interested in Philosophy of Language, Moral Philosophy, Philosophy of Religion, Philosophy of Art, Philosophy of Science, Philosophy of AI, and Philosophy of Statistics. I am currently reading Plato and Wittgenstein. With regard to Literature and Culture, I find the revealed scriptures, especially those of the Abrahamic faiths, to be awe-inspiring for raising important questions. I was born in Pakistan, but lived most of my life in the Middle East. I then travelled to Europe for higher education. Noticing the cultural differences between the three regions has sparked in me a great interest in Anthropology.

In my free time, I am either working on analytics projects, taking online courses, reading books, or spending time with family.


November 2014

I took an online course on Coursera, Getting and Cleaning Data. The course taught ways to acquire data from various sources, such as the World Wide Web, APIs, and databases, and in various formats (including, but not limited to, CSV, TXT, and XLS). It also taught how such data can be tidied into a format that facilitates the application of statistical algorithms, whilst also considering what needs to be done to make sharing the data easier.
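As a rough illustration of what "tidying" can mean (this is not the coursework itself, which is linked below), here is a minimal Python sketch that reshapes a "wide" table into a "long" one with one observation per row. The table contents, the column names, and the to_long helper are all hypothetical and exist only for this example.

```python
# Hypothetical "wide" table: one row per student, one column per subject.
wide_rows = [
    {"student": "A", "maths": 71, "physics": 64},
    {"student": "B", "maths": 58, "physics": 80},
]


def to_long(rows, id_column, variable_name, value_name):
    """Reshape wide rows so that each output row holds a single observation."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_column:
                continue  # the identifier is repeated on every output row
            long_rows.append({
                id_column: row[id_column],
                variable_name: column,
                value_name: value,
            })
    return long_rows


tidy_rows = to_long(wide_rows, "student", "subject", "score")
for row in tidy_rows:
    print(row)
```

In the long form, each variable (student, subject, score) has its own column, which is usually the shape that statistical routines expect.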

My graded coursework can be found here, and my certificate here.

University of East Anglia

I studied for an MSc in Knowledge Discovery and Data Mining; courses studied for this degree included Applied Statistics, Artificial Intelligence, Databases, Information Retrieval, and Data Mining.

Software and computing languages I trained on and used for coursework during this time included R, IBM SPSS Modeler, PostgreSQL, SWI-Prolog, and LaTeX.

Whilst studying at the Computing and Mathematical Sciences department for my degree, I also developed an interest in Philosophy, and attended lectures and workshops for Philosophy of Language and Moral Philosophy courses.

For my dissertation, I worked on a project for the OPCCN to analyse textual data from surveys. My efforts were recognised by the Norfolk PCC in helping them gain insights from the unstructured data.

EnCase® Forensic Guidance Software

I attended a course named EnCase® Computer Forensics I, taught by Guidance Software. The course involved real-life simulations using EnCase® V7 to identify, gather, investigate, analyse, and preserve evidence (that may be related to crime) stored on digital devices, in a manner that is legally admissible.

The course taught the Computer Forensic methodology, techniques for preserving digital evidence, how to analyse file signatures, how to extract and preserve evidence, etc.

University of Greenwich

I studied for a BSc in Computer Science; sub-modules for this degree included Logical Foundations, Analytics Methods for Computing, Computer Programming, Computer Forensics, Programming Distributed Components, etc.

For my dissertation, I worked on a project for automatically detecting cyber-attacks using temporal network log traces, applying Artificial Neural Networks to train a detection model.

Pakistan International School, Saudi Arabia

I completed my O levels and A levels at Pakistan International School, in Riyadh. My GCSEs included Mathematics, Computing, Physics, Chemistry, and English; my A levels included Mathematics, Physics, and Computing.


(Currently) Data Analyst at Aimia

My current role at Aimia is to assist proprietary clients with Business Intelligence, from extracting data from databases, through manipulating and analysing it, to presenting insights that drive decision-making.

At present, I am working on automating the entire process of data extraction, analysis, and presentation, aiming to improve productivity for Aimia's clients and to free up analysts' time for deeper analytics.

  • Presentation to introduce R to colleagues at an off-site meeting

Text Miner for OPCCN

As part of my dissertation for MSc Knowledge Discovery and Data Mining, I worked on analysing textual data gathered by the Norfolk Police Authority via a survey, conducted prior to the election of the PCC for Norfolk in November 2012.

For analysing the qualitative data, I utilised R and applied the following analytical techniques:

  • Document Clustering
  • Associative Word Clouds
  • Sentiment Analysis
  • Geo-Spatial analysis (to map negative and positive word clouds to areas of crime)

I then converted my analyses in R into a Shiny application with a user-friendly GUI for the OPCCN to use in future analyses.

  • Blog post about the application
  • Code used to build the application
  • Presentation given to the OPCCN after completion of project

Extracting and Visualising Twitter Data

Inspired by a blog post and NASA's night-time image of the world, I extracted tweet data from Twitter using R and tried visualising it on a map of the world. I had expected the result to be comparable with NASA's image. Although my expectations were not fulfilled, the work accomplished was fair enough for some spare time on a weekend.

  • Visualisation (takes some time to load due to the size of the image)

Exploring the Glass data set

This was part of a series of activities from the book Applied Predictive Modeling, which focused entirely on exploring and transforming data. The data set used in this activity is the Glass Identification data, available from the UCI Machine Learning Repository.

Data Mining on the Insurance Company Benchmark (COIL 2000) Dataset

This activity was carried out for the Data Mining module of my MSc degree. It consisted of training a statistical model on a data set provided for the COIL 2000 competition, such that the model neither overfits nor underfits and can accurately predict which customers are potential buyers of a mobile caravan policy.

The imbalance of the target class (mobile caravan policy buyers vs. non-buyers) made the task much more challenging. As is usual in such cases, Feature Selection was performed before training models with different algorithms, from Bayesian models to Random Forest.
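To show why class imbalance is a problem, here is a minimal Python sketch (not the original coursework code). The labels are invented for illustration, with an assumed 6 buyers in 100 customers, and the accuracy and recall helpers are hypothetical names written just for this example. A "model" that always predicts the majority class scores high on accuracy while finding no buyers at all.

```python
# Illustrative labels only: 1 = buyer, 0 = non-buyer (assumed 6 buyers in 100).
actual = [1] * 6 + [0] * 94

# A trivial "model" that always predicts the majority class (non-buyer).
predicted = [0] * len(actual)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def recall(y_true, y_pred, positive=1):
    """Fraction of actual positives that the model managed to find."""
    true_pos = sum(
        1 for t, p in zip(y_true, y_pred) if t == positive and p == positive
    )
    actual_pos = sum(1 for t in y_true if t == positive)
    return true_pos / actual_pos if actual_pos else 0.0


baseline_accuracy = accuracy(actual, predicted)
baseline_recall = recall(actual, predicted)
print(f"accuracy={baseline_accuracy:.2f}, recall={baseline_recall:.2f}")
# prints: accuracy=0.94, recall=0.00
```

This is why, on an imbalanced target, a measure such as recall on the minority class tends to be more informative than accuracy alone.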

  • Links currently unavailable

Developing a Hotel Review Classification System to detect Spam reviews

For a sub-module of my MSc degree, Information Retrieval, I developed a Hotel Review Classification system, using the techniques described in the research paper by M. Ott et al. (2011), titled Finding Deceptive Opinion Spam by Any Stretch of the Imagination.

The goal of this project was to apply techniques from Information Retrieval and Machine Learning to detect fake/spam reviews in the context of hotel reviewing.

  • Links currently unavailable

Developing a Database for the Pierian Games

I worked on this project for the Database manipulation sub-module of my MSc degree. The Pierian Games were a hypothetical gaming event modelled on the London Olympics. Competitors from all over the world could take part in different competitions, such as 100 m Sprint, Swim, etc. The goal of this project was to develop a database that would record details of all events, the tickets issued or cancelled, and details of customers, using triggers, stored procedures, and user input acquired from a web page.

The database technology used for this project was PostgreSQL, and Java was used to set up a web page for querying the database.

  • Links currently unavailable

An Investigation of the use of Artificial Neural Networks in the automatic detection of cyber-attacks using temporal network log traces

I worked on this project for my BSc dissertation. Modern organisations depend on networks (WLANs and LANs) to conduct and support communication over large geographical distances, and to access the wealth of information held on the World Wide Web. However, such organisations are also constantly plagued by network attacks, and these must be dealt with to ensure smooth communication and to benefit from the WWW.

The continual emergence of novel attacks puts conventional remedies, such as firewalls and intrusion detection tools, at a disadvantage, as no rule of thumb exists for identifying novel attacks in advance. Artificial Neural Networks (ANNs) can be applied to such problems, where decisions must be made from minimal information. The application of ANNs to such tasks was investigated in this project.

  • Links currently unavailable


Online Courses

Whenever I find spare time, I attend online courses to further my learning and technical capabilities. I am registered at Coursera, Udemy, and Stanford Online, although I participate most frequently at Coursera. I may not apply for a certificate while attending a course.

  • Certificate for Getting and Cleaning Data from Johns Hopkins University
  • Data Manipulation in R with Dplyr
  • Reporting with R Markdown

Online Data Science Competitions

I have been participating unofficially in online Data Science competitions at Kaggle, particularly those involving Qualitative Data Analyses. I hope to start participating officially soon, in my spare time, by submitting my models and having them ranked against other models.


Infrequently, I write blog posts covering any projects I have undertaken for Data Science. More often than this, however, I share blog posts from other blogs.

Amazon wishlist

A (ridiculously) long list of books and other items on my wishlist, which I keep buying from time to time.

Curriculum Vitae