Sunday, August 9, 2015

Charts in Python (Visualization of Data in Python)

As a continuation of the previous post on the voting data analysis, we will create some graphs and charts based on the results we got.

Note: The library used here is matplotlib, which helps in creating graphical charts from our data.

The modules to install/import are numpy and pyplot (which is part of matplotlib).
A little background on these two modules:
·         pyplot is one way to plot graph data with Matplotlib. It is modelled on the way charting works in another popular commercial program, MATLAB.
·         import matplotlib.pyplot as plt
·         NumPy is a module providing lots of numeric functions (and fast arrays) for Python.
·         import numpy as np
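To illustrate the MATLAB-like style mentioned above, here is a small sketch where every pyplot call works on the "current" figure and axes, and NumPy squares a whole array without a Python loop. The Agg backend is an assumption so the example can run without a display; the data is made up.

```python
# Minimal pyplot sketch: each plt.* call acts on the current figure/axes,
# much as plotting commands do in MATLAB.
import matplotlib
matplotlib.use("Agg")  # assumption: render off-screen, no window needed
import matplotlib.pyplot as plt
import numpy as np

x = np.arange(5)   # NumPy array [0, 1, 2, 3, 4]
y = x ** 2         # element-wise square, applied to the whole array at once

plt.figure()
plt.plot(x, y)             # draws on the current axes
plt.title("y = x squared")
ax = plt.gca()             # the axes pyplot has been drawing on
```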

Bar Chart:

We will use matplotlib to create graphical charts based on our data.
·         Inline Charts:
%matplotlib inline

The command above tells IPython that we want charts to be drawn "inline" inside our notebook rather than in a separate window.
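%matplotlib inline only works inside IPython/Jupyter. If you run the same code as a plain Python script, one option (a sketch, not the only way) is to select the non-interactive Agg backend and save the chart to a file. The output file name chart_demo.png is hypothetical.

```python
# Outside a notebook there is no "inline" display, so render with the
# non-interactive Agg backend and write the chart to a PNG file instead.
import os
import tempfile

import matplotlib
matplotlib.use("Agg")
import matplotlib.pyplot as plt

# Hypothetical output location for the demo image
out_path = os.path.join(tempfile.gettempdir(), "chart_demo.png")

fig, ax = plt.subplots()
ax.bar([0, 1, 2], [3, 5, 2])  # made-up bar heights
fig.savefig(out_path)         # save to disk instead of showing a window
plt.close(fig)                # release the figure's memory
```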


The code used to create the charts is as follows:


Explanation:

We store the radish varieties and their vote counts in two lists:
names = []
votes = []
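The two lists are filled from the counts dictionary (variety name to vote count). Below is a sketch with made-up tallies, showing the explicit loop the post uses next to an equivalent one-liner with zip(*...). Both rely on Python 3.7+ dictionaries keeping insertion order, so names[i] always matches votes[i]; note that zip(*...) raises ValueError on an empty dictionary.

```python
# Hypothetical tallies standing in for the real survey results.
counts = {"White icicle": 5, "Daikon": 8, "Sicily giant": 3}

# Option 1: explicit loop, as in the post
names = []
votes = []
for radish in counts:
    names.append(radish)
    votes.append(counts[radish])

# Option 2: the same split in one step; gives tuples in the same order
names2, votes2 = zip(*counts.items())
```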
Then we create a range of indexes for the X values in the graph, one entry for each item in the "counts" dictionary (i.e. len(counts)), numbered 0, 1, 2, 3, etc. This spreads the bars evenly across the X axis of the plot.
np.arange is a NumPy function like Python's built-in range(), except that the result it produces is a NumPy array.
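The practical difference is that arithmetic on a NumPy array is applied element by element, which is what makes expressions on the bar positions (such as shifting every position by 0.5) possible. A short comparison:

```python
import numpy as np

r = range(4)       # a lazy Python range object
a = np.arange(4)   # a NumPy array holding 0, 1, 2, 3

# Arithmetic on a NumPy array applies to every element at once.
shifted = a + 0.5  # 0.5, 1.5, 2.5, 3.5

# The same expression on a plain range raises TypeError.
try:
    r + 0.5
    range_supports_add = True
except TypeError:
    range_supports_add = False
```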
plt.bar() creates a bar graph, using the "x" values as the X axis positions and the values in the votes list (i.e. the vote counts) as the height of each bar.
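A minimal, self-contained version of that bar-chart step, using made-up names and counts. It uses the object-oriented Axes methods rather than plt.* calls; the result is the same kind of chart. In matplotlib 2.0 and later, bars are centred on their x positions by default, so the tick labels go at x itself.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: off-screen rendering
import matplotlib.pyplot as plt
import numpy as np

names = ["White icicle", "Daikon", "Sicily giant"]  # hypothetical data
votes = [5, 8, 3]

x = np.arange(len(votes))  # one X position per variety: 0, 1, 2
fig, ax = plt.subplots()
bars = ax.bar(x, votes)    # bar heights come from the votes list
ax.set_xticks(x)           # bars are centred on x (align='center' default)
ax.set_xticklabels(names, rotation=90)
```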


The Output:



Final Code for Bar Chart:
import matplotlib.pyplot as plt
import numpy as np
# Create an empty dictionary for associating radish names
# with vote counts
counts = {}
# Names of people who tried to vote more than once
fraud = []
# Create an empty list with the names of everyone who voted
voted = []
# Clean up (munge) a string so it's easy to match against other strings
def clean_string(s):
    return s.strip().capitalize().replace("  ", " ")
# Check if someone has voted already and return True or False
def has_already_voted(name):
    if name in voted:
        fraud.append(name)
        return True
    return False
# Count a vote for the radish variety named 'radish'
def count_vote(radish):
    if radish not in counts:
        # First vote for this variety
        counts[radish] = 1
    else:
        # Increment the radish count
        counts[radish] = counts[radish] + 1
# Return the variety (or varieties, on a tie) with the most votes
def max_voted_radish(stats):
    top = max(stats.values())
    return [key for key, val in stats.items() if val == top]
# Return the variety (or varieties, on a tie) with the fewest votes
def min_voted_radish(stats):
    bottom = min(stats.values())
    return [key for key, val in stats.items() if val == bottom]

with open("radishsurvey.txt") as survey:
    for line in survey:
        line = line.strip()
        if not line:
            # Skip blank lines, which would break the split below
            continue
        name, vote = line.split(" - ")
        name = clean_string(name)
        vote = clean_string(vote)

        if not has_already_voted(name):
            count_vote(vote)
        voted.append(name)

names = []
votes = []
# Split the dictionary of name:votes into two lists, one for names and one for vote count
for radish in counts:
    names.append(radish)
    votes.append(counts[radish])
# Bar positions (0-based) of the most- and least-voted varieties
mxpos = votes.index(max(votes))
mnpos = votes.index(min(votes))
# The X axis can just be numbered 0,1,2,3...
x = np.arange(len(counts))

plt.bar(x, votes)
# Bars are centred on their x positions (matplotlib >= 2.0), so ticks go at x
plt.xticks(x, names, rotation=90)
plt.yticks(np.arange(0, max(votes) + 20, 10))
plt.ylabel('Votes')
plt.xlabel('Radish variety')
plt.title('Leader Board')
# Point a red arrow at the top of the tallest bar
plt.annotate('max vote ' + str(max(votes)), xy=(mxpos, max(votes)),
             xytext=(mxpos + 1.5, max(votes) + 5),
             arrowprops=dict(facecolor='red', shrink=0.05))
plt.show()
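The counting loop above can also be written with the standard library's collections.Counter, plus a set of voter names (set lookups are fast, whereas `name in voted` on a list scans the whole list). This is a swapped-in alternative, not the post's method, and it reads from a hypothetical in-memory list instead of radishsurvey.txt. One difference: Counter.most_common(1) returns a single leader even on a tie, while max_voted_radish returns every tied variety.

```python
from collections import Counter

# Hypothetical survey lines in the "Name - Radish" format the post reads.
lines = [
    "Alice - Daikon",
    "Bob - White icicle",
    "alice - daikon",  # same voter again after cleaning: rejected
    "Carol - Daikon",
]

def clean_string(s):
    return s.strip().capitalize().replace("  ", " ")

counts = Counter()
voted = set()  # names of everyone who has voted
fraud = []     # names of repeat voters

for line in lines:
    name, vote = [clean_string(part) for part in line.split(" - ")]
    if name in voted:
        fraud.append(name)
    else:
        counts[vote] += 1
        voted.add(name)

# Highest-voted variety and its count
leader, leader_votes = counts.most_common(1)[0]
```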

Output:

Pie Chart:

Code:

import matplotlib.pyplot as plt
import numpy as np
# Create an empty dictionary for associating radish names
# with vote counts
counts = {}
# Names of people who tried to vote more than once
fraud = []
# Create an empty list with the names of everyone who voted
voted = []
# Clean up (munge) a string so it's easy to match against other strings
def clean_string(s):
    return s.strip().capitalize().replace("  ", " ")
# Check if someone has voted already and return True or False
def has_already_voted(name):
    if name in voted:
        fraud.append(name)
        return True
    return False
# Count a vote for the radish variety named 'radish'
def count_vote(radish):
    if radish not in counts:
        # First vote for this variety
        counts[radish] = 1
    else:
        # Increment the radish count
        counts[radish] = counts[radish] + 1
# Return the variety (or varieties, on a tie) with the most votes
def max_voted_radish(stats):
    top = max(stats.values())
    return [key for key, val in stats.items() if val == top]
# Return the variety (or varieties, on a tie) with the fewest votes
def min_voted_radish(stats):
    bottom = min(stats.values())
    return [key for key, val in stats.items() if val == bottom]

with open("radishsurvey.txt") as survey:
    for line in survey:
        line = line.strip()
        if not line:
            # Skip blank lines, which would break the split below
            continue
        name, vote = line.split(" - ")
        name = clean_string(name)
        vote = clean_string(vote)

        if not has_already_voted(name):
            count_vote(vote)
        voted.append(name)

names = []
votes = []
# Split the dictionary of name:votes into two lists, one for names and one for vote count
for radish in counts:
    names.append(radish)
    votes.append(counts[radish])
# Convert the vote counts into percentages of the total
total = float(sum(votes))
sizes = [(v / total) * 100.0 for v in votes]
# One colour per slice, sampled evenly from the 'Set1' colormap
cs = plt.get_cmap('Set1')(np.linspace(0, 1, len(sizes)))
# Pull the largest slice (or slices, on a tie) out of the pie by 0.1
expl = [0.1 if s == max(sizes) else 0 for s in sizes]

plt.pie(sizes, explode=expl, labels=names, colors=cs,
        autopct='%1.1f%%', shadow=True, startangle=90)
# Set aspect ratio to be equal so that pie is drawn as a circle.
plt.axis('equal')

plt.show()
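The two list computations that feed plt.pie (percentages and the explode offsets) can be checked on their own. The sketch below uses made-up counts with a tie for first place, to show that every tied slice gets pulled out. As far as I know, plt.pie divides each value by the total itself, so raw counts would give the same slices; converting to percentages mainly makes the numbers easier to read.

```python
votes = [5, 8, 3, 8]  # hypothetical counts, with a tie for first place

total = sum(votes)
sizes = [100.0 * v / total for v in votes]  # each slice as a percentage

# Offset every slice that has the maximum share; ties are all pulled out.
biggest = max(sizes)
explode = [0.1 if s == biggest else 0 for s in sizes]
```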

Output:




