Sunday, August 9, 2015

Python In Action

The Vote Counting Problem

In a world where every moment is captured and stories are built from an overwhelming amount of information, Python, the programming language created by Guido van Rossum, makes that work much easier.
Let us see, in a nutshell, how Python helps us extract and present information.
Below is a real-life example using a data set called radishsurvey.txt.

Scenario :  We want to learn which radish varieties people chose in a survey.

Introduction :

The file is a text file containing 300 lines of raw survey data, where each line consists of a name, a hyphen, then a radish variety.

Note: The raw data, downloaded in txt format, is saved in the default working directory, which is created when the Anaconda suite is installed on the machine.



Target Result :

Our target is to get some useful information out of this raw data. But what information are we looking for? Let's say:
1.      What's the most popular radish variety?
2.      What are the least popular?
3.      Did anyone vote twice?

What we need to know :

First things first: this is text data, so we need to know how to work with strings.
Array :

·         First of all, we process the raw data and store two pieces of information as strings in two different variables: one is name and the other is vote.

The code for this step does the following:

·         Strips the surrounding whitespace from each line of the text file
·         Splits the line on " - " and stores the two parts in the variables name and vote
·         Finally, prints the result.
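
The steps above can be sketched roughly as follows. This is only an illustration: the sample lines and the helper name parse_line are made up for this example, and the real script would loop over open("radishsurvey.txt") instead of a hard-coded list.

```python
# Made-up sample lines in the "name - variety" layout described above;
# the real script would iterate over open("radishsurvey.txt") instead.
sample_lines = [
    "Jin Li - White Icicle\n",
    "Ana Silva - Red King\n",
    "Tom Okoro - Champion\n",
]


def parse_line(line):
    """Strip surrounding whitespace and split the line into (name, vote)."""
    line = line.strip()
    name, vote = line.split(" - ")
    return name, vote


for line in sample_lines:
    name, vote = parse_line(line)
    print(name + " voted for " + vote)
```

Note that split(" - ") assumes every line contains exactly one " - " separator; a line without it would raise a ValueError when unpacking.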

Now, let's turn these two variables into lists (Python's arrays) and store the data there for future use.
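
A minimal sketch of that idea, again with made-up sample lines standing in for the survey file. The list names names and votes are assumptions chosen to match the variables mentioned above.

```python
# Made-up sample lines; the real data comes from radishsurvey.txt.
sample_lines = [
    "Jin Li - White Icicle\n",
    "Ana Silva - Red King\n",
    "Tom Okoro - Champion\n",
]

names = []  # one entry per voter, in file order
votes = []  # the variety each voter chose, in the same order

for line in sample_lines:
    name, vote = line.strip().split(" - ")
    names.append(name)
    votes.append(vote)

print(names)
print(votes)
```

Because both lists are filled in the same loop, names[i] and votes[i] always belong to the same survey line.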


Output:


Checking for Duplication :

Let's quickly perform some operations on these lists to check whether there are any duplicates.
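
One possible way to run such a check, sketched under assumptions: the votes below are invented, and the helper distinct_values is a hypothetical name, not necessarily what the original code used. It lists every distinct raw string in the votes list, so a variety that shows up more than once under slightly different spellings becomes easy to spot.

```python
# Made-up raw votes, including the kind of inconsistent spellings
# (case differences, doubled spaces) that hand-typed survey data can contain.
votes = ["Red King", "White Icicle", "red king",
         "Champion", "Red  King", "White Icicle"]


def distinct_values(items):
    """Return the distinct items in first-seen order."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


for variety in distinct_values(votes):
    print(variety)
```

In this sample the same variety survives three times, because string comparison is exact: a different case or an extra space makes a different string.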


Output:


From the list we can infer that there are fraudulent voters or duplicate values: for example, “Red Kings” is repeated three times.
So something is definitely wrong.
Let's investigate in some other ways as well!

Using a Dictionary :


Output:


So here we simply created an empty dictionary named counts. Each line gives us two things, the name and the voted radish variety, but we only store the count for each variety in the dictionary.
Here too, the output reveals some flaws in the data!
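
The dictionary-based counting described above might look like the sketch below. The votes are invented sample data; only the dictionary name counts comes from the text.

```python
# Made-up raw votes with deliberately inconsistent spellings.
votes = ["Red King", "White Icicle", "red king",
         "Champion", "Red  King", "White Icicle"]

counts = {}  # maps each raw variety string to its number of votes

for vote in votes:
    if vote not in counts:
        # First vote seen for this exact spelling
        counts[vote] = 1
    else:
        # Increment the existing count
        counts[vote] = counts[vote] + 1

for variety in counts:
    print(variety + ": " + str(counts[variety]))
```

With uncleaned input, each spelling gets its own key, so one variety's votes end up split across several entries. That is the kind of flaw the cleaning step is meant to remove.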

Cleaning the data :
Let's run a similar check on the voters' names (this time more carefully: we will remove extra spaces from each person's name and normalise its capitalisation).


Output : 


Aha! So Phoebe barwell and Procopiozito are the ones who voted twice.
So how did we actually find them?
·         We created an empty list named voted.
·         Then we went through each line of the text and took the voter's name from it. We cleaned the name up with the capitalize() method, which upper-cases the first letter and lower-cases the rest so that differently cased spellings match, and with the replace() method, which turns a double space between first name and surname into a single one.
·         A name that is already in voted when we reach it belongs to someone who has voted before.
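
A rough sketch of that procedure, using invented sample lines in place of the survey file. The list name repeat_voters is an assumption made for this example.

```python
# Made-up sample lines; the third one is the first voter again,
# typed with different capitalisation and a doubled space.
sample_lines = [
    "Jin Li - White Icicle",
    "Ana Silva - Red King",
    "jin  li - Champion",
]

voted = []          # cleaned names of everyone seen so far
repeat_voters = []  # cleaned names that turned up a second time

for line in sample_lines:
    name, vote = line.strip().split(" - ")
    # Normalise the name so equivalent spellings compare equal
    name = name.strip().capitalize().replace("  ", " ")
    if name in voted:
        repeat_voters.append(name)
        print(name + " has already voted!")
    voted.append(name)
```

Since capitalize() lower-cases everything after the first character, the surname is printed in lower case, which is why the names in the output look the way they do.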


Use of Functions :

Now, let's use some user-defined functions to make the code shorter.
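
As a hedged illustration of this refactoring, the sketch below wraps the cleaning and counting steps in two functions. The function count_votes and the sample lines are hypothetical; it also passes data in and out through arguments and return values rather than shared global lists, which is a design choice of this example only.

```python
def clean_string(s):
    """Normalise a name or variety so equivalent spellings compare equal."""
    return s.strip().capitalize().replace("  ", " ")


def count_votes(lines):
    """Return (counts, repeat_voters) for lines shaped like 'name - variety'."""
    counts = {}
    voted = []
    repeat_voters = []
    for line in lines:
        name, vote = line.strip().split(" - ")
        name = clean_string(name)
        vote = clean_string(vote)
        if name in voted:
            # Ignore the extra vote but remember who cast it
            repeat_voters.append(name)
        else:
            voted.append(name)
            counts[vote] = counts.get(vote, 0) + 1
    return counts, repeat_voters


# Made-up sample lines standing in for radishsurvey.txt
sample_lines = [
    "Jin Li - White Icicle",
    "Ana Silva - red king",
    "jin  li - Champion",
    "Tom Okoro - Red  King",
]
counts, repeat_voters = count_votes(sample_lines)
print(counts)
print(repeat_voters)
```

Keeping the cleanup in one function means the same rule is applied to both names and varieties, so a fix to the rule only has to be made in one place.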


Output:


So, we get our fresh result.


Conclusion:

Now, let's make the program more complete so that it answers the following questions:
1.      What's the most popular radish variety?
2.      What are the least popular?
3.      Did anyone vote twice?
4.      What are the full voting results?

Code :

# Create an empty dictionary for associating radish names
# with vote counts
counts = {}
fraud=[]
# Create an empty list with the names of everyone who voted
voted = []
# Clean up (munge) a string so it's easy to match against other strings
def clean_string(s):
    return s.strip().capitalize().replace("  "," ")
# Check if someone has voted already and return True or False
def has_already_voted(name):
    if name in voted:
        fraud.append(name)
        return True
    return False
# Count a vote for the radish variety named 'radish'
def count_vote(radish):
    if radish not in counts:
        # First vote for this variety
        counts[radish] = 1
    else:
        # Increment the radish count
        counts[radish] = counts[radish] + 1
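# Return every radish variety that shares the highest vote count
def max_voted_radish(stats):
    highest = max(stats.values())
    return [key for key, val in stats.items() if val == highest]
# Return every radish variety that shares the lowest vote count
def min_voted_radish(stats):
    lowest = min(stats.values())
    return [key for key, val in stats.items() if val == lowest]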

# Read the survey line by line; the with block closes the file afterwards
with open("radishsurvey.txt") as survey:
    for line in survey:
        line = line.strip()
        name, vote = line.split(" - ")
        name = clean_string(name)
        vote = clean_string(vote)

        if not has_already_voted(name):
            count_vote(vote)
        voted.append(name)
if len(fraud)==1:
    print("--------------------------------------")
    print("There is only one fraud :" +fraud[0])
    print("--------------------------------------")
elif len(fraud)>1:
    print("--------------------------------------")
    print("There are total "+ str(len(fraud))+" frauds And they are : ")
    print("--------------------------------------")
    for i, person in enumerate(fraud, start=1):
        print(str(i) + " . " + person)
        
# Alternative for a single winner: print(max(counts, key=lambda k: counts[k]))
x=(max_voted_radish(counts))
print("--------------------------------------")
print("The most popular radish variety")
print("--------------------------------------")
i=1
for elem in x:
    print(str(i)+" . "+elem +" With Total Vote : "+str(counts[elem]))
    i=i+1
y=(min_voted_radish(counts))
print("--------------------------------------")
print("The least popular radish variety")
print("--------------------------------------")
i=1
for elem in y:
    print(str(i)+" . "+elem +" With Total Vote : "+str(counts[elem]))
    i=i+1
print("--------------------------------------")
print("The Leader Board")
print("--------------------------------------")
for name in counts:
    print(name + ": " + str(counts[name]))

Output : 


