The Vote Counting Problem
In a world where every moment is captured and stories have to be built out of an overwhelming amount of information, Python, the language created by Guido van Rossum, helps us do exactly that with ease.
Let us see, in a nutshell, how Python helps us extract and present information.
Scenario :
We want to find out some information about people's choice of radish variety.
Introduction :
The input is a text file containing 300 lines of raw survey data, where each line consists of a name, a hyphen, then a radish variety.
Note: The raw data, downloaded in .txt format, is saved in the default working directory. This directory is created when the Anaconda suite is installed on the machine.
Use Of Function :
So, we got our fresh result.
Targeting Result :
Our target is to get some useful information out of this raw data. But what information are we looking for? Let's say:
1. What's the most popular radish variety?
2. What are the least popular?
3. Did anyone vote twice?
What we need to know -
First things first: this is text data, which means we must know how to work with strings.
Array :
· First of all, we will process the raw data and store two pieces of information, as strings, in two different variables. Let's call one name and the other vote.
So in the above example, the code does the following:
· Strips each line read from the text file
· Splits the line on " - " and stores the two parts in the variables name and vote
· Finally, prints the result.
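The steps above can be sketched roughly as follows. This is only an illustration: the sample lines are made up to match the "name - variety" format described earlier (they are not taken from the survey file), and parse_line is a hypothetical helper name.

```python
# Made-up sample lines in the "name - variety" format (not real survey data)
sample_lines = [
    "Alice Example - Red King\n",
    "Bob Sample - Daikon\n",
]

def parse_line(line):
    """Strip surrounding whitespace, then split on " - " into (name, vote)."""
    name, vote = line.strip().split(" - ")
    return name, vote

for line in sample_lines:
    name, vote = parse_line(line)
    print(name + " voted for " + vote)
```

With the real data, the loop would run over the lines of the survey file instead of sample_lines.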
Now, let's turn these two variables into arrays (Python lists) and store the data there for future use.
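A minimal sketch of this step, assuming the same "name - variety" line format. The sample lines and the helper name collect_names_and_votes are hypothetical, used only for illustration.

```python
# Made-up sample lines (not real survey data)
sample_lines = [
    "Alice Example - Red King\n",
    "Bob Sample - Daikon\n",
    "Alice Example - Daikon\n",
]

def collect_names_and_votes(lines):
    """Return two parallel lists: voter names and the varieties they chose."""
    names = []
    votes = []
    for line in lines:
        name, vote = line.strip().split(" - ")
        names.append(name)
        votes.append(vote)
    return names, votes

names, votes = collect_names_and_votes(sample_lines)
print(names)
print(votes)
```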
Output:
Perform Checking for Duplication :
Let's quickly run a check on these arrays to see whether they contain any duplicates.
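The original check is not preserved here, so the following is just one possible way to do it, using collections.Counter from the standard library. The list contents and the helper name find_duplicates are made up for illustration.

```python
from collections import Counter

def find_duplicates(items):
    """Return every value that appears more than once, in first-seen order."""
    tally = Counter(items)
    return [item for item, n in tally.items() if n > 1]

# Made-up voter names (not real survey data)
names = ["Alice Example", "Bob Sample", "Alice Example", "Carol Test"]
print(find_duplicates(names))
```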
Output:
So from the list we can infer that there are fraudulent voters or duplicate values. Take "Red Kings", for example, which is repeated three times.
So something is definitely wrong.
Let's investigate in some other ways as well!
Using Dictionary :
Output:
Here we simply created an empty dictionary named counts. For each line we read the name and the voted radish variety, but only the count of each variety is stored in the dictionary.
Here as well, the output shows some flaws!
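A rough sketch of counting with a dictionary, and of the kind of flaw that uncleaned data can cause. The votes below are made up for illustration, and tally_votes is a hypothetical helper name.

```python
def tally_votes(votes):
    """Count how many times each variety string appears."""
    counts = {}
    for vote in votes:
        if vote not in counts:
            # First vote for this exact spelling
            counts[vote] = 1
        else:
            counts[vote] += 1
    return counts

# Made-up votes; note the differently capitalized spelling of one variety
votes = ["Red King", "Daikon", "red king", "Red King"]
counts = tally_votes(votes)
for variety in counts:
    print(variety + ": " + str(counts[variety]))
```

Because the strings are compared exactly, "Red King" and "red king" end up under separate keys, which is why the data needs cleaning before counting.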
Cleaning the data :
Let's run a similar check on the person's name (this time more cautiously: we will remove extra spaces from the name and normalize its capitalization).
Output :
Aha!
So here are Phoebe barwell and Procopiozito, who are the frauds...
So how did we actually find them?
· We created an empty array named voted.
· Then we went through each line of the text and took the voter's name from the line. We then cleaned it up using the capitalize() method, which uppercases the first letter and lowercases the rest, and the replace() method, which replaces extra spaces between the first name and surname.
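The cleanup and the repeat-voter check can be sketched like this. The sample lines are invented for illustration, and clean_name and find_repeat_voters are hypothetical helper names.

```python
def clean_name(raw):
    """Trim the ends, normalize capitalization, collapse a double space."""
    return raw.strip().capitalize().replace("  ", " ")

def find_repeat_voters(lines):
    """Return (voted, repeat_voters) for lines in "name - variety" format."""
    voted = []
    repeat_voters = []
    for line in lines:
        name, _vote = line.strip().split(" - ")
        name = clean_name(name)
        if name in voted:
            repeat_voters.append(name)
        else:
            voted.append(name)
    return voted, repeat_voters

# Made-up sample lines (not real survey data)
sample_lines = [
    "Alice Example - Daikon\n",
    "alice  example - Red King\n",
    "Bob Sample - Daikon\n",
]
voted, repeat_voters = find_repeat_voters(sample_lines)
print(repeat_voters)
```

Both spellings of the first name clean up to the same string, so the second vote is flagged.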
Now, let's use some user-defined functions to make the code shorter.
Output:
Conclusion:
Now, let's make the program more efficient so that it answers the following questions:
1. What's the most popular radish variety?
2. What are the least popular?
3. Did anyone vote twice?
4. What are the full voting results?
Code :
# Create an empty dictionary for associating radish names
# with vote counts
counts = {}
# Names of voters who were caught voting more than once
fraud = []
# Create an empty list with the names of everyone who voted
voted = []

# Clean up (munge) a string so it's easy to match against other strings
def clean_string(s):
    return s.strip().capitalize().replace("  ", " ")

# Check if someone has voted already and return True or False
def has_already_voted(name):
    if name in voted:
        fraud.append(name)
        return True
    return False

# Count a vote for the radish variety named 'radish'
def count_vote(radish):
    if radish not in counts:
        # First vote for this variety
        counts[radish] = 1
    else:
        # Increment the radish count
        counts[radish] = counts[radish] + 1

# Return every variety tied for the highest vote count
def max_voted_radish(stats):
    highest = max(stats.values())
    return [key for key, val in stats.items() if val == highest]

# Return every variety tied for the lowest vote count
def min_voted_radish(stats):
    lowest = min(stats.values())
    return [key for key, val in stats.items() if val == lowest]

with open("radishsurvey.txt") as survey:
    for line in survey:
        line = line.strip()
        name, vote = line.split(" - ")
        name = clean_string(name)
        vote = clean_string(vote)
        if not has_already_voted(name):
            count_vote(vote)
        voted.append(name)

if len(fraud) == 1:
    print("--------------------------------------")
    print("There is only one fraud :" + fraud[0])
    print("--------------------------------------")
elif len(fraud) > 1:
    print("--------------------------------------")
    print("There are total " + str(len(fraud)) + " frauds And they are : ")
    print("--------------------------------------")
    for i in range(len(fraud)):
        print(str(i + 1) + " . " + fraud[i])

# print(max(counts, key=lambda k: counts[k]))
x = max_voted_radish(counts)
print("--------------------------------------")
print("The most popular radish variety")
print("--------------------------------------")
i = 1
for elem in x:
    print(str(i) + " . " + elem + " With Total Vote : " + str(counts[elem]))
    i = i + 1

y = min_voted_radish(counts)
print("--------------------------------------")
print("The least popular radish variety")
print("--------------------------------------")
i = 1
for elem in y:
    print(str(i) + " . " + elem + " With Total Vote : " + str(counts[elem]))
    i = i + 1

print("--------------------------------------")
print("The Leader Board")
print("--------------------------------------")
for name in counts:
    print(name + ": " + str(counts[name]))
Output :