Monday, July 13, 2015

Our Way, the R Way!



What is behind the statistics, analytics and visualizations that today's brightest data scientists and business leaders rely on to make powerful decisions?

The answer is R.

It so happened that before I explored the field of data science, the letter R was just an element of the set of consonants. Today, unlike in older times, it stands for the revolution in the world of data science.

It is the statistical programming language that data experts all over the world use for everything, from mapping broad social and marketing trends online to developing financial and climate models that drive our economies and communities.

But what exactly is R, and where did it start?
It started in New Zealand, where two professors, Robert Gentleman and Ross Ihaka, wanted a better statistical platform for their students. So they created one modeled after the statistical language S and named it R, or open-source R. Along with many others, they kept working on and using R, creating new tools for it and finding new applications every day.

Thanks to the worldwide community effort, R kept growing, with thousands of user-created libraries built to enhance its functionality, and it gained quality validation and support from recognized industry leaders.

R is used in many fields today, as it helps users interpret, interact with and visualize data with ease.

R continues to shape the future of statistical analysis and data sciences with every passing moment.

Along with my friends, I too ventured into the world of data science with R. The stepping stone was a hands-on learning course on R from a website called DataCamp, followed by solving a Kaggle problem: predicting the percentage of survivors of a historic shipwreck, the Titanic.

Please note that the data set used for the analysis is fabricated.
praxis.ac.in

Sunday, July 5, 2015

Big Data in Small Words. . .

The first question that arises in one's mind on hearing the term Big Data is: how big is "BIG"?

Big data might be petabytes (1,024 terabytes) or exabytes (1,024 petabytes) of data, consisting of billions to trillions of records about millions of people, all from different sources (e.g. web, sales, customer contact centers, social media, mobile data and so on).

Hence it can be concluded that big data is similar to small data, only bigger in size.
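To get a feel for the units mentioned above, here is a small Python sketch of the arithmetic. It assumes the 1,024 multiplier quoted in this post at every step; the helper names `to_bytes` and `humanize` are my own, made up for illustration.

```python
# Storage units, each 1,024 times the previous one (the multiplier quoted above).
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB"]

def to_bytes(value, unit):
    """Convert a value in the given unit to bytes."""
    return value * 1024 ** UNITS.index(unit)

def humanize(num_bytes):
    """Express a byte count in the largest unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    for unit in UNITS:
        if value < 1024 or unit == UNITS[-1]:
            return f"{value:.2f} {unit}"
        value /= 1024

# How many terabytes fit in one petabyte?
print(to_bytes(1, "PB") // to_bytes(1, "TB"))
# Express one exabyte's worth of bytes in readable form.
print(humanize(to_bytes(1, "EB")))
```

Storage vendors often use a multiplier of 1,000 instead, so treat the numbers as an order-of-magnitude guide.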

An important example would be:

Walmart handles more than 1 million customer transactions every hour.

The 3 major characteristics of Big Data:

Volume: Facebook ingests 500 terabytes of new data every day

Velocity: High-frequency stock trading algorithms reflect market changes within microseconds

Variety: Big data isn't just numbers, dates and strings. It is also 3D data, audio files, video files, unstructured text, log files and social media

Big data may be:

Structured: Traditional data warehousing
Semi-structured: Text Mining
Unstructured: Video Surveillance

Types of tools used in Big Data:

Where is processing hosted?
-Distributed Servers/Cloud (Amazon EC2)

Where is data stored?
-Distributed Storage (Amazon S3)

What is the programming model?
-Distributed Processing (MapReduce)

How is data stored and indexed?
-High-performance schema-free databases (MongoDB)

What operations are performed on data?
-Analytic Processing
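The MapReduce programming model listed above can be sketched in a few lines of Python. This is only a single-machine toy, not how Hadoop or any real cluster runs it: the function names and the two input records are made up for illustration. It shows the three steps of the model: map each record to key/value pairs, group the pairs by key, and reduce each group to one result.

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    """Map: emit a (word, 1) pair for every word in one input record."""
    return [(word.lower(), 1) for word in record.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)

def map_reduce(records):
    """Run map, shuffle and reduce in sequence on an iterable of records."""
    mapped = chain.from_iterable(map_phase(r) for r in records)
    grouped = shuffle(mapped)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

# Made-up input records, standing in for lines of a very large file.
records = ["big data is big", "data is everywhere"]
print(map_reduce(records))
```

In a real distributed setup the map and reduce steps would run on many machines at once, which is what lets the same idea scale to terabytes.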

Some important points:

Big data helps us capture and make sense of the present-day explosion of social media data.

Unlike the traditional approach of analyzing a subset of information collected in samples, here we analyze the entire data set, which results in better business decisions, both strategic and operational.

Big Data - The Indian Way

A Hyderabad-based analytics firm named Modak Analytics built India’s first Big Data-based electoral repository system.

The company brought together data on 81.4 crore Indian voters. That is 18 terabytes of data, including 10 TB in .pdf format.
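For readers not used to Indian number units, here is a quick Python sketch of the arithmetic behind these figures. One crore is 10 million and one lakh is 100,000. I am assuming a terabyte of 10^12 bytes here, which the report may or may not have used.

```python
CRORE = 10_000_000   # 1 crore = 10 million
LAKH = 100_000       # 1 lakh = 100 thousand

voters = round(81.4 * CRORE)        # 81.4 crore voters
total_bytes = 18 * 10**12           # 18 TB, assuming 1 TB = 10**12 bytes

bytes_per_voter = total_bytes / voters
print(f"{voters:,} voters")
print(f"about {bytes_per_voter / 1000:.1f} KB of data per voter")
```

On these assumptions the repository holds roughly 22 KB per voter, which shows that the "big" in this data set comes from the number of records, not from the size of each one.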

Mentioned below are a few insights from their report:

Of the 13.4 crore voters in Uttar Pradesh, the country’s biggest State by number of voters, at least 1.2 crore people have Ram somewhere in their name.

In Andhra Pradesh, the name Srinivas is spelt in 600 different ways. About three lakh women in Gujarat have Gita Ben as their first name, while Bihar is home to 3.27 lakh women with Sita as their first name and an almost equal number of women named Geeta. Ramesh seems to be the most common first name across the nation.

The other names that are quite popular are: Lakshmi (19.28 lakh, Andhra Pradesh), Fernandes (81,000, Goa), Shankar (11.41 lakh) and Patil (24 lakh, Maharashtra).

The two longest voter names are registered in Andhra Pradesh: E Janake Sathya Surya Vijaya Durga Maheshvari in Sangareddy constituency and Venkata Sathya Suriya Maitreyi Kumari Toleti in Narsapur constituency.

In Chhattisgarh, the age of one voter is recorded as 19,545 years, while 64 voters in AP have an age of ‘0’ years.
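Odd ages like these are the kind of thing a simple range check can catch. Below is a minimal Python sketch of such a check. The records are made up by me and are not from the Modak Analytics data, and the upper limit of 120 years is my own assumption. The lower limit of 18 is the voting age in India.

```python
MIN_VOTING_AGE = 18      # voting age in India
MAX_PLAUSIBLE_AGE = 120  # assumption: anything above this is treated as an error

def implausible_ages(voters):
    """Return the records whose age falls outside the plausible range."""
    return [
        v for v in voters
        if not MIN_VOTING_AGE <= v["age"] <= MAX_PLAUSIBLE_AGE
    ]

# Made-up sample records for illustration only.
sample = [
    {"voter_id": "A1", "age": 42},
    {"voter_id": "B2", "age": 19545},
    {"voter_id": "C3", "age": 0},
]

for record in implausible_ages(sample):
    print(record["voter_id"], record["age"])
```

A check like this only flags the records. Deciding what the correct age should be still needs the original source.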

Thus I conclude with the words of Andrew McAfee: "The world is one big data problem."

I request you to visit the following links:

http://datasceptre.blogspot.in/
datadosage.blogspot.in
http://analyticsyatra.blogspot.in/



http://praxis.ac.in/