Sunday, August 9, 2015

Airline Route :

In this blog we shall try and figure out how Python helps us in extracting information from CSV Files.
Note: CSV-Comma-separated values is a way of expressing structured data in flat text files:
Python comes with a CSV module which provides one way to easily work with CSV- Files

Scenario: 

In the example stated below we shall try and figure out from a freely available airline data set available at  OpenFlights data page in a CSV format comprising the details on the number of airports a city has.

 We will now import and open the CSV file in Python with the help of the following commands 

stated below:


Output : 


Explanation:

We open the CSV file (airports.dat) in python and fetch the data row wise , but as we see the country name is on the fourth column we have to pull out row[3] and appended it in an array country=[]. ** row here temporary array .

Now, let's try to make a program which will fetch the airport names for some explicitly defined countries .Lets say ,In Our example we want to see the names of the airport present in 
“Australia”.


Output :


Explanation:

At first we have created one empty Dictionary airport={}
Secondly each row from the CSV file is imported into a variable line ( which is one array) . As we know in our array the third column contains the country name and the first column contains the airport name , we further tried to check if the dictionary already contains the country name as key in it , if not then using if – else we first created one key as a new country ( else part ). Or if the dictionary already had the country name as a key in it we would just appended the value to it using if part .
Since we  wanted to just see the names of the airport situated in “Australia” ,we printed the values assigned to the dictionary key= “Australia” .
The other countries mentioned in the raw data file can be checked in similar fashion.


Airline Route Histogram

In order to accomplish the task of calculating the distance that needs to be traversed in a certain flight schedule we shall plot a histogram showing the distribution of distances over each flight schedule.
In order to perform this operation we have to adhere to the following steps :
·         Read the airports file (airports.dat) and build a dictionary mapping the unique airport ID to the geographical coordinates (latitude & longitude.) This allows you to look up the location of each airport by its ID.
·         Read the routes file (routes.dat) and get the IDs of the source and destination airports. Look up the latitude and longitude based on the ID. Using those coordinates, calculate the length of the route and append it to a list of all route lengths.
Requirement : Calculating geographic distances is a bit tricky because the earth is a sphere. The distance we measure is the "great circle distance".
In order to calculate "great circle distance"., we have to import a module named geo_distance” , a pre-built function geo_distance.distance() helps us in calculating the distance of each airline route.

Let's now have a quick look how the function geo_distance.distace() works :


Output:


The final code:

import numpy as np
import matplotlib.pyplot as plt
latitudes = {}
longitudes = {}
f = open("airports.dat")
for row in csv.reader(f):
    airport_id = row[0]
    latitudes[airport_id] = float(row[6])
    longitudes[airport_id] = float(row[7])
distances = []
f = open("routes.dat")
for row in csv.reader(f):
    source_airport = row[3]
    dest_airport = row[5]
    if source_airport in latitudes and dest_airport in latitudes:
        source_lat = latitudes[source_airport]
        source_long = longitudes[source_airport]
        dest_lat = latitudes[dest_airport]
        dest_long = longitudes[dest_airport]
        distances.append(geo_distance.distance(source_lat,source_long,dest_lat,dest_long))
plt.hist(distances, 100, facecolor='r')
plt.xlabel("Distance (km)")
plt.ylabel("Number of flights")

Explanation :

We need to read the airport file (airports.dat) and build a dictionary mapping the unique airport ID to the geographical coordinates (latitude & longitude.)
We need to read the routes file (routes.dat) and get the IDs of the source and destination airports. And then look up for the latitude and longitude based on the ID . Finally, using these coordinates, calculate the length of the route and append it to a list distances = []of all route lengths.
At last a histogram is plotted based on the route lengths, to show the distribution of different flight distances.
Output : 




2 comments :

  1. It'sVery informative blog and useful article thank you for sharing with us , keep posting learn more
    Hadoop admin Online Course

    ReplyDelete
  2. Best casinos in the US - Mapyro
    Where to stay 광주광역 출장샵 near casinos? · 김천 출장마사지 Borgata: Borgata Hotel Casino & Spa. The 하남 출장마사지 only other resort I know is at Renaissance Pointe. · The Cosmopolitan of 목포 출장마사지 Las 나주 출장마사지 Vegas

    ReplyDelete