Data Management and Visualization — Week 2

Dipika Parbatsinh Pawar
Aug 24, 2020


My name is Dipika Pawar, and this blog is part of the Data Management and Visualization course on Coursera. Assignments are submitted in the form of blog posts, so I chose Medium as my medium for the assignments. Click here for blog 1.

Assignment 2:

In assignment 2, we first have to choose a framework for coding, such as Python or SAS. I chose Python. Then we have to write code that loads the dataset into memory and calculates the frequency distribution of the chosen variables.

Code:

Code Description:

Step 1: Import necessary libraries

Step 2: Read data file

Step 3: Count the rows for each value of the chosen variables and display the counts next to the values
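The three steps above can be sketched roughly as follows. This is a minimal sketch, not the exact code from the assignment: the column names are taken from the output below, while the inline sample data, the helper names load_data and frequency_tables, and the dropna=False setting are my own assumptions for illustration. In the real assignment the CSV file would be read from disk instead of from a string.

```python
import io
import pandas as pd

# Hypothetical stand-in for the real data file, so the sketch runs on its own.
# A single space marks a missing value, which pandas keeps as a string.
SAMPLE_CSV = """country,incomeperperson,alcconsumption,lifeexpectancy
A,1000.5,8.7,73.979
B,2000.25,.74,73.979
C, ,3.92,68.846
D,4000.75, ,
"""

def load_data(source):
    # Step 2: read the data file into a DataFrame.
    # low_memory=False lets pandas infer each column type from the whole file.
    return pd.read_csv(source, low_memory=False)

def frequency_tables(data, column):
    # Step 3: counts and proportions for one column.
    # dropna=False (an assumption) keeps missing values in the tables.
    counts = data[column].value_counts(sort=False, dropna=False)
    percentages = data[column].value_counts(
        sort=False, dropna=False, normalize=True
    )
    return counts, percentages

data = load_data(io.StringIO(SAMPLE_CSV))
print(len(data))          # number of rows
print(len(data.columns))  # number of columns

for column in ["country", "incomeperperson", "alcconsumption", "lifeexpectancy"]:
    counts, percentages = frequency_tables(data, column)
    print("counts for", column)
    print(counts)
    print("percentages for", column)
    print(percentages)
```

With the real dataset, the first two printed numbers would be the row and column counts shown at the top of the output.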

Output:

213
16
counts for Country - Data available for which Country
Paraguay 1
United States 1
Cambodia 1
China 1
Aruba 1
..
Dominica 1
French Polynesia 1
Liberia 1
Kuwait 1
Gabon 1
Name: country, Length: 213, dtype: int64

percentages for Country - Data available for which Country
Paraguay 0.004695
United States 0.004695
Cambodia 0.004695
China 0.004695
Aruba 0.004695
..
Dominica 0.004695
French Polynesia 0.004695
Liberia 0.004695
Kuwait 0.004695
Gabon 0.004695
Name: country, Length: 213, dtype: float64

counts for Income of person - country wise average income per person
12505.2125447354 1
27110.731590755 1
5188.9009351913 1
276.200412964869 1
15822.1121405699 1
..
4495.04626152988 1
2221.18566404139 1
1324.19490626644 1
1383.40186887912 1
320.771889948584 1
Name: incomeperperson, Length: 191, dtype: int64

percentages for Income of person - country wise average income per person
12505.2125447354 0.004695
27110.731590755 0.004695
5188.9009351913 0.004695
276.200412964869 0.004695
15822.1121405699 0.004695
..
4495.04626152988 0.004695
2221.18566404139 0.004695
1324.19490626644 0.004695
1383.40186887912 0.004695
320.771889948584 0.004695
Name: incomeperperson, Length: 191, dtype: float64

counts for Alcohol consumption - in liters
8.7 1
.74 1
.32 1
8.65 1
3.92 1
..
4.98 1
3.99 1
4.39 1
8.84 1
6.28 1
Name: alcconsumption, Length: 181, dtype: int64

percentages for Alcohol consumption - in liters
8.7 0.004695
.74 0.004695
.32 0.004695
8.65 0.004695
3.92 0.004695
..
4.98 0.004695
3.99 0.004695
4.39 0.004695
8.84 0.004695
6.28 0.004695
Name: alcconsumption, Length: 181, dtype: float64

counts for Life expectancy - in years
81.012 1
78.371 1
73.979 2
68.846 1
74.576 1
..
76.835 1
48.398 1
75.632 1
51.088 1
68.494 1
Name: lifeexpectancy, Length: 190, dtype: int64

percentages for Life expectancy - in years
81.012 0.004695
78.371 0.004695
73.979 0.009390
68.846 0.004695
74.576 0.004695
..
76.835 0.004695
48.398 0.004695
75.632 0.004695
51.088 0.004695
68.494 0.004695
Name: lifeexpectancy, Length: 190, dtype: float64

Conclusion:

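  • My variables are continuous, not categorical, so almost every value occurs only once and gets the same share of 1/213 (about 0.004695). That is why the percentages are not useful here.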
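To see where the repeated 0.004695 comes from, here is a small arithmetic check. It assumes each percentage is the count divided by all 213 rows, which matches the printed values; the variable names are only for illustration.

```python
# 213 is the number of rows printed at the top of the output.
total_rows = 213

# A value that occurs once, like almost every value in the tables.
share_once = 1 / total_rows

# A value that occurs twice, like life expectancy 73.979.
share_twice = 2 / total_rows

print(round(share_once, 6))   # 0.004695
print(round(share_twice, 6))  # 0.00939
```

Since nearly every row has its own unique value, the percentage column mostly repeats the same number and says little about the distribution.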

Extra:

  • Find the number of empty cells in the data
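One possible way to count the empty cells is sketched below. It is not the code from the assignment: the small sample table and the helper name count_empty_cells are hypothetical, and I assume a cell is "empty" if it is missing (NaN/None) or contains only whitespace.

```python
import pandas as pd

# Hypothetical sample with the same kind of gaps: blanks stored as " " or None.
data = pd.DataFrame({
    "country": ["A", "B", "C", "D"],
    "incomeperperson": ["1000.5", " ", "3000.25", " "],
    "alcconsumption": ["8.7", ".74", None, " "],
})

def count_empty_cells(frame):
    # True where a cell is a string made only of whitespace (or is "").
    is_blank = frame.apply(
        lambda col: col.map(lambda v: isinstance(v, str) and v.strip() == "")
    )
    # A cell is empty if it is missing or blank; sum the flags per column.
    is_empty = frame.isna() | is_blank
    return is_empty.sum()

empty_per_column = count_empty_cells(data)
print(empty_per_column)        # empty cells in each column
print(empty_per_column.sum())  # empty cells in the whole table
```

Checking for whitespace as well as NaN matters here because a blank read from a CSV file can stay in the column as a string instead of becoming a missing value.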


Dipika Parbatsinh Pawar

Student at Ahmedabad University | Data Science Enthusiast | Backend Developer