Importing all necessary libraries¶
In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import plotly.io as pio
from scipy.stats import pearsonr
pio.renderers.default = 'notebook'
Loading the Data¶
In [2]:
pesticides=pd.read_csv('pesticides.csv')
yield_df=pd.read_csv('yield_df.csv')
In [3]:
pesticides.sample(10)
Out[3]:
Domain | Area | Element | Item | Year | Unit | Value | |
---|---|---|---|---|---|---|---|
2802 | Pesticides Use | New Caledonia | Use | Pesticides (total) | 2010 | tonnes of active ingredients | 32.60 |
1697 | Pesticides Use | Guyana | Use | Pesticides (total) | 2005 | tonnes of active ingredients | 270.78 |
3691 | Pesticides Use | Sweden | Use | Pesticides (total) | 2003 | tonnes of active ingredients | 2549.00 |
2943 | Pesticides Use | Occupied Palestinian Territory | Use | Pesticides (total) | 2016 | tonnes of active ingredients | 1645.42 |
1479 | Pesticides Use | French Polynesia | Use | Pesticides (total) | 2003 | tonnes of active ingredients | 44.33 |
962 | Pesticides Use | Colombia | Use | Pesticides (total) | 2013 | tonnes of active ingredients | 54563.38 |
4057 | Pesticides Use | Ukraine | Use | Pesticides (total) | 1999 | tonnes of active ingredients | 32272.25 |
1504 | Pesticides Use | Gambia | Use | Pesticides (total) | 2001 | tonnes of active ingredients | 237.00 |
2580 | Pesticides Use | Mauritius | Use | Pesticides (total) | 2015 | tonnes of active ingredients | 628.52 |
832 | Pesticides Use | China, Hong Kong SAR | Use | Pesticides (total) | 1991 | tonnes of active ingredients | 95.00 |
In [4]:
pesticides.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 4349 entries, 0 to 4348 Data columns (total 7 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Domain 4349 non-null object 1 Area 4349 non-null object 2 Element 4349 non-null object 3 Item 4349 non-null object 4 Year 4349 non-null int64 5 Unit 4349 non-null object 6 Value 4349 non-null float64 dtypes: float64(1), int64(1), object(5) memory usage: 238.0+ KB
This Data set contains 28242 entries with 7 columns¶
In [5]:
yield_df.sample(10)
Out[5]:
Unnamed: 0 | Area | Item | Year | hg/ha_yield | average_rain_fall_mm_per_year | pesticides_tonnes | avg_temp | |
---|---|---|---|---|---|---|---|---|
8145 | 8145 | Egypt | Sweet potatoes | 1994 | 247138 | 51.0 | 4283.00 | 21.10 |
3129 | 3129 | Brazil | Yams | 1995 | 91587 | 1761.0 | 92967.22 | 27.52 |
15324 | 15324 | Indonesia | Rice, paddy | 2012 | 51360 | 2702.0 | 1597.00 | 25.59 |
352 | 352 | Angola | Wheat | 2010 | 10526 | 1010.0 | 40.00 | 24.44 |
28073 | 28073 | Zimbabwe | Wheat | 1991 | 58912 | 657.0 | 6753.00 | 20.95 |
3457 | 3457 | Brazil | Potatoes | 1999 | 164604 | 1761.0 | 127585.00 | 17.66 |
22184 | 22184 | Pakistan | Soybeans | 1997 | 12942 | 494.0 | 16936.00 | 23.76 |
17378 | 17378 | Kazakhstan | Maize | 2012 | 51877 | 250.0 | 8674.58 | 2.50 |
17564 | 17564 | Kenya | Potatoes | 2007 | 200000 | 630.0 | 1578.00 | 16.60 |
18077 | 18077 | Madagascar | Rice, paddy | 1992 | 20869 | 1513.0 | 101.06 | 19.76 |
In [6]:
yield_df.info()
<class 'pandas.core.frame.DataFrame'> RangeIndex: 28242 entries, 0 to 28241 Data columns (total 8 columns): # Column Non-Null Count Dtype --- ------ -------------- ----- 0 Unnamed: 0 28242 non-null int64 1 Area 28242 non-null object 2 Item 28242 non-null object 3 Year 28242 non-null int64 4 hg/ha_yield 28242 non-null int64 5 average_rain_fall_mm_per_year 28242 non-null float64 6 pesticides_tonnes 28242 non-null float64 7 avg_temp 28242 non-null float64 dtypes: float64(3), int64(3), object(2) memory usage: 1.7+ MB
Exploratory Data Analysis and Visualization (Interactive Charts) using Plotly¶
Topics:¶
Top ten Countries used the highest amount of pesticides in the year of 2016
Pesticide usage in the number one country.
Application of pesticides in Bangladesh from 1990 to 2016.
Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)
Pesticide application and the Yields of some dominant crops in Bangladesh.
No.4 Analysis of Rice (The main dominating crop in Bangladesh)
Uneven distribution of Yields and Pesticide application in some past Years
Some Basic Analysis.
1. Top ten Countries used the highest amount of pesticides in the year of 2016¶
In [7]:
ten_pesticides_country=pesticides[pesticides.Year==2016][['Area','Value']].sort_values(by='Value',ascending=False).head(10)
In [8]:
fig=px.bar(ten_pesticides_country,
y='Area',
x='Value',
title='Top ten countries which used the highest amount of pesticides in 2016',
color=ten_pesticides_country.Area)
fig.update_layout(
yaxis_title='Country',
xaxis_title='Tonnes of active ingredients',
width=1200,
height=700
)
fig.show()
China(Main land) used the highest amount of Pesticides in the year of 2016¶
2. Pesticides Application in China (Different regions) from 1990 to 2016¶
In [9]:
pesticides_china=pesticides[pesticides.Area.isin(['China, Hong Kong SAR', 'China, Macao SAR','China, mainland', 'China, Taiwan Province of'])][['Area','Year','Value']]
In [10]:
fig=px.bar(
pesticides_china,
x='Year',
y='Value',
color='Area',
title='Pesticides Usage in China from 1990 to 2016',
)
fig.update_layout(
yaxis_title='Tonnes of active ingredients',
width=1200,
height=700
)
fig.show()
Mainland of China used the highest amount of Pesticides compared to other regions¶
3. Pesticides Usage over the past Years in BD (from 1990 to 2013.)¶
In [11]:
yield_df=yield_df.drop(columns='Unnamed: 0',inplace=False,axis=1)
yield_df['hg/ha_yield']=yield_df['hg/ha_yield']/10000
yield_df=yield_df.rename(columns={'hg/ha_yield':'ton/ha_yield'})
yield_bd=yield_df[yield_df.Area=='Bangladesh']
In [12]:
yield_pest_bd=yield_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [13]:
fig=px.bar(
yield_pest_bd,
x=yield_pest_bd.index,
y='pesticides_tonnes',
color='ton/ha_yield',
title='Pesticides Usage over the past Years in BD (from 1990 to 2013.'
)
fig.update_layout(yaxis_title='Pesticide Dose (Tonnes)',
width=1200,
height=700)
fig.show()
The application of pesticides has increased exponentially over the past years¶
4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)¶
In [14]:
corr_df=yield_df.corr(numeric_only=True)
corr_df
Out[14]:
Year | ton/ha_yield | average_rain_fall_mm_per_year | pesticides_tonnes | avg_temp | |
---|---|---|---|---|---|
Year | 1.000000 | 0.091630 | -0.003798 | 0.140930 | 0.014409 |
ton/ha_yield | 0.091630 | 1.000000 | 0.000962 | 0.064085 | -0.114777 |
average_rain_fall_mm_per_year | -0.003798 | 0.000962 | 1.000000 | 0.180984 | 0.313040 |
pesticides_tonnes | 0.140930 | 0.064085 | 0.180984 | 1.000000 | 0.030946 |
avg_temp | 0.014409 | -0.114777 | 0.313040 | 0.030946 | 1.000000 |
Correlation Heatmap¶
In [15]:
plt.figure(figsize=(12,6))
sns.heatmap(
data=corr_df,
cmap='Greys'
)
plt.title('Correlations of the Numerical variables')
Out[15]:
Text(0.5, 1.0, 'Correlations of the Numerical variables')
3d scatter plot¶
In [16]:
fig=px.scatter_3d(yield_df.sample(2000),
x='pesticides_tonnes',
y='Year',
z='ton/ha_yield',
color='Item',
size='ton/ha_yield',
hover_data=['Item'],
hover_name='Area',
title='A Scatterplot of Crops Yield vs Pesticide usage from (1990 to 2013) '
)
fig.update_layout(
scene=dict(
yaxis_title='Year',
xaxis_title='Pesticides Dose (Tonnes)',
zaxis_title='Yield (tonne/Ha)',),
width=1200,
height=700
)
fig.show()
2d scatter plot¶
In [17]:
fig=px.scatter(yield_df.sample(2000),
x='pesticides_tonnes',
y='ton/ha_yield',
color='Item',
size='ton/ha_yield',
hover_data=['Item'],
hover_name='Area',
title='Crops Yield vs Pesticide usage from (1990 to 2013'
)
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700
)
fig.show()
In [18]:
correlation_AVG= yield_df[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_AVG)
ton/ha_yield pesticides_tonnes ton/ha_yield 1.000000 0.064085 pesticides_tonnes 0.064085 1.000000
Some countries use less amount of pesticides and get higher Crop yields and vice versa. Brazil uses a significant amount of pesticides, but tonne/ha yield is not that much compared to other countries like Switzerland, Germany, the United Kingdom, etc.¶
Here the correlation coefficient between ton/ha_yield and pesticides_tonnes is approximately 0.0641. Which is very close to 0, indicating a very weak positive linear relationship between pesticide usage and crop yield.¶
4.1 Correlation Analysis In case of Bangladesh¶
3d Scatterplot¶
In [19]:
fig=px.scatter_3d(yield_bd,
x='pesticides_tonnes',
y='Year',
z='ton/ha_yield',
color='Item',
size='ton/ha_yield',
hover_data=['Item'],
hover_name='Area',
title='Crops Yield vs Pesticide usage in BD (1990 to 2013)'
)
fig.update_layout(
scene=dict(
yaxis_title='Year',
xaxis_title='Pesticides Dose (Tonnes)',
zaxis_title='Yield (tonne/Ha)',),
width=1200,
height=700
)
fig.show()
2d Scatterplot¶
In [20]:
fig=px.scatter(yield_bd,
x='pesticides_tonnes',
y='ton/ha_yield',
size='pesticides_tonnes',
color='pesticides_tonnes',
hover_data=['Item'],
hover_name='Area',
title='Crops Yield vs Pesticide usage in BD (1990 to 2013')
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700)
fig.show()
In [21]:
correlation_BD= yield_bd[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_BD)
ton/ha_yield pesticides_tonnes ton/ha_yield 1.00000 0.14386 pesticides_tonnes 0.14386 1.00000
Same goes here, the correlation coefficient is 0.14386 Which is close to 0, indicating a weak positive linear relationship between pesticide usage and crop yield.¶
5. Pesticide application and the Yields of some dominant crops in Bangladesh.¶
Line Chart (3d)¶
In [22]:
fig=px.line_3d(yield_bd,
x='pesticides_tonnes',
z=yield_bd['ton/ha_yield'],
y='Year',
hover_data=['Year'],
color='Item',
title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')
fig.update_layout(
width=1200,
height=700
)
fig.show()
Line Chart(2d)¶
In [23]:
fig=px.line(yield_bd,
x='pesticides_tonnes',
y=yield_bd['ton/ha_yield'],
hover_data=['Year'],
color='Item',
title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700
)
fig.show()
Potatoes tonne/ha yield is very high compared to other major crops. There was some sort of uneven distribution of pesticides and the yield of crops from 2008 to 2012¶
6. No.5 Analysis for Rice (The main dominating crop in Bangladesh)¶
In [24]:
yield_bd_rice=yield_bd[yield_bd.Item=='Rice, paddy']
In [25]:
fig=px.line(yield_bd_rice,
x='pesticides_tonnes',
y='ton/ha_yield',
title='Effects of pesticides usage on Rice yield in Bangladesh from 1990 to 2013',
hover_data=['Year']
)
fig.update_layout(
yaxis_title='Yield (Tonne/Ha)',
xaxis_title='Pesticides Dose (Tonnes)',
width=1200,
height=700
)
fig.show()
7. Uneven distribution of Crop Yields and Pesticide application in some of the past Years (2008 to 20012)¶
In [26]:
yield_8_9_10_11_12=yield_df[yield_df.Year.isin(range(2008,2013))]
yield_8_9_10_11_12_bd=yield_8_9_10_11_12[yield_8_9_10_11_12.Area=='Bangladesh']
yield_pests_8to12=yield_8_9_10_11_12_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [27]:
fig=px.bar(yield_pests_8to12,y='pesticides_tonnes',
x=yield_pests_8to12.index,
title='Pesticides usage and yield of crops from 2008 to 20012',
color='ton/ha_yield'
)
fig.update_layout(
yaxis_title='Pesticides Dose (Tonnes)',
xaxis_title='Year',
width=1200,
height=700
)
fig.show()
Both the application of pesticide and yield of crops were unevenly distributed. There could be some problem with the Data entry in those years¶
7.1 However, the distribution ascended in some of the previous years (2000 to 2007)¶
In [28]:
yield_2000to2007=yield_df[yield_df.Year.isin([2000,2001,2002,2004,2005,2006,2007])]
yield_2000to2007bd=yield_2000to2007[yield_2000to2007.Area=='Bangladesh']
yield_2000to2007bd=yield_2000to2007bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [29]:
fig=px.bar(yield_2000to2007bd,y='pesticides_tonnes',
x=yield_2000to2007bd.index,
title='Pesticides usage from 2000 to 2007 in BD',
color='ton/ha_yield'
)
fig.update_layout(
yaxis_title='Pesticides dose (Tonnes)',
xaxis_title='Year',
width=1200,
height=700
)
8. Some Basic analysis¶
Top ten Rice Producing Countries in 2013¶
In [30]:
rice_df=yield_df[yield_df.Item=='Rice, paddy']
rice_df=rice_df[rice_df.Year==2013]
rice_df
top_ten_rice_df_13=rice_df.groupby('Area')['ton/ha_yield'].sum()
top_ten_rice_df_13=top_ten_rice_df_13.sort_values(ascending=False)
top_ten_rice_df_13=top_ten_rice_df_13.head(10)
In [31]:
fig=px.bar(
top_ten_rice_df_13,
x=top_ten_rice_df_13.index,
y=top_ten_rice_df_13,
title='Top ten Rice producing countries in 2013',
color=top_ten_rice_df_13.index
)
fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()
Ten highest crops (Maize, Potatoes, Rice, Sorghum, Soybeans, Wheat, Cassava, Sweet potatoes, Plantains and others, Yams) yielding countries from 1990 to 2013¶
In [32]:
highest_crops_yield=yield_df.groupby('Area')['ton/ha_yield'].sum()
highest_crops_yield=highest_crops_yield.sort_values(ascending=False)
ten_highest_crops_yield=highest_crops_yield.head(10)
In [33]:
fig=px.bar(
ten_highest_crops_yield,
x=ten_highest_crops_yield.index,
y=ten_highest_crops_yield,
title='Ten highest crop yielding Countries (1990 to 2013)',
color=ten_highest_crops_yield.index
)
fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()