Exploratory Data Analysis of Crop Yields and Applications of Pesticides¶

-By Fahmid Hasan (B.Sc.Ag(Hons), Bangaldesh Agricutural University, Mymensingh)¶

Importing all necessary libraries¶

In [1]:
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
import plotly.express as px
import seaborn as sns
import plotly.io as pio
from scipy.stats import pearsonr
pio.renderers.default = 'notebook'  

Loading the Data¶

In [2]:
pesticides=pd.read_csv('pesticides.csv')
yield_df=pd.read_csv('yield_df.csv')

A quick look at the data sets¶

This data set contains 4349 entries with 7 columns¶

In [3]:
pesticides.sample(10)
Out[3]:
Domain Area Element Item Year Unit Value
2802 Pesticides Use New Caledonia Use Pesticides (total) 2010 tonnes of active ingredients 32.60
1697 Pesticides Use Guyana Use Pesticides (total) 2005 tonnes of active ingredients 270.78
3691 Pesticides Use Sweden Use Pesticides (total) 2003 tonnes of active ingredients 2549.00
2943 Pesticides Use Occupied Palestinian Territory Use Pesticides (total) 2016 tonnes of active ingredients 1645.42
1479 Pesticides Use French Polynesia Use Pesticides (total) 2003 tonnes of active ingredients 44.33
962 Pesticides Use Colombia Use Pesticides (total) 2013 tonnes of active ingredients 54563.38
4057 Pesticides Use Ukraine Use Pesticides (total) 1999 tonnes of active ingredients 32272.25
1504 Pesticides Use Gambia Use Pesticides (total) 2001 tonnes of active ingredients 237.00
2580 Pesticides Use Mauritius Use Pesticides (total) 2015 tonnes of active ingredients 628.52
832 Pesticides Use China, Hong Kong SAR Use Pesticides (total) 1991 tonnes of active ingredients 95.00
In [4]:
pesticides.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 4349 entries, 0 to 4348
Data columns (total 7 columns):
 #   Column   Non-Null Count  Dtype  
---  ------   --------------  -----  
 0   Domain   4349 non-null   object 
 1   Area     4349 non-null   object 
 2   Element  4349 non-null   object 
 3   Item     4349 non-null   object 
 4   Year     4349 non-null   int64  
 5   Unit     4349 non-null   object 
 6   Value    4349 non-null   float64
dtypes: float64(1), int64(1), object(5)
memory usage: 238.0+ KB

This Data set contains 28242 entries with 7 columns¶

In [5]:
yield_df.sample(10)
Out[5]:
Unnamed: 0 Area Item Year hg/ha_yield average_rain_fall_mm_per_year pesticides_tonnes avg_temp
8145 8145 Egypt Sweet potatoes 1994 247138 51.0 4283.00 21.10
3129 3129 Brazil Yams 1995 91587 1761.0 92967.22 27.52
15324 15324 Indonesia Rice, paddy 2012 51360 2702.0 1597.00 25.59
352 352 Angola Wheat 2010 10526 1010.0 40.00 24.44
28073 28073 Zimbabwe Wheat 1991 58912 657.0 6753.00 20.95
3457 3457 Brazil Potatoes 1999 164604 1761.0 127585.00 17.66
22184 22184 Pakistan Soybeans 1997 12942 494.0 16936.00 23.76
17378 17378 Kazakhstan Maize 2012 51877 250.0 8674.58 2.50
17564 17564 Kenya Potatoes 2007 200000 630.0 1578.00 16.60
18077 18077 Madagascar Rice, paddy 1992 20869 1513.0 101.06 19.76
In [6]:
yield_df.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 28242 entries, 0 to 28241
Data columns (total 8 columns):
 #   Column                         Non-Null Count  Dtype  
---  ------                         --------------  -----  
 0   Unnamed: 0                     28242 non-null  int64  
 1   Area                           28242 non-null  object 
 2   Item                           28242 non-null  object 
 3   Year                           28242 non-null  int64  
 4   hg/ha_yield                    28242 non-null  int64  
 5   average_rain_fall_mm_per_year  28242 non-null  float64
 6   pesticides_tonnes              28242 non-null  float64
 7   avg_temp                       28242 non-null  float64
dtypes: float64(3), int64(3), object(2)
memory usage: 1.7+ MB

Exploratory Data Analysis and Visualization (Interactive Charts) using Plotly¶

Topics:¶

  1. Top ten Countries used the highest amount of pesticides in the year of 2016

  2. Pesticide usage in the number one country.

  3. Application of pesticides in Bangladesh from 1990 to 2016.

  4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)

  5. Pesticide application and the Yields of some dominant crops in Bangladesh.

  6. No.4 Analysis of Rice (The main dominating crop in Bangladesh)

  7. Uneven distribution of Yields and Pesticide application in some past Years

  8. Some Basic Analysis.

1. Top ten Countries used the highest amount of pesticides in the year of 2016¶

In [7]:
ten_pesticides_country=pesticides[pesticides.Year==2016][['Area','Value']].sort_values(by='Value',ascending=False).head(10)
In [8]:
	
fig=px.bar(ten_pesticides_country,
           y='Area',
           x='Value',
           title='Top ten countries which used the highest amount of pesticides in 2016',
           color=ten_pesticides_country.Area)
fig.update_layout(
    yaxis_title='Country',
    xaxis_title='Tonnes of active ingredients',
    width=1200,
    height=700
)
fig.show()

China(Main land) used the highest amount of Pesticides in the year of 2016¶

2. Pesticides Application in China (Different regions) from 1990 to 2016¶

In [9]:
pesticides_china=pesticides[pesticides.Area.isin(['China, Hong Kong SAR', 'China, Macao SAR','China, mainland', 'China, Taiwan Province of'])][['Area','Year','Value']]
In [10]:
fig=px.bar(
    pesticides_china,
    x='Year',
    y='Value',
    color='Area',
    title='Pesticides Usage in China from 1990 to 2016',
)
fig.update_layout(
    yaxis_title='Tonnes of active ingredients',
    width=1200,
    height=700
)
fig.show()

Mainland of China used the highest amount of Pesticides compared to other regions¶

3. Pesticides Usage over the past Years in BD (from 1990 to 2013.)¶

In [11]:
yield_df=yield_df.drop(columns='Unnamed: 0',inplace=False,axis=1)
yield_df['hg/ha_yield']=yield_df['hg/ha_yield']/10000
yield_df=yield_df.rename(columns={'hg/ha_yield':'ton/ha_yield'})
yield_bd=yield_df[yield_df.Area=='Bangladesh']
In [12]:
yield_pest_bd=yield_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [13]:
fig=px.bar(
    yield_pest_bd,
    x=yield_pest_bd.index,
    y='pesticides_tonnes',
    color='ton/ha_yield',
    title='Pesticides Usage over the past Years in BD (from 1990 to 2013.'
)

fig.update_layout(yaxis_title='Pesticide Dose (Tonnes)',
                  width=1200,
                  height=700)
fig.show()

The application of pesticides has increased exponentially over the past years¶

4. Correlation Analysis (if there any kind of relationship between Pesticide application and the Yield of Crops)¶

In [14]:
corr_df=yield_df.corr(numeric_only=True)
corr_df
Out[14]:
Year ton/ha_yield average_rain_fall_mm_per_year pesticides_tonnes avg_temp
Year 1.000000 0.091630 -0.003798 0.140930 0.014409
ton/ha_yield 0.091630 1.000000 0.000962 0.064085 -0.114777
average_rain_fall_mm_per_year -0.003798 0.000962 1.000000 0.180984 0.313040
pesticides_tonnes 0.140930 0.064085 0.180984 1.000000 0.030946
avg_temp 0.014409 -0.114777 0.313040 0.030946 1.000000

Correlation Heatmap¶

In [15]:
plt.figure(figsize=(12,6))
sns.heatmap(
    data=corr_df,
    cmap='Greys'
)
plt.title('Correlations of the Numerical variables')
Out[15]:
Text(0.5, 1.0, 'Correlations of the Numerical variables')
No description has been provided for this image

3d scatter plot¶

In [16]:
fig=px.scatter_3d(yield_df.sample(2000),
               x='pesticides_tonnes',
               y='Year',
               z='ton/ha_yield',
               color='Item',
               size='ton/ha_yield',
               hover_data=['Item'],
               hover_name='Area',
               title='A Scatterplot of Crops Yield vs Pesticide usage from (1990 to 2013) '
               )
fig.update_layout(
    scene=dict(
    yaxis_title='Year',
    xaxis_title='Pesticides Dose (Tonnes)',
    zaxis_title='Yield (tonne/Ha)',),
    width=1200,
    height=700
)
fig.show()

Potato has the highest Tonne/Ha yield.¶

Pesticides usage is very high in Brazil compared to other countries¶

2d scatter plot¶

In [17]:
fig=px.scatter(yield_df.sample(2000),
               x='pesticides_tonnes',
               y='ton/ha_yield',
               color='Item',
               size='ton/ha_yield',
               hover_data=['Item'],
               hover_name='Area',
               title='Crops Yield vs Pesticide usage from (1990 to 2013'
               )
fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()
In [18]:
correlation_AVG= yield_df[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_AVG)
                   ton/ha_yield  pesticides_tonnes
ton/ha_yield           1.000000           0.064085
pesticides_tonnes      0.064085           1.000000

Some countries use less amount of pesticides and get higher Crop yields and vice versa. Brazil uses a significant amount of pesticides, but tonne/ha yield is not that much compared to other countries like Switzerland, Germany, the United Kingdom, etc.¶

Here the correlation coefficient between ton/ha_yield and pesticides_tonnes is approximately 0.0641. Which is very close to 0, indicating a very weak positive linear relationship between pesticide usage and crop yield.¶

4.1 Correlation Analysis In case of Bangladesh¶

3d Scatterplot¶

In [19]:
fig=px.scatter_3d(yield_bd,
               x='pesticides_tonnes',
               y='Year',
               z='ton/ha_yield',
               color='Item',
               size='ton/ha_yield',
               hover_data=['Item'],
               hover_name='Area',
               title='Crops Yield vs Pesticide usage in BD (1990 to 2013)'
               )
fig.update_layout(
    scene=dict(
    yaxis_title='Year',
    xaxis_title='Pesticides Dose (Tonnes)',
    zaxis_title='Yield (tonne/Ha)',),
    width=1200,
    height=700
)
fig.show()

2d Scatterplot¶

In [20]:
fig=px.scatter(yield_bd,
               x='pesticides_tonnes',
               y='ton/ha_yield',
               size='pesticides_tonnes',
               color='pesticides_tonnes',
               hover_data=['Item'],
               hover_name='Area',
               title='Crops Yield vs Pesticide usage in BD (1990 to 2013')

fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700)
fig.show()
In [21]:
correlation_BD= yield_bd[['ton/ha_yield','pesticides_tonnes']].corr()
print(correlation_BD)
                   ton/ha_yield  pesticides_tonnes
ton/ha_yield            1.00000            0.14386
pesticides_tonnes       0.14386            1.00000

Same goes here, the correlation coefficient is 0.14386 Which is close to 0, indicating a weak positive linear relationship between pesticide usage and crop yield.¶

5. Pesticide application and the Yields of some dominant crops in Bangladesh.¶

Line Chart (3d)¶

In [22]:
fig=px.line_3d(yield_bd,
            x='pesticides_tonnes',
            z=yield_bd['ton/ha_yield'],
            y='Year',
            hover_data=['Year'],
            color='Item',
            title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')

fig.update_layout(
    width=1200,
    height=700
)
fig.show()

Line Chart(2d)¶

In [23]:
fig=px.line(yield_bd,
            x='pesticides_tonnes',
            y=yield_bd['ton/ha_yield'],
            hover_data=['Year'],
            color='Item',
            title='Pesticides usage on the yield of some dominent crops in Bangladesh from 1990 to 2013')

fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()

Potatoes tonne/ha yield is very high compared to other major crops. There was some sort of uneven distribution of pesticides and the yield of crops from 2008 to 2012¶

6. No.5 Analysis for Rice (The main dominating crop in Bangladesh)¶

In [24]:
yield_bd_rice=yield_bd[yield_bd.Item=='Rice, paddy']
In [25]:
fig=px.line(yield_bd_rice,
            x='pesticides_tonnes',
            y='ton/ha_yield',
            title='Effects of pesticides usage on Rice yield in Bangladesh from 1990 to 2013',
            hover_data=['Year']
            )
fig.update_layout(
    yaxis_title='Yield (Tonne/Ha)',
    xaxis_title='Pesticides Dose (Tonnes)',
    width=1200,
    height=700
)
fig.show()

7. Uneven distribution of Crop Yields and Pesticide application in some of the past Years (2008 to 20012)¶

In [26]:
yield_8_9_10_11_12=yield_df[yield_df.Year.isin(range(2008,2013))]
yield_8_9_10_11_12_bd=yield_8_9_10_11_12[yield_8_9_10_11_12.Area=='Bangladesh']
yield_pests_8to12=yield_8_9_10_11_12_bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [27]:
fig=px.bar(yield_pests_8to12,y='pesticides_tonnes',
            x=yield_pests_8to12.index,
            title='Pesticides usage and yield of crops from 2008 to 20012',
            color='ton/ha_yield'
            )
fig.update_layout(
    yaxis_title='Pesticides Dose (Tonnes)',
    xaxis_title='Year',
    width=1200,
    height=700
)
fig.show()

Both the application of pesticide and yield of crops were unevenly distributed. There could be some problem with the Data entry in those years¶

7.1 However, the distribution ascended in some of the previous years (2000 to 2007)¶

In [28]:
yield_2000to2007=yield_df[yield_df.Year.isin([2000,2001,2002,2004,2005,2006,2007])]
yield_2000to2007bd=yield_2000to2007[yield_2000to2007.Area=='Bangladesh']
yield_2000to2007bd=yield_2000to2007bd.groupby('Year')[['ton/ha_yield','pesticides_tonnes']].sum()
In [29]:
fig=px.bar(yield_2000to2007bd,y='pesticides_tonnes',
            x=yield_2000to2007bd.index,
            title='Pesticides usage from 2000 to 2007 in BD',
            color='ton/ha_yield'
            )
fig.update_layout(
    yaxis_title='Pesticides dose (Tonnes)',
    xaxis_title='Year',
    width=1200,
    height=700
    )

8. Some Basic analysis¶

Top ten Rice Producing Countries in 2013¶

In [30]:
rice_df=yield_df[yield_df.Item=='Rice, paddy']
rice_df=rice_df[rice_df.Year==2013]
rice_df
top_ten_rice_df_13=rice_df.groupby('Area')['ton/ha_yield'].sum()
top_ten_rice_df_13=top_ten_rice_df_13.sort_values(ascending=False)
top_ten_rice_df_13=top_ten_rice_df_13.head(10)
In [31]:
fig=px.bar(
    top_ten_rice_df_13,
    x=top_ten_rice_df_13.index,
    y=top_ten_rice_df_13,
    title='Top ten Rice producing countries in 2013',
    color=top_ten_rice_df_13.index
)
fig.update_layout(
    xaxis_title='Country',
    yaxis_title='Yield (Tonne/Ha)',
    width=1200,
    height=700
)
fig.show()

Ten highest crops (Maize, Potatoes, Rice, Sorghum, Soybeans, Wheat, Cassava, Sweet potatoes, Plantains and others, Yams) yielding countries from 1990 to 2013¶

In [32]:
highest_crops_yield=yield_df.groupby('Area')['ton/ha_yield'].sum()
highest_crops_yield=highest_crops_yield.sort_values(ascending=False)
ten_highest_crops_yield=highest_crops_yield.head(10)
In [33]:
fig=px.bar(
    ten_highest_crops_yield,
    x=ten_highest_crops_yield.index,
    y=ten_highest_crops_yield,
    title='Ten highest crop yielding Countries (1990 to 2013)',
    color=ten_highest_crops_yield.index
)

fig.update_layout(
xaxis_title='Country',
yaxis_title='Yield (Tonne/Ha)',
width=1200,
height=700
)
fig.show()