In our last project we used data to estimate systems of food demand
using different datasets. An output from that project was as set of
cfe.Regression
objects; these bundle together both data and the results
from the demand system estimation, and can be used for prediction as
well.
Here we’ll explore some of the uses of the cfe.Regression
class, using
an instance created previously (as in Project 3).
After having estimated a demand system using data from our favorite country, we can imagine different counterfactual scenarios. What if prices were different? What if we give a cash transfer to a household? What if school fees reduce the budget for food? What are the consequences of any of these for diet & nutrition?
If you don’t already have the latest version of the CFEDemands
package
installed, grab it, along with some dependencies:
!pip install -r requirements.txt
import pandas as pd
import cfe.regression as rgsn
We’ll get data from two places. First, basic data, including a food conversion table and recommended daily intakes table can be found in a google spreadsheet.
Here are addresses of google sheets for different dataframes for the case of Uganda:
InputFiles = {'Expenditures':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','Expenditures (2019-20)'),
'Prices':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','Prices'),
'HH Characteristics':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','HH Characteristics'),
'FCT':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','FCT'),
'RDI':('1yVLriVpo7KGUXvR3hq_n53XpXlD5NmLaH1oOMZyV0gQ','RDI'),}
from eep153_tools.sheets import read_sheets
import numpy as np
import pandas as pd
def get_clean_sheet(key,sheet=None):
df = read_sheets(key,sheet=sheet)
df.columns = [c.strip() for c in df.columns.tolist()]
df = df.loc[:,~df.columns.duplicated(keep='first')]
df = df.drop([col for col in df.columns if col.startswith('Unnamed')], axis=1)
df = df.loc[~df.index.duplicated(), :]
return df
# Get prices
p = get_clean_sheet(InputFiles['Prices'][0],
sheet=InputFiles['Prices'][1])
if 'm' not in p.columns: # Supply "market" indicator if missing
p['m'] = 1
p = p.set_index(['t','m'])
p.columns.name = 'j'
p = p.apply(lambda x: pd.to_numeric(x,errors='coerce'))
p = p.replace(0,np.nan)
fct = get_clean_sheet(InputFiles['FCT'][0],
sheet=InputFiles['FCT'][1])
fct = fct.set_index('j')
fct.columns.name = 'n'
fct = fct.apply(lambda x: pd.to_numeric(x,errors='coerce'))
################## RDI, if available (consider using US) #####################
rdi = get_clean_sheet(InputFiles['RDI'][0],
sheet=InputFiles['RDI'][1])
rdi = rdi.set_index('n')
rdi.columns.name = 'k'
An instance r
of cfe.Regression
can be made persistent with
r.to_pickle('my_result.pickle')
, which saves the instance “on disk”, and can be loaded using cfe.regression.read_pickle
. We use this method below to load data and demand system previously estimated for Uganda:
r = rgsn.read_pickle('uganda_2019-20.pickle') # Assumes you've already set this up e.g., in Project 3
Choose reference prices. Here we’ll choose a particular year, and average prices across markets. If you wanted to focus on particular market you’d do this differently.
# Reference prices chosen from a particular time; average across place.
# These are prices per kilogram:
pbar = p.xs('2019-20',level='t').mean()
pbar = pbar[r.beta.index] # Only use prices for goods we can estimate
Get food budget for all households, then find median budget:
import numpy as np
xhat = r.predicted_expenditures()
# Total food expenditures per household
xbar = xhat.groupby(['i','t','m']).sum()
# Reference budget
xref = xbar.quantile(0.5) # Household at 0.5 quantile is median
Get quantities of food by dividing expenditures by prices:
qhat = (xhat.unstack('j')/pbar).dropna(how='all')
# Drop missing columns
qhat = qhat.loc[:,qhat.count()>0]
qhat
Finally, define a function to change a single price in the vector
def my_prices(p0,p=pbar,j='Millet'):
"""
Change price of jth good to p0, holding other prices fixed.
"""
p = p.copy()
p.loc[j] = p0
return p
import matplotlib.pyplot as plt
%matplotlib notebook
use = 'Millet' # Good we want demand curve for
# Vary prices from 50% to 200% of reference.
scale = np.linspace(.5,2,20)
# Demand for Millet for household at median budget
plt.plot([r.demands(xref,my_prices(pbar[use]*s,pbar))[use] for s in scale],scale)
# Demand for Millet for household at 25% percentile
plt.plot([r.demands(xbar.quantile(0.25),my_prices(pbar[use]*s,pbar))[use] for s in scale],scale)
# Demand for Millet for household at 75% percentile
plt.plot([r.demands(xbar.quantile(0.75),my_prices(pbar[use]*s,pbar))[use] for s in scale],scale)
plt.ylabel(f"Price (relative to base of {pbar[use]:.2f})")
plt.xlabel(f"Quantities of {use} Demanded")
fig,ax = plt.subplots()
ax.plot(np.log(scale*xref),[r.expenditures(s*xref,pbar)/(scale*xref) for s in scale])
ax.set_xlabel(f'log budget (relative to base of {xref:.0f}')
ax.set_ylabel(f'Expenditure share')
ax.set_title('Engel Curves')