
Crop Outliers for MI

Abhishek-lodha edited this page Jan 22, 2018 · 7 revisions

This explains the algorithm used for removing crop rate outliers from the transactional data collected through LOOP.


Steps:

  1. Fetch the data from the database and construct a dataframe with the columns ['Date', 'Aggregator', 'Market', 'Gaddidar', 'Farmer', 'Crop', 'Quantity', 'Price', 'Amount'].

  2. Group the dataframe from step 1 by ['Date', 'Market', 'Crop'] and sum Quantity within each group. This sum is Initial_Total_Quantity.
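
Steps 1 and 2 could look roughly like the following pandas sketch. The rows are made-up stand-ins for the database query result, and the names `transactions` and `initial_totals` are illustrative, not taken from the LOOP code base.

```python
import pandas as pd

COLUMNS = ["Date", "Aggregator", "Market", "Gaddidar", "Farmer",
           "Crop", "Quantity", "Price", "Amount"]
GROUP_KEYS = ["Date", "Market", "Crop"]

# Hypothetical rows standing in for the data fetched from the database.
rows = [
    ("2018-01-01", "A1", "M1", "G1", "F1", "Tomato", 100.0, 10.0, 1000.0),
    ("2018-01-01", "A1", "M1", "G1", "F2", "Tomato", 50.0, 12.0, 600.0),
    ("2018-01-01", "A2", "M1", "G2", "F3", "Potato", 80.0, 8.0, 640.0),
]
transactions = pd.DataFrame(rows, columns=COLUMNS)

# Step 2: one row per (Date, Market, Crop) with the summed quantity.
initial_totals = (
    transactions.groupby(GROUP_KEYS, as_index=False)["Quantity"]
    .sum()
    .rename(columns={"Quantity": "Initial_Total_Quantity"})
)
```

Keeping `initial_totals` as a separate frame makes it easy to compare against the remaining quantity later, when the 60 % check is applied.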

  3. Apply the get_statistics method to the above dataframe to compute Av_Rate, Total_quantity, Deviation (D), Max_deviation and STD, the ratios D/Av, D/STD and STD/Av, and finally deviation_factor = D/Av * D/STD.
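
This page does not define the individual statistics, so the sketch below rests on stated assumptions: Av_Rate is taken as the quantity-weighted mean price of the group (sum of Amount divided by sum of Quantity), Deviation as the absolute difference between a row's Price and Av_Rate, and STD as the population standard deviation of Price within the group. The actual get_statistics method may differ on any of these points.

```python
import numpy as np
import pandas as pd

GROUP_KEYS = ["Date", "Market", "Crop"]


def get_statistics(df):
    """Illustrative only: the column definitions below are assumptions."""
    out = df.copy()

    def per_group(column, func):
        # Broadcast a per-group aggregate back onto every row of the group.
        return out.groupby(GROUP_KEYS)[column].transform(func)

    out["Total_quantity"] = per_group("Quantity", "sum")
    out["Av_Rate"] = per_group("Amount", "sum") / out["Total_quantity"]
    out["Deviation"] = (out["Price"] - out["Av_Rate"]).abs()
    out["Max_deviation"] = per_group("Deviation", "max")
    out["STD"] = per_group("Price", lambda s: s.std(ddof=0))

    # A zero STD would make D/STD undefined, so treat it as missing.
    std = out["STD"].replace(0, np.nan)
    out["D/Av"] = out["Deviation"] / out["Av_Rate"]
    out["D/STD"] = out["Deviation"] / std
    out["STD/Av"] = out["STD"] / out["Av_Rate"]
    out["deviation_factor"] = out["D/Av"] * out["D/STD"]
    return out


# Made-up group in which one price (40) is far from the other two (10).
sample = pd.DataFrame({
    "Date": ["2018-01-01"] * 3,
    "Market": ["M1"] * 3,
    "Crop": ["Tomato"] * 3,
    "Quantity": [2.0, 2.0, 1.0],
    "Price": [10.0, 10.0, 40.0],
    "Amount": [20.0, 20.0, 40.0],
})
stats = get_statistics(sample)
```

Because deviation_factor multiplies D/Av by D/STD, it grows quadratically with the deviation, so a row far from the group average stands out sharply from the rest.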

  4. Assign a Flag value to every row of the dataframe, where:

  • Flag = 1: Okay
  • Flag = 2: MI Outlier
  • Flag = 3: Incorrect data. Ask admin
  • Flag = 4: No clue. Try other method. Don't send to MI.
  • Flag = 5: Iterate.
  5. Refer to this to raise flags: here.

  6. Segregate the flagged dataframe into multiple lists:

  • combined_transactions_final_data - rows where Flag != 5.
  • combined_transactions_iteration_data - rows where Flag = 5 and Deviation < Max_deviation. This drops the rows whose deviation is the maximum.
  • combined_transactions_non_iteration_data - rows where Flag = 5 and Deviation = Max_deviation. There is no need to iterate over these rows further.
  • Update Flag to 4 if the remaining quantity is less than 60 % of Initial_Total_Quantity.
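
A minimal sketch of this segregation, assuming the flagged dataframe already carries Flag, Deviation and Max_deviation columns. The helper names `segregate` and `flag_low_quantity` are hypothetical, and the reading of "remaining quantity" as the summed Quantity of the rows still queued for iteration in each group is an assumption.

```python
import pandas as pd

GROUP_KEYS = ["Date", "Market", "Crop"]


def segregate(flagged):
    """Split flagged rows into final, iteration and non-iteration sets."""
    is_iterate = flagged["Flag"] == 5
    below_max = flagged["Deviation"] < flagged["Max_deviation"]
    final_data = flagged[~is_iterate]
    iteration_data = flagged[is_iterate & below_max]
    non_iteration_data = flagged[is_iterate & ~below_max]
    return final_data, iteration_data, non_iteration_data


def flag_low_quantity(iteration_data, initial_totals, min_share=0.6):
    """Set Flag = 4 where too little of the original quantity remains."""
    out = iteration_data.merge(initial_totals, on=GROUP_KEYS, how="left")
    remaining = out.groupby(GROUP_KEYS)["Quantity"].transform("sum")
    too_small = remaining < min_share * out["Initial_Total_Quantity"]
    out.loc[too_small, "Flag"] = 4
    return out


# Made-up flagged rows: two groups, one row already final (Flag = 1).
flagged = pd.DataFrame({
    "Date": ["2018-01-01"] * 6,
    "Market": ["M1"] * 6,
    "Crop": ["Tomato"] * 3 + ["Potato"] * 3,
    "Quantity": [50.0, 30.0, 20.0, 10.0, 90.0, 5.0],
    "Flag": [5, 5, 5, 5, 5, 1],
    "Deviation": [1.0, 2.0, 5.0, 1.0, 3.0, 0.5],
    "Max_deviation": [5.0] * 3 + [3.0] * 3,
})
initial_totals = pd.DataFrame({
    "Date": ["2018-01-01"] * 2,
    "Market": ["M1"] * 2,
    "Crop": ["Tomato", "Potato"],
    "Initial_Total_Quantity": [100.0, 105.0],
})

final_data, iteration_data, non_iteration_data = segregate(flagged)
checked = flag_low_quantity(iteration_data, initial_totals)
```

In this sample the Tomato group keeps 80 of 100 units and stays at Flag = 5, while the Potato group keeps only 10 of 105 units and is moved to Flag = 4.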
  7. Conditions:
  • If combined_transactions_iteration_data is not empty: repeat step 6.
  • Else, if FlagMax and FlagMin for a ['Date', 'Market', 'Crop'] group in combined_transactions_final_data are the same and both equal 4, move those rows to the combined_transactions_non_iteration_data list.
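
The two conditions can be sketched as a bounded loop plus a clean-up pass. Both function names are hypothetical, and `one_pass` is a stand-in for whatever re-runs the statistics, flagging and segregation on the rows still pending; the page does not say exactly which steps are repeated, so that part is an assumption.

```python
import pandas as pd

GROUP_KEYS = ["Date", "Market", "Crop"]


def run_until_stable(data, one_pass, max_rounds=50):
    """Call one_pass on the pending rows until none are left to iterate.

    one_pass (hypothetical) maps a dataframe to the tuple
    (final_data, iteration_data, non_iteration_data).
    """
    finals, non_iterations = [], []
    pending = data
    for _ in range(max_rounds):  # guard against a loop that never empties
        final, pending, non_iteration = one_pass(pending)
        finals.append(final)
        non_iterations.append(non_iteration)
        if pending.empty:
            break
    return pd.concat(finals), pd.concat(non_iterations)


def move_all_flag4_groups(final_data, non_iteration_data):
    """Move groups whose FlagMin and FlagMax are both 4 out of final_data."""
    flags = final_data.groupby(GROUP_KEYS)["Flag"]
    all_four = (flags.transform("min") == 4) & (flags.transform("max") == 4)
    moved = pd.concat([non_iteration_data, final_data[all_four]],
                      ignore_index=True)
    return final_data[~all_four], moved


# Made-up final rows: every Tomato row has Flag = 4, Potato is mixed.
final_rows = pd.DataFrame({
    "Date": ["2018-01-01"] * 4,
    "Market": ["M1"] * 4,
    "Crop": ["Tomato", "Tomato", "Potato", "Potato"],
    "Flag": [4, 4, 1, 4],
})
parked_rows = pd.DataFrame({
    "Date": ["2018-01-01"], "Market": ["M1"], "Crop": ["Onion"], "Flag": [5],
})
kept, parked = move_all_flag4_groups(final_rows, parked_rows)
```

Only the Tomato group is moved here; the Potato group stays in the final data because its minimum flag is 1.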