Crop Outliers for MI
This page explains the algorithm used to remove crop rate outliers from the transactional data collected through LOOP.
Steps:
-
Fetch the data from the database and construct a DataFrame with the columns ['Date', 'Aggregator', 'Market', 'Gaddidar', 'Farmer', 'Crop', 'Quantity', 'Price', 'Amount'].
-
Group the DataFrame obtained in step 1 by ['Date', 'Market', 'Crop'] and aggregate Quantity by taking its sum; the result is Initial_Total_Quantity.
-
Apply the get_statistics method to the above DataFrame to compute Av_Rate, Total_quantity, Deviation, Max_deviation, STD, and ratios such as D/Av, D/STD, and STD/Av, then calculate deviation_factor = D/Av * D/STD.
-
Assign a Flag value to every row of the DataFrame, where:
- Flag = 1: Okay
- Flag = 2: MI Outlier
- Flag = 3: Incorrect data. Ask admin
- Flag = 4: No clue. Try other method. Don't send to MI.
- Flag = 5: Iterate.
-
Refer to this to raise flags: here.
-
Segregate the flagged DataFrame into multiple lists:
- combined_transactions_final_data - filtered where flag != 5
- combined_transactions_iteration_data - filtered where flag = 5 and Deviation < Max_deviation; this removes the items where the deviation is maximum.
- combined_transactions_non_iteration_data - filtered where flag = 5 and Deviation = Max_deviation; there is no need to iterate over these rows further.
- Update flag = 4 if the remaining quantity is < 60% of Initial_Total_Quantity.
- Conditions:
- If combined_transactions_iteration_data is not empty: repeat step 6.
- Else, if the FlagMax and FlagMin values for ['Date', 'Market', 'Crop'] are the same and equal to 4 in combined_transactions_final_data, move those rows to the combined_transactions_non_iteration_data list.
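
The grouping, statistics, and segregation steps above can be sketched with pandas. This is only an illustrative sketch, not the actual LOOP code: the function names are hypothetical, and the formulas for Av_Rate (quantity-weighted average price), Deviation (absolute distance of Price from Av_Rate), and STD (population standard deviation of Price per group) are assumptions, since this page does not define them. The flag rules and the 60% quantity check are not sketched, because the flag rules are only linked from this page and not stated here.

```python
import pandas as pd

GROUP_KEYS = ["Date", "Market", "Crop"]


def add_initial_total_quantity(df):
    """Step 2: sum Quantity per (Date, Market, Crop) and attach it to each row."""
    out = df.copy()
    out["Initial_Total_Quantity"] = (
        out.groupby(GROUP_KEYS)["Quantity"].transform("sum")
    )
    return out


def get_statistics_sketch(df):
    """Step 3: hypothetical stand-in for get_statistics (formulas are assumptions)."""
    out = df.copy()
    out["Total_quantity"] = out.groupby(GROUP_KEYS)["Quantity"].transform("sum")
    # Assumption: Av_Rate is the quantity-weighted average price of the group.
    total_amount = out.groupby(GROUP_KEYS)["Amount"].transform("sum")
    out["Av_Rate"] = total_amount / out["Total_quantity"]
    # Assumption: Deviation is the absolute distance of Price from Av_Rate.
    out["Deviation"] = (out["Price"] - out["Av_Rate"]).abs()
    out["Max_deviation"] = out.groupby(GROUP_KEYS)["Deviation"].transform("max")
    # Assumption: STD is the population standard deviation of Price per group.
    out["STD"] = out.groupby(GROUP_KEYS)["Price"].transform(
        lambda s: s.std(ddof=0)
    )
    out["D/Av"] = out["Deviation"] / out["Av_Rate"]
    out["D/STD"] = out["Deviation"] / out["STD"]  # NaN or inf when STD is 0
    out["STD/Av"] = out["STD"] / out["Av_Rate"]
    out["deviation_factor"] = out["D/Av"] * out["D/STD"]
    return out


def segregate(flagged):
    """Step 6: split a flagged DataFrame into the three lists described above."""
    needs_iteration = flagged["Flag"] == 5
    below_max = flagged["Deviation"] < flagged["Max_deviation"]
    at_max = flagged["Deviation"] == flagged["Max_deviation"]
    final_data = flagged[~needs_iteration]
    iteration_data = flagged[needs_iteration & below_max]
    non_iteration_data = flagged[needs_iteration & at_max]
    return final_data, iteration_data, non_iteration_data
```

A caller would run add_initial_total_quantity and get_statistics_sketch on the DataFrame from step 1, assign the Flag column according to the linked rules, and then call segregate, recomputing the statistics on the iteration list while it is not empty.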