Causal Inference in Crop Trials
There has been much recent interest in causal analysis and causal reasoning in the data science and machine learning research communities. I think this is interesting because much of human knowledge is derived from observational studies. Machine learning allows us to make predictions and answer the question "what will happen?", whereas causal inference asks "what would happen if we took a certain action (an intervention)?". In this post I provide some brief initial thoughts on how this sub-field can help answer useful questions that arise when conducting crop trials to optimize for crop quality, yield or resilience.
During experimentation and crop trials, for example, we may be interested in understanding the outcomes (desired or undesired effects) associated with particular actions (also known as interventions) prescribed by the recipes (or policies) during the course of process execution. That is, we need to find correlations between recipe interventions and changes in crop state that can ultimately be viewed, with some modicum of confidence, as statements of causation: statements about how certain steps in a recipe process cause certain changes in crop growth state.
This means going beyond statistical patterns to find causal structures; causal inference is about drawing conclusions about causes and effects. In the case of yield prediction, rather than relying only on randomized controlled trials, we may in future have lots of observational data from our experiments. We may use this data to go beyond correlations and infer causal relationships. For example, we can ask questions like "What is the effect of increasing the duration of light by one hour on the final yield?" and "Which plant species are likely to benefit from this kind of intervention?" (as every plant will respond differently).
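To make the light-duration question concrete, here is a minimal sketch on simulated data, not a recipe for real trials. Everything in it is an assumption made for illustration: the species names, the effect sizes, and the idea that greenhouse temperature is the only confounder (it influences both how long the lights run and the yield). Under that assumption, a regression that adjusts for temperature recovers the per-hour effect for each species, while a naive regression of yield on light hours overstates it.

```python
import numpy as np

# Hypothetical species and per-hour light effects, chosen only for this simulation.
TRUE_EFFECTS = {"lettuce": 2.0, "basil": 0.5}


def simulate_trials(effect, n=4000, rng=None):
    """Simulate observational trials where temperature confounds light and yield."""
    rng = rng if rng is not None else np.random.default_rng(0)
    temperature = rng.normal(22.0, 3.0, n)
    # Assumed confounding: warmer greenhouses also run longer light schedules.
    light_hours = 12.0 + 0.5 * (temperature - 22.0) + rng.normal(0.0, 1.0, n)
    crop_yield = (
        50.0
        + effect * light_hours
        + 1.5 * (temperature - 22.0)
        + rng.normal(0.0, 2.0, n)
    )
    return temperature, light_hours, crop_yield


def light_coefficient(crop_yield, light_hours, *covariates):
    """Least-squares coefficient on light_hours, optionally adjusting for covariates."""
    X = np.column_stack([np.ones_like(light_hours), light_hours, *covariates])
    coef, *_ = np.linalg.lstsq(X, crop_yield, rcond=None)
    return float(coef[1])


rng = np.random.default_rng(42)
estimates = {}
for species, effect in TRUE_EFFECTS.items():
    temperature, light_hours, crop_yield = simulate_trials(effect, rng=rng)
    estimates[species] = {
        # Correlation only: mixes the light effect with the temperature effect.
        "naive": light_coefficient(crop_yield, light_hours),
        # Adjusts for the assumed confounder, so it targets the causal effect.
        "adjusted": light_coefficient(crop_yield, light_hours, temperature),
    }
    print(species, estimates[species])
```

The adjusted estimate is only as good as the assumption that all confounders are measured and included; with real trial data that assumption would need to be argued for, not taken for granted.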
This line of questioning may extract value from observational data and help us go beyond the expected results as determined by the recipe. Overall, when dealing with time-variant or non-stationary data, a deeper understanding of the data might allow us to build conditional robustness to data shift. It would also lead to machinery that can more accurately predict the sequence of changes to crop state that would be effected by a process prescribed by the recipe (and thence determine whether this sequence of state changes is in fact desired). In an ideal world, one would have access to a causal theory correlating process context, task sequence and outcome. Given that this is difficult in practical settings, the intent here would be, for example, to mine patterns from data that approximate such a causal theory. Note, however, that using machine learning algorithms designed for high-dimensional data to answer this type of causal question is still an open research challenge.
The theory of causality provides a better alternative for finding the root causes of a problem. Causal process mining seeks to use process execution logs to discover and quantify cause-effect relations. It can answer the fundamental question: what changes, if implemented, will cause an improvement to the process? Existing process discovery techniques allow us to discover correlation but not causation. In causal analysis we try to develop an understanding that goes beyond the control-flow perspective. We are interested in understanding the outcomes (desired or undesired effects) associated with particular actions (interventions) taken during the course of process execution.
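As a rough illustration of quantifying a cause-effect relation from execution logs, the sketch below compares runs that include a given recipe step with runs that do not, first naively and then within each growth stage. The log schema (stage, steps, yield_kg), the step names and all the numbers are hypothetical, and stratifying by stage only helps if stage is the relevant context; this is one simple adjustment technique, not a full causal process mining method.

```python
from collections import defaultdict
from statistics import mean


def make_log():
    """Build a toy execution log; stages, steps and yields are all made up."""
    # (stage, includes extra_light step, number of runs, yield in kg)
    spec = [
        ("mature", True, 8, 10.0),
        ("mature", False, 2, 9.0),
        ("seedling", True, 2, 5.0),
        ("seedling", False, 8, 4.0),
    ]
    log = []
    for stage, treated, count, outcome in spec:
        for _ in range(count):
            steps = ["sow", "irrigate"] + (["extra_light"] if treated else []) + ["harvest"]
            log.append({"stage": stage, "steps": steps, "yield_kg": outcome})
    return log


def split_by_step(runs, step):
    """Return outcomes of runs that contain the step and of runs that do not."""
    treated = [r["yield_kg"] for r in runs if step in r["steps"]]
    control = [r["yield_kg"] for r in runs if step not in r["steps"]]
    return treated, control


def naive_effect(runs, step):
    """Difference in mean outcome, ignoring context (a correlation, not a cause)."""
    treated, control = split_by_step(runs, step)
    return mean(treated) - mean(control)


def stratified_effect(runs, step, context_key="stage"):
    """Average the within-context differences, weighted by context size."""
    strata = defaultdict(list)
    for r in runs:
        strata[r[context_key]].append(r)
    total, used = 0.0, 0
    for stratum_runs in strata.values():
        treated, control = split_by_step(stratum_runs, step)
        if not treated or not control:
            continue  # no comparison is possible inside this context
        total += len(stratum_runs) * (mean(treated) - mean(control))
        used += len(stratum_runs)
    if used == 0:
        raise ValueError("no context contains both treated and untreated runs")
    return total / used


log = make_log()
print(naive_effect(log, "extra_light"))       # 4.0
print(stratified_effect(log, "extra_light"))  # 1.0
```

In this toy log the extra step is applied mostly to mature crops, which yield more anyway, so the naive comparison suggests a gain of 4 kg while the within-stage comparison suggests 1 kg.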