Wikipedia tells us that Experimental Philosophy (X-Phi) is:
an emerging field of philosophical inquiry that makes use of empirical data—often gathered through surveys which probe the intuitions of ordinary people—in order to inform research on philosophical questions. This use of empirical data is widely seen as opposed to a philosophical methodology that relies mainly on a priori justification, sometimes called "armchair" philosophy by experimental philosophers.
So what makes X-Phi experimental is the use of data rather than (presumably) data-less a priori reasoning. This is confusing. Even when employing 'pure reason', philosophers use data - if only the data of their senses, experience and consciousness. Would anyone deny that Descartes used data when he came up with the Cogito? That it was the data of his own experience doesn't make it less valid qua data.
So if the "armchair" philosopher is using data as well, what's the virtue of the X-Phi approach? It seems that they want to gather multiple data points (intuitions, perceptions) and specifically those of non-indoctrinated-into-philosophy regular shmoes. These can be used either to a) validate an armchair hypothesis or b) replace an armchair hypothesis with an 'experimental' one. The conceit is that consensus via multiple intuitions/perceptions has greater validity than a single POV and that 'untutored' or 'uneducated' intuitions/perceptions have more authenticity than armchair ones.
Let's look at this, starting with the approach. Does X-Phi seek to validate or supplant armchair conclusions? It seems clearly the latter. Validating armchair philosophical hypotheses is called studying and doing philosophy. When philosophers comment on whether a thinker's view accords with their own experience or intuition, they are validating or invalidating it. When a lecturer, teacher, professor, docent or quack like me brings such an idea to a classroom, coffee shop, blog or wherever non-philosophically trained people have access to it and asks whether it accords with their intuitions, she is validating the views of an armchair philosopher king against the yardstick of the great unwashed masses.
I don't see value in the approach of experimental philosophy unless it seeks to supplant armchair hypothesizing with 'crowdsourced' hypothesizing. That commits its practitioners to prioritizing both quantity (many vs. one) and some notion of the authenticity of the common person's POV. Let us grant the latter: once one has gone down the rabbit hole, there is no coming back. Philosophical education and training estrange one from one's own authentic intuitions by drenching them in theoretical constructs (or some other such obfuscation). We want to pursue X-Phi because we want a volume of intuitions/perceptions, a statistically significant representation from the right population.
I'm not going to criticize X-Phi's approach to intuitions. If you want that, check this out. I want to look at its experimental approach. You don't have to be a scientist to understand Design of Experiments (DoE). (We bust that bad boy out in the Six Sigma wild west of business too!) If you want to test the effect of a variable, you set up an experiment that holds everything constant except that variable. You start with a null hypothesis, which assumes that there is no measurable difference between state A of the variable and state B. The experiment should be designed to isolate the variable in such a way that the same measure can be taken when state A obtains vs. when state B obtains. If a statistically significant difference is indicated, you have a data point against the null hypothesis. Collect enough of those and you have a theory.
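For the numerically inclined, here is a minimal sketch of what "a statistically significant difference" could look like when the measure is a yes/no answer taken under state A and state B. It uses a standard two-proportion z-test; the function name and the counts are hypothetical and invented for illustration, not data from any study.

```python
from math import sqrt, erfc


def two_proportion_z_test(yes_a, n_a, yes_b, n_b):
    """Two-sided z-test of the null hypothesis that the 'yes' rate
    is the same under state A and state B."""
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    # Under the null, both groups share one underlying rate, so pool them.
    pooled = (yes_a + yes_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("all answers identical; the test is undefined")
    z = (p_a - p_b) / se
    # Two-sided tail probability of a standard normal variable.
    p_value = erfc(abs(z) / sqrt(2))
    return z, p_value


# Hypothetical counts: 30 of 40 say "yes" under A, 10 of 40 under B.
z, p = two_proportion_z_test(30, 40, 10, 40)
print(f"z = {z:.2f}, p = {p:.2g}")
```

A small p-value counts against the null hypothesis only if the same question was asked in both groups and nothing else changed between them.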
So let's take a look at a "celebrated finding" from X-Phi known as the "Knobe effect", named for Joshua Knobe. I'll let you watch the video below (and check out this page) for the details but I encourage you to consider my abstracted formulation first. I think the content of the study might have an effect on the outcome - over and above the formal structure of the experiment. Participants are presented with two scenarios in which an agent T goes ahead with an action A knowing in advance that A will i) harm or ii) help X. The intent of A is something else, Y. The question is: in doing A for the purpose Y, if T knows A will harm X, does T harm X intentionally? Conversely, if T knows that A will help X, does T help X intentionally?
So if that wasn't clear, let me present it this way: Agent T takes action A to accomplish Y. In the two scenarios:
- #1: T knows that A will harm X
- #2: T knows that A will help X
The question posed to participants is: in #1 did T intentionally harm X and in #2 did T intentionally help X? The goal of the experiment is to get at people's intuitions around intentionality with respect to harm and help. And the results of this experiment certainly suggest a difference (statistical analysis aside). My question is whether this is a well-constructed experiment in the DoE sense described above.
The first question is whether the same measure is being taken at the end of the study. There are actually two separate questions being asked: did T intentionally harm X and did T intentionally help X. These are not the same measurement. The experiment seeks to hold everything constant except the words "harm" and "help". In this case, the null hypothesis would be that there is no difference in outcome when you use the word "harm" vs. the word "help" in the same sentence or situation. This seems to me to be patently false from the get go, which should call into question the null hypothesis and the entire experiment.
The second issue I see is that what is being measured is unclear. Is it participant intuition about intention, foreknowledge, harm or help? All of the above? The third is the very, very heavily determined nature of the situation. Again, I'll leave it to you to follow the details below, but I can imagine other experimental content set up in the same formal structure that would have different outcomes. If true, what may be learnt from this is something more about how people feel about the specific agent than harm and help.
Suppose you structure the experiment this way. T does A for Y. In scenario #1, T knows that A will harm X and in scenario #2, T does not know that A will harm X. You then ask the same question in both situations: did T intentionally harm X? Your null hypothesis is that knowing and not knowing about harm before taking action don't have any effect on 3rd party judgments of intention, which you are pretty sure will be falsified (based on your own intuitions). If you find that people ascribe intention more often when T knows in advance, then you have a reasonable claim for saying that foreknowledge is a prerequisite for ascribing intention. If on the other hand you discover that people think T acts intentionally even without foreknowledge, say out of failure to investigate or willful ignorance, you have failed to reject your null hypothesis and you need to refine the experiment.
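Because this design asks one and the same question in both scenarios, the answers fit into a simple 2x2 table (scenario by yes/no), and the null hypothesis can be checked directly. Below is a rough sketch using Fisher's exact test written with the standard library only. The function name and every count are hypothetical, made up purely to show the mechanics.

```python
from math import comb


def fisher_exact_two_sided(yes_1, no_1, yes_2, no_2):
    """Two-sided Fisher exact test on a 2x2 table.

    Row 1: answers in scenario #1 (T knows A will harm X).
    Row 2: answers in scenario #2 (T does not know).
    Returns the probability, under the null hypothesis of no difference,
    of a table at least as unlikely as the observed one.
    """
    row_1 = yes_1 + no_1
    row_2 = yes_2 + no_2
    total_yes = yes_1 + yes_2
    total = row_1 + row_2

    def prob(x):
        # Hypergeometric probability that scenario #1 gets x "yes" answers.
        return comb(row_1, x) * comb(row_2, total_yes - x) / comb(total, total_yes)

    p_observed = prob(yes_1)
    lowest = max(0, total_yes - row_2)
    highest = min(row_1, total_yes)
    # Sum every possible table that is no more likely than the observed one.
    return sum(
        prob(x)
        for x in range(lowest, highest + 1)
        if prob(x) <= p_observed * (1 + 1e-9)
    )


# Hypothetical answers to "did T intentionally harm X?"
p = fisher_exact_two_sided(yes_1=35, no_1=5, yes_2=5, no_2=35)
print(f"p = {p:.3g}")
```

The test only makes sense here because the measure is identical across scenarios; with two different questions (harm vs. help) there is no single column of "yes" answers to compare.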
Variable selection in an experiment is critical. This experiment is the equivalent of saying 'T ____ X' where ____ could be just about any verb in the English language. Because verbs have different meanings, substituting one for another isn't tweaking a variable, it's changing the experiment. Which is not to say the Knobe effect (whatever it is) isn't interesting; it just isn't an experimental outcome that can be used to (in)validate a hypothesis. It's just an interesting phenomenon.