A small but vocal minority in the audio community regards blind and double-blind tests as the supreme standard for detecting audible differences in audio systems. It is true that some types of audio systems are well suited to blind and double-blind A/B or A/B/X tests. Such tests are useful when the two audio signals being compared are simple in nature. For example, telephone company engineers have routinely used, and continue to use, A/B and A/B/X tests to evaluate improvements in voice circuit quality.
However, a test that is suitable for one type of audio system might not be suitable for another. It is worth noting that the company responsible for the invention and implementation of telephone service (the Bell Telephone System) was also responsible for the invention and implementation of home stereophonic audio systems.
It is even more interesting to note that while A/B and A/B/X tests were found to be appropriate for evaluating voice quality improvements on bandwidth-limited telephone circuits, subjective, non-blind listening tests based on careful listening, evaluator training and realistic home listening conditions were the scientific standards for the evaluation of stereophonic audio systems.
It should not be too difficult to understand that a testing methodology that is appropriate for evaluating simple band-limited monophonic signals would most probably not be appropriate for evaluating complex stereophonic signals that cover the full range of human hearing and which are designed to convey aural, spatial and tactile information. Telephone systems are audio systems, but they are audio systems which are primarily designed to convey clear voice communication. Stereophonic systems are audio systems, but they are audio systems which are designed to convey a weighty, complex, realistic illusion of a three-dimensional music concert performance.
Origins Of Blind Stereophonic Audio Testing
A paper published by Jon Boley and Michael Lester in the proceedings of the 127th Convention of the Audio Engineering Society, October 2009, stated:
"ABX tests have been around for decades and provide a simple, intuitive means to determine if there is an audible difference between two signals."
"Within the audio engineering community, the ABX methodology has become the standard psychoacoustic test for determining if an audible difference exists between two signals."
The first statement is true if the signals are very simple in nature, especially if they are monophonic signals. The second statement is questionable, since both founders of the ABX audio testing religion wrote ten-year follow-up papers lamenting the widespread lack of acceptance of ABX testing by audio engineers and the audio press.
Ethan Winer, at his "Audio Myths, Artifact Audibility and Comb Filtering" workshop presented at the 127th Convention of the Audio Engineering Society in October 2009, stated:
"Double blind tests are the gold standard in every field of science."
"It amazes me when some people claim that double blind testing is not valid for assessing audio gear." 
What is truly amazing is that some people would stray so far from the scientifically valid subjective listening evaluation procedures developed at Bell Telephone Laboratories and other electronics firms that participated in the invention and early development of home stereophonic systems (e.g., General Electric and Radio Corporation of America).
Another amazing feature of the Winer presentation is that he included a staged purse-snatching demonstration (at time 9:56) to illustrate the unreliability of short-term visual memory. None of the audience members could accurately identify the "purse-snatcher", even though some were sure that they could. The perpetrator was only in the room for 10 seconds. Mr. Winer later contradicts himself (at time 27:50) by advocating the use of an audio evaluation test that relies on short-term aural memory.
As far as I have been able to determine, the seminal papers in the application of ABX methodology to stereophonic systems are a paper presented by Dr. Stanley Lipshitz and Dr. John Vanderkooy in 1980 to the 65th Convention of the Audio Engineering Society in London and a paper presented by David Clark in 1981 to the 69th Convention of the Audio Engineering Society in Los Angeles.
Drs. Lipshitz and Vanderkooy stated:
"In order for subjective tests to be meaningful to others, the following should be observed...The test must be blind or preferably double-blind. To implement such tests we advocate the use of A/B switchboxes." 
Mr. Clark stated:
"Listening tests used to evaluate audio equipment can seldom be considered scientific tests."
"A system for practical implementation of double-blind audibility tests is described. The controller is a self-contained unit, designed to provide setup and operational convenience while giving the user maximum sensitivity to detect differences."
The controller that Mr. Clark mentioned was an "ABX Comparator" system that he and some associates were marketing through the "ABX Company".
It is curious to note that neither of these seminal papers presents a discussion of how the proposed ABX methodology relates to the evaluation of the primary performance metrics of stereophonic sound systems, such as:
1. Optimization of sound stage width,
2. Optimization of sound stage depth,
3. Stable stereo image placement,
6. Dynamics (dynamic range),
7. Tactile impact,
8. Sonic realism.
Whereas the founders of stereophonic audio systems emphasized listener education and ear training with music, Mr. Clark proposed a different training paradigm for increasing the resolution sensitivity of listeners ([18, p. 332]):
"Great improvements in resolution can be achieved if the listener knows what to listen for. Sensitizing tests can use pink noise, sine waves, or pulses as appropriate to hear a difference. Sometimes an artificially enhanced distortion can be produced by reducing feedback or connecting multiple devices in series for distortion buildup. The listener is then more able to hear the difference in music."
Ten years after writing this, Mr. Clark, in a paper presented to the 91st Audio Engineering Society Convention in 1991, stated:
"Ten years ago [in 1981] the present author presented a paper to the AES on double-blind testing using the A/B/X technique. For the next five years, a device to conveniently implement this test was commercially available. It was thought by the author and his associates that general use of this system would resolve 'The Great Debate' of whether or not small differences in audio components were audible."
In the same paper, Mr. Clark stated:
"It becomes an ethical and perhaps legal question when it is claimed that improved sound quality is delivered despite failure of tests to prove it.
"This would be less of an issue if the number of engineers who dismiss double-blind test results were small, but this is not the case. As Chairman of an AES Workshop on Esoteric Audio in 1988, I asked, by a show of hands, who in the audience believed that different gain-matched amplifiers of modern design sound different from each other. It was stated that all would measure good in conventional tests and all were operated below clipping or other gross distortion levels. Approximately 70% of the audience indicated they believed the amplifiers would likely sound different. This is an amazing response from members of an engineering society which failed to support the claim."
Ten years after his seminal double-blind audio testing advocacy paper was published, Dr. Lipshitz presented a paper to the AES 8th International Conference in 1990 in which he stated:
"It is now ten years since my initial involvement in the controversy surrounding double-blind subjective testing in audio and twelve years since this subject first hit the headlines with the Quad power amplifier comparison challenge in England.
"A lot of water has passed under the bridge in the intervening years, but our hopes of a decade ago, that the validity of the method would be generally accepted by the audio press and adopted wherever feasible, have not been realized."
At the same 1990 AES conference where Dr. Lipshitz lamented the lack of widespread acceptance of blind testing in stereophonic system evaluation, Tom Nousaine, in a paper entitled "The Great Debate: Is Anyone Winning?", was decidedly more upbeat:
"This paper simply presents a compilation of the twenty two blind and double blind listening tests [from 1978 to 1990] of power amplifiers for which numerical results have been published. There is a rather large collection of data which contains some surprising information and ultimately confirms that one side of the debate seems to have a commanding lead."