The Impact of 3D Reproduction Systems on Envelopment

Following on from the last post, which outlined how the addition of height channels could improve the listening experience, research in this area is starting to gather pace.

Two of the major players in the 3D sound arena, Dolby Atmos and Auro 3D, are having commercial success with their systems; however, the audio rendering methods behind these systems are not disclosed.

Much of the scientific research carried out so far has focussed on the overall ‘sound quality’ or the overall ‘spatial quality’. It is hard to assess a system fully just by asking about the overall spatial quality since spatial sound is multidimensional.

It is necessary to target specific attributes in order to reveal more about how the listening experience is being affected. Looking to horizontal surround systems, important attributes have already been identified. One is ‘spatial impression’, which can be separated into two components: ‘apparent source width’, which relates to the breadth or width of the frontal soundstage, and ‘envelopment’, which relates to how surrounded by or immersed in the sound scene the listener is. Envelopment has been cited as one of the main attributes driving multichannel development.

Envelopment has traditionally been associated with sound from the horizontal plane. One researcher attempted to provide other tags, citing ‘engulfment’ as a better description for the sensation of sound from a surround system which includes a height component. Further, classic concert hall acoustics has long held that the highest rated halls provide strong reflections from lateral directions. Schroeder, the famous acoustician, stated that lateral reflections were the most important, but that it would be unwise to totally absorb ceiling reflections.

Coming back round to 3D reproduction systems: is envelopment an appropriate descriptor to use for 3D systems? And is envelopment only influenced by the surround channels in the horizontal plane at the listener’s ear height?

More to follow…


Future Spatial Audio

From the dawn of sound reproduction systems there has been a desire to try and create a more realistic listening experience. This has been achieved through the development of stereo, 5.1, and then the addition of other formats like 7.1, primarily to reproduce more accurately the original sound field.

However, early surround mixing strategies for music were often naive, placing instruments all around the listener just because it could be done, and this quickly sounded tacky. Some productions have used the strategy to good effect by taking a musician’s perspective, for example mixing the surround scene from the drummer’s position. A more conservative approach uses the front channels for the main instrumentation, as in stereo mixing, and the rear surround channels for subtle room effects only. This provides a more natural listening experience, devoid of distracting sounds from the rear channels.

Similar approaches are found in sound mixing for film, with room effects being used to give the impression of an environment. Other post-production techniques use the surround channels for special effects, for example footsteps behind the listener, immersion in a battle scene, or flying debris from explosions.

These previous examples demonstrate that different scenarios require different uses of the surround format, with a number of approaches being developed over time.

It could be said that the current reproduction formats have reached maturity, and that extending these systems by adding more speakers in the horizontal could only provide marginal benefits, so where to next?

Current trends suggest that there is only one other way to go with surround: the addition of height channels. This is demonstrated by one of the major players in surround sound, Dolby, marketing their Atmos system, which features height channels. However, other exponents of surround technologies, such as Audyssey, state that it is more important to focus on adding channels in the horizontal first; they advocate adding separate wide channels at the front to complement the stereo setup before adding height channels.

Surround systems with height are not a new concept, though. One of the first 3D systems was proposed as early as 1970 by Michael Gerzon, using a spatial audio technology called Ambisonics, and Tomlinson Holman followed in the late 1980s with his 10.2 system. The inclusion of height channels in surround opens up new questions. How can the height channels be used effectively? And will the extra speakers placed above be worth the effort? The obvious uses for height would be to render aeroplanes, birds, thunder etc., sources which we are used to hearing above us. For music, however, it may be that the addition of height channels will enhance the room impression, and possibly also the frontal imaging. It remains to be seen how potential consumers will react to sound coming from above them in a surround system, since we have been conditioned to listening to sound coming only from the horizontal plane.

As stated at the start, each development in audio reproduction systems has provided a recognisable improvement: mono to stereo, stereo to 5.1, then 7.1. It would seem, though, that adding further channels in the horizontal would not make a significant difference. Can the addition of channels above therefore provide a means to move spatial audio systems forward, or will the extra effort outweigh the benefits?


Choices

The release of the new Pro Tools 9 has provided more freedom to use other audio interfaces, as in the past Pro Tools only allowed the use of interfaces supplied by Avid (or Digidesign). Given this new freedom to use almost any other manufacturer’s audio interface, and as there are many different choices, I thought it would be good to provide a summary of what is available within a home studio budget.

First up then is the Tascam US-122
Tascam US-122


Tascam was started nearly 60 years ago by the Tani brothers, who created their first open reel-to-reel machine under the name of Teac, and is still with us today, offering the Tascam US-122MKII. This is a small format interface which features 2 XLR inputs or 2 line inputs, plus 1 MIDI in/out. The interface does not have any digital inputs or outputs, but it is USB powered, meaning no power supply to carry about. It can work at resolutions of up to 24-bit/96 kHz and has a headphone output. A good starter interface, small and portable.

The interface costs £116

The next offering is the Focusrite Saffire Pro 14
Focusrite Saffire Pro 14
The ISA mic preamp is said to have launched the Focusrite brand, with the involvement of Sir George Martin, and the company has continued its development from transistors to computers with the Saffire Pro 14. This is a small format interface, but it uses FireWire to connect to the computer. Like the Tascam US-122 it can be bus powered, in this case over the FireWire connection, or alternatively it can use the supplied power adapter. Also like the US-122, it can work at up to 24-bit/96 kHz. Connections include 2 XLR mic inputs and 2 line inputs, stereo S/PDIF digital connections, 1 MIDI in/out and 1 headphone output. The unit also comes supplied with Focusrite software which includes compression, EQ and reverb; however, this will only work as an Audio Unit or VST, so Pro Tools is ruled out. In comparison to the US-122 the only real difference is the digital connection, and the price!
This interface costs £199

Tascam US-1800


Another offering from Tascam is the US-1800, a bigger format interface than the US-122. Like the US-122 it offers resolutions of up to 24-bit/96 kHz; however, it ups the ante with 8 XLR inputs and 4 TRS inputs for recording guitar or keyboards. Digital inputs are also offered in the form of S/PDIF, although in stereo only. There are also 4 outputs, which would allow two separate stereo outputs for alternative monitoring. The interface also provides 1 MIDI in/out. It would have been nice to have ADAT inputs, but at this price point it seems a good buy.

The interface costs £282

The fourth offering is the Mackie Onyx Blackbird

Mackie Onyx Blackbird
Mackie is said to have its roots in the late 60s, when Greg Mackie and his partner Martin Schneider founded TAPCO (Technical Audio Products). The company started creating compact audio mixers in 1988, which then led to other products and into the computer interface market we know today with the Onyx Blackbird.
This interface is aimed more at the engineer who wants to record live instruments rather than MIDI, as it does not have any MIDI capabilities. It connects to the computer via FireWire and features 8 XLR inputs or 8 line inputs. To increase the recording count the interface also has ADAT digital inputs, which would allow the addition of other mic preamps, and there are 2 insert points allowing a processor to be inserted. To monitor what is being recorded there are 3 stereo outputs on TRS connections and 2 headphone outputs with independent level controls, instead of the single one offered on the Tascam US-1800. More professional digital sync capabilities are also available in the form of BNC connections for word clock, which are not available on the Tascam US-1800. Mackie also supply mixer software which allows routing of any input to any output, and channel linking to create stereo channels. This interface is more expensive than the Tascam and is aimed at the recording engineer who wishes to record live instruments; the Tascam unit, however, is cheaper and affords the best of both worlds with the same number of XLR inputs.

The Mackie Onyx Blackbird costs £399

The final interface pushes the boat out slightly; it boasts some really useful features, but at a price.

Focusrite Liquid Saffire 56


To finish the roundup of the now eligible third party interfaces is the Focusrite Liquid Saffire 56. This interface really combines the previous two interfaces in one, with 8 mic inputs or 8 line inputs, although 2 of the channels feature processing technology from the high-end Liquid Channel, which uses emulations to give the signal the characteristics of several different vintage mic preamps. Connection to the computer is via FireWire, and digital connections include stereo S/PDIF as well as 2 ADAT connections, which would allow you to connect external preamps to increase inputs. Digital sync is provided on BNC connections. There are also a healthy 8 outputs as well as a stereo output on TRS connections, 2 independent headphone outputs and a MIDI in/out. For the ultimate recordist, this interface allows recording at resolutions as high as 24-bit/192 kHz, higher than any of the other interfaces. This will reduce the digital I/O count, but it’s there if needed; you never know. Also different from all of the previous interfaces are the LED meters on the front, allowing easy referencing of all channel levels.

I think this unit offers a lot of quality, and one feature I like and would love to try out is the Liquid Channel feature, which aims to bring the sound of 10 different vintage mic preamps from the likes of Pultec, Neve and Telefunken. The only downsides really are its thickness, as it is a 2U unit which would take up more rack space, and of course the price!

This interface costs £599

This has not been an extensive review of the interfaces available, but hopefully it has encompassed some of the choices which are now available to the Pro Tools user. It would be nice to hear of your experiences with any of these interfaces, or any other for that matter. Thank you!


Top 10 Gear Wish List

Here is my selection of equipment which I would purchase should I win the lottery in the near future.
1 Prism Sound ADA-8XR
2 Avid D Control ES
3 Neve 1081R Remote Mic Pre
4 TC Electronic System 6000
5 Empirical Labs Distressor EL8S
6 API 2500 Stereo Mix buss Compressor
7 Thermionic Culture Fat Bustard Summing Mixer
8 Maselec MEA-2
9 Chandler Curve Bender EQ
10 Manley Vari MU Compressor


High Hopes For High Resolution

It would seem at the moment that there is great interest in using higher bit depths to improve sound quality, offering 24-bit recordings as well as the current 16-bit. Apple have been in the news lately because they are looking to use higher bit depths to increase the quality of their media in the iTunes Store. This would increase the disk space needed for storage, and while the theory behind higher bit depths makes perfect sense, the age-old question pops up: can the average music listener detect the difference? Not only that, but the average listener will probably be using the Apple earbuds sold with their device, which provide a very limited listening experience at best.
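To put some rough numbers on the storage question, here is a minimal sketch in Python. The four-minute stereo track and the 44.1 kHz sample rate are my own assumptions for illustration, and the dynamic range figure is the textbook value for an ideal quantiser with a full-scale sine wave, not a measurement of any real converter.

```python
import math

def pcm_bytes(seconds, sample_rate_hz, bit_depth, channels=2):
    """Size in bytes of uncompressed PCM audio."""
    return seconds * sample_rate_hz * bit_depth * channels // 8

def ideal_dynamic_range_db(bit_depth):
    """Theoretical SNR of an ideal N-bit quantiser for a full-scale sine,
    roughly 6.02 * N + 1.76 dB."""
    return 20 * math.log10(2 ** bit_depth) + 10 * math.log10(1.5)

track_seconds = 240  # assumed four-minute track
size_16 = pcm_bytes(track_seconds, 44_100, 16)
size_24 = pcm_bytes(track_seconds, 44_100, 24)

print(f"16-bit: {size_16 / 1e6:.1f} MB, {ideal_dynamic_range_db(16):.1f} dB")
print(f"24-bit: {size_24 / 1e6:.1f} MB, {ideal_dynamic_range_db(24):.1f} dB")
```

So at the same sample rate the 24-bit file is one and a half times the size of the 16-bit one before any compression is applied, in exchange for roughly 48 dB more theoretical dynamic range.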

Of course, as mentioned earlier, high resolution media needs more space, which calls for some method of compression to reduce the storage requirement. In most cases, however, compressing data means losing some of the information, which in turn reduces the quality of the media, so a method of reducing the data size is needed which provides lossless encoding. Enter MLP (Meridian Lossless Packing), which was invented by several researchers, most notably M. Gerzon, P. Craven and J. Stuart.

The official paper on the technology can be found here. In short, this method of encoding allows efficient allocation of bits to the audio. For example, in music with a large dynamic range, like classical music which goes from very quiet to loud passages, fewer bits can be allocated to the quiet passages and more bits to the louder passages which use more bandwidth, thus saving space by being more efficient.

Another high resolution format is SACD, which uses a different method to maintain high resolution audio: a technology called DSD (Direct Stream Digital), invented by Philips and Sony. The technology uses a very high sample rate of 2.8 MHz compared with the normal 44.1 kHz, and instead of PCM (Pulse Code Modulation) it uses PDM (Pulse Density Modulation); a comparison with PCM can be seen here. Korg have recently supported high resolution DSD handheld recording with their MR-2 device.
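To give a feel for the difference between the two, here is a small sketch in Python. The first part compares the raw per-channel data rates, taking the DSD rate as 64 times 44.1 kHz (which is where the 2.8 MHz figure comes from). The second part is a toy first-order delta-sigma modulator, which is my own simplified stand-in for illustration and not the modulator used in real SACD encoders: it turns a signal into a one-bit stream in which the density of ones follows the signal level.

```python
CD_RATE_HZ = 44_100
DSD_RATE_HZ = 64 * CD_RATE_HZ          # 2,822,400 Hz, i.e. about 2.8 MHz

cd_bits_per_second = CD_RATE_HZ * 16   # 16-bit PCM, one channel
dsd_bits_per_second = DSD_RATE_HZ * 1  # 1-bit PDM, one channel

def pdm_encode(samples):
    """Toy first-order delta-sigma modulator.

    Takes samples in the range -1..1 and returns a list of 0/1 bits
    whose density of ones tracks the input level.
    """
    bits = []
    integrator = 0.0
    feedback = 0.0
    for x in samples:
        integrator += x - feedback      # accumulate the error so far
        bit = 1 if integrator >= 0 else 0
        feedback = 1.0 if bit else -1.0  # the level the bit represents
        bits.append(bit)
    return bits

# A constant level of 0.5 sits three quarters of the way from -1 to +1,
# so about three quarters of the output bits should be ones.
stream = pdm_encode([0.5] * 1000)
density = sum(stream) / len(stream)
print(dsd_bits_per_second / cd_bits_per_second, density)
```

In other words each PDM bit carries very little on its own; the information is in how densely the ones are packed, which is why the sample rate has to be so high.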

MLP and SACD have both been around for a while and haven’t really seen high demand, being mostly enjoyed by audiophiles rather than the everyday listener. However, with the advent of HD and 3D films it would seem that these technologies might see more use. This is certainly the case with Dolby, who utilise the MLP technology in their Dolby TrueHD content.

These two methods are employed for disc playback and storage on DVDs, though; for download, Apple has developed its own lossless codec, ALAC (Apple Lossless Audio Codec). Apple could have taken the easy route and used another lossless audio codec such as FLAC (Free Lossless Audio Codec); however, this is open source software and does not support DRM (Digital Rights Management) to stop file sharing, which Apple uses in its content. Apple’s codec uses Huffman coding, which uses a lower number of bits for frequently occurring data, while FLAC uses run-length coding, and both codecs use Golomb-Rice coding, in order to store information more efficiently without losing data.
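As a rough illustration of the Golomb-Rice idea mentioned above, here is a minimal sketch in Python. It is not the actual FLAC or ALAC bitstream; the function names, the zigzag mapping of signed values and the choice of parameter k are my own assumptions. The point is simply that small numbers, which are what a good predictor leaves behind, get short codes, and that decoding gives back exactly what went in.

```python
def zigzag(n):
    """Map signed to unsigned: 0, -1, 1, -2, 2 ... -> 0, 1, 2, 3, 4 ..."""
    return 2 * n if n >= 0 else -2 * n - 1

def unzigzag(u):
    """Inverse of zigzag."""
    return u >> 1 if u % 2 == 0 else -((u + 1) >> 1)

def rice_encode(values, k):
    """Encode integers as a string of '0'/'1' using Rice parameter k."""
    out = []
    for v in values:
        u = zigzag(v)
        quotient, remainder = u >> k, u & ((1 << k) - 1)
        out.append("1" * quotient + "0")        # quotient in unary
        if k:
            out.append(format(remainder, f"0{k}b"))  # remainder in k bits
    return "".join(out)

def rice_decode(bitstring, k, count):
    """Decode `count` integers from a Rice-coded bit string."""
    values, pos = [], 0
    for _ in range(count):
        quotient = 0
        while bitstring[pos] == "1":
            quotient += 1
            pos += 1
        pos += 1                                 # skip the terminating '0'
        remainder = int(bitstring[pos:pos + k], 2) if k else 0
        pos += k
        values.append(unzigzag((quotient << k) | remainder))
    return values

# Hypothetical small prediction residuals, as might be left by a predictor.
residuals = [0, -1, 2, 1, -3, 0, 4, -2]
encoded = rice_encode(residuals, k=1)
decoded = rice_decode(encoded, k=1, count=len(residuals))
print(len(encoded), "bits instead of", 16 * len(residuals))
```

The saving only appears when the values being coded are small, which is why real lossless codecs first predict each sample from the previous ones and code only the error.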

However, I think these high resolution formats are well suited to multichannel applications, where better separation of sources allows the listener to appreciate them more, and also to surround recording, where high bit depths allow a better approximation of the acoustical environment and capture of the finer detail.

Apple are certainly not the first to offer high resolution material, as high end audio manufacturers such as Linn and Bowers and Wilkins already offer selected artists in these formats. It remains to be seen whether such high resolution formats take off; as pointed out earlier, they have mostly been used by audiophiles, though it would be good if more people experienced these formats and heard what they are missing.


Subjective and objective measures in reproduced sound – Overview

This post follows a study, started in July 2010, which used both subjective and objective measurement techniques to investigate whether certain angles of the rear channel microphones within a 5.1 recording rig caused the rear channels to become less correlated. Without going into too much detail, others have previously shown that a lower degree of correlation leads to a better sense of feeling surrounded, or a better degree of envelopment or spatial impression (the previous blog explains some of this). The objective measure IACC (Interaural Cross Correlation) is said to relate well to the human perception of the aforementioned attributes, with a low correlation being the most desirable.

In most surround listening situations it is essential to get the impression of being situated in the environment which is being portrayed on screen, and certainly so when listening to classical music, as orchestras and the like use the acoustics of the space to enhance their sound. Not only this, but if more than one person is taking part in the listening experience it is necessary to enlarge the sweet spot, and never more so than now, with the development of 3D televisions and more immersive media experiences, including interactive gaming.

It has already been explained why there is a need to investigate how we can manipulate soundfields in order to enhance the feeling of being surrounded. However, the aforementioned attributes are subjective and not clear cut, so listening tests were carried out. The test recordings were made with a 5.1 microphone setup (the INA 5), which was used to record two different solo instruments using 9 different rear mic azimuth angles from 90 to 170 degrees in steps of 10 degrees, referenced to the centre channel mic at 0 degrees. The recording venue was a large rectangular room with painted brick interior walls, which have a low absorption coefficient, and a carpet tile floor.

The recordings were then reproduced over a 5.1 system in a semi-anechoic chamber for a subjective test in which subjects rated the degree of envelopment. These listening tests found no subjective differences between the samples. However, objective measurements made of the listening test material, played back over the same 5.1 rig in the semi-anechoic chamber, showed that the samples did differ in their degree of correlation, and that these differences were larger than the just noticeable difference.

The objective measurements were carried out using a head and torso simulator. This is a model of a human head and torso with microphones placed where the ears are; it can be placed at the listening position and used to record approximately what a listener would hear in the same position. From these recordings, termed “binaural recordings”, a measurement called the IACC (Interaural Cross Correlation) can be taken. Going back to what I was saying earlier, this gives an indication of how similar the soundfields at the two ears are, which has been shown to indicate the degree of envelopment, or how surrounded one would feel. Some of the mic angles showed variation in the amount of correlation, providing low correlation values. The head and torso simulator was also used in the real room whilst making the 5.1 recordings, allowing a comparison of the real versus the reproduced versions.
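For anyone wondering what the IACC calculation looks like in practice, here is a minimal sketch in Python. It follows the commonly used definition (the maximum of the normalised cross-correlation between the left and right ear signals for lags within plus or minus 1 ms) and is not the exact analysis chain used in the study; the function name, the test signals and the 48 kHz sample rate are my own assumptions, and no band filtering or time windowing is applied.

```python
import math
import random

def iacc(left, right, sample_rate_hz, max_lag_ms=1.0):
    """Maximum absolute normalised cross-correlation between two ear
    signals, searched over lags within +/- max_lag_ms."""
    n = min(len(left), len(right))
    left, right = left[:n], right[:n]
    norm = math.sqrt(sum(x * x for x in left) * sum(y * y for y in right))
    if norm == 0:
        return 0.0
    max_lag = int(round(sample_rate_hz * max_lag_ms / 1000))
    best = 0.0
    for lag in range(-max_lag, max_lag + 1):
        if lag >= 0:
            acc = sum(left[i] * right[i + lag] for i in range(n - lag))
        else:
            acc = sum(left[i - lag] * right[i] for i in range(n + lag))
        best = max(best, abs(acc) / norm)
    return best

# Stand-ins for binaural recordings: two independent noise signals.
rng = random.Random(0)
noise_a = [rng.gauss(0, 1) for _ in range(2000)]
noise_b = [rng.gauss(0, 1) for _ in range(2000)]

iacc_same = iacc(noise_a, noise_a, 48_000)  # identical at both ears
iacc_diff = iacc(noise_a, noise_b, 48_000)  # unrelated at the two ears
print(round(iacc_same, 3), round(iacc_diff, 3))
```

Identical signals at both ears give a value of 1, while unrelated signals give a value close to 0, which is the low-correlation end of the scale that is said to go with a stronger sense of envelopment.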

However, the variation shown by the objective measurement was not picked up by the listeners during the subjective test, as the statistical analysis showed, although it could already be seen when collecting the results that this was going to be the case. It was also interesting to notice that the recordings reproduced over the 5.1 rig had a higher correlation than the real room. This was expected, and demonstrates that something was lost in the recording/reproduction chain, or that the 5.1 rig was incapable of reproducing an identical soundfield, or… the list could go on. It has also been said by others that the binaural measurement is flawed and does not take into account other factors which could play a part. For example, higher volume levels have been shown to correlate well with envelopment, and the IACC does not account for this. Also, there is no variation in IACC measurements towards lower frequencies, yet low frequencies are said to be one of the main contributors to envelopment.

Having both the subjective and objective measures gave a chance to look for a correlation between the two. In this study no correlation, or only a very weak one, was found between them; however, to totally dismiss the measurement technique would be foolhardy, as further investigation is needed. It was found that as the rear microphones moved to different angles, certain angles did provide lower IACC measurements, caused by the microphones pointing in different directions in the recording venue, and that placing the head and torso simulator in different positions in the room also gave different results, as would be expected. A pilot test using only the two rear speakers of the 5.1 setup also found that the IACC was lower for some angles than when the frontal speakers were added to complete the 5.1 rig, suggesting that diffraction round the back of the head causes a lower IACC than for sound from the front.

It would be interesting to have time to investigate the IACC further, and possibly to use another measurement method alongside it, as it would seem from this investigation that some aspects don’t quite add up. It would also be interesting to carry out IACC measurements of a 5.1 system in a normal room rather than a semi-anechoic one, to take into account the reflections from the room and how these add to the reproduction.

This is an overview of the work carried out without going into too much detail, and is intended to allow the author to think out loud, and also to allow interested parties to read.


Subjective and Objective Measurements in Reproduced Sound

Stereo sound was created to improve on mono systems and enable a more realistic listening experience; however, the sound mostly comes from the front and so provides only a limited listening experience. Surround sound systems came about to provide further enhancement using speakers placed to the rear of the listener. These rear speakers are used for different purposes: in film sound, for example, they can provide cues that sounds are coming from behind, or they can provide the added ambience that you would find in real environments, adding spatial properties to the listening experience so that the listener feels they are actually within that space. In music production the second use is usually the case.

In creating environments for film or music listening, surround sound systems rely on reverberation from the rear speakers to mimic the cues heard in real life, where direct sound reaches the listener first, followed by indirect reflections from all the surfaces surrounding the listener. However, too much reverberation from the rear can be distracting, and depending on its time of arrival at the listening position it can be perceived as an echo. Many different names have been given to the spatial properties of sound. Spatial Impression is an overall term which covers two other attributes. Apparent Source Width is associated with sound arriving between 0 and 80 ms and causes a broadening of the frontal sound stage. Envelopment is associated with sound arriving from 80 ms onwards and provides a sense of being surrounded by the sound; Soulodre et al explain that it is an important part of good concert hall acoustics, and it is therefore desirable in multichannel sound systems.
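The 80 ms boundary described above can be sketched in a few lines of Python. This is only my own illustration of splitting an impulse response into its early and late parts and comparing their energy; it is not a measure taken from Soulodre et al, and the toy impulse response and its 1 kHz sample rate are made up to keep the numbers easy to follow.

```python
def early_late_energy(impulse_response, sample_rate_hz, split_ms=80.0):
    """Return (early, late) energy either side of the split time."""
    split = int(round(sample_rate_hz * split_ms / 1000))
    early = sum(x * x for x in impulse_response[:split])
    late = sum(x * x for x in impulse_response[split:])
    return early, late

# Toy impulse response sampled at 1 kHz (one sample per millisecond):
# direct sound at 0 ms and a single reflection at 100 ms.
sample_rate_hz = 1000
impulse_response = [0.0] * 200
impulse_response[0] = 1.0     # direct sound
impulse_response[100] = 0.5   # late reflection, after the 80 ms boundary

early, late = early_late_energy(impulse_response, sample_rate_hz)
print(early, late)
```

Here the direct sound falls into the early part, associated with apparent source width, and the reflection at 100 ms falls into the late part, associated with envelopment.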

It is apparent to me that, whether in surround sound systems or at music concerts, envelopment usually depends on rear sound which is not consciously heard but whose absence is really noticeable when it is taken out of the listening experience, for example when surround mixing. It is also interesting how we can focus on the frontal sound but subconsciously still perceive the surround effect or envelopment.

Much research into the perception of spatial properties has been carried out, and much of it has relied on subjective judgements from listeners. However, this is time consuming, and how reliable are the results? Some work has been carried out looking at objective measures of the spatial properties of sound, and over the next short period I will be looking at these in more detail, providing an impromptu review of some of the research done in this area.


Convergence

In the days before computers, families had to come and sit round the TV together at night in order to watch a film. This brought about problems, because one person might want to watch something else and a squabble would erupt. Now, with computers, there is the facility to watch TV online, for example on BBC iPlayer or 4oD from Channel 4, as well as on the conventional TV. This has brought another access point into the family home, as most homes now have at least a TV and a computer. This is a demonstration of convergence, where the TV and computer have been combined; however, it is just one demonstration of convergence.
It is also interesting to note how, perhaps 10 years ago, people had to buy several different items, for example a mobile phone, camera, video camera, Walkman and a computer for the internet. This is no longer the case, as all of these functions can be found within a mobile phone, at a quality that would satisfy the average user. This is another example of convergence, one which is bringing us closer to a one-gadget-does-everything scenario; however, could this be at the cost of quality? With several features combined in one device, reliability also comes into question, because if your device fails you lose all of the functions previously described. When each device did the one job it was purchased for, if, say, your camera broke you would still have your standalone video camera or DVD player.
Convergence also happens at a human level online. An example of this can be found in the “Soundscapes project” carried out at Salford University, which used social networking to allow participation from all over the world. What we are interested in for this blog is the way people got involved to help enhance the project, drawing on the collective power of other individuals (“convergence”), which also demonstrates how word can spread. Convergence has also provided more methods of reaching people, who do not even need to be sat at their computer; instead they can be reached via their mobile phone, as more and more people connect via their mobile.
Convergence is certainly not all bad: for example, the convergence of the internet with the TV has allowed us to view whatever programmes we want online, whenever we want, as high quality HD streams.


Effects Of MP3

A recent subjective evaluation was carried out by Amandine Pras and others, and was quite similar to Ruzanski’s. They used pop, metal, rock, contemporary, orchestra and opera excerpts, and five different bit rates: 96, 128, 192, 256 and 320 kbps. They presented the participants with two versions at each bit rate and genre and asked which sample they preferred. After the test the participants were given a questionnaire with seven different sound criteria, with descriptors ranging from high frequency artefacts and reverberation artefacts to dynamic range etc., to find out what each participant had used to make their judgement; they were also asked if they were familiar with the sound sample. They found that listeners significantly preferred CD quality over bit rates of 96-192 kbps, and that the differences between CD quality and bit rates of 256 and 320 kbps were not significant. They also found that there was a highly significant effect of expertise on preference.

The questionnaire they used to determine what the participants had used to discriminate between versions was interesting. They counted how many times each descriptor had been used: high frequency artefacts were used the most, with general distortion and spatial artefacts joint second.

They conclude that MP3 does introduce audible artefacts, and that listeners’ ability to detect these artefacts varies with listener expertise and genre, which was also found by Ruzanski. They also say that trained listeners can discriminate between CD quality and bit rates of 96-128 kbps, but that at 256-320 kbps they could not discriminate between CD quality and the encoded version.

Finally the authors state that they would like to quantify the difference between CD quality and MP3 with respect to listening conditions.

I find these papers very interesting and also relevant to my study at the moment, so I thought I would share them with you.


Interesting Read

I have been reading a paper by Ruzanski, “On the Effects of MP3 Encoding”. He tests several bit rates from 32 to 192 kbps and 5 different music genres, from hip hop to classical. 7 participants aged between 19 and 30 take part. Each subject is played two sound samples, one being the unencoded WAV version and the other the compressed version; if the participant picks the compressed version over the original, an error score is given. The study is very interesting because Ruzanski explains that certain genres of music can be encoded at lower bit rates without audible artefacts: a nocturne, a piece of classical music in his test, received multiple incorrect choices when participants chose between the 64 kbps sample and the WAV file. Similar results have been found in my current tests, in which a sample of a double flute section encoded at bit rates of 48, 128 and 192 kbps is compared with CD quality; subjects have commented that it was hard to identify differences between the samples. With the other sample, a full range electronic piece by Tom Middleton, differences were easier to identify, and several comments noted that the high frequencies in some of the samples were degraded.
Ruzanski also comments that MP3 encoding can improve the sound of some genres of music. He describes how bit rates of 128 and 192 kbps can remove enough high frequencies to present a cleaner piece of music, going on to say that the high frequencies would be perceived as noise and would be removed by masking.
This is quite an interesting statement, as I would never have thought about encoding in this way. However, one of the participants in my test also stated that he wanted to give a higher score for the encoded sample than was provided for on the questionnaire.
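As an aside on the error scores described above, here is a small Python sketch of how one might check whether a set of choices between the original and the encoded version is better than guessing. The exact binomial test is my own choice for illustration and is not necessarily the analysis Ruzanski used; the numbers in the example are hypothetical, with only the figure of 7 participants taken from the paper.

```python
from math import comb

def binomial_p_value(correct, trials, chance=0.5):
    """Probability of getting at least `correct` right out of `trials`
    purely by guessing (one-sided exact binomial test)."""
    return sum(
        comb(trials, k) * chance ** k * (1 - chance) ** (trials - k)
        for k in range(correct, trials + 1)
    )

# Hypothetical outcome: 6 of the 7 participants pick the original WAV.
errors = 1
trials = 7
p_value = binomial_p_value(trials - errors, trials)
print(f"error rate {errors / trials:.2f}, p = {p_value:.4f}")
```

With so few listeners even 6 out of 7 correct gives p = 0.0625, which just misses the usual 0.05 threshold, so a handful of errors at a given bit rate is hard to tell apart from guessing.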