Wednesday, November 30, 2011
A 3D Display You Can Touch
"Are we getting closer to really effective volumetric 3D display technology? A new display, designed in Russia, uses cold fog and a laser projector to create a volumetric 3D image that you can touch. A tracking device (no, it's not a Kinect) is used to detect the user's hand and moves the virtual objects in response. There have been cold fog 3D displays before, but this one has a reasonable resolution and looks near to being a finished product that could be on sale soon. Estimated price? Between $4000 and $30,000."
HDCP falls to FPGA-based man-in-the-middle attack
Canon Image Stabilization demo
Fwd: REDucation
From: 3ALITY TECHNICA <3ALITY_TECHNICA@mail.vresp.com>
Date: Wed, Nov 30, 2011 at 11:24 AM
Subject: REDucation
To: sokol@videotechnology.com
3ALITY TECHNICA 55 East Orange Grove Ave. Burbank, California 91502 US
IMF for a Multi-Platform World
Posted: 30 Nov 2011 02:15 AM PST
Among other things, the looming arrival of the Interoperable Master Format (IMF) shows that the digital media industry can now move "nimbly and quickly" to create technical standards for how it packages, moves, and protects precious content in the form of digital assets, in a world where the technology used to do all that, and the industry itself, is changing at a startling rate. The phrase "nimbly and quickly" comes from Annie Chang, Disney's VP of Post-Production Technology, who also chairs the SMPTE IMF work group (TC-35PM50).
Six Hollywood studios, working through the University of Southern California Entertainment Technology Center (USC ETC), started to develop IMF in 2007, and in early 2011 they created an input document that the SMPTE IMF working group is now using as the basis of the standardization effort. Over time, IMF has developed into an interchangeable, flexible master file format designed to let content creators efficiently deliver a single master file for a project to distributors and broadcasters across the globe.
Chang reports that progress has been quick enough for the work group to expect to finalize a basic version of the IMF standard in the coming months, with draft documents possibly ready by early 2012 that focus on a core framework for IMF and possibly a few of the modular applications that could plug into that framework.
Once that happens, content creators who have prepared for IMF will be in a position to start feeding all their distributors downstream far more effectively than has been the case until now in this world of seemingly unending formats. They will, according to Chang, be able to remove videotape from their production workflow, reduce file storage by eliminating the need for full linear versions of each edit or foreign language version of their content, and yet be able to take advantage of a true file-based workflow, including potentially automated transcoding, and much more.
The rollout will still need to be deliberate as various questions, unanticipated consequences, and potential new uses of IMF begin to unfold. That said, Chang emphasizes that the goal of streamlining and improving the head end of the process, creating a single, high-quality, ultimate master for all versions, is real and viable, and with a little more work and input will happen soon enough.
"Today, we have multiple versions, different resolutions, different languages, different frame rates, different kinds of HD versions, standard-definition versions, different aspect ratios—it's an asset management nightmare," she says, explaining why getting IMF right is so important to the industry.
"Everyone creates master files on tape or DPX frames or ProRes or others, and then they have to create mezzanine files in different formats for each distribution channel. IMF is designed to fix the first problem—the issue of too many file formats to use as masters."
Therefore, IMF stands to be a major boon for content creators who repeatedly and endlessly create different language versions of their material.
"For a ProRes QuickTime, you are talking about a full res version of a movie each time you have it in a new language," Chang says. "So 42 languages would be 42 QuickTime files. IMF is a standardized file solution built on existing standards that will allow people to just add the new language or whatever other changes they need to make to the existing master and send it down the chain more efficiently."
Chang emphasizes the word "flexible" in describing IMF, and the word "interoperable" in the name itself, because at its core IMF allows content distributors to send everybody the material that is common while transmitting the differences only to where they need to go. In that sense, IMF is based on the same architectural concept as the Digital Cinema specification: common material wrapped together, combined with a streamlined way to package and distribute supplemental material. Eventually, each delivery will include an Output Profile List (OPL) that gives those transcoding on the other end a seamless way to configure the file as they format it and push it into their distribution chain.
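The storage arithmetic behind "send the common material once, send only the differences" (and behind Chang's 42-language example) can be sketched in a few lines of Python. This is a toy accounting model under stated assumptions, not the IMF data model: the `TrackFile` class, the `package_sizes` function, and every size in the example are hypothetical. Each language version is treated as a playlist that references shared track files, so shared tracks are counted once.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrackFile:
    """Hypothetical stand-in for one wrapped essence file (picture, audio, ...)."""
    name: str
    size_gb: float


def package_sizes(picture, audio_by_language):
    """Return (full_linear_gb, composition_gb) for a set of language versions.

    full_linear_gb: every version is a complete file carrying its own copy of
    the picture plus that language's audio (one full file per language).
    composition_gb: every version is a playlist that only references track
    files, so the shared picture is stored once and each version adds its audio.
    """
    full_linear = sum(picture.size_gb + audio.size_gb
                      for audio in audio_by_language.values())

    # Each hypothetical "playlist" is just the tuple of track files it references.
    playlists = {lang: (picture, audio)
                 for lang, audio in audio_by_language.items()}
    unique_tracks = {track for tracks in playlists.values() for track in tracks}
    composition = sum(track.size_gb for track in unique_tracks)
    return full_linear, composition


if __name__ == "__main__":
    # Made-up sizes: one HD picture track and 42 language audio tracks.
    picture = TrackFile("feature_picture", 135.0)
    audio = {f"lang{i:02d}": TrackFile(f"audio_lang{i:02d}", 6.2) for i in range(42)}
    full, comp = package_sizes(picture, audio)
    print(f"full linear: {full:.1f} GB, composition: {comp:.1f} GB")
```

With these made-up numbers the 42 full linear versions total about 5,930 GB, while the composition-style package needs about 395 GB, because the picture is counted once.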
Unlike the DCI spec, however, IMF is not built of wholly new parts. Wherever possible, the file package will consist of existing pieces combined together in an MXF-flavored wrapper. This should, Chang hopes, make it easier for businesses across the industry to adapt without huge infrastructure changes in most cases as IMF comes to fruition.
"With IMF, we are using existing standards—a form of MXF (called MXF OP1A/AS-02) to wrap the files, and parts of the Digital Cinema format and other formats that many manufacturers already use," she says. "So, hopefully, there is not much of a learning curve. We hope that most of the big companies involved in the process won't be caught unaware, and will be able to make firmware or software upgrades to their systems in order to support IMF. Hopefully, companies will not have to buy all new equipment in order to use IMF.
"And with the concept of the Output Profile List (OPL), which essentially will be global instructions on output preferences for how to take an IMF file and do something with it relative to your particular distribution platform, companies that are doing transcoding right now will have an opportunity to use that to their advantage to better automate their processes. IMF has all the pieces of an asset management system and can use them all together to create standardized ways to create packages that fit into all sorts of other profiles. It's up to the content owners to take these OPL's and transcode the files. As they do now, they could do it in-house or take it to a facility. But if transcoding facilities get smart and use IMF to its potential, they can take advantage of IMF's capabilities to streamline their processes."
Chang says major technology manufacturers have been extremely supportive of the SMPTE standardization effort. Several, such as Avid, Harmonic, DVS, and Amberfin, have actively participated and given input on the process. That matters because existing editing, transcoding, and playback hardware and software will need to change, and new tools for those tasks will need to be created, as IMF proliferates. After all, as Chang says, "what good is a standard unless people use it?"
She emphasizes that manufacturer support is crucial for IMF, since it is meant to be a business-to-business tool for managing and distributing content, not a standard for how consumers will view content. Therefore, outside of the SMPTE standardization effort, there is a plan to have manufacturers across the globe join in so-called "Plugfests" in 2012 to create IMF content from the draft documents, interchange it with each other, and report on their findings.
As Chang suggests, "it's important to hit IMF from multiple directions since, after all, the first word in the name is 'interoperable.'" Given all these developments, it's reasonable to expect that within the next year IMF will officially become part of the industry's typical workflow chain, with content distributors using it to send material to all sorts of platforms. Some studios and networks are already overhauling their infrastructures and workflow approaches to account for IMF's insertion into the process, and encoding houses and other post-production facilities should, in most cases, have the information and infrastructure to adapt to the IMF world without any fundamental shift. But the post industry will be changed somewhat by IMF, especially if some facilities or studios adopt processes for automating encoding at the front end that change their reliance on the facilities currently doing that kind of work.
However, Chang adds, the broadcast industry will probably face the steepest learning curve in working out how best to adopt IMF. Unlike the studios, which have been discussing their needs and pondering IMF since about 2006, broadcasters were only exposed directly to IMF earlier this year, when SMPTE took over the process. IMF was originally designed and intended as a higher bit-rate master (around 150-500 Mb/s for HD, possibly even lossless, according to Chang), whereas broadcasters normally use lower bit-rate files (more like 15-50 Mb/s).
"However, I feel that broadcasters would like to have that flexibility in versioning," Chang says. "But because they need different codecs and lower bit-rates, there is still discussion in SMPTE about what those codecs should be. Broadcasters are only now starting to evaluate what they need out of IMF, but there is still plenty of time for them to get involved."
Of course, as the explosion of mobile devices and web-enabled broadcasting on all sorts of platforms in a relatively short period of time illustrates, viewing platforms will inevitably change over time, and therefore, distribution of data will have to evolve, as well. As to the issue of whether IMF is relatively future-proofed, or merely the start of a continually evolving conversation, Chang is confident the standard can be in place for a long time because of its core framework—the primary focus to date. That framework contains composition playlists, general image data, audio data (unlimited tracks, up to 16 channels each), sub-titling/captioning data, any dynamic metadata needed, and so on.
Modular applications that could plug into that framework need to be explored further, Chang says, but the potential for IMF to accommodate new, more highly compressed codecs, new or unusual resolutions or frame rates, and all sorts of unique data for particular versions is quite high.
"The core framework we created with documents is something we tried to future proof," she says. "The question is the applications that might plug into that core framework (over time). We are trying to make it as flexible as possible so that if, in the future, even if you have some crazy new image codec that goes up to 16k or uses a new audio scheme, it will still plug into the IMF framework. So image, audio, or sub-titling could be constrained, for example, but as long as the sequence can be described by the composition playlist and the essence can be wrapped in the MFX Generic Container, the core framework should hold up for some time to come."
To connect with the SMPTE IMF effort, you can join the SMPTE 35PM Technology Committee, and then sign up as a member of 35PM50. The IMF Format Forum will have the latest news and discussions about the status of the IMF specification.
More information about IMF:
Overview of the IMF presentation at NAB 2011's Post Pit event.
Amberfin's Bruce Devlin in the SMPTE PDA Now educational webcasts, discussing MXF application designs coming out of the IMF work group.
By Michael Goldman, SMPTE Newswatch
Tuesday, November 29, 2011
eac3to - audio conversion tool
eac3to v3.24, freeware by madshi.net
- can show information about audio, video, VOB/EVO/(M2)TS and MKV files
- can decode and encode various audio formats
- can remove dialog normalization from AC3, E-AC3, DTS and TrueHD tracks
- can extract AC3 stream from Blu-Ray TrueHD/AC3 tracks
- can extract TrueHD stream from Blu-Ray TrueHD/AC3 tracks
- can extract DTS core from DTS-HD tracks
- can remove DTS zero padding and repair outdated DTS-ES headers
- can apply positive or negative audio delays
- can reduce bitdepth of decoded audio data by using TPDF dithering
- can resample decoded audio data (using SSRC or r8brain)
- can apply/reverse PAL speedup on decoded audio data (SSRC/r8brain)
- can demux video / audio tracks of EVO/VOB/(M2)TS and MKV sources
- can list available titles of Blu-Ray and HD DVD discs
- can extract Blu-Ray and HD DVD chapter information and subtitles
- can mux MPEG2, VC-1 and h264 video tracks to Matroska
- can remove pulldown flags from MPEG2, VC-1 and h264 video tracks
eac3to sourcefile[+sourcefile2] [trackno:] [destfile|stdout] [-options]
Examples:
eac3to source.pcm destination.flac
eac3to source.thd destination.flac destination.ac3
eac3to source.evo 1: chapters.txt 2: video.mkv 3: audio.flac 5: subtitle.sup
eac3to feature_1.evo+feature_2.evo movie.mkv
eac3to blurayMovieFolder movie.mkv
Options:
-448  use e.g. "192", "448" or "640" kbps for AC3 encoding
-768  use "768" or "1536" kbps for DTS encoding
-core  extract the DTS core of a DTS-HD track
+/-100ms  apply a positive or negative audio delay
+/-3dB  apply a positive or negative audio gain (volume change)
-0,1,2,3,4,5  remap the channels to the specified order
-edit=0:00:00,0ms  loops or removes audio data at the specified runtime
-silence/-loop  forces usage of silence (or looping) for audio edits
-down6  downmix 7 or 8 channels to 6 channels
-down2  downmix multi channel audio to stereo (Dolby Pro Logic II)
-phaseShift  shift phase (when doing stereo downmixing, see "down2")
-mixlfe  mix LFE in (when doing stereo downmixing, see "down2")
-down16  downconvert decoded audio data to 14..23 bit
-slowdown  convert 25.000 and 24.000 content to 23.976 fps
-speedup  convert 23.976 and 24.000 content to 25.000 fps
-23.976/...  define source fps to be "23.976", "24.000", "25.000", ...
-changeTo24.000  change source fps to "23.976", "24.000", "25.000", ...
-resampleTo48000  resample audio to "44100", "48000" or "96000" Hz
-r8brain  use r8brain resampler instead of SSRC
-quality=0.50  Nero AAC encoding quality (0.00 = lowest; 1.00 = highest)
-8  define PCM file to be "1".."8" channels
-16  define PCM file to be "16" or "24" bit
-little  define PCM file to be "little" or "big" endian
-96000  define PCM file to be 44100, 48000, 96000 or 192000 Hz
-override  forcefully overrides PCM auto detection with manual values
-sonic/nero/...  force the use of a specific decoder (not recommended)
-keepDialnorm  disables dialog normalization removal (not recommended)
-decodeHdcd  decodes HDCD source track (usually 16 -> 20 bit)
-demux  demuxes 1st video track and all audio and subtitle tracks
-stripPulldown  strips the pulldown from MPEG2 video tracks
-keepPulldown  disable removal of pulldown for MPEG2, h264 and VC-1 tracks
-seekToIFrames  make all h264/AVC "I" frames seekable
-check  checks if the source EVO/(M2)TS file is clean.
-test  checks if the external filters are installed & working
-lowPriority  moves processing to background/idle priority
-shutdown  automatically shutdown the PC after processing is done
Supported source formats:
(1) RAW, (L)PCM
(2) WAV (PCM, DTS and AC3), W64, RF64
(3) AC3, E-AC3
(4) DTS, DTS-ES, DTS-96/24, DTS-HD Hi-Res, DTS-HD Master Audio
(5) MP1, MP2, MP3 audio
(6) AAC audio
(7) MLP, TrueHD, TrueHD/AC3
(8) FLAC
(9) EVO/VOB/(M2)TS and MKV
Decoded audio data can be stored as / encoded to:
(1) RAW, (L)PCM
(2) WAV (PCM only), W64, RF64, AGM
(3) WAVs (multiple mono WAV files, PCM only)
(4) AC3
(5) DTS
(6) AAC
(7) FLAC
For best AC3, E-AC3 and AAC decoding you need:
(1) Nero 7 (Nero 8 won't work!)
(2) Nero HD DVD / Blu-Ray plugin
For best DTS decoding you need:
(1) ArcSoft DTS Decoder - version 1.1.0.0 or newer
For DTS encoding you need:
(1) SurCode DVD DTS - version 1.0.21 or newer
For AAC encoding you need:
(1) Nero AAC Encoder
For video muxing you need:
(1) Haali Matroska Muxer
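The feature list says eac3to "can reduce bitdepth of decoded audio data by using TPDF dithering". As an illustration of what TPDF (triangular probability density function) dither does, here is a minimal NumPy sketch. It is not eac3to's code: the function name `reduce_bit_depth_tpdf`, the float input range of [-1.0, 1.0), and the use of NumPy's default random generator are all assumptions made for the example.

```python
import numpy as np


def reduce_bit_depth_tpdf(samples, target_bits=16, seed=None):
    """Quantize float audio in [-1.0, 1.0) to signed integers with TPDF dither.

    Hypothetical helper for illustration; not taken from eac3to.
    """
    rng = np.random.default_rng(seed)
    samples = np.asarray(samples, dtype=np.float64)
    full_scale = 2 ** (target_bits - 1)      # 32768 for 16-bit output
    scaled = samples * full_scale            # now 1.0 == one LSB of the output
    # TPDF dither: the sum of two independent uniform variables, each spanning
    # +/-0.5 LSB, has a triangular distribution spanning +/-1 LSB.
    dither = (rng.uniform(-0.5, 0.5, scaled.shape)
              + rng.uniform(-0.5, 0.5, scaled.shape))
    quantized = np.round(scaled + dither)
    # Clip to the representable range of a signed target_bits integer.
    quantized = np.clip(quantized, -full_scale, full_scale - 1)
    return quantized.astype(np.int32)
```

Compared with plain rounding, the triangular dither makes the mean and variance of the quantization error independent of the signal, so low-level material turns into benign noise instead of signal-dependent distortion. A call such as `reduce_bit_depth_tpdf(x, 16, seed=0)` turns float samples `x` into 16-bit values held in an `int32` array.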
wiki:
http://en.wikibooks.org/wiki/Eac3to
GUIs:
Eac3to and More GUI:
http://forum.doom9.org/showthread.php?t=135095
HD DVD/Blu-Ray Stream Extractor:
http://forum.doom9.org/showthread.php?t=141829
Clown BD:
http://forum.slysoft.com/showthread.php?t=25818
Fwd: News from Condor Storage Inc.
From: "Jeanne Wilson" <jeanne@condorstorage.com>
Date: Nov 28, 2011 5:13 PM
Subject: News from Condor Storage Inc.
Sunday, November 27, 2011
Saturday, November 26, 2011
Techniques for Automatic Monitoring of Stereoscopic 3D Video
3D CineCast
Posted: 25 Nov 2011 08:07 AM PST
Running a multi-channel TV broadcast installation brings new headaches when 3D is involved. Live monitoring of dozens of TV channels is difficult enough. Over the years several manufacturers have developed automated monitoring solutions covering a whole range of tasks of increasing complexity.
With the advent of 3D, there is literally a new dimension of monitoring tasks, because we have to check not only the integrity of individual video signals but also the correct relationship between the left and right video signals in a stereo pair.
In addition, manual monitoring of 3D is more difficult than 2D because the operator would need either to wear glasses or accept the limitations of autostereoscopic displays. For these reasons, there is a burgeoning interest and market in automatic monitoring of 3D television.
Overview of 3D Monitoring
Analysis and Correction
One of the purposes of automatic analysis is to provide information to enable correction of any problems encountered. The techniques for correction are beyond the scope of this paper, though it is important to point out that correction of an upstream problem may be necessary before monitoring of further aspects can be carried out.
Metadata
The correct use of metadata, for example to identify left and right channels or to signal how they are packed into a single container, can in theory remove the need for some analysis. However, metadata for 3D is not yet fully standardized, and even when it is there will still be cases of incorrect usage, so there will always be a place for techniques that avoid the requirement for metadata. Of course, the results of measurements performed at any point in the processing chain may in their turn be passed on downstream as metadata.
Format Detection
The first task when faced with a single video signal carrying a stereoscopic pair is to identify the format by which the two channels are packed into one container. For some formats this is an easy task, but there are some problems when the granularity of the packing is finer.
Matching Left and Right Images
Having unpacked the signal into left and right channels, the next task is to check whether the two channels are correctly matched, particularly as regards timing, grey scale and colour balance. Grey scale and colour balance can be aligned using histogramming techniques. Relative timing can be measured using fingerprinting techniques similar to those used for lip-sync measurement. It is important to note that a timing mismatch will not only be detrimental to the 3D viewing experience but will also have an adverse effect on downstream analysis, particularly of 3D depth. Relative timing is thus a good example of the need to correct a problem before further analysis can reliably be performed.
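One common histogramming technique for aligning grey scale is histogram specification (CDF matching): remap the levels of one channel so that its cumulative distribution matches the other's. The sketch below is a minimal NumPy version for a single channel; the function name `match_histogram` is hypothetical, colour balance would be handled by applying it per colour channel, and it assumes whole-picture histograms of the left and right images are comparable despite disparity and occlusions near the picture edges.

```python
import numpy as np


def match_histogram(source, reference):
    """Remap the grey levels of `source` so its histogram matches `reference`.

    Classic CDF (histogram specification) matching on a single channel.
    Hypothetical helper for illustration.
    """
    src = np.asarray(source)
    ref = np.asarray(reference)
    # Distinct levels, which level each source pixel has, and level counts.
    _src_values, src_idx, src_counts = np.unique(
        src.ravel(), return_inverse=True, return_counts=True)
    ref_values, ref_counts = np.unique(ref.ravel(), return_counts=True)
    # Cumulative distributions of the two images.
    src_cdf = np.cumsum(src_counts) / src.size
    ref_cdf = np.cumsum(ref_counts) / ref.size
    # For each source level, take the reference level at the same CDF position.
    matched_levels = np.interp(src_cdf, ref_cdf, ref_values)
    return matched_levels[src_idx].reshape(src.shape)
```

In use, something like `right_matched = match_histogram(right, left)` would pull the right channel's grey scale towards the left channel's before any depth analysis.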
Depth or Disparity Analysis
A more algorithmically challenging analysis task is to measure the 3D depth across the picture, which is directly related, via screen size and resolution, to disparity, the relative displacement between the left and right representations of objects in the scene. Horizontal disparity outside a certain range, as well as undue vertical disparity, is known to cause significant eye strain for some viewers. Disparity analysis is also important for checking the overall relative geometric alignment of the two images.
Higher Level Analysis
Finally, we shall look at two examples of detection tasks which require a higher level of analysis. The first is deceptively simple to state: can we tell whether the left and right channels have been inadvertently swapped? The second is: can we tell whether the 3D pair has come from a simple 2D to 3D converter? Ultimately, 3D analysis can extend to detecting or measuring any process that has been carried out on 3D signals, either with a view to improving, modifying or reversing the process, or simply in order to report or record what has been done.
Format Detection
There are many ways in which left and right signals may be packed into a single video channel. These include left-right or top-bottom juxtaposition (with or without reflection of one of the channels), line interleaved, column interleaved, checkerboard and frame interleaved formats.
For the purposes of automatic detection, these formats may be classified into two groups. Left/right and top/bottom formats are "loose packed" because the two pictures are physically quite separate. The remaining formats are "close packed" because corresponding left and right channel pixels are close together in space or time.
Loose Packing
Loose packed formats are quite easy to detect. One way is to carry out a trial unpacking with an assumed format and then detect whether the two resulting images are sufficiently similar to be a stereoscopic pair. And if the two images turn out to be identical, we may conclude that a 2D image is being transported in a 3D container; this is a simple case of disparity estimation in which we look for zero disparity across the image.
Figure 1 shows the left-right differences for a small area of a picture when each of four possible trial formats is used to unpack each of four possible actual formats. Where the correct format has been used, the left-right difference contains only edge information arising from disparity.
We can summarise the detection of loose-packed formats by saying that we exploit the relative similarity of the left and right images when compared with unrelated, distant parts of the picture.
Figure 1 - Detection of loose packed formats
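The trial-unpacking idea for loose packed formats can be sketched in a few lines of NumPy. This is an illustration under stated assumptions, not Snell's implementation: the function names and format labels are hypothetical, the frame is a single luminance plane with even dimensions, the similarity score is simply the mean absolute left-right difference, and mirrored variants (one half reflected) are not handled, though they could be added by flipping one half before comparing.

```python
import numpy as np


def unpack_loose(frame, fmt):
    """Split a packed frame into (left, right) for a loose packed format."""
    h, w = frame.shape
    if fmt == "side_by_side":
        return frame[:, : w // 2], frame[:, w // 2 :]
    if fmt == "top_bottom":
        return frame[: h // 2, :], frame[h // 2 :, :]
    raise ValueError(f"unknown format: {fmt}")


def detect_loose_packing(frame, identical_tol=1e-6):
    """Trial-unpack with each loose format and score left/right similarity.

    Returns (best_format, scores, looks_2d). A lower score (mean absolute
    left-right difference) means the two halves look more like a stereo pair;
    looks_2d is True when the best trial gives (near) identical halves, i.e.
    zero disparity everywhere: 2D carried in a 3D container.
    """
    frame = np.asarray(frame, dtype=np.float64)
    if frame.shape[0] % 2 or frame.shape[1] % 2:
        raise ValueError("expects even frame dimensions")
    scores = {}
    for fmt in ("side_by_side", "top_bottom"):
        left, right = unpack_loose(frame, fmt)
        scores[fmt] = float(np.mean(np.abs(left - right)))
    best = min(scores, key=scores.get)
    return best, scores, scores[best] < identical_tol
```

A real detector would also need a threshold on the best score to decide that neither format fits, i.e. that the frame is ordinary 2D rather than a packed pair.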
Close Packing
Close packed formats present more of a problem because the packed image looks increasingly like a single 2D image as the amount of 3D content in the scene decreases. So simply carrying out trial unpackings will often give a positive result, even if the wrong format is being tried. If there is significant 3D content, the detection becomes easier because a picture wrongly unpacked will look increasingly less like a pair of plausible images.
The left half of Figure 2 shows a small part of the left image for some different combinations of packing and unpacking formats, and the right half shows the combined energy of horizontal and vertical high pass filtered versions of those outputs. The energy is clearly significantly lower when the correct unpacking format has been used.
We can summarise the detection of these close-packed formats by saying that we exploit the relative difference of the left and right images when compared with adjacent pixels or lines.
Figure 2 - Detection of close packed formats
Temporal interleaving presents further difficulties because there is a higher chance that motion can be confused with left-right disparity. This could be overcome using motion compensated high-pass filtering, though care would have to be taken to use information from a single channel (albeit subsampled) for the purposes of motion estimation.
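For the spatial close packed formats, the high-pass energy test can be sketched as follows. Assumptions: the format labels and function names are hypothetical, only the left-channel samples are examined, first differences along columns and rows stand in for the horizontal and vertical high-pass filters, frame dimensions are even, and temporal (frame-interleaved) packing is not attempted. As noted above, the test relies on the scene having significant 3D content.

```python
import numpy as np


def unpack_close(frame, fmt):
    """Return the left-channel samples for a trial close packed format."""
    if fmt == "line":
        return frame[0::2, :]
    if fmt == "column":
        return frame[:, 0::2]
    if fmt == "checkerboard":
        # Left samples sit where (row + column) is even.
        return np.stack([frame[y, (y % 2)::2] for y in range(frame.shape[0])])
    raise ValueError(f"unknown format: {fmt}")


def high_pass_energy(img):
    """Mean squared vertical plus horizontal first differences."""
    return float(np.mean(np.diff(img, axis=0) ** 2)
                 + np.mean(np.diff(img, axis=1) ** 2))


def detect_close_packing(frame):
    """Pick the trial format whose unpacked left image is smoothest."""
    frame = np.asarray(frame, dtype=np.float64)
    if frame.shape[0] % 2 or frame.shape[1] % 2:
        raise ValueError("expects even frame dimensions")
    energies = {fmt: high_pass_energy(unpack_close(frame, fmt))
                for fmt in ("line", "column", "checkerboard")}
    return min(energies, key=energies.get), energies
```

When the wrong format is tried, neighbouring samples of the "left" output actually come from both views, so disparity shows up as extra high-frequency energy; the correct format leaves a smooth single-view image.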
Depth or Disparity Analysis
One of the most important monitoring or analysis tasks in stereoscopic 3D is to measure the perceived depth of the various objects in the scene. Perceived depth is a function of disparity (the horizontal distance between left and right representations of the object, measured in pixels), display size and resolution, and viewing distance. In the context of signal monitoring, we can only measure disparity and then relate it to perceived depth for different display configurations.
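The relationship between disparity and perceived depth follows from similar triangles. With screen parallax P = d·W/N (pixel disparity d, screen width W, horizontal resolution N), eye separation e and viewing distance V, the fused point appears at distance Z = V·e / (e − P) from the viewer. The sketch below assumes a nominal 65 mm eye separation and the sign convention that positive (uncrossed) disparity lies behind the screen; the function names are hypothetical.

```python
import math


def screen_parallax_m(disparity_px, screen_width_m, h_res_px):
    """Convert pixel disparity to physical parallax on the screen, in metres."""
    return disparity_px * screen_width_m / h_res_px


def perceived_distance_m(disparity_px, screen_width_m, h_res_px,
                         viewing_distance_m, eye_sep_m=0.065):
    """Viewer-to-object distance by similar triangles: Z = V * e / (e - P).

    Positive disparity places the object behind the screen, negative in front.
    Returns math.inf once the parallax reaches the eye separation (eyes
    parallel) or exceeds it (the eyes would have to diverge).
    """
    p = screen_parallax_m(disparity_px, screen_width_m, h_res_px)
    if p >= eye_sep_m:
        return math.inf
    return viewing_distance_m * eye_sep_m / (eye_sep_m - p)
```

For example, 20 pixels of disparity at 1920 pixels across a 1 m wide screen viewed from 3 m puts the object at about 3.57 m, just behind the screen; the same 20 pixels on a 10 m cinema screen is about 104 mm of parallax, more than the eye separation, so the function returns infinity. This is why the same signal can be comfortable on a television and problematic in a cinema.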
Disparity measurement is useful for many monitoring purposes, the most important being to provide a warning if the viewer is likely to suffer eye strain. Other reasons for measuring disparity are to verify that the sequence really is 3D rather than just being 2D in a 3D container, to detect and correct for global geometric distortions between the two channels, and to assist in the insertion of captions or subtitles at suitable depths.
Eye Strain
Eye strain can occur in 3D viewing when disparity exceeds certain limits, particularly if the eyes are being encouraged to diverge, an unnatural action. The limits depend on display size, but it is also useful to measure how often and for how long extreme disparity values are observed, and possibly to identify where in the scene the extremes are occurring.
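Two of these measurements are easy to sketch: the display-dependent divergence limit, and how often and for how long it is exceeded. Assumptions: a nominal 65 mm eye separation, an input of one worst-case (largest positive) disparity value per frame, and hypothetical function names. Practical comfort limits are usually set tighter than this hard geometric limit.

```python
def divergence_limit_px(screen_width_m, h_res_px, eye_sep_m=0.065):
    """Largest positive (uncrossed) disparity, in pixels, before the eyes diverge."""
    return eye_sep_m * h_res_px / screen_width_m


def exceedance_stats(disparity_per_frame, limit_px):
    """Fraction of frames over the limit and the longest consecutive run of them."""
    over = [d > limit_px for d in disparity_per_frame]
    longest = run = 0
    for flag in over:
        run = run + 1 if flag else 0
        longest = max(longest, run)
    fraction = sum(over) / len(over) if over else 0.0
    return fraction, longest
```

For a 1920-pixel picture the limit works out at about 125 pixels on a 1 m wide screen but only about 12.5 pixels on a 10 m screen, which is the sense in which the limits depend on display size.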
Disparity Measurement
One class of disparity measurement methods involves performing a local correlation between the left and right images to generate a sparse disparity map. This approach is ideal for looking at the behaviour of different objects in the scene and for determining to what extent limits have been exceeded.
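A minimal form of local correlation is block matching: for blocks on a sparse grid in the left image, search horizontally in the right image for the offset with the lowest sum of absolute differences (SAD). This sketch assumes rectified images (purely horizontal search), uses SAD in place of a true correlation measure, and adopts the convention x_right = x_left + d so that positive disparity lies behind the screen; `sparse_disparity` and its parameter defaults are hypothetical.

```python
import numpy as np


def sparse_disparity(left, right, block=8, step=16, max_disp=32):
    """Block-matching disparity at a sparse grid of points.

    For each block in `left`, find the horizontal offset d (pixels) in `right`
    with the lowest sum of absolute differences, using x_right = x_left + d.
    Returns {(row, col): d}. Hypothetical helper for illustration.
    """
    left = np.asarray(left, dtype=np.float64)
    right = np.asarray(right, dtype=np.float64)
    h, w = left.shape
    result = {}
    for y in range(0, h - block + 1, step):
        # Keep the whole search window inside the image.
        for x in range(max_disp, w - block - max_disp + 1, step):
            ref = left[y:y + block, x:x + block]
            best_d, best_cost = 0, np.inf
            for d in range(-max_disp, max_disp + 1):
                cand = right[y:y + block, x + d:x + d + block]
                cost = np.abs(ref - cand).sum()
                if cost < best_cost:
                    best_d, best_cost = d, cost
            result[(y, x)] = best_d
    return result
```

The resulting sparse map can then be checked against the limits discussed above, or grouped by region to follow the behaviour of particular objects.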
Other methods seek to generate a dense disparity map, in which every pixel has an associated disparity value, or possibly an occlusion indicator if there is no corresponding point in the other picture. This approach would be necessary if the measurement were being used to drive post-processing, for example to change the effective camera spacing.
Finally, for some applications an approximate, region-based approach to disparity measurement might be sufficient, for example to gather statistics about typical depth ranges used across a programme, or to drive a global spatial transform to correct for camera misalignment.
Vertical Disparity
The impression of depth is conveyed by introducing horizontal disparity. If there is any vertical disparity present, it should be detected and corrected, both because it can be very disturbing to the eyes, and because it can interfere with correct measurement of horizontal disparity.
Of course, horizontal and vertical disparity can be measured jointly using conventional motion estimation methods. However, it would be preferable to exploit the constraints arising from stereoscopy. For example, we would expect vertical disparity to be a combination of two components: one directly related to horizontal disparity, such as might arise from a vertical displacement between the cameras, and one which fits a simple global model, such as might arise from different zoom factors or axis directions between the cameras.
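One way to exploit these constraints is a joint least-squares fit. The sketch assumes a hypothetical linear model v = k·h + a + b·x + c·y, where h is horizontal disparity, x and y are picture coordinates, the k·h term stands for the component tied to horizontal disparity (such as a vertical offset between the cameras), and a + b·x + c·y is a first-order approximation of a global mismatch such as a small roll or zoom difference. The model and the function name are illustrative assumptions; large residuals would point to local problems the global model does not explain.

```python
import numpy as np


def fit_vertical_disparity(x, y, h_disp, v_disp):
    """Least-squares fit of v = k*h + a + b*x + c*y.

    Returns (coefficients [k, a, b, c], residuals). Hypothetical helper.
    """
    x, y, h, v = (np.asarray(arr, dtype=np.float64) for arr in (x, y, h_disp, v_disp))
    # Design matrix: disparity-related term, constant, and global x/y terms.
    design = np.column_stack([h, np.ones_like(h), x, y])
    coeffs, *_ = np.linalg.lstsq(design, v, rcond=None)
    residuals = v - design @ coeffs
    return coeffs, residuals
```

The fitted global part can drive a correcting spatial transform, after which horizontal disparity can be measured more reliably.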
Disparity Monitoring Display
Figure 3 shows an example of a monitoring display that provides information about the distribution of disparity in various ways, including a left-right difference, a disparity histogram, an indication of vertical disparity and a colour coded warning of the possibility of eye strain from near and far objects for different display sizes. Such a tool makes good use of automatic analysis coupled with an operator's skill in interpreting the results.
Figure 3 - Example of a disparity monitoring display
Dense Disparity Maps
Because of the difficulty and the usefulness of measuring dense disparity maps, there is some interest in standardising a format for dense disparity map metadata. For example, SMPTE has recently begun such an activity.
Higher Level Analysis
Left-Right Swap Detection
Many people viewing 3D demonstrations have encountered the situation where the left and right images have been inadvertently swapped over. The result is very disturbing, but it is not always obvious even to a human observer what is wrong. It would be useful to be able to detect the swap automatically, but this turns out to be quite a difficult problem.
Measurement of a disparity map is a good starting point, but a correctly arranged 3D pair will typically exhibit both negative disparity values for objects intended to be seen in front of the screen and positive values for objects behind the screen. So a simple analysis of the histogram of disparity values, for example, will not be enough.
One approach that works with reasonable reliability is based on the spatial distribution of disparity values. We observe that for most scenes objects at the centre and bottom of the screen are generally nearer than objects at the top and sides. Figure 4 shows the spatial disparity distribution measured over a set of varied clips comprising 6000 frames.
A possible left-right detection algorithm is to correlate measured disparity with the above template. A positive correlation indicates that the assumed left-right configuration is correct, while a negative correlation indicates that it is reversed.
Figure 4 - Spatial disparity distribution
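The template-correlation test can be sketched as follows. The measured distribution of Figure 4 is not reproduced here, so `hypothetical_template` is a made-up smooth stand-in with more negative (nearer) disparity towards the bottom and centre; in practice one would use the measured average map. The rolling mean is similar in spirit to the averages plotted in Figure 5. All function names are hypothetical.

```python
import numpy as np


def hypothetical_template(h, w):
    """Made-up stand-in for Figure 4: nearer (more negative) at bottom and centre."""
    yy, xx = np.mgrid[0:h, 0:w]
    down = yy / (h - 1)                                      # 0 at top, 1 at bottom
    centre = 1 - np.abs(xx - (w - 1) / 2) / ((w - 1) / 2)    # 1 at centre, 0 at sides
    return -(down + centre)


def swap_score(disparity_map, template):
    """Correlation coefficient between a measured disparity map and the template.

    Positive: the assumed left/right order looks correct. Negative: probably
    swapped, because swapping the views flips the sign of every disparity.
    """
    return float(np.corrcoef(np.ravel(disparity_map), np.ravel(template))[0, 1])


def rolling_mean(scores, window):
    """Trailing average of per-frame scores over `window` frames."""
    kernel = np.ones(window) / window
    return np.convolve(np.asarray(scores, dtype=np.float64), kernel, mode="valid")
```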
Figure 5 shows the results of such an algorithm on 38,000 frames of (correctly ordered) 3D material. The blue line shows a 10-frame rolling average and the red line a 1000-frame rolling average of correlation coefficients between measured disparity and the template.
Figure 5 - Performance of left-right swap detection algorithm based on disparity distribution
Whenever the graph is positive, the algorithm is giving a correct result. The last third of the material is professionally produced, well-behaved 3D material whereas the first two-thirds consists of test sequences of varying quality. Clearly, there is always some material that will defeat the algorithm, but on "normal" material it is quite reliable.
A potentially more reliable method of left-right detection is based on the observation that closer objects are expected to occlude objects that are further away. A dense disparity estimator will usually have some kind of confidence output which indicates whether a pixel or region in one view has no equivalent in the other view and is therefore an occluded background region.
As shown in Figure 6, we would expect occluded regions to extend to the left of transitions in the left-eye view and to the right in the right-eye view. The bottom part of the diagram shows where the transitions between foreground (green) and background (blue) are observed to be in relation to occlusions (red) in the two views.
This observation allows us to determine automatically, on a statistical basis, which view is the left-eye view and which is the right-eye view. This approach is potentially more reliable than the method based on spatial disparity distribution, but it does depend on accurate dense disparity estimation including reliable location of occlusions.
Figure 6 - Use of occlusions in left-right swap detection
Reliable analysis of the local relationship between depth and occlusions may be employed for other high-level monitoring tasks, for example to provide a warning that captions might have been inserted at an inappropriate location or depth relative to the other objects in the scene.
2D to 3D Conversion Detection
Our final example concerns the automatic detection of automatic 2D to 3D conversion.
One common technique in simple 2D to 3D conversion is the use of a fixed spatial disparity profile; for example the bottom and centre of the picture are made to appear closer than the top and sides, much as shown by Figure 4 above.
Another technique is to introduce delay between two versions of the same moving sequence to give an impression of depth. This can work because a 3D camera rig tracking across a static scene will in fact generate two streams separated by a delay which corresponds to the time taken for the camera to move by the eye spacing distance.
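The size of that delay is simple arithmetic: eye spacing divided by the camera's lateral speed. The sketch assumes constant sideways camera motion across a static scene and a nominal 65 mm eye spacing; `equivalent_delay` and the example figures are hypothetical.

```python
def equivalent_delay(eye_sep_m, camera_speed_m_per_s, fps):
    """Time, and the same time in frames, for a sideways-tracking camera to
    travel one eye spacing. Assumes constant lateral speed and a static scene."""
    seconds = eye_sep_m / camera_speed_m_per_s
    return seconds, seconds * fps


if __name__ == "__main__":
    secs, frames = equivalent_delay(0.065, 0.5, 25)
    print(f"{secs:.2f} s = {frames:.2f} frames at 25 fps")
```

A camera tracking at 0.5 m/s covers 65 mm in 0.13 s, about 3.25 frames at 25 fps, so a converter using this technique could pair each frame with one roughly three frames later.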
The algorithm illustrated in Figure 7 detects the use of either or both of these techniques, to give a warning that a 2D to 3D converter might have been used.
Figure 7 - Block diagram of automatic 2D to 3D conversion detector
Fingerprints are calculated separately on the left and right input picture signals. These could be as simple as the average luminance value over each frame, an average over each of a few regions, or any measure which when applied to correctly co-timed left and right signals would be expected to be similar to each other.
A correlation process is then applied to the two fingerprint signals to produce an estimated temporal offset between the input channels. This estimated offset is applied to a temporal low pass filter, which may for example be designed to detect piecewise constant inputs. The filtered temporal offset value is used to control a temporal alignment process on the left and right images; this would be done by applying a delay to one or other of the two inputs.
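The fingerprint and correlation stage just described might look roughly like this in Python. It is a sketch under assumptions: the average-luminance fingerprint is the simplest option the text mentions, the normalised cross-correlation and the running median (standing in for a low pass filter tuned to piecewise constant inputs) are illustrative choices, and all function names are hypothetical.

```python
from statistics import mean, median_low


def frame_fingerprint(frame):
    """Simplest fingerprint: average luminance of one frame (a list of rows of luma)."""
    return mean(v for row in frame for v in row)


def estimate_offset(left_fp, right_fp, max_lag):
    """Lag (in frames) that best aligns right_fp with left_fp.

    A positive result means the right channel lags: right_fp[t + lag] ~ left_fp[t].
    Uses normalised, mean-removed cross-correlation over the overlapping samples.
    """
    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        pairs = [(left_fp[t], right_fp[t + lag])
                 for t in range(len(left_fp)) if 0 <= t + lag < len(right_fp)]
        if len(pairs) < 2:
            continue
        a_mean = mean(p[0] for p in pairs)
        b_mean = mean(p[1] for p in pairs)
        num = sum((a - a_mean) * (b - b_mean) for a, b in pairs)
        den = (sum((a - a_mean) ** 2 for a, _ in pairs)
               * sum((b - b_mean) ** 2 for _, b in pairs)) ** 0.5
        if den == 0:
            continue  # flat fingerprints carry no timing information
        score = num / den
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def smooth_offsets(offsets, width=5):
    """Running median: keeps piecewise-constant offsets, rejects isolated outliers."""
    half = width // 2
    return [median_low(offsets[max(0, i - half):i + half + 1])
            for i in range(len(offsets))]
```

With a positive smoothed offset the right input is late, so the alignment step would delay the left input by that many frames (and the right input for a negative offset).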
A disparity map between the temporally aligned left and right images is then calculated, producing a number of disparity values across the picture. A temporal high pass filter is applied to the disparity values, thereby looking for variation in time of the disparity observed in each part of the picture. The mean square value, or other average energy value, of the high pass filter output is calculated. In parallel, a spatial regression process is applied to the disparity map to see if the map fits a fixed spatial model. A low mean square output from the temporal high pass filter, or a close correlation to a fixed spatial model, both provide evidence for a final decision that simple 2D to 3D conversion might have been performed.
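The decision stage can be sketched in Python as follows. Assumptions to note: a first difference stands in for the temporal high pass filter, the "fixed spatial model" is a hypothetical wedge profile (bottom and centre nearest, echoing the profile described earlier), the regression is a one-variable least-squares fit scored by R², and both thresholds are illustrative guesses rather than calibrated values.

```python
def temporal_hp_energy(maps):
    """Mean square of frame-to-frame disparity differences (a first-difference high pass).

    maps: list of disparity maps, each a flat list of per-region disparities.
    """
    diffs = [cur - prev
             for prev_map, cur_map in zip(maps, maps[1:])
             for prev, cur in zip(prev_map, cur_map)]
    return sum(d * d for d in diffs) / len(diffs) if diffs else 0.0


def r_squared(values, model):
    """Fraction of the variance in `values` explained by a linear fit to `model`."""
    n = len(values)
    v_mean, m_mean = sum(values) / n, sum(model) / n
    sxy = sum((v - v_mean) * (m - m_mean) for v, m in zip(values, model))
    sxx = sum((m - m_mean) ** 2 for m in model)
    syy = sum((v - v_mean) ** 2 for v in values)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def wedge_template(height, width):
    """Hypothetical fixed profile: bottom rows and central columns are nearest."""
    cx = (width - 1) / 2
    return [[r / max(height - 1, 1) - abs(c - cx) / max(cx, 1) for c in range(width)]
            for r in range(height)]


def looks_converted(maps, template, energy_limit=0.01, fit_limit=0.9):
    """Flag possible simple 2D-to-3D conversion (thresholds are illustrative guesses)."""
    model = [v for row in template for v in row]
    energy = temporal_hp_energy(maps)
    mean_fit = sum(r_squared(m, model) for m in maps) / len(maps)
    return energy < energy_limit or mean_fit > fit_limit
```

Either test alone can fire, matching the "or" in the description: a depth map that never changes, or one that always follows the template, is treated as evidence of conversion.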
With automatic detection such as this, one can envisage a game of "cat and mouse" whereby detection algorithms have to become ever more sophisticated in order to keep up with the increasing complexity of automatic 2D to 3D converters.
By Mike Knee, Snell via 3Droundabout
MPEG Analysis and Measurement
Posted: 25 Nov 2011 08:52 AM PST
Broadcast engineering requires a unique set of skills and talents. Some audio engineers claim the ability to hear the difference between tiny nuances such as different kinds of speaker wire. They are said to have golden ears. Their video engineering counterparts can spot and obsess over a single deviant pixel during a Super Bowl touchdown pass or a "Leave it to Beaver" rerun in real time. They are known as eagle eyes or video experts.
Not all audio and video engineers are blessed with super-senses. Nor do all of us, me included, have the talent to focus our brain's undivided processing power on discovering and discerning vague, cryptic and sometimes immeasurable sound or image anomalies on the fly with our bare eyes or ears. Sometimes, the message can overpower the media. Fortunately for us, and thanks to the Internet and digital video, more objective quality and measurement standards and tools have been developed.
One of those standards is Perceptual Evaluation of Video Quality (PEVQ). It is an End-to-End (E2E) measurement algorithm standard that grades picture quality of a video presentation by a five-point Mean Opinion Score (MOS), one being bad and five being excellent.
PEVQ can be used to analyze visible artifacts caused by digital video encoding/decoding or transcoding processes, RF- or IP-based transmission systems and viewer devices like set-top boxes. PEVQ is suited for next-generation networking and mobile services, which include SD and HD IPTV, streaming video, mobile TV, video conferencing and video messaging.
The development for PEVQ began with still images. Evaluation models were later expanded to include motion video. PEVQ can be used to assess degradations of a decoded video stream from the network, such as that received by a TV set-top box, in comparison to the original reference picture as broadcast from the studio. This evaluation model is referred to as End-to-End (E2E) quality testing.
E2E exactly replicates how so-called average viewers would evaluate the video quality based on subjective comparison, so it addresses Quality-of-Experience (QoE) testing. PEVQ is based on modeling human visual behaviors. It is a full-reference algorithm that analyzes the picture pixel-by-pixel after a temporal alignment of corresponding frames of reference and test signal.
Besides an overall quality Mean Opinion Score figure of merit, abnormalities in the video signal are quantified by several Key Performance Indicators (KPI), such as Peak Signal-to-Noise Ratios (PSNR), distortion indicators and lip-sync delay.
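PSNR, one of the KPIs listed above, is also the simplest example of a full-reference, pixel-by-pixel comparison. The Python sketch below computes it for one pair of temporally aligned frames, assuming 8-bit samples (peak value 255) passed as flat sequences. It illustrates PSNR only; PEVQ's own perceptual model is far more elaborate and is not reproduced here.

```python
import math


def psnr(reference, test, peak=255):
    """Peak signal-to-noise ratio in dB between two frames of equal size.

    reference, test: flat sequences of samples (e.g. 8-bit luma, hence peak=255).
    """
    if len(reference) != len(test) or not reference:
        raise ValueError("frames must be non-empty and the same size")
    mse = sum((r - t) ** 2 for r, t in zip(reference, test)) / len(reference)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(peak ** 2 / mse)
```

An error of one code value on every sample gives an MSE of 1 and therefore a PSNR of 20·log10(255), roughly 48 dB.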
PEVQ References
Depending on the reference data made available to them, video quality test algorithms can be divided into three categories.
A Full Reference (FR) algorithm has access to and makes use of the original reference sequence for a comparative difference analysis. It compares each pixel of the reference sequence to each corresponding pixel of the received sequence. FR measurements deliver the highest accuracy and repeatability but are processing intensive.
A Reduced Reference (RR) algorithm uses a reduced bandwidth side channel between the sender and the receiver, which is not capable of transmitting the full reference signal. Instead, parameters are extracted at the sending side, which help predict the quality at the receiving end. RR measurements are less accurate than FR and represent a working compromise if bandwidth for the reference signal is limited.
A No Reference (NR) algorithm only uses the degraded signal for the quality estimation and has no information about the original reference sequence. NR algorithms are low accuracy estimates only, because the original quality of the source reference is unknown. A common variant at the upper end of NR algorithms analyzes the stream at the packet level, but not the decoded video at the pixel level. The measurement is consequently limited to a transport stream analysis.
Another widely used MOS algorithm is VQmon. This algorithm was recently updated to VQmon for Streaming Video. It performs real-time analysis of video streamed using the key Adobe, Apple and Microsoft streaming protocols, analyzes video quality and buffering performance and reports detailed performance and QoE metrics. It uses packet/frame-based zero reference, with fast performance that enables real-time analysis on the impact that loss of I, B and P frames has on the content, both encrypted and unencrypted.
The 411 on MDI
The Media Delivery Index (MDI) measurement is specifically designed to monitor networks that are sensitive to arrival time and packet loss such as MPEG-2 video streams, and is described by the Internet Engineering Task Force document RFC 4445. It measures key video network performance metrics, including jitter, nominal flow rate deviations and instant data loss events for a particular stream.
MDI provides information to detect virtually all network-related impairments for streaming video, and it enables the measurement of jitter on fixed and variable bit-rate IP streams. MDI is typically expressed as two numbers separated by a colon, the Delay Factor (DF) and the Media Loss Rate (MLR), i.e. DF:MLR.
DF is the number of milliseconds of streaming data that buffers must hold to eliminate jitter, much as a time-base corrector once did for baseband video. It is determined by first calculating the MDI virtual buffer depth as each packet arrives. In video streams, this value is sometimes called the Instantaneous Flow Rate (IFR). When calculating DF, it is known as DELTA.
To determine DF, DELTA is monitored to identify maximum and minimum virtual depths over time. Usually one or two seconds is enough time. The difference between maximum and minimum DELTA divided by the stream rate reveals the DF. In video streams, the difference is sometimes called the Instantaneous Flow Rate Deviation (IFRD). DF values less than 50ms are usually considered acceptable. An excellent white paper with much more detail on MDI is available from Agilent.
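The virtual-buffer procedure above can be sketched in Python, loosely in the spirit of RFC 4445. Assumptions: packet arrival times and sizes have already been captured for the measurement window, the nominal stream rate is known and used as the buffer's drain rate, and depth is sampled just before and just after each arrival. The function name is hypothetical.

```python
def delay_factor_ms(arrivals, rate_bps):
    """Delay Factor in milliseconds from a virtual-buffer model.

    arrivals: (arrival_time_s, size_bits) tuples in arrival order, covering the
              measurement interval (typically one or two seconds).
    rate_bps: nominal stream rate, used as the virtual buffer's drain rate.
    """
    t0 = arrivals[0][0]
    received = 0.0
    depths = []
    for t, size_bits in arrivals:
        drained = rate_bps * (t - t0)          # bits the decoder would have consumed by t
        depths.append(received - drained)      # virtual depth (DELTA) just before this packet
        received += size_bits
        depths.append(received - drained)      # virtual depth just after it
    # Spread between the deepest and shallowest points, converted to time at the stream rate.
    return (max(depths) - min(depths)) / rate_bps * 1000.0
```

In this model even a perfectly paced stream shows a DF of about one packet's duration, because the buffer rises by a full packet at each arrival; late or bunched packets widen the spread beyond that.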
Figure 1 - The Delay Factor (DF) dictates buffer size needed to eliminate jitter
Using the formula in Figure 1, let's say a 3 Mb/s MPEG video stream observed over a one-second interval fills a virtual buffer to a maximum of 3.005 Mb and a minimum of 2.995 Mb. The difference is 10 Kb, and dividing it by the stream rate reveals the DF: 10 Kb divided by 3 Mb/s is 3.333 milliseconds. Thus, to avoid packet loss in the presence of the known jitter, the receiver's buffer must hold at least 10 Kb; a 15 Kb buffer, for example, injects 5 milliseconds of delay at 3 Mb/s. Similarly, a device with an MDI rating of 4:0.003 would have a 4 millisecond DF and an MLR of 0.003 media packets per second.
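The arithmetic of this worked example is easy to check in a few lines of Python. The helper names are hypothetical; "Kb" and "Mb" are taken as 1,000 and 1,000,000 bits.

```python
def delay_factor_from_depths(max_bits, min_bits, rate_bps):
    """DF in ms: spread of the virtual buffer divided by the stream rate."""
    return (max_bits - min_bits) / rate_bps * 1000.0


def buffer_delay_ms(buffer_bits, rate_bps):
    """Latency added by a receive buffer of the given size at the given rate."""
    return buffer_bits / rate_bps * 1000.0


rate = 3e6                                        # 3 Mb/s stream
df = delay_factor_from_depths(3.005e6, 2.995e6, rate)
print(f"DF = {df:.3f} ms")                        # DF = 3.333 ms
print(f"15 Kb buffer: {buffer_delay_ms(15e3, rate):.1f} ms")  # 15 Kb buffer: 5.0 ms
```

The buffer must be at least as deep as the 10 Kb spread; any extra depth beyond that buys margin at the cost of added latency.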
The MLR formula in Figure 2 is computed by dividing the number of lost or out-of-order media packets by the observed time in seconds. Out-of-order packets are crucial because many devices don't reorder packets before handing them to the decoder. The best-case MLR is zero. For HDTV, the MLR is generally expected to stay below 0.0005. An MLR greater than zero adds time for viewing devices to lock onto the stream, which slows channel surfing and can introduce various ongoing anomalies once locked in.
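A rough Python sketch of the MLR count and the usual DF:MLR notation follows. It assumes each packet carries a sequence number (as RTP does) with no wrap-around during the window; plain MPEG-2 transport over UDP would instead need the per-PID continuity counters. Function names are hypothetical.

```python
def media_loss_rate(sequence_numbers, duration_s):
    """MLR: (lost + out-of-order packets) per second.

    sequence_numbers: packet sequence numbers in the order they arrived
                      (assumed RTP-style, without wrap-around).
    """
    if not sequence_numbers:
        return 0.0                     # nothing observed, nothing to count
    highest = None
    out_of_order = 0
    seen = set()
    for seq in sequence_numbers:
        if highest is not None and seq < highest:
            out_of_order += 1          # arrived after a later packet
        else:
            highest = seq
        seen.add(seq)
    expected = max(sequence_numbers) - min(sequence_numbers) + 1
    lost = expected - len(seen)        # gaps that never arrived
    return (lost + out_of_order) / duration_s


def mdi_label(df_ms, mlr):
    """Format the index the usual way, as DF:MLR."""
    return f"{df_ms:g}:{mlr:g}"


print(media_loss_rate([1, 2, 4, 3, 6], 1.0))  # 2.0 (packet 5 lost, packet 3 late)
print(mdi_label(4, 0.003))                    # 4:0.003
```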
Figure 2 - The Media Loss Rate (MLR) is used in the Media Delivery Index (MDI)
Watch That Jitter
Just as too much coffee can make you jittery, heavy traffic can make a network jittery, and jitter is a major source of video-related IP problems. Proactively monitoring jitter can alert you to impending QoE issues and help you avert them before they occur.
One way to overload an MPEG-2 stream is with excessive bursts. Packet bursts can cause a network-level or a set-top box buffer to overflow or under-run, resulting in lost packets or empty buffers, which cause macro blocking or black/freeze frame conditions, respectively. An overload of metadata such as video content PIDs can contribute to this problem.
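The overflow and under-run behaviour described above can be illustrated with a toy leaky-bucket model. This Python sketch is an assumption-laden simplification: time is divided into ticks, the decoder drains a fixed number of packets per tick, and the buffer capacity is hypothetical.

```python
def simulate_buffer(arrivals_per_tick, drain_per_tick, capacity, prefill=0):
    """Leaky-bucket model of a receiver buffer, in packets.

    Returns (dropped_packets, underrun_ticks): overflow loss (macroblocking risk)
    and ticks where the decoder found too little data (black/freeze risk).
    """
    level = prefill
    dropped = 0
    underruns = 0
    for arriving in arrivals_per_tick:
        level += arriving
        if level > capacity:
            dropped += level - capacity   # no room: excess packets are lost
            level = capacity
        if level < drain_per_tick:
            underruns += 1                # decoder starved this tick
            level = 0
        else:
            level -= drain_per_tick
    return dropped, underruns


# Same 12 packets, delivered smoothly versus in a single burst.
print(simulate_buffer([3, 3, 3, 3], drain_per_tick=3, capacity=10))   # (0, 0)
print(simulate_buffer([0, 0, 12, 0], drain_per_tick=3, capacity=10))  # (2, 2)
```

The total payload is identical in both runs; only the timing differs, yet the bursty delivery both starves the decoder and overflows the buffer.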
Probing a streaming media network at various nodes and under different load conditions makes it possible to isolate and identify devices or bottlenecks that introduce significant jitter or packet loss to the transport stream. Deviations from nominal jitter or data loss benchmarks are indicative of an imminent or ongoing fault condition.
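Comparing probe readings against benchmarks can be as simple as a threshold check per node. The Python sketch below uses the 50 ms DF and 0.0005 MLR figures mentioned earlier in this article as its limits; the probe names and readings are invented for illustration.

```python
DF_LIMIT_MS = 50.0   # DF below 50 ms is usually considered acceptable
MLR_LIMIT = 0.0005   # HDTV target: MLR should stay below this


def flag_nodes(readings):
    """Return probe points whose MDI readings exceed either benchmark.

    readings: {node_name: (df_ms, mlr)} gathered from probes along the path.
    """
    return sorted(name for name, (df_ms, mlr) in readings.items()
                  if df_ms >= DF_LIMIT_MS or mlr >= MLR_LIMIT)


probes = {"headend": (4.0, 0.0), "core-router": (62.0, 0.0), "set-top": (12.0, 0.003)}
print(flag_nodes(probes))  # ['core-router', 'set-top']
```

Because impairments accumulate along the path, the first flagged node going downstream is the natural place to start looking.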
QoE is one of many subjective measurements used to determine how well a broadcaster's signal, whether on-air, online or on-demand, satisfies the viewer's perception of the sights and sounds as they are reproduced at his or her location. I can't help but find some humor in the idea that the ones-and-zeros of a digital video stream can be rated on a gray scale of 1-5 for quality.
Experienced broadcast engineers know the so-called quality of a digital image begins well before the light enters the lens, and with apologies to our friends in the broadcast camera lens business, the image is pre-distorted to some degree within the optical system before the photons hit the image sensors.
QoE or RST?
A scale of 1-5 is what ham radio operators have used for 100 years in the readability part of the Readability, Strength and Tone (RST) code system. While signal strength (S) could be objectively measured with an S-meter such as shown in Figure 3, readability (R) was purely subjective, and tone (T) could be subjective, objective or both.
Figure 3 - The S-meter was the first commonly used metric to objectively read and report signal strength at an RF receive site
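The RST report itself is a compact fixed-format code: Readability on a 1-5 scale, Strength and Tone on 1-9 scales, with voice contacts usually giving only the two-digit RS part. A small Python parser (names hypothetical) shows the structure.

```python
# Standard amateur-radio scales: Readability 1-5, Strength 1-9, Tone 1-9.
RST_SCALES = {"R": range(1, 6), "S": range(1, 10), "T": range(1, 10)}


def parse_rst(report):
    """Parse a signal report such as '599' (RST) or '57' (voice reports omit T)."""
    text = report.strip()
    if len(text) not in (2, 3) or not text.isdigit():
        raise ValueError(f"not an RS or RST report: {report!r}")
    result = {}
    for key, digit in zip("RST", text):
        value = int(digit)
        scale = RST_SCALES[key]
        if value not in scale:
            raise ValueError(f"{key}={value} is outside {scale.start}-{scale.stop - 1}")
        result[key] = value
    return result


print(parse_rst("599"))  # {'R': 5, 'S': 9, 'T': 9}
print(parse_rst("57"))   # {'R': 5, 'S': 7}
```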
Engineers and hams know that as S and/or T diminish, R follows, but that minimum acceptable RST values depend almost entirely on the minimum R figure the viewer or listener is willing to accept. In analog times, the minimum acceptable R figure often varied with the value of the message.
Digital technology and transport remove the viewer or listener's subjective reception opinion from the loop. Digital video and audio are either as perfect as the originator intended or practically useless. We don't need a committee to tell us that. It seems to me the digital cliff falls just south of a 4x5x8 RST. Your opinion may vary.
By Ned Soseman, Broadcast Engineering
You are subscribed to email updates from 3D CineCast
Wednesday, November 23, 2011
3D using the Pulfrich effect a psycho-optical phenomenon
It's a shame because the general public never understood any of this and it just gave 3D a bad name.
http://en.wikipedia.org/wiki/Pulfrich_effect
From: http://www.rainbowsymphony.com/3d-pulfrich-glasses.html
The Pulfrich effect is a psycho-optical phenomenon wherein lateral motion by an object in the field of view is interpreted by the brain as having a depth component, due to differences in processing speed between images from the two eyes. The effect is generally created by covering one eye with a really dark filter. The phenomenon is named for German physicist Carl Pulfrich who first described it in 1922.
In the classic Pulfrich effect experiment, a subject views a pendulum swinging in a plane perpendicular to the observer’s line of sight. When a neutral density filter (a darkened lens, for example a grey one) is placed in front of the right eye, the pendulum seems to be making an elliptical orbit, appearing closer as it swings to the right and further away as it swings to the left.
The most widely accepted explanation for the apparent depth is reduced retinal illumination in the filtered eye, which delays its signal relative to the other eye; for an object in motion, that delay becomes an instantaneous spatial difference between the two eyes' views. This probably occurs because visual latencies are normally shorter for bright targets than for dim ones: the visual system reacts faster to bright targets. Because the moving object is seen at a different retinal luminance by each eye, the signal latencies of the two eyes differ, and the brain interprets the resulting positional offset as depth.
The Pulfrich effect has typically been measured under full light conditions with dark targets on a bright background, yielding about 15 ms of delay for a factor of ten difference in average retinal luminance. These delays increase monotonically with decreasing luminance over a wide (> 6 log units) range of luminance. The effect is also seen with a bright target on a black background, with the same relationship between luminance and delay.
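The geometry reduces to one multiplication: the delayed eye sees the object where it was a moment ago, so the horizontal offset between the eyes is speed times delay. The Python sketch below assumes, for illustration, a 15 ms latency difference and speed in pixels per second; the sign convention (right-eye position minus left-eye position, negative meaning crossed and therefore nearer) and the function names are my own.

```python
def pulfrich_disparity(velocity, delay_s, filtered_eye="right"):
    """Horizontal offset x_right - x_left created by delaying one eye's view.

    velocity: horizontal speed of the object (e.g. pixels per second),
              positive for motion toward the viewer's right.
    delay_s:  extra latency of the filtered (darker) eye, in seconds.
    """
    lag = velocity * delay_s          # how far the delayed eye's image trails
    if filtered_eye == "right":
        return -lag                   # right eye sees where the object was delay_s ago
    if filtered_eye == "left":
        return lag
    raise ValueError("filtered_eye must be 'left' or 'right'")


def apparent_depth(disparity):
    """Crossed (negative) disparity reads as nearer, uncrossed as farther."""
    if disparity < 0:
        return "nearer"
    if disparity > 0:
        return "farther"
    return "in the screen plane"


# Filter over the right eye, object moving right at 300 px/s, assumed 15 ms extra delay.
d = pulfrich_disparity(300, 0.015)
print(round(d, 2), apparent_depth(d))                    # -4.5 nearer
print(apparent_depth(pulfrich_disparity(-300, 0.015)))  # farther
```

This matches the pendulum description: with the right eye filtered, rightward motion looks nearer and leftward motion farther. Purely vertical motion has zero horizontal velocity and so produces no disparity at all.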
The effect can also occur spontaneously in several conditions affecting the eye, such as cataracts, optic neuritis or multiple sclerosis. In these cases, reported symptoms include difficulty judging the paths of oncoming cars.
In visual media such as film and television, the Pulfrich effect is used to produce three-dimensional imagery viewed with glasses. As in other kinds of stereoscopy, 3D glasses are used to create the illusion of a three-dimensional image. By placing a neutral density filter (for example, a darkened sunglass lens) over one eye, an image moving back and forth, to the left or to the right but not up or down, appears to move in depth.
Because the Pulfrich effect depends on motion in a particular direction to instigate the illusion of depth, it is not useful as a general stereoscopic technique; for example, it cannot be used to show a stationary object apparently extending into or out of the screen, and objects moving vertically will not be seen as moving in depth. Even so, the effect has some novelty value. One advantage of material produced to take advantage of the Pulfrich effect is that it is fully compatible with "regular" viewing without the need for "special" 3D glasses.
This effect was somewhat popular in the 1990s. It was used, for example, in a 3D TV advertisement in which objects moving in one direction seemed nearer to the viewer, in front of the TV screen, while objects moving in the opposite direction seemed further away, behind the screen. To allow viewers to see the effect, the advertiser provided a large number of viewers with a pair of filters in a paper frame. The filter for one eye was a dark neutral gray and the other was more transparent. In this instance, the commercial was restricted to objects such as skateboarders and refrigerators moving down a steep hill from left to right across the screen, a directional dependency determined by which eye was covered by the darker filter.
The effect was also used in the 1993 Doctor Who charity special Dimensions in Time and in a 1997 special TV episode of 3rd Rock from the Sun. In many European countries, a series of short 3D films made in the Netherlands was shown on TV, with 3D glasses sold at a chain of gas stations. These short films were mainly travelogues of Dutch localities. An episode of The Power Rangers used "Circlescan 4D" technology, based on the Pulfrich effect, and was given away through McDonald's. Animated programs that employed the Pulfrich effect in specific segments include The Bots Master and Space Strikers; they typically achieved the effect through the use of constantly moving background and foreground layers. On the Nintendo Entertainment System, the videogame Orb-3D used the effect by keeping the player's ship continually moving, and it came with a set of 3D glasses. So did Jim Power: The Lost Dimension in 3-D for the Super Nintendo, which used unique scrolling backgrounds to especially good effect.
In 2000, 3D Pulfrich glasses were given to six million viewers in the United States and Canada for Discovery Channel's Shark Week.