I can't wait to watch football on a 360-degree immersive VR headset. NextVR of Laguna Beach, California, says that it has developed a custom "lens-to-lens system for capturing and delivering live and on-demand virtual reality experiences in true broadcast quality." How great will it be to watch a play unfold while virtually standing in the pocket and seeing everything the quarterback sees? If you are defense-oriented, you will want to stand where the linebackers line up and see the game from their side of the ball.
Live-action streaming VR cameras will be set up in a calibrated array around the stadium. Many technical challenges must be overcome to make a VR broadcast happen, including stitching the many cameras' images together in real time, but there is one challenge that I have not seen addressed yet. The problem is known as occlusion.
In 3D graphics, occlusion is the effect of one object in a 3D space blocking another object from view. In the context of a football game, think of a large offensive lineman who stands up and blocks the camera's view of the front of the player he is blocking. To some extent, multiple cameras can help to fill in the visual gap, but won't there still necessarily be areas on the field that are blocked by moving players? And why is that a problem? It will matter when, moving around in the virtual space, you suddenly find yourself looking at a side of a player that no camera could see. The ruptured image would be tremendously jarring and would, of course, take you out of any sense of presence.
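For readers who like to see the idea in code, here is a minimal sketch of an occlusion test (my own toy illustration in Python, with made-up coordinates, not anyone's production system): cast a ray from the camera toward a target point and check whether a blocking object intersects it first.

```python
import numpy as np

def ray_hits_sphere(origin, direction, center, radius):
    """Return the distance along the ray to the first sphere hit, or None."""
    oc = origin - center
    b = 2.0 * np.dot(direction, oc)
    c = np.dot(oc, oc) - radius ** 2
    disc = b * b - 4.0 * c  # direction is unit length, so a == 1
    if disc < 0:
        return None
    t = (-b - np.sqrt(disc)) / 2.0
    return t if t > 0 else None

def is_occluded(camera, target, blocker_center, blocker_radius):
    """A point is occluded if a blocker sits between the camera and it."""
    to_target = target - camera
    dist = np.linalg.norm(to_target)
    direction = to_target / dist
    t = ray_hits_sphere(camera, direction, blocker_center, blocker_radius)
    return t is not None and t < dist

# Toy scene: a "lineman" (big sphere) between the camera and the quarterback.
camera = np.array([0.0, 2.0, 0.0])
quarterback = np.array([0.0, 1.5, 10.0])
lineman_center = np.array([0.0, 1.5, 5.0])
print(is_occluded(camera, quarterback, lineman_center, 1.0))  # True
```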
In the short term, the problem may well be handled by limiting your ability to move around in the virtual world so that you can only see what the cameras see. Eventually, however, just as you can in a game environment, you are going to want to change your viewing position (potentially by "walking around") so you can watch the action from anywhere on the field.
To be clear, I'm not talking about occlusion-culling software like Umbra Software's, which is used to decrease the graphics load by not calculating and rendering the parts of a scene the user is not looking at or cannot see (for example, a house sitting behind another building). That is a cool innovation, but it addresses a different problem.
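For contrast, here is a much-simplified sketch of the culling idea (my own illustration, not Umbra's actual algorithm): maintain a coarse depth buffer and skip any object whose screen footprint is entirely covered by nearer geometry.

```python
import numpy as np

# A toy software occlusion-culling pass over a coarse depth buffer.
# Objects are axis-aligned screen rectangles with a single depth value;
# real culling systems are far more sophisticated.

W, H = 16, 16
depth = np.full((H, W), np.inf)  # coarse depth buffer, camera at depth 0

def rasterize_occluder(x0, y0, x1, y1, z):
    """Write an occluder's depth into the buffer (keep the nearest value)."""
    region = depth[y0:y1, x0:x1]
    np.minimum(region, z, out=region)

def is_culled(x0, y0, x1, y1, z):
    """Cull an object if every covered cell already holds something nearer."""
    return bool(np.all(depth[y0:y1, x0:x1] < z))

# A building fills the middle of the screen at depth 10.
rasterize_occluder(4, 4, 12, 12, 10.0)
# A house behind it (depth 30) inside that footprint never gets drawn.
print(is_culled(5, 5, 11, 11, 30.0))   # True  -> skip rendering
# An object off to the side is still visible.
print(is_culled(0, 0, 3, 3, 30.0))     # False -> render
```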
I'm wondering whether any readers of this blog know if something like the following is being explored. One option for dealing with the occlusion problem might be to scan or film, in advance, the objects you want to be able to show continuously in a scene (i.e., during the broadcast event); in this example, that would be the football players. The process would then include rigging the players (as is done in animation) and, for any given scene, extrapolating the "back" (the part blocked from the cameras) onto the rigged person or object. While no doubt computationally expensive, I wonder if this approach is achievable. Once a player is rigged, the system could display the part that is visible to the cameras and extrapolate any missing piece from the pre-rigged model by overlaying the pre-rigged person on the real-time person. I could see this approach failing if the players in the scene can't be rigged in real time, but I am struggling with how else this problem will be solved.
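To make the compositing step of that idea concrete, here is a rough sketch under heavy assumptions: the pose tracker, renderer, and reconstruction function named in the comments are hypothetical placeholders, and real-time pose tracking of every player is simply taken as a given.

```python
import numpy as np

def composite_with_rigged_model(live_rgb, live_mask, model_render):
    """
    Fill occluded regions of a live capture from a pre-rigged model render.

    live_rgb     : (H, W, 3) pixels reconstructed from the camera array
    live_mask    : (H, W) bool, True where at least one camera saw the surface
    model_render : (H, W, 3) the pre-scanned player model, re-posed with the
                   tracked skeleton and rendered from the viewer's position
    """
    mask = live_mask[..., None]  # broadcast the mask over the color channels
    return np.where(mask, live_rgb, model_render)

# Hypothetical per-frame loop (pose_tracker, renderer, and
# reconstruct_from_cameras are assumptions, standing in for whatever
# motion-capture system and engine you actually have):
#
# for frame in broadcast:
#     pose = pose_tracker.estimate(frame)            # skeleton per player
#     model_render = renderer.render(rigged_model, pose, viewer_camera)
#     live_rgb, live_mask = reconstruct_from_cameras(frame, viewer_camera)
#     output = composite_with_rigged_model(live_rgb, live_mask, model_render)
```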
I would be interested in knowing whether any readers have encountered other proposed fixes to the occlusion problem. This is an issue that researchers in the computer vision field have been wrestling with for over 10 years. See, for example, Reconstruction of Objects from Images with Partial Occlusion (2005) and Resolving occlusion in image sequence made easy (1995). One promising approach estimates the shape of an occluded object from object silhouettes: see Multi-Object Shape Estimation from Silhouette Cues (2007). I'm interested to know whether anyone has seen a satisfactory resolution of the occlusion problem in the context of VR.
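The silhouette approach can be illustrated with a toy "space carving" example of my own: start by assuming every cell of a volume is occupied, then carve away any cell that projects outside a silhouette in some camera. Here is a minimal 2D version with two orthographic cameras (real systems use many calibrated perspective cameras, but the carving logic is the same):

```python
import numpy as np

N = 8
# Ground-truth occupancy that we pretend we cannot observe directly.
truth = np.zeros((N, N), dtype=bool)
truth[2:5, 3:6] = True
truth[2, 5] = False   # a notch (concavity) the silhouettes cannot capture

# Each "camera" only records a silhouette: which rows/columns are occupied.
sil_rows = truth.any(axis=1)   # camera looking along the x axis
sil_cols = truth.any(axis=0)   # camera looking along the y axis

# Carve: a cell can be occupied only if every camera's silhouette allows it.
hull = sil_rows[:, None] & sil_cols[None, :]

print(int(hull.sum()), "cells in the hull vs", int(truth.sum()), "true cells")
# -> 9 vs 8: the hull over-estimates the shape, because concavities
#    can never be carved away from silhouettes alone.
```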
So, while doing final research for this post, I found this interview with NextVR, in which co-founder David Cole indicates they have solved the problem for their system.
Q. In NextVR's press release, it describes the experience of being able to peer around objects or people. I completely understand the ability to maintain or change focus when kneeling forward and back, but how can the cameras capture information that isn't directly in front of them? Did I misunderstand the meaning of the text?
A. We have very high spatio-angular offsets in our camera configurations. This allows the second element of the positional tracking solution: view synthesis. Simply put, we can fill in holes that are formed when occluded elements of the scene are uncovered by a change in viewing perspective. This is a very hard-won solution to the problem and something we've been working on for years.
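NextVR's actual pipeline is proprietary, so I can only illustrate the textbook version of view synthesis: warp one camera's pixels to a new viewpoint using depth, which opens holes where newly visible surface appears, then fill those holes from another camera that could see behind the occluder. A toy one-dimensional sketch:

```python
import numpy as np

def warp(colors, disparity, shift):
    """Reproject a scanline; nearer pixels (larger disparity) move more."""
    out = np.full_like(colors, -1)    # -1 marks a hole (disoccluded pixel)
    best = np.full(len(colors), -1)   # disparity z-buffer for the new view
    for x in range(len(colors)):
        nx = x + int(round(shift * disparity[x]))
        if 0 <= nx < len(out) and disparity[x] > best[nx]:
            out[nx] = colors[x]       # keep the nearest surface at nx
            best[nx] = disparity[x]
    return out

colors_a = np.array([10, 10, 99, 99, 10, 10])   # 99 = foreground player
disparity = np.array([0, 0, 2, 2, 0, 0])        # the player is nearer
novel = warp(colors_a, disparity, shift=1)      # holes open behind him

# Fill the holes from a second camera that saw behind the player.
colors_b = np.array([10, 10, 10, 10, 10, 10])   # hypothetical second view
holes = novel == -1
novel[holes] = colors_b[holes]
print(novel)   # [10 10 10 10 99 99] -- no gap where the player moved
```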
Has anyone seen this view synthesis approach in action? How well does it fill in the missing pieces? Does it truly allow you to move anywhere on the field without jarring occlusion artifacts? Let me know what you know! I think this is an interesting problem.