This hack of Amazon's smart doorbell was demonstrated on stage at MWC 2019
Today we focus on the Ring Doorbell, an Amazon-acquired home security device designed to replace the plain old doorbell. Its main feature is two-way communication between the doorbell and the user's mobile app, which lets the user see who is ringing from anywhere via the internet. If the Ring owner is away from home, he or she can then open the door remotely via Alexa, provided a supported smart lock is installed. This could be used, for example, to let the cleaner in.
The startup, a Shark Tank alumnus, has been on the IoT main stage for a while now, so it surprised no one when its Wi-Fi password leak vulnerability drew massive attention in 2015. Given the nature of this device, a security-first approach must be maintained throughout the production cycle. Indeed, Ring were quick to respond and patched the vulnerability in just two weeks.
Approaching the Doorbell, we first focused on inspecting the network traffic for any alarming behaviour. Ring's network topology uses AWS as relay servers: both the mobile app and the Doorbell communicate exclusively with the cloud. Schematically, a ding triggers an API call to the server, which messages the device and triggers a notification. An audio/video stream is then sent to the server and bounced to the app. If the user picks up, an audio-only stream is sent back and played through the Doorbell speaker.
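The relay topology can be written down as a list of hops. The sketch below is a simplified, hypothetical encoding of the flow just described. The names `Party`, `CALL_FLOW`, `is_cloud_relayed` and `media_legs` are ours and are not Ring's. It shows that no hop runs directly between Doorbell and app, so every media leg terminates at the cloud and needs its own protection.

```python
from enum import Enum, auto


class Party(Enum):
    DOORBELL = auto()
    CLOUD = auto()  # the AWS relay servers
    APP = auto()


# (step, sender, receiver, what is carried): a hypothetical, simplified
# encoding of the call flow described in the text.
CALL_FLOW = [
    ("ding", Party.DOORBELL, Party.CLOUD, "API call"),
    ("notify", Party.CLOUD, Party.APP, "notification"),
    ("stream", Party.DOORBELL, Party.CLOUD, "audio/video"),
    ("stream", Party.CLOUD, Party.APP, "audio/video"),
    ("answer", Party.APP, Party.CLOUD, "audio only"),
    ("answer", Party.CLOUD, Party.DOORBELL, "audio only"),
]


def is_cloud_relayed(flow) -> bool:
    """True if no message travels directly between the Doorbell and the app."""
    return all(Party.CLOUD in (src, dst) for _, src, dst, _ in flow)


def media_legs(flow):
    """Return the (sender, receiver) hops that carry audio or video."""
    return [(src, dst) for step, src, dst, _ in flow if step in ("stream", "answer")]


print(is_cloud_relayed(CALL_FLOW))  # True
print(len(media_legs(CALL_FLOW)))   # 4
```

The four media legs are independent network paths. Securing the Doorbell-to-cloud leg says nothing about the cloud-to-app leg.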
Inspecting the call setup quickly revealed that Ring were rolling out their own “innovative” SIP/RTP crypto. SIP (Session Initiation Protocol) is the dialect through which two sides establish a call. Instead of using the well-standardized SIP/TLS and SRTP protocols, Ring added a security triplet to the SIP “INVITE” message. The per-session X-SSRC-A, X-SSRC-V and X-Session-Hash headers apparently protect the SIP message via some sort of signature, and perhaps carry the key for the upcoming RTP stream. Note that SSRC is RTP's synchronization source identifier, so the first two names hint at the audio and video streams.
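Pulling the security triplet out of a captured INVITE only needs a simple header parser. The sketch below assumes a plain-text SIP message with CRLF line endings. It ignores header folding and compact header forms. The sample message and all header values are made up for illustration, and only the three X- header names come from the observed traffic.

```python
from __future__ import annotations

# Hypothetical INVITE: only the X-SSRC-A / X-SSRC-V / X-Session-Hash names
# come from the text; addresses and values are placeholders.
SAMPLE_INVITE = (
    "INVITE sip:example@relay.invalid SIP/2.0\r\n"
    "Via: SIP/2.0/UDP 192.0.2.10:5060\r\n"
    "X-SSRC-A: 0x1a2b3c4d\r\n"
    "X-SSRC-V: 0x5e6f7081\r\n"
    "X-Session-Hash: 9f86d081884c7d65\r\n"
    "Content-Length: 0\r\n"
    "\r\n"
)

SECURITY_HEADERS = ("x-ssrc-a", "x-ssrc-v", "x-session-hash")


def parse_sip_headers(message: str) -> tuple[str, dict[str, str]]:
    """Split a SIP message into its start line and a header dict.

    SIP header names are case-insensitive, so they are stored lower-cased.
    """
    head, _, _body = message.partition("\r\n\r\n")
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, sep, value = line.partition(":")
        if sep:
            headers[name.strip().lower()] = value.strip()
    return start_line, headers


def security_triplet(message: str) -> dict[str, str | None]:
    """Return the three custom security headers of an INVITE (None if absent)."""
    start_line, headers = parse_sip_headers(message)
    if not start_line.startswith("INVITE "):
        raise ValueError("not a SIP INVITE")
    return {name: headers.get(name) for name in SECURITY_HEADERS}


print(security_triplet(SAMPLE_INVITE))
```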
Once the SIP handshake completes, RTP packets are sent to the cloud under a custom application-layer (layer 7) encryption. RTP (Real-time Transport Protocol) specifies how real-time data is sent, with sequencing and multiplexing functionality. The RTP header is transferred as-is while the actual payload is encrypted, and in Wireshark the messages are clearly gibberish. Wireshark infers from the RTP header that the payload is H.264 (a video encoding standard), yet its codec dissector cannot make sense of the stream, reporting “Bad NAL Length”, “Unknown subtype”, “Reserved” and so on. Basic cryptographic attacks, such as known-plaintext attacks or simply XORing / block-ciphering with the security parameters, did not yield results.
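The following sketch shows why the header stays readable while the payload does not. It parses the RFC 3550 fixed header and then applies a rough version of the check behind those dissector errors: it reads the first payload byte as an RFC 6184 NAL unit header. This is our own illustration, not Wireshark's code. For random-looking (encrypted) bytes, only about 45% of payloads pass even this shallow test (forbidden bit clear, type not reserved), whereas genuine H.264 payloads always pass it.

```python
import struct
from dataclasses import dataclass


@dataclass
class RtpPacket:
    version: int
    marker: bool
    payload_type: int
    sequence: int
    timestamp: int
    ssrc: int
    csrcs: list
    payload: bytes


def parse_rtp(data: bytes) -> RtpPacket:
    """Parse the RFC 3550 fixed header; what follows it is the payload."""
    if len(data) < 12:
        raise ValueError("shorter than the 12-byte RTP fixed header")
    b0, b1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    csrc_count = b0 & 0x0F
    offset = 12
    csrcs = list(struct.unpack(f"!{csrc_count}I", data[offset:offset + 4 * csrc_count]))
    offset += 4 * csrc_count
    if b0 & 0x10:  # X bit: skip the header extension (length in 32-bit words)
        _profile, ext_words = struct.unpack("!HH", data[offset:offset + 4])
        offset += 4 + 4 * ext_words
    end = len(data)
    if b0 & 0x20:  # P bit: the last byte counts the padding bytes to drop
        end -= data[-1]
    return RtpPacket(
        version=b0 >> 6,
        marker=bool(b1 & 0x80),
        payload_type=b1 & 0x7F,
        sequence=seq,
        timestamp=ts,
        ssrc=ssrc,
        csrcs=csrcs,
        payload=data[offset:end],
    )


def h264_nal_header_plausible(payload: bytes) -> bool:
    """Treat the first payload byte as an RFC 6184 NAL unit header.

    forbidden_zero_bit must be 0 and the type must be 1-23 (single NAL unit)
    or 24-29 (aggregation/fragmentation units); 0, 30 and 31 are reserved.
    """
    if not payload:
        return False
    forbidden = payload[0] >> 7
    nal_type = payload[0] & 0x1F
    return forbidden == 0 and 1 <= nal_type <= 29


def plausible_fraction(packets) -> float:
    """Share of raw RTP packets whose payload starts with a plausible NAL header."""
    if not packets:
        return 0.0
    parsed = [parse_rtp(p) for p in packets]
    return sum(h264_nal_header_plausible(p.payload) for p in parsed) / len(parsed)
```

Sequence number, timestamp and SSRC all come out intact, which is why Wireshark still labels the stream as RTP carrying H.264.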
We then moved on to sniffing the app's traffic. Here we see a more sensible SIP/TLS approach, with nearly all notifications, updates and information passed over HTTPS. The actual RTP traffic, however, appears to be in plaintext!
The data looks sensible, so we should be able to extract it. Using the handy videosnarf utility, we get a viewable MPEG file. This means anyone with access to the incoming packets can watch the feed! Similarly, we can extract the G.711-encoded audio stream.
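A plaintext stream can be turned back into media by reversing the RTP packetization. The sketch below covers the two halves of that job. It rebuilds an H.264 Annex B byte stream from RTP payloads (RFC 6184 single NAL units, STAP-A and FU-A) and decodes G.711 μ-law samples to 16-bit PCM. We are not claiming this is how videosnarf is implemented. We also assume in-order, loss-free payloads and μ-law rather than A-law audio, and the text above does not say which law the stream uses.

```python
START_CODE = b"\x00\x00\x00\x01"  # Annex B start code prefix


def depacketize_h264(payloads) -> bytes:
    """Rebuild an Annex B H.264 stream from RTP payloads given in sequence order.

    Handles single NAL units (types 1-23), STAP-A (24) and FU-A (28);
    other packetization types are skipped in this sketch.
    """
    out = bytearray()
    fu_buffer = None
    for payload in payloads:
        if not payload:
            continue
        nal_type = payload[0] & 0x1F
        if 1 <= nal_type <= 23:  # single NAL unit: copy as-is
            out += START_CODE + payload
        elif nal_type == 24:  # STAP-A: 16-bit size-prefixed NAL units
            i = 1
            while i + 2 <= len(payload):
                size = int.from_bytes(payload[i:i + 2], "big")
                i += 2
                out += START_CODE + payload[i:i + size]
                i += size
        elif nal_type == 28 and len(payload) >= 2:  # FU-A fragment
            indicator, header = payload[0], payload[1]
            if header & 0x80:  # S bit: rebuild the original NAL header byte
                fu_buffer = bytearray([(indicator & 0xE0) | (header & 0x1F)])
            if fu_buffer is not None:  # drop fragments whose start was lost
                fu_buffer += payload[2:]
                if header & 0x40:  # E bit: NAL unit complete
                    out += START_CODE + fu_buffer
                    fu_buffer = None
    return bytes(out)


def ulaw_to_linear(code: int) -> int:
    """Decode one G.711 mu-law byte to a signed 16-bit PCM sample."""
    u = ~code & 0xFF  # mu-law bytes are transmitted bit-inverted
    exponent = (u >> 4) & 0x07
    mantissa = u & 0x0F
    magnitude = (((mantissa << 3) + 0x84) << exponent) - 0x84  # 0x84 is the bias
    return -magnitude if u & 0x80 else magnitude


def decode_ulaw(payload: bytes) -> list:
    """Decode a whole G.711 mu-law RTP payload."""
    return [ulaw_to_linear(b) for b in payload]
```

Writing the `depacketize_h264` output to a `.264` file typically gives something a standard player or ffmpeg can open. The decoded samples can be wrapped in a WAV container at 8 kHz, G.711's sampling rate.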
Accessing the app's traffic is not difficult. If the user is at home, we just need Wi-Fi access, either by cracking weak encryption if present or by exploiting another home device. When the user is out, one can set up a rogue Wi-Fi access point nearby and wait for them to join, or join a common public network. Once on the same network, a simple ARP spoof lets us capture Ring traffic before passing it on to the app. Certain 3G/4G configurations may allow intra-network poisoning as well.
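ARP spoofing works because ARP has no authentication. In many network stacks, a reply can simply overwrite the MAC address cached for an IP address such as the gateway. The toy model below only illustrates that weakness; real stacks differ in exactly which unsolicited replies they accept. It also records the IP-to-MAC changes that an arpwatch-style monitor would flag. The class and method names are hypothetical, and the addresses come from documentation ranges.

```python
from __future__ import annotations


class ArpCache:
    """Toy IPv4-to-MAC cache that, like many real stacks, lets an ARP reply
    overwrite an existing entry without any authentication."""

    def __init__(self):
        self.table: dict[str, str] = {}
        self.changes: list[tuple[str, str, str]] = []  # (ip, old_mac, new_mac)

    def on_arp_reply(self, ip: str, mac: str) -> None:
        old = self.table.get(ip)
        if old is not None and old != mac:
            # an arpwatch-style monitor would flag exactly this event
            self.changes.append((ip, old, mac))
        self.table[ip] = mac

    def next_hop_mac(self, ip: str) -> str | None:
        """MAC address that frames for `ip` will actually be sent to."""
        return self.table.get(ip)


GATEWAY = "192.0.2.1"
victim = ArpCache()
victim.on_arp_reply(GATEWAY, "aa:aa:aa:aa:aa:01")  # genuine gateway reply
victim.on_arp_reply(GATEWAY, "ee:ee:ee:ee:ee:66")  # forged reply from the attacker
print(victim.next_hop_mac(GATEWAY))  # ee:ee:ee:ee:ee:66
print(victim.changes)  # [('192.0.2.1', 'aa:aa:aa:aa:aa:01', 'ee:ee:ee:ee:ee:66')]
```

Once the gateway entry points at the attacker's MAC, the phone's traffic to the internet, including the Ring stream, passes through the attacker's machine. The attacker then forwards it on, so the user notices nothing.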
Capturing the Doorbell feed is already great, but why stop there when we can inject our own? We developed a POC in which we first capture real footage in a so-called “recon mode”. Then, in “active mode”, we drop the genuine traffic and inject the captured footage instead. The hack works smoothly and is undetectable from within the app. We publicly demonstrated the attack at Mobile World Congress 2019.
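Splicing recorded footage into a live session requires that the replayed packets continue the live stream's RTP numbering. Otherwise the receiver sees a jump in sequence numbers and timestamps. The sketch below shows only that header bookkeeping, under our own assumptions. It is not the POC itself, and the names `rebase` and `ts_step` are hypothetical. We assume a 90 kHz clock (the rate RFC 6184 specifies for H.264) and 30 fps, hence `ts_step=3000`. Because the downstream leg carries no message authentication, nothing on the receiving side can tell rebased packets from genuine ones.

```python
from __future__ import annotations

from dataclasses import dataclass

SEQ_MOD = 2 ** 16  # RTP sequence numbers are 16-bit
TS_MOD = 2 ** 32   # RTP timestamps are 32-bit


@dataclass(frozen=True)
class RtpHeader:
    sequence: int
    timestamp: int
    ssrc: int


def rebase(recorded: list[RtpHeader], live_last: RtpHeader, ts_step: int) -> list[RtpHeader]:
    """Renumber recorded headers so they continue right after the last live packet."""
    first = recorded[0]
    rebased = []
    for pkt in recorded:
        # spacing relative to the first recorded packet, handling wrap-around
        seq_offset = (pkt.sequence - first.sequence) % SEQ_MOD
        ts_offset = (pkt.timestamp - first.timestamp) % TS_MOD
        rebased.append(RtpHeader(
            sequence=(live_last.sequence + 1 + seq_offset) % SEQ_MOD,
            timestamp=(live_last.timestamp + ts_step + ts_offset) % TS_MOD,
            ssrc=live_last.ssrc,  # adopt the live stream's source identifier
        ))
    return rebased


recorded = [RtpHeader(65535, 100, 0x1111), RtpHeader(0, 3100, 0x1111)]
live_last = RtpHeader(10, 500_000, 0xABC)
print(rebase(recorded, live_last, ts_step=3000))
# sequences 11 and 12, timestamps 503000 and 506000, SSRC 0xABC
```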
Is it really Jesus at the door?
The possible attack scenarios are far too numerous to list. For example, imagine capturing footage of an Amazon delivery and later streaming it so that the owner opens the door remotely: a particularly easy burglary. Spying on the doorbell also allows gathering sensitive information such as household habits and the names and details of family members, including children. All of this makes the target easy prey for future exploitation. Letting in someone who only appears to be the babysitter while the kids are at home could be a life-threatening mistake.
The main takeaway from this research is that security is only as strong as its weakest link. Encrypting the upstream RTP traffic does not make forgery any harder if the downstream traffic is not secured, and encrypting the downstream SIP signalling does not prevent stream interception. When dealing with data as sensitive as a doorbell feed, secure transmission is not a feature but a must, as the average user will not be aware of potential tampering. We urge Ring to move to a secure scheme such as SRTP (Secure RTP) as soon as possible.
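SRTP would stop the injection because it authenticates every packet as well as encrypting it. The sketch below shows only the authentication step of RFC 3711's default transform: an HMAC-SHA1 over the RTP header and (encrypted) payload plus the 32-bit rollover counter (ROC), truncated to 80 bits. It omits AES encryption, key derivation, key exchange (e.g. DTLS-SRTP) and SRTP's replay window, so the all-zero key is a placeholder and not a usable configuration.

```python
import hashlib
import hmac

TAG_LEN = 10  # 80-bit tag, as in the AES_CM_128_HMAC_SHA1_80 profile


def srtp_auth_tag(auth_key: bytes, packet: bytes, roc: int) -> bytes:
    """HMAC-SHA1 over the RTP packet followed by the 32-bit rollover counter."""
    mac = hmac.new(auth_key, packet + roc.to_bytes(4, "big"), hashlib.sha1)
    return mac.digest()[:TAG_LEN]


def verify_srtp_tag(auth_key: bytes, protected: bytes, roc: int) -> bool:
    """Check the trailing tag in constant time; reject on any mismatch."""
    packet, tag = protected[:-TAG_LEN], protected[-TAG_LEN:]
    return hmac.compare_digest(srtp_auth_tag(auth_key, packet, roc), tag)


key = bytes(20)  # placeholder session authentication key
packet = b"\x80\x60" + bytes(10) + b"payload"
protected = packet + srtp_auth_tag(key, packet, roc=0)
print(verify_srtp_tag(key, protected, roc=0))  # True

forged = bytearray(protected)
forged[12] ^= 0x01  # flip one payload bit, as an injected frame would
print(verify_srtp_tag(key, bytes(forged), roc=0))  # False
```

Without the session key, an attacker on the network cannot produce valid tags. Injected or rebased packets are therefore dropped instead of played.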