Thor update. High Efficiency, Moderate Complexity Video Codec using only RF IPR


draft-fuldseth-netvc-thor-01
Steinar Midtskogen (Cisco)
IETF 94, Yokohama, JP, November 2015

IPR note
https://datatracker.ietf.org/ipr/2636/
If technology in this document is included in a standard adopted by IETF and any claims of any Cisco patents are necessary for practicing the standard, any party will have the right to use any such patent claims royalty-free under reasonable, non-discriminatory terms, including defensive suspension, to implement and fully comply with the standard.

Topics for this update
- Brief recap of the Thor design
- Changes since IETF93
  - Constrained low pass filter
  - Interpolated reference frames
  - Optimisation and SIMD support
- Updated compression performance

Design principles
- Moderate complexity to allow real-time implementation in software
- Favouring simplicity both in terms of computation and description
- Using techniques known to work well and improving on those
- Many similarities with H.26x
- Royalty free IPR

Encoder/decoder architecture
The same basic architecture as H.261, H.263, H.264 and H.265.
[Figure: encoder and decoder block diagrams. Encoder blocks: input video, transform, quantizer, entropy coding, output bitstream, inverse transform, intra frame prediction, inter frame prediction, motion estimation, loop filters, reconstructed frame memory. Decoder blocks: input bitstream, entropy decoding, inverse transform, intra frame prediction, inter frame prediction, loop filters, reconstructed frame memory, output video.]

Block Structure
- Super block (SB): 64x64
- Quad-tree split into coding blocks (CB) >= 8x8
- Multiple prediction blocks (PB) per CB
  - Intra: 1 PB per CB
  - Inter: 1, 2 (rectangular) or 4 (square) PBs per CB
- 1 or 4 transform blocks (TB) per CB

Coding-block modes
- Intra
- Inter0: MV index, no residual information
- Inter1: MV index, residual information
- Inter2: explicit motion vector information, residual information
- Bipred: explicit motion vector information (x2), residual information

Some differences from H.265
- Slightly shorter luma interpolation filter and a special non-separable filter for the (½, ½) position
- Fewer intra modes
- Simpler deblocking filter
- Simpler deringing filter
- VLC-based (non-arithmetic) entropy coding
- Temporally interpolated reference frames (never displayed)

Changes since IETF93/July 2015
- New constrained low pass filter
- Support for frame reordering
- Temporally interpolated reference frames (never displayed)
- Simplified 64x64 transform (32x32 and scaling)
- New filter coefficients
  - Different coefficients for uni-pred and bi-pred
- Various syntactic changes
- Major speed improvements (non-normative changes)
  - Motion estimation rewritten

New constrained low-pass filter
- An attempt to reduce the problem to a lookup table
- Create an index using the pixel to be filtered and its neighbours
- Comparisons with 8 neighbours give a relatively small table (256 entries):
  I = (A>X)·2⁰ + (B>X)·2¹ + (C>X)·2² + (D>X)·2³ + (E>X)·2⁴ + (F>X)·2⁵ + (G>X)·2⁶ + (H>X)·2⁷
- Pixel weights or offsets? Simplest: a 0 or 1 offset
Neighbourhood of the pixel X to be filtered:
  A B C
  D X E
  F G H
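The index computation on this slide can be sketched in C as follows. This is only an illustration of the formula; the function name `clpf_index` and the 8-bit pixel type are assumptions, not taken from the Thor source.

```c
#include <stdint.h>

/* Hypothetical helper (not from the Thor code base): builds the 8-bit
 * table index I from the pixel x and its eight neighbours. Each
 * comparison contributes one bit: A is bit 0, B is bit 1, ... H is bit 7. */
static int clpf_index(uint8_t x, uint8_t a, uint8_t b, uint8_t c, uint8_t d,
                      uint8_t e, uint8_t f, uint8_t g, uint8_t h)
{
    return ((a > x) << 0) | ((b > x) << 1) | ((c > x) << 2) | ((d > x) << 3) |
           ((e > x) << 4) | ((f > x) << 5) | ((g > x) << 6) | ((h > x) << 7);
}
```

With eight one-bit comparisons the index ranges over 0..255, which is where the 256-entry table comes from.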

New constrained low-pass filter
- An overnight script can create the table: make all tables consisting of 255 0's and one 1, and record all 1's that give an improvement.
- It turns out that B, D, E and G are important. Comparing with A, C, F and H (the diagonal neighbours) gives only very small gains.
- Initial experiments using both > and >= operators to create an index were not convincing, but this was not fully explored.
- Signalled offsets higher than 1 give small gains.

New constrained low-pass filter
A table with few 1's still giving most of the gain (256 entries in index order, 16 per row):
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
  0 0 1 0 0 0 0 0 1 0 1 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
  0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Equivalent to:
  X' = X + ((B>X)+(D>X)+(E>X)+(G>X) > 2)
Increase by one if at least three of the four neighbours (up, left, right, down) are larger.
Symmetry:
  X' = X + ((B>X)+(D>X)+(E>X)+(G>X) > 2) - ((B<X)+(D<X)+(E<X)+(G<X) > 2)
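A minimal C sketch of the symmetric formula above, for a single pixel. The function name `clpf_offset` is hypothetical; the arithmetic follows the slide directly.

```c
/* Hypothetical helper: returns the offset (-1, 0 or +1) to add to pixel x,
 * given its up (b), left (d), right (e) and down (g) neighbours. */
static int clpf_offset(int x, int b, int d, int e, int g)
{
    int larger  = (b > x) + (d > x) + (e > x) + (g > x);
    int smaller = (b < x) + (d < x) + (e < x) + (g < x);

    /* +1 if at least three neighbours are larger, -1 if at least three
     * are smaller. Both conditions cannot hold at the same time. */
    return (larger > 2) - (smaller > 2);
}
```

No clamping is needed for 8-bit pixels under this rule: the offset is +1 only when some neighbour is larger than X (so X is below the maximum), and -1 only when some neighbour is smaller.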

New constrained low-pass filter
- Pixels outside the frame or block border are given X's value
- Input pixels are always unfiltered to allow parallelism
- A very simple filter with a small memory footprint, very well suited for SIMD instructions
- Does not work well with bi-prediction, probably because the bi-predictive averaging itself is a low-pass filter
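The border rule and the "unfiltered input" rule can be illustrated with a plain C loop over a frame. This is a sketch under assumptions: the function name `clpf_frame`, the 8-bit samples and the separate source and destination buffers are illustrative choices, and only frame borders (not block borders) are handled.

```c
#include <stdint.h>

/* Hypothetical sketch: applies the symmetric constrained low-pass filter to
 * a whole frame. Reads only from src (unfiltered) and writes only to dst,
 * so every pixel can be processed independently of the others. */
static void clpf_frame(const uint8_t *src, uint8_t *dst,
                       int width, int height, int stride)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            int X = src[y * stride + x];
            /* Neighbours outside the frame take X's value, so they never
             * count as larger or smaller than X. */
            int B = y > 0          ? src[(y - 1) * stride + x] : X;
            int D = x > 0          ? src[y * stride + x - 1]   : X;
            int E = x < width - 1  ? src[y * stride + x + 1]   : X;
            int G = y < height - 1 ? src[(y + 1) * stride + x] : X;
            int up   = ((B > X) + (D > X) + (E > X) + (G > X)) > 2;
            int down = ((B < X) + (D < X) + (E < X) + (G < X)) > 2;
            dst[y * stride + x] = (uint8_t)(X + up - down);
        }
    }
}
```

Because no output pixel depends on another output pixel, the inner loop maps naturally onto SIMD lanes, which is the property the slide points out.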

New constrained low-pass filter
- We don't want to filter everything!
- A flag at superblock (64x64) level indicates whether to filter the block or not
  - Test using squared sum of differences
- Sub-blocks with no residual are not filtered
- Sub-blocks with bi-prediction are not filtered
- Superblocks with no residual or fully bi-predictive are implicitly unfiltered: no need to spend a bit for the flag
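One way an encoder could make the superblock decision is to compare the squared error of the filtered and unfiltered reconstructions against the original. The slide only states that the test uses a squared sum of differences, so the exact comparison below, and the names `ssd` and `clpf_choose_flag`, are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Sum of squared differences between two sample arrays of length n. */
static uint64_t ssd(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i++) {
        int diff = (int)a[i] - (int)b[i];
        sum += (uint64_t)(diff * diff);
    }
    return sum;
}

/* Hypothetical encoder-side decision: returns 1 (filter the superblock) if
 * the filtered reconstruction is strictly closer to the original than the
 * unfiltered reconstruction, otherwise 0. */
static int clpf_choose_flag(const uint8_t *orig, const uint8_t *rec,
                            const uint8_t *filtered, size_t n)
{
    return ssd(orig, filtered, n) < ssd(orig, rec, n);
}
```

In a real encoder the samples of sub-blocks that are exempt from filtering (no residual, bi-prediction) would be identical in `rec` and `filtered`, so they would not influence the comparison.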

New constrained low-pass filter
- Performs better than the previous filter and gives more consistent gains
- Subjective gains are larger than objective gains
- The objective gains at low bitrates are small, so there is still room for improvement

New constrained low-pass filter
Results with only uni-prediction:
  Sequence          BDR     BDR (low br)   BDR (high br)
  Kimono            -1.5%   -0.8%          -2.5%
  BasketballDrive   -2.9%   -1.6%          -4.5%
  BQTerrace         -6.6%   -3.8%          -8.0%
  FourPeople        -4.5%   -2.3%          -8.0%
  Johnny            -3.6%   -1.5%          -7.0%
  ChangeSeats       -4.7%   -1.9%          -8.3%
  HeadAndShoulder   -6.7%   -0.7%          -14.9%
  TelePresence      -2.9%   -1.0%          -5.8%
  Average           -4.2%   -1.7%          -7.4%

New constrained low-pass filter
Results with bi-prediction enabled:
  Sequence          BDR     BDR (low br)   BDR (high br)
  Kimono            -0.9%   -0.4%          -1.5%
  BasketballDrive   -1.2%   -0.8%          -1.5%
  BQTerrace         -1.6%   -1.1%          -2.0%
  FourPeople        -2.5%   -1.5%          -3.7%
  Johnny            -2.1%   -1.1%          -3.6%
  ChangeSeats       -2.5%   -1.2%          -4.1%
  HeadAndShoulder   -2.4%   -0.9%          -4.5%
  TelePresence      -1.5%   -0.2%          -3.5%
  Average           -1.8%   -0.9%          -3.1%

Interpolated reference frames
- Uses motion estimation between two frames to create a new reference frame
- For prediction only, never displayed (unless used to code a frame with no residual and no vectors)
- Motion estimation must be done in both the encoder and decoder
- Generally speeds up encoding (but not in the worst case), because we get a lot of skip blocks
- But adds complexity to the decoder

Interpolated reference frames
- Can be used for extrapolation (motion estimation between two past frames) and interpolation (between a past and a future frame, requires frame reordering)
- Only interpolation seems to give useful results
- Since the decoder has to perform the same motion estimation as the encoder, we need a fast and simple algorithm!

Interpolated reference frames
The typical case: two frames R0 and R1, and a frame F equidistant in time between them to be interpolated.
[Figure: frames R0, F and R1 in temporal order, with motion vectors mv0 and mv1.]

Interpolated reference frames
Both reference frames are repeatedly scaled down by a factor of ½ vertically and horizontally using the (½, ½) filter, up to 4 times (or until the frame cannot hold a 16x16 block).
[Figure: a 1920x1080 frame scaled down to 960x540, 480x270 and 240x135.]
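The layer sizes implied by this rule can be computed with a few lines of C. This sketch only derives the dimensions; it does not implement the (½, ½) downscaling filter. The names are hypothetical, and rounding odd dimensions down is an assumption, since the slide does not say how they are handled.

```c
#define MAX_DOWNSCALES 4  /* "up to 4 times" */
#define MIN_BLOCK 16      /* the frame must still hold a 16x16 block */

typedef struct {
    int width;
    int height;
} layer_dim;

/* Hypothetical helper: fills dims[0] with the full resolution and each
 * following entry with half the previous size (rounded down, an
 * assumption). Returns the number of layers, full resolution included. */
static int pyramid_dims(int width, int height,
                        layer_dim dims[MAX_DOWNSCALES + 1])
{
    int n = 0;
    dims[n].width = width;
    dims[n].height = height;
    n++;
    while (n <= MAX_DOWNSCALES) {
        int w = dims[n - 1].width / 2;
        int h = dims[n - 1].height / 2;
        if (w < MIN_BLOCK || h < MIN_BLOCK)
            break;  /* the next layer could not hold a 16x16 block */
        dims[n].width = w;
        dims[n].height = h;
        n++;
    }
    return n;
}
```

For a 1920x1080 input the first three downscaled layers come out as 960x540, 480x270 and 240x135, matching the sizes shown on the slide.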

Interpolated reference frames
- Start ME for the smallest frames and use the motion vectors found as search candidates for the higher layer
- For each layer, the stages are as follows:
  - For each 16x16 block in raster order:
    - Check if ME can be bypassed
    - If not, get candidates from the lower layer and neighbour blocks
    - Perform an adaptive cross search around each candidate vector and determine the best vector. Up to 16 steps at the lowest layer, else just 2.
  - For each 8x8 block in raster order, find the best merge candidate, i.e. use the original 16x16 block vector or one of the neighbouring block vectors

Interpolated reference frames
- Bypass prediction is used to stabilise the mv field (i.e. prevent accidental matches) and reduce complexity.
- mv1 (and its derived mv0) are computed from neighbouring blocks (like a candidate vector)
- For each 8x8 block S, calculate the SAD between S+mv0 in R0 and S+mv1 in R1 (luma and chroma).
- If all SADs are below a given threshold, further ME is bypassed
- Corresponds to early skip techniques used in encoders
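A simplified C sketch of the bypass test for one 16x16 block follows. It makes several assumptions that are not in the slides: luma only, integer-pel vectors, mv0 derived as the mirror of mv1 (reasonable when F is equidistant from R0 and R1), and all displaced blocks lying inside the frame. The names `mv_t`, `sad8x8` and `can_bypass_16x16` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

typedef struct {
    int x;
    int y;
} mv_t;

/* Sum of absolute differences between two 8x8 blocks sharing one stride. */
static unsigned sad8x8(const uint8_t *a, const uint8_t *b, int stride)
{
    unsigned sum = 0;
    for (int y = 0; y < 8; y++)
        for (int x = 0; x < 8; x++)
            sum += (unsigned)abs(a[y * stride + x] - b[y * stride + x]);
    return sum;
}

/* Returns 1 if every 8x8 sub-block S of the 16x16 block at (bx, by) has
 * SAD(S+mv0 in R0, S+mv1 in R1) below the threshold, so that further
 * motion estimation can be skipped. The caller must keep all displaced
 * blocks inside the frame. */
static int can_bypass_16x16(const uint8_t *r0, const uint8_t *r1, int stride,
                            int bx, int by, mv_t mv1, unsigned threshold)
{
    mv_t mv0 = { -mv1.x, -mv1.y };  /* assumption: mirrored vector */
    for (int sy = 0; sy < 16; sy += 8) {
        for (int sx = 0; sx < 16; sx += 8) {
            const uint8_t *p0 = r0 + (by + sy + mv0.y) * stride + (bx + sx + mv0.x);
            const uint8_t *p1 = r1 + (by + sy + mv1.y) * stride + (bx + sx + mv1.x);
            if (sad8x8(p0, p1, stride) >= threshold)
                return 0;
        }
    }
    return 1;
}
```

Requiring every sub-block to pass, rather than the 16x16 sum, keeps a single badly matched 8x8 region from being hidden by three good ones.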

Interpolated reference frames
- Adaptive cross search examines in each step 4 positions (left, right, up, down) with a displacement D. If none of them is better, divide D by two and try again. Otherwise, search again around the best position.
- D is 1 and the number of matches allowed is 8 (two steps), except at the lowest level where it is 64.
- The matching criterion (420 video):
  SAD(B0, B1) + 4*(SAD(U0, U1) + SAD(V0, V1)) + λ*mv_cost
  (B0 = b + mv0 in R0.luma, B1 = b + mv1 in R1.luma, etc.)
- mv_cost is a measure of the disparity between the mv and neighbour vectors. λ is fixed for each layer.
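The search loop and the matching criterion can be sketched in C as below. The cost is passed in as a callback so the sketch stays independent of frame storage; `cost_fn`, `cross_search`, `match_cost` and the toy `example_cost` are hypothetical names, and integer displacements are an assumption.

```c
typedef struct {
    int x;
    int y;
} mv_t;

/* Cost callback: returns the matching cost of a candidate vector. */
typedef unsigned (*cost_fn)(mv_t mv, void *ctx);

/* Matching criterion from the slide for 4:2:0 video, given precomputed
 * SADs: luma SAD + 4 * (U SAD + V SAD) + lambda * mv_cost. */
static unsigned match_cost(unsigned sad_y, unsigned sad_u, unsigned sad_v,
                           unsigned lambda, unsigned mv_cost)
{
    return sad_y + 4 * (sad_u + sad_v) + lambda * mv_cost;
}

/* Adaptive cross search: each step tests left, right, up and down at
 * displacement d. If nothing improves, d is halved; otherwise the search
 * continues around the new best position. Stops when d reaches zero or
 * the match budget is used up. */
static mv_t cross_search(mv_t start, int d, int max_matches,
                         cost_fn cost, void *ctx)
{
    static const int dx[4] = { -1, 1, 0, 0 };
    static const int dy[4] = { 0, 0, -1, 1 };
    mv_t best = start;
    unsigned best_cost = cost(start, ctx);
    int matches = 0;

    while (d > 0 && matches + 4 <= max_matches) {
        mv_t centre = best;
        int improved = 0;
        for (int i = 0; i < 4; i++) {
            mv_t cand = { centre.x + dx[i] * d, centre.y + dy[i] * d };
            unsigned c = cost(cand, ctx);
            matches++;
            if (c < best_cost) {
                best_cost = c;
                best = cand;
                improved = 1;
            }
        }
        if (!improved)
            d /= 2;
    }
    return best;
}

/* Toy cost for experimentation only: squared distance to a target vector
 * passed through ctx. A real cost would evaluate match_cost on the frames. */
static unsigned example_cost(mv_t mv, void *ctx)
{
    const mv_t *target = (const mv_t *)ctx;
    int ex = mv.x - target->x;
    int ey = mv.y - target->y;
    return (unsigned)(ex * ex + ey * ey);
}
```

With d = 1 and a budget of 8 matches the loop runs at most two steps, which corresponds to the "8 (two steps)" setting on the slide; a larger starting d and budget let the toy cost converge on its target.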

Interpolated reference frames
  Sequence          QP 22,27,32,37   QP 32,36,40,44
  Kimono            -3.5%            -6.6%
  ParkScene         -3.1%            -7.0%
  Cactus            -4.9%            -8.9%
  BasketballDrive   -2.1%            -5.5%
  BQTerrace         -1.9%            -4.7%
  ChangeSeats       -5.8%            -12.1%
  HeadAndShoulder   -6.6%            -10.1%
  TelePresence      -6.6%            -11.0%
  WhiteBoard        -7.5%            -12.4%
  FourPeople        -7.0%            -9.1%
  Johnny            -6.2%            -8.0%
  KristenAndSara    -7.0%            -9.9%
  Average           -5.2%            -8.8%

SIMD optimisations
- We need to verify that Thor is SIMD friendly and can compete with other optimised codecs
- SIMD: single instruction, multiple data
  - Supported by modern CPUs (x86: SSE2, SSE3, etc. and ARM: NEON)
  - Very useful for video processing
- Compilers are not (yet) good at redesigning code to match the instruction set
- Can we avoid having to maintain a separate set of functions for different architectures?

SIMD optimisations
- Thor's solution: an abstraction layer for intrinsics
- Most compilers offer intrinsics to support SIMD instructions in the C code. Let the compiler do the register allocation!
- The most used instructions in different architectures such as x86 and ARM are identical
- So the abstraction layer is mostly an instruction name translator
- Support for 64 and 128 bit wide operands
- Does not always give optimal code, but close enough

SIMD optimisations
- An example: add 16 pairs of bytes with a single instruction
  - x86/SSE2: _mm_add_epi8(a, b)
  - ARM/NEON: vaddq_u8(a, b)
  - Thor: v128_add_8(a, b)
- Thor supports many instructions, but not everything
- Support for x86 and ARM, and C implementations to ease porting to new architectures
- Kernels in both SIMD and plain C as fallback. Bitexact.
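How such a name-translating layer could look is sketched below for the single operation named on the slide. This is not Thor's actual header: the `v128` typedefs, the load/store helpers and the plain C fallback are written here as assumptions for illustration; only the three intrinsic and function names in the example come from the slide.

```c
#include <stdint.h>
#include <string.h>

#if defined(__SSE2__)
#include <emmintrin.h>
typedef __m128i v128;
static inline v128 v128_load_unaligned(const void *p)
{
    return _mm_loadu_si128((const __m128i *)p);
}
static inline void v128_store_unaligned(void *p, v128 a)
{
    _mm_storeu_si128((__m128i *)p, a);
}
static inline v128 v128_add_8(v128 a, v128 b) { return _mm_add_epi8(a, b); }

#elif defined(__ARM_NEON)
#include <arm_neon.h>
typedef uint8x16_t v128;
static inline v128 v128_load_unaligned(const void *p)
{
    return vld1q_u8((const uint8_t *)p);
}
static inline void v128_store_unaligned(void *p, v128 a)
{
    vst1q_u8((uint8_t *)p, a);
}
static inline v128 v128_add_8(v128 a, v128 b) { return vaddq_u8(a, b); }

#else
/* Plain C fallback with the same wrap-around behaviour per byte. */
typedef struct { uint8_t b[16]; } v128;
static inline v128 v128_load_unaligned(const void *p)
{
    v128 r;
    memcpy(r.b, p, 16);
    return r;
}
static inline void v128_store_unaligned(void *p, v128 a)
{
    memcpy(p, a.b, 16);
}
static inline v128 v128_add_8(v128 a, v128 b)
{
    v128 r;
    for (int i = 0; i < 16; i++)
        r.b[i] = (uint8_t)(a.b[i] + b.b[i]);
    return r;
}
#endif

/* Architecture-independent kernel written once against the wrapper:
 * adds 16 pairs of bytes (modulo 256). */
static void add_bytes_16(const uint8_t *a, const uint8_t *b, uint8_t *out)
{
    v128_store_unaligned(out, v128_add_8(v128_load_unaligned(a),
                                         v128_load_unaligned(b)));
}
```

The kernel `add_bytes_16` never mentions an architecture, which is the point of the abstraction: all three branches give the same bytes, so the SIMD and C paths stay bit-exact.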

SIMD optimisations

    void transpose8x8(const int16_t *src, int sstride, int16_t *dst, int dstride)
    {
      v128 i0 = v128_load_aligned(src + sstride*0);
      v128 i1 = v128_load_aligned(src + sstride*1);
      v128 i2 = v128_load_aligned(src + sstride*2);
      v128 i3 = v128_load_aligned(src + sstride*3);
      v128 i4 = v128_load_aligned(src + sstride*4);
      v128 i5 = v128_load_aligned(src + sstride*5);
      v128 i6 = v128_load_aligned(src + sstride*6);
      v128 i7 = v128_load_aligned(src + sstride*7);

      v128 t0 = v128_ziplo_16(i1, i0);
      v128 t1 = v128_ziplo_16(i3, i2);
      v128 t2 = v128_ziplo_16(i5, i4);
      v128 t3 = v128_ziplo_16(i7, i6);
      v128 t4 = v128_ziphi_16(i1, i0);
      v128 t5 = v128_ziphi_16(i3, i2);
      v128 t6 = v128_ziphi_16(i5, i4);
      v128 t7 = v128_ziphi_16(i7, i6);

      i0 = v128_ziplo_32(t1, t0);
      i1 = v128_ziplo_32(t3, t2);
      i2 = v128_ziplo_32(t5, t4);
      i3 = v128_ziplo_32(t7, t6);
      i4 = v128_ziphi_32(t1, t0);
      i5 = v128_ziphi_32(t3, t2);
      i6 = v128_ziphi_32(t5, t4);
      i7 = v128_ziphi_32(t7, t6);

      v128_store_aligned(dst + dstride*0, v128_ziplo_64(i1, i0));
      v128_store_aligned(dst + dstride*1, v128_ziphi_64(i1, i0));
      v128_store_aligned(dst + dstride*2, v128_ziplo_64(i5, i4));
      v128_store_aligned(dst + dstride*3, v128_ziphi_64(i5, i4));
      v128_store_aligned(dst + dstride*4, v128_ziplo_64(i3, i2));
      v128_store_aligned(dst + dstride*5, v128_ziphi_64(i3, i2));
      v128_store_aligned(dst + dstride*6, v128_ziplo_64(i7, i6));
      v128_store_aligned(dst + dstride*7, v128_ziphi_64(i7, i6));
    }

Input:
  00 01 02 03 04 05 06 07
  08 09 10 11 12 13 14 15
  16 17 18 19 20 21 22 23
  24 25 26 27 28 29 30 31
  32 33 34 35 36 37 38 39
  40 41 42 43 44 45 46 47
  48 49 50 51 52 53 54 55
  56 57 58 59 60 61 62 63
Output:
  00 08 16 24 32 40 48 56
  01 09 17 25 33 41 49 57
  02 10 18 26 34 42 50 58
  03 11 19 27 35 43 51 59
  04 12 20 28 36 44 52 60
  05 13 21 29 37 45 53 61
  06 14 22 30 38 46 54 62
  07 15 23 31 39 47 55 63
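For comparison, a plain C reference with the same signature makes the intended result of the SIMD kernel easy to check. The name `transpose8x8_c` is hypothetical; the slides state that plain C fallbacks exist but do not show one.

```c
#include <stdint.h>

/* Plain C reference: writes the transpose of the 8x8 block at src to dst.
 * Strides are given in elements, as in the SIMD version. */
static void transpose8x8_c(const int16_t *src, int sstride,
                           int16_t *dst, int dstride)
{
    for (int i = 0; i < 8; i++)
        for (int j = 0; j < 8; j++)
            dst[j * dstride + i] = src[i * sstride + j];
}
```

Running both versions on the same input and comparing the 64 outputs is a simple way to confirm that a SIMD kernel is bit-exact with its fallback.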

Performance, high delay
- Anchor: HM13.0 (HEVC reference software), random access without periodic I frames
- Thor: same constraints as the anchor
- VP9: -p 1 --cpu-used=0 --end-usage=q --cq-level=$q --auto-alt-ref=1 --disable-kf -y
- x265: -I -1 --no-wpp --tune psnr -p veryslow --qp $q
- Complexity: FourPeople at QP 32 on a single core
- Note: HM and Thor have fixed QP variation, x265 and VP9 adapt dynamically.

Performance, high delay
  Class     Sequence          Thor    VP9     x265
  Class B   Kimono            24.5%   49.3%   20.3%
            ParkScene         23.2%   45.4%   26.5%
            Cactus            17.5%   34.5%   17.2%
            BasketballDrive   31.3%   46.1%   13.3%
            BQTerrace         35.4%   51.5%   19.7%
  Class E   FourPeople        8.8%    13.8%   26.7%
            Johnny            16.2%   38.6%   28.4%
            KristenAndSara    7.3%    16.8%   23.0%
  Internal  ChangeSeats       20.9%   29.1%   18.3%
            HeadAndShoulder   10.9%   6.0%    21.0%
            TelePresence      22.6%   45.1%   20.0%
            WhiteBoard        15.6%   22.0%   24.9%
  Average                     19.5%   33.2%   21.6%

Frame rate
[Figure: frame rate vs. bandwidth. Vertical axis: frame rate, logarithmic scale from 0.01 to 100. Horizontal axis: bandwidth, 0% to 100%. Series: VP9, x265, H.265 ref, Thor-Nov.]

Performance, low delay
- Anchor: HM13.0 (HEVC reference software), low-delay B configuration
- Thor: same constraints as the anchor
- VP9: -p 1 --cpu-used=0 --end-usage=q --cq-level=$q --auto-alt-ref=0 --lag-in-frames=0 --disable-kf -y
- x265: -I -1 --no-wpp --bframes 0 --tune psnr -p veryslow --qp $q --qpfile $q.txt
- Complexity: FourPeople at QP 32 on a single core
- Note: HM, Thor and x265 have fixed QP variation, VP9 adapts dynamically.

Performance, low delay
  Class     Sequence          Thor    VP9     x265
  Class B   Kimono            16.1%   21.7%   14.1%
            ParkScene         19.5%   31.4%   16.4%
            Cactus            16.6%   26.6%   21.5%
            BasketballDrive   26.9%   32.9%   14.0%
            BQTerrace         31.8%   84.1%   44.9%
  Class E   FourPeople        6.6%    35.5%   22.5%
            Johnny            12.0%   66.9%   30.8%
            KristenAndSara    4.7%    36.9%   20.3%
  Internal  ChangeSeats       14.4%   20.5%   12.8%
            HeadAndShoulder   2.5%    59.8%   34.8%
            TelePresence      15.1%   25.3%   11.9%
            WhiteBoard        11.1%   43.8%   24.3%
  Average                     14.8%   40.5%   22.4%

Frame rate
[Figure: frame rate vs. bandwidth. Vertical axis: frame rate, logarithmic scale from 0.01 to 100. Horizontal axis: bandwidth, 0% to 100%. Series: VP9, x265, H.265 ref, Thor-July, Thor-Nov.]

Source Code
Available at: github.com/cisco/thor