A discussion topic you may see within the computer forensics (CF) industry is which hashing algorithm is adequate and/or appropriate for CF purposes. This discussion has been going on for years as different algorithms have become the standard. However, it became quite heated a few years ago when researchers generated MD5 collisions: they created two non-identical files that produced the same MD5 hash value, went on to create rogue CA certificates, and demonstrated other collision issues.

See: http://www.win.tue.nl/hashclash/rogue-ca/ and https://ad-pdf.s3.amazonaws.com/papers/wp.MD5_Collisions.en_us.pdf, among many others.

You will find many other similar discussions/debates online about this topic. More recently, NIST released the new SHA-3 Cryptographic Hash Standard. See: https://www.nist.gov/news-events/news/2015/08/nist…

What are your thoughts on algorithms for CF? Do you think one is more appropriate than the others for CF purposes, and why? Before you answer, keep a few things in mind.

Generally, the longer the hash value, the longer it takes your CF tools to generate it, and in the field, time is a critical consideration. Do the known collisions outweigh the fact that the chance of two arbitrary files sharing an MD5 hash is 1 in 2^128 (about 3.4 x 10^38, or 340 billion billion billion billion), and how does that compare to DNA validation statistics?

Keep in mind that no matter which algorithm you choose, there is always a finite number of possible hash values but an infinite number of possible inputs, so there will ALWAYS be the possibility of a collision. Does it matter that there are already millions of MD5 hash values in databases of known files, or should we regenerate them using SHA-1, SHA-256, or another algorithm?
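If you want some numbers of your own before answering, here is a rough Python sketch using only the standard hashlib module. It times a few algorithms over the same data and estimates the odds of an accidental collision in a hash database. The names digest_stream, compare_algorithms, and accidental_collision_probability are made up for this illustration, the 16 MiB zero-filled buffer is only a stand-in for real evidence data, and the probability is the usual birthday approximation, which covers accidental collisions only and says nothing about deliberately crafted ones. Timings will vary by machine and by hashing library, so treat them as a rough guide, not a benchmark.

```python
import hashlib
import io
import time

# Illustrative selection only; MD5 may be disabled on FIPS-restricted systems.
ALGORITHMS = ("md5", "sha1", "sha256", "sha3_256")


def digest_stream(stream, algorithm, chunk_size=1 << 20):
    """Hash a binary stream in chunks so a large image never sits fully in memory."""
    hasher = hashlib.new(algorithm)
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        hasher.update(chunk)
    return hasher.hexdigest()


def compare_algorithms(data, algorithms=ALGORITHMS):
    """Return {name: (hex digest, digest size in bits, seconds taken)}."""
    results = {}
    for name in algorithms:
        start = time.perf_counter()
        digest = digest_stream(io.BytesIO(data), name)
        elapsed = time.perf_counter() - start
        # Each hex character encodes 4 bits of the digest.
        results[name] = (digest, len(digest) * 4, elapsed)
    return results


def accidental_collision_probability(n_files, bits):
    """Birthday approximation: number of file pairs divided by number of hash values.

    Only meaningful while the result is far below 1, and only for
    accidental collisions, not adversarially constructed ones.
    """
    pairs = n_files * (n_files - 1) // 2
    return pairs / 2 ** bits


if __name__ == "__main__":
    sample = b"\x00" * (16 * 1024 * 1024)  # 16 MiB stand-in for evidence data
    for name, (digest, bits, seconds) in compare_algorithms(sample).items():
        print(f"{name:9s} {bits:4d} bits  {seconds:.3f} s  {digest[:16]}...")

    print(f"Possible MD5 values: {2 ** 128}")
    p = accidental_collision_probability(10 ** 6, 128)
    print(f"Chance of any accidental MD5 collision among 1,000,000 files: {p:.2e}")
```

Notice that the second function looks at every pair of files in a database, not just one pair, which is the more relevant figure when you think about those millions of stored MD5 values.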

