Use Case
I want to know programmatically whether listings posted at Mudah.My are similar or not even though it is posted by different persons and different date.
To do this, I think the best way is to detect whether the photos are similar or not.
Why To Know Same Property Is Advertised Over Period of Time?
I would like to know whether the property
1) price change over time, signalling time to purchase it
2) possibility owner becomes desperate to let it go if advertised for quite some time. So I can get better price
Same Property But Advertised by Different Agents and Different Date
All listings refer to same property by evaluating using naked eyes.
So now, I want to detect programmatically that all the listings are referring to same property by detecting those photos are similar.
Listing | Agent | Posted Date |
---|---|---|
Listing 1 URL Price: RM260,000 | Dilla | 05/07/2019 |
Listing 2 URL Price: RM260,000 | Aiman | 05/07/2019 |
Listing 3 URL Price:RM260,000 | Norhayati | 05/07/2019 |
Listing 4 URL Price:RM260,000 | Fahana | 05/07/2019 |
Listing 5 URL Price:RM260,000 | Fazri | 22/07/2019 |
Listing 6: URL Price:RM270,000 | Marina | 12/07/2019 |
Kitchen Photos





Bedroom Images


Technique Used
-
- Step 1: Fingerprinting the Photos
Fingerprinting the photos is using image hashing technique. In this case, DHash will be used.
-
- Step 2: Compare the Photos
After fingerprinting, image hash will be compared among listings. If the image hash are same or Levenshtein distance is less or equal to 2, then we can consider as the listings are referring to same property.
Results
Photos | Size | Noticeable Features | Image Hash (DHash) |
---|---|---|---|
Listing 1 - Kitchen | 18KB 480x480 | Kitchen - listing 1 ,listing 3 and listing 6 are similar | cc6c7a727e7c3e77 |
Listing 2 - Kitchen | 19KB 640x480 | Has door and wall fan | 3f37333333333339 |
Listing 3 - Kitchen | 18KB 480x480 | cc6c7a727e7c3e77 | |
Listing 4 - Kitchen | 17KB 480x480 | Kitchen - Listing 4 color is lighter compared to listing 1, 3 & 6 | cc6e7a727e7c7e77 |
Listing 6 - Kitchen | 18KB 480x480 | cc6c7a727e7c3e77 | |
Listing 2 - Bedroom | 22KB 640x480 | similar with bedroom listing 5 only size is different. | e1b90d0c8ccc84c1 |
Listing 5 - Bedroom | 10KB 320x240 | e1b52d0c8ccc84c1 |
Kitchen Photos
If we take Listing 1 as a base, we can say easily that it has same image hash with Listing 3 & Listing 6.
Listing 1 has Levenshtein distance of 2 with Listing 4.
Listing 1 has Levenshtein distance of 15 with Listing 5.
Bedroom Photos
Listing 2 and Listing 5 has Levenshtein distance of 2.
Conclusion
We can conclude photos are similar if their image hash is the same or their Levenshtein distance is less or equal to 2.
False Positive
What happens if different properties use same photos such as signage or building block? The algorithm will detect it as same property even though it is not.
To avoid this, we should establish a database of signage or building block to remove this false positive.
Photos Example


References:
Fingerprinting Images For Near Duplicate Detection
Python Code & Images Used
DHash Algorithm
Listing 1
Listing 2
Listing 3
Listing 4
Listing 5
Listing 6