Mydin Outlets Web Scraping – How To Get Hidden Data API Embedded in Google Map

This time, I would like to get the Mydin Malaysia outlets that are displayed on top of a Google Map. I tried using a scraper tool, but it always gave me an empty result.

So I had to find the hidden outlets API to retrieve them. I’m using Google Chrome to find the hidden data API.

Steps

1) Go to https://www.mydin.com.my/stores/store-locator

2) Hover over the “Find a store near” panel, right-click, then click “Inspect”.

mydin outlets location inspect

3) A new panel will open at the bottom or right side of your Chrome browser. Click the Network tab. The tab will be empty.

mydin outlets inspect network empty

4) Refresh the page and the tab will be populated with the requested files and their types.

mydin outlets xhr

5) Sort by type and look for the “xhr” type.

mydin outlets api prettified

Look for ProcessAjaxRequest

XHR stands for XMLHttpRequest, an API in the form of an object whose methods transfer data between a web browser and a web server. Despite its name, it can carry JSON as well as XML.

6) Double-click the request to open it in the browser and see its full URL.

mydin outlets api full view

7) Write code that reads the API and parses the JSON data

I’m using NodeJS and saving the result as a CSV file.

const rp = require('request-promise');
const createCsvWriter = require('csv-writer').createObjectCsvWriter;

(async function() {
  // Fetch the raw JSON text from the hidden store API
  const body = await rp("https://www.mydin.com.my/base/ajax/ProcessAjaxRequest?action=getAllStores&_=1590477828561");
  // The list of stores sits under Response.Stores
  const records = JSON.parse(body).Response.Stores;

  // Map the fields of each store to CSV columns
  const csvWriter = createCsvWriter({
    path: '/path-to-save/mydin.csv',
    header: [
        {id: 'StoreExtId', title: 'ID'},
        {id: 'StoreName', title: 'Name'},
        {id: 'AddressLine1', title: 'Address 1'},
        {id: 'AddressLine2', title: 'Address 2'},
        {id: 'City', title: 'City'},
        {id: 'State', title: 'State'},
        {id: 'Latitude', title: 'Latitude'},
        {id: 'Longitude', title: 'Longitude'}
    ]
  });

  // Write one CSV row per store
  await csvWriter.writeRecords(records);
}());
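
For comparison, here is a minimal Python sketch of the same parse-and-write step, using only the standard library. It assumes the response has the Response.Stores shape described above. The function name stores_to_csv and the sample payload are made up for illustration; the sample is not real Mydin data, and the column headers here are the raw field names rather than the titles used in the NodeJS version.

```python
import csv
import io
import json

# Field names taken from the store objects in the API response
FIELDS = ["StoreExtId", "StoreName", "AddressLine1", "AddressLine2",
          "City", "State", "Latitude", "Longitude"]


def stores_to_csv(body):
    """Turn the API's JSON text into CSV text, one row per store."""
    stores = json.loads(body)["Response"]["Stores"]
    out = io.StringIO()
    # Missing fields become empty cells; fields not listed in FIELDS are skipped
    writer = csv.DictWriter(out, fieldnames=FIELDS,
                            extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for store in stores:
        writer.writerow(store)
    return out.getvalue()


# Hypothetical sample shaped like the response (not real Mydin data)
sample = json.dumps({"Response": {"Stores": [
    {"StoreExtId": "X1", "StoreName": "Sample Store",
     "City": "Sample City", "Extra": "ignored"}
]}})

print(stores_to_csv(sample))
```

To use it against the live API, the JSON text would come from an HTTP call instead of the sample string, and the returned CSV text would be written to a file.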

8) Sample CSV Output

mydin outlets csv

References:

XHR Explanation

NodeJS Reading and Writing to WordPress PHP via Rest API

I developed a NodeJS program that reads from and writes to the WordPress REST API. I had trouble getting the data to be read and written consistently between NodeJS and WordPress.

NodeJS Writes JSON Objects to WordPress Rest API

Any JSON array, whether String[] or Object[], is saved by WordPress as a PHP serialized string.

Surprisingly, when NodeJS reads data that was saved as a serialized string, it receives it as a JSON object without needing to call JSON.parse().

A plain JSON Object, however, is saved by WordPress as a string that still follows the JSON format.

Because the JSON Object is saved as a string, NodeJS has to call JSON.parse() to convert it back into a JSON Object when it reads the data via the WordPress REST API.

Examples of JSON Object saved by WordPress Rest API

NodeJS JSON String[]: 
[ '019-3312 493' ]

Saved in WordPress PHP MySQL
as PHP serialized string: a:1:{i:0;s:12:"019-3312 493";}

NodeJS JSON Object[]: 
[ { day: 'monday', open: '8:00 AM', close: '10:00 PM' },
  { day: 'tuesday', open: '8:00 AM', close: '10:00 PM' },
  { day: 'wednesday', open: '8:00 AM', close: '10:00 PM' },
  { day: 'thursday', open: '8:00 AM', close: '10:00 PM' },
  { day: 'friday', open: '8:00 AM', close: '10:00 PM' },
  { day: 'saturday', open: '8:00 AM', close: '10:00 PM' },
  { day: 'sunday', open: '8:00 AM', close: '10:00 PM' } ]

Saved as PHP serialized string in WordPress PHP MySQL

 a:7:{i:0;a:3:{s:3:"day";s:6:"monday";s:4:"open";s:7:"8:00 AM";s:5:"close";s:8:"10:00 PM";}i:1;a:3:{s:3:"day";s:7:"tuesday";s:4:"open";s:7:"8:00 AM";s:5:"close";s:8:"10:00 PM";}i:2;a:3:{s:3:"day";s:9:"wednesday";s:4:"open";s:7:"8:00 AM";s:5:"close";s:8:"10:00 PM";}i:3;a:3:{s:3:"day";s:8:"thursday";s:4:"open";s:7:"8:00 AM";s:5:"close";s:8:"10:00 PM";}i:4;a:3:{s:3:"day";s:6:"friday";s:4:"open";s:7:"8:00 AM";s:5:"close";s:8:"10:00 PM";}i:5;a:3:{s:3:"day";s:8:"saturday";s:4:"open";s:7:"8:00 AM";s:5:"close";s:8:"10:00 PM";}i:6;a:3:{s:3:"day";s:6:"sunday";s:4:"open";s:7:"8:00 AM";s:5:"close";s:8:"10:00 PM";}}


NodeJS JSON Object:
  { open24h: true,
    hasbreakfast: true,
    hasdrivethru: false,
    delivery: true,
    selfcollect: true,
    haswifi: true }

Saved as string that follows JSON Object format in WordPress PHP MySQL
{"open24h":true,"hasbreakfast":true,"hasdrivethru":false,"delivery":true,"selfcollect":true,"haswifi":true}

How To Have Consistency?

Because JSON arrays are saved as PHP serialized strings and JSON objects are saved as JSON strings, I need to do extra checking or conversion by calling JSON.parse() when reading.

Another possible solution is to call JSON.stringify() on JSON arrays so that they are also saved as strings in PHP. However, it is still an extra step, and it doesn’t remove the inconsistency described above.

So, in short, I found no solution that gives consistency for free; one of the extra steps above is still needed.
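
The extra check can at least live in one small helper. Below is a minimal sketch of the idea in Python; the same test works in NodeJS with typeof value === 'string' and JSON.parse(). The helper name normalize_field is my own, and it assumes a field arrives either already decoded (list or dict) or as a string holding JSON text.

```python
import json


def normalize_field(value):
    """Return decoded data whether the field arrived decoded or as JSON text."""
    if not isinstance(value, str):
        return value  # already a list or dict, nothing to do
    try:
        decoded = json.loads(value)
    except json.JSONDecodeError:
        return value  # an ordinary string such as a phone number
    # Only accept containers, so a string like "123" is not turned into a number
    return decoded if isinstance(decoded, (dict, list)) else value


phones = normalize_field(["019-3312 493"])
features = normalize_field('{"open24h":true,"hasdrivethru":false}')
print(phones)    # ['019-3312 493']
print(features)  # {'open24h': True, 'hasdrivethru': False}
```

With this, the calling code can treat every field the same way without caring how WordPress stored it.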

PHP Serialize String and JSON Object

I didn’t really understand the difference between a PHP serialized string and a JSON object.

I noticed that PHP stores data in the database in PHP serialized string format. This format is specific to PHP, so it is not directly interoperable with NodeJS.

So I did a simple comparison to understand it better, so that my NodeJS program can read and write PHP data and vice versa.

PHP Array Conversion to PHP Serialize String and JSON Object

//convert PHP array to serialize string and JSON object
$array = array( '1' => 'elem 1', '2'=> 'elem 2', '3'=>' elem 3');


$serialized = serialize($array);
print_r($serialized);
//expected output - serialize string: a:3:{i:1;s:6:"elem 1";i:2;s:6:"elem 2";i:3;s:7:" elem 3";}


$json = json_encode($array);
print_r($json);
//expected output - JSON object: {"1":"elem 1","2":"elem 2","3":" elem 3"}

//convert back to PHP array

$unserialized = unserialize($serialized);
print_r($unserialized);
//expected output - unserialized: Array ( [1] => elem 1 [2] => elem 2 [3] => elem 3 )

$decoded1 = json_decode($json);
print_r($decoded1);
//expected output - decoded into PHP object: stdClass Object ( [1] => elem 1 [2] => elem 2 [3] => elem 3 )


$decoded2 = json_decode($json, true);
print_r($decoded2);
//expected output - decoded into PHP array: Array ( [1] => elem 1 [2] => elem 2 [3] => elem 3 )

PHP Object Conversion to PHP Serialize String and JSON Object

$object = new stdClass();
$object->name = 'Here we go';
$object->message = 'Hello world';


$serialized = serialize($object);
print_r($serialized);
//expected output - serialize string: O:8:"stdClass":2:{s:4:"name";s:10:"Here we go";s:7:"message";s:11:"Hello world";}
 

$json = json_encode($object);
print_r($json);
//expected output - json object: {"name":"Here we go","message":"Hello world"}

$unserialized = unserialize($serialized);
print_r($unserialized);
//expected output - unserialized: stdClass Object ( [name] => Here we go [message] => Hello world )

$decoded = json_decode($json);
print_r($decoded);
//expected output - decoded JSON: stdClass Object ( [name] => Here we go [message] => Hello world )

$decoded2 = json_decode($json, true);
print_r($decoded2);
//expected output - decoded JSON: Array ( [name] => Here we go [message] => Hello world )

PHP Array of Object Conversion to PHP Serialize String and JSON Object

$object1 = new stdClass();
$object1->color = 'blue';
$object1->type = 'suv';

$object2 = new stdClass();
$object2->color = 'white';
$object2->type = 'mpv';

$arrObjects = array($object1, $object2);


$serialized = serialize($arrObjects);
print_r($serialized);
//expected output - serialize: a:2:{i:0;O:8:"stdClass":2:{s:5:"color";s:4:"blue";s:4:"type";s:3:"suv";}i:1;O:8:"stdClass":2:{s:5:"color";s:5:"white";s:4:"type";s:3:"mpv";}}


$json = json_encode($arrObjects);
print_r($json);
//expected output - JSON: [{"color":"blue","type":"suv"},{"color":"white","type":"mpv"}]
 

$unserialized = unserialize($serialized);
print_r($unserialized);
//expected output - unserialize: Array ( [0] => stdClass Object ( [color] => blue [type] => suv ) [1] => stdClass Object ( [color] => white [type] => mpv ) )


$decoded = json_decode($json);
print_r($decoded);
//expected output - decoded: Array ( [0] => stdClass Object ( [color] => blue [type] => suv ) [1] => stdClass Object ( [color] => white [type] => mpv ) )


$decoded2 = json_decode($json, true);
print_r($decoded2);
//expected output - decoded with true: Array ( [0] => Array ( [color] => blue [type] => suv ) [1] => Array ( [color] => white [type] => mpv ) )
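
To make the serialized format easier to read, here is a minimal Python sketch that builds the same text for a small subset of values: booleans, integers, strings, lists and dicts. It is not a full implementation of PHP’s serialize() (objects such as stdClass and floats are left out), and the function name php_serialize is my own.

```python
def php_serialize(value):
    """Encode a small subset of values in PHP's serialize() text format."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int
        return f"b:{int(value)};"
    if isinstance(value, int):
        return f"i:{value};"
    if isinstance(value, str):
        # PHP counts the length of a string in bytes, not characters
        return f's:{len(value.encode("utf-8"))}:"{value}";'
    if isinstance(value, (list, tuple)):
        items = list(enumerate(value))  # arrays get integer keys 0..n-1
    elif isinstance(value, dict):
        items = list(value.items())     # associative arrays keep their keys
    else:
        raise TypeError(f"unsupported type: {type(value).__name__}")
    body = "".join(php_serialize(k) + php_serialize(v) for k, v in items)
    return f"a:{len(items)}:{{{body}}}"


print(php_serialize(["019-3312 493"]))
# a:1:{i:0;s:12:"019-3312 493";}
print(php_serialize({1: "elem 1", 2: "elem 2", 3: " elem 3"}))
# a:3:{i:1;s:6:"elem 1";i:2;s:6:"elem 2";i:3;s:7:" elem 3";}
```

Reading the output left to right: a:3 is an array with three entries, i:1 is an integer key, and s:6:"elem 1" is a string of 6 bytes.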

Web Scraping – How To Get Hidden Data API that is Embedded in Google Map

I would like to get the KFC Malaysia outlets that are displayed on top of a Google Map. I tried using a scraper tool, but it always gave me an empty result.

So I had to find the hidden outlets API to retrieve them. I’m using Google Chrome to find the hidden data API.

Steps

1) Go to https://kfc.com.my/find-a-kfc/

2) Click on an outlet to view its detailed information.

kfc outlets location

3) Right click on the outlet’s information box and click Inspect.

kfc outlets inspect

4) A new panel will open at the bottom or right side of your Chrome browser. Click the Network tab. The tab will be empty.

kfc inspect network blank

5) Refresh the page and the tab will be populated with the requested files and their types.

6) Sort by type and look for the “fetch” type.

identify data api by looking for fetch type

7) Click on a file link and a new panel will appear. Click on the “Response” tab.

Keep looking until you find a response in JSON format with outlet information. In KFC’s case, the file name is store?xxxxxxxx (the x’s denote numbers).

kfc outlets api response

8) You can also click the “Preview” tab to see the JSON prettified.

kfc outlets api preview

9) Once you have found the file that provides the outlets data, right-click on it and choose Copy -> Copy link address.

get the outlets api by copying link address

10) Paste the link into a new browser tab to see the response

  • In KFC’s case it is https://kfc.com.my/api/v2/store?1588173941864
  • You will see the response as below
kfc hidden outlets api response
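
The number after the ? looks like a Unix timestamp in milliseconds, probably appended by the page so that the browser does not serve a cached response. That is my assumption; the site does not document it. A minimal Python sketch to decode it (the function name query_as_datetime is my own):

```python
from datetime import datetime, timezone
from urllib.parse import urlparse


def query_as_datetime(url):
    """Read a URL whose whole query string is a millisecond timestamp."""
    millis = int(urlparse(url).query)
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc)


captured = query_as_datetime("https://kfc.com.my/api/v2/store?1588173941864")
print(captured.date())  # 2020-04-29
```

The decoded date matches the day the link was copied, which supports reading it as a timestamp rather than as an ID.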

11) Write code that reads the API and parses the JSON data

  • I’m using NodeJS and saving the result as a CSV file.
const rp = require('request-promise');
const createCsvWriter = require('csv-writer').createObjectCsvWriter;

(async function() {
// Fetch the raw JSON text, then parse it into an array of outlet objects
const records = await rp("https://kfc.com.my/api/v2/store?1588173941864");
const rows = JSON.parse(records);
const csvWriter = createCsvWriter({
    path: '/path-to-save/kfc.csv',
    header: [
        {id: 'id', title: 'ID'},
        {id: 'name', title: 'Name'},
        {id: 'address', title: 'Address'},
        {id: 'phone', title: 'Phone'},
        {id: 'open24h', title: 'Open 24 Hr'},
        {id: 'hasbreakfast', title: 'Has Breakfast'},
        {id: 'hasdrivethru', title: 'Has Drive Thru'},
        {id: 'delivery', title: 'Delivery'},
        {id: 'selfcollect', title: 'Self Collect'},
        {id: 'haswifi', title: 'Has Wifi'},
        {id: 'weekdayopen', title: 'Weekday Open'},
        {id: 'weekdayclose', title: 'Weekday Close'},
        {id: 'weekendopen', title: 'Weekend Open'},
        {id: 'weekendclose', title: 'Weekend Close'},
        {id: 'latitude', title: 'Latitude'},
        {id: 'longitude', title: 'Longitude'}
    ]
});

  await csvWriter.writeRecords(rows);
}());

12) Sample CSV Output

kfc sample csv output
  • From the CSV, KFC Malaysia has 712 outlets as of 29/04/2020.

php.ini File Location at MacOS Catalina

Disable PHP Warning Message

I wanted to disable the warning messages from my PHP installation. To do so, I needed to configure my php.ini file. I searched the internet but couldn’t find a direct answer on where to change php.ini on MacOS Catalina.

php warning appears

Location of php.ini File at MacOS Catalina

On MacOS Catalina, php.ini is located in /etc/

You can verify this by printing phpinfo() and looking for the “Configuration File (php.ini) Path” and “Loaded Configuration File” rows.

phpinfo location of php.ini

On my MacOS Catalina, the php.ini path is /etc/. As you can see, no php.ini file is loaded (None).

Load Your Own php.ini File

To load your own php.ini, copy the default file into your new php.ini

sudo cp /etc/php.ini.default /etc/php.ini

Open php.ini. In this case I’m using Atom to open it.

sudo atom /etc/php.ini

Then change the PHP error reporting setting to

error_reporting = E_ERROR

Save it, then restart Apache.

sudo apachectl restart

Then check that the php.ini file is loaded correctly by printing phpinfo(). It will show /etc/php.ini in the “Loaded Configuration File” row.

phpinfo loaded php ini at macos catalina

Or you can run php --ini to see the loaded configuration file.

macos php ini command

When you refresh the website, the PHP warning message will disappear.

php warning message disappear

What is Python Numpy Array Dimension or Axis?

I’m a beginner in Python and NumPy. Most tutorials I found seem to be written for experts and don’t really explain the basics.

Even understanding what an axis represents in a NumPy array is difficult.

I had to read a few tutorials and try things out myself before I really understood it.

I will update this post as my knowledge grows.

1. Numpy Array Properties

1.1 Dimension

It is important to know an array’s dimensions because operations such as concatenation are applied along an axis, that is, along one of the array’s dimensions.

python array and axis - source oreilly

Row – in Numpy it is called axis 0

Columns – in Numpy it is called axis 1

Depth – in Numpy it is called axis 2

Python Example

import numpy as np

# Array with 1 dimension
A = np.array([1])
B = np.array([1,2])

print("A: ", A)
print("A dimensions: ", A.ndim)

print("B: ", B)
print("B dimensions: ", B.ndim)

# Array with 2 dimensions

C = np.array([[1,2], [3,4], [5,6]])

print("C: ", C)
print("C dimensions: ", C.ndim)

# Array with 3 dimensions

D = np.array([[[1,2], [3,4], [5,6]]])
print("D: ", D)
print("D dimensions: ", D.ndim)

Output

A:  [1]
A dimensions:  1
B:  [1 2]
B dimensions:  1
C:  [[1 2]
 [3 4]
 [5 6]]
C dimensions:  2
D:  [[[1 2]
  [3 4]
  [5 6]]]
D dimensions:  3
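
To show why the axis matters for concatenation, here is a small sketch that reuses array C from above. Arrays E and F are made up for illustration. Joining along axis 0 adds rows, and joining along axis 1 adds columns.

```python
import numpy as np

C = np.array([[1, 2], [3, 4], [5, 6]])  # shape (3, 2): 3 rows, 2 columns
E = np.array([[7, 8]])                  # shape (1, 2): one extra row
F = np.array([[0], [0], [0]])           # shape (3, 1): one extra column

# axis=0 stacks along the rows, so the column counts must match
rows_added = np.concatenate([C, E], axis=0)
# axis=1 stacks along the columns, so the row counts must match
cols_added = np.concatenate([C, F], axis=1)

print(rows_added.shape)  # (4, 2)
print(cols_added.shape)  # (3, 3)
```

Swapping the two axis values raises an error, because the shapes no longer line up on the other dimension.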


References

https://www.datacamp.com/community/tutorials/python-numpy-tutorial
https://www.oreilly.com/library/view/elegant-scipy/9781491922927/ch01.html

WebDriverIO Version 5 vs Version 4 Differences

I use the WebDriverIO library in my programs. Recently I upgraded from version 4 to version 5. To my horror, there were many breaking changes and not much explanation on the internet. So I wrote down some of the differences.

WebDriverIO – Version 4

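  const webdriverio = require('webdriverio');

  const options = {
    desiredCapabilities: {
        browserName: 'firefox'
    }
  };

  // Version 4: remote() returns the client directly; init() starts the session
  const browser = webdriverio.remote(options);
  await browser.init();

  const selector = "a";
  const attribute = "href";

  // Commands are called on the browser and take the selector as an argument
  let results = await browser.getAttribute(selector, attribute);
  results = await browser.getHTML(selector);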

WebDriverIO – Version 5

  const webdriverio = require('webdriverio');

  const options = {
    capabilities: {
        browserName: 'firefox'
    }
  };

  // Version 5: remote() returns a promise and starts the session itself
  const browser = await webdriverio.remote(options);

  const selector = "a";
  const attribute = "href";

  // Select the element first, then call commands on the element
  const element = await browser.$(selector);
  let results = await element.getAttribute(attribute);
  results = await element.getHTML();

The Difference

  1. In version 5, the desiredCapabilities option is renamed to capabilities.
  2. In version 5, remote() must be awaited, and there is no need to call init() anymore.
  3. In version 5, use $(selector) to select an element before getting its content or attributes.

Reference

Breaking Change WebDriverIO from Version 4 to Version 5

WebDriver version 5 Release Announcement

Node Red Set POST Parameters for HTTP Request

It is easy to set POST parameters for HTTP request.

The Flow Diagram

node red set post parameters for http request flow diagram

Steps To Set POST Parameters

1) Inject Node

– Payload – can be set to anything. In my case I use timestamp

node red inject node

2) Function Node

– Set the POST parameter values here
– msg.headers must set the content type.
– All the POST parameters are set inside msg.payload, and the parameter names must match those expected by the RESTful API that you query.

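// Tell the HTTP Request node to send the payload as form data
msg.headers = {
    'Content-Type': 'application/x-www-form-urlencoded'
};
// POST parameters; the names must match what the API expects
msg.payload = {
    'ID': 1159
};
// Pass the message on to the HTTP Request node
return msg;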
node red - function node setting post parameters for http request
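
With the application/x-www-form-urlencoded content type, the payload object should go out as a form-encoded request body. The Python sketch below only illustrates what that encoding looks like for the payload above; it does not run inside Node Red, and the second payload with a “name” parameter is hypothetical.

```python
from urllib.parse import urlencode

# The payload set in the Function Node
payload = {"ID": 1159}
body = urlencode(payload)
print(body)  # ID=1159

# Hypothetical payload with a second parameter that contains a space
more = urlencode({"ID": 1159, "name": "Kuala Lumpur"})
print(more)  # ID=1159&name=Kuala+Lumpur
```

This is why the parameter names must match the API: they are sent as the keys of the form body.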

3) HTTP Request Node

– Choose method: POST
– Set the URL to the corresponding RESTful endpoint.

node red - http request node properties

4) Debug Node

– Used to display the output

node red http request post output

Notes:

Node Red version used – v0.20.7

Node Red Pass GET Parameters for HTTP Request

It is easy to set GET parameters for HTTP request. You need to use the mustache brackets inside the URL itself.

The Flow Diagram

node red http request flow diagram

Steps To Set Get Parameters

1) Inject Node

– Payload – can be set to anything. In my case I use timestamp

node red inject node

2) Function Node

– Set the GET parameter values here
– In my case I use the parameter names “path” and “limit”, set as msg.path and msg.limit
– You can use any name for your parameters, but it must be the same as in the HTTP Request Node

node red function node set get http request parameters

3) HTTP Request Node

– Choose method: GET
– In the URL, use mustache brackets to declare your variable, for example {{{path}}}
– The parameter name must be same as in the Function Node

node red http request node
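
To picture what the mustache brackets do, here is a minimal Python sketch that mimics the substitution. It only imitates the idea and is not how Node Red implements it. The URL, the values of path and limit, and the function name render_url are all made up for illustration.

```python
import re


def render_url(template, msg):
    """Replace each {{{name}}} placeholder with the matching msg property."""
    return re.sub(r"\{\{\{(\w+)\}\}\}",
                  lambda match: str(msg[match.group(1)]),
                  template)


# Hypothetical values, as they would be set in the Function Node
msg = {"path": "outlets", "limit": 10}
url = render_url("https://example.com/api/{{{path}}}?limit={{{limit}}}", msg)
print(url)  # https://example.com/api/outlets?limit=10
```

If a name in the URL has no matching msg property, this sketch raises a KeyError, which mirrors why the names must be the same in both nodes.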

4) Debug Node

– Used to display the output

node red debug output with GET parameter value

Notes:

Node Red version used – v0.20.7

Node Red Set Parameters HTTP Request GET – Flow File Example

Detecting Similar Images

Use Case

I want to know programmatically whether listings posted on Mudah.My refer to the same property, even when they are posted by different people on different dates.

To do this, I think the best way is to detect whether their photos are similar.

Why Know Whether the Same Property Is Advertised Over a Period of Time?

I would like to know:
1) whether the price changes over time, signalling a good time to purchase
2) whether the owner may be getting desperate to let it go after advertising it for quite some time, so I can get a better price

Same Property But Advertised by Different Agents and Different Date

Judging by the naked eye, all the listings below refer to the same property.
Now I want to detect this programmatically, by detecting that their photos are similar.

Listing    Price      Agent      Posted Date
Listing 1  RM260,000  Dilla      05/07/2019
Listing 2  RM260,000  Aiman      05/07/2019
Listing 3  RM260,000  Norhayati  05/07/2019
Listing 4  RM260,000  Fahana     05/07/2019
Listing 5  RM260,000  Fazri      22/07/2019
Listing 6  RM270,000  Marina     12/07/2019

Kitchen Photos

Seroja Apartment Listing 1 – Kitchen
Seroja Apartment Listing 2 – Kitchen
Seroja Apartment Listing 3 – Kitchen
Seroja Apartment Listing 4 – Kitchen
Seroja Apartment Listing 6 – Kitchen

Bedroom Images

Seroja Apartment Listing 2 – Bedroom
Seroja Apartment Listing 5 – Bedroom

Technique Used

    1. Step 1: Fingerprinting the Photos

The photos are fingerprinted using an image hashing technique. In this case, DHash is used.
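
As a rough idea of how a difference hash is built, here is a minimal sketch in plain Python. It assumes the photo has already been converted to grayscale and resized to 8 rows of 9 brightness values, which an imaging library would normally do. The function name dhash and the rule that a bit is 1 when the left pixel is brighter than the right one are my assumptions; implementations differ on that detail, so hashes from this sketch will not necessarily match the ones in this post.

```python
def dhash(pixels):
    """Difference hash of a grayscale grid with 8 rows of 9 brightness values.

    Each pixel is compared with its right-hand neighbour, giving 8 bits per
    row and 64 bits in total, returned as a 16-character hex string.
    """
    bits = 0
    for row in pixels:
        for left, right in zip(row, row[1:]):
            bits = (bits << 1) | (1 if left > right else 0)
    return f"{bits:016x}"


# Two synthetic "images": one brightening left to right, one darkening
brightening = [list(range(9)) for _ in range(8)]
darkening = [list(range(8, -1, -1)) for _ in range(8)]

print(dhash(brightening))  # 0000000000000000
print(dhash(darkening))    # ffffffffffffffff
```

Because only the direction of the brightness change is kept, small changes in size or colour tend to flip only a few bits.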

    2. Step 2: Compare the Photos

After fingerprinting, the image hashes are compared across listings. If two image hashes are identical, or their Levenshtein distance is less than or equal to 2, we can consider the listings to refer to the same property.

Results

Photo                Size  Dimensions  Image Hash (DHash)  Noticeable Features
Listing 1 - Kitchen  18KB  480x480     cc6c7a727e7c3e77    Kitchen - listing 1, listing 3 and listing 6 are similar
Listing 2 - Kitchen  19KB  640x480     3f37333333333339    Has door and wall fan
Listing 3 - Kitchen  18KB  480x480     cc6c7a727e7c3e77
Listing 4 - Kitchen  17KB  480x480     cc6e7a727e7c7e77    Kitchen - Listing 4 color is lighter compared to listing 1, 3 & 6
Listing 6 - Kitchen  18KB  480x480     cc6c7a727e7c3e77
Listing 2 - Bedroom  22KB  640x480     e1b90d0c8ccc84c1    Similar to bedroom listing 5; only the size is different
Listing 5 - Bedroom  10KB  320x240     e1b52d0c8ccc84c1

Kitchen Photos

If we take Listing 1 as the base, it has the same image hash as Listing 3 and Listing 6.

Listing 1 has a Levenshtein distance of 2 from Listing 4.
Listing 1 has a Levenshtein distance of 15 from Listing 5.

Bedroom Photos

Listing 2 and Listing 5 have a Levenshtein distance of 2.

Conclusion

We can conclude that photos are similar if their image hashes are the same or their Levenshtein distance is less than or equal to 2.
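
The comparison rule can be sketched in a few lines of Python. The hashes are the ones from the results table; the function names levenshtein and same_property are my own, and the threshold of 2 is the one used in this post.

```python
def levenshtein(a, b):
    """Number of single-character edits needed to turn string a into b."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def same_property(hash_a, hash_b, threshold=2):
    """Treat two photos as the same when their hashes are close enough."""
    return levenshtein(hash_a, hash_b) <= threshold


kitchen_1 = "cc6c7a727e7c3e77"
kitchen_2 = "3f37333333333339"
kitchen_4 = "cc6e7a727e7c7e77"
bedroom_2 = "e1b90d0c8ccc84c1"
bedroom_5 = "e1b52d0c8ccc84c1"

print(levenshtein(kitchen_1, kitchen_4))    # 2
print(levenshtein(bedroom_2, bedroom_5))    # 2
print(same_property(kitchen_1, kitchen_2))  # False
```

Listing 2’s kitchen photo, the one with the door and wall fan, falls well outside the threshold, so it is not matched with the other kitchen photos.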

False Positive

What happens if different properties use the same photos, such as a photo of the signage or the building block? The algorithm will treat them as the same property even though they are not.

To avoid this, we should build a database of signage and building block photos so that these false positives can be filtered out.

Photos Example

Apartment Signage
Apartment Block

References:

Fingerprinting Images For Near Duplicate Detection
Python Code & Images Used
DHash Algorithm
Listing 1
Listing 2
Listing 3
Listing 4
Listing 5
Listing 6