Web Scraping Advice


#1

Has anyone found a way to parse HTML?

Someone suggested on Stack Overflow that cheerio.js could be used, but I get “ERROR Error: Uncaught (in promise): ReferenceError: Can’t find variable: process” (using nativescript-nodify).
I have also tried parse5, jsdom and xray, but all of them result in a Node dependency error of some kind.

Any advice on how to scrape web content?


#2

Does it need to happen on the device?

You could set up a backend and send it the URL you wish to scrape. The scraping is done on the backend using cheerio.js or any other parsing module, and the desired result is sent back as the response. I’d recommend this, as it is not platform-dependent.
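For what it's worth, here is a rough sketch of what the backend side could look like. It is only an illustration: the `/scrape?url=...` route and the names `handleScrape` and `extractTitle` are made up for this example, and the regex in `extractTitle` is a crude stand-in for the place where cheerio.js or another parsing module would go. It also assumes a runtime with a global `fetch` (Node 18 or later).

```javascript
// Crude stand-in for a real parser such as cheerio.js:
// pulls the text of the first <title> element out of an HTML string.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Hypothetical handler for GET /scrape?url=<encoded target URL>.
// `fetchImpl` is injectable so the handler can be tried out without network access.
async function handleScrape(requestUrl, fetchImpl = fetch) {
  // The base URL is only needed so that a relative path like "/scrape?..." can be parsed.
  const target = new URL(requestUrl, "http://localhost").searchParams.get("url");
  if (!target) {
    return { status: 400, body: { error: "missing url parameter" } };
  }
  const response = await fetchImpl(target);
  const html = await response.text();
  return { status: 200, body: { url: target, title: extractTitle(html) } };
}

// One possible way to wire it into Node's built-in http module (not started here):
// require("http").createServer(async (req, res) => {
//   const { status, body } = await handleScrape(req.url);
//   res.writeHead(status, { "Content-Type": "application/json" });
//   res.end(JSON.stringify(body));
// }).listen(3000);
```

The app would then request something like `/scrape?url=` followed by `encodeURIComponent(pageUrl)` from wherever the backend is hosted and read the JSON response, so no HTML parsing has to run on the device.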

If it has to happen on the device, you’d have to integrate Android/iOS libraries into your project to do the HTML/XML parsing.

Jericho and Jsoup (https://jsoup.org/) are two that you could look into on the Android side.


#3

It would have been nice if it were possible to parse directly on the device when you receive the data (I have no clue how to get into Android and iOS libraries).

I'll have to get into Node then; I had hoped there was an easier way.

Thanks!


#4

Unless someone with needs similar to yours has wrapped the native libraries under a common public JavaScript API for others to use, it’s either a server or doing the parsing yourself.
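To give an idea of what "doing the parsing yourself" might mean for simple cases, here is a minimal sketch that uses only built-in string and RegExp features, so it does not touch Node globals such as `process`. The function name `extractLinks` is made up for this example, and it assumes simple, reasonably well-formed markup. A regex is not a general HTML parser, so treat this as a starting point for narrow tasks and not as a replacement for cheerio.js or Jsoup.

```javascript
// Collects { href, text } pairs from <a> tags in an HTML string.
// Assumption: quoted href values and no ">" characters inside attribute values.
function extractLinks(html) {
  const links = [];
  const anchor = /<a\b[^>]*\bhref\s*=\s*(["'])(.*?)\1[^>]*>([\s\S]*?)<\/a>/gi;
  let match;
  while ((match = anchor.exec(html)) !== null) {
    links.push({
      href: match[2],
      // Strip any nested tags (for example <b>) from the link text.
      text: match[3].replace(/<[^>]+>/g, "").trim(),
    });
  }
  return links;
}

// Example usage with an inline HTML snippet.
const sample =
  '<p><a href="/a">First</a> and <a class="x" href=\'https://example.com/b\'><b>Second</b> link</a></p>';
const found = extractLinks(sample);
```

Anything beyond this kind of narrow extraction (malformed markup, nested structures, selectors) is where a real parser on a server or a native library becomes the more sensible choice.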

Feel free to provide the “easier way” for others.