I need to scrape an internal website and turn it into JSON. They don't offer an API and the data isn't behind an endpoint; it's in the HTML. What tool should I use?
@kaia awk
@kaia wait, you mean for the scraping or for the conversion?
@u0421793 @kaia Depends on how well-formed the HTML is and how much conversion is needed. If the HTML is NOT well-formed (as it usually isn't in these cases), XSLT cannot process it, but there are libraries for scripting languages that do a pretty good job of selecting and extracting data (Beautiful Soup, for example).
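
To make the "tolerant parser in a scripting language" idea concrete, here is a rough sketch. It assumes the data sits in a simple HTML table with a header row, which is a guess about the page, not something stated in the thread. It uses only Python's standard-library `html.parser` rather than Beautiful Soup so it runs without installing anything; Beautiful Soup would give you a friendlier selection API for the same job. `TableRows`, `html_table_to_json`, and `SAMPLE` are made-up names for illustration.

```python
import json
from html.parser import HTMLParser


class TableRows(HTMLParser):
    """Collect the text of every <td>/<th> cell, grouped by <tr> row.

    Closing tags are treated as optional, so sloppy markup with
    unclosed cells and rows is still split correctly.
    """

    def __init__(self):
        super().__init__()
        self.rows = []     # finished rows, each a list of cell strings
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text fragments of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._flush_row()          # a new row implicitly ends the old one
            self._row = []
        elif tag in ("td", "th"):
            self._flush_cell()         # a new cell implicitly ends the old one
            if self._row is None:      # cell without an enclosing <tr>
                self._row = []
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th"):
            self._flush_cell()
        elif tag in ("tr", "table"):
            self._flush_row()

    def handle_data(self, data):
        if self._cell is not None:     # ignore text outside of cells
            self._cell.append(data)

    def _flush_cell(self):
        if self._cell is not None:
            # Join fragments and collapse runs of whitespace.
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None

    def _flush_row(self):
        self._flush_cell()
        if self._row:
            self.rows.append(self._row)
        self._row = None

    def close(self):
        super().close()
        self._flush_row()              # pick up a row left open at end of input


def html_table_to_json(html):
    """Return a JSON array of objects keyed by the first row's cell text."""
    parser = TableRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return "[]"
    header, *body = parser.rows
    return json.dumps([dict(zip(header, row)) for row in body], indent=2)


# Deliberately malformed sample: no closing </th>, </td>, or </tr> tags.
SAMPLE = """
<table>
  <tr><th>name<th>port
  <tr><td>alpha<td>8080
  <tr><td>beta<td>9090
</table>
"""

if __name__ == "__main__":
    print(html_table_to_json(SAMPLE))
```

For a real page you would fetch the HTML first (for example with `urllib.request.urlopen`) and pass the decoded text to `html_table_to_json`. If the data is not in a table, the same pattern applies, but the tag names in `handle_starttag` would need to match whatever elements actually hold it.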