Channel: Text processing
Browsing latest articles

PHP tip: How to get a web page using the fopen wrappers

PHP’s fopen wrappers enable the standard file functions to read web pages from a web server. A few additional calls are needed to set parameters for a web server request and to get the server’s HTTP...



PHP tip: How to get a web page using cURL

The first step when building a PHP search engine, link checker, or keyword extractor is to get the web page from the web server. There are several ways to do this. From PHP 4 onwards, the most flexible...

PHP tip: How to get a web page content type

A web page’s content type tells you the page's MIME type (such as “text/html” or “image/png”) and the character set used by page text. You'll need the character set to interpret the page's characters...

PHP tip: How to extract URLs from a web page

URL extraction is at the core of link checkers, search engine spiders, and a variety of web page analysis tools. While <a> and <img> elements are primary sources of URLs, there are more...

PHP tip: How to extract URLs from a CSS file

Though HTML is usually the focus for extracting URLs for a link checker or analysis tool, CSS files also include URLs. The CSS @import rule uses a URL to include another CSS file, and many style...


PHP tip: How to extract keywords from a web page

Web page keywords characterize the page's topic for a search engine. Extracting keywords requires that you recognize the page's character encoding, strip away HTML tags, scripts, and styles, decode...

PHP tip: How to decode HTML entities on a web page

HTML entities encode special characters and symbols, such as &euro; for €, or &copy; for ©. When building a PHP search engine or web page analysis tool, HTML entities within a page must be...

PHP tip: How to convert a relative URL to an absolute URL

An absolute URL is complete and ready to use to download a web file. But web pages often include incomplete relative URLs with missing parts, such as the "http" scheme or the host name, or the first part of a...


Java tip: How to parse integers quickly

Java has several ways to parse integers from strings. Performance differences between these methods can be significant when parsing a large number of integers. Doing your own integer parsing can...
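
As a rough illustration of what "doing your own integer parsing" can look like, here is a minimal sketch of a hand-written decimal parser. The class name FastIntParser and its parseInt method are hypothetical names chosen for this example, not code from the article. The sketch assumes plain ASCII digits with an optional leading sign, and it makes no claim about how its speed compares with Integer.parseInt; that would need to be measured.

```java
// Hypothetical helper for illustration only; not taken from the article.
final class FastIntParser {
    private FastIntParser() {}

    /** Parses an optionally signed decimal integer made of ASCII digits. */
    static int parseInt(CharSequence s) {
        if (s == null || s.length() == 0) {
            throw new NumberFormatException("empty input");
        }
        int i = 0;
        boolean negative = false;
        char first = s.charAt(0);
        if (first == '-' || first == '+') {
            negative = (first == '-');
            i = 1;
            if (s.length() == 1) {
                throw new NumberFormatException("sign without digits");
            }
        }
        // Accumulate as a negative number so Integer.MIN_VALUE fits.
        final int limit = negative ? Integer.MIN_VALUE : -Integer.MAX_VALUE;
        final int multLimit = limit / 10;
        int result = 0;
        for (; i < s.length(); i++) {
            int digit = s.charAt(i) - '0';
            if (digit < 0 || digit > 9) {
                throw new NumberFormatException("not a digit at index " + i);
            }
            // Reject values that would overflow before multiplying or subtracting.
            if (result < multLimit) {
                throw new NumberFormatException("overflow");
            }
            result *= 10;
            if (result < limit + digit) {
                throw new NumberFormatException("overflow");
            }
            result -= digit;
        }
        return negative ? result : -result;
    }
}
```

A call such as FastIntParser.parseInt("-42") returns the int value -42, and malformed or out-of-range input throws NumberFormatException, mirroring the behavior of Integer.parseInt for the inputs this sketch supports.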


Java tip: How to get a web page

The starting point for building a link checker, web spider, or web page analyzer is, of course, to get the web page from the web server. Java's java.net package includes classes to manage URLs and to...
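
A minimal sketch of fetching a page with the java.net classes might look like the following. The class name PageFetcher, the fetch method, the ten-second timeouts, and the User-Agent string are all assumptions made for this example. The sketch also assumes the response body is UTF-8; a real tool would read the character set from the Content-Type header instead.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.URI;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not taken from the article.
final class PageFetcher {
    private PageFetcher() {}

    /** Downloads the resource at the given address and returns it as text. */
    static String fetch(String address) throws IOException {
        URLConnection connection = URI.create(address).toURL().openConnection();
        connection.setConnectTimeout(10_000); // milliseconds; assumed value
        connection.setReadTimeout(10_000);    // milliseconds; assumed value
        connection.setRequestProperty("User-Agent", "ExampleFetcher/1.0"); // assumed name

        try (InputStream in = connection.getInputStream()) {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            // Copy the response body into memory one chunk at a time.
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            // Assumption: the body is UTF-8 encoded.
            return new String(buffer.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Because the method works through the general URLConnection type, it accepts any URL scheme the runtime supports, such as http, https, or file. Checking the HTTP status code would require casting the connection to HttpURLConnection for http and https addresses.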