Validate an HTML Document
In the previous topic, you took steps to ensure that users would be able to find the sites that you develop once they are published. This is an important step toward ensuring that the largest number of people will be able to view your sites. Another important step is to validate your HTML documents so that you can feel confident that they contain no errors that may prevent a page from displaying properly in a browser.
Although your site may have worked well when it was tested on your computer and your client’s computer, it may be a whole different story when the site becomes available to millions of users worldwide. A Web page with validation errors can frustrate users and can look markedly different from one browser to the next. It is good practice to rigorously test the functionality of all of your sites, and validating your Web pages is a simple additional step that decreases the chances that users will encounter problems.
Document Type Definitions
The Document Type Definition, or DTD, defines a document’s type and structure, and determines which tag names are allowable and where they’re allowed to be used within the structural context of a document. HTML is not only a markup language; it is itself defined by a DTD written in the Standard Generalized Markup Language (SGML).
Once you define a document as HTML, you can then declare which version of HTML the document conforms to. This is done using the DOCTYPE declaration, which is required in a valid HTML 4.01 document. Most HTML authoring tools, like HomeSite or Dreamweaver, automatically include a DOCTYPE declaration at the beginning of a new document, and you can configure them to use whichever DTD you want. This is a convenient feature, since typing a DOCTYPE declaration, or even copying and pasting one, can be tedious. There are three DTDs that can be used with HTML 4.01 documents: strict, transitional, and frameset.
The transitional DTD (or “loose” DTD) is probably the most common DOCTYPE declaration in use on the Web at the time of this writing. For pages directed to the general public, this is perhaps the wisest choice, at least for now. To accommodate users with older browsers, it allows deprecated elements and attributes, such as the <font> tag and the bgcolor attribute. It is called transitional because it eases the difficult move to an entirely different set of coding standards while preventing “legacy” code from breaking down in the latest browsers.
To declare a document as HTML 4.01-transitional, you would use the following DOCTYPE declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
Note that the word “Transitional” appears explicitly in this DOCTYPE declaration. The strict declaration has no such keyword; strictness is implied by its absence.
The strict DTD declares that the document code is in strict compliance with the rules of HTML 4.01. This document type emphasizes the separation of content from formatting instructions. In other words, CSS controls all visual formatting in an external or embedded style sheet. The strict DTD does not include any element or attribute that is deprecated in HTML 4.01. It also excludes any frameset-related markup. Of course, only truly HTML 4.01-compliant browsers will properly render strict HTML 4 documents. To declare a document as HTML 4.01-strict, you include the following DOCTYPE declaration as the first line of code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
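As a minimal sketch of what a strict document can look like, the page below keeps all presentation in an embedded style sheet and uses no deprecated elements or attributes. The title, class name, and colors are placeholders chosen for illustration only.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN">
<html>
<head>
<title>Strict example</title>
<!-- All visual formatting lives in the style sheet, not in the markup -->
<style type="text/css">
  body { background-color: #ffffff; }
  p.intro { font-family: Arial, sans-serif; color: #333399; }
</style>
</head>
<body>
<!-- Text must sit inside a block-level element such as P -->
<p class="intro">All presentation comes from the style sheet.</p>
</body>
</html>
```

The same rules could be moved to an external style sheet without changing the markup in the body.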
Any document that uses frames requires the frameset DTD. This declaration includes all the rules of the transitional DTD plus frameset markup. So, for framed documents, use the following DOCTYPE declaration:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
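For illustration, a framed document under this DTD might look like the sketch below. The file names menu.html and content.html, and the frame names, are hypothetical; note that a frameset document uses a FRAMESET element in place of BODY.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Frameset//EN">
<html>
<head>
<title>Frameset example</title>
</head>
<!-- FRAMESET replaces BODY in a frameset document -->
<frameset cols="25%,75%">
<frame src="menu.html" name="menu" title="Navigation menu">
<frame src="content.html" name="content" title="Main content">
<!-- Fallback for browsers that do not display frames -->
<noframes>
<body>
<p>This site uses frames. <a href="content.html">View the content</a> without frames.</p>
</body>
</noframes>
</frameset>
</html>
```

The pages loaded into each frame are ordinary documents and would normally carry their own strict or transitional declaration.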
Web Page Validation
In terms of HTML, validation is the process of checking code to verify that it conforms to the rules of the markup language. These rules are defined by the DTD that you choose. A completed page that looks exactly as you intended in a browser is not necessarily valid. Many browsers, including Internet Explorer, “forgive” many coding mistakes and render the page as if the errors did not exist. This is arguably a disservice to the developer: it does not encourage proper coding habits and, more importantly, it will not produce consistent results in other browsers and on other platforms. Going forward, browsers are expected to be less forgiving of coding mistakes when the document has a strict DOCTYPE declaration. Even now, Internet Explorer 5 (Mac) and 6 (PC) display Web pages inconsistently if a strictly defined document contains invalid code. This is not a fault of the browsers; it is actually their strength. Insisting on standardized, valid code makes interoperability among Web applications, browsers, and other types of devices much easier.
The DOCTYPE declaration is used to establish validation rules. HTML validation can be done online or using software. When you validate your document’s markup, most validation programs or services require a DOCTYPE declaration so that they know which elements, attributes, and element nesting to allow or forbid. For example, if you declare a document as HTML 4.01-strict and it has <font> tags and other style-related tags and attributes, the validator will flag that code as improper. If you change the DOCTYPE declaration to HTML 4.01-transitional, that same code will pass without a hitch.
So, should you write code that conforms strictly to the HTML 4.01 standard, or should you write code that accounts for browser inconsistencies and their proprietary workarounds? Naturally, that decision is yours. Depending on the project and your audience, there will likely be good arguments on both sides.
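To make the difference concrete, here is a small illustrative page (the content and colors are invented for this sketch). As written, with the transitional declaration, it should validate; if the first line were swapped for the strict declaration, a validator would be expected to reject the marked lines.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Transitional versus strict</title>
</head>
<!-- bgcolor is deprecated: allowed by the transitional DTD, absent from strict -->
<body bgcolor="#ffffff">
<!-- CENTER and FONT are deprecated elements: transitional only -->
<center><font face="Arial" size="4">Welcome to our site</font></center>
<!-- This paragraph is valid under both DTDs -->
<p>Plain structural markup passes either way.</p>
</body>
</html>
```

Moving the background color, typeface, and alignment into a style sheet is one way to make the same page pass under the strict DTD.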
Validate HTML Code
To validate a Web page:
1. Determine which DOCTYPE declaration is appropriate for your Web pages and, at the top of each page you wish to validate, add that declaration.
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
If you don’t want to add the DOCTYPE to the actual HTML code, you can typically select a DOCTYPE from a menu within the validator.
2. Using your browser, navigate to an online validator such as the W3C’s validator at http://validator.w3.org/.
3. To select the file (or files) to validate, do one of the following:
• Upload and publish the files to a Web server and point the validator to the address where the file is located.
• Point the validator to the location of the files on your hard drive, removable disk, or network.
4. Validate the selected document or directory.
5. Correct any validation errors or warnings, save the file, and test by re-validating.
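As an illustration of step 5, the sketch below shows a page after a few typical corrections. The mistakes described in the comments are common examples rather than an exhaustive list, the exact wording of the validator's messages will vary, and the image file name is hypothetical.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<!-- Fixed: a TITLE element is required inside HEAD -->
<title>Corrected page</title>
</head>
<body>
<!-- Before: <p><b><i>Welcome</b></i>  (end tags closed in the wrong order) -->
<p><b><i>Welcome</i></b></p>
<!-- Before: <img src="logo.gif">  (the required alt attribute was missing) -->
<p><img src="logo.gif" alt="Company logo"></p>
</body>
</html>
```

After each round of fixes, re-validate the page; correcting one error often clears several related messages at once.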
DOCTYPE Declaration Syntax
Standards-compliant browsers can be very picky about the DOCTYPE declaration, so it is vital that your declarations are error-free. Even the slightest typo can affect the way the page is displayed. DOCTYPE declarations are hard to remember precisely and just as hard to type precisely, which is one reason most HTML editors will insert your chosen declaration for you. If you use a simple text editor instead of an HTML editor, try keeping copies of the different DOCTYPE declarations in text files so that you can copy and paste them into your documents.
All DOCTYPE declarations begin with the traditional opening tag delimiter (<), followed by an exclamation point, just like an HTML comment. The reasons for some of the other syntactical features aren’t essential knowledge, but for the curious, the word PUBLIC merely indicates that the content of the quoted string that follows is a formal public identifier. The hyphen (or minus symbol) that follows the opening quotation mark signifies that the W3C is not registered with the International Organization for Standardization (ISO). The W3C is listed because it created and maintains the DTD and HTML specifications. The EN declares the language: English. Because this statement refers to the language of the markup rather than the document’s content, HTML is always “tagged” as EN. The document itself may contain any language. The URL that points to the location of the actual DTD on the World Wide Web Consortium (W3C) Web site is optional.
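The annotated sketch below reads a strict declaration part by part. It includes the optional URL, which for the strict DTD is http://www.w3.org/TR/html4/strict.dtd on the W3C site; the explanatory comment is placed after the declaration so that the DOCTYPE remains the first line of the file.

```html
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN"
    "http://www.w3.org/TR/html4/strict.dtd">
<!--
  Reading the declaration above from left to right:
  HTML             the root element of the document
  PUBLIC           a formal public identifier follows in quotation marks
  -                the owner (W3C) is not registered with ISO
  W3C              the organization that maintains the DTD
  DTD HTML 4.01    the kind of resource (a DTD) and its name
  EN               the language of the DTD itself (English)
  "http://..."     the optional URL of the DTD on the W3C site
-->
<html>
<head>
<title>Annotated DOCTYPE</title>
</head>
<body>
<p>The content of the page may be written in any language.</p>
</body>
</html>
```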