What is HTML?

Overview

HTML (HyperText Markup Language) is the foundation of web development. Understanding HTML is crucial for any webserver developer because it's the language that browsers use to display web pages.

What HTML Stands For

HTML stands for HyperText Markup Language:

HyperText: Text that contains links to other documents or parts of the same document
Markup: A system of annotations that describe the structure and presentation of text
Language: A standardized way of communicating with web browsers

Key Concepts for Webserver Developers

1. Client-Side Accessibility

When a client receives an HTML page, keep in mind that everything in it is accessible to them, even content you mark as "hidden" on the page. Nothing is secret on an HTML page.

How does this affect webpage development?

Sensitive information must never be placed directly in HTML pages

I'll say it again:

Sensitive information must never be placed directly in HTML pages

No passwords (encrypted or otherwise)
No access tokens
No user ID information
As little information about the server as possible

Once the HTML page leaves the webserver, assume the client knows everything about it and may have modified it. You can't build safeguards into an HTML page, since the client has full access to those safeguards too.
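To make this concrete, here is a minimal sketch of how little effort it takes to read "hidden" content. The page, the field name discount_code, and the comment text are all hypothetical, made up for illustration; the parser is Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical page: the hidden field and the comment are illustrative only.
PAGE = """
<html>
  <body>
    <form>
      <input type="hidden" name="discount_code" value="STAFF50">
      <input type="text" name="email">
    </form>
    <!-- TODO: remove debug endpoint /admin-test -->
  </body>
</html>
"""


class HiddenContentFinder(HTMLParser):
    """Collects hidden form fields and HTML comments from a page."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}
        self.comments = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        # A "hidden" input is only hidden from rendering, not from the source.
        if tag == "input" and attributes.get("type") == "hidden":
            self.hidden_fields[attributes.get("name")] = attributes.get("value")

    def handle_comment(self, data):
        # Comments are never displayed, but they are still sent to the client.
        self.comments.append(data.strip())


finder = HiddenContentFinder()
finder.feed(PAGE)
print(finder.hidden_fields)
print(finder.comments)
```

Anyone can do the same thing with the browser's "view source" option, so no script is even required.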

2. The Request-Response Cycle

Now that we understand the security implications, let's examine how HTML actually gets from the server to the client. When a user visits a webpage, here's what happens:

Client Request: Client browser sends HTTP request to webserver
Server Processing: Webserver processes the request
Generation: Server generates or retrieves HTML content
Response: Server sends HTML back to client via HTTP response
Client Rendering: Browser receives HTML and renders the webpage

HTTP Response Structure:

Headers: Metadata (Content-Type, Content-Length, etc.)
Body: The actual HTML content

We will talk more about these two parts in the next subchapter.
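As a rough illustration of the header and body split, the sketch below assembles a raw HTTP response by hand and then takes it apart again. The function names build_http_response and split_http_response are hypothetical helpers written for this example; real webserver software does this work for you.

```python
def build_http_response(page: str) -> bytes:
    """Assemble a minimal HTTP/1.1 response: status line, headers, blank line, body."""
    body = page.encode("utf-8")
    header_lines = [
        "HTTP/1.1 200 OK",
        "Content-Type: text/html; charset=UTF-8",
        f"Content-Length: {len(body)}",  # length in bytes, not characters
    ]
    head = "\r\n".join(header_lines).encode("ascii")
    # An empty line (CRLF CRLF) separates the headers from the body.
    return head + b"\r\n\r\n" + body


def split_http_response(raw: bytes) -> tuple[list[str], bytes]:
    """Split a raw response back into its header lines and its body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    return head.decode("ascii").split("\r\n"), body


response = build_http_response("<h1>Hello</h1>")
header_lines, body = split_http_response(response)
print(header_lines)
print(body)
```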

3. Static vs Dynamic HTML Delivery

Understanding the request-response cycle leads us to consider the two main approaches for delivering HTML content:

Static HTML:

Pre-written HTML files stored on server
Server simply reads file and sends it to client
Same content for every request

This is how your basic server is set up on the umainecos static site. There's nothing wrong with this method if none of your content involves sensitive information. In fact, this is the better option if your website only needs fixed content: there's no need to stress the server with additional tasks when they aren't necessary. It's very hard for this setup to go wrong and cause a security breach if the HTML pages have no sensitive information in them.

Dynamic HTML:

HTML generated by server-side scripts (PHP, Python, Node.js)
Content can change based on user, time, database, etc.
Server processes request, generates HTML, then sends to client

This is more server intensive, takes more time to set up, and can introduce additional security problems if a vulnerability lets the client request something on your server that they shouldn't have access to. In exchange, your pages can be much more dynamic, and you can start handling information that you want to keep secret from the user.
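The difference between the two approaches can be sketched in a few lines. This is an assumption-laden illustration, not how any particular server is implemented: serve_static and serve_dynamic are hypothetical names, and the temporary directory stands in for a real document folder. The dynamic version escapes the user-supplied name with html.escape, since inserting client input into a page unescaped is one of the additional security problems dynamic generation can introduce.

```python
import html
import tempfile
from pathlib import Path


def serve_static(path: Path) -> str:
    """Static delivery: read a pre-written file and return it unchanged."""
    return path.read_text(encoding="utf-8")


def serve_dynamic(username: str) -> str:
    """Dynamic delivery: build the page per request from request-specific data."""
    # Escape client-supplied text so it is displayed rather than run as markup.
    safe_name = html.escape(username)
    return f"<html><body><h1>Welcome, {safe_name}!</h1></body></html>"


# A temporary directory stands in for the server's document folder.
with tempfile.TemporaryDirectory() as tmp:
    page_path = Path(tmp) / "index.html"
    page_path.write_text("<html><body><h1>Welcome!</h1></body></html>", encoding="utf-8")
    static_page = serve_static(page_path)

dynamic_page = serve_dynamic("<script>alert(1)</script>")
print(static_page)
print(dynamic_page)
```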

4. MIME Types and Content Delivery

Beyond the content itself, webservers must also communicate how to interpret that content. When serving HTML, the webserver includes specific headers:

Content-Type: text/html; charset=UTF-8
Content-Length: 1234

This tells the browser:

How to interpret the content (HTML format)
What character encoding to use
How much data to expect

You don't need to do much here as the programmer. This information is usually included automatically when you set the file type of the data sent to the client. If it goes wrong, the client's browser may misinterpret your response, for example by showing the HTML source as plain text instead of rendering it.
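For a sense of what the server software does behind the scenes, the sketch below derives both headers from a filename and a response body. build_content_headers is a hypothetical helper, and always appending charset=UTF-8 to text types is an assumption of this example; mimetypes.guess_type is part of Python's standard library.

```python
import mimetypes


def build_content_headers(filename: str, body: bytes) -> dict[str, str]:
    """Pick a Content-Type from the file extension and measure the body."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Generic fallback for unknown file types.
        mime_type = "application/octet-stream"
    if mime_type.startswith("text/"):
        # Assumption for this sketch: all text content is encoded as UTF-8.
        mime_type += "; charset=UTF-8"
    return {
        "Content-Type": mime_type,
        "Content-Length": str(len(body)),  # counted in bytes, not characters
    }


body = "<p>café</p>".encode("utf-8")
headers = build_content_headers("index.html", body)
print(headers)
```

Note that Content-Length counts bytes: the page above has 11 characters but 12 bytes, because "é" takes two bytes in UTF-8.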

Webserver Developer Considerations

With the fundamentals of HTML delivery understood, let's examine the practical considerations for webserver developers:

1. File Organization and Serving

HTML files typically have .html or .htm extensions
Main page is usually named index.html (default document)
Webserver serves files from designated directories (like public_html). This is set up through the configuration files on whichever server software you're using.
File permissions must allow webserver to read the files.

Incidentally, the file permission setup can lead to security vulnerabilities on static servers if files that you don't want the client to access sit inside a folder that the webserver serves.

You might not have a link to "/resources/passwords.txt", but that doesn't mean that it's inaccessible. Make sure any static resource folder only has files that you're willing to have the client access.
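One way to catch this kind of mistake is to audit the served folder for files you didn't intend to publish. The following is only a sketch: find_unexpected_files and the EXPECTED_EXTENSIONS allowlist are hypothetical and would need adjusting for a real site, and the temporary directory stands in for a folder like public_html.

```python
import tempfile
from pathlib import Path

# Hypothetical allowlist of file types this site intends to serve.
EXPECTED_EXTENSIONS = {".html", ".htm", ".css", ".js", ".png", ".jpg", ".svg", ".ico"}


def find_unexpected_files(document_root: Path) -> list[str]:
    """Return paths (relative to the root) of files outside the allowlist."""
    unexpected = []
    for path in sorted(document_root.rglob("*")):
        if path.is_file() and path.suffix.lower() not in EXPECTED_EXTENSIONS:
            unexpected.append(path.relative_to(document_root).as_posix())
    return unexpected


# Build a throwaway folder that mimics the situation described above.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "resources").mkdir()
    (root / "index.html").write_text("<h1>Home</h1>", encoding="utf-8")
    (root / "resources" / "passwords.txt").write_text("do not serve", encoding="utf-8")
    flagged = find_unexpected_files(root)

print(flagged)
```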

2. Additional Security Implications

HTML is processed by the client, not the server
Server-side validation is essential for security
Client-side validation (JavaScript) can be bypassed
All HTML content is visible to end users
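Because the client can edit or bypass anything in the page, the server has to re-check every value it receives. Here is a minimal sketch of that idea, assuming a hypothetical order form with a quantity field limited to 10 items; validate_quantity is an illustrative name, not part of any framework.

```python
def validate_quantity(raw_value: str, maximum: int = 10) -> int:
    """Re-check a submitted form value on the server, whatever the page enforced."""
    try:
        quantity = int(raw_value)
    except (TypeError, ValueError):
        raise ValueError("quantity must be a whole number") from None
    if not 1 <= quantity <= maximum:
        raise ValueError(f"quantity must be between 1 and {maximum}")
    return quantity


# A well-behaved browser would send something like "3".
print(validate_quantity("3"))

# A client that bypassed the page's JavaScript checks can send anything.
for tampered in ("9999", "-1", "abc"):
    try:
        validate_quantity(tampered)
    except ValueError as error:
        print(f"rejected {tampered!r}: {error}")
```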

3. Performance Considerations

Larger HTML files take longer to transfer
Each page usually triggers additional HTTP requests for CSS, images, and JavaScript
Compression can reduce transfer time
Caching headers can improve performance
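To illustrate the compression point, the sketch below gzips a response body only when the client's Accept-Encoding header lists gzip. compress_if_supported is a hypothetical helper, and its header parsing is deliberately simplified (it ignores quality values such as q=0); in practice this is normally a configuration option of the server software rather than code you write yourself.

```python
import gzip


def compress_if_supported(body: bytes, accept_encoding: str) -> tuple[bytes, dict[str, str]]:
    """Gzip the body when the client advertises support for it."""
    # Simplified parsing: take each comma-separated token and drop any ";q=..." part.
    offered = {token.split(";")[0].strip().lower() for token in accept_encoding.split(",")}
    if "gzip" in offered:
        body = gzip.compress(body)
        return body, {"Content-Encoding": "gzip", "Content-Length": str(len(body))}
    return body, {"Content-Length": str(len(body))}


# Repetitive markup, like most HTML, compresses well.
page = ("<html><body>" + "<p>Repeated paragraph of sample text.</p>" * 200 + "</body></html>").encode("utf-8")

compressed_body, compressed_headers = compress_if_supported(page, "gzip, deflate")
plain_body, plain_headers = compress_if_supported(page, "")

print(len(page), "bytes uncompressed")
print(len(compressed_body), "bytes with gzip")
```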

4. Error Handling

404 errors when HTML files don't exist
403 errors when access to a file is forbidden (for example, when file permissions prevent the webserver from reading it)
500 errors when server-side HTML generation fails
Custom error pages can be configured

Custom error pages matter less than securing your website or making sure content is delivered, but they are still worth setting up, since a default error page looks unprofessional.
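The idea behind custom error pages can be sketched as a lookup from status code to page, with a plain fallback for codes that have no custom page. The page contents and the render_error_page name are hypothetical; most server software handles this through its configuration files instead. HTTPStatus comes from Python's standard http module.

```python
from http import HTTPStatus

# Hypothetical custom pages for the status codes listed above.
CUSTOM_ERROR_PAGES = {
    HTTPStatus.NOT_FOUND: "<h1>Page not found</h1><p>Check the address and try again.</p>",
    HTTPStatus.FORBIDDEN: "<h1>Access denied</h1><p>You do not have permission to view this page.</p>",
    HTTPStatus.INTERNAL_SERVER_ERROR: "<h1>Something went wrong</h1><p>Please try again later.</p>",
}


def render_error_page(status_code: int) -> str:
    """Return the custom page for a status code, or a plain default."""
    status = HTTPStatus(status_code)
    custom_page = CUSTOM_ERROR_PAGES.get(status)
    if custom_page is not None:
        return custom_page
    # Fallback: the bare status code and its standard reason phrase.
    return f"<h1>{status.value} {status.phrase}</h1>"


print(render_error_page(404))
print(render_error_page(503))
```

Keep the error pages themselves free of server details such as stack traces or file paths, in line with the earlier advice to reveal as little about the server as possible.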

Next Steps

Now that you understand HTML basics, learn about some HTML Tags and how to use them effectively.

HTML is the foundation of web development. Understanding it is necessary to have a solid base for building dynamic web applications.