What is HTML?
Overview
HTML (HyperText Markup Language) is the foundation of web development. Understanding HTML is crucial for any webserver developer because it's the language that browsers use to display web pages.
What HTML Stands For
HTML stands for HyperText Markup Language:
- HyperText: Text that contains links to other documents or parts of the same document
- Markup: A system of annotations that describe the structure and presentation of text
- Language: A standardized way of communicating with web browsers
Key Concepts for Webserver Developers
1. Client-Side Accessibility
When a client receives an HTML page, keep in mind that everything in it is accessible to them, even content you mark as "hidden" on the page. Nothing is secret on an HTML page.
How does this affect webpage development?
Sensitive information must never be directly in HTML pages
I'll say it again:
Sensitive information must never be directly in HTML pages
- No passwords (encrypted or otherwise)
- No access tokens
- No user id information
- As little information about the server as possible
Once the HTML page leaves the webserver, consider everything about that page fully known to the client and potentially modified by them. You can't build any safeguards into an HTML page, since those safeguards are also completely accessible to the client.
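To make the "nothing is secret" point concrete, here is a minimal sketch of how little effort it takes to read "hidden" content out of a page. The page, the form fields, and the helper names (PAGE, HiddenFieldFinder, find_hidden_fields) are all made up for illustration; the only real API used is Python's standard html.parser module.

```python
from html.parser import HTMLParser

# Hypothetical page: the field names and values are invented for this example.
PAGE = """
<html>
  <body>
    <form action="/checkout" method="post">
      <input type="hidden" name="price" value="19.99">
      <input type="text" name="quantity">
    </form>
    <p style="display:none">Internal note: staging server</p>
  </body>
</html>
"""


class HiddenFieldFinder(HTMLParser):
    """Collects the name and value of every <input type="hidden"> element."""

    def __init__(self):
        super().__init__()
        self.hidden_fields = {}

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "input" and attributes.get("type") == "hidden":
            self.hidden_fields[attributes.get("name")] = attributes.get("value")


def find_hidden_fields(html_text):
    """Return every hidden form field in the page as a name -> value dict."""
    finder = HiddenFieldFinder()
    finder.feed(html_text)
    return finder.hidden_fields


print(find_hidden_fields(PAGE))  # {'price': '19.99'}
```

The paragraph styled with display:none is just as exposed: it is plain text in the page source. A client can also change the hidden price before submitting the form, so the server should never trust it.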
2. The Request-Response Cycle
Now that we understand the security implications, let's examine how HTML actually gets from the server to the client. When a user visits a webpage, here's what happens:
- Client Request: Client browser sends HTTP request to webserver
- Server Processing: Webserver processes the request
- Generation: Server generates or retrieves HTML content
- Response: Server sends HTML back to client via HTTP response
- Client Rendering: Browser receives HTML and renders the webpage
HTTP Response Structure:
- Headers: Metadata (Content-Type, Content-Length, etc.)
- Body: The actual HTML content
We will talk more about these two parts in the next subchapter.
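As a rough sketch of the response structure described above, the following builds a raw HTTP response by hand and then splits it back into its headers and body. The function names build_response and split_response are hypothetical, and a real webserver or framework does this work for you; the point is only to show that the headers and the HTML body are separated by a blank line.

```python
def build_response(html_text):
    """Assemble a minimal HTTP/1.1 response: status line, headers, blank line, body."""
    body = html_text.encode("utf-8")
    head = (
        "HTTP/1.1 200 OK\r\n"
        "Content-Type: text/html; charset=UTF-8\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def split_response(raw):
    """Split raw response bytes into (status line, headers dict, body bytes)."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return status_line, headers, body


status, headers, body = split_response(build_response("<h1>Hi</h1>"))
print(status)                      # HTTP/1.1 200 OK
print(headers["Content-Length"])   # 11
print(body.decode("utf-8"))        # <h1>Hi</h1>
```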
3. Static vs Dynamic HTML Delivery
Understanding the request-response cycle leads us to consider the two main approaches for delivering HTML content:
Static HTML:
- Pre-written HTML files stored on server
- Server simply reads file and sends it to client
- Same content for every request
This is how your basic server is set up on the umainecos static site. There's nothing wrong with this method if you don't have any content that involves sensitive information. In fact, this is the better option if you don't have anything special about your website. No need to stress the server with additional tasks when it's not necessary. It's very hard for this setup to go wrong and cause a security breach if the HTML pages have no sensitive information in them.
Dynamic HTML:
- HTML generated by server-side scripts (PHP, Python, Node.js)
- Content can change based on user, time, database, etc.
- Server processes request, generates HTML, then sends to client
This is more server intensive, takes more time to set up, and can have additional security problems if you leave some sort of vulnerability that allows the client to request something that they shouldn't have access to on your server. However, in exchange, your pages can be much more dynamic and you can start handling information that you want to keep secret from the user.
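The difference between the two approaches can be sketched in a few lines. This is an illustration under assumptions, not how any particular server implements it: serve_static and serve_dynamic are hypothetical helpers, and the greeting page is invented.

```python
import tempfile
from datetime import datetime
from html import escape
from pathlib import Path


def serve_static(path):
    """Static delivery: read a pre-written file and return it unchanged."""
    return Path(path).read_text(encoding="utf-8")


def serve_dynamic(username, now):
    """Dynamic delivery: build the HTML for this particular request."""
    return (
        "<html><body>"
        # escape() keeps user-supplied text from being interpreted as HTML.
        f"<h1>Hello, {escape(username)}</h1>"
        f"<p>Generated at {now:%H:%M}</p>"
        "</body></html>"
    )


if __name__ == "__main__":
    # Static: the same bytes come back on every request.
    with tempfile.TemporaryDirectory() as folder:
        page = Path(folder) / "index.html"
        page.write_text("<h1>Welcome</h1>", encoding="utf-8")
        print(serve_static(page))

    # Dynamic: the output depends on the user and the time of the request.
    print(serve_dynamic("Ada", datetime.now()))
```

Note that the dynamic version has to handle user input safely, which is one example of the extra security work mentioned above.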
4. MIME Types and Content Delivery
Beyond the content itself, webservers must also communicate how to interpret that content. When serving HTML, the webserver includes specific headers:
Content-Type: text/html; charset=UTF-8
Content-Length: 1234
This tells the browser:
- How to interpret the content (HTML format)
- What character encoding to use
- How much data to expect
You don't need to do too much here as the programmer. Most server software fills in these headers automatically once it knows the file type of the data being sent to the client. If this were to go wrong, the client's browser might get confused about how to interpret your response.
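If you are curious how a server might pick the Content-Type from a filename, here is a small sketch using Python's standard mimetypes module. The function content_type_for is hypothetical, and it assumes that every text file on the server is encoded as UTF-8.

```python
import mimetypes


def content_type_for(filename):
    """Guess a Content-Type header value from a filename's extension."""
    mime_type, _ = mimetypes.guess_type(filename)
    if mime_type is None:
        # Unknown extension: fall back to the generic "arbitrary bytes" type.
        mime_type = "application/octet-stream"
    if mime_type.startswith("text/"):
        # Assumption: all text files on this server are saved as UTF-8.
        return f"{mime_type}; charset=UTF-8"
    return mime_type


for name in ("index.html", "page.htm", "photo.png"):
    print(name, "->", content_type_for(name))
```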
Webserver Developer Considerations
With the fundamentals of HTML delivery understood, let's examine the practical considerations for webserver developers:
1. File Organization and Serving
- HTML files typically have .html or .htm extensions
- Main page is usually named index.html (default document)
- Webserver serves files from designated directories (like public_html). This is set up through the configuration files on whichever server software you're using.
- File permissions must allow webserver to read the files.
Incidentally, the file permission setup can lead to some security vulnerabilities on static servers if you have files that you don't want the client to access inside of a folder that's being watched by the webserver.
You might not have a link to "/resources/passwords.txt", but that doesn't mean that it's inaccessible. Make sure any static resource folder only has files that you're willing to have the client access.
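One way to check what a static folder exposes is to list every file in it as the URL path a client could request. The following is a minimal sketch; list_exposed_paths is a hypothetical helper, and it assumes the webserver serves every readable file under the folder, which depends on your server configuration.

```python
import tempfile
from pathlib import Path


def list_exposed_paths(public_dir):
    """Return the URL path of every file under public_dir, linked or not."""
    root = Path(public_dir)
    return sorted(
        "/" + file.relative_to(root).as_posix()
        for file in root.rglob("*")
        if file.is_file()
    )


if __name__ == "__main__":
    # Build a throwaway folder that mimics the situation described above.
    with tempfile.TemporaryDirectory() as folder:
        root = Path(folder)
        (root / "resources").mkdir()
        (root / "index.html").write_text("<h1>Home</h1>", encoding="utf-8")
        (root / "resources" / "passwords.txt").write_text("secret", encoding="utf-8")
        for url_path in list_exposed_paths(root):
            print(url_path)
```

Anything this prints that you would not want a stranger to read should be moved out of the folder.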
2. Additional Security Implications
- HTML is processed by the client, not the server
- Server-side validation is essential for security
- Client-side validation (JavaScript) can be bypassed
- All HTML content is visible to end users
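Here is a sketch of what server-side validation can look like for a single form field. The field (a quantity) and its limits of 1 to 10 are invented for this example. The idea is that the server repeats every check itself, because the client may have skipped or altered whatever the page's JavaScript was supposed to enforce.

```python
def validate_quantity(raw_value):
    """Return a valid quantity as an int, or raise ValueError.

    raw_value is whatever the client sent, so treat it as untrusted.
    """
    try:
        quantity = int(raw_value)
    except (TypeError, ValueError):
        raise ValueError("quantity must be a whole number") from None
    # Hypothetical business rule: between 1 and 10 items per order.
    if not 1 <= quantity <= 10:
        raise ValueError("quantity must be between 1 and 10")
    return quantity


for submitted in ("3", "-5", "abc"):
    try:
        print(submitted, "->", validate_quantity(submitted))
    except ValueError as error:
        print(submitted, "-> rejected:", error)
```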
3. Performance Considerations
- Larger HTML files take longer to transfer
- Multiple HTTP requests for CSS, images, and JavaScript
- Compression can reduce transfer time
- Caching headers can improve performance
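To get a feel for what compression buys you, this sketch gzips an HTML page with Python's standard gzip module and compares the sizes. The page content is made up and deliberately repetitive, so real pages will compress less dramatically. In practice your server software usually handles this, marking the response with a Content-Encoding: gzip header for clients that advertise support for it.

```python
import gzip


def compress_html(html_text):
    """Return the page as raw UTF-8 bytes and as gzip-compressed bytes."""
    raw = html_text.encode("utf-8")
    return raw, gzip.compress(raw)


# Hypothetical page with a lot of repeated markup.
page = "<html><body>" + "<p>Hello, world!</p>" * 200 + "</body></html>"
raw, compressed = compress_html(page)
print(f"uncompressed: {len(raw)} bytes, gzip: {len(compressed)} bytes")
```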
4. Error Handling
- 404 errors when HTML files don't exist
- 403 errors when files can't be accessed
- 500 errors when server-side HTML generation fails
- Custom error pages can be configured
Custom error pages aren't as important as securing your website or making sure the content is being delivered, but they are still worth having, since a default error page looks unprofessional.
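The status codes above can be tied to custom error pages with a small wrapper like the one below. This is only a sketch: respond and ERROR_PAGES are hypothetical names, the error page text is invented, and real server software normally lets you configure error pages instead of writing this yourself.

```python
# Hypothetical custom error pages, keyed by HTTP status code.
ERROR_PAGES = {
    403: "<h1>403: Access denied</h1>",
    404: "<h1>404: Page not found</h1>",
    500: "<h1>500: Something went wrong</h1>",
}


def respond(load_page):
    """Call load_page() and translate failures into a status code and error page."""
    try:
        return 200, load_page()
    except FileNotFoundError:
        # The requested HTML file does not exist.
        return 404, ERROR_PAGES[404]
    except PermissionError:
        # The file exists, but the server is not allowed to read it.
        return 403, ERROR_PAGES[403]
    except Exception:
        # Anything else went wrong while generating the page.
        return 500, ERROR_PAGES[500]


def broken_page():
    raise RuntimeError("bug in page generation")


print(respond(lambda: "<h1>Home</h1>"))  # (200, '<h1>Home</h1>')
print(respond(broken_page))              # (500, '<h1>500: Something went wrong</h1>')
```

Passing a function that reads a file from disk would produce the 404 and 403 cases, since Python raises FileNotFoundError and PermissionError for those situations.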
Next Steps
Now that you understand HTML basics, learn about some HTML Tags and how to use them effectively.
HTML is the foundation of web development. Understanding it is necessary to have a solid base for building dynamic web applications.