Abstracted Article Archive
This page contains an archive of articles that have been abstracted
on this site. The articles are ordered by date posted, with the latest
posted articles appearing first. Alternatively, you can jump to entries
for a specific week by clicking the links below:
Entries for Week 6: April 13, 2009 through April 19, 2009
Password Attack Discussion & Benchmarks
Alan Amesbury16
of the University of Minnesota's Office of Information Technology
provides an excellent write-up regarding passwords and how the
number of possible characters, the length of the password, and the
hashing algorithm all affect how long it takes to crack a
password:
- The easiest way to make a password hashed with any algorithm
harder to crack is to increase the number of possible characters
used in the password. To illustrate this, Amesbury holds the
length constant at 7 characters and compares a case-insensitive
password (26 possible characters per position) with a
case-sensitive one (52 per position). The resulting difference is
a factor of 2^7 = 128, well over a hundred times as many
combinations!
- The best password policy involves all letters (case-sensitive),
all digits, all symbols (i.e. shift+digits), and a space, for 69
total characters. A 7-character password drawn from those 69
characters provides over 7 trillion combinations.
- Increasing the length by just one character (all else held
constant) raises the total to over 513 trillion combinations (a
quick sketch of this arithmetic follows this list).
- A good hashing algorithm is purposefully CPU intensive, such
that hashing a password once is no big deal, but hashing huge
numbers of candidates (as in a brute-force attack) is too slow to
be practical.
- Using a 3.2 GHz Xeon, Amesbury found that it would take 95
years to brute force all possible passwords hashed with
Microsoft's NTLM, given a variable length of 1-8 characters drawn
from 69 possible characters. FreeBSD's MD5 hash given the same
parameters would take more than 11,000 years!
- Given the data he presents, Amesbury suggests that passwords
be at least 6 characters long, contain mixed-case letters and at
least one symbol, and contain no part that can be found in a
dictionary. Dictionary attacks on even FreeBSD's hash can succeed
in under 15 days.
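To make the arithmetic above concrete, here is a minimal PHP sketch
that computes the size of the password search space for a given
alphabet and length and estimates brute-force time at an assumed hash
rate. The 2,500 hashes-per-second figure is a hypothetical
placeholder, not a number taken from Amesbury's benchmarks.

    <?php
    // Total passwords of length 1..$maxLen drawn from an alphabet
    // of $alphabetSize characters.
    function searchSpace($alphabetSize, $maxLen) {
        $total = 0;
        for ($len = 1; $len <= $maxLen; $len++) {
            $total += pow($alphabetSize, $len);
        }
        return $total;
    }

    $combos7 = pow(69, 7);          // ~7.4 trillion
    $combos8 = pow(69, 8);          // ~513 trillion
    $all1to8 = searchSpace(69, 8);  // every length from 1 through 8

    // Hypothetical hash rate; real rates depend on the hardware and
    // on how expensive the hashing algorithm is.
    $hashesPerSecond = 2500;
    $years = $all1to8 / $hashesPerSecond / (60 * 60 * 24 * 365);

    printf("69^7 = %.3e, 69^8 = %.3e\n", $combos7, $combos8);
    printf("Brute forcing all 1-8 character passwords: ~%.0f years\n", $years);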
Biggest Mistakes in Web Design 1995-2015
Everyone's favorite cranky web design consultant Vincent Flanders17
has compiled a list of the most common things that bad webmasters
do. His best suggestions are:
- Your site is only important to potential surfers if it does
something useful for them.
- If visitors to your site can't figure out the purpose of
your site within four seconds, the site is not doing its job.
- Design should not get in the way of the purpose of the site.
Even if it is pretty, if a design keeps visitors from what they
came to the site for, scrap it.
- Don't put too much stuff on one page, and certainly don't
put too many different types of content on a single page.
- Don't think your visitors are going to care too much about
web standards. While adhering to standards is good, your
visitors only stick around if the site is useful to them.
- Be careful with the use of images, Flash, and Javascript.
Only use these elements if they add actual benefits to users.
- There is nothing wrong with making your site look and behave
like other successful sites. Being totally different with
navigation or design such that your site looks nothing like any
other site will probably confuse many of your visitors.
Practical Tips for Government Web Sites (And Everyone Else!) To
Improve Their Findability in Search
Vanessa Fox18
at O'Reilly Radar believes that government websites will only be
useful if their contents can be easily found with search engines.
She says these recommendations are vital for government sites, but
important for non-government sites as well:
- Sites should create well-formed XML sitemaps. If the
site is structured well enough that a sitemap can be created
to explain its contents, then its contents are most likely
logically organized.
- While a sitemap does not give total control over what
search engines do, it helps steer them toward what
is most important.
- Make sure any public content is accessible without requiring
registration or specific user input. Search engines (and
oftentimes users) will abandon their quest for information if
some sort of input gets in their way.
- If file names and locations change, make sure to serve a
301 Moved Permanently redirect (a minimal sketch follows this list).
- Make sure to include ALT text with images.
- Make page titles unique and ensure they are titled such that
the title actually describes the page.
- Make sure pages are functional and informational even if
Javascript and images are turned off. This helps search engines
(as well as users) see important information even if these
features are turned off or unavailable.
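As a concrete illustration of the 301 recommendation, here is a
minimal PHP sketch for permanently redirecting an old URL to a new
one; the file and URL are hypothetical examples, not ones taken from
Fox's article.

    <?php
    // old-report.php has moved; permanently redirect both users and
    // search engine crawlers to the document's new location.
    header("HTTP/1.1 301 Moved Permanently");
    header("Location: http://www.example.gov/reports/2009/annual.html");
    exit;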
Entries for Week 5: April 6, 2009 through April
12, 2009
Memo: Avoid Nested Queries in MySQL at All Costs
Scott Selikoff13
of Down Home Country Coding sheds some light on an issue
that a lot of developers know to avoid but do not fully understand:
nested queries. It turns out there are understandable reasons why
developers use them, but the drawbacks far outweigh the benefits:
- The reason nested queries are so widely used makes sense.
Nested queries work in a sequential order that fits well with
the way people think: a list is built using some criteria, then
narrowed down by further criteria, then narrowed down again, ad
infinitum. This step-by-step approach will get you to the answer,
but at the cost of memory and CPU cycles. The process requires
multiple passes (i.e. loops), which are bad if there are a lot of
simultaneous requests.
- Joins combined with aliases can skip the grinding loop of
shaving down a list of returned rows and get to the data the
developer really wants efficiently. The trade-off is that the
developer has to keep track of the aliases and fully understand
how joins work.
- If executed properly, joins basically shorten the list as
the rows are retrieved, rather than generating multiple lists
and trimming each one a criterion at a time (see the sketch
after this list).
- The vast majority of the time, according to Selikoff, nested
queries are not necessary and the same work can be achieved more
efficiently with joins.
- In the event a nested query is absolutely necessary,
Selikoff recommends making multiple database calls and using the
programming language to construct queries that take the output
of each call and use it to continue to the next step. This
sounds counter-intuitive, but Selikoff says in his experience,
this method scales better and is less susceptible to slow-downs
in the event the data composition of the table changes.
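To make the contrast concrete, here is a minimal sketch (not taken
from Selikoff's post, and using the era-appropriate mysql_* extension)
of the same lookup written first as a nested query and then as a join
with table aliases; the schema, connection details, and data are
hypothetical.

    <?php
    // Hypothetical schema: orders(id, customer_id, total),
    //                      customers(id, country)
    $link = mysql_connect('localhost', 'user', 'secret');
    mysql_select_db('shop', $link);

    // Nested query: the inner SELECT builds a complete list of
    // customer ids, which is then scanned for every row of orders.
    $nested = "SELECT id, total
               FROM orders
               WHERE customer_id IN
                     (SELECT id FROM customers WHERE country = 'US')";

    // Equivalent join with aliases: rows are filtered as they are
    // read, in a single pass, instead of building and trimming
    // separate lists.
    $joined = "SELECT o.id, o.total
               FROM orders o
               INNER JOIN customers c ON c.id = o.customer_id
               WHERE c.country = 'US'";

    $result = mysql_query($joined, $link);
    while ($row = mysql_fetch_assoc($result)) {
        echo $row['id'] . ': ' . $row['total'] . "\n";
    }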
While the author's advice won't work 100 percent of the time, it
does offer developers something to think about when they are trying
to optimize a MySQL-based application.
Alternatives to LAMP
While Linux, Apache, MySQL, and PHP (LAMP) work well as a web
development combination, that does not mean that there aren't
alternatives better suited for specific purposes. David
Chisnall14 at
Informit has some good recommendations for alternatives that any
developer may find useful to be aware of:
- The alternatives to Linux are pretty well known, but
the advantages of each may not be:
- Solaris is great for multi-threaded applications and can
squeeze a ton of performance out of multiprocessor machines.
- OpenBSD is super-secure, thanks to its development
process, and runs really well on old hardware. It is not
optimized for multithreaded or multiprocessor configurations,
though.
- FreeBSD is great for virtualization because of its
"jail" architecture, which allows a high level of privileges
for each user without the worry of the entire system being
affected.
- Finally, NetBSD can run on virtually any hardware and is
really fast. It's not as fully featured, but is the best for
speed.
- The most recommended Apache alternatives are:
- lighttpd, which is really fast for static content (up to
200% faster than Apache in some cases). With FastCGI,
Chisnall claims that PHP can be served with lighttpd as well
as with Apache.
- Yaws, which is heavily optimized for concurrency and
parallelization.
- Tux, which is very fast but if it crashes, it can kill
the whole server.
- MySQL's alternatives are:
- PostgreSQL is very stable, protects data, and is fully
featured. It can sometimes be much faster than MySQL when
doing complex queries.
- SQLite is very stripped down: each database is a
single file on the server, and all data is treated as a
string. It's really fast, though, and good for applications
that only need one process accessing the database at any
given time.
- PHP alternatives are:
- Perl is an oldie, but it can still be very useful for
string-intensive web apps.
- Java, in the form of JSP and other frameworks such as
WebObjects. These frameworks are highly flexible and benefit
from the powerful APIs built with Java (as well as servers
like Tomcat).
- CGI and FastCGI can harness the power of compiled languages
like C to be lightning fast. The downside is the procedural
nature of C and the need to recompile often.
Tuning Apache and PHP for Speed on Unix
John Lim15
writing for PHP Everywhere provides some good tips for
tuning Apache and PHP for performance. Some of these tips are common
sense, but some are sure to be less familiar:
- Make sure to benchmark. This can be time-consuming, but it is
the only way to truly judge which changes are having a positive,
worthwhile impact. He recommends ApacheBench or Microsoft's Web
Application Stress Tool.
- According to Lim, PHP scripts running in Apache are 2-10
times slower than static HTML. Therefore, try to use
static pages whenever possible.
- If CPU cycles are not at a premium, try enabling compression of
your HTML in PHP (see the sketch after this list). It will speed
up download times for users considerably. Faster downloading =
happier users.
- In PHP, pass arrays and objects by reference. This can
conserve a considerable amount of RAM.
- Run each service on a separate machine whenever possible
(i.e. web server on one box, database on another). Lim goes on
further to say that it may be worthwhile in some situations to
use different server software on different boxes for different
types of content.
- Beyond these tips, Lim provides links to a myriad of
specific tuning tips for many types of environments and setups.
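As an illustration of the compression tip, here is a minimal sketch
using PHP's built-in ob_gzhandler output-buffer callback; whether it
is worthwhile depends on how much spare CPU the server has, as Lim
notes.

    <?php
    // Compress the page output with gzip when the browser supports it.
    // ob_gzhandler inspects the Accept-Encoding request header and
    // falls back to sending the page uncompressed if the client
    // cannot handle gzip.
    ob_start('ob_gzhandler');
    ?>
    <html>
      <body>
        <h1>Compressed page</h1>
        <p>This HTML is gzip-compressed before being sent, trading a
           few CPU cycles for a much smaller download.</p>
      </body>
    </html>
    <?php
    ob_end_flush();  // send the (possibly compressed) buffer
    ?>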
Entries for Week 4: March 30, 2009 through
April 5, 2009
20 Ways to Secure
your Apache Configuration
Web
developer Pete Freitag10
offers a nice list of suggestions for making an Apache server more
secure. While this list is by no means exhaustive, it is a good
starting point for some very basic things server admins can do to
make sure their boxes are not compromised. Some of the best ideas
include:
-
Use the ServerSignature Off and ServerTokens Prod directives to keep
Apache from displaying too much in its headers or error pages. This
is an example of security through obscurity, but it's better than
telling all to anyone who wants to know.
-
Make sure Apache is running as its own exclusive user and group.
If not, an attack on another service running as the same user could
exploit Apache too! Also, make sure root owns Apache's config and
binary files.
-
Turn off all features you do not need, including directory
browsing, CGI, and SSI. Every feature adds potential attack
surface, and if you're not using it, you're incurring the risk for
nothing.
-
Disable support for
directory-level .htaccess files by issuing
AllowOverride None in your
httpd.conf file.
-
Set restrictive limits on anything you can safely limit,
including the maximum request size, timeouts, concurrent
connections, and the IP addresses allowed to access certain
resources. Some of these may not be practical to limit in every
situation, but the tighter the limits, the better in security
terms. A consolidated httpd.conf sketch covering several of these
ideas follows this list.
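Pulling several of Freitag's suggestions together, here is a minimal
httpd.conf sketch (Apache 2.2-era syntax); the user/group names,
paths, and limit values are illustrative assumptions, not
recommendations from the article.

    # Reveal as little as possible in headers and error pages
    ServerSignature Off
    ServerTokens Prod

    # Run Apache under its own dedicated, unprivileged account
    User apache
    Group apache

    # Conservative limits: connection timeout, request body size,
    # and number of simultaneous clients (values are examples only)
    Timeout 45
    LimitRequestBody 1048576
    MaxClients 150

    <Directory "/var/www/html">
        # No directory browsing, SSI, or CGI, and no .htaccess overrides
        Options None
        AllowOverride None
    </Directory>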
A HOWTO on Optimizing PHP with Tips and Methodologies
Web developer and systems administrator John Lim11
of PhpLens provides some excellent information regarding PHP
optimization. Not only does he provide concrete things that
can be done to optimize scripts, but he also provides good
information about the tradeoff between scalability and speed:
- If RAM isn't an issue, scripts can be tuned for more speed
(in terms of CPU seconds). PHP runs a separate copy of a script
for each request. Oftentimes, the same job can be done
with fewer executions (minimizing CPU time needed), but the
tradeoff is that more RAM is needed to store the data processed.
Scripts that are memory efficient often use more CPU cycles:
more individual executions are made and data is handled in
smaller chunks, resulting in less need for memory. Lim provides
a nice graph and an example of two scripts performing the same
task to illustrate this point.
- Optimize for output size, since PHP can only push data back
to the browser as fast as the network connection allows.
- Watch the shared memory. Too little memory spread amongst the
multiple running copies brings PHP scripts to a crawl.
- Avoid hard disk reads as much as possible. If RAM is
available, consider creating RAM disk caches for flat-file data
that is read frequently.
- Optimize code up front and consider scalability,
flexibility, and speed. Decide what tradeoffs the project will
tolerate, as Lim says you can't achieve 100% in all three areas.
Optimizing after the fact takes longer than doing it right the
first time.
- Use a PHP optimizer, such as Zend Optimizer. According to
the data Lim provides, these optimizers almost always crank up
performance on servers that receive moderate traffic.
- Benchmark functions, both built-in and custom written. This
is fairly simple, requiring only a few lines of microtime
calculations (see the sketch after this list). The ApacheBench
tool is also a handy way to stress test without needing real,
live traffic.
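Here is a minimal sketch of the microtime-based timing Lim describes;
the workload being timed (building a string with str_repeat) is just
a hypothetical stand-in for whatever built-in or custom function you
want to measure.

    <?php
    // Time a piece of code over many iterations using microtime().
    $iterations = 10000;

    $start = microtime(true);              // float, in seconds
    for ($i = 0; $i < $iterations; $i++) {
        // Hypothetical workload: build a 1 KB string each iteration.
        $s = str_repeat('x', 1024);
    }
    $elapsed = microtime(true) - $start;

    printf("%d iterations took %.4f seconds\n", $iterations, $elapsed);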
Boosting Apache Performance by using Reverse Proxies
René Pfeiffer12
of the Linux Gazette provides a good explanation of what a reverse
proxy is as well as some technical information that is invaluable to
those wishing to make sure their Apache server is running without
too much overhead:
- A reverse proxy is a cache in front of the server that serves
unchanged data directly to the client (browser, web service,
etc.). This leaves the actual web server free to process
content that is not static, or is static but has changed
recently. The reverse proxy still talks to the web server for
most requests, but the amount of data exchanged between the
two per request is (in most cases) considerably less than the
amount of data the web server would otherwise have to send to
the clients.
- Apache is slow for serving static content (images, static
HTML/CSS etc.), and since that is what reverse proxies are best
at, Apache really gets a boost from a well-configured reverse
proxy.
- Apache can be configured to generate response headers
(information about files being served that the end user
doesn't want or need to see) automatically for certain content
types. If a resource is known to be static (e.g. a JPEG banner),
the headers can be configured so that the receiving client (a
web browser or reverse proxy) knows the resource can be cached
for a certain period past its initially read modification date.
If a reverse proxy is the receiver, it happily (and efficiently)
serves the static content, only bothering Apache every so often
to ask if the resource has been modified (one way to configure
this is sketched after this list).
- Squid is an excellent reverse proxy. Pfeiffer provides
graphs suggesting that Squid reduces the load on an Apache
production server handling 120 requests per second by nearly
50 percent!
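As one way to set the kind of caching headers described above, here
is a minimal Apache sketch using mod_expires; the module choice,
content types, and lifetimes are assumptions on my part, since
Pfeiffer's article may configure the headers differently.

    # Requires mod_expires to be loaded
    <IfModule mod_expires.c>
        ExpiresActive On
        # Static images may be cached for a week after being requested,
        # so a reverse proxy such as Squid only revalidates occasionally.
        ExpiresByType image/jpeg "access plus 1 week"
        ExpiresByType image/png  "access plus 1 week"
        ExpiresByType text/css   "access plus 1 day"
    </IfModule>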
Entries for Week 3: March 23, 2009 through
March 29, 2009
15 Essential Checks Before Launching Your Website
Smashing Magazine's Lee Munroe7
provides a great checklist for anyone launching a website to make
sure it's totally ready for visitors. While some of the tips are
commonly known, others are important but often overlooked. This is a
list that even seasoned web designers should consult to make sure
they haven't overlooked the small stuff that can make the
difference:
-
Include a favicon, which is a very easy way to get branding that
will actually stick if a surfer bookmarks your page.
-
Make sure to proofread. Visitors will appreciate it and search
engines will pick up on the properly-spelled keywords.
-
Make sure the site maintains some functionality even if things
like Flash and Javascript are turned off. The functionality
doesn't have to be one-to-one, but visitors with those features
disabled should know they are missing out on something while
still getting at least a basic implementation of the
functionality.
-
Create an XML sitemap, with a good structure. Search engines, as
well as some live humans, can use it in case they're not
entirely sure where to go for the content they want.
-
Make friendly error pages that provide links
to get users back on track (this may involve pointing users to
the sitemap mentioned above); a minimal sketch follows this list.
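For the friendly error page tip, here is a minimal Apache sketch;
the page paths are hypothetical, and the same effect can be achieved
in other servers or at the application level.

    # Serve custom, helpful error pages instead of Apache's defaults
    ErrorDocument 404 /errors/not-found.html
    ErrorDocument 500 /errors/server-error.html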
Which Are More
Legible: Serif or Sans Serif Typefaces?
If you've ever wondered if serif typefaces are more readable than their
sans counterparts, web developer Alex Poole8
offers a fantastic literature review spanning over 100 years of
research on the subject. This information can be very useful to web
designers when deciding what typeface to choose (or at least help to
end holy wars about the topic). Here are his findings:
-
After it's all said and done, it appears that the majority of
typeface readability studies find that the differences are so
minuscule that the typeface alone isn't enough of a factor to
matter.
-
Some studies have found that serifs increase readability because
the extra marks provide extra space between letters.
-
The claimed readability of serif typefaces may be a result of
familiarity with a typeface more than of the readability of the
typeface itself.
-
Some studies have found that sans serif fonts work better on
computer screens because there is less detail (serifs) that has
to be rendered on the monitor. However, this research took place
when fonts were bitmapped and resolutions were much lower.
-
Poole stresses that many factors other than typeface,
including type size, background, and font color, appear in
aggregate to have more influence on readability.
HTML 5 differences from HTML 4
Whether web designers like it or not, HTML 4 must eventually give
way to HTML 5. The good news is, according to Anne van Kesteren9
of W3C's HTML Working Group, browsers supporting HTML 4 should be
able to handle pages written with HTML 5's features (although those
browsers will not benefit from the new features of HTML 5). Here's
some of the more interesting information regarding the differences
between the two versions (although the document itself provides many
more details than are listed here):
-
HTML 5 places a higher
priority on accessibility with some new markup. The article
named the hidden attribute and the progress element as two
examples, but did not go into detail.
-
HTML 5 will be more flexible with differing media types, with
more attributes that deal with specific media types.
-
New elements specific to
certain semantic structures have been added, such as the menu,
datagrid, and command
elements.
-
One thing that disturbs me is that HTML 5 will support two
syntaxes: a traditional markup that looks very similar to
current HTML (though no longer formally based on SGML) and a
pure XML serialization, complete with all the advantages and
headaches of XML.
-
All purely presentational elements (center, font, strike, etc.)
have been removed (although browsers will likely continue to
support them for backward compatibility).
-
The DOM has been extended and is the overarching guide for how
HTML 5 is being constructed.
HTML 5
is still in heavy revision, but it will be here sooner or later, so
it is a good idea to know what to expect.
Entries for Week 2: March 9, 2009 through March
15, 2009
55 SEO Tips Even Your Mother Would
Love
Richard Burckhardt4
of Search Engine Journal provides a great list of 55 things you can
do to make sure search engines regard your pages with the highest
priority. Many of the tips (such as make sure your web page titles
are descriptive) are widely known, however, there are a few that are
not so commonly preached that may actually make a big difference:
-
Getting too hung up on PageRank is
short-sighted, since so many other factors matter, depending on
the context of a search.
-
Don't split your backlinks between equivalent
canonical names (e.g. www.domain.com and domain.com). Pick one
style and be consistent with it, or use 301 permanent redirects
if consistency is not possible (a sketch follows this list).
-
Put plenty of descriptive text around links.
This can be as important as the link text itself.
-
Give visitors a strong call to action,
inspiring them to buy or use whatever got them to your page.
-
Use the word “image” or “photo” in your image
ALT text, since a lot of searches feature keywords with one of
these two words after it.
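For the canonical-name tip, here is a minimal Apache mod_rewrite
sketch that 301-redirects the bare domain to the www form;
domain.com is a placeholder, and the same could be done in the
opposite direction or with a simple Redirect directive.

    # Requires mod_rewrite; send domain.com traffic to www.domain.com
    RewriteEngine On
    RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
    RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]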
Obviously, there is a lot more to SEO than just
the tips listed above, but these are some that stood out from the
myriad of similar lists out there.
An Open Secret
Marshall Krantz5
of CFO Magazine believes that some Open Source
Software is ready to run certain portions of an enterprise while it
is not so ready to run others:
-
It's best to deploy OSS first in a sector of the organization's
business that is not necessarily its core competency. Krantz
cites InterContinental Hotels Group's use of SugarCRM, an
open-source customer relationship management package. This is
obviously an important area for InterContinental, but it is
not mission-critical.
-
If an organization has a commercial package that works well, don't
go open source just to switch to something new. InterContinental
sticks with its old IBM mainframe to handle its booking
activities. The system is a little expensive, but it is
rock-solid and is backed by IBM's world-class support.
-
One of the least scrutinized areas of OSS is that some OSS
license terms make OSS not free for commercial applications.
Furthermore, some licenses are structured such that if the
application ends up containing non-free software, the
organizations using the software become just as liable for the
infringement costs as the company that supplied it.
A way to mitigate this, according to Krantz, is to make sure
that any OSS used by an organization is OSI certified, which
should ensure that the software is free to use for whatever
purpose.
Cloud Computing Survey: IT Leaders See Big Promise, Have Big
Security Questions
Laurianne McLaughlin6
of CIO.com provides insight as to what 173 high-ranking IT leaders
in the US think about cloud computing by giving a rundown of a
survey about the use of the technology. The consensus seems to be
that cloud computing shows big promise but is too unpredictable to
use in mission-critical situations.
-
The flexible and cost-effective nature of cloud computing is the
most appealing factor amongst the IT professionals surveyed.
Cloud architecture allows companies to pay for what they need
and scale on demand (both up and down).
-
A major downside is that cloud architecture is evolving too
quickly and isn't mature. This makes investing a lot of
resources in long-term cloud development risky, because the way
a particular cloud implementation works could change far more
quickly than it is possible to optimize for.
-
The biggest hurdle is that IT leaders do not regard the cloud as
a secure place for data.
-
In the next several years, 53% of the survey respondents say that
cloud architecture will change the way many enterprises do
things, because of the power and scalability it offers.
-
Most respondents feel that the cloud is the key to rolling out
successful Software as a Service (SaaS) implementations, but
only after cloud architecture matures and standardizes.
-
Significantly, 42% say they would like to use cloud architecture
in some way by 2012 to power ERP applications, since the cloud
has the power and scalability to handle the amounts of data
needed to power ERP.
Judging from these results, most companies with sufficient resources
should position current data-intensive IT projects so that they can
begin using the power of the cloud once the technology matures and
the security concerns have been addressed.
Entries for Week 1: March 2, 2009
through March 8, 2009
Why a CSS Website Layout Will Make You Money
Trenton Moss1 of
Webcredible gives us four good reasons why companies should use CSS to
lay out web pages instead of relying on HTML tables:
- CSS layouts require less code than equivalent HTML table
layouts. The reduction in code results in less bandwidth used during
site traffic.
- CSS layouts make it easier for search engines to index a site.
The search engine does not have to parse as much code and the most
important content is easier to put at the top of the document.
- The reduction in the amount of code used in a CSS layout results
in faster download and rendering speeds for the end user. When
tables are used for layouts, the browser must receive all table data
before fully rendering the content. CSS layouts render as they
are received.
- CSS layouts can be device-agnostic, whereby the same HTML
structure can be used to generate layouts for computer screens, cell
phones, and PDAs. With traditional table-based layouts, a totally
separate HTML structure had to be created for each device.
The Beauty of Simplicity
Linda Tischler2
of Fast Company Magazine investigates how making technology products
easy to use provides a huge competitive advantage to companies who can
figure out how to do it. Tischler makes several keen observations:
- Users say they want a ton of features, but they are much
happier with products that do fewer things but do them very simply
and very well. The bottom line is that users just want the products
to work, with minimum fuss.
- Creating simplicity from technology products is difficult. The
products often employ complex science that needs to be presented
behind the facade of simplicity.
- Powerful, simple-to-use products come from companies who are
committed to the idea of the simple user experience at all levels of
the company. Everyone from top management down to the product
testers must feel that simplicity is very important.
- A major problem with many products now is that adding features
often does not add cost because the features are added via software.
The iPod is successful not because of its hardware, but how the
software presents the hardware's functionality. Just because you can
do something does not mean you should.
- Google's director of Web products, Marissa Mayer, only puts
links to products on Google's front page that users have shown
(through traffic) are the most useful to them. Less popular products
are still available, but tucked away so as to not get in the way of
what most users want.
- Embracing simplicity can lead to big returns. Management at
electronics giant Philips has dug itself out of a slump by making
sure all of its products are easy to use. Everything from packaging,
menus, remote controls, and manuals must pass multiple simplicity
metrics before it can be used in a shipping product.
Web 2.0 Has Corporate America Spinning
BusinessWeek's Robert Hof3
examines the impact that Web 2.0 implementations are having on big
corporations. Some of his observations are:
- Web 2.0 sites help users "get something done."
- Employees of corporations are now using blogs to communicate with customers.
This gives corporations a face and allows customers to develop
emotional attachments to the brand.
- Companies like Disney are using wikis to enable departments and
product development groups to maintain up-to-the-minute product
documentation that the entire team contributes to. This has
resulted in very responsive and agile groups.
- Social Networking sites like LinkedIn are being tapped to find
sales leads as well as help with staffing.
- Web 2.0 sites are focused on the power of the collective,
whereby information that is useful to one person may be useful to
more people. Within corporations, this helps employees who use a
service help other employees who use the same service.
- Web 2.0 applications have the ability to provide free PR: Sites
like Technorati can call attention to a corporation's content, all
for free. All the company has to do is make sure its content is
appealing and relevant to its customers.