I know it is hard to imagine your system running slowly. In fact, I will even go out on a limb and say there has never ever been a slow network or processor. One exception is for different classes of hardware; another is for one case I know in which the firmware altered the CPU clocking.
I know you are calling BS here but I stand by it.
Networks transmit bytes at the rate they are designed for without fail.
Processors execute instructions for a quantum without fail.
Your view of slowness is not due to the network or the processor, it is always due to hurdles and obstacles that your software must overcome in order to actually work.
Think of it like cruising down the highway going 65 MPH through Hazard County. Roscoe flips up a stop sign. You stop because you are a law-abiding citizen. This little obstacle just altered your timing and caused you to take longer to get to the Boar’s Nest. Imagine numerous and unforeseen obstacles like this. For software and systems, this is what happens. The job of the programmer, implementer, and support engineer is to minimize the number of obstacles and shorten their effect, thus ensuring your software or system runs to maximum potential. How?
Well, a Forensic engineer focuses on critical path analysis.
Time for a Definition
Before jumping head first into critical path analysis, we should probably define what it is.
The online Oxford dictionary doesn’t exactly hit the mark; a real-world definition in project management, software analysis, or even driving to work is “the sequence of steps that takes the longest time to complete in a specific activity”.
If you take any activity and break it down into compartmentalized steps that are measurable, then you can start to reduce times in a methodical approach.
In the example below, times T1, T2, and T3 are measurable items in an activity. If the lengths of the chevrons represent time, then which one would you want to reduce to improve the overall time? T2, correct.
The key is that T1, T2, and T3 are measurable in some meaningful metric like time, cpu usage, memory utilization or similar.
Let’s say I am driving to work and my route takes me on three main roads. If the time I spend in traffic on those roads is T1, T2, and T3, then focusing on improving T2 first, then T3, is critical to reducing my time to get to work. The same applies to computer software and algorithm analysis.
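To make the idea concrete, here is a minimal Python sketch. The three steps are hypothetical stand-ins simulated with sleeps, not real measurements: instrument each step, then look for the segment with the largest elapsed time.

```python
import time

def measure(label, step, timings):
    """Run one step of the activity and record its elapsed time."""
    start = time.perf_counter()
    step()
    timings[label] = time.perf_counter() - start

# Hypothetical stand-ins for the three measurable segments (simulated with sleeps).
timings = {}
measure("T1", lambda: time.sleep(0.01), timings)
measure("T2", lambda: time.sleep(0.05), timings)
measure("T3", lambda: time.sleep(0.02), timings)

# The longest segment is where improvement pays off most.
bottleneck = max(timings, key=timings.get)
print(bottleneck)  # T2
```

Swap the lambdas for real work (a query, a network call, a parse step) and the same ranking tells you where to spend your tuning effort.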
When you click on a new browser tab, load your email, or click an application button, do you want it to be fast or slow? Dumb question, fast…as fast as possible.
Let’s Move Forward!
At Forensic IT, we pride ourselves in critical path analysis of basically everything. Well, everything related to computers and software. My goal over the next few blog posts is to help you fine tune your critical path analysis skills. We will tackle some simple examples and complex ones as well, using real data and repeatable steps you can follow along with.
With the critical path analysis skills you will develop, you can apply them to your own issues and finally get some of those slow issues resolved. Let’s face it, if you have worked anywhere around IT then you have heard something like “the network is slow” or “my system is slow”.
Statements like those always crack me up.
My good friend and colleague, Erick Harlow, loves when people say “the network is slow”. He is also a farmer and usually has a comment back that involves hogs or hay. I won’t do him justice here trying to duplicate it, but I will try. He would say something like
"Well slap my face to the side of a hog and paste my ears with jam, do electrons really flow slower for you than they do for me?".
Ha! No, electrons are consistent.
Stay with us over the next few blog posts and we will really dig in and do some Forensic troubleshooting.
Do we have DBA (database administrator) talent or do we have to hire?
What kind of server or resources are needed? What are my power, server, disk, processing, network, and IO requirements?
How do I maintain, backup, administer and otherwise own the database framework?
What is my cost of ownership?
Let’s explore the first decision. What database type to use, SQL or No-SQL?
Traditional database types classified as SQL have a significant place in business and are a mainstay for business decisions. However, as companies start to create applications that drive decisions based on significant analysis of large, almost unfathomable amounts of data, they migrate to No-SQL solutions like MongoDB or Cassandra. The architecture of No-SQL makes it a good choice for big data solutions, while the built-in protections of a transaction-based system like Oracle make it a better choice for banking or similar solutions.
When it comes to picking a specific system, businesses tend to stick with what they know. In other words, if they already have Oracle, and Oracle talent, then when management asks those individuals which database system they should use on Project X, it should be no surprise that they pick…Oracle.
Matching a specific database system to a set of business requirements is an arduous task that should always be looked at with a fresh perspective and not just based on what talent is already employed or what systems a business is comfortable with.
Let’s face it, if a business picks correctly, all is good. If they pick incorrectly, they have wasted a lot of resources which equates to dollars. Enter DBaaS.
Where DBaaS excels is that it gives businesses the ability to test the waters a bit, to try before they invest heavily.
Even if DBaaS is a stepping stone to total ownership, it is a cost effective solution to help figure out your needs prior to investing heavily.
Let’s explore the pros and cons.
Note, we need to distinguish between “hosting database systems” and DBaaS. There are many cloud-based solutions that “host” a database system but provide no significant help with configuration, tuning, consulting, or the talent needed to actually use those systems. True DBaaS provides both the system and the talent to help you utilize the database and figure out how to store, query, and analyze your data.
The value of DBaaS goes way beyond the hosting.
Pros of DBaaS:
No equipment or software licenses
Flexibility. Multiple choices to test drive your applications and pick the right platform for your business requirements.
Significantly lower staffing requirements. The DBaaS provider handles installation, configuration, and in many cases development.
Offsite hosting provides protection from local power failures or disasters. Many businesses design with power redundancy in mind but rarely meet those goals in reality.
SLAs that include redundancy, up-time, and backup protections. A DBaaS provider is intently focused on protecting your data.
Cons of DBaaS:
Limited access to underlying servers. This can feel like a loss of control.
Very little knowledge of how your data is protected from cyber security threats. This can be dangerous for sensitive data.
So how do you decide?
Is there a transition from one to the other? Yes, almost always, but by following a few guidelines to start with, you can put DBaaS to proper use.
Forensic IT’s Guidelines for DBaaS:
Do all development using DBaaS. This is your chance to test drive different architectures and features.
Unless you have full disclosure of how your data is protected, managed, and secured by DBaaS providers, we suggest consulting with database architects to host sensitive data internally. Note, this is typically not big data. When we use the term sensitive data, we mean just that: data like SSNs, account details, financials, personal data, etc. Does this mean that you cannot use DBaaS for this? No, it means that you first have to find a DBaaS provider that will show you everything, from how your encrypted data gets into their system to storage, access, etc.
When you are not sure of what your database needs really are, use DBaaS first. This lets you try SQL or No-SQL. This lets you explore the encryption capabilities of Oracle versus MySQL. Think of DBaaS like buying a car. You test drive sedans, trucks, and SUVs, and try different manufacturers and features. You may decide to lease or buy.
Always monitor and evaluate the cost of ownership. As your system grows, the operating costs might make sense to drop DBaaS and build an in-house system. By then, however, you have already decided on what you really need. You have test driven multiple solutions and can now buy only the licenses and hardware you know you need to be successful. You can hire the correct talent to manage your system.
Since our inception in 2003, I have been asked many times what is behind our name–Forensic. What does it mean with respect to IT?
To be honest, it is always a difficult question to answer in just a few words. I start talking about troubleshooting computers or computer systems and as soon as I bring up computers, I usually get a comment like:
“…oh, I get it now, my nephew is in computers too. He set up my Wi-Fi network…”.
They got their answer–we just work in computers. Smh.
Sometimes I just drop it and other times I continue to try and explain how we are different. Yes, we work in computers, but that has nothing to do with being Forensic, or really what we do. Computers just happen to be the current platform we work in.
So, since I have you captive for a few more words anyway, let me explain. Let’s start with a web search on the definition of “forensic”.
The word “Forensic” according to the definition above (remove the crime part for two reasons, 1-we don’t currently work in the criminal justice system, and 2-we are not criminals) means “…the application of scientific methods and techniques in the investigation…”.
Our company works with “IT” which is short for “Information Technology”.
Replace crime with IT and you have:
“Forensic IT – The application of scientific methods and techniques in the investigation of Information Technology”
Now we are getting somewhere.
I won’t bore you with more web searches but I will just tell you that the scientific method means
“…the systematic observation, measurement, and experimentation, and the formulation, testing, and modification of hypotheses…”.
Oddly enough, we can apply that to almost anything. Any problem. Any industry–yes, even the criminal justice system.
A simple example. If you wake up and cannot find your car keys, you start to apply the scientific method. You do this naturally without thinking very hard. You think back to when you last had them and piece together what you did between when you last had them and the present. You think of what activities you did, people you talked with, places in your house you went, and usually, in a short time, you find your keys.
You don’t do silly things like look in the attic if you did not go in the attic. If you did not go out on your deck or your back porch then you probably don’t go looking there either. You use systematic observation to help find your keys.
We are no different at Forensic IT. We pride ourselves on looking at problems in this same fashion. It starts with learning some basic architecture of how a process works, how an application communicates, and what the anatomy of it is.
If we think of the above simplified process and break it down generically we can say that most computer processes do the same four common things:
Use memory to hold data.
Interact with a user, system, or other process or device.
Follow a routine or algorithm for a purpose–i.e., the program.
Read/write data to files, networks, other processes or devices.
Now, that is a generic view of a process, but you have to admit–in its simplicity, it covers all processes. Not all processes use all four items listed and a few might add more, but in general it covers them.
Since all processes do basically the same core things, we felt that we could add value to all customers by tailoring our skills to analyze, expose, troubleshoot, and explain how processes do those four items.
I was teaching our Advanced Windbg course to a 3rd-level support group and we were focusing on how the Virtual Memory Manager in Windows mapped and addressed memory. I was challenged by one of the attendees who said:
“…so you mean to tell me that you don’t know all the differences between the Windows 2003, 2008, and 2012 heap managers? I thought you would be an expert on that…”
My response was to attach a debugger to a Windows 2008 server that they had, use Windbg to map out the heap manager, put breakpoints on the HeapAlloc() and HeapFree() APIs, and proceed to draw a heap manager architecture on the board in more detail than he could Google. I hope that I handled myself in a professional way, but I do remember feeling a bit too satisfied during the challenge!
I bring this up to serve a point on being forensic. We do not pride ourselves on knowledge on how specific things work, rather, we pride ourselves on being able to analyze, reverse engineer, explain, and otherwise expose the architecture on how something works. To us, that is a way more valuable skill than trying to read and memorize technical details about any operating system or process.
Does this mean that we don’t believe in becoming experts on how certain processes or systems work?
Not at all. We strongly encourage our engineers to attain as much knowledge of how things work as possible. The difference is that we want them first to build skills to attain the knowledge–not the knowledge itself.
Below are two simple pictures that hopefully explain in more detail what a non-forensic subject matter expert (a level-2 or level-3 support engineer) sees in a process or system versus a forensic one. Both have value and both have a place.
If you want quick answers on how to configure or troubleshoot connectivity or look at normal application failures, the application or system expert can solve most of these issues. They have the knowledge, experience, and understanding of the system to help.
Figure: Level 2 or Level 3 Application Support
In this picture, we see a typical Level 2 or Level 3 support engineer with lots of system experience.
They are great at connectivity, configuration, and normal system failures.
They definitely provide value to an organization and fulfill a vital role in system support.
They troubleshoot many process and system failures.
When all processes are running but the system doesn’t work, when nothing is written to the log files, or when an issue surfaces that specific application knowledge, experience, and internal architecture cannot solve, the Forensic Engineer can help. They look at the same picture but with a different view, a different perspective.
Not better, just different.
Figure: Forensic Engineer
The Forensic engineer looks at the same picture using both a telescope and a microscope at the same time.
Although they have a generic view of what the application is intended to do (i.e., its requirements or design specification), they have a detailed view of what the application is actually doing.
Not only do they see the minute details of a process and system down to the memory blocks and disassembled code, they see a macro view of the overall system. It is with this different view, and frankly, without internal system knowledge of the process itself that they excel and provide value. This is because they can look at a failure and examine data to explain how the process itself behaves and operates.
Without any preconceived notion or expectation of behavior, they are free to let the data (The Bits & Bytes) lead them down a path to root cause.
I would be remiss if I did not point out that when you use a forensic, scientifically backed method or approach, you can safely declare:
A Forensic Engineer:
Never guesses. They explain behavior with data.
Always has a logical next step in the troubleshooting process.
Where to start? The hardest part of writing this post was creating the title. It should be simple, right? The problem is that there are too many possible titles like:
How NOT to Load Data to a Cloud Database
Using Wireshark to Troubleshoot Network Performance
If it’s Not Broke, Don’t Fix It
MySQL Table Import Wizard Architecture — What were they thinking?
Why the dilemma you ask?
All I wanted to do was load an Excel file with about 466k rows into a database on a cloud server. This has probably been done a gazillion times by now all over the world, and my fear is that many of you fall victim to the “it’s working so it must be right” mentality and have suffered through what I am going to describe. This post is not about uploading data to a cloud database or how MySQL table imports work. Rather, it is about how beneficial your IT life will be if you just learn to look under the hood at how things work and then learn when to adjust.
So, the chosen title is “Quick Network Analysis to Avoid Throwing Your Computer Against the Wall!”
So, to rewind. I had a CSV (comma-separated values) file exported from Excel with about 466 thousand rows of data. I wanted to load this from my laptop to a hosted cloud server.
My target table structure was simple:
My exported file from another server would map perfectly to this, so I just decided to use the Table Import Wizard in MySQL Workbench. In Workbench, find the table, right-click, and select Table Data Import Wizard.
Answer a few questions about encoding format, CSV-to-table mapping and turn it loose. I thought to myself, “466k rows, a few minutes, time for coffee”.
After a little water cooler talk and a new cup of coffee, I returned to my desk and much to my displeasure I still saw the table import wizard running. Hmm. So like any normal DBA would do, I decided to check the row count.
SELECT COUNT(*) FROM dcHMIMessages;
A whopping 119 rows! 119!?! That is 0.025% of the total in a few minutes. A few more selects and I have come to the sad conclusion that I am inserting about a row of data a second. At this blistering pace, my import will be done in 5.39 days.
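The back-of-the-napkin arithmetic behind those numbers, in a few lines of Python:

```python
# Back-of-the-napkin math on the observed import rate.
total_rows = 466_000
rows_so_far = 119
rows_per_second = 1  # roughly one insert per second, as observed

pct_done = rows_so_far / total_rows * 100
print(f"{pct_done:.3f}% done")  # 0.026% done

seconds_remaining = (total_rows - rows_so_far) / rows_per_second
days = seconds_remaining / 86_400  # seconds per day
print(f"{days:.2f} days to finish")  # 5.39 days to finish
```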
Ok, remove the lazy, GUI-using, coffee-drinking developer hat and put on my trusty Forensic hat (available for purchase!) to see what is going on.
And yes, that is a John Wayne bobble-head. If a grown man was going to own a bobble-head, who else would be better than The Duke?
So with my Forensic hat on, I picked my tool of choice for troubleshooting data performance across a network–Wireshark.
A quick sample of the network data explains why it is such a snoozer…and reveals how I entertained the title “MySQL Table Import Wizard Architecture — What were they thinking?”
Note, we have network classes to teach you how to use Wireshark if you are interested. This post is just to show some benefit of using it. Hit us up on our contact page if you are interested, or purchase a PDF here to get the scoop on using Wireshark.
So what does Wireshark show me about the performance?
The key columns to notice in the above image are:
Delta – the time since the previously captured packet (you can set it to time since the previously displayed packet, but in this case it is captured). What does this tell me? It basically tells me that it takes my computer about 0.000# seconds to respond with the next packet and it takes the MySQL server about 0.0# seconds to process. Not bad.
Length – the length of my packet. This is extremely small for what I would expect. When you look at the table structure (first image in article), the row data is much longer than 65 bytes, especially when you remember that the length of the packet also includes all of the packet headers. The IP header alone is 20 bytes.
Bytes in flight – the amount of unacknowledged data flowing between the two ports in this conversation. In terms of flow control, this is what I am passing back and forth. Seems a bit small, don’t you think?
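As a rough sanity check on why those packets feel so empty, here is the header accounting. The 65-byte figure is one of the small frame lengths from the capture, and I am assuming a plain Ethernet II frame, a minimal IPv4 header, and a TCP header with no options:

```python
# Rough header accounting for one of the tiny packets in the capture,
# assuming Ethernet II + minimal IPv4 + TCP with no options.
ETH_HEADER = 14   # destination MAC, source MAC, EtherType
IP_HEADER = 20    # minimum IPv4 header
TCP_HEADER = 20   # TCP header without options

frame_length = 65  # Wireshark "Length" column for one small packet
payload = frame_length - (ETH_HEADER + IP_HEADER + TCP_HEADER)
print(payload)  # 11

overhead_pct = (frame_length - payload) / frame_length * 100
print(f"{overhead_pct:.0f}% of the frame is headers")  # 83% of the frame is headers
```

Eleven bytes of actual MySQL data riding in a 65-byte frame: the headers dwarf the data being carried.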
This is the ah-ha moment in the troubleshooting in which we can see why this is such a snoozer.
My computer is IP address 192.168.119. When you look at packets 1083, 1085, 1087, and 1089, you can see the length of data that my computer is sending to the destination (5 bytes, 22 bytes, 5 bytes, and 33 bytes respectively). On a network that can handle up to 1460 bytes of data per packet, this is like moving a pool of water with a teaspoon.
Packets 1082, 1084, 1086, and 1088 are all 11 bytes in flight and are just part of MySQL acknowledging my data.
So a quick status check shows that my laptop response and the MySQL response are fairly quick and that the issue is that my data throughput is minuscule. Time to take a peek into the packets to see what is happening. First let’s look at a snippet of the CSV.
Since I did not write the MySQL data import wizard, I have no idea how it works. However, as a developer, my basic assumption is that it would read a large chunk of the CSV, send the data to the server, and then parse the data by looking for commas and newlines. Without doing much ciphering, we can see that when I send 5, 22, or 33 bytes of data, we are clearly not sending a large chunk. My Forensic sense tells me something is not quite right.
So, looking through the Wireshark protocols (middle pane of the Wireshark data), I can see that every 5-byte MySQL packet is a ping to see that the server is alive. Yeah! We get to check before every packet. Ugh. See below.
After the ping and the reply, I send a 22 byte packet shown below:
This looks like a parameter statement: SET @a4 = "3018". Hmm. Not sure about that yet, but from the Wireshark data we see that not only do we have to suffer through the 5-byte ping and the 11-byte response from the cloud MySQL, but after our teaspoon of data (22 bytes in flight) we send another 5-byte ping and have to wait for the response.
Next, we send the 33 byte packet:
This has a little more meaningful data and looks like a datetime stamp. Ugh. Now, after searching through the CSV for 3018 and the datetime value, we can determine that the client portion of the program is sending the data ONE CELL AT A TIME–are you kidding me! One cell of data at a time.
A little more math. The number of cells per row is 13 and we have ~466k rows, which is about 6,058,000 cells (i.e., teaspoons) of data. Remember too that between each cell we need to send a MySQL ping and wait for the response. This means that to send one CSV value (i.e., a cell in the document) we use 4 packets (1-ping, 2-ack, 3-cell data, 4-ack). A quick calculation shows that the number of packets needed to move the data to the server is 4 x 6,058,000, which equals 24,232,000 packets. WOW!
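Spelled out in code, that teaspoon arithmetic looks like this:

```python
# The teaspoon arithmetic, spelled out.
rows = 466_000
cells_per_row = 13
packets_per_cell = 4  # 1-ping, 2-ack, 3-cell data, 4-ack

cells = rows * cells_per_row
print(f"{cells:,} cells")  # 6,058,000 cells

total_packets = cells * packets_per_cell
print(f"{total_packets:,} packets")  # 24,232,000 packets
```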
Ok. I did not set out to rant about the MySQL table import wizard, but in all honesty, I did expect better performance, and I am sure there are plenty of people who have suffered through what I just showcased and did not know they could do anything differently. I am not a GUI person, and maybe someone will point out that there are a million ways I could tune the import wizard to make it faster; however, that would not have let me write this post! But seriously, out of the box, this is undesirable behavior for anything more than a few dozen rows of data.
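For contrast, here is a toy sketch of what a chunked loader along the lines of my earlier assumption might do: batch many CSV rows into a single multi-row INSERT so each round trip carries real data. The table name, columns, and quoting here are purely illustrative (there is no escaping, so this is not for real data):

```python
import csv
import io

def batched_inserts(csv_text, table, batch_size=1000):
    """Yield multi-row INSERT statements instead of one round trip per cell."""
    reader = csv.reader(io.StringIO(csv_text))
    header = next(reader)            # skip the header row
    cols = ", ".join(header)
    batch = []
    for row in reader:
        # Naive quoting for illustration only -- no escaping of quotes.
        batch.append("(" + ", ".join(f"'{v}'" for v in row) + ")")
        if len(batch) == batch_size:
            yield f"INSERT INTO {table} ({cols}) VALUES {', '.join(batch)};"
            batch = []
    if batch:
        yield f"INSERT INTO {table} ({cols}) VALUES {', '.join(batch)};"

# Tiny illustrative CSV, not the real 466k-row file.
sample = "id,msg\n1,hello\n2,world\n"
for stmt in batched_inserts(sample, "dcHMIMessages"):
    print(stmt)  # one statement carrying both rows
```

One statement per thousand rows instead of four packets per cell is the difference between a teaspoon and a bucket.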
Luckily, there are always other ways to do things, as my colleague Jonathan suggested. He said to move the file to the server first to avoid the network latency, which I did. On top of that, I went back to the trusty command line (when in doubt, command line always beats GUI). To be fair to MySQL Workbench, I would need to test its speed when both data and server are on the same box, but I already spent too much time getting the rows in there, so I will leave that for another day. To the rescue is the command line utility mysqlimport.
So, next I ftp’d the file to the server (took about 30 seconds).
Basically, the command ignores the header row, indicates that fields are terminated by a comma (,), and then includes the credentials for my server. The default behavior is to import into a table with the same name as the CSV.
Total execution time to insert 466,000 data rows: less than 5 seconds.
Yes, less than 5 seconds. Shazam!
So I could spend 5.39 days using the GUI data import wizard or move the file to the server and use the command line and do the entire thing in just a few seconds (about 35 if you remove the time of my keystrokes). Wow!
Maybe I could improve my network; maybe I could tune the MySQL import wizard. Maybe. I am sure I could never get to 5 seconds using the GUI, though.
Now you might be saying,
“Dave, no one would move such a huge amount of data that way; they would do it the way you did”
and you might be right. I suspect, however, that the majority of people out there would use the GUI and just suffer through it, not knowing it was bad because it was working.
Besides, that is not the point of the post.
The point of the post is to look at performance with your Forensic hat and recognize when something smells bad. Then use tools to expose data and truly understand what is happening.
In this example, everything was working. No errors were present and the data was moving from source to destination. However, working does not always mean working well. With a little Forensic ability, we can peek under the hood at the Bits & Bytes and understand what “slow” really means and how to fix it.