Monday, December 25, 2006

Is $300 per hour real?

I have heard that some SAP consultants earn $300 per hour! To me that is extremely cool :) Such pay rates are not limited to the ERP field. Some people say that programmers can also earn $200 per hour. It's interesting what the upper limit for IT people is...

$500?

100,000 lines of code per week

eBay says that "We roll 100,000+ lines of code every two weeks". Hmm, at first glance that is a huge amount. But let's see: 2 weeks = 10 working days, and for 100 programmers that works out to... 100 lines of code per programmer per day.

What can I say about that? They work not very fast, but carefully :)
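The arithmetic, spelled out as a quick sketch (the 100-programmer headcount is my guess from the quote above, not an official figure):

```python
lines_per_rollout = 100000  # eBay's "100,000+ lines every two weeks"
working_days = 10           # two weeks of working days
programmers = 100           # assumed headcount, not an official number

# lines of code per programmer per working day
lines_per_day = lines_per_rollout / (working_days * programmers)
print(lines_per_day)  # prints 100.0
```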

Friday, December 22, 2006

eBay Internals - very interesting.
It is about some of eBay's internals.
212,000,000 registered users, 2 petabytes of data, 26 billion SQL executions per day. But the most interesting part for me is the 3.3 million-line C++ ISAPI DLL - 150 MB. Oh man...

Python : Generate Random Strings

I like Python. More and more :) Here is an example of random string generation:

import random

alphabet = 'abcdefghijklmnopqrstuvwxyz'
min = 5    # note: this shadows the built-in min(), harmless here
max = 15
total = 1000000

FILE = open("filename.out", "w")
for count in xrange(total):
    # pick between 5 and 15 distinct letters and write them as one line
    for x in random.sample(alphabet, random.randint(min, max)):
        FILE.write(x)
    FILE.write("\n")
FILE.close()

It's much shorter than the equivalent Java program. It seems that by doing less we can get more :). The key is the random.sample(A, N) function - it returns a random N-element subset of A. Since I need random strings with lengths varying from 5 to 15, I use random.randint(min, max) as N.
There may be a more elegant solution, but this works for me :)
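For the record, one more compact variant of the same idea (a sketch using str.join; note that random.sample never repeats a letter within one string, so lengths are capped at 26 either way):

```python
import random

alphabet = 'abcdefghijklmnopqrstuvwxyz'

def random_word(min_len=5, max_len=15):
    # sample() picks distinct letters; join() glues them into one string
    return ''.join(random.sample(alphabet, random.randint(min_len, max_len)))

print(random_word())
```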

Thursday, December 07, 2006

MySQL InnoDB storage engine

MySQL supports a lot of storage engines, as well as a pluggable storage engine architecture. What about referential integrity between tables with different storage engines? Foreign key support is available for InnoDB and Falcon. I don't know anything about Falcon, but InnoDB... It's interesting who the owner is. Answer: Oracle Corporation :). By the way, if we compare storage engines by these criteria:

  1. Storage Limits
  2. Tablespace Support
  3. Configurable Page Sizes
  4. Automatic Storage Extension
  5. ACID Transaction Support
  6. Distributed Transaction Support
  7. Locking Granularity
  8. Savepoint Support
  9. Crash Recovery
  10. Foreign Key Support
  11. B-Tree Indexes
  12. Hash Indexes
  13. Clustered Indexes
  14. Full Text Indexes
  15. Data Caches
  16. Index Caches
  17. Query Cache Support
  18. Online Parameter Support
  19. Geospatial Support
  20. Replication Support
  21. Backup/Point-in-Time Recovery
  22. Memory Footprint
  23. Bulk Insert Speed
we'll see that InnoDB has the most advanced feature set among the competitors. It is also the most widely used. But the INSERT speed for InnoDB tables is very slow. Then again, nobody forbids mixing different storage engine types in one database - different engines for different tables - to gain an overall performance increase.

Java Scanner : read whole page at once

One of the new features of Java 5 is java.util.Scanner. It is great for parsing text files. Here is an example of how to parse the first column of a CSV file:

import java.io.File;
import java.io.FileNotFoundException;
import java.util.Scanner;

public class TextScanner {
    public static void main(String[] args) {
        File file = new File("somefile.csv");
        try {
            // treat "first comma up to the end of the line" as the delimiter,
            // so next() returns only the first column of each row
            Scanner scanner = new Scanner(file).useDelimiter(",[^\\n]*\\n?");
            String temp = null;
            while (scanner.hasNext()) {
                temp = scanner.next();
                System.out.println(temp);
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        }
    }
}

I can use a regular expression as a filter!
But there is also a non-trivial use of this class: reading a whole page at once into a single String object:
import java.io.IOException;
import java.net.URL;
import java.net.URLConnection;
import java.util.Scanner;

public class WholePage {
    public static void main(String[] args) {
        try {
            URLConnection connection =
                new URL("").openConnection(); // the URL was omitted in the post
            // "\\Z" matches the end of input, so next() returns everything
            String text =
                new Scanner(connection.getInputStream()).useDelimiter("\\Z").next();
            System.out.println(text);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}

Tuesday, December 05, 2006

MySQL : InnoDB vs MyISAM vs Archive Benchmark

MySQL has a few storage engines. To list them, type:

mysql> show engines;

First of all, I'm interested in the INSERT speed of DML, since it is critical for some types of applications. So here is my benchmark. All of the engines have different INSERT speeds. To test the engine itself, and not the speed of my network connection or the local loopback interface, I decided to create a dummy stored procedure that inserts a lot of data into a table. Here is the code:

delimiter //
create procedure test_insert()
begin
  declare counter mediumint;
  set counter = 0;
  while counter < 100000 do
    insert into test values
      (1, 'sample dummy text', now());
    set counter = counter + 1;
  end while;
end //
delimiter ;

The DDL of the sample table is:

create table test (
  first int(11) default NULL,
  second varchar(20) default NULL,
  third date default NULL
);

We can change the storage engine of an existing table like so:

mysql> alter table test engine=archive;
Query OK, 100000 rows affected (0.98 sec)
Records: 100000 Duplicates: 0 Warnings: 0

Really fast! That is all we need for our benchmark.
The results (100,000 rows inserted):

  1. Archive : 5.20 sec
  2. MyISAM : 7.13 sec
  3. InnoDB : very slow !!!