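Converting IP Addresses in a Web Server Log File to Hostnames

Requirement: replace the IP address at the start of each log entry with the corresponding hostname.

The log file has the following format: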

10.100.122.132 - [17/Jun/2013:22:53:58] "GET /bgs/greenbg.gif HTTP 1.1" 200 50

10.100.122.133 - [17/Jun/2013:22:53:58] "GET /bgs/redbg.gif HTTP 1.1" 200 50

After conversion:

PC-20161220MYVT  - [17/Jun/2013:22:53:58] "GET /bgs/greenbg.gif HTTP 1.1" 200 50

sog  - [17/Jun/2013:22:53:58] "GET /bgs/redbg.gif HTTP 1.1" 200 50

1. Solution 1: Sequential processing

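import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.InetAddress;

public class MainThread {
    public static void main(String[] args) {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("a.txt"), "UTF-8"));
             BufferedWriter bw = new BufferedWriter(new FileWriter("a_01.txt", true))) {
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                // the IP address is everything before the first space
                int index = entry.indexOf(' ');
                if (index < 0) {
                    // not a log entry (for example a blank line): copy it unchanged
                    bw.append(entry);
                    bw.newLine();
                    continue;
                }
                String address = entry.substring(0, index);
                String theRest = entry.substring(index); // keeps its leading space
                // reverse DNS lookup; blocks until the resolver answers
                String hostname = InetAddress.getByName(address).getHostName();
                bw.append(hostname + " " + theRest);
                bw.newLine();
            }
        } catch (IOException e) {
            // also covers UnsupportedEncodingException and FileNotFoundException
            e.printStackTrace();
        }
    }
}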

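Problem: the program spends most of its time waiting for DNS replies and does nothing else in the meantime.

2. Solution 2: Using a thread pool

A single main thread reads the log file and uses a thread pool to hand each log entry (one per line) to other threads for processing.

Because DNS resolution is slow, other threads can keep running while one is blocked waiting for a reply (if lookups were not slow, there would be little reason to use multiple threads). Note that the main thread still works sequentially: the futures are retrieved one at a time, in the order the lines were read.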
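The ordering guarantee described above can be illustrated in isolation. The following is only a sketch under stated assumptions: `OrderedFuturesSketch`, `slowLookup`, and `resolveAll` are hypothetical names, and `Thread.sleep` stands in for a real DNS query so that the example runs without a network.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class OrderedFuturesSketch {

    // Hypothetical stand-in for a DNS lookup: sleeps to imitate network latency.
    static String slowLookup(String address, long delayMillis) throws InterruptedException {
        Thread.sleep(delayMillis);
        return "host-" + address;
    }

    // Submits one task per address and collects the results in submission order.
    static List<String> resolveAll(List<String> addresses, int threads, long delayMillis)
            throws InterruptedException, ExecutionException {
        ExecutorService executor = Executors.newFixedThreadPool(threads);
        try {
            List<Future<String>> futures = new ArrayList<>();
            for (String address : addresses) {
                // the lambda returns a value, so it is submitted as a Callable<String>
                futures.add(executor.submit(() -> slowLookup(address, delayMillis)));
            }
            List<String> results = new ArrayList<>();
            for (Future<String> future : futures) {
                // blocks until this particular task has finished
                results.add(future.get());
            }
            return results;
        } finally {
            executor.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        List<String> addresses = Arrays.asList("10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4");
        long start = System.nanoTime();
        List<String> results = resolveAll(addresses, 4, 200);
        long elapsedMillis = (System.nanoTime() - start) / 1_000_000;
        System.out.println(results + " in about " + elapsedMillis + " ms");
    }
}
```

The results come back in input order because the futures are read in the order they were submitted, regardless of which worker finishes first. With four workers the simulated waits overlap, so the total time should be closer to one delay than to four.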

2.1 DNSResolverTask

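import java.net.InetAddress;
import java.util.concurrent.Callable;

public class DNSResolverTask implements Callable<String> {

    private final String line;

    public DNSResolverTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // separate out the IP address
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            String theRest = line.substring(index);
            // Many visitors request several pages during one visit, and DNS lookups
            // are expensive, so resolving an address every time it appears in the log
            // would be wasteful. InetAddress keeps a cache of lookups it has performed,
            // so a repeated request may be answered much faster than a fresh DNS query.
            String hostname = InetAddress.getByName(address).getHostName();
            return hostname + " " + theRest;
        } catch (Exception ex) {
            // malformed line or failed lookup: fall back to the original entry
            return line;
        }
    }
}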
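The lifetime of the cache inside `InetAddress` is governed by JVM settings such as the `networkaddress.cache.ttl` security property, so it may help to make the caching explicit for the duration of one run. The following is a minimal sketch, not part of the original program: `CachingResolverSketch` and `reverseLookup` are hypothetical names, and the lookup is passed in as a function so that it can be replaced in tests.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

class CachingResolverSketch {

    // address -> hostname, shared safely between worker threads
    private final ConcurrentMap<String, String> cache = new ConcurrentHashMap<>();
    private final Function<String, String> lookup;

    CachingResolverSketch(Function<String, String> lookup) {
        this.lookup = lookup;
    }

    String resolve(String address) {
        // Runs the lookup only when the address is not cached yet.
        // Note: the lookup executes inside the map operation, so a slow lookup
        // can delay other threads that update the same part of the map.
        return cache.computeIfAbsent(address, lookup);
    }

    // Reverse lookup that falls back to the raw address on failure.
    static String reverseLookup(String address) {
        try {
            return InetAddress.getByName(address).getHostName();
        } catch (UnknownHostException e) {
            return address;
        }
    }

    public static void main(String[] args) {
        CachingResolverSketch resolver =
                new CachingResolverSketch(CachingResolverSketch::reverseLookup);
        System.out.println(resolver.resolve("127.0.0.1"));
        // the second call is answered from the map without another lookup
        System.out.println(resolver.resolve("127.0.0.1"));
    }
}
```

A task could call `resolver.resolve(address)` in place of the direct `InetAddress` call, provided all tasks share one resolver instance. Whether this is faster than the built-in cache depends on the JVM configuration and the operating system resolver, so it is worth measuring before adopting it.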

2.2 MainThread

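import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.FileInputStream;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.LinkedList;
import java.util.Queue;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class MainThread {

    private final static int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {
        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        Queue<LogEntry> results = new LinkedList<LogEntry>();

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream("a.txt"), "UTF-8"))) {
            // The main thread reads entries much faster than the workers can resolve them.
            // It creates one DNSResolverTask per line; the for loop keeps file order.
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                DNSResolverTask task = new DNSResolverTask(entry);
                // The speedup comes from here: while one worker is blocked on DNS,
                // the others keep running. If lookups did not block, multithreading
                // would bring little benefit.
                Future<String> future = executor.submit(task);
                LogEntry result = new LogEntry(entry, future);
                // Idea 1: write to the file right here instead of queueing.
                // That is possible, but waiting on the future at this point would
                // make the lookups sequential again.
                results.add(result);
            }
        }

        // try-with-resources closes the writer, which also flushes it
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("a_02.txt", true))) {
            for (LogEntry result : results) {
                try {
                    bw.append(result.future.get());
                } catch (InterruptedException | ExecutionException e) {
                    // lookup did not complete: write the original entry instead
                    bw.append(result.original);
                }
                bw.newLine();
            }
        } finally {
            executor.shutdown();
        }
    }

    private static class LogEntry {
        // the original line from the log
        String original;
        Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}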

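Problem: the log file may be very large, so holding every entry in a LinkedList can make the program use a great deal of memory.

3. Solution 3: Using a producer-consumer queue

To avoid this, the output can be moved to a separate thread that shares a queue with the input thread. Because earlier log entries are being written while later input is still being parsed, the queue is less likely to grow too large. This raises another problem, however: a separate signal is needed to indicate that output is finished, because an empty queue is no longer enough to prove that the work is done. The simplest approach is to count the input lines and make sure the number of output lines matches.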
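Counting lines is the approach this article uses. For comparison, a commonly used alternative signal is a sentinel ("poison pill") placed on the queue after the last entry, which avoids reading the input twice. The sketch below is an illustration of that alternative under stated assumptions, not the method implemented in the following sections: `PoisonPillSketch`, `pipe`, and `END_OF_INPUT` are hypothetical names, plain strings stand in for log entries, and a bounded `ArrayBlockingQueue` is used so that the producer blocks instead of letting the queue grow.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

class PoisonPillSketch {

    // Hypothetical sentinel. It is compared by identity (!=), and new String(...)
    // creates a distinct object, so no real line can be mistaken for it.
    private static final String END_OF_INPUT = new String("END_OF_INPUT");

    // Passes every line through a bounded queue to a consumer thread and
    // returns what the consumer received, in order.
    static List<String> pipe(List<String> lines, int capacity) throws InterruptedException {
        BlockingQueue<String> queue = new ArrayBlockingQueue<>(capacity);
        List<String> written = new ArrayList<>();

        Thread consumer = new Thread(() -> {
            try {
                // take() blocks while the queue is empty, so there is no busy-waiting
                for (String line = queue.take(); line != END_OF_INPUT; line = queue.take()) {
                    written.add(line); // a real writer would append to the output file here
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();

        for (String line : lines) {
            queue.put(line); // blocks while the queue is full, which bounds memory use
        }
        queue.put(END_OF_INPUT); // signals that no more entries will arrive
        consumer.join();         // after join, reading 'written' from this thread is safe
        return written;
    }

    public static void main(String[] args) throws InterruptedException {
        List<String> lines = Arrays.asList("entry 1", "entry 2", "entry 3");
        System.out.println(pipe(lines, 2));
    }
}
```

The trade-off is that the consumer has to recognize the sentinel, whereas the line-count approach needs an extra pass over the file but keeps the queue contents uniform.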

3.1 DNSResolveTask

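import java.net.InetAddress;
import java.util.concurrent.Callable;

import org.slf4j.Logger;        // assumption: Logger/LoggerFactory come from SLF4J
import org.slf4j.LoggerFactory;

public class DNSResolveTask implements Callable<String> {

    Logger logger = LoggerFactory.getLogger(DNSResolveTask.class);

    private final String line;

    public DNSResolveTask(String line) {
        this.line = line;
    }

    @Override
    public String call() {
        try {
            // separate out the IP address
            int index = line.indexOf(' ');
            String address = line.substring(0, index);
            String theRest = line.substring(index);
            // Many visitors request several pages during one visit, and DNS lookups
            // are expensive, so resolving an address every time it appears in the log
            // would be wasteful. InetAddress keeps a cache of lookups it has performed,
            // so a repeated request may be answered much faster than a fresh DNS query.
            String hostname = InetAddress.getByName(address).getHostName();
            //logger.info("return a line to queue");
            return hostname + " " + theRest;
        } catch (Exception ex) {
            // malformed line or failed lookup: fall back to the original entry
            return line;
        }
    }
}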

3.2 WriterTask

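import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.LinkedBlockingQueue;

import org.slf4j.Logger;        // assumption: Logger/LoggerFactory come from SLF4J
import org.slf4j.LoggerFactory;

public class WriterTask implements Runnable {

    Logger logger = LoggerFactory.getLogger(WriterTask.class);

    private int lineCount;
    private final LinkedBlockingQueue<MainThread.LogEntry> queue;

    public WriterTask(LinkedBlockingQueue<MainThread.LogEntry> queue, int lineCount) {
        this.queue = queue;
        this.lineCount = lineCount;
    }

    @Override
    public void run() {
        // try-with-resources closes the writer, which also flushes it
        try (BufferedWriter bw = new BufferedWriter(new FileWriter("a_03.txt", true))) {
            // stop once every input line has been written
            while (lineCount > 0) {
                // take() blocks until an entry is available, so the thread
                // does not spin on an empty queue
                MainThread.LogEntry entry = queue.take();
                try {
                    logger.info("write a line");
                    bw.append(entry.future.get());
                } catch (ExecutionException e) {
                    // lookup failed: write the original entry instead
                    bw.append(entry.original);
                }
                bw.newLine();
                lineCount--;
            }
        } catch (InterruptedException e) {
            // interrupted while waiting: restore the flag and stop writing
            Thread.currentThread().interrupt();
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}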

3.3 MainThread

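import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.LinkedBlockingQueue;

import org.slf4j.Logger;        // assumption: Logger/LoggerFactory come from SLF4J
import org.slf4j.LoggerFactory;

public class MainThread {

    static Logger logger = LoggerFactory.getLogger(MainThread.class);

    private final static int NUM_THREADS = 4;

    public static void main(String[] args) throws IOException {

        final String fileName = "a.txt";

        // count the lines in the input file so the writer knows when it is done
        int lineCount = getLineCount(fileName);

        ExecutorService executor = Executors.newFixedThreadPool(NUM_THREADS);
        LinkedBlockingQueue<LogEntry> results = new LinkedBlockingQueue<>();

        // the writer occupies one pool thread; the remaining threads do the lookups
        executor.execute(new WriterTask(results, lineCount));

        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), "UTF-8"))) {
            for (String entry = in.readLine(); entry != null; entry = in.readLine()) {
                DNSResolveTask task = new DNSResolveTask(entry);
                // The speedup comes from here: while one worker is blocked on DNS,
                // the others keep running. If lookups did not block, multithreading
                // would bring little benefit.
                Future<String> future = executor.submit(task);
                LogEntry result = new LogEntry(entry, future);
                // Idea 1: write to the file right here (possible, but waiting on the
                // future at this point would make the lookups sequential again).
                // Idea 2: put the entry on the shared queue for the writer thread.
                logger.info("add a line to queue");
                results.add(result);
            }
        } finally {
            // no new tasks are accepted; tasks already submitted still run to completion
            executor.shutdown();
        }
    }

    private static int getLineCount(String fileName) throws IOException {
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(new FileInputStream(fileName), "UTF-8"))) {
            int lineCount = 0;
            while (in.readLine() != null) {
                lineCount++;
            }
            return lineCount;
        }
    }

    static class LogEntry {
        // the original line from the log
        String original;
        Future<String> future;

        LogEntry(String original, Future<String> future) {
            this.original = original;
            this.future = future;
        }
    }
}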

3.4 Execution results