HIT (Harbin Institute of Technology) Compiler Principles Lab 1: Lexical Analysis (Java Version)

阿新 • Published: 2019-01-10

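1. When checking for a blank line in Java, line == "" does not work: the debugger showed that the if was never entered. The == operator compares object references rather than string contents, so use line.equals("") instead, which works.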

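2. Java formatted output: you can write System.out.printf("%-10s\t<ERROR:識別符號重複!>\n",token); Yes, printf! But the BufferedWriter we use for the output file has no printf, so this cannot write to the file. We can write this instead: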

output.write(String.format("%-10s\t<%s,-->",token,token));

String.format saves the day.

3. How do you start a new line when writing to a file? output.write("空行~\r\n"); On Windows the line ending is \r\n; on Linux it is \n.
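The three notes above can be sketched together in one small program. This is only an illustration under assumptions: the class name NotesDemo and the helpers isBlankLine and formatToken are made up here, and a StringWriter stands in for the real output file. BufferedWriter.newLine() is used as a portable alternative to hard-coding \r\n.

```java
import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;

class NotesDemo {
    // Note 1: compare string contents with equals(); == compares object references.
    static boolean isBlankLine(String line) {
        return line.equals("");
    }

    // Note 2: String.format builds the same text that printf would print.
    static String formatToken(String token) {
        return String.format("%-10s\t<%s,-->", token, token);
    }

    public static void main(String[] args) throws IOException {
        // Three input lines; the second one is empty.
        BufferedReader in = new BufferedReader(new StringReader("int\n\nreturn"));
        StringWriter sink = new StringWriter(); // stands in for new FileWriter("out.txt")
        try (BufferedWriter output = new BufferedWriter(sink)) {
            String line;
            while ((line = in.readLine()) != null) {
                output.write(isBlankLine(line) ? "blank line" : formatToken(line));
                output.newLine(); // Note 3: writes the platform's line separator
            }
        }
        System.out.print(sink);
    }
}
```

String.format also accepts %n, which expands to the platform line separator, if you prefer to keep the line ending inside the format string.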

===================================================================================

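How do you make sense of this rather messy, barely commented code?

1. Get the structure straight. It looks like this:

(1) Read one line, line, convert it to the char[] array strLine, then process one character ch at a time (all of the processing happens inside the for loop over strLine).

(2) Each ch is then classified by an if / else if / else if ... chain. Read one if { } block at a time and you won't get dizzy.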
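The structure described above can be sketched roughly as follows. This is a simplified illustration, not the real program: the class name ScanSkeleton, the method classify and the label strings are assumptions, and each branch only labels the current character instead of consuming a whole token.

```java
import java.util.ArrayList;
import java.util.List;

class ScanSkeleton {
    static boolean isAlpha(char ch) {
        return (ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == '_';
    }

    static boolean isDigit(char ch) {
        return ch >= '0' && ch <= '9';
    }

    // The same 22 operator and delimiter characters as the oper[] array in the full source.
    static boolean isOp(char ch) {
        return "+-*=<>&|~^!()[]{}%;,#.".indexOf(ch) >= 0;
    }

    // Walks the line one character at a time and records which branch would handle it.
    static List<String> classify(String line) {
        List<String> kinds = new ArrayList<>();
        char[] strLine = line.toCharArray();
        for (int i = 0; i < strLine.length; i++) {
            char ch = strLine[i];
            if (isAlpha(ch)) {
                kinds.add("keyword-or-identifier");
            } else if (isDigit(ch)) {
                kinds.add("number");
            } else if (ch == '\'') {
                kinds.add("char-constant");
            } else if (ch == '"') {
                kinds.add("string-constant");
            } else if (isOp(ch)) {
                kinds.add("operator");
            } else if (ch == '/') {
                kinds.add("comment-or-division");
            } else if (ch != ' ' && ch != '\t') {
                kinds.add("other");
            }
        }
        return kinds;
    }

    public static void main(String[] args) {
        System.out.println(classify("a = 1 / 2;"));
    }
}
```

In the real program each branch keeps advancing i until its token ends, then steps i back by one so that the for loop's own i++ lands on the first character of the next token.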

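2. Get the algorithm straight. The program is built on a rather neat state transition diagram. I'll use the number-handling code to explain it:

So we build a two-dimensional array to implement these state transitions: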

      1  2  3  4  5  6
 1    d  .  #  e  #  #
 2    #  #  d  #  #  #
 3    #  #  d  e  #  #
 4    #  #  #  #  -  d
 5    #  #  #  #  #  d
 6    #  #  #  #  #  d

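We ignore state 0, because by this point we have already entered the automaton.

There is an edge from state 1 to state 1, so d[1][1] = 'd' (d stands for any digit).

There is an edge from state 1 to state 2, so d[1][2] = '.'

And so on. Entries with no edge are marked '#'. The key code is: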

int s = 1; // current DFA state
Boolean isfloat = false;
while (ch != '\0' && (isDigit(ch) || ch == '.' || ch == 'e' || ch == '-')) {
	if (ch == '.' || ch == 'e')
		isfloat = true;

	int k;
	for (k = 1; k <= 6; k++) { // look for a state k reachable from s on ch
		char tmpstr[] = digitDFA[s].toCharArray();
		if (ch != '#' && 1 == in_digitDFA(ch, tmpstr[k])) {
			token += ch;
			s = k;
			break;
		}
	}
	if (k > 6) // no transition on ch: stop
		break;
	ch = strLine[++i]; // the full source further down checks i against strLine.length first
}

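When the loop exits (either the while condition fails, or k ran past 6 because no transition matched), s holds the final state. States 1, 3 and 6 are normal exits;

states 2, 4 and 5 are error exits.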
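The table-driven idea can be tried on its own with a small, self-contained sketch. It reuses the same seven strings as digitDFA, but the class name DigitDfaDemo, the methods matches, run and isAccepting, and the convention of returning -1 when a character has no transition are assumptions made for this illustration.

```java
class DigitDfaDemo {
    // Row s, column k holds the symbol that moves state s to state k; '#' means no edge.
    // Row 0 and column 0 are padding so that states are numbered from 1, as in the table.
    static final String[] DIGIT_DFA = {
        "#", "#d.#e##", "###d###", "###de##", "#####-d", "######d", "######d"
    };

    // 'd' matches any digit; any other label matches only itself.
    static boolean matches(char ch, char label) {
        if (label == '#') return false;
        if (label == 'd') return ch >= '0' && ch <= '9';
        return ch == label;
    }

    // Returns the state reached after consuming the whole text,
    // or -1 if some character has no transition from the current state.
    static int run(String text) {
        int s = 1;
        for (char ch : text.toCharArray()) {
            int next = -1;
            for (int k = 1; k <= 6; k++) {
                if (matches(ch, DIGIT_DFA[s].charAt(k))) {
                    next = k;
                    break;
                }
            }
            if (next < 0) return -1;
            s = next;
        }
        return s;
    }

    // States 1, 3 and 6 are the normal exits.
    static boolean isAccepting(int s) {
        return s == 1 || s == 3 || s == 6;
    }

    public static void main(String[] args) {
        for (String t : new String[] {"999", "3.14", "1e-5", "12.", "1e", "1.2.3"}) {
            int s = run(t);
            System.out.println(t + " -> state " + s + (isAccepting(s) ? " (ok)" : " (error)"));
        }
    }
}
```

Keeping the transitions in a table means a new number format only needs new table entries; the scanning loop stays the same.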

=====================================================================

My code.txt:

int a="a;
main()
{
int b =99A1;
int a= 999;
int c='a';
int abc = "hahah";
/*你妹啊*/
//你好啊

	print("Hello World!\n");//你又好了
	return 0;/*你妹啊*/
}

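My output (the program prints its labels in Chinese: 識別符號 = identifier, 入口 = symbol table entry, 實型常量 = real constant, 字元常量 = character constant, 字串常量 = string constant, 註釋 = comment, 空行 = blank line, 識別符號重複 = duplicate identifier, 字串常量引號不封閉 = unclosed string quote, 請確保實常數輸入正確 = check that the numeric constant is written correctly):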

line : 1
int       	<int,-->
a         	<識別符號,（a,入口：0）>
=         	<=,-->
"a        	ERROR:字串常量引號不封閉
;         	<;,-->




line : 2
main      	<識別符號,（main,入口：1）>
(         	<(,-->
)         	<),-->




line : 3
{         	<{,-->




line : 4
int       	<int,-->
b         	<識別符號,（b,入口：2）>
=         	<=,-->
99A1      	ERROR：請確保實常數輸入正確
;         	<;,-->




line : 5
int       	<int,-->
a         	<ERROR:識別符號重複!>
=         	<=,-->
999       	<實型常量,999>
;         	<;,-->




line : 6
int       	<int,-->
c         	<識別符號,（c,入口：3）>
=         	<=,-->
'a'       	<字元常量,a>
;         	<;,-->




line : 7
int       	<int,-->
abc       	<識別符號,（abc,入口：4）>
=         	<=,-->
"hahah"   	<字串常量,hahah>
;         	<;,-->




line : 8
/*你妹啊*/   	(註釋：/*你妹啊*/)




line : 9
//你好啊     	(註釋：//你好啊)




line : 10
空行~




line : 11
print     	<識別符號,（print,入口：5）>
(         	<(,-->
"Hello World!\n"	<字串常量,Hello World!\n>
)         	<),-->
;         	<;,-->
//你又好了    	(註釋：//你又好了)




line : 12
return    	<return,-->
0         	<實型常量,0>
;         	<;,-->
/*你妹啊*/   	(註釋：/*你妹啊*/)




line : 13
}         	<},-->

Here is that rather messy source code of mine:

package ouyang;

import java.io.*;
import java.util.*;

public class AnalysisCodeToWord {
	public static void main(String args[]) {
		String infile = "code.txt";
		String outfile = "out.txt";
		try {
			FileInputStream f = new FileInputStream(infile);
			BufferedReader dr = new BufferedReader(new InputStreamReader(f));

			BufferedWriter output = new BufferedWriter(new FileWriter(outfile));

			String line = "";
			int cnt = 0;
			while ((line = dr.readLine()) != null) {
				cnt++;
				if (cnt == 1) {
					System.out.println("line : " + cnt);
					output.write(String.format("line : %d\r\n", cnt));
				} else {
					System.out.println("\n\nline : " + cnt);
					output.write(String.format("\r\n\r\nline : %d\r\n", cnt));
				}
				if (line.equals("")) {
					System.out.println("空行~");
					output.write("空行~\r\n");
				} else {
					
					char[] strLine = line.toCharArray();
					
					for (int i = 0; i < strLine.length; i++) {
						char ch = strLine[i];
						String token = "";

						if (isAlpha(ch)) // keyword or identifier
						{
							do {
								token += ch;
								i++;
								if(i>=strLine.length) break;
								ch = strLine[i];
							} while (ch != '\0' && (isAlpha(ch) || isDigit(ch)));

							--i; // step back one character: the for loop will advance i again

							if (isMatchKeyword(token.toString())) // keyword
							{
								System.out.printf("%-10s\t<%s,-->\n", token,
										token);
								output.write(String.format(
										"%-10s\t<%s,-->\r\n", token, token));
							} else // identifier
							{
								if (!symbol.containsKey(token)) { // first occurrence: add it to the symbol table
									symbol.put(token, symbol_pos);

									System.out.printf(
											"%-10s\t<識別符號,（%s,入口：%d）>\n", token,
											token, symbol_pos);
									output.write(String.format(
											"%-10s\t<識別符號,(%s,入口：%d)>\r\n",
											token, token, symbol_pos));
									symbol_pos++;
								} else {
									System.out.printf(
											"%-10s\t<ERROR:識別符號重複!>\n", token);
									output.write(String.format(
											"%-10s\t<ERROR:識別符號重複!>\r\n", token));
								}
							}
							token = "";
						} else if (isDigit(ch)) // numeric constant
						{
							int s = 1;
							Boolean isfloat = false;
							while (ch != '\0'
									&& (isDigit(ch) || ch == '.' || ch == 'e' || ch == '-')) {
								if (ch == '.' || ch == 'e')
									isfloat = true;

								int k;
								for (k = 1; k <= 6; k++) {
									char tmpstr[] = digitDFA[s].toCharArray();
									if (ch != '#'
											&& 1 == in_digitDFA(ch, tmpstr[k])) {
										token += ch;
										s = k;
										break;
									}
								}
								if (k > 6)
									break;
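								i++;
								if (i >= strLine.length) {
									ch = '\0'; // end of line: a valid place for a number to end
									break;
								}
								ch = strLine[i];
							}
							Boolean haveMistake = false;

							if (s == 2 || s == 4 || s == 5) {
								haveMistake = true;
							} else // accepting states 1, 3, 6
							{
								// the number must be followed by an operator/delimiter or by the end of the line
								if (ch != '\0' && (!isOp(ch) || ch == '.'))
									haveMistake = true;
							}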

							if (haveMistake) // error handling
							{
								while (ch != '\0' && ch != ',' && ch != ';'
										&& ch != ' ') // consume up to the next separator character
								{
									token += ch;
									i++;if(i>=strLine.length) break;
									ch = strLine[i];
								}
								System.out.printf("%-10s\tERROR：請確保實常數輸入正確\n",
										token);
								output.write(String.format(
										"%-10s\tERROR：請確保實常數輸入正確!\r\n", token));
							} else {
								if (isfloat) {
									System.out.printf("%-10s\t<實型常量,%s>\n",
											token, token);
									output.write(String.format(
											"%-10s\t<實型常量,%s>\r\n", token,
											token));
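								} else { // NOTE: the console label below says 實型常量 (real) although this is the integer branch; the file gets 整型常量 (integer)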
									System.out.printf("%-10s\t<實型常量,%s>\n",
											token, token);
									output.write(String.format(
											"%-10s\t<整型常量,%s>\r\n", token,
											token));
								}
							}
							--i;
							token = "";
						} else if (ch == '\'') // character constant, handled like a string constant
						{
							int s = 0;
							Boolean haveMistake = false;
							String token1 = "";
							token1 += ch;
							while (s != 3) {
								i++;
								if (i >= strLine.length) { // ran off the line: the quote was never closed
									haveMistake = true;
									break;
								}
								ch = strLine[i];
								if (ch == '\0') {
									haveMistake = true;
									break;
								}
								for (int k = 0; k < 4; k++) {
									char tmpstr[] = stConDFA[s].toCharArray();
									if (in_sinStConDFA(ch, tmpstr[k])) {
										token1 += ch; // raw text, kept for output
										if (k == 2 && s == 1) {
											if (isEsSt(ch)) // escape character
												token = token + '\\' + ch;
											else
												token += ch;
										} else if (k != 3 && k != 1)
											token += ch;
										s = k;
										break;
									}
								}
							}
							if (haveMistake) {
								System.out.printf("%s\tERROR：字元常量引號不封閉\n",
										token1);
								output.write(String.format(
										"%s\tERROR：字元常量引號不封閉\r\n", token1));
								--i;
							} else {
								if (token.length() == 1) {
									System.out.printf("%-10s\t<字元常量,%s>\n",
											token1, token);
									output.write(String.format(
											"%-10s\t<字元常量,%s>\r\n", token1,
											token));
								} else if (token.length() == 2) {
									if (isEsSt(token.charAt(1))
											&& token.charAt(0) == '\\') {
										System.out.printf("%-10s\t<字元常量,%s>\n",
												token1, token);
										output.write(String.format(
												"%-10s\t<字元常量,%s>\r\n", token1,
												token));
									}
								}
							}
							token = "";
						} else if (ch == '"') // string constant
						{
							String token1 = "";
							token1 += ch;

							int s = 0;
							Boolean haveMistake = false;
							while (s != 3 ) {
								i++;
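								// NOTE: the "- 1" leaves the line's last character (usually ';') to be tokenized
								// separately, but it also rejects a closing quote that is the last character of a line
								if(i>=strLine.length-1) 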
								{
									haveMistake = true;
									break;
								}
								
								ch = strLine[i];
								if (ch == '\0') {
									haveMistake = true;
									break;
								}
								for (int k = 0; k < 4; k++) {
									char tmpstr[] = stConDFA[s].toCharArray();
									if (in_stConDFA(ch, tmpstr[k])) {
										token1 += ch;
										if (k == 2 && s == 1) {
											if (isEsSt(ch)) // escape character
												token = token + '\\' + ch;
											else
												token += ch;
										} else if (k != 3 && k != 1)
											token += ch;
										s = k;
										break;
									}
								}
							}
							if (haveMistake) {
								System.out.printf("%-10s\tERROR:字串常量引號不封閉\n",
										token1);
								output.write(String.format(
										"%-10s\tERROR:字串常量引號不封閉\r\n", token1));
								--i;
							} else {
								System.out.printf("%-10s\t<字串常量,%s>\n",
										token1, token);
								output
										.write(String.format(
												"%-10s\t<字串常量,%s>\r\n",
												token1, token));
							}
							token = "";
						} else if (isOp(ch)) // operator or delimiter
						{
							token += ch;
							if (isPlusEqu(ch) && i + 1 < strLine.length) // may be followed by '='; no lookahead at end of line
							{
								i++;
								ch = strLine[i];
								if (ch == '=')
									token += ch;
								else {
									if (isPlusSame(strLine[i - 1])
											&& ch == strLine[i - 1])
										token += ch; // or by a copy of itself, e.g. ++ or &&
									else {
										--i;
									}
								}
							}
							System.out.printf("%-10s\t<%s,-->\n", token, token);
							output.write(String.format("%-10s\t<%s,-->\r\n",
									token, token));
							token = "";
						} else if (ch == '/') // comment or division; comments only need to be recognized
						{
							token += ch;
							i++;
							ch = (i < strLine.length) ? strLine[i] : '\0'; // '\0' marks the end of the line

							if (ch != '*' && ch != '/') // division
							{
								if (ch == '=')
									token += ch; // /=
								else {
									--i; // step back: a plain '/'
								}
								System.out.printf("%-10s\t<%s,-->\n", token,
										token);
								output.write(String.format("%-10s\t<%s,-->\r\n",
										token, token));
								token = "";
							} else // the comment is either '//' or '/*'
							{
								Boolean haveMistake = false;
								if (ch == '*') {
									token += ch; // ch == '*'
									int s = 2;

									while (s != 4) {
										i++;
										if (i >= strLine.length) { // ran off the line: the comment was not closed on this line
											haveMistake = true;
											break;
										}
										ch = strLine[i];
										if (ch == '\0') {
											haveMistake = true;
											break;
										}
										for (int k = 2; k <= 4; k++) {
											char tmpstr[] = noteDFA[s]
													.toCharArray();
											if (1 == in_noteDFA(ch, tmpstr[k],
													s)) {
												token += ch;
												s = k;
												break;
											}
										}
									}
								}
								else if(ch == '/') // no state machine needed here
								{
									// the comment starts at the first '/' (index i - 1) and runs to the end of the line;
									// lastIndexOf("//") would pick the wrong start if the comment text itself contains "//"
									token = line.substring(i - 1);
									i = strLine.length;
								}
								System.out.printf("%-10s\t", token);
								output.write(String.format("%-10s\t", token));
								if (haveMistake) {
									System.out.printf("ERROR:註釋沒有封閉\n");
									output.write("ERROR:註釋沒有封閉\r\n");
									--i;
								} else {
									System.out.printf("(註釋：%s)\n", token);
									output.write(String.format("(註釋：%s)\r\n",
											token));
								}

								token = "";
							}
						}
						else // any other, unexpected character
						{
							if (ch != ' ' && ch != '\t') {
								System.out.printf("%-10c ERROR:存在不合法字元\n", ch);
								output.write(String.format("%-10c ERROR:存在不合法字元\r\n", ch));
							}
						}
					}
				}

			}

			f.close();
			dr.close();
			output.close();
		} catch (FileNotFoundException e) {
			e.printStackTrace();
		} catch (IOException e) {
			e.printStackTrace();
		}

	}


	public static Boolean isAlpha(char ch) {
		return ((ch >= 'a' && ch <= 'z') || (ch >= 'A' && ch <= 'Z') || ch == '_');
	}

	public static Boolean isDigit(char ch) {
		return (ch >= '0' && ch <= '9');
	}

	public static Boolean isMatchKeyword(String str) {
		Boolean flag = false;
		for (int i = 0; i < keywords.length; i++) {
			if (str.equals(keywords[i])) {
				flag = true;
				break;
			}
		}
		return flag;
	}

	public static Boolean isOp(char ch) // is ch an operator or delimiter?
	{
		for (int i = 0; i < oper.length; i++)
			if (ch == oper[i]) {
				return true;
			}
		return false;
	}

	public static int in_digitDFA(char ch, char dD) {
		if (dD == 'd') {
			if (isDigit(ch))
				return 1;
			else
				return 0;
		}
		return (ch == dD) ? 1 : 0;
	}

	public static Boolean in_stConDFA(char ch, char key) {
		if (key == 'a')
			return true;
		if (key == '\\')
			return ch == key;
		if (key == '"')
			return ch == key;
		if (key == 'd')
			return ch != '\\' && ch != '"';
		return false;
	}

	public static Boolean in_sinStConDFA(char ch, char key) {
		if (key == 'a')
			return true;
		if (key == '\\')
			return ch == key;
		if (key == '"')
			return ch == '\'';
		if (key == 'd')
			return ch != '\\' && ch != '\'';
		return false;
	}

	public static Boolean isPlusEqu(char ch) // operators that may be followed by '='
	{
		return ch == '+' || ch == '-' || ch == '*' || ch == '/' || ch == '='
				|| ch == '>' || ch == '<' || ch == '&' || ch == '|'
				|| ch == '^';
	}

	public static Boolean isPlusSame(char ch) // operators that may be doubled, e.g. ++ or &&
	{
		return ch == '+' || ch == '-' || ch == '&' || ch == '|';
	}

	public static Boolean isEsSt(char ch) {
		return ch == 'a' || ch == 'b' || ch == 'f' || ch == 'n' || ch == 'r'
				|| ch == 't' || ch == 'v' || ch == '?' || ch == '0';
	}

	public static int in_noteDFA(char ch, char nD, int s) {
		if (s == 2) {
			if (nD == 'c') {
				if (ch != '*')
					return 1;
				else
					return 0;
			}
		}
		if (s == 3) {
			if (nD == 'c') {
				if (ch != '*' && ch != '/')
					return 1;
				else
					return 0;
			}
		}
		return (ch == nD) ? 1 : 0;
	}

	public static String code = "";

	// symbol table: identifier -> entry index
	public static Map<String, Integer> symbol = new HashMap<String, Integer>();

	public static int symbol_pos = 0;

	// 32 keywords
	public static String keywords[] = { "auto", "double", "int", "struct",
			"break", "else", "long", "switch", "case", "enum", "register",
			"typedef", "char", "extern", "return", "union", "const", "float",
			"short", "unsigned", "continue", "for", "signed", "void",
			"default", "goto", "sizeof", "volatile", "do", "if", "while",
			"static" };

	// 7 rows: row 0 is padding, rows 1 to 6 are the states of the number DFA
	public static String digitDFA[] = { "#", "#d.#e##", "###d###", "###de##",
			"#####-d", "######d", "######d" };

	// 22 operators and delimiters
	public static char oper[] = { '+', '-', '*', '=', '<', '>', '&', '|', '~',
			'^', '!', '(', ')', '[', ']', '{', '}', '%', ';', ',', '#', '.' };

	// 4 rows: DFA shared by string and character constants
	public static String stConDFA[] = { "#\\d#", "##a#", "#\\d\"", "####" };

	// 5 rows: DFA for /* ... */ comments (the scanner starts in state 2)
	public static String noteDFA[] = { "#", "##*##", "##c*#", "##c*/", "#####" };

}
