網頁

2018/12/10

LeetCode Unique Email Addresses 唯一的郵件地址

本篇為LeetCode上演算法的簡單問題,Unique Email Addresses

題目如下:

Every email consists of a local name and a domain name, separated by the @ sign.

For example, in alice@leetcode.com, alice is the local name, and leetcode.com is the domain name.

Besides lowercase letters, these emails may contain '.'s or '+'s.

If you add periods ('.') between some characters in the local name part of an email address, mail sent there will be forwarded to the same address without dots in the local name. For example, "alice.z@leetcode.com" and "alicez@leetcode.com" forward to the same email address. (Note that this rule does not apply for domain names.)

If you add a plus ('+') in the local name, everything after the first plus sign will be ignored. This allows certain emails to be filtered, for example m.y+name@email.com will be forwarded to my@email.com. (Again, this rule does not apply for domain names.)

It is possible to use both of these rules at the same time.

Given a list of emails, we send one email to each address in the list. How many different addresses actually receive mails?


Example 1:

Input: ["test.email+alex@leetcode.com","test.e.mail+bob.cathy@leetcode.com","testemail+david@lee.tcode.com"]
Output: 2
Explanation: "testemail@leetcode.com" and "testemail@lee.tcode.com" actually receive mails

一個email包含了兩部份,分別為local name(帳號名稱)及domain name(網域名稱),兩者以@符號隔開。例如alice@leetcode.com中的alice是local name;而leetcode.com為domain name。

除了小寫字母外,email可能包含'.' or '+'符號。如果你在local name中加了'.'符號,其效果等同沒加。例如"alice.z@leetcode.com""alicez@leetcode.com"是相同的郵件地址。注意此規則僅適用在local name,domain name並不適用此規則。

如果在local name中加了'+',則加號'+'後面的任意符號都會被忽略,例如m.y+name@email.commy@email.com相同。同樣地,此規則不適用domain name。

以上兩符號可能同時出現在local name中。

輸入一email清單,請查出在這清單中有幾個email地址?

其他條件

  • 郵件清單長度介於1到100。
  • 郵件地址的長度介於1到100。
  • 每個郵件地址確實只有一個'@'符號

第一次想到的解法步驟如下:

  • 跑迴圈遍歷輸入的郵件地址清單。
  • 將郵件地址拆為local name及domain name部分。
  • 將local name中'+'後的字符移除以及將'.'移除。
  • 重新將local name + @ + domain name串起放入HashSet中。
  • 回傳HashSet中裝載的郵件數目。
public static int numUniqueEmails(String[] emails) {

    Set<String> set = new HashSet<>();

    for (String email : emails) {
        String[] ss = email.split("@");
        String localName = ss[0];
        String domainName = ss[1];

        set.add(new StringBuilder(
                localName.substring(0, localName.indexOf("+")).replaceAll("\\.", ""))
                .append(domainName).toString());
    }

    return set.size();
}

不過這樣的效能不好,在leetCode跑分為53 ms左右。


下面使用Java 8的lambda改寫,效能更差,107ms

public static int numUniqueEmails(String[] emails) {
    return Arrays.stream(emails).map((email) -> {
        String[] ss = email.split("@");
        String localName = ss[0];
        String domainName = ss[1];
        localName = localName.substring(0, localName.indexOf("+")).replaceAll("\\.", "");
        email = new StringBuilder(localName).append("@").append(domainName).toString();
        return email;
    }).collect(Collectors.toMap(email -> email, email -> 1, (email1, email2) -> {
        return email1;
    })).size();

}

參考了討論區其他人的解法,改寫為下面則縮短到17ms。

public static int numUniqueEmails(String[] emails) {
    Set<String> set = new HashSet<>();
    for (String email : emails) {
        StringBuilder name = new StringBuilder(30);
        char[] chars = email.toCharArray();
        int i = 0;
        while (i < chars.length) {
            if (chars[i] == '@') {
                break;
            }
            name.append(chars[i]);
            i++;
        }
        StringBuilder sb = new StringBuilder();
        for (char c : name.toString().toCharArray()) {
            if (c == '+') {
                break;
            }
            if (c == '.') {
                continue;
            }
            sb.append(c);
        }
        set.add(sb.append(email.substring(i)).toString());
    }
    return set.size();
}

主要差異在於使用String.indexOf()relaceAll()的效能都沒有直接依需求判斷每個字符是否為'.''+'來的快速,因為兩個方法中都會再跑迴圈。而改寫後對'.''+'的處理只要跑一次迴圈就好。

從這題了解到雖然Java String或其他工具等提供了好用的方法,但若要更好的表現,程式碼則要依問題去特別設計。



參考:

沒有留言:

張貼留言