ICA

IPhysResearch

January 1, 2001 08:00

ICA and the Real-Life Cocktail Party Problem https://towardsdatascience.com/ica-and-the-real-life-cocktail-party-problem-6375ba35894b
https://www.geeksforgeeks.org/blind-source-separation-using-fastica-in-scikit-learn/

IAIFI Summer School & Workshop

IPhysResearch

August 8, 2022 09:00

The first annual IAIFI PhD Summer School will be held at Tufts University August 1—August 5, 2022, followed by the IAIFI Summer Workshop August 8—August 9, 2022.

Website: https://iaifi.org/phd-summer-school.html

The full summer school agenda: https://iaifi.org/summer-school-agenda

View the full program, including contact info and abstracts for lightning talk speakers here: https://iaifi.org/talks/Summer-School_Program_2022.pdf

You can access a GitHub repo with links to the tutorials we’ve held so far, as well as some future tutorials: https://github.com/iaifi/summer-school-2022. We will continue adding tutorials here, so keep checking in!

Recap: IAIFI Summer School Day 1 - August 1, 2022

Taco Cohen, Foundations of Geometric Deep Learning:
- Slides (Lectures 3, 8, and 10 from this link)
- Recording
Javier Duarte, Representations, networks, and symmetries for learning from particle physics data:
- Slides
- Recording
Denis Boyda, Tutorial for Foundations of Geometric Deep Learning
- Tutorial Colab
- Recording
Patrick McCormack (for Dylan Rankin), Tutorial for Model compression and fast machine learning in particle physics: Training Invariant Networks
- Tutorial Colab
- Recording (beginning at 1:08:00)

Recap: IAIFI Summer School Day 2 - August 2, 2022

Lightning Talks
- Slides
- Recording
Taco Cohen, Foundations of Geometric Deep Learning:
- Slides (Lectures 3, 8, and 10 from this link)
- Recording
Javier Duarte, Representations, networks, and symmetries for learning from particle physics data:
- Slides
- Recording
Denis Boyda, Tutorial for Foundations of Geometric Deep Learning
- Tutorial Colab
- Recording
Patrick McCormack (for Dylan Rankin), Tutorial for Model compression and fast machine learning in particle physics: Training Invariant Networks
- Tutorial Colab
- Recording
Yasaman Bahri, Deep learning in the large-width regime
- Slides: Will be posted to the Slack channel when available
- Recording

Recap: IAIFI Summer School Day 3 - August 3, 2022

Lightning Talks
- Slides + Slides
- Recording
Yasaman Bahri, Deep learning in the large-width regime
- Slides: Day 1 | Day 2
- Recording
Sven Krippendorf, Machine learning for beyond-the-standard-model physics
- Slides
- Recording
Anna Golubeva, Tutorial for Deep learning in the large-width regime
- Tutorial 1 Colab | Tutorial 2 Colab
- Recording
Career Panel
- Recording

Recap: IAIFI Summer School Day 4 - August 4, 2022

Lightning Talks
- Slides + Slides
- Recording
Sven Krippendorf, Machine learning for beyond-the-standard-model physics
- Slides
- Recording
Juan Carrasquilla, Machine learning for many-body physics
- Slides
- Recording
Siddharth Mishra-Sharma, Tutorial for Beyond-the-standard-model physics
- Tutorial 1 Colab | Tutorial 2 Colab
- Recording
Di Luo, Tutorial for Machine learning for many-body physics
- Recording

Recap: IAIFI Summer School Day 5 - August 5, 2022

Lightning Talks
- Slides + Slides
- Recording
Juan Carrasquilla, Machine learning for many-body physics
- Slides
- Recording
Di Luo, Tutorial for Machine learning for many-body physics
- Tutorial
- Recording

Recap: IAIFI Summer Workshop - August 8, 2022

Welcome and Introduction from Jesse Thaler
- Slides
- Recording
Sébastien Racanière, Generative models with symmetries for physics
- Slides
- Recording
Claudius Krause, Normalizing Flows at the LHC
- Slides
- Recording
Phil Harris, Learning Physics in the Latent Space
- Slides
- Recording
Greg Yang, The unreasonable effectiveness of mathematics in large scale deep learning
- Recording
Kazuhiro Terao, Machine Learning for analyzing big image data in neutrino experiments
- Recording
Cora Dvorkin, Mining Cosmological Data: Looking for Physics Beyond the Standard Model
- Recording

Recap: IAIFI Summer Workshop - August 9, 2022

Day 2 Introduction from Jesse Thaler
- Slides
- Recording
Fabian Ruehle, Machine learning for formal theory
- Slides
- Recording
Jennifer Ngadiuba, Boosting sensitivity to new physics at the LHC with anomaly detection
- Recording (apologies for missing audio at the beginning)
Siamak Ravanbakhsh, Learning with Unknown and Nonlinear Symmetry Transformations
- Slides
- Recording
Yi-Zhuang You, Machine Learning Renormalization Group and Its Applications
- Recording
Anna Golubeva, Understanding and Improving Sparse Neural Network Training
- Slides
- Recording
Shuchin Aeron, Towards learning generative models for high energy physics
- Recording

Fixing Restricted Access to GitHub's Host Domains

IPhysResearch

April 28, 2021 08:00

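In mainland China, git clone is sometimes unbearably slow, or git pull/git push hangs with no response at all. This post collects a workaround that I have tested and found to work.

In short, the cause is that the domain github.global.ssl.fastly.net is restricted. The fix is to find the IP address for this domain, add an IP-to-domain mapping to the hosts file, and flush the DNS cache.

1. Find the `ip` address for each domain

On https://www.ipaddress.com/, search for github.global.ssl.fastly.net and github.com in turn.

Alternatively, run the following in a local terminal: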

$ nslookup github.global.ssl.fastly.net
Server:  127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: github.global.ssl.fastly.net
Address: 151.101.229.194

$ nslookup github.com
Server:  127.0.0.53
Address: 127.0.0.53#53

Non-authoritative answer:
Name: github.com
Address: 13.229.188.59

2. Edit the `hosts` file

On Windows, the hosts file is at C:\Windows\System32\drivers\etc\hosts.
On Linux, the hosts file is at /etc/hosts:
```
$ sudo vim /etc/hosts
```
On macOS, the hosts file is also at /etc/hosts:
```
$ sudo vi /etc/hosts
```

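Append two lines to the end of the hosts file. Each line is the IP address followed by the bare hostname (no https:// prefix); use the addresses found in step 1:

13.229.188.59 github.com
151.101.229.194 github.global.ssl.fastly.net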
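
As a sanity check on the entry format, here is a minimal Python sketch that builds the hosts-file text. A hosts entry is the IP address first, then the hostname, with no URL scheme. The function name `merge_hosts` and the sample file content are hypothetical, and the addresses are simply the ones returned by nslookup above, which will change over time. The sketch only produces text; writing it back to /etc/hosts still needs administrator rights.

```python
def merge_hosts(existing_text, mapping):
    """Return hosts-file text containing one 'IP hostname' line per mapping entry.

    Lines that already mention one of the managed hostnames are dropped,
    so stale addresses do not linger next to the new ones.
    """
    kept = []
    for line in existing_text.splitlines():
        # Fields before any '#' comment: the IP comes first, then hostnames.
        fields = line.split("#", 1)[0].split()
        if any(name in mapping for name in fields[1:]):
            continue  # stale entry for a hostname we manage
        kept.append(line)
    kept.extend(f"{ip} {host}" for host, ip in mapping.items())
    return "\n".join(kept) + "\n"


# Example addresses only (taken from the nslookup output in step 1).
GITHUB_HOSTS = {
    "github.com": "13.229.188.59",
    "github.global.ssl.fastly.net": "151.101.229.194",
}

sample = "127.0.0.1 localhost\n10.0.0.1 github.com  # stale entry\n"
print(merge_hosts(sample, GITHUB_HOSTS), end="")
```

Keeping the merge as a pure function makes it easy to inspect the result before touching the real file.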

3. Flush the `DNS` cache

Linux:
```
$ sudo /etc/init.d/networking restart
```
Windows:
```
$ ipconfig /flushdns
```
macOS:
```
$ sudo killall -HUP mDNSResponder
```


[Paper Summary] Complete Parameter Inference for GW150914 Using Deep Learning

IPhysResearch

April 9, 2021 08:00

Please note that this post is for my future self to review the materials on this paper without reading it all over again.

Gravitational-wave Parameter Estimation with Autoregressive Neural Network Flows

We introduce the use of autoregressive normalizing flows for rapid likelihood-free inference of binary black hole system parameters …

Stephen R. Green, Christine Simpson, Jonathan Gair

Complete Parameter Inference for GW150914 Using Deep Learning

The LIGO and Virgo gravitational-wave observatories have detected many exciting events over the past five years. As the rate of …

Stephen R. Green, Jonathan Gair

One-sentence Summary

The paper describes a neural network architecture, based on normalizing flows alone, that is able to generate posteriors on the full $D = 15$ dimensional parameter space of quasi-circular binary inspirals, using input data surrounding the first observed GW event, GW150914, from multiple gravitational-wave detectors.

Code

For both 2008.03312 and 2002.07656:
- 👉 https://github.com/stephengreen/lfi-gw/

Background

Inference is extremely computationally expensive.
- Run times (MCMC and nested sampling) for a single posterior calculation are typically days for BBH systems and weeks for BNS systems.
- In the near future, event rates are expected to reach one per day or even higher.
Existing pipelines:
- LALInference
- Bilby
An advantage of likelihood-free methods is that waveform generation is done in advance of training and inference, rather than at sampling time as for conventional methods.

Model: normalizing flows

Specifically, a neural spline normalizing flow.

Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios, “Neural spline flows,” (2019), arXiv:1906.04032 [stat.ML].
Conor Durkan, Artur Bekasov, Iain Murray, and George Papamakarios, “Neural spline flows,” https://github.com/bayesiains/nsf (2019).

Training

Waveform generation is too costly to perform in real time during training, so we adopt a hybrid approach:

we sample “intrinsic” parameters in advance and save associated waveform polarizations $h^{(i)}_{+,\times}$; at train time we sample “extrinsic” parameters, project onto detectors, and add noise. We used $10^6$ sets of intrinsic parameters, which was sufficient to avoid overfitting.

Prior

The authors consider $\left(m_{i}, \phi_{c}, a_{i}, \theta_{i}, \phi_{12}, \phi_{J L}, \theta_{J N}\right)$ to be intrinsic (10 parameters ($i=1,2$)), so they are sampled in advance of training (see figure):
- detector-frame masses
  - $m_i \sim U[10,80] \text{ M}_\odot$
- reference phase
  - $\phi_c\sim U[0,2\pi]$
- spin magnitudes
  - $a_i\sim U[0,0.88]$
- spin angles
  - $\cos(\theta_{i})\sim U[-1,1], \phi_{12}\sim U[0,2\pi], \phi_{J L}\sim U[0,2\pi]$
- inclination angle
  - $\cos(\theta_{JN})\sim U[-1,1]$
During training, they sample the “extrinsic” parameters from uniform priors:
- time of coalescence
  - $t_{c,\text{geocent}}\sim U[-0.1,0.1]\text{ s}$
  - (taking $t_{c,\text{geocent}}=0$ to be the GW150914 trigger time)
- luminosity distance
  - $d_L\sim U[100,1000]\text{ Mpc}$
- polarization angle
  - $\psi\sim U[0,\pi]$
- sky position
  - $\alpha\sim U[0,2\pi],\sin(\delta)\sim U[-1,1]$.
All parameters are rescaled to have zero mean and unit variance before training (see figure).
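
The priors listed above translate fairly directly into code. The following is a minimal NumPy sketch, not the authors' implementation: the function names and dictionary keys are hypothetical, no ordering such as $m_1 \ge m_2$ is imposed, and the standardization uses the empirical mean and standard deviation of the drawn samples.

```python
import numpy as np

TWO_PI = 2 * np.pi


def sample_intrinsic(n, rng):
    """Draw n sets of the 10 intrinsic parameters from the uniform priors."""
    return {
        "m1": rng.uniform(10, 80, n),       # detector-frame masses [solar masses]
        "m2": rng.uniform(10, 80, n),       # no m1 >= m2 ordering imposed here
        "phi_c": rng.uniform(0, TWO_PI, n),
        "a1": rng.uniform(0, 0.88, n),
        "a2": rng.uniform(0, 0.88, n),
        "theta1": np.arccos(rng.uniform(-1, 1, n)),    # uniform in cos(theta)
        "theta2": np.arccos(rng.uniform(-1, 1, n)),
        "phi_12": rng.uniform(0, TWO_PI, n),
        "phi_JL": rng.uniform(0, TWO_PI, n),
        "theta_JN": np.arccos(rng.uniform(-1, 1, n)),  # uniform in cos(theta_JN)
    }


def sample_extrinsic(n, rng):
    """Draw n sets of the 5 extrinsic parameters, as done at train time."""
    return {
        "t_c": rng.uniform(-0.1, 0.1, n),           # seconds around the trigger
        "d_L": rng.uniform(100, 1000, n),           # Mpc
        "psi": rng.uniform(0, np.pi, n),
        "alpha": rng.uniform(0, TWO_PI, n),
        "delta": np.arcsin(rng.uniform(-1, 1, n)),  # uniform in sin(delta)
    }


def standardize(params):
    """Stack parameters column-wise and rescale to zero mean, unit variance."""
    x = np.column_stack(list(params.values()))
    return (x - x.mean(axis=0)) / x.std(axis=0)


rng = np.random.default_rng(0)
theta = {**sample_intrinsic(10_000, rng), **sample_extrinsic(10_000, rng)}
x = standardize(theta)
print(x.shape)  # (10000, 15): one row per sample, one column per parameter
```

Sampling the angles through their cosine or sine is what makes the corresponding directions isotropic rather than bunched at the poles.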

Strain data

The noise is assumed to be stationary and Gaussian.
Waveforms are generated with IMRPhenomPv2, a frequency-domain precessing model.
The frequency range is $[20, 1024]$ Hz.
The waveform duration is 8 s, with the trigger time at the 6 s mark.
The polarizations $h^{(i)}_{+,\times}$ are whitened using the noise PSD estimated from 1024 s of detector data prior to the GW150914 event.
The whitened waveforms are compressed to a reduced-order representation using singular value decomposition. The authors keep the first $n_{\text{SVD}}=100$ components during training (see figure).
The authors precompute a grid of time-translation matrix operators that act on vectors of reduced-basis (RB) coefficients for relative whitening and time shifting.
The data is also standardized to have zero mean and unit variance in each component (see figure).
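
The SVD compression step can be illustrated with a short NumPy sketch. This is a generic truncated-SVD projection under stated assumptions, not the authors' code: the function names are hypothetical, and the toy data below is a random low-rank complex matrix standing in for whitened frequency-domain waveforms.

```python
import numpy as np


def fit_svd_basis(waveforms, n_svd):
    """Return the first n_svd right-singular vectors as rows (n_svd, n_freq)."""
    _, _, vh = np.linalg.svd(waveforms, full_matrices=False)
    return vh[:n_svd]


def compress(waveforms, basis):
    """Project waveforms (n_samples, n_freq) onto the basis -> (n_samples, n_svd)."""
    return waveforms @ basis.conj().T


def reconstruct(coeffs, basis):
    """Map reduced coefficients back to the frequency domain."""
    return coeffs @ basis


# Toy stand-in for whitened waveforms: 200 samples, 64 frequency bins, rank 5.
rng = np.random.default_rng(1)
left = rng.normal(size=(200, 5)) + 1j * rng.normal(size=(200, 5))
right = rng.normal(size=(5, 64)) + 1j * rng.normal(size=(5, 64))
toy_waveforms = left @ right

basis = fit_svd_basis(toy_waveforms, n_svd=5)
coeffs = compress(toy_waveforms, basis)
print(coeffs.shape)  # (200, 5)
```

Because the toy data has rank 5, five components reconstruct it exactly; for real waveforms the truncation to $n_{\text{SVD}}$ components trades a small reconstruction error for a much smaller network input.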

Results

Implemented in PyTorch
500 epochs
Batch size: 512
Initial learning rate: 0.0002 (with cosine annealing)
10% of the training set reserved for validation
Training took $\sim6$ days on an NVIDIA Quadro P400 GPU
Testing (see figure) produced samples at a rate of 5,000 per second.
The authors note that varying $n_{\text{SVD}}$ can be used to improve performance.
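
For reference, a cosine-annealed learning rate can be written as a one-line formula. The sketch below assumes the rate decays from the initial value to zero over the full 500 epochs; the final value and the per-epoch granularity are assumptions of this sketch, not details stated in the summary. PyTorch provides the same schedule as `torch.optim.lr_scheduler.CosineAnnealingLR`.

```python
import math


def cosine_annealed_lr(epoch, total_epochs=500, lr_init=2e-4, lr_min=0.0):
    """Learning rate at a given epoch under cosine annealing.

    Starts at lr_init (epoch 0) and decays smoothly to lr_min (final epoch).
    """
    cosine = math.cos(math.pi * epoch / total_epochs)
    return lr_min + 0.5 * (lr_init - lr_min) * (1.0 + cosine)


for epoch in (0, 125, 250, 375, 500):
    print(epoch, cosine_annealed_lr(epoch))
```

The schedule spends relatively more epochs near the initial and final rates and passes through half the initial rate at the midpoint.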

The authors show a benchmark comparison against Bilby:
- Both distributions are clearly in very close agreement.
- Minor differences in the inclination angle $\theta_{JN}$.
- FYI: I have reproduced the result and see major differences in the inclination angle.

The authors claim that their model can be trained to generate any posterior consistent with the prior and the given noise PSD.
- FYI: I have not yet been able to reproduce evidence for this claim with other GW events.
- A P-P plot is shown to support this claim.
  - My comment: more artificial strain data sets (>200) and confidence intervals (e.g., 95%) around the diagonal are needed.
  - FYI: I have added 1-$\sigma$ confidence intervals to the P-P plot, and the agreement does not look good enough.
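
To make the comment about confidence intervals concrete, here is a hedged sketch of how a P-P curve and a band around the diagonal could be computed. It assumes that, for each simulated injection, one has already evaluated the posterior percentile of the true parameter value; the function names are hypothetical, and the band uses the normal approximation to the binomial distribution rather than exact binomial quantiles.

```python
import numpy as np


def pp_curve(percentiles, grid):
    """Fraction of injections whose true-value percentile is <= each grid level."""
    p = np.sort(np.asarray(percentiles, dtype=float))
    return np.searchsorted(p, grid, side="right") / p.size


def diagonal_band(grid, n_injections, z=1.96):
    """Approximate band around the diagonal (z=1.96 for ~95%, z=1 for 1-sigma)."""
    sigma = np.sqrt(grid * (1.0 - grid) / n_injections)
    return grid - z * sigma, grid + z * sigma


# Idealized percentiles: perfectly uniform, as expected for calibrated posteriors.
n = 100
ideal = (np.arange(n) + 0.5) / n
grid = np.array([0.25, 0.5, 0.75])

curve = pp_curve(ideal, grid)
low, high = diagonal_band(grid, n)
print(curve)       # lies on the diagonal for the idealized input
print(high - low)  # band width shrinks like 1/sqrt(n_injections)
```

With real injections, a curve that leaves the band over a sizeable range of credible levels suggests miscalibrated posteriors; more injections narrow the band and make the check more stringent.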

Remark

One approach would be to condition the model on PSD information:

During training, waveforms would be whitened with respect to a PSD drawn from a distribution representing the variation in detector noise from event to event, and (a summary of) this PSD information would be passed to the network as additional context.

(PSD samples can be obtained from detector data at random times.)
As new and more powerful normalizing flows are developed by computer scientists in the future, they will be straightforward to deploy to further improve the performance and capabilities of deep learning for gravitational-wave parameter estimation.

Appendix for 2002.07656

One slide for all points:

Appendix for 2008.03312

Research notes for the source code (download the PDF).

Particle Swarm Optimization From Scratch Using Python

IPhysResearch

March 7, 2021 08:00

Demo script (Python) of particle swarm optimization (PSO) partly translated from SDMBIGDAT19 (MATLAB).

GitHub: https://github.com/iphysresearch/PSO_python_demo
Slide: https://slides.com/iphysresearch/pso
Video: https://www.bilibili.com/video/BV1kv411h7sC/

Mock Data Challenge

Given
- Training data containing only noise: TrainingData.mat
- Noise is Gaussian, stationary
- Data to be analysed: analysisData.mat
- Signal: Quadratic chirp
- Parameter search ranges:
  - 40 < a1 < 100
  - 1 < a2 < 50
  - 11 < a3 < 15
Detection and Estimation
- Detection: Is there a signal in the analysis data?
- Estimation: If so, estimate its parameters
- Use PSO to obtain generalized likelihood ratio test (GLRT) and maximum likelihood estimate (MLE)
- Recommended PSO parameters:
  - Best of 8 runs
  - Termination at 2000 iterations
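
The recommended settings above (best of several runs, fixed iteration budget) can be exercised with a small sketch. The code below is a generic global-best PSO written for illustration, not the code from the linked repository: the function name `pso_minimize`, the coefficient values, and the toy quadratic fitness are all assumptions. In the actual challenge the fitness would presumably be the negative GLRT statistic of the quadratic chirp over (a1, a2, a3), and the iteration count would be 2000 rather than the shorter budget used here.

```python
import numpy as np


def pso_minimize(fitness, lower, upper, n_particles=40, n_iters=2000, seed=0,
                 inertia=0.7298, c1=1.49618, c2=1.49618):
    """Minimize `fitness` inside the box [lower, upper] with global-best PSO."""
    rng = np.random.default_rng(seed)
    lower = np.asarray(lower, dtype=float)
    upper = np.asarray(upper, dtype=float)
    dim = lower.size
    span = upper - lower

    pos = rng.uniform(lower, upper, (n_particles, dim))
    vel = rng.uniform(-span, span, (n_particles, dim))
    pbest_pos = pos.copy()
    pbest_val = np.array([fitness(p) for p in pos])
    g = pbest_val.argmin()
    gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]

    for _ in range(n_iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        # Velocity update: inertia + pull toward personal and global bests.
        vel = (inertia * vel
               + c1 * r1 * (pbest_pos - pos)
               + c2 * r2 * (gbest_pos - pos))
        pos = np.clip(pos + vel, lower, upper)  # keep particles in the box
        val = np.array([fitness(p) for p in pos])

        improved = val < pbest_val
        pbest_pos[improved] = pos[improved]
        pbest_val[improved] = val[improved]
        g = pbest_val.argmin()
        if pbest_val[g] < gbest_val:
            gbest_pos, gbest_val = pbest_pos[g].copy(), pbest_val[g]
    return gbest_pos, gbest_val


# Toy fitness with a known minimum inside the search ranges for (a1, a2, a3).
target = np.array([70.0, 20.0, 13.0])
toy_fitness = lambda a: float(np.sum((a - target) ** 2))
lower, upper = [40, 1, 11], [100, 50, 15]

# "Best of 8 runs": independent seeds, keep the run with the lowest fitness.
runs = [pso_minimize(toy_fitness, lower, upper, n_iters=500, seed=s)
        for s in range(8)]
best_pos, best_val = min(runs, key=lambda r: r[1])
print(best_pos, best_val)
```

Running several independent seeds guards against a single run stalling in a local optimum, which matters far more for a rugged GLRT fitness surface than for this smooth toy function.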

Prerequisites

Python 3.x
NumPy
SciPy
tqdm

How to use it

Run the demo.py script or open demo.ipynb.

$ python demo.py

Reference

2021 Gravitational Wave Data Analysis School in China (Soumya D. Mohanty)
- https://github.com/mohanty-sd/GWSC
- https://github.com/BNUGW/GWSC21
Particle Swarm Optimization Visually Explained

Licence

Unit 3: Structure & Paragraphs (Academic Writing)

IPhysResearch

December 1, 2020 08:00

These are detailed study notes for the “SCI Paper Writing Video Course” from AiMi Editor. The original open course is “Writing in the Sciences” on Coursera; it has six units and is taught by Dr. Kristin Sainani of Stanford University.

About the course:

This course teaches scientists to become more effective writers, using practical examples and exercises. Topics include: principles of good writing, tricks for writing faster and with less anxiety, the format of a scientific manuscript, peer review, grant writing, ethical issues in scientific publication, and writing for general audiences.

Following the previous two units (Unit 1, Unit 2), this unit covers how to improve sentence structure and how to build strong paragraphs step by step.

3.1 Experiment with punctuation

First, how exactly should some key punctuation marks be used?

The dash, the colon, the semicolon, and parentheses.

Use them to vary sentence structure!

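In the earlier units, Dr. Sainani taught how to cut every unnecessary word from a sentence, but that does not mean you should write only short, simple sentences. A text made up entirely of short, simple sentences is monotonous and dull. She encourages varying sentence structure: some sentences can be short, but the text should also include longer, more complex ones.

It is hard to vary sentence structure if you limit yourself to commas and periods. To make your sentences creative, interesting, and complex, you need dashes, colons, semicolons, and parentheses.

Example: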

But what really grabbed me about the film is that it shows how humans - through our ingenuity, our commitment to fact and reason, and ultimately our faith in each other - can science the heck out of just about any problem.

https://www.wired.com/2016/10/president-obama-guest-edits-wired-essay/
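- Using “science” as a verb in the sentence above is a delightful touch.
- The sentence has a complex, striking structure built on dashes. Between a pair of dashes you can drop in an extra idea, a list, or an aside, or insert a description in the middle of the sentence.
Example: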

Original: Many types of cells and tissues develop a kind of directionality. Certain events happen toward one end of the cell or tissue or the other. It’s a phenomenon called cell polarity.

(These three sentences are rather monotonous and simple; they all share the same structure.)

Using a colon: Many cells and tissues develop a kind of directionality called cell polarity: certain events happen toward one end of the cell or tissue.

(The colon introduces the definition of polarity, making the sentence more interesting, more efficient, and more elegant.)

Increasing power to separate:

Strunk and White, The Elements of Style

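Comma (the least power to separate, the shortest pause)
Colon
Dash
Parentheses (used to tuck extra material into a sentence)
Semicolon (almost a full stop, since it joins two related sentences)
Period (the strongest separation, since it marks a complete stop)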

Increasing formality:

Strunk and White, The Elements of Style

Dash
Parentheses
The Others (Comma, Colon, Semicolon, Period)

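This is why you may have been discouraged from using dashes and parentheses in the past. You can still use them even though they are slightly less formal; being less formal simply means you should use them sparingly and not overdo it. For example, you should not put a dash or parentheses in every sentence.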

Semicolon

The semicolon connects two independent clauses.

In other words, it joins what are essentially two short sentences.

(Note: a clause always contains a subject and predicate; an independent clause can stand alone as a complete sentence.)

Example:

Kennedy could be a cold and vain man, and he led a life of privilege. But he knew something about the world; he also cared about it.

Replacing the semicolon here with a comma would weaken “cared” too much; replacing it with a period would cut the “knew” and “cared” clauses apart.

Example:

It was the best of times; it was the worst of times.

In the sentence above, the semicolon links two opposing ideas.
Semicolons are also used to separate items in lists that contain internal punctuation.

Example:

It happened because people organized and voted for better prospects; because leaders enacted smart, forward-looking policies; because people’s perspectives opened up, and with them, societies did too.

Note that the last two list items contain commas, so commas can no longer separate the items; semicolons are needed to mark the boundaries.

Parentheses

Parenthesis (parenthetical expression):

Use parentheses to insert an afterthought or explanation (a word, phrase, or sentence) into a passage that is grammatically complete without it.

$\rightarrow$ If you remove the material within the parentheses, the main point of the sentence should not change.

$\rightarrow$ Parentheses give the reader permission to skip over the material.

In short, parenthetical material is extra information you can add, or an interesting but nonessential aside for the reader.

Example:

They also have a specialized tail, kind of like a monkey’s tail, that allows them to cling to a piece of grass (or a lucky diver’s finger).

(Deborah Netburn, Seahorses are some of the strangest fish in the sea. Can their genome tell us why?, LA Times)
Example:

This is troubling because, while there are plausible biological stories to connect red meat with cancer and heart disease, it seems unlikely that eating too much red meat could directly cause accidents and injuries. (Unless, as one of my students quipped, red meat eaters are swerving to avoid cows!)

Colon

Use a colon after an independent clause to introduce a list, quote, explanation, conclusion, or amplification.

A colon always follows an independent clause, so the text before the colon must contain both a subject and a verb.

The instructor recommends this description:

“The colon has more effect than the comma, less power to separate than the semicolon, and more formality than the dash.” —— Strunk and White

Example: (list or explanation)

The hydrogen bonds are made as follows: purine position 1 to pyrimidine position 1; purine position 6 to pyrimidine position 6.

From: “A Structure for Deoxyribose Nucleic Acid,” Watson and Crick, 1953
Example: (explanation or amplification)

That’s one reason why I’m so optimistic about the future: the constant churn of scientific progress.

The woman suffers from lack of experience and a chronic Democratic disease: compound sentences.

In the example above, the colon raises the reader's expectations: it sets up the punchline.
Example: (quote, list of quotes)

A colon can also introduce lists and quotations.

The “Ask not” line follows right after an exhortation modeled on Franklin Roosevelt’s “rendezvous with destiny”: “In the long history of the world, only a few generations have been granted the role of defending freedom in its hour of maximum danger. I do not shrink from this responsibility - I welcome it.” The note throughout is one of alarm: “The trumpet summons us again”; “the burden of a long twilight struggle”; “that uncertain balance of terror.”

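Note the colon in the second sentence of the passage above: it introduces three examples.

When deciding how many examples to give, three is often a good number.
- NOTE: The “rule of threes” for lists and examples.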
  
  Example:
  
  It happened because people organized and voted for better prospects; because leaders enacted smart, forward-looking policies; because people’s perspectives opened up, and with them, societies did too.
Example: (to amplify or extend)

Use a colon to join two independent clauses if the second amplifies or extends the first.

Companies use Marsh for the same reason that home sellers use real-estate agents: The agent’s knowledge and experience is supposed to help the client get the right deal at the right price.

Note that the first letter after the colon is capitalized to signal that what follows is a complete sentence.
EXAMPLE, what not to do!:

Two aspects of alcohol use are related to brain injuries: as a factor associated with risk of an injury such as a motor vehicle crash, and as a factor in TBI diagnosis, recovery, or survival after injury.

$\rightarrow$

Two aspects of alcohol use are related to brain injuries: its association with risk of injury, such as motor vehicle crash, and its post-injury influences on TBI diagnosis, recovery, or survival after injury.

The word “aspects” signals that a list follows the colon, so the reader expects nouns after it, not phrases introduced by “as”.
EXAMPLE, what not to do!:

In one project we have a nutritionist, a psychologist, statisticians, a computer specialist, and dietitians: a whole range of specialties.

$\rightarrow$

In one project we have a whole range of specialties: a nutritionist, a psychologist, statisticians, a computer specialist, and dietitians.

In the original, the material before and after the colon is in the wrong order.

Dash

Use the dash to add emphasis or to insert an abrupt definition or description almost anywhere in the sentence. Just don’t overuse it, or it loses its impact.

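The dash is the instructor's favorite punctuation mark. You can set off almost anything in the middle of a sentence with dashes and the reader will accept it. Because the dash is slightly less formal than other marks, use it in moderation.

The instructor goes on to quote: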

“A dash is a mark of separation stronger than a comma, less formal than a colon, and more relaxed than parentheses."——Strunk and White

“Use a dash only when a more common mark of punctuation seems inadequate."——Strunk and White

i.e. Reserve this tool for the really tough jobs!

Example

But my fellow Americans, whatever mix of motives led us to create an Electoral College majority for Donald Trump to become President —— and overlook his lack of preparation, his record of indecent personal behavior, his madcap midnight tweeting, his casual lying about issues like “millions” of voters casting illegal votes in this election, the purveying of fake news by his national security advisor, his willingness to appoint climate change deniers without even getting a single briefing from the world’s greatest climate scientists in the government hell soon lead, and his cavalier dismissal of the C.I.A.’s conclusions about Russian hacking of our election —— have no doubt about one thing: We as a country have just done something incredibly reckless.

(Thomas Friedman, New York Times)

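This example shows how powerful the dash can be and how smoothly the sentence reads. The colon at the end also works well: it puts the emphasis on the final idea.

An earlier unit advised against putting too much distance between the subject and the main verb, and the sentence above breaks that rule. With dashes this is acceptable, because the reader can easily find the verb right after the closing dash. This is the exception to the rule.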
Example: (emphasis)

The drugs did more than prevent new fat accumulation. They also triggered overweight mice to shed significant amounts of fat——up to half their body weight.

Example: (emphasis and added information)

Researchers who study shipworms say these mislabeled animals——they’re clams, not worms——are actually a scientific treasure.

The dashes in the sentence above explain what “mislabeled” means and emphasize a small fact.

What would happen if I used commas or parentheses rather than dashes in these two examples?

Commas instead…

The drugs did more than prevent new fat accumulation. They also triggered overweight mice to shed significant amounts of fat, up to half their body weight. (loss of emphasis, more clunky)

Researchers who study shipworms say these mislabeled animals, they’re clams, not worms, are actually a scientific treasure. (commas aren’t strong enough to set off a clause)

Parentheses instead…

The drugs did more than prevent new fat accumulation. They also triggered overweight mice to shed significant amounts of fat (up to half their body weight). (buries the information)

Researchers who study shipworms say these mislabeled animals (they’re clams, not worms) are actually a scientific treasure. (buries the information)

After an enthusiastic digression about Red Sox baseball, the instructor gave the following example:

Example

Baseball is the only game that’s played every day, which is why its season often seems endless, right up to the inning and the out——the little toss over to first base——when, wow, it ends.

References/citations

Strunk and White’s classic, The Elements of Style, http://www.bartleby.com/141/
Examples from:
- Barack Obama, Watson & Crick, Dickens, Michael Tomasky, Deborah Netburn, Fareed Zakaria, James Surowiecki, Nathan Seppa, Louis Menand, Joe Klein, Roger Angell

3.2 Practice, colon and dash

Colon: Practice

Evidence-based medicine teaches clinicians the practical application of clinical epidemiology, as needed to address specific problems of specific patients. It guides clinicians on how to find the best evidence relevant to a specific problem, how to assess the quality of that evidence, and perhaps most difficult, how to decide if the evidence applies to a specific patient.

Whenever you are tempted to reach for a synonym, ask yourself: do I really need a second instance of this word? Here the repetition is unnecessary, so the two sentences can be merged, and the list in the second sentence is better introduced with a colon.

Evidence-based medicine teaches clinicians the practical application of clinical epidemiology: how to find the best evidence relevant to a specific problem, how to assess the quality of that evidence, and how to decide if the evidence applies to a specific patient.

Evidence-based medicine teaches clinicians how to find the best evidence relevant to a specific problem, how to assess the quality of that evidence, and how to decide if the evidence applies to a specific patient.

Dash: Practice

Finally, the lessons of clinical epidemiology are not meant to be limited to academic physician-epidemiologists, who sometimes have more interest in analyzing data than caring for patients. Clinical epidemiology holds the promise of providing clinicians with the tools necessary to improve the outcomes of their patients.

The long descriptive clause (“who sometimes have more interest in analyzing data than caring for patients”) can be set off with dashes to smooth the transition.

Finally, clinical epidemiology is not limited to academic physician-epidemiologists——who are sometimes more interested in analyzing data than caring for patients——but provides clinicians with the tools to improve their patients’ Outcomes.

The revised sentence conveys the information more smoothly and efficiently.

3.3 Parallelism

Write sentences using parallel structure.

Pairs of ideas joined by “and”, “or”, or “but” should be written in parallel form.

This means the ideas on either side of “and”, “or”, or “but” must follow the same grammatical structure.

Eg:

The velocity decreased by $50\%$ but the pressure decreased by only $10\%$.

SVX but SVX

The instructor highly recommends Mimi Zeiger, Essentials of Writing Biomedical Research Papers, McGraw-Hill, as an excellent resource devoted to scientific writing.

Eg:

We aimed to increase the resolution and to improve picture quality.

Infinitive phrase and infinitive phrase.

Lists of ideas should be written in parallel form.

Unparallel:

Locusts denuded fields in Utah, rural Iowa was washed away by torrents, and in Arizona the cotton was shriveled by the blazing heat.

Parallel:

Locusts denuded fields in Utah, torrents washed away rural Iowa, and blazing heat shriveled Arizona’s cotton.

From: Strunk and White. The Elements of Style

Make a choice and stick to it!

Parallel example:

NASA’s intrepid Mars rover, Curiosity, has been through a lot in the past year. It flew 354 million miles, blasted through the Mars atmosphere, deployed a supersonic parachute, unfurled a giant sky crane, and touched down gently on the surface of Mars.

Citation: Jenny Marder. “Mars Curiosity Rover Gets ‘Brain Transplant,’ Prepares for Mountain Trek”, pbs.org

Eg:

Not Parallel:

If you want to be a good doctor, you must study hard, critically think about the medical literature, and you should be a good listener.

Parallel:

If you want to be a good doctor you must study hard, listen well, and think critically about the medical literature. (imperative, imperative, imperative)

Parallel:

If you want to be a good doctor, you must be a good student, a good listener, and a critical thinker about the medical literature. (noun, noun, noun)

Eg:

Not Parallel:

This research follows four distinct phases: (1) establishing measurement instruments (2) pattern measurement (3) developing interventions and (4) the dissemination of successful interventions to other settings and institutions.

Parallel:

This research follows four distinct phases: (1) establishing measurement instruments (2) measuring patterns (3) developing interventions and (4) disseminating successful interventions to other settings and institutions.

3.4 Paragraphs

Paragraph-level tips

1 Paragraph = 1 idea

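Think of the paragraph as the basic building block of your manuscript; each paragraph should contain one main idea. When you organize a manuscript, think in paragraphs. Scientists often try to cram too many ideas into a single paragraph, and the result is long paragraphs that are hard to read.

Use short paragraphs, and start a new one whenever you move to a new idea. If you study professional writing in magazines, you will find that a paragraph typically has two to five sentences. Short paragraphs are more focused, and they also leave plenty of white space on the page. Readers like white space: a huge block of text with no breaks is tedious and hard to follow, whereas short paragraphs and white space make a page pleasant to read.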
Give away the punch line early.

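Scientists tend to present things in the order they think about them: details, data, supporting evidence, and only then the conclusion. When you write, do the opposite.

First let readers know where you are heading; it is hard for them to wade through details when they do not know what the point is. In journalism this is called the inverted pyramid: start with the most important point, then work downward through the supporting material. This resembles putting a topic sentence first, but the idea is not to rigidly open every paragraph with a formal topic sentence. You do need to know what the point of each paragraph is, and you need to lead the reader to it early.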
Paragraph flow is helped by:
- Logical flow of ideas
  
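  Paragraph flow depends mainly on good logic. Readers should be able to follow your lead through the paragraph and see that your ideas are organized and logical. If the logic is sound, you do not need to give readers many signposts.
  
  To make one sentence flow naturally into the next, consider the following approaches: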
  
  logical flow of ideas:
  - Sequential in time (avoid the Memento (记忆碎片) approach!)
    
    Ideas are usually presented in chronological order, which is predictable and easy to follow.
  - General $\rightarrow$ specific (take-home message first!)
    
    Move from the broad and abstract to the narrow and concrete.
  - Logical arguments (if a then b ; a; therefore b)
    
    You can also order the ideas as a formal logical argument.
- parallel sentence structures
  
  Parallel sentence structure also helps paragraph flow. One strategy is to give adjacent sentences matching structures.
- if necessary, transition words
  
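  Do not rely on transition words. Some scientists overuse them, sometimes opening every sentence with one, and lean on them as a crutch to prop up flawed logic. This does not work: transition words are not strong enough to repair logic that is fundamentally unsound. Also, do not get fancy with your transitions. Many professional writers are fond of a plain “but”, which is a good way to shift gears. You do not need elaborate phrases such as “on the other hand”; “but” will do.
  
  Dr. Sainani says she tends to use only two transition words:
  - “but”: to signal to the reader that I am switching direction;
  - “and”: to signal that I am adding information.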
Your reader remembers the first sentence and the last sentence best. Make the last sentence memorable. Emphasis at the end!

Put extra care into the closing sentence: a memorable ending lifts the whole paragraph.

Good example

(From Wired)

This kind of progress hasn’t happened on its own. It happened because people organized and voted for better prospects; because leaders enacted smart, forward-looking policies; because people’s perspectives opened up, and with them, societies did too. But this progress also happened because we scienced the heck out of our challenges. Science is how we were able to combat acid rain and the AIDS epidemic. Technology is what allowed us to communicate across oceans and empathize with one another when a wall came down in Berlin or a TV personality came out. Without Norman Borlaug’s wheat, we could not feed the world’s hungry. Without Grace Hopper’s code, we might still be analyzing data with pencil and paper.

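The transition word “But” shifts gears and introduces the paragraph's main idea (underlined in the original). The paragraph flows well without relying on transition words.
Overall it reads very smoothly, moving from the general to the specific (general reasons for the world's progress -> one specific reason: science).
The italicized passages in the original are two excellent pairs of parallel structures, very elegant.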

3.5 Paragraphs Editing I

Example:

Most scents remain constant in their quality over orders of magnitude of concentration (12). Nevertheless, at high concentrations, quality tends to be negatively correlated with intensity, as was the case, for example, for the cinnamon oil used in this study. Hence, reliability of absolute scorings was achieved by calibrating the amount of perfume ingredients with initial ratings for intensity against a reference substance of known concentration. The final concentrations were in principal chosen in a way such that individual ratings showed variance among participants within the sliding scale between 0 and 10 (meaning that people could decide whether they liked a scent or not). This procedure seemed successful for most scents; however, the concentrations for bergamot (highest average ratings) and vetiver (lowest average rating) could probably been reduced even more, as both scents did not show any discriminating power at the level of common alleles (people agreed largely on the quality of these two scents) (see Table 2 ). Interestingly, the pooled rare alleles showed discriminating power for…

Word count: 212

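Note all the sentences that open with a transition word (Nevertheless, Hence, however, Interestingly). The writer keeps telling the reader where the text is heading, which usually signals a problem in the underlying logic.
Spelling error: “in principal” should be “in principle”.
The underlined parenthetical remarks (in the original) are the interesting part: simple and clear.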
What’s the paragraph trying to convey? (outline)…
1. Were the perfume concentrations in the experiment appropriate? (Main idea of the paragraph)
  
  A. If the concentration is too high, the smell may be too overpowering and this may affect quality ratings.
  - This is not a problem here because we standardized intensity.
  B. The concentrations are appropriate if they produce sufficient variability in quality ratings.
  - This appeared true for most scents, with two exceptions.

We remove all the clutter that distracts from the key points; only the outline above needs to come through.

Perfume intensity and quality are negatively correlated at high concentrations: If the scent is too strong, people will rate it unfavorably. Hence, we chose the final concentration of each perfume ingredient so that it had similar intensity to a reference scent (1-butanol). The resulting concentrations appeared appropriate for most scents, as participants’ preferences varied along the sliding scale between 0 and 10. However, participants largely agreed on bergamot (highest average ratings) and vetiver (lowest average rating), so lower or higher concentrations may have been needed for these scents.

Word count: 91

香水浓度和质量在高浓度时呈负相关。如果气味太浓，人们会认为它不好喝。因此，我们选择了每种香水成分的最终浓度，它的强度与参考气味相似。产生的浓度似乎适合大多数气味，当参与者的偏好在 0 到 10 之间的滑动范围弄变化时。然而，参与者基本同意佛手柑和香根草，所以这些气味可能需要更低或更高的浓度。

Example (略)

Although the methodological approaches are similar, the questions posed in classic epidemiology and clinical epidemiology are different. In classic epidemiology, epidemiologists pose a question about the etiology of a disease in a population of people. Causal associations are important to identify because, if the causal factor identified can be manipulated or modified, prevention of disease is possible. On the other hand, in clinical epidemiology, clinicians pose a question about the prognosis of a disease in a population of patients. Prognosis can be regarded as a set of outcomes and their associated probabilities following the occurrence of some defining event or diagnosis that can be a symptom, sign, test result or disease.

Word count: 111

What’s the paragraph trying to convey? (outline)…
1. Classic and clinical epidemiology differ (Main idea of the paragraph)
  
  A. Classic epidemiology is about disease etiology and preventing disease
  - Etiology is about this.
  (Supporting ideas $\rightarrow$ specifics of how they differ)
  
  B. Clinical epidemiology is about improving prognosis.
  - Prognosis is about this.
  (Sub-supporting ideas $\rightarrow$ definitions)

Despite methodologic similarities, classic epidemiology and clinical epidemiology differ in aim. Classic epidemiologists pose a question about the etiology of disease in a population of people; etiologic factors can be manipulated to prevent disease. Clinical epidemiologists pose a question about the prognosis of a disease in a population of patients; prognosis is the probability that an event or diagnosis will result in a particular outcome.

Word count: 65

尽管在方法上很相似，经典流行病学和临床流行病学的目的不同。经典的流行病学家提出了一个关于病因的问题在这里的人群中。病因可以被控制以预防疾病。临床流行病学家提出一个关于疾病预后的问题在一个病人群体中。人后我会改变预后的定义。预后是指事件或诊断将导致特定的结果的概率

3.6 Paragraphs Editing II

Example (略)

The concept of chocolate having potential therapeutic benefits for people with diabetes mellitus, especially type 2 diabetes mellitus, presents a number of intellectual challenges, from both clinical and sociological perspectives. It seems almost counterintuitive to suggest an energy-dense food that is high in sugar, and often seen as a treat or a “dietary sin”, could offer such promise. However, a large volume of mechanistic and animal model studies has been undertaken demonstrating the potential benefits of cocoa and chocolate for both glucose regulation and modification of complications associated with diabetes. Cesar Fraga in the American Journal of Clinical Nutrition first proposed the potential of chocolate for people with diabetes in 2005. It was suggested that we should consume more cocoa and chocolate to reduce the burdens of hypertension and diabetes. (1) Grassi and colleagues (2) further reinforced this potential for its antihypertensive and insulin-sensitizing effect with the mechanistic data. However, the hypothesis of chocolate having a beneficial effect remains counterintuitive to the average consumer and has yet to gain support among the wider medical and healthcare community.

Word count: 177

Many mechanistic and animal studies suggest health benefits for cocoa and chocolate, particularly for patients with hypertension and type 2 diabetes mellitus. These studies suggest that cocoa and chocolate can lower blood pressure, improve glucose regulation, improve insulin sensitivity, and reduce complications from diabetes. But the idea of chocolate as medicine has yet to gain widespread support among consumers or among the wider medical and healthcare community. It seems counterintuitive that a high-sugar, energy-dense food, one often seen as a treat or “dietary sin”, could promote health.

Word count: 87

Example (略)

Headache is an extraordinarily common pain symptom that virtually everyone experiences at one time or another. As a pain symptom, headaches have many causes. The full range of these causes were categorized by the International Headache Society (IHS) in 1988 . The IHS distinguishes two broad groups of headache disorders: primary headache disorders and secondary headache disorders. Secondary headache disorders are a consequence of an underlying condition, such as a brain tumor, a systemic infection or a head injury. In primary headache disorders, the headache disorder is the fundamental problem; it is not symptomatic of another cause. The two most common types of primary headache disorders are episodic tension-type headache (ETTH) and migraine. Although IHS is the most broadly used/recognized classification system used, a brief comment on others would be appropriate - especially if there are uses that have epidemiologic advantages.

Word count: 139

Headache is a pain symptom that almost everyone experiences. The International Headache Society (IHS) groups headaches into two types based on cause: primary headache disorders and secondary headache disorders. In primary headache disorders, the headache itself is the main complaint. The two most common types of primary headache disorder are episodic tension-type headache (ETTH) and migraine. Secondary headache disorders result from an underlying condition, such as a brain tumor, a systemic infection, or a head injury.

Word count: 76

3.7 A few more tips: repetition, key words, and acronyms

重复，关键词，首字母缩写词。

A note on repetition…

When you find yourself reaching for the thesaurus to avoid using a word twice within the same sentence or even paragraph, ask:

Is the second instance of the word even necessary?

When you catch yourself searching a thesaurus to avoid repetition, you may not need the second instance of the word at all. Examples: challenges/difficulties, illustrate/demonstrate, teaches clinicians/guides clinicians.
If the word is needed, is a synonym really better than just repeating the word?

Sometimes you do need to repeat. In that case, repeat the key words of your paper! For example: names of comparison groups, variables, or instruments.

Needless synonyms!

下面是一些有趣的不必要同义词的例子：

To avoid repetition, writers have needlessly (and amusingly) come up with the following synonyms:

Banana $\rightarrow$ “the elongated yellow fruit”

Beaver $\rightarrow$ “the furry, paddle-tailed mammal”

Mustache $\rightarrow$ “under-nose hair crops”

Milk from a cow $\rightarrow$ “the vitamin-laden liquid” from a “bovine milk factory”

Skis $\rightarrow$ “the beatified barrel staves”

Examples compiled in: “The Press: Elongated Fruit - TIME.” Time. 10 Aug. 1953. Web. 19 Feb. 2012.

For more, see: Henry W. Fowler on “Elegant Variation”: https://www.bartleby.com/116/302.html

Disastrous synonyms!

Whereas it’s just amusing or inelegant in some types of writing, in scientific writing it’s a disaster.

The reader may think you are referring to a different instrument, model, group, variable, etc.

在科学写作中，如果你用同义词替换一个关键词，那不仅好笑，实际上是灾难性的。因为读者会认为你说的是不同的东西。

Acronyms/Initialisms

It’s OK to repeat words. Resist the temptation to abbreviate words simply because they recur frequently! (recall: miR instead of microRNA)

Use only standard acronyms/initialisms (e.g., RNA). Don’t make them up!

If you must use acronyms, define them separately in the abstract, each table/figure, and the text. For long papers, redefine occasionally (as readers don’t typically read start to finish).

只使用标准的众所周知的缩写词。

Unnecessary acronyms/initialisms

Eg:

Spinal muscle fatigue is common in people with LLA, because decreased spinal muscle endurance and strength has been reported in persons with TFA and TTA with LBP.

Unit 1: Introduction; principles of effective writing（学术写作）

IPhysResearch

2020年11月23日 08:00

About the course:

1.1 Introduction

首先，我们先来问一个问题：

What makes good writing?

怎样才能写出好文章？

Good writing communicates an idea clearly and effectively.

好的写作需要清晰有效地传达一个想法

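This matters because the whole point of scientific writing is to communicate your findings to other scientists, to policymakers, and sometimes to the public. In short: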

Takes having something to say and clear thinking.

清晰地写作只需要有话要说和清晰的思考。

For a competent researcher this part should come easily. Good writing has a second ingredient:

Good writing is elegant and stylish.

好的写作的文字很漂亮，文雅时尚。

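Note that this is the part people usually worry about while writing. They spend so much time on it that they forget the goal is simply to get their ideas across clearly and effectively, and that causes all sorts of problems. In short: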

Takes time, revision, and a good editor!

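In fact, elegance and style do not appear in a first draft. Even for professional writers they emerge only in revision.

So when you write a first draft, worry only about conveying the idea clearly, logically, and efficiently.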

What makes a good writer?

究竟是什么造就了一个好作家呢？

Here are some common misconceptions and myths about this question:

Inborn talent?
Years of English and humanities classes?
An artistic nature?
The influence of alcohol and drugs?
Divine inspiration?

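In fact, Prof. Sainani says a good writer needs the following:

Having something to say.

You need something you care about expressing, and you need to be clear about what you want to say.
Logical thinking.

You must be able to lay out your argument logically, especially in scientific writing.

Clearly, this is not hard for scientists.
A few simple, learnable rules of style (the tools you’ll learn in this class)

These are a few simple rules of writing style that anyone can learn.

Prof. Sainani particularly stresses: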

Good writing is a skill. Good writing can be learned!

好的写作不过是一种技巧。好的写作是可以被学习的！

Steps to becoming a better writer

In addition to taking this class, other things you can do to become a better writer:

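Read, pay attention, and imitate.

Reading is a good way to become a better writer. Read good professional writing, such as magazines and nonfiction books, not necessarily scientific literature. Notice how professional writers write and which techniques they use, then try to imitate them. In short, read as much as you can outside the scientific literature.
Write in a journal.

Writing is a skill: the more you practice, the better you get. If you have a little time at the start or end of each day, try keeping a journal. Paper or electronic, spend a few minutes practicing the techniques you have learned.

I plan to start a blog on Medium as my journal!
Let go of “academic” writing habits (deprogramming step!)

Drop the bad habits you may have picked up from too long in academia.
Talk about your research before trying to write about it.

Before you sit down to write, explain your research to someone, ideally a friend outside your field. When we talk about our research we use a more conversational tone and simpler terms, and we often express our ideas better than when we sit down to write.
Write to engage your readers: try not to bore them!

When you write your manuscript, actively try not to bore the reader.

You have probably found papers daunting and tedious yourself: hard to follow and therefore dull. You can write in a livelier, more engaging way.
Stop waiting for “inspiration”.

Waiting is just procrastination. Don't be precious about it; sit down and write!
Accept that writing is hard for everyone.

This holds even for professional writers.
Revise. Nobody gets it perfect on the first try.

Many scientists spend too little time revising. They fret over the first draft and try to make it perfect, but give revision and proofreading too little weight. Get the draft down quickly, then put your effort into revising. Elegance appears in revision, not in the first draft.
Learn how to cut ruthlessly. Never become too attached to your words.

You have to learn to be a ruthless editor.
Find a good editor!

Anyone around you who is willing can be a good editor. Someone outside your field is best: they can read your work and tell you whether it is written at a level they understand, where it is boring, and where it is confusing.
Take risks.

Write something interesting, even something provocative. As a writer, take some risks to find your own voice.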

1.2 Examples of what not to do

Below are some concrete examples, typical of the scientific literature, of what not to do.

Case 1

This was the first sentence of an article in the Journal of Clinical Oncology (Introduction section):

“Adoptive cell transfer (ACT) immunotherapy is based on the ex vivo selection of tumor-reactive lymphocytes, and their activation and numerical expansion before reinfusion to the autologous tumor-bearing host.”

The sentence above is hard to read and unfriendly to the reader. How do you judge whether a sentence is easy to understand?

Ask Yourself:
- Is this sentence easy to understand?
  
  这句话容易理解吗？
- Is this sentence enjoyable and interesting to read?
  
  这句话读起来有趣嘛？

The example above uses clunky nouns (in bold in the original), which are very common in academic writing but make text hard to read.

Remember:

Verbs drive sentences whereas nouns slow them down.

句子是用动词来驱动起来的，而名词会拖累句子信息的表达。

Case 2

“These findings imply that the rates of ascorbate radical production and its recycling via dehydroascorbate reductase to replenish the ascorbate pool are equivalent at the lower irradiance, but not equivalent at higher irradiance with the rate of ascorbate radical production exceeding its recycling back to ascorbate.” (from Photochemistry and Photobiology…)

Ask Yourself:
- Is this sentence readable?
  
  这句话可读吗？
- Is it written to inform or to obscure?
  
  这写的是传递了信息还是晦涩了信息？（含糊其辞？装模作样？）

This example has the same problem: it is hard to read, and the clunky nouns can be turned into verbs, as in the revised version below:

“These findings imply that, at low irradiation, ascorbate radicals are produced and recycled at the same rate, but at high irradiation, they are produced faster than they can be recycled back to ascorbate.”

This is much easier to understand, the author's point comes through, and the sentence is shorter too.

Themes of this course

Complex ideas don’t require complex language.

Writing about scientific, complex, and technical material does not mean we must use complex language. Simple language can express complex and technical ideas.
Scientific writing should be easy and even enjoyable to read!

我们的目标是写一些容易理解的东西，以及对读者来说是愉悦的。

Prof. Sainani goes on to make a point about values, quoting:

“My professor friend told me that in his academic world, ‘publish or perish’ is really true. He doesn’t care if nobody reads it or understands it as long as it’s published.”

From: Anne Ku. “The joys and pains of writing and editing,” Le Bon Journal, 2003

http://www.bonjournal.com/volume2/issue1writing.pdf

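Under pressure to publish, you may stop caring whether anyone reads your work. That undermines the goal of science. Care about the reader's experience: only then will readers grasp the ideas in your paper, be more likely to cite your work, and help science advance.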

1.3 Overview: Key principles of effective writing

One more example!

Dysregulation of physiologic microRNA (miR) activity has been shown to play an important role in tumor initiation and progression, including gliomagenesis. Therefore, molecular species that can regulate miR activity on their target RNAs without affecting the expression of relevant mature miRs may play equally relevant roles in cancer.

From an article in Cell.

This passage is hard to read; you have to work to figure out what the authors mean.

Dysregulation of physiologic microRNA (miR) activity has been shown to play an important role in tumor initiation and progression, including gliomagenesis. Therefore, molecular species that can regulate miR activity on their target RNAs without affecting the expression of relevant mature miRs may play equally relevant roles in cancer.

Note the use of nouns instead of verbs.

First, several nouns could have been written as verbs (underlined in the original). Recall the earlier point:

Verbs move sentences along, whereas nouns slow the reader down.

动词可以使句子动起来，而名词会让读者读的更慢。

Note the use of vague words.

The sentence also contains vague words (in bold in the original). The reader cannot picture what the authors mean, and these words add nothing.
Note the use of unnecessary jargon and acronyms.

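The sentence also has unnecessary jargon and acronyms (in italics in the original). Unless an acronym is a standard term familiar to everyone, most readers will not know it and must stop to look it up, which slows them down.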
Note the passive voice.

The passive voice (bold italics in the original) is hard to read because it is not how we speak.
Note the distance between the subject and the main verb of this sentence.

Dysregulation of physiologic microRNA (miR) activity has been shown to play an important role in tumor initiation and progression, including gliomagenesis. Therefore, molecular species that can regulate miR activity on their target RNAs without affecting the expression of relevant mature miRs may play equally relevant roles in cancer.

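The descriptive clause between the subject and the main verb is too long. The reader waits for the verb, and by the time it arrives has lost track of what the sentence is about. Too great a distance between the subject and the main verb is a problem too.

Here is how Prof. Sainani rewrites the passage (possibly without fully recovering the authors' intent):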

Changes in microRNA expression play a role in cancer, including glioma. Therefore, events that disrupt microRNAs from binding to their target RNAs may also promote cancer.

This version is shorter and easier to understand while conveying the same idea.

Principles of effective writing

Cut unnecessary words and phrases; learn to part with your words!

删掉不必要的单词和短语，避免混乱！
Use the active voice (subject + verb + object)

要用主动语态，而不是被动语态。
Write with verbs: use strong verbs, avoid turning verbs into nouns, and don’t bury the main verb!

要用强动词，避免把动词变成名词，也不要把主谓语动词埋没了！

1.4 Cut the clutter

去除杂乱

In this lecture we learn how to remove clutter from writing. Prof. Sainani quotes William Zinsser's classic book On Writing Well:

“The secret of good writing is to strip every sentence to its cleanest components. Every word that serves no function, every long word that could be a short word, every adverb that carries the same meaning that’s already in the verb, every passive construction that leaves the reader unsure of who is doing what —— these are the thousand and one adulterants that weaken the strength of a sentence. And they usually occur in proportion to the education and rank.”

-- William Zinsser in On Writing Well, 1976

好的写作的秘诀是把每一个句子都剥得很干净。有一千零一种减弱句子力度的累赘物：每一个无用的词，每一个可被简化的词，每一个已由动词表达其义的副词，每一个要让读者猜测施动者的被动结构。地位和文化水平越高的，越容易犯那些个毛病。

——威廉·津瑟《好好写》，1976

It is a very good book; read it if you have time.

Example 1

“This paper provides a review of the basic tenets of cancer biology study design, using as examples studies that illustrate the methodologic challenges or that demonstrate successful solutions to the difficulties inherent in biological research.”

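Prof. Sainani goes through this example in great detail. Step by step:

This paper provides a review of: “review” is a lively verb, but here it is used as a dull noun paired with the dull verb “provides”. Just write “This paper reviews”.
of the basic tenets of: a vague phrase. It adds nothing and does not help the reader understand what is going on, so cut it.
using as examples studies that illustrate: awkward. “examples” and “studies” mean the same thing here, so drop “studies” and write “using examples that illustrate”.
methodologic challenges: “methodologic” is another vague word. It is too broad to mean anything to the reader, and since the sentence already says it is about study design, “methodologic” is implied. Cut it.
demonstrate and illustrate mean the same thing. The author probably reached for a synonym to avoid repetition, but the second instance is not needed: “illustrate” can carry over to both “challenges” and “solutions”, giving “illustrate both challenges and solutions”.
successful is cut because there is no such thing as an unsuccessful solution; the meaning is already contained in “solution”.
to the difficulties inherent in biological research: repetition again, since “challenges” already covers “difficulties”; this is probably another synonym invented to avoid repeating a word. “inherent in biological research” is also unnecessary because we already know the topic is biological research. Delete the whole phrase.

Putting it together, the revised sentence is: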

This paper reviews cancer biology study design, using examples that illustrate specific challenges and solutions.

Example 2

“As it is well known, increased athletic activity has been related to a profile of lower cardiovascular risk, lower blood pressure levels, and improved muscular and cardio-respiratory performance.”

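Prof. Sainani goes through this example in great detail too. Step by step:

As it is well known,: an opener you do not need; the author is just clearing their throat. If you want to show that something is well known, cite a reference at the end of the sentence. Cut it.
has been related to: as a matter of style, Prof. Sainani prefers “is associated with”.
a profile of: a vague phrase that adds no information; nothing is lost by cutting it.
lower cardiovascular risk, lower blood pressure levels: “levels” is unnecessary because “lower blood pressure” already says it. Cut it.
improved muscular and cardio-respiratory performance: a fancy way of saying “fitness”, so use “fitness”.

Putting it together, the revised sentence is: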

Increased athletic activity is associated with lower cardiovascular risk, lower blood pressure, and improved fitness.

We can go one step further and state the claim directly, as if to say “we have enough evidence that…”:

Increased athletic activity lowers cardiovascular risk and blood pressure, and improves fitness. (stronger level of evidence)

This version requires stronger evidence, but in this case we can say it with that confidence.

Example 3

“The experimental demonstration is the first of its kind and is a proof of principle for the concept of laser driven particle acceleration in a structure loaded vacuum.”

Prof. Sainani goes through this example in great detail as well. Step by step:

The experimental demonstration: “experimental demonstration” can simply be “experiment”, so write “The experiment”.
is the first of its kind and is a proof of principle: odd, with two “is”. These are dull verbs, and a better verb can replace them. “first of its kind” and “a proof of principle” also overlap, so condense to “provides the first proof of principle”.
the concept of: extra words that are not needed.

Putting it together, the revised sentence is:

The experiment provides the first proof of principle of laser-driven particle acceleration in a structure-loaded vacuum.

Cut unnecessary words

要养成剪掉不必要的字的习惯。

Be vigilant and ruthless

保持警惕和无情
After investing much effort to put words on a page, we often find it hard to part with them.

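Cutting your own words is hard because you put real effort into writing them, and throwing them away feels like negating that effort. You have also read the sentence over in your head so many times that it starts to sound like the way it ought to be. You have to fight this inertia and complacency and train yourself to go back and cut the unnecessary words.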

But fight their seductive pull…
Try the sentence without the extra words and see how it’s better - conveys the same idea with more power

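After cutting the extra words, read the sentence again and see how it conveys the same idea with more power.

Remember that the undo key is always there, so you can restore anything you cut. Make a habit of trying the sentence without a word or phrase you think you like; you will usually find it reads better.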

Cutting extra words

Another example:

“Brain injury incidence shows two peak periods in almost all reports: rates are the highest in young people and the elderly.”

After trimming, we get a more forceful version:

“Brain injury incidence peaks in the young and the elderly.”

Common clutter

Below are common sources of clutter to watch for:

Dead weight words and phrases
- As it is well known
- As it has been shown
- It can be regarded that
- It should be emphasized that
These are the author clearing their throat. Delete them all; a citation can show that something is well known.
Empty words and phrases
- basic tenets of
- methodologic
- important
These words add nothing because they are so vague and hollow. As William Zinsser puts it:

“Some words and phrases are blobs.”

-- William Zinsser in On Writing Well, 1976
Long words or phrases that could be short
- muscular and cardio-respiratory performance
Unnecessary jargon and acronyms
- muscular and cardiorespiratory performance
- Gliomagenesis
- miR
Avoid acronyms unless they are completely standard and widely recognized in the scientific community.
Repetitive words or phrases
- studies/examples
- illustrate/demonstrate
- challenges/difficulties
- successful solutions
Eliminate any such repetition.
Adverbs
- very, really, quite, basically, generally, etc.
These adverbs are common in email and first drafts because we use them in everyday speech. In formal writing, take them out: they are almost never useful and only pad the sentence.

Adding adverbs does not make your ideas or statements stronger. It often has the opposite effect.

Long words and phrases that could be short…

下面是一些可以变短的长单词和短语的例子：

Wordy version	Crisp version
A majority of	most
A number of	many
Are of the same opinion	agree
Less frequently occurring	rare
All three of the	the three
Give rise to	cause
Due to the fact that	because
Have an effect on	affect

More examples

Long words and phrases that could be short…

The expected prevalence of mental retardation, based on the assumption that intelligence is normally distributed, is about 2.5%.

The expected prevalence of mental retardation, if intelligence is normally distributed, is 2.5%.
Repetitive words or clauses

A robust cell-mediated immune response is necessary, and deficiency in this response predisposes an individual towards active TB.

Deficiency in T-cell-mediated immune response predisposes an individual to active TB.

Summary

Prof. Sainani closes with a quote:

Blaise Pascal on the elegance in brevity:

“I have only made this letter rather long because I have not had time to make it shorter.” ("Je n’ai fait celle-ci plus longue que parce que je n’ai pas eu le loisir de la faire plus courte.")

-- Lettres provinciales, 16, Dec. 14, 1656 (though the remark is also attributed to St. Augustine and Cicero)

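You may have picked up the habit of padding your writing to fill space. Break it: learn to cut what is unnecessary and focus on the key ideas. When you express your ideas in the fewest words, your writing becomes more readable, engaging, and powerful.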

1.5 Cut the clutter, more tricks

A few other small tricks…

Eliminate negatives

消除 negative 结构的句子

For example:

She was not often right.

She was usually wrong.

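Whenever you meet “not” in your writing, see whether you can turn the sentence into a positive one. You almost always can, and the positive construction is usually clearer.

A few more examples: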

She did not want to perform the experiment incorrectly.

She wanted to perform the experiment correctly.

They did not believe the drug was harmful.

They believed the drug was safe.

This looks simple: just find the antonym of the negated word, for example:

Not honest	dishonest
Not harmful	safe
Not important	unimportant
Does not have	lacks
Did not remember	forgot
Did not pay attention to	ignored
Did not succeed	failed

Eliminate superfluous uses of “there are/there is”

An example to get a feel for it:

There are many ways in which we can arrange the pulleys.

We can arrange the pulleys in many ways.

“There are” may look necessary here, but a tighter phrasing exists.

Another example:

There was a long line of bacteria on the plate.

Bacteria lined the plate.

Surprisingly, “There was” can be trimmed here too. One more:

There are many physicists who like to write.

Many physicists like to write.

Here “There are” and “who” can simply be dropped. A similar example:

The data confirm that there is an association between vegetables and cancer.

The data confirm an association between vegetables and cancer.

Omit needless prepositions

摆脱不必要的介词

For example, “that” and “on” are often superfluous:

The meeting happened on Monday.

The meeting happened Monday.

They agreed that it was true.

They agreed it was true.

1.6 Practice cutting clutter

More examples:

Anti-inflammatory drugs may be protective for the occurrence of Alzheimer’s Disease.

Anti-inflammatory drugs may protect against Alzheimer’s Disease.

Clinical seizures have been estimated to occur in 0.5% to 2.3% of the neonatal population.

Clinical seizures occur in 0.5% to 2.3% of newborns.

Ultimately p53 guards not only against malignant transformation but also plays a role in developmental processes as diverse as aging, differentiation, and fertility.

Besides preventing cancer, p53 also plays roles in aging, differentiation, and fertility.

Injuries to the brain and spinal cord have long been known to be among the most devastating and expensive of all injuries to treat medically.

Injuries to the brain and spinal cord are among the most devastating and expensive.

An IQ test measures an individual’s abilities to perform functions that usually fall in the domains of verbal communication, reasoning, and performance on tasks that represent motor and spatial capabilities.

An IQ test measures an individual’s verbal, reasoning, or motor and spatial abilities.

As we can see from Figure 2, if the return kinetic energy is less than $3.2\,\mathrm{U}_{\mathrm{p}}$, there will be two electron trajectories associated with this kinetic energy.

Figure 2 shows that a return kinetic energy less than $3.2\,\mathrm{U}_{\mathrm{p}}$ yields two electron trajectories.

1.7 Demo Edit 1

(Omitted.)

The Fourier Transform Algorithm and Its Python Implementation

IPhysResearch

2020年2月4日 08:00

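These notes date from when I first learned gravitational-wave data processing and looked into what fft actually computes in Python, MATLAB, and their packages.

First, what is frequency, and how do we extract frequency information from a signal? The tool is the Fourier transform, which maps a time-domain signal $x(t)$ to the frequency-domain $\tilde{x}(\omega)$ according to (with $\omega = 2\pi f$):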

$$ \tilde{x}(\omega) = \int^{+\infty}_{-\infty}e^{-i\omega t}x(t)dt = \int^{+\infty}_{-\infty}e^{-2\pi i f t}x(t)dt \quad. $$

The transform is invertible, which gives the inverse transform from the frequency domain back to the time domain:

$$ x(t)= \int^{+\infty}_{-\infty}e^{i\omega t}\tilde{x}(\omega) \frac{d\omega}{2\pi} = \int^{+\infty}_{-\infty}e^{2\pi ift}\tilde{x}(f) df \quad. $$

I will skip further mathematics for now and quote just one sentence (from a book I like very much, Introduction to Signal Processing ¹, whose notation this series follows):

The physical meaning of $\tilde{x}(\omega)$ is brought out by the inverse Fourier transform, which expresses the arbitrary signal $x(t)$ as a linear superposition of sinusoids of different frequencies.

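In short, the Fourier transform represents a time-domain signal, without losing information, as a linear combination of frequency-domain basis functions; it can be viewed as a basis expansion. In signal processing our data are discrete, so we need the discrete Fourier transform (DFT):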

$$ \tilde{x}[k]=\sum^{N-1}_{n=0}e^{\frac{-2\pi i}{N}k\cdot n}x[n] \quad,\\ x[n]=\frac{1}{N}\sum^{N-1}_{k=0}e^{\frac{2\pi i}{N}k\cdot n}\tilde{x}[k] \quad. $$

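Here $N$ is the number of samples and $k, n = 0, \dots, N-1$. It follows directly from the definition that the first frequency-domain sample is the sum (not the average) of all time-domain samples: $\tilde{x}[0]=\sum^{N-1}_{n=0}x[n]$. Note that the formulas above are only the definition used by fft() in the various Python libraries; they differ by a scale factor from the continuous transform they approximate. With $dt=T/N=1/f_s$, $t_n = n\,dt$ and $\omega_k = 2\pi k/(N\,dt)$, the relation is:

$$ \tilde{x}(\omega_k) = \int^{+\infty}_{-\infty}e^{-i\omega_k t}x(t)dt \approx dt\left[\sum^{N-1}_{n=0}e^{\frac{-2\pi i}{N}k\cdot n}x[n] \right] , $$ $$ x(t_n)= \int^{+\infty}_{-\infty}e^{i\omega t_n}\tilde{x}(\omega) \frac{d\omega}{2\pi} \approx \frac{1}{dt}\left[\frac{1}{N}\sum^{N-1}_{k=0}e^{\frac{2\pi i}{N}k\cdot n}\tilde{x}(\omega_k)\right]. $$

In the second line the bracket acts on samples of the continuous spectrum $\tilde{x}(\omega_k)$, that is, on $dt$ times the library DFT $\tilde{x}[k]$.

This makes sense: the library routine needs only the finite sampled sequence $x[n]$ and does not need to know the duration $T$ or the sampling rate $f_s$ that the sequence represents, which keeps the code general.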
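The scale factor $dt$ between the library DFT and the continuous transform can be checked numerically. The sketch below is only an illustration under assumed values: the sampling rate, duration, and Gaussian test signal are not from the original notes. It uses the known transform pair $e^{-\pi t^2} \leftrightarrow e^{-\pi f^2}$ and compares magnitudes only, because the time offset of the first sample changes the phase.

```python
import numpy as np

# Assumed illustration values (not from the original notes)
fs = 32.0              # sampling rate in Hz
T = 16.0               # total duration in seconds
N = int(T * fs)        # number of samples
dt = 1.0 / fs          # sample spacing, dt = T / N = 1 / fs

# Gaussian test signal centred on t = 0
t = (np.arange(N) - N // 2) * dt
x = np.exp(-np.pi * t**2)

X_dft = np.fft.fft(x)            # library DFT: no dt factor
f = np.fft.fftfreq(N, d=dt)      # frequency of each DFT bin in Hz
X_cont = dt * X_dft              # approximation to the continuous transform

# Magnitude of the continuous transform of exp(-pi t^2) is exp(-pi f^2)
analytic = np.exp(-np.pi * f**2)
max_err = np.max(np.abs(np.abs(X_cont) - analytic))
print("max magnitude error:", max_err)

# The zero-frequency bin of the raw DFT is the plain sum of the samples
print("DC bin equals sum:", np.isclose(X_dft[0].real, x.sum()))
```

Without the `dt` factor the magnitudes would be off by a factor of `fs`, which is the usual symptom of forgetting this normalization.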

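Now, finally, on to the code.

We compare hand-written FFT code with the fft functions shipped in Python packages to understand the computation behind them and what the packaged functions really compute. The packages considered are numpy.fft and scipy.fftpack (pyfftw.FFTW is left out for now).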

FFT from scratch (`DFT_slow`)

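We first write a Fourier transform, DFT_slow, straight from the definition (using numpy) and see how it performs. It also lets us check what the packaged fft functions compute. It is simple: a matrix-vector product gives a vectorized definition directly:

$$ \begin{aligned} \vec{X} &= M \cdot \vec{x} \\ M_{kn} &= e^{-2\pi i k\cdot n/N} \end{aligned} $$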

import numpy as np

def DFT_slow(x):
    """Compute the discrete Fourier Transform of the 1D array x"""
    x = np.asarray(x, dtype=float)
    N = x.shape[0]
    n = np.arange(N)                      # time index, shape (N,)
    k = n.reshape((N, 1))                 # frequency index, shape (N, 1)
    M = np.exp(-2j * np.pi * k * n / N)   # DFT matrix M[k, n]
    return np.dot(M, x)

The code above is straightforward: a sequence goes in and a sequence comes out, matching the DFT formula above without the $dt$ factor.
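To see what the output bins mean, a small sketch (with an assumed test signal, not from the original notes) applies the same DFT matrix to a cosine that completes exactly 5 cycles over 64 samples. The helper `dft_matrix` is a hypothetical name used only here; it repeats the matrix construction so that the snippet runs on its own.

```python
import numpy as np

def dft_matrix(N):
    """Hypothetical helper: the N x N matrix M[k, n] = exp(-2*pi*i*k*n/N)."""
    n = np.arange(N)
    k = n.reshape((N, 1))
    return np.exp(-2j * np.pi * k * n / N)

N = 64
n = np.arange(N)
x = np.cos(2 * np.pi * 5 * n / N)    # exactly 5 cycles in the window

X = dft_matrix(N) @ x

# Only bins k = 5 and k = N - 5 are non-zero, each with magnitude N / 2
peaks = np.flatnonzero(np.abs(X) > 1e-6)
print(peaks, np.abs(X[peaks]))
```

The bin at `N - 5` is the negative-frequency partner of bin 5; for real input the two always appear together.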

As you will quickly find, this algorithm is very inefficient, which motivates the improved version below:

FFT from scratch (`FFT`)

By symmetry (the DFT is periodic in $k$),

$$ \vec{X}_{k+i\cdot N} = \vec{X}_{k} \quad, \text{for any integer i}, $$

a more efficient recursive algorithm follows.
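For reference, the standard Cooley-Tukey even/odd split behind the recursion can be written out as follows. The symbols $E[k]$, $O[k]$, and $W_N$ are introduced here for illustration and do not appear in the original notes. Splitting the DFT sum into even and odd samples gives

$$ \tilde{x}[k] = \underbrace{\sum^{N/2-1}_{m=0} e^{\frac{-2\pi i}{N/2}k\cdot m}x[2m]}_{E[k]} + W_N^{k} \underbrace{\sum^{N/2-1}_{m=0} e^{\frac{-2\pi i}{N/2}k\cdot m}x[2m+1]}_{O[k]} , \qquad W_N = e^{-2\pi i/N} . $$

$E$ and $O$ are DFTs of length $N/2$, so they are periodic with period $N/2$, while $W_N^{k+N/2} = -W_N^{k}$. For $k = 0, \dots, N/2-1$ this gives

$$ \tilde{x}[k] = E[k] + W_N^{k}\,O[k] , \qquad \tilde{x}[k+N/2] = E[k] - W_N^{k}\,O[k] . $$

Each level costs $O(N)$ operations and there are $\log_2 N$ levels, so the total cost drops from $O(N^2)$ to $O(N\log N)$.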

def FFT(x):
    """A recursive implementation of the 1D Cooley-Tukey FFT"""
    x = np.asarray(x, dtype=float)
    N = x.shape[0]

    if N % 2 > 0:
        raise ValueError("size of x must be a power of 2")
    elif N <= 32:  # this cutoff should be optimized
        return DFT_slow(x)
    else:
        X_even = FFT(x[::2])
        X_odd = FFT(x[1::2])
        factor = np.exp(-2j * np.pi * np.arange(N) / N)
        return np.concatenate([X_even + factor[:N // 2] * X_odd,
                               X_even + factor[N // 2:] * X_odd])
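A single level of this even/odd combination can be checked in isolation. The sketch below is an illustration rather than part of the original code: it uses `np.fft.fft` for the two half-length transforms and an arbitrary random input of length 8, then recombines them with the twiddle factors.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.random(8)                    # arbitrary test input of even length
N = x.shape[0]

E = np.fft.fft(x[::2])               # DFT of the even-indexed samples
O = np.fft.fft(x[1::2])              # DFT of the odd-indexed samples
w = np.exp(-2j * np.pi * np.arange(N // 2) / N)   # twiddle factors W_N^k

# First half uses +w, second half uses -w (since W_N^(k + N/2) = -W_N^k)
X = np.concatenate([E + w * O, E - w * O])

print(np.allclose(X, np.fft.fft(x)))
```

The recursive `FFT` above writes the second half as `factor[N // 2:] * X_odd`, which is the same thing because those factors already carry the minus sign.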

FFT from scratch (`FFT_vectorized`)

Vectorizing the previous algorithm improves its speed further.

def FFT_vectorized(x):
    """A vectorized, non-recursive version of the Cooley-Tukey FFT"""
    x = np.asarray(x, dtype=float)
    N = x.shape[0]

    if np.log2(N) % 1 > 0:
        raise ValueError("size of x must be a power of 2")

    # N_min here is equivalent to the stopping condition above,
    # and should be a power of 2
    N_min = min(N, 32)

    # Perform an O[N^2] DFT on all length-N_min sub-problems at once
    n = np.arange(N_min)
    k = n[:, None]
    M = np.exp(-2j * np.pi * n * k / N_min)
    X = np.dot(M, x.reshape((N_min, -1)))

    # build-up each level of the recursive calculation all at once
    while X.shape[0] < N:
        X_even = X[:, :X.shape[1] // 2]
        X_odd = X[:, X.shape[1] // 2:]
        factor = np.exp(-1j * np.pi * np.arange(X.shape[0])
                        / X.shape[0])[:, None]
        X = np.vstack([X_even + factor * X_odd,
                       X_even - factor * X_odd])

    return X.ravel()

Fast FFT from scratch (`FFT_fast`)

The Fast Fourier Transform (FFT): the most ingenious algorithm ever? (video on Bilibili)

Verification of correctness for the code

First, let us verify the hand-written algorithms.

x = np.random.random(1024)
print(np.allclose(DFT_slow(x), np.fft.fft(x)))
print(np.allclose(FFT(x), np.fft.fft(x)))
print(np.allclose(FFT_vectorized(x), np.fft.fft(x)))

from scipy.fftpack import fft
print(np.allclose(np.fft.fft(x), fft(x)))
# -----------Output---------------
True
True
True
True

The check confirms that our FFT implementations agree with np.fft.fft(x) and scipy.fftpack.fft(x).

Efficiency of the algorithms

Next, we compare the run times of the hand-written algorithms and the package functions.

x = np.random.random(1024)
%timeit DFT_slow(x)
%timeit FFT(x)
%timeit FFT_vectorized(x)
%timeit np.fft.fft(x)
%timeit fft(x)
# It may take 30 seconds
# -----------Output---------------
66 ms ± 3.19 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
3.7 ms ± 1.01 ms per loop (mean ± std. dev. of 7 runs, 100 loops each)
467 µs ± 72.2 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
63 µs ± 2.6 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)
19.2 µs ± 1.25 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

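The library functions are far faster than the hand-written code: in the timings above, np.fft.fft is roughly 7 times faster than FFT_vectorized and about 1000 times faster than DFT_slow, and scipy.fftpack.fft is faster still. From here on, we compute the FFT only with the library functions.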
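
The `%timeit` magic only works inside IPython or Jupyter. In a plain Python script, a rough equivalent can be sketched with the standard-library `timeit` module. The helper `best_time` and the stand-in `dft_naive` are hypothetical names introduced for this example, and the numbers it prints depend on the machine, so they will not match the timings above.

```python
import timeit
import numpy as np

def dft_naive(x):
    """O(N^2) DFT via an explicit matrix product (illustrative stand-in)."""
    x = np.asarray(x, dtype=complex)
    n = np.arange(x.shape[0])
    M = np.exp(-2j * np.pi * n[:, None] * n / x.shape[0])
    return M @ x

def best_time(func, x, repeat=3, number=5):
    """Best per-call wall time in seconds over `repeat` runs of `number` calls."""
    runs = timeit.repeat(lambda: func(x), repeat=repeat, number=number)
    return min(runs) / number

x = np.random.default_rng(0).random(1024)
for name, func in [("dft_naive", dft_naive), ("np.fft.fft", np.fft.fft)]:
    # Report microseconds per call; results vary from machine to machine
    print(f"{name:>12}: {best_time(func, x) * 1e6:10.1f} µs per call")
```

Taking the minimum over the repeats, rather than the mean, reduces the influence of unrelated system load on the measurement.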

Reference

Understanding the FFT Algorithm (an informative post by the astronomer Jake VanderPlas)
Any idea of what a frequency is? Playing with the Fourier Transform (a post from Signal Processing for Dummies by Elena Cuoco, a data scientist at the European Gravitational Observatory)
Matched filter and signal-to-noise for a periodic template (posted by Michał Bejger, an astrophysicist from LIGO-Virgo)
Discrete Fourier transform - Wikipedia
Lecture 7 - The Discrete Fourier Transform (concise lecture notes written by Prof. Stephen Roberts)
Fourier expansions and impedance (a Jupyter notebook illustrating that “Every function can be constructed by adding up a bunch of sines and cosines.”)
FINDCHIRP: An algorithm for detection of gravitational waves from inspiraling compact binaries (a preprint with explicit notation and explanations of the FFT, the discrete matched filter, and power spectral estimation)

Introduction to Signal Processing, Sophocles J. Orfanidis
